ENHANCING ROBOT PERCEPTION USING OPTICAL FLOW
Rochester Institute of Technology · 2017
Pratik Sajnani — Advisor: Prof. Zack Butler
Robot perception is a key part of the decision-making process of agents embedded in a mechanical system, determining how a robot responds to stimuli from a real-time, dynamic environment. This paper examines the suitability of sparse and dense optical flow estimation techniques. We employ the Lucas-Kanade algorithm on high-resolution images to establish a common baseline for comparison, and we also discuss dense optical flow using Farneback's algorithm together with an optimization strategy for it.
Robot perception is a widely researched topic in computer vision and robot navigation. This paper is a survey of two popular ways of using optical flow estimation to perceive motion across image frames. The paper first discusses the motivation of the problem and gives a brief overview of object tracking. It then provides a general introduction to optical flow, including the basic idea, its constraints, and the Horn-Schunck method; introduces the Lucas-Kanade method; describes the implementation and the challenges encountered; and closes with dense optical flow using Farneback's algorithm, directions for future work, and the project timeline.
Robot perception is the way a robot understands its surroundings. A mobile robot moves through its environment and needs to understand its own motion as well as predict the motion of objects it may encounter. For safety, it must be able to find a collision-free path. This study focuses on utilizing tracking knowledge to form an integrated real-time system.
Object tracking is the process of locating objects as they move. In computer vision, the source of information is image matrices of intensity and color values. Most tracking algorithms are designed for specific contexts and rely on constraints that are frequently violated in the real world.
3.1 Optical Flow
Optical flow estimates the instantaneous velocities of pixels across two subsequent images. It relies on motion apparent from visual differences between those images. The technique generates a field of vectors anchored at positions in the first image and pointing toward the estimated positions of those points in the second.
Figure 2: The aperture problem in computer vision
3.2 Derivation of Flow Equation
Let I (first image) and J (second image) be two grayscale intensity images, and let dt be the temporal difference between them. Applying a first-order Taylor expansion to the brightness constancy assumption yields the optical flow constraint equation:
∂I/∂x · Vx + ∂I/∂y · Vy + ∂I/∂t = 0
The equation has two unknowns and cannot be solved from a single point, which leads to the aperture problem.
where:
Vx, Vy — horizontal and vertical velocity components
∂I/∂x — spatial gradient in x
∂I/∂y — spatial gradient in y
∂I/∂t — temporal gradient between frames
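As a minimal illustration of these quantities, the sketch below estimates the spatial and temporal gradients from two consecutive grayscale frames with OpenCV and NumPy; the file names and the simple finite-difference choices are placeholders, not the paper's pipeline.

```python
import cv2
import numpy as np

# Load two consecutive grayscale frames (file names are placeholders).
I = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
J = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Spatial gradients dI/dx and dI/dy via Sobel filters,
# temporal gradient dI/dt as a simple frame difference.
Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3)
It = J - I

# The constraint at each pixel: Ix*Vx + Iy*Vy + It = 0.
# One equation, two unknowns (Vx, Vy) -> cannot be solved pointwise.
print(Ix.shape, Iy.shape, It.shape)
```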
3.3 Aperture Problem
When motion is estimated through a limited aperture, different true motions of a homogeneous region can produce identical local measurements. Only the component of motion perpendicular to an edge can be recovered from local measurements. The popular barber's pole illusion is a canonical example.
Figure 3: A barber's pole
3.4 Constraints
To make the optical flow problem tractable, additional constraints are imposed:
- 1. Brightness Constancy — the intensity of a pixel remains constant between frames: I(x, y, t) = J(x + Vx·dt, y + Vy·dt, t + dt)
- 2. Spatial Smoothness — nearby pixels have similar velocity vectors; motion varies smoothly across the image
- 3. Temporal Smoothness — consecutive frames are captured close in time, so motion changes are small
3.5 Horn-Schunck Method
The Horn-Schunck method produces a dense flow field by assuming global smoothness. It minimizes a combined energy functional that penalizes both deviation from the brightness constancy equation and large spatial gradients in the flow field:
E(u, v) = ∬ [ (Ix·u + Iy·v + It)² + α²(‖∇u‖² + ‖∇v‖²) ] dx dy
The regularization constant α controls the trade-off between data fidelity and smoothness.
where:
u, v — horizontal and vertical flow components
Ix, Iy — spatial image gradients
It — temporal gradient
α — regularization constant (smoothness weight)
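The functional is commonly minimized by iterating a Jacobi-style update derived from its Euler-Lagrange equations. The sketch below is a minimal NumPy/OpenCV version of that standard scheme, assuming normalized Sobel gradients and a 4-neighbour averaging kernel; it is not the paper's implementation.

```python
import cv2
import numpy as np

def horn_schunck(I, J, alpha=1.0, n_iters=100):
    """Minimal Horn-Schunck sketch: dense flow (u, v) from frame I to J."""
    I = I.astype(np.float32)
    J = J.astype(np.float32)
    Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3) / 8.0  # normalized Sobel
    Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3) / 8.0
    It = J - I

    u = np.zeros_like(I)
    v = np.zeros_like(I)
    # Kernel that averages the 4-neighbourhood of each pixel.
    avg = np.array([[0, 0.25, 0],
                    [0.25, 0, 0.25],
                    [0, 0.25, 0]], dtype=np.float32)

    for _ in range(n_iters):
        u_bar = cv2.filter2D(u, -1, avg)
        v_bar = cv2.filter2D(v, -1, avg)
        # Jacobi-style update from the Euler-Lagrange equations.
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```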
The Lucas-Kanade method assumes constant flow within a fixed-size neighborhood window centered on each pixel. This local assumption converts the underdetermined single-point system into an overdetermined least-squares problem that can be solved directly. In practice it is paired with a feature detector such as Shi-Tomasi, coupling feature selection with tracking and providing both accuracy and robustness.
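As a concrete illustration of this local least-squares step, the sketch below solves the 2×2 normal equations for a single pixel given precomputed gradient images; the window size and the conditioning check are illustrative choices, not taken from the paper.

```python
import numpy as np

def lk_flow_at(Ix, Iy, It, x, y, win=7):
    """Solve the Lucas-Kanade least-squares system for one pixel (x, y).
    Ix, Iy, It are precomputed gradient images; win is the window size."""
    r = win // 2
    # Stack gradients from the (win x win) neighbourhood into A d = b.
    wIx = Ix[y - r:y + r + 1, x - r:x + r + 1].ravel()
    wIy = Iy[y - r:y + r + 1, x - r:x + r + 1].ravel()
    wIt = It[y - r:y + r + 1, x - r:x + r + 1].ravel()
    A = np.stack([wIx, wIy], axis=1)   # (win*win, 2)
    b = -wIt                           # (win*win,)
    # Normal equations: (A^T A) d = A^T b. Solvable when the window
    # contains texture, i.e. A^T A is well conditioned (a good corner).
    ATA = A.T @ A
    if np.linalg.cond(ATA) > 1e6:      # degenerate: aperture problem
        return None
    return np.linalg.solve(ATA, A.T @ b)   # (Vx, Vy) for this window
```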
4.1 Accuracy vs. Robustness Trade-off
Small windows preserve fine detail and avoid averaging across motion boundaries (accuracy), but they are more sensitive to noise and fail on large displacements. Large windows improve robustness against noise but blur the boundaries between independently moving regions. For robot perception a small window is preferred: real-time feedback is critical, and coarse estimates are corrected by the pyramidal refinement.
4.2 Pyramidal Scheme
To handle large inter-frame displacements, Lucas-Kanade uses an image pyramid. The input frame is progressively down-sampled across multiple levels, and flow is estimated coarse-to-fine. Each level's estimate initializes the next, refining the result iteratively; a code sketch of this pipeline follows Figure 6.
Level 0 (original): 1920 × 1080 px
Level 1: 960 × 540 px
Level 2: 480 × 270 px
Level 3: 240 × 135 px
Level 4 (coarsest): 120 × 67 px
Flow is estimated at level 4 and refined back up to level 0.
Figure 6: Pyramidal scheme for Lucas-Kanade
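A minimal OpenCV sketch of the sparse pyramidal pipeline (Shi-Tomasi corners fed to calcOpticalFlowPyrLK) is shown below; the video path and parameter values are placeholders rather than the study's actual settings.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("hallway.mp4")   # placeholder video path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corners to seed the tracker.
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=10)

lk_params = dict(winSize=(15, 15), maxLevel=4,   # 4 pyramid levels
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
                           30, 0.01))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Coarse-to-fine flow for the tracked points.
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                               **lk_params)
    good = p1[status.ravel() == 1]
    prev_gray, p0 = gray, good.reshape(-1, 1, 2)
    if len(p0) < 20:                     # re-seed when tracks are lost
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                     qualityLevel=0.01, minDistance=10)
```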
The system used a phone camera mounted on the COROBOT platform operating at 30 Hz with 1920×1080 resolution. The target environment was indoor hallways containing a mix of obstacles: humans, CS students, and other autonomous machines.
5.1 Camera Ego-Motion and Bright Points
A significant challenge arose from bright points at the far ends of hallways. Even when the camera is stationary, these high-contrast regions attract the Shi-Tomasi feature detector and produce spurious flow vectors. When the camera is moving (ego-motion), these spurious detections compound with real motion signals, making it difficult to distinguish obstacles from background.
Figure 7: Image from original video file
Figure 8: Bright points problem
Figure 9: Spurious camera motion effect
Dense optical flow computes a velocity vector for every pixel rather than a sparse set of tracked features. This avoids the feature detection bias problem entirely — there is no selection step that can preferentially latch onto bright background points. Dense flow tracks moving boundaries of image regions directly.
6.1 Farneback's Algorithm
Farneback's method approximates the neighborhood of each pixel in both frames using a quadratic polynomial. The displacement that best aligns the two polynomial approximations is taken as the flow at that point. Coefficients are estimated via a weighted least-squares fit, and the algorithm iterates across a pyramid.
I(x) ≈ xᵀ A x + bᵀ x + c    (quadratic polynomial model)
J(x) ≈ I(x − d)    (second frame = first shifted by d)
Solving for the displacement d:
d = −½ A⁻¹ (b₂ − b₁)
where A, b₁, b₂ are polynomial coefficients estimated from a Gaussian-weighted least-squares fit.
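A minimal OpenCV sketch of this method, using cv2.calcOpticalFlowFarneback and an HSV-style visualization like the one referred to in Figure 13, is shown below; the file names and parameter values are common defaults, not the paper's settings.

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (placeholder file names).
prev = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# Dense flow: one (dx, dy) vector per pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=4, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.1,
                                    flags=0)

# Encode direction as hue and magnitude as value.
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*prev.shape, 3), dtype=np.uint8)
hsv[..., 0] = ang * 180 / np.pi / 2                      # hue: direction
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("dense_flow_hsv.png", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
```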
6.2 Optimization Strategy
Dense flow over a full 1920×1080 frame is computationally expensive at 30 Hz. The proposed optimization restricts flow computation to the border pixels of previously identified moving contour regions rather than the entire frame, as sketched after the list below.
- Frame 1: full dense flow + contour extraction (run once or at keyframes)
- Frame N: flow computed only along border pixels of the extracted contours
- Contour borders updated when flow vectors indicate significant shape change
- Interior pixels of stationary regions skipped entirely
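The sketch below is one rough way to realize this scheme. It assumes keyframe contours come from thresholding the dense-flow magnitude, and that per-frame flow is then evaluated only on a padded region of interest around each contour and read back at its border pixels; these specific choices are assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def keyframe_contours(prev, curr, mag_thresh=1.0):
    """Keyframe step: full dense flow, then contours of moving regions.
    prev/curr are consecutive grayscale frames; the threshold is illustrative."""
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 4, 15, 3, 5, 1.1, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    mask = (mag > mag_thresh).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)  # OpenCV 4 signature
    return contours

def border_flow(prev, curr, contour, pad=10):
    """Per-frame step: dense flow only inside a padded ROI around the
    contour, sampled at the contour's border pixels."""
    x, y, w, h = cv2.boundingRect(contour)
    x0, y0 = max(x - pad, 0), max(y - pad, 0)
    x1 = min(x + w + pad, prev.shape[1])
    y1 = min(y + h + pad, prev.shape[0])
    roi_flow = cv2.calcOpticalFlowFarneback(prev[y0:y1, x0:x1],
                                            curr[y0:y1, x0:x1], None,
                                            0.5, 2, 15, 3, 5, 1.1, 0)
    pts = contour.reshape(-1, 2)                       # (x, y) border pixels
    return roi_flow[pts[:, 1] - y0, pts[:, 0] - x0]    # flow at the border
```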
Figure 12: Dense Flow
Figure 13: HSV visualization of dense flow
Figure 14: Edge image of dense flow
Lucas-Kanade is a well-established sparse optical flow method with excellent theoretical properties. Its pyramidal implementation handles large displacements efficiently. However, in indoor hallway environments, the Shi-Tomasi feature detector is sensitive to bright background points which confound obstacle detection under camera ego-motion.
Farneback's dense algorithm avoids feature selection entirely and tracks contour boundaries more reliably in the same environment. The proposed optimization — computing dense flow only along border pixels of moving contour regions — reduces per-frame computation substantially, making dense flow feasible for real-time robot navigation.
- Background subtraction conditioned on ground-plane and wall geometry to suppress static scene flow
- Adaptive contour border tracking with keyframe-triggered full recomputation
- Integration with KCF (Kernelized Correlation Filters) to maintain confirmed obstacle state across frames (sketched after this list)
- Evaluation on the full COROBOT hardware loop at 30 Hz with latency profiling
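As a hedged sketch of the proposed KCF integration (future work, not part of the completed study), the snippet below hands a placeholder obstacle box to OpenCV's KCF tracker. It assumes opencv-contrib is installed, that the box would in practice come from a flow-derived contour, and that the constructor name matches your OpenCV version.

```python
import cv2

cap = cv2.VideoCapture("hallway.mp4")   # placeholder video path
ok, frame = cap.read()

# Requires opencv-contrib; constructor location varies slightly by version.
tracker = cv2.TrackerKCF_create()
bbox = (300, 200, 80, 120)              # (x, y, w, h) placeholder obstacle box
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)  # track the confirmed obstacle
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```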
14-week independent study with weekly milestones, 6–10 hours per week.
- Weeks 1–2: Literature review — optical flow fundamentals, Horn-Schunck, Lucas-Kanade
- Weeks 3–4: Environment setup — COROBOT platform, OpenCV, Python pipeline
- Weeks 5–6: L-K sparse flow implementation and baseline evaluation
- Weeks 7–8: Ego-motion analysis, bright-point problem characterization
- Weeks 9–10: Farneback dense flow implementation and qualitative comparison
- Weeks 11–12: Contour border optimization design and prototype
- Week 13: KCF vs. Lucas-Kanade head-to-head comparison
- Week 14: Paper writing, figures, and final presentation