ENHANCING ROBOT PERCEPTION USING OPTICAL FLOW
Rochester Institute of Technology · 2017
Pratik Sajnani — Advisor: Prof. Zack Butler
Robot perception is a key part of the decision-making process of agents embedded in a mechanical system, determining how a robot responds to stimuli from a real-time, dynamic environment. This paper examines the suitability of sparse and dense optical flow estimation techniques. We employ the Lucas-Kanade algorithm on high-resolution images to establish a common baseline for comparison, and we also discuss dense optical flow using Farneback's algorithm together with an optimization strategy for it.
Robot perception is a widely researched topic in computer vision and robot navigation. This paper is a survey of two popular ways of using optical flow estimation to perceive motion across image frames. The paper first discusses the motivation of the problem and gives a brief overview of object tracking. It then provides a general introduction to optical flow, including the basic idea, its constraints, and the Horn-Schunck method; introduces the Lucas-Kanade method; describes the implementation and the challenges encountered; and closes with dense optical flow using Farneback's algorithm, directions for future work, and the project timeline.
Robot perception is the way a robot understands its surroundings. A mobile robot moves through its environment and needs to understand its own motion as well as predict the motion of objects it may encounter. For safety, it must be able to find a collision-free path. This study focuses on utilizing tracking knowledge to form an integrated real-time system.
Object tracking is the process of locating objects as they move. In computer vision, the source of information is image matrices of intensity and color values. Most tracking algorithms are designed for specific contexts and rely on constraints that are frequently violated in the real world.
3.1 Optical Flow
Optical flow estimates the instantaneous velocities of pixels across two subsequent images. It relies on motion apparent from visual differences between those images. The technique generates a field of vectors anchored at positions in the first image and pointing toward the estimated positions of those points in the second.
Figure 2: The aperture problem in computer vision
3.2 Derivation of Flow Equation
Let I (first image) and J (second image) be two grayscale intensity images, and let dt be the temporal difference between them. Applying a first-order Taylor expansion to the brightness constancy assumption yields the optical flow constraint equation:
∂I/∂x · Vx + ∂I/∂y · Vy + ∂I/∂t = 0
The equation has two unknowns and cannot be solved from a single point, which leads to the aperture problem.
where:
Vx, Vy — horizontal and vertical velocity components
∂I/∂x — spatial gradient in x
∂I/∂y — spatial gradient in y
∂I/∂t — temporal gradient between frames
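As a minimal illustration of these quantities, the sketch below estimates the spatial and temporal gradients from two consecutive grayscale frames with OpenCV and NumPy; the file names and the simple finite-difference choices are placeholders, not the paper's pipeline.

```python
import cv2
import numpy as np

# Load two consecutive grayscale frames (file names are placeholders).
I = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
J = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Spatial gradients dI/dx and dI/dy via Sobel filters,
# temporal gradient dI/dt as a simple frame difference.
Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3)
It = J - I

# The constraint at each pixel: Ix*Vx + Iy*Vy + It = 0.
# One equation, two unknowns (Vx, Vy) -> cannot be solved pointwise.
print(Ix.shape, Iy.shape, It.shape)
```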
3.3 Aperture Problem
When motion is estimated through a limited aperture, different true motions of a homogeneous region can produce identical local measurements. Only the component of motion perpendicular to an edge can be recovered from local measurements. The popular barber's pole illusion is a canonical example.
Figure 3: A barber's pole
3.4 Constraints
To make the optical flow problem tractable, additional constraints are imposed:
- 1. Brightness Constancy — the intensity of a pixel remains constant between frames: I(x, y, t) = J(x + Vx·dt, y + Vy·dt, t + dt)
- 2. Spatial Smoothness — nearby pixels have similar velocity vectors; motion varies smoothly across the image
- 3. Temporal Smoothness — consecutive frames are captured close in time, so motion changes are small
3.5 Horn-Schunck Method
The Horn-Schunck method produces a dense flow field by assuming global smoothness. It minimizes a combined energy functional that penalizes both deviation from the brightness constancy equation and large spatial gradients in the flow field:
E(u, v) = ∬ [ (Ix·u + Iy·v + It)² + α²(‖∇u‖² + ‖∇v‖²) ] dx dy
The regularization constant α controls the trade-off between data fidelity and smoothness.
where:
u, v — horizontal and vertical flow components
Ix, Iy — spatial image gradients
It — temporal gradient
α — regularization constant (smoothness weight)
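The functional is commonly minimized by iterating a Jacobi-style update derived from its Euler-Lagrange equations. The sketch below is a minimal NumPy/OpenCV version of that standard scheme, assuming normalized Sobel gradients and a 4-neighbour averaging kernel; it is not the paper's implementation.

```python
import cv2
import numpy as np

def horn_schunck(I, J, alpha=1.0, n_iters=100):
    """Minimal Horn-Schunck sketch: dense flow (u, v) from frame I to J."""
    I = I.astype(np.float32)
    J = J.astype(np.float32)
    Ix = cv2.Sobel(I, cv2.CV_32F, 1, 0, ksize=3) / 8.0  # normalized Sobel
    Iy = cv2.Sobel(I, cv2.CV_32F, 0, 1, ksize=3) / 8.0
    It = J - I

    u = np.zeros_like(I)
    v = np.zeros_like(I)
    # Kernel that averages the 4-neighbourhood of each pixel.
    avg = np.array([[0, 0.25, 0],
                    [0.25, 0, 0.25],
                    [0, 0.25, 0]], dtype=np.float32)

    for _ in range(n_iters):
        u_bar = cv2.filter2D(u, -1, avg)
        v_bar = cv2.filter2D(v, -1, avg)
        # Jacobi-style update from the Euler-Lagrange equations.
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```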
The Lucas-Kanade method assumes constant flow within a fixed-size neighborhood window centered on each pixel. This local assumption converts the underdetermined single-point system into an overdetermined least-squares problem that can be solved directly. In practice it is paired with a feature detector such as Shi-Tomasi, coupling feature selection with tracking and providing both accuracy and robustness.
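As a concrete illustration of this local least-squares step, the sketch below solves the 2×2 normal equations for a single pixel given precomputed gradient images; the window size and the conditioning check are illustrative choices, not taken from the paper.

```python
import numpy as np

def lk_flow_at(Ix, Iy, It, x, y, win=7):
    """Solve the Lucas-Kanade least-squares system for one pixel (x, y).
    Ix, Iy, It are precomputed gradient images; win is the window size."""
    r = win // 2
    # Stack gradients from the (win x win) neighbourhood into A d = b.
    wIx = Ix[y - r:y + r + 1, x - r:x + r + 1].ravel()
    wIy = Iy[y - r:y + r + 1, x - r:x + r + 1].ravel()
    wIt = It[y - r:y + r + 1, x - r:x + r + 1].ravel()
    A = np.stack([wIx, wIy], axis=1)   # (win*win, 2)
    b = -wIt                           # (win*win,)
    # Normal equations: (A^T A) d = A^T b. Solvable when the window
    # contains texture, i.e. A^T A is well conditioned (a good corner).
    ATA = A.T @ A
    if np.linalg.cond(ATA) > 1e6:      # degenerate: aperture problem
        return None
    return np.linalg.solve(ATA, A.T @ b)   # (Vx, Vy) for this window
```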
4.1 Accuracy vs. Robustness Trade-off
Small windows preserve fine detail and avoid averaging across motion boundaries (accuracy), but they are more sensitive to noise and fail on large displacements. Large windows improve robustness against noise but blur the boundaries between independently moving regions. For robot perception a small window is preferred: real-time feedback is critical, and coarse estimates are corrected by the pyramidal refinement.
4.2 Pyramidal Scheme
To handle large inter-frame displacements, Lucas-Kanade uses an image pyramid. The input frame is progressively down-sampled across multiple levels, and flow is estimated coarse-to-fine. Each level's estimate initializes the next, refining the result iteratively; a code sketch of this pipeline follows Figure 6.
Level 0 (original): 1920 × 1080 px
Level 1: 960 × 540 px
Level 2: 480 × 270 px
Level 3: 240 × 135 px
Level 4 (coarsest): 120 × 67 px
Flow is estimated at level 4 and refined back up to level 0.
Figure 6: Pyramidal scheme for Lucas-Kanade
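A minimal OpenCV sketch of the sparse pyramidal pipeline (Shi-Tomasi corners fed to calcOpticalFlowPyrLK) is shown below; the video path and parameter values are placeholders rather than the study's actual settings.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("hallway.mp4")   # placeholder video path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corners to seed the tracker.
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=10)

lk_params = dict(winSize=(15, 15), maxLevel=4,   # 4 pyramid levels
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
                           30, 0.01))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Coarse-to-fine flow for the tracked points.
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                               **lk_params)
    good = p1[status.ravel() == 1]
    prev_gray, p0 = gray, good.reshape(-1, 1, 2)
    if len(p0) < 20:                     # re-seed when tracks are lost
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                     qualityLevel=0.01, minDistance=10)
```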
The system used a phone camera mounted on the COROBOT platform operating at 30 Hz with 1920×1080 resolution. The target environment was indoor hallways containing a mix of obstacles: humans, CS students, and other autonomous machines.
5.1 Camera Ego-Motion and Bright Points
A significant challenge arose from bright points at the far ends of hallways. Even when the camera is stationary, these high-contrast regions attract the Shi-Tomasi feature detector and produce spurious flow vectors. When the camera is moving (ego-motion), these spurious detections compound with real motion signals, making it difficult to distinguish obstacles from background.
Figure 7: Image from original video file
Figure 8: Bright points problem
Figure 9: Spurious camera motion effect
Dense optical flow computes a velocity vector for every pixel rather than a sparse set of tracked features. This avoids the feature detection bias problem entirely — there is no selection step that can preferentially latch onto bright background points. Dense flow tracks moving boundaries of image regions directly.
6.1 Farneback's Algorithm
Farneback's method approximates the neighborhood of each pixel in both frames using a quadratic polynomial. The displacement that best aligns the two polynomial approximations is taken as the flow at that point. Coefficients are estimated via a weighted least-squares fit, and the algorithm iterates across a pyramid.
I(x) ≈ xᵀ A x + bᵀ x + c    (quadratic polynomial model)
J(x) ≈ I(x − d)    (second frame = first shifted by d)
Solving for the displacement d:
d = −½ A⁻¹ (b₂ − b₁)
where A, b₁, b₂ are polynomial coefficients estimated from a Gaussian-weighted least-squares fit.
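A minimal OpenCV sketch of this method, using cv2.calcOpticalFlowFarneback and an HSV-style visualization like the one referred to in Figure 13, is shown below; the file names and parameter values are common defaults, not the paper's settings.

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (placeholder file names).
prev = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# Dense flow: one (dx, dy) vector per pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=4, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.1,
                                    flags=0)

# Encode direction as hue and magnitude as value.
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*prev.shape, 3), dtype=np.uint8)
hsv[..., 0] = ang * 180 / np.pi / 2                      # hue: direction
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("dense_flow_hsv.png", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
```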
6.2 Optimization Strategy
Dense flow over a full 1920×1080 frame is computationally expensive at 30 Hz. The proposed optimization restricts flow computation to the border pixels of previously identified moving contour regions rather than the entire frame, as sketched after the list below.
- Frame 1: full dense flow + contour extraction (run once or at keyframes)
- Frame N: flow computed only along border pixels of the extracted contours
- Contour borders updated when flow vectors indicate significant shape change
- Interior pixels of stationary regions skipped entirely
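The sketch below is one rough way to realize this scheme. It assumes keyframe contours come from thresholding the dense-flow magnitude, and that per-frame flow is then evaluated only on a padded region of interest around each contour and read back at its border pixels; these specific choices are assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def keyframe_contours(prev, curr, mag_thresh=1.0):
    """Keyframe step: full dense flow, then contours of moving regions.
    prev/curr are consecutive grayscale frames; the threshold is illustrative."""
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 4, 15, 3, 5, 1.1, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    mask = (mag > mag_thresh).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)  # OpenCV 4 signature
    return contours

def border_flow(prev, curr, contour, pad=10):
    """Per-frame step: dense flow only inside a padded ROI around the
    contour, sampled at the contour's border pixels."""
    x, y, w, h = cv2.boundingRect(contour)
    x0, y0 = max(x - pad, 0), max(y - pad, 0)
    x1 = min(x + w + pad, prev.shape[1])
    y1 = min(y + h + pad, prev.shape[0])
    roi_flow = cv2.calcOpticalFlowFarneback(prev[y0:y1, x0:x1],
                                            curr[y0:y1, x0:x1], None,
                                            0.5, 2, 15, 3, 5, 1.1, 0)
    pts = contour.reshape(-1, 2)                       # (x, y) border pixels
    return roi_flow[pts[:, 1] - y0, pts[:, 0] - x0]    # flow at the border
```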
Figure 12: Dense Flow
Figure 13: HSV visualization of dense flow
Figure 14: Edge image of dense flow
Lucas-Kanade is a well-established sparse optical flow method with excellent theoretical properties. Its pyramidal implementation handles large displacements efficiently. However, in indoor hallway environments, the Shi-Tomasi feature detector is sensitive to bright background points which confound obstacle detection under camera ego-motion.
Farneback's dense algorithm avoids feature selection entirely and tracks contour boundaries more reliably in the same environment. The proposed optimization — computing dense flow only along border pixels of moving contour regions — reduces per-frame computation substantially, making dense flow feasible for real-time robot navigation.
- Background subtraction conditioned on ground-plane and wall geometry to suppress static scene flow
- Adaptive contour border tracking with keyframe-triggered full recomputation
- Integration with KCF (Kernelized Correlation Filters) to maintain confirmed obstacle state across frames (sketched after this list)
- Evaluation on the full COROBOT hardware loop at 30 Hz with latency profiling
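As a hedged sketch of the proposed KCF integration (future work, not part of the completed study), the snippet below hands a placeholder obstacle box to OpenCV's KCF tracker. It assumes opencv-contrib is installed, that the box would in practice come from a flow-derived contour, and that the constructor name matches your OpenCV version.

```python
import cv2

cap = cv2.VideoCapture("hallway.mp4")   # placeholder video path
ok, frame = cap.read()

# Requires opencv-contrib; constructor location varies slightly by version.
tracker = cv2.TrackerKCF_create()
bbox = (300, 200, 80, 120)              # (x, y, w, h) placeholder obstacle box
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)  # track the confirmed obstacle
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```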
14-week independent study with weekly milestones, 6–10 hours per week.
- Weeks 1–2: Literature review — optical flow fundamentals, Horn-Schunck, Lucas-Kanade
- Weeks 3–4: Environment setup — COROBOT platform, OpenCV, Python pipeline
- Weeks 5–6: L-K sparse flow implementation and baseline evaluation
- Weeks 7–8: Ego-motion analysis, bright-point problem characterization
- Weeks 9–10: Farneback dense flow implementation and qualitative comparison
- Weeks 11–12: Contour border optimization design and prototype
- Week 13: KCF vs. Lucas-Kanade head-to-head comparison
- Week 14: Paper writing, figures, and final presentation