The control of self-driving cars has received growing attention recently. While existing research shows promising results in vehicle control using video from a monocular dash camera, there has been very limited work on directly learning vehicle control from motion-based cues. Such cues are powerful features for visual representations because they encode the per-pixel movement between two consecutive images, which allows a system to map them effectively to control signals. We propose a new framework that exploits a motion-based feature, optical flow, extracted from the dash camera video, and we demonstrate that this feature significantly improves the accuracy of the predicted control signals. The framework has two main components. The flow predictor, a self-supervised deep network, models the underlying scene structure from consecutive frames and generates the optical flow. The controller, a supervised multi-task deep network, predicts both the steering angle and the speed. Experiments show that the proposed framework, using optical flow features, can effectively predict control signals from dash camera video. On the Cityscapes dataset, the system's predictions have errors as low as 0.0130 rad/s on steering angle and 0.0615 m/s on speed, outperforming existing methods.
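The two training signals described above can be illustrated with a minimal NumPy sketch. The abstract does not state which losses the framework uses, so this sketch rests on assumptions. The self-supervised flow predictor is assumed to be trained with a photometric (brightness-constancy) loss: the second frame is warped back by the predicted flow and compared with the first. The multi-task controller is assumed to be trained with a weighted sum of mean-squared errors on steering angle and speed. The function names (`warp_backward`, `photometric_loss`, `multitask_loss`), the grayscale input, and the loss weights are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def warp_backward(img, flow):
    """Warp a grayscale frame img (H, W) toward the reference frame.

    flow has shape (H, W, 2) holding (u, v) displacements. Each reference
    pixel (x, y) samples img at (x + u, y + v) using bilinear interpolation;
    out-of-bounds coordinates are clamped to the image border.
    """
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # horizontal interpolation weight
    wy = y - y0  # vertical interpolation weight
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def photometric_loss(frame1, frame2, flow):
    """Assumed self-supervised loss: mean absolute difference between
    frame1 and frame2 warped back by the predicted flow (no labels needed)."""
    warped = warp_backward(frame2, flow)
    return float(np.mean(np.abs(frame1 - warped)))


def multitask_loss(pred_steer, pred_speed, true_steer, true_speed,
                   w_steer=1.0, w_speed=1.0):
    """Assumed supervised multi-task loss: weighted sum of the MSE on
    steering angle and the MSE on speed (weights are hypothetical)."""
    steer_mse = np.mean((pred_steer - true_steer) ** 2)
    speed_mse = np.mean((pred_speed - true_speed) ** 2)
    return float(w_steer * steer_mse + w_speed * speed_mse)


# Usage: frame2 is frame1 shifted one pixel to the right, so the true
# flow is u = +1, v = 0 everywhere.
rng = np.random.default_rng(0)
frame1 = rng.random((8, 8))
frame2 = np.roll(frame1, 1, axis=1)
true_flow = np.zeros((8, 8, 2))
true_flow[..., 0] = 1.0
zero_flow = np.zeros((8, 8, 2))
print(photometric_loss(frame1, frame2, true_flow))  # small (border column only)
print(photometric_loss(frame1, frame2, zero_flow))  # larger
```

In this sketch the correct flow leaves only the clamped right-hand border column mismatched, so its photometric loss is lower than that of a zero flow. This lower loss is what lets a flow network learn without ground-truth flow labels.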