MemFlow: Optical Flow Estimation and Prediction with Memory

CVPR 2024

Fudan University

EPE on Sintel (clean) vs. inference time (ms) and model size (M). MemFlow(-T) (x it) indicates running our network with only x iterations of GRU.

Comparison with Previous SOTA

Please zoom in for better visualization.


Highly Efficient and Effective with the Same Architecture

With only 2 iterations, our method outperforms SKFlow run with 15 iterations.

Beyond Optical Flow Estimation: Future Prediction

MemFlow can be repurposed to predict future optical flow. The following videos show qualitative results of future prediction (one time step ahead). From left to right: the predicted optical flow into the next frame superimposed on the video frame, the next video frame synthesized from our predicted flow, and the ground-truth next frame.
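This page does not say how the next frame is synthesized from the predicted flow. As a rough illustration only, the sketch below forward-warps a frame by moving each pixel along its flow vector to the nearest target pixel. The function name `forward_warp_nearest`, the (dx, dy) channel order, and the handling of holes and collisions are all assumptions made for this example, not the authors' procedure.

```python
import numpy as np

def forward_warp_nearest(frame, flow):
    """Move every pixel of `frame` along `flow` to synthesize the next frame.

    frame: (H, W, C) array.
    flow:  (H, W, 2) array, assumed layout flow[..., 0] = dx, flow[..., 1] = dy.
    Pixels that land outside the image are dropped, uncovered target pixels
    stay zero, and colliding pixels are resolved by last write (assumption).
    """
    frame = np.asarray(frame)
    flow = np.asarray(flow, dtype=np.float64)
    H, W = frame.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]

    # Round each target position to the nearest integer pixel.
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    inside = (tx >= 0) & (tx < W) & (ty >= 0) & (ty < H)

    out = np.zeros_like(frame)
    out[ty[inside], tx[inside]] = frame[ys[inside], xs[inside]]
    return out

# Usage: a uniform flow of one pixel to the right shifts the image by one column.
frame = np.arange(6, dtype=np.float64).reshape(2, 3, 1)
flow = np.zeros((2, 3, 2))
flow[..., 0] = 1.0
next_frame = forward_warp_nearest(frame, flow)
```

Nearest-neighbor warping leaves holes wherever no source pixel lands. A practical pipeline would normally use a differentiable splatting or inpainting step instead.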



Optical flow is a classical task that is important to the vision community. Classical optical flow estimation uses two frames as input, whilst some recent methods consider multiple frames to explicitly model long-range information. The former cannot fully leverage temporal coherence along the video sequence, and the latter incur heavy computational overhead that typically rules out real-time flow estimation. Some multi-frame approaches even require unseen future frames for the current estimate, compromising real-time applicability in safety-critical scenarios. To this end, we present MemFlow, a real-time method for optical flow estimation and prediction with memory. Our method uses memory read-out and update modules to aggregate historical motion information in real time. Furthermore, we integrate resolution-adaptive re-scaling to accommodate diverse video resolutions. Our approach also extends seamlessly to predicting future optical flow from past observations. Leveraging effective historical motion aggregation, our method outperforms VideoFlow in generalization performance on the Sintel and KITTI-15 datasets, with fewer parameters and faster inference. At the time of submission, MemFlow also leads in performance on the 1080p Spring dataset.

Superior Generalization Performance

Generalization performance of optical flow estimation on Sintel and KITTI-15 after training on FlyingChairs and FlyingThings3D.

Comparable with SOTA on Standard Benchmark

Optical flow finetuning evaluation on the public benchmark.

Test on HD Video Benchmark (Spring)

Optical flow generalization and finetuning results on 1080p Spring dataset.

Results of Future Prediction

Left: EPE of flow prediction on FlyingThings3D (Final), Sintel (Final), and KITTI-15. Right: Comparison of next frame prediction on KITTI test set (256x832). Note that our method is not trained for video prediction specifically.
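For readers unfamiliar with the metric, EPE (end-point error) is the Euclidean distance between predicted and ground-truth flow vectors, averaged over pixels. The sketch below shows one way to compute it. The function name `end_point_error` and the optional `valid` mask argument are illustrative choices, not part of the MemFlow code.

```python
import numpy as np

def end_point_error(flow_pred, flow_gt, valid=None):
    """Mean Euclidean distance between predicted and ground-truth flow.

    flow_pred, flow_gt: (H, W, 2) arrays of per-pixel (dx, dy) vectors.
    valid: optional (H, W) boolean mask selecting the pixels to average over
           (an illustrative parameter for datasets with sparse ground truth).
    """
    flow_pred = np.asarray(flow_pred, dtype=np.float64)
    flow_gt = np.asarray(flow_gt, dtype=np.float64)
    per_pixel = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    if valid is not None:
        per_pixel = per_pixel[np.asarray(valid, dtype=bool)]
    return float(per_pixel.mean())

# Usage: every predicted vector is off by (3, 4), so the error is 5 pixels.
gt = np.zeros((4, 4, 2))
pred = np.zeros((4, 4, 2))
pred[..., 0] = 3.0
pred[..., 1] = 4.0
epe = end_point_error(pred, gt)
print(epe)  # 5.0
```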


MemFlow maintains a memory buffer to store historical motion states of video, together with an efficient update and read-out process that retrieves useful motion information for the current frame’s optical flow estimation.
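The buffer-plus-read-out idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the MemFlow implementation: the class name `MotionMemory`, the fixed-length FIFO update policy, and the softmax-attention read-out over flattened features are all hypothetical choices made for this sketch.

```python
import numpy as np
from collections import deque

class MotionMemory:
    """Hypothetical FIFO memory of per-frame (key, value) motion features."""

    def __init__(self, max_frames=3):
        # Oldest entries are discarded automatically once the buffer is full.
        self.keys = deque(maxlen=max_frames)
        self.values = deque(maxlen=max_frames)

    def update(self, key, value):
        """Store one frame's features: key (N, Dk), value (N, Dv)."""
        self.keys.append(np.asarray(key, dtype=np.float64))
        self.values.append(np.asarray(value, dtype=np.float64))

    def read(self, query):
        """Attention read-out for a query of shape (M, Dk); returns (M, Dv)."""
        if not self.keys:
            raise ValueError("memory is empty")
        query = np.asarray(query, dtype=np.float64)
        K = np.concatenate(list(self.keys), axis=0)
        V = np.concatenate(list(self.values), axis=0)
        # Scaled dot-product attention with a numerically stable softmax.
        logits = query @ K.T / np.sqrt(K.shape[1])
        logits -= logits.max(axis=1, keepdims=True)
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ V

# Usage: three updates into a two-frame buffer keep only the latest two frames.
rng = np.random.default_rng(0)
memory = MotionMemory(max_frames=2)
for _ in range(3):
    memory.update(rng.normal(size=(6, 8)), rng.normal(size=(6, 4)))
out = memory.read(rng.normal(size=(5, 8)))
print(len(memory.keys), out.shape)  # 2 (5, 4)
```

Because the read-out only looks at past frames, a design like this never needs future frames at inference time, and bounding the buffer length keeps the per-frame cost fixed.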


      title={MemFlow: Optical Flow Estimation and Prediction with Memory},
      author={Dong, Qiaole and Fu, Yanwei},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},