Rethinking Optical Flow from Geometric Matching Consistent Perspective

CVPR 2023

Fudan University

Overview of our MatchFlow. The simplified training pipeline is shown at the top, while details of each stage are listed below. H and W denote 1/8 of the input image's height and width, respectively. GMA denotes the global motion aggregation module.


Optical flow estimation is a challenging problem that remains unsolved. Recent deep-learning-based optical flow models have achieved considerable success. However, these models often train networks from scratch on standard optical flow datasets, which restricts their ability to robustly and geometrically match image features. In this paper, we rethink this formulation of optical flow estimation. Specifically, we leverage Geometric Image Matching (GIM) as a pre-training task for optical flow estimation (MatchFlow), yielding better feature representations: GIM shares common challenges with optical flow estimation and comes with massive labeled real-world data. Matching static scenes thus helps to learn more fundamental feature correlations of objects and scenes with consistent displacements. Concretely, the proposed MatchFlow model employs a QuadTree-attention-based network pretrained on MegaDepth to extract coarse features for subsequent flow regression. Extensive experiments show that our model generalizes well across datasets. Our method achieves 11.5% and 10.1% error reduction over GMA on the Sintel clean pass and the KITTI test set, respectively. At the time of anonymous submission, our MatchFlow(G) enjoys state-of-the-art performance on the Sintel clean and final passes compared to published approaches, with comparable computation and memory footprint.
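As a rough illustration of the matching perspective above (a minimal sketch, not the authors' code), both GIM matchers and RAFT/GMA-style flow regressors build an all-pairs correlation volume between the coarse features of the two frames; in NumPy this can be written as:

```python
import numpy as np

def correlation_volume(f1, f2):
    """All-pairs correlation between two coarse feature maps.

    f1, f2: (H, W, D) feature maps (e.g. at 1/8 input resolution).
    Returns an (H, W, H, W) volume whose entry [i, j, k, l] is the
    dot-product similarity of pixel (i, j) in frame 1 with pixel
    (k, l) in frame 2, scaled by 1/sqrt(D).
    """
    _, _, D = f1.shape
    return np.einsum('ijd,kld->ijkl', f1, f2) / np.sqrt(D)

# Toy example with random features; correlating a map with itself
# yields a volume that is symmetric under swapping the two frames.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 8)).astype(np.float32)
vol = correlation_volume(feat, feat)
print(vol.shape)  # (4, 4, 4, 4)
```

The flow regressor then looks up slices of this volume to iteratively refine the predicted displacement field.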

Comparison with Other Methods

Quantitative comparison on standard benchmarks. 'A' indicates the AutoFlow dataset. 'C+T': after training on FlyingChairs (C) and FlyingThings3D (T), we evaluate generalization on the Sintel (S) and KITTI (K) training sets. 'C+T+S+K+H': training samples from T, S, K, and HD1K (H) are included in our training set for further finetuning. Results on the training sets are shown in parentheses. The best and second-best results are bolded and underlined, respectively. † indicates the tile technique. ⋆ indicates evaluation with RAFT's "warm-start" strategy.
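The Sintel numbers in such tables are average end-point error (EPE), the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch of this standard metric (not tied to this codebase):

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """Average end-point error between two (H, W, 2) flow fields:
    per-pixel Euclidean distance, averaged over all pixels."""
    return float(np.sqrt(((flow_pred - flow_gt) ** 2).sum(-1)).mean())

# Toy example: every pixel is off by the vector (3, 4).
gt = np.zeros((2, 2, 2))
pred = np.zeros((2, 2, 2))
pred[..., 0] = 3.0
pred[..., 1] = 4.0
print(epe(pred, gt))  # 5.0
```

KITTI additionally reports Fl-all, the percentage of pixels whose EPE exceeds both 3 px and 5% of the ground-truth flow magnitude.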


Qualitative Results


Qualitative comparison on the Sintel test set. The first two rows are from the clean pass, and the last two from the final pass. Notable areas are indicated by arrows. Please zoom in for details.



@inproceedings{dong2023rethinking,
        title={Rethinking Optical Flow from Geometric Matching Consistent Perspective},
        author={Dong, Qiaole and Cao, Chenjie and Fu, Yanwei},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2023}
}