MVImgNet2.0: A Larger-scale Dataset of Multi-view Images

SIGGRAPH Asia 2024 & TOG
¹FNii, CUHKSZ; ²SSE, CUHKSZ; ³Alibaba Group
#Equal contribution *Corresponding author

We introduce MVImgNet2.0, a larger-scale dataset of multi-view images that carries 3D-aware signals from multi-view consistency. MVImgNet2.0 expands its predecessor to a total of 520k objects in 515 categories and also provides higher-quality annotations. The rich geometry and texture information in these real-world objects gives MVImgNet2.0 great potential to support large-scale learning in the 3D domain.

Abstract

MVImgNet is a large-scale dataset that contains multi-view images of ~220k real-world objects in 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, making a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset, which expands MVImgNet to a total of ~520k objects and 515 categories, yielding a 3D dataset whose scale is more comparable to those in the 2D domain. In addition to the expanded dataset scale and category range, MVImgNet2.0 is of higher quality than MVImgNet owing to four new features: (i) most shoots capture 360° views of the objects, which supports learning complete object reconstruction; (ii) the segmentation pipeline is improved to produce more accurate foreground object masks; (iii) a more powerful structure-from-motion method is adopted to estimate the camera pose of each frame with lower error; (iv) we additionally reconstruct high-quality dense point clouds via advanced methods for objects captured in 360° views, which can serve downstream applications. Extensive experiments confirm the value of MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be made public, including the multi-view images of all 520k objects and the reconstructed high-quality point clouds, in the hope of inspiring the broader vision community.

Dataset

Summary

The newly collected portion of MVImgNet2.0 contains ~300k real-world objects in 347 classes, of which 277 are new categories not covered by MVImgNet. Combined with MVImgNet, this brings the total to ~520k real-world objects in 515 categories. The annotations cover object masks, camera parameters, and point clouds.
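The counts in this summary fit together by simple arithmetic. The following is a minimal Python sketch that uses only the figures quoted on this page. The constant and function names are illustrative, not part of any released tooling, and the number of classes shared with MVImgNet is inferred from the quoted counts rather than stated by the authors.

```python
# Figures quoted on this page. Object counts are approximate, in thousands.
MVIMGNET_OBJECTS_K = 220     # objects in the original MVImgNet
MVIMGNET_CLASSES = 238       # classes in the original MVImgNet
NEW_OBJECTS_K = 300          # objects newly collected for MVImgNet2.0
NEW_PORTION_CLASSES = 347    # classes covered by the newly collected objects
NEW_ONLY_CLASSES = 277       # of those, classes not covered by MVImgNet


def combined_totals():
    """Return (total objects in thousands, total classes, shared classes)."""
    total_objects_k = MVIMGNET_OBJECTS_K + NEW_OBJECTS_K
    total_classes = MVIMGNET_CLASSES + NEW_ONLY_CLASSES
    # Inferred, not stated on the page: classes of the new portion
    # that already existed in MVImgNet.
    shared_classes = NEW_PORTION_CLASSES - NEW_ONLY_CLASSES
    return total_objects_k, total_classes, shared_classes


if __name__ == "__main__":
    print(combined_totals())  # (520, 515, 70)
```

Under these figures, roughly 70 of the 347 classes in the new portion would overlap with MVImgNet, which is why the combined category count is 515 rather than 238 + 347.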

The pipeline of multi-view image acquisition and annotation.

Samples of MVImgNet2.0 images with their corresponding annotations.

Category taxonomy.

More Samples

A variety of multi-view images in MVImgNet2.0.

Point clouds in MVImgNet2.0.

BibTeX

To cite MVImgNet2.0, you can use the following BibTeX entry:


      @article{10.1145/3687973,
        author = {Han, Xiaoguang and Wu, Yushuang and Shi, Luyue and Liu, Haolin and Liao, Hongjie and Qiu, Lingteng and Yuan, Weihao and Gu, Xiaodong and Dong, Zilong and Cui, Shuguang},
        title = {MVImgNet2.0: A Larger-scale Dataset of Multi-view Images},
        journal = {ACM Transactions on Graphics (TOG)},
        volume = {43},
        number = {6},
        year = {2024},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        doi = {10.1145/3687973},
      }
    

MVImgNet is also available on GitHub. To cite it, use:


      @inproceedings{yu2023mvimgnet,
        title     = {MVImgNet: A Large-scale Dataset of Multi-view Images},
        author    = {Yu, Xianggang and Xu, Mutian and Zhang, Yidan and Liu, Haolin and Ye, Chongjie and Wu, Yushuang and Yan, Zizheng and Liang, Tianyou and Chen, Guanying and Cui, Shuguang and Han, Xiaoguang},
        booktitle = {CVPR},
        year      = {2023}
      }