Full Papers

Learning global spatial information for multi-view object-centric models

Pages 828–839 | Received 16 Sep 2022, Accepted 04 Feb 2023, Published online: 10 Mar 2023

References

  • Devin C, Abbeel P, Darrell T, et al. Deep object-centric representations for generalizable robot learning. In: 2018 IEEE International Conference on Robotics and Automation (ICRA); 2018. p. 7111–7118. IEEE.
  • Veerapaneni R, Co-Reyes JD, Chang M, et al. Entity abstraction in visual model-based reinforcement learning. In: Conference on Robot Learning; 2019. Osaka, Japan.
  • Kulkarni T, Gupta A, Ionescu C, et al. Unsupervised learning of object keypoints for perception and control. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems; 2019. Vancouver, Canada. Curran Associates, Inc.
  • Ding D, Hill F, Santoro A, et al. Attention over learned object embeddings enables complex visual reasoning. CoRR. 2020. arXiv:2012.08508.
  • Watters N, Matthey L, Bosnjak M, et al. COBRA: data-efficient model-based RL through unsupervised object discovery and curiosity-driven exploration. CoRR. 2019. arXiv:1905.09275.
  • Greff K, van Steenkiste S, Schmidhuber J. Neural expectation maximization. In: Guyon I, Luxburg UV, Bengio S, et al., editors. Advances in Neural Information Processing Systems; Vol. 30. Curran Associates, Inc.; 2017. Long Beach, CA, USA.
  • Locatello F, Weissenborn D, Unterthiner T, et al. Object-centric learning with slot attention. CoRR. 2020. arXiv:2006.15055.
  • Burgess CP, Matthey L, Watters N, et al. MONet: unsupervised scene decomposition and representation. CoRR. 2019. arXiv:1901.11390.
  • Greff K, Kaufmann RL, Kabra R, et al. Multi-object representation learning with iterative variational inference. CoRR. 2019. arXiv:1903.00450.
  • Engelcke M, Kosiorek AR, Jones OP, et al. Genesis: generative scene inference and sampling with object-centric latent representations. In: International Conference on Learning Representations; 2020.
  • Eslami SA, Heess N, Weber T, et al. Attend, infer, repeat: fast scene understanding with generative models. In: Advances in Neural Information Processing Systems; 2016. p. 3225–3233. Barcelona, Spain.
  • Crawford E, Pineau J. Spatially invariant unsupervised object detection with convolutional neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 33; 2019. p. 3412–3420. Honolulu, Hawaii, USA.
  • Jiang J, Ahn S. Generative neurosymbolic machines. Adv Neural Inf Process Syst. 2020;33.
  • Ha D, Schmidhuber J. World models. CoRR. 2018. arXiv:1803.10122.
  • Kawato M. Internal models for motor control and trajectory planning. Curr Opin Neurobiol. 1999;9(6):718–727.
  • Hafner D, Lillicrap T, Ba J, et al. Dream to control: learning behaviors by latent imagination. In: International Conference on Learning Representations; 2019. New Orleans, LA, USA.
  • Wu P, Escontrela A, Hafner D, et al. Daydreamer: world models for physical robot learning. In: Conference on Robot Learning; 2022. Auckland, New Zealand.
  • Nanbo L, Eastwood C, Fisher RB. Learning object-centric representations of multi-object scenes from multiple views. In: 34th Conference on Neural Information Processing Systems; 2020. Curran Associates, Inc.
  • Chen C, Deng F, Ahn S. Roots: object-centric representation and rendering of 3D scenes. CoRR. 2021. arXiv:2006.06130.
  • Eslami SA, Rezende DJ, Besse F, et al. Neural scene representation and rendering. Science. 2018;360(6394):1204–1210.
  • Schmidt T. Perception: the binding problem and the coherence of perception. In: Banks WP, editor. Encyclopedia of consciousness. Oxford: Academic Press; 2009. p. 147–158.
  • Treisman AM, Gelade G. A feature-integration theory of attention. Cogn Psychol. 1980;12(1):97–136.
  • Hinton GE. Training products of experts by minimizing contrastive divergence. Neural Comput. 2002;14(8):1771–1800.
  • Rezende D, Mohamed S. Variational inference with normalizing flows. In: Bach F, Blei D, editors. Proceedings of the 32nd International Conference on Machine Learning; Vol. 37; 07–09 Jul; Lille; 2015. p. 1530–1538.
  • Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. CoRR. 2017. arXiv:1706.03762.
  • Kosiorek AR, Strathmann H, Zoran D, et al. NeRF-VAE: a geometry-aware 3D scene generative model. In: Meila M, Zhang T, editors. Proceedings of the 38th International Conference on Machine Learning; Vol. 139; 2021. p. 5742–5752. PMLR.
  • Florence P, Manuelli L, Tedrake R. Self-supervised correspondence in visuomotor policy learning. IEEE Robot Autom Lett. 2019;5(2):492–499.
  • Kingma DP, Welling M. Auto-encoding variational Bayes. CoRR. 2013. arXiv:1312.6114.
  • Dai S, Li X, Wang L, et al. Learning segmentation masks with the independence prior. In: Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 33; 2019. p. 3429–3436. Honolulu, Hawaii, USA.
  • Nguyen-Phuoc T, Richardt C, Mai L, et al. BlockGAN: learning 3D object-aware scene representations from unlabelled images. Adv Neural Inf Process Syst. Nov 2020;33:6767–6778.
  • Niemeyer M, Geiger A. GIRAFFE: representing scenes as compositional generative neural feature fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2021. IEEE.
  • Yao Y, Luo Z, Li S, et al. BlendedMVS: a large-scale dataset for generalized multi-view stereo networks. In: Computer Vision and Pattern Recognition (CVPR); 2020.
  • Kuo J, Muglikar M, Zhang Z, et al. Redesigning SLAM for arbitrary multi-camera systems. In: 2020 IEEE International Conference on Robotics and Automation (ICRA); 2020. p. 2116–2122. IEEE.
  • Lin Y, Tremblay J, Tyree S, et al. Multi-view fusion for multi-level robotic scene understanding. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); 2021. p. 6817–6824. IEEE.
  • Mildenhall B, Srinivasan PP, Tancik M, et al. NeRF: representing scenes as neural radiance fields for view synthesis. In: ECCV; 2020.
  • Tobin J, Zaremba W, Abbeel P. Geometry-aware neural rendering. Adv Neural Inf Process Syst. 2019;32:11559–11569.
  • Henderson P, Lampert CH. Unsupervised object-centric video generation and decomposition in 3D. CoRR. 2020. arXiv:2007.06705.
  • Stelzner K, Kersting K, Kosiorek AR. Decomposing 3D scenes into objects via unsupervised volume segmentation. CoRR. 2021. arXiv:2104.01148.
  • Yu HX, Guibas LJ, Wu J. Unsupervised discovery of object radiance fields. CoRR. 2021. arXiv:2107.07905.
  • Engelcke M, Jones OP, Posner I. Genesis-v2: inferring unordered object representations without iterative refinement. CoRR. 2021. arXiv:2104.09958.
  • Vasco M, Melo FS, Paiva A. MHVAE: a human-inspired deep hierarchical generative model for multimodal representation learning. CoRR. 2020. arXiv:2006.02991.
  • Akuzawa K, Iwasawa Y, Matsuo Y. Information-theoretic regularization for learning global features by sequential VAE. Mach Learn. 2021;110:2239–2266.
  • Watters N, Matthey L, Burgess CP, et al. Spatial broadcast decoder: a simple architecture for learning disentangled representations in VAEs. CoRR. 2019. arXiv:1901.07017.
  • Wu M, Goodman N. Multimodal generative models for scalable weakly-supervised learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems; 2018. p. 5580–5590. Montréal, Canada.
  • Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997 Nov;9(8):1735–1780.
  • Marino J, Yue Y, Mandt S. Iterative amortized inference. CoRR. 2018. arXiv:1807.09356.
  • Emami P, He P, Ranka S, et al. Efficient iterative amortized inference for learning symmetric and disentangled multi-object representations. In: Meila M, Zhang T, editors. Proceedings of the 38th International Conference on Machine Learning; Vol. 139; 18–24 Jul. 2021. p. 2970–2981. PMLR.
  • Rezende DJ, Viola F. Taming VAEs. CoRR. 2018. arXiv:1810.00597.
  • Johnson J, Hariharan B, Van Der Maaten L, et al. CLEVR: a diagnostic dataset for compositional language and elementary visual reasoning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 2901–2910. IEEE.