Research Article

Contour wavelet diffusion – a fast and high-quality facial expression generation model

Article: 2316023 | Received 11 Sep 2023, Accepted 02 Feb 2024, Published online: 14 Feb 2024

References

  • Abdal, R., Qin, Y., & Wonka, P. (2019). Image2StyleGAN: How to embed images into the StyleGAN latent space? 2019 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2019.00453
  • Bamberger, R. H., & Smith, M. J. (1992). A filter bank for the directional decomposition of images: Theory and design. IEEE Transactions on Signal Processing, 40(4), 882–893. https://doi.org/10.1109/78.127960
  • Batista, J. C., Albiero, V., Bellon, O. R., & Silva, L. (2017). AUMPNet: Simultaneous action units detection and intensity estimation on multipose facial images using a single convolutional neural network. 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
  • Brophy, E., Wang, Z., She, Q., & Ward, T. (2023). Generative adversarial networks in time series: A systematic literature review. ACM Computing Surveys, 55(10), Article 199. https://doi.org/10.1145/3559540
  • Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., & Lundberg, S. (2023). Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:.12712.
  • Carvalho, T. G., Thiberge, S., Sakamoto, H., & Ménard, R. (2004). Conditional mutagenesis using site-specific recombination in Plasmodium berghei. Proceedings of the National Academy of Sciences, 101(41), 14931–14936. https://doi.org/10.1073/pnas.0404416101
  • Croitoru, F.-A., Hondru, V., Ionescu, R. T., & Shah, M. (2023). Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9), 10850–10869. https://doi.org/10.1109/tpami.2023.3261988
  • Da Cunha, A. L., Zhou, J., & Do, M. N. (2006). The nonsubsampled contourlet transform: Theory, design, and applications. IEEE Transactions on Image Processing, 15(10), 3089–3101. https://doi.org/10.1109/TIP.2006.877507
  • Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 8780–8794.
  • Do, M. N., & Vetterli, M. (2003). The finite ridgelet transform for image representation. IEEE Transactions on Image Processing, 12(1), 16–28. https://doi.org/10.1109/TIP.2002.806252
  • Ekman, P. (2004). Emotions revealed. BMJ, 328(Suppl S5), 0405184. https://doi.org/10.1136/sbmj.0405184
  • Ekman, P., & Friesen, W. V. (1978). Facial action coding system [dataset]. In PsycTESTS Dataset. American Psychological Association (APA). https://doi.org/10.1037/t27734-000
  • Ekman, P., Friesen, W. V., & Tomkins, S. S. (1971). Facial affect scoring technique: A first validity study.
  • Eslami, R., & Radha, H. (2004). Wavelet-based contourlet transform and its application to image coding. 2004 International Conference on Image Processing (ICIP'04).
  • Esser, P., Rombach, R., Blattmann, A., & Ommer, B. (2021). ImageBART: Bidirectional context with multinomial diffusion for autoregressive image synthesis. Advances in Neural Information Processing Systems, 34, 3518–3532.
  • Freeman, W. T., & Adelson, E. H. (1991). The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9), 891–906. https://doi.org/10.1109/34.93808
  • Gao, R., Song, Y., Poole, B., Wu, Y. N., & Kingma, D. P. (2020). Learning energy-based models by diffusion recovery likelihood. arXiv preprint arXiv:.08125.
  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Stat, 10, 1050.
  • Gudi, A., Tasli, H. E., Den Uyl, T. M., & Maroulis, A. (2015). Deep learning based FACS action unit occurrence and intensity estimation. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
  • Gupta, G., Khan, S., Guleria, V., Almjally, A., Alabduallah, B. I., Siddiqui, T., Albahlal, B. M., Alajlan, S. A., & Al-Subaie, M. (2023). DDPM: A dengue disease prediction and diagnosis model using sentiment analysis and machine learning algorithms. Diagnostics, 13(6), 1093. https://doi.org/10.3390/diagnostics13061093
  • Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
  • Huang, J., Cui, K., Guan, D., Xiao, A., Zhan, F., Lu, S., Liao, S., & Xing, E. (2022). Masked generative adversarial networks are data-efficient generation learners. Advances in Neural Information Processing Systems, 35, 2154–2167.
  • Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:.10196.
  • Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  • Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  • Kaur, G., Agarwal, R., & Patidar, V. (2020). Semi-blind robust watermarking with dual complex tree wavelet based hybrid transform and SVD. 2020 IEEE 17th India Council International Conference (INDICON).
  • Kingma, D. P., & Dhariwal, P. (2018, December). Glow: Generative flow with invertible 1 × 1 convolutions. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 10236–10245).
  • Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
  • Kong, Z., & Ping, W. (2021). On fast sampling of diffusion probabilistic models. arXiv preprint arXiv:.00132.
  • Lu, Y., Chen, D., Olaniyi, E., & Huang, Y. (2022). Generative adversarial networks (GANs) for image augmentation in agriculture: A systematic review. Computers and Electronics in Agriculture, 200, 107208. https://doi.org/10.1016/j.compag.2022.107208
  • Luhman, E., & Luhman, T. (2021). Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:.02388.
  • Ma, H., Zhang, L., Zhu, X., & Feng, J. (2022). Accelerating score-based generative models with preconditioned diffusion sampling. European Conference on Computer Vision.
  • Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2021). GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:.10741.
  • Obukhov, A., & Krasnyanskiy, M. (2020). Quality assessment method for GAN based on modified metrics inception score and Fréchet inception distance. Advances in Intelligent Systems and Computing, 102–114. https://doi.org/10.1007/978-3-030-63322-6_8
  • Oord, A. V. D., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., & Kavukcuoglu, K. (2016, December). Conditional image generation with PixelCNN decoders. In Proceedings of the 30th International Conference on Neural Information Processing Systems (pp. 4797–4805).
  • Pandzic, I. S. (2002). MPEG-4 facial animation framework for the web and mobile applications. In MPEG-4 Facial Animation (pp. 65–79). Portico. https://doi.org/10.1002/0470854626.ch4
  • Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:.06125, 1(2), 3.
  • Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. International Conference on Machine Learning.
  • Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  • Salimans, T., & Ho, J. (2022). Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:.00512.
  • Shensa, M. J. (1992). The discrete wavelet transform: Wedding the a trous and Mallat algorithms. IEEE Transactions on Signal Processing, 40(10), 2464–2482. https://doi.org/10.1109/78.157290
  • Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning.
  • Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:.02502.
  • Song, Y., Durkan, C., Murray, I., & Ermon, S. (2021). Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34, 1415–1428.
  • Song, Y., & Ermon, S. (2020). Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 12438–12448.
  • Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:.13456.
  • Taubman, D. S., Marcellin, M. W., & Rabbani, M. (2002). JPEG2000: Image compression fundamentals, standards and practice. Journal of Electronic Imaging, 11(2), 286–287. https://doi.org/10.1117/1.1469618
  • Tran, D. L., Walecki, R., Rudovic, O., Eleftheriadis, S., Schuller, B., & Pantic, M. (2017). DeepCoder: Semi-parametric variational autoencoders for automatic facial action coding. 2017 IEEE International Conference on Computer Vision (ICCV).
  • Vahdat, A., & Kautz, J. (2020). NVAE: A deep hierarchical variational autoencoder. Advances in Neural Information Processing Systems, 33, 19667–19679.
  • Vahdat, A., Kreis, K., & Kautz, J. (2021). Score-based generative modeling in latent space. Advances in Neural Information Processing Systems, 34, 11287–11302.
  • Van Erven, T., & Harremos, P. (2014). Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7), 3797–3820. https://doi.org/10.1109/TIT.2014.2320500
  • Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7), 1661–1674. https://doi.org/10.1162/NECO_a_00142
  • Walecki, R., Rudovic, O., Pavlovic, V., & Pantic, M. (2016). Copula ordinal regression for joint estimation of facial action unit intensity. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • Walecki, R., Rudovic, O., Pavlovic, V., & Pantic, M. (2017). Variable-state latent conditional random field models for facial expression analysis. Image and Vision Computing, 58, 25–37. https://doi.org/10.1016/j.imavis.2016.04.009
  • Wang, Z., Zheng, H., He, P., Chen, W., & Zhou, M. (2022). Diffusion-GAN: Training GANs with diffusion. arXiv preprint arXiv:.02262.
  • Xiao, Z., Kreis, K., Kautz, J., & Vahdat, A. (2020). VAEBM: A symbiosis between variational autoencoders and energy-based models. arXiv preprint arXiv:.00654.
  • Yang, M., Wang, Z., Chi, Z., & Feng, W. (2022). WaveGAN: Frequency-aware GAN for high-fidelity few-shot image generation. European Conference on Computer Vision.
  • Zhang, S., Li, L., & Zhao, Z. (2012). Facial expression recognition based on Gabor wavelets and sparse representation. 2012 IEEE 11th International Conference on Signal Processing.