Contour wavelet diffusion – a fast and high-quality facial expression generation model

Chenwei Xu, School of Design and Art, Communication University of Zhejiang, Hangzhou, China

Yuntao Zou (corresponding author), School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Article: 2316023 | Received 11 Sep 2023, Accepted 02 Feb 2024, Published online: 14 Feb 2024

https://doi.org/10.1080/09540091.2024.2316023

References

Abdal, R., Qin, Y., & Wonka, P. (2019). Image2stylegan: How to embed images into the stylegan latent space? 2019 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2019.00453
Bamberger, R. H., & Smith, M. J. (1992). A filter bank for the directional decomposition of images: Theory and design. IEEE Transactions on Signal Processing, 40(4), 882–893. https://doi.org/10.1109/78.127960
Batista, J. C., Albiero, V., Bellon, O. R., & Silva, L. (2017). Aumpnet: Simultaneous action units detection and intensity estimation on multipose facial images using a single convolutional neural network. 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2017.
Brophy, E., Wang, Z., She, Q., & Ward, T. (2023). Generative adversarial networks in time series: A systematic literature review. ACM Computing Surveys, 55(10), Article 199. https://doi.org/10.1145/3559540
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., & Lundberg, S. (2023). Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:.12712.
Carvalho, T. G., Thiberge, S., Sakamoto, H., & Ménard, R. (2004). Conditional mutagenesis using site-specific recombination in Plasmodium berghei. Proceedings of the National Academy of Sciences, 101(41), 14931–14936. https://doi.org/10.1073/pnas.0404416101
Croitoru, F.-A., Hondru, V., Ionescu, R. T., & Shah, M. (2023). Diffusion models in vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9), 10850–10869. https://doi.org/10.1109/tpami.2023.3261988
Da Cunha, A. L., Zhou, J., & Do, M. N. (2006). The nonsubsampled contourlet transform: Theory, design, and applications. IEEE Transactions on Image Processing, 15(10), 3089–3101. https://doi.org/10.1109/TIP.2006.877507
Dhariwal, P., & Nichol, A. (2021). Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34, 8780–8794.
Do, M. N., & Vetterli, M. (2003). The finite ridgelet transform for image representation. IEEE Transactions on Image Processing, 12(1), 16–28. https://doi.org/10.1109/TIP.2002.806252
Ekman, P. (2004). Emotions revealed. BMJ, 328(Suppl S5), 0405184. https://doi.org/10.1136/sbmj.0405184
Ekman, P., & Friesen, W. V. (1978). Facial action coding system [dataset]. In PsycTESTS Dataset. American Psychological Association (APA). https://doi.org/10.1037/t27734-000
Ekman, P., Friesen, W. V., & Tomkins, S. S. (1971). Facial affect scoring technique: A first validity study.
Eslami, R., & Radha, H. (2004). Wavelet-based contourlet transform and its application to image coding. 2004 International Conference on Image Processing, 2004. ICIP'04.
Esser, P., Rombach, R., Blattmann, A., & Ommer, B. (2021). Imagebart: Bidirectional context with multinomial diffusion for autoregressive image synthesis. Advances in Neural Information Processing Systems, 34, 3518–3532.
Freeman, W. T., & Adelson, E. H. (1991). The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9), 891–906. https://doi.org/10.1109/34.93808
Gao, R., Song, Y., Poole, B., Wu, Y. N., & Kingma, D. P. (2020). Learning energy-based models by diffusion recovery likelihood. arXiv preprint arXiv:.08125.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Stat, 10, 1050.
Gudi, A., Tasli, H. E., Den Uyl, T. M., & Maroulis, A. (2015). Deep learning based facs action unit occurrence and intensity estimation. 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
Gupta, G., Khan, S., Guleria, V., Almjally, A., Alabduallah, B. I., Siddiqui, T., Albahlal, B. M., Alajlan, S. A., & Al-Subaie, M. (2023). DDPM: A dengue disease prediction and diagnosis model using sentiment analysis and machine learning algorithms. Diagnostics, 13(6), 1093. https://doi.org/10.3390/diagnostics13061093
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Huang, J., Cui, K., Guan, D., Xiao, A., Zhan, F., Lu, S., Liao, S., & Xing, E. (2022). Masked generative adversarial networks are data-efficient generation learners. Advances in Neural Information Processing Systems, 35, 2154–2167.
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:.10196.
Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing and improving the image quality of stylegan. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
Kaur, G., Agarwal, R., & Patidar, V. (2020). Semi-blind robust watermarking with dual complex tree wavelet based hybrid transform and SVD. 2020 IEEE 17th India Council International Conference (INDICON).
Kingma, D. P., & Dhariwal, P. (2018, December). Glow: Generative flow with invertible 1 × 1 convolutions. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 10236–10245).
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
Kong, Z., & Ping, W. (2021). On fast sampling of diffusion probabilistic models. arXiv preprint arXiv:.00132.
Lu, Y., Chen, D., Olaniyi, E., & Huang, Y. (2022). Generative adversarial networks (GANs) for image augmentation in agriculture: A systematic review. Computers and Electronics in Agriculture, 200, 107208. https://doi.org/10.1016/j.compag.2022.107208
Luhman, E., & Luhman, T. (2021). Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:.02388.
Ma, H., Zhang, L., Zhu, X., & Feng, J. (2022). Accelerating score-based generative models with preconditioned diffusion sampling. European Conference on Computer Vision.
Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:.10741.
Obukhov, A., & Krasnyanskiy, M. (2020). Quality assessment method for GAN based on modified metrics inception score and Fréchet inception distance. Advances in Intelligent Systems and Computing, 102–114. https://doi.org/10.1007/978-3-030-63322-6_8
Oord, A. V. D., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., & Kavukcuoglu, K. (2016, December). Conditional image generation with PixelCNN decoders. In Proceedings of the 30th International Conference on Neural Information Processing Systems (pp. 4797–4805).
Pandzic, I. S. (2002). MPEG-4 facial animation framework for the web and mobile applications. In MPEG-4 Facial Animation (pp. 65–79). Portico. https://doi.org/10.1002/0470854626.ch4
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:.06125, 1(2), 3.
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. International Conference on Machine Learning.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
Salimans, T., & Ho, J. (2022). Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:.00512.
Shensa, M. J. (1992). The discrete wavelet transform: Wedding the a trous and Mallat algorithms. IEEE Transactions on Signal Processing, 40(10), 2464–2482. https://doi.org/10.1109/78.157290
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning.
Song, J., Meng, C., & Ermon, S. (2020). Denoising diffusion implicit models. arXiv preprint arXiv:.02502.
Song, Y., Durkan, C., Murray, I., & Ermon, S. (2021). Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34, 1415–1428.
Song, Y., & Ermon, S. (2020). Improved techniques for training score-based generative models. Advances in Neural Information Processing Systems, 33, 12438–12448.
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:.13456.
Taubman, D. S., Marcellin, M. W., & Rabbani, M. (2002). JPEG2000: Image compression fundamentals, standards and practice. Journal of Electronic Imaging, 11(2), 286–287. https://doi.org/10.1117/1.1469618
Tran, D. L., Walecki, R., Rudovic, O., Eleftheriadis, S., Schuller, B., & Pantic, M. (2017). Deepcoder: Semi-parametric variational autoencoders for automatic facial action coding. 2017 IEEE International Conference on Computer Vision (ICCV).
Vahdat, A., & Kautz, J. (2020). NVAE: A deep hierarchical variational autoencoder. Advances in Neural Information Processing Systems, 33, 19667–19679.
Vahdat, A., Kreis, K., & Kautz, J. (2021). Score-based generative modeling in latent space. Advances in Neural Information Processing Systems, 34, 11287–11302.
Van Erven, T., & Harremoës, P. (2014). Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory, 60(7), 3797–3820. https://doi.org/10.1109/TIT.2014.2320500
Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7), 1661–1674. https://doi.org/10.1162/NECO_a_00142
Walecki, R., Rudovic, O., Pavlovic, V., & Pantic, M. (2016). Copula ordinal regression for joint estimation of facial action unit intensity. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Walecki, R., Rudovic, O., Pavlovic, V., & Pantic, M. (2017). Variable-state latent conditional random field models for facial expression analysis. Image and Vision Computing, 58, 25–37. https://doi.org/10.1016/j.imavis.2016.04.009
Wang, Z., Zheng, H., He, P., Chen, W., & Zhou, M. (2022). Diffusion-gan: Training gans with diffusion. arXiv preprint arXiv:.02262.
Xiao, Z., Kreis, K., Kautz, J., & Vahdat, A. (2020). Vaebm: A symbiosis between variational autoencoders and energy-based models. arXiv preprint arXiv:.00654.
Yang, M., Wang, Z., Chi, Z., & Feng, W. (2022). WaveGAN: Frequency-aware GAN for high-fidelity few-shot image generation. European Conference on Computer Vision.
Zhang, S., Li, L., & Zhao, Z. (2012). Facial expression recognition based on Gabor wavelets and sparse representation. 2012 IEEE 11th International Conference on Signal Processing.
