Title: Using Instruction-Following LLM Hidden States as Conditioning for Video Diffusion Model
Conference: ECAI-2025
Tags: ARTIFICIAL INTELLIGENCE, CLIP SCORE, DIFFUSION, FVD, GENERATIVE AI, HIDDEN STATES, LARGE LANGUAGE MODEL, LATENT, MULTIMODAL, PROMPT ENGINEERING, UNET, VARIATIONAL AUTOENCODER, VIDEO GENERATION
Abstract: Video generation has applications in several fields, and with the advent of Generative AI, extensive research is being conducted on video generation using AI. In this project, we experiment with using LLM hidden states as conditioning to train a video latent diffusion model, studying their ability to pass richer semantic information about the video samples. We perform a comparative study of the context-retention abilities of LLMs for embeddings and hidden states separately. We build a pipeline with three major components: the LLM, a custom Bridge Network, and the Diffusion UNet. We conduct our study on two datasets: Captioned Moving MNIST and a subset of the Sakuga-42M dataset. We conclude by evaluating our model variants on standard benchmarks and metrics and stating our findings, which could serve as grounds for future work.
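The abstract's three-component pipeline (LLM, Bridge Network, Diffusion UNet) can be sketched minimally as follows. This is an illustrative assumption, not the paper's actual implementation: the Bridge Network here is a hypothetical two-layer MLP, and the hidden-state and conditioning dimensions are made-up placeholders. NumPy stands in for a deep-learning framework to keep the sketch self-contained.

```python
import numpy as np

# Hypothetical dimensions (assumptions; the abstract does not state sizes):
D_LLM = 4096   # hidden size of the instruction-following LLM
D_COND = 768   # conditioning width expected by the UNet's cross-attention

rng = np.random.default_rng(0)

# Bridge Network sketch: a two-layer MLP projecting LLM hidden states
# into the diffusion UNet's conditioning space.
W1 = rng.standard_normal((D_LLM, 1024)) * 0.02
b1 = np.zeros(1024)
W2 = rng.standard_normal((1024, D_COND)) * 0.02
b2 = np.zeros(D_COND)

def bridge(hidden_states: np.ndarray) -> np.ndarray:
    """Map LLM hidden states (batch, seq, D_LLM) to UNet conditioning (batch, seq, D_COND)."""
    h = np.maximum(hidden_states @ W1 + b1, 0.0)  # ReLU
    return h @ W2 + b2

# Example: last-layer hidden states for a batch of 2 prompts, 16 tokens each.
llm_hidden = rng.standard_normal((2, 16, D_LLM))
cond = bridge(llm_hidden)
print(cond.shape)  # (2, 16, 768)
```

In a full system, `cond` would be passed to the UNet's cross-attention layers in place of the usual text-encoder embeddings; the comparative study described above would swap `llm_hidden` between final embeddings and intermediate hidden states.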