E2D2
Diffusion models generate data by iteratively refining their output, starting from unintelligible noise and ending with something realistic, a process that underpins today’s state-of-the-art image and video generators.
While less mature in language modeling, discrete diffusion models provide an alternative to autoregressive (AR) text generation. Unlike AR models, which produce tokens one at a time, diffusion models refine entire sequences in parallel, offering potential gains in inference efficiency and global coherence. They also achieve training efficiency competitive with AR models, and they support generation modes beyond left-to-right decoding, such as parallel refinement and guided generation, that could make LLM applications both more efficient and more controllable.
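To make the parallel-refinement idea concrete, here is a minimal, self-contained sketch of how a masked discrete diffusion sampler can work: start from an all-mask sequence and, at each step, let the model predict every position at once, committing only the most confident predictions. The constants, the placeholder `denoiser`, and the confidence-based unmasking schedule are illustrative assumptions, not the exact procedure of any particular model.

```python
import torch

VOCAB_SIZE = 1000          # real token ids: 0 .. 999
MASK_ID = VOCAB_SIZE       # reserve an extra id for [MASK]
SEQ_LEN, STEPS = 32, 8

def denoiser(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for a trained diffusion LM: returns logits over the
    vocabulary for every position. A real model would condition on the
    partially unmasked sequence; random logits keep the sketch runnable."""
    return torch.randn(tokens.shape[0], tokens.shape[1], VOCAB_SIZE)

def sample(steps: int = STEPS, seq_len: int = SEQ_LEN) -> torch.Tensor:
    x = torch.full((1, seq_len), MASK_ID)             # start fully masked
    for t in range(steps):
        masked = x == MASK_ID
        if not masked.any():
            break
        logits = denoiser(x)                          # all positions, in parallel
        conf, pred = torch.softmax(logits, -1).max(-1)
        conf = conf.masked_fill(~masked, -1.0)        # only unmask masked slots
        k = max(1, int(masked.sum()) // (steps - t))  # unmask a fraction per step
        idx = conf.topk(k, dim=-1).indices[0]
        x[0, idx] = pred[0, idx]                      # commit most confident tokens
    return x

print(sample())
```

Note the contrast with AR decoding: every step calls the model once over the whole sequence, so the number of model calls is set by the step count rather than by the sequence length.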
The state of the art in discrete diffusion language models is LLaDA, an 8B-parameter model published by Ant Group and Renmin University of China that is competitive with AR models of the same size. Now a lab at Cornell has introduced E2D2, a diffusion architecture that pairs an encoder with the decoder-only backbones used to date, and which may be more than 2x as efficient for training and inference.
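The paper’s exact architecture isn’t reproduced here, but as a rough illustration of why an encoder-decoder split can save compute, the sketch below pairs a deep encoder, whose output can be reused across denoising steps, with a shallow decoder that runs at every step. The class name, layer counts, and interface are our assumptions, not E2D2’s published design.

```python
import torch
import torch.nn as nn

class EncoderDecoderDenoiser(nn.Module):
    """Illustrative encoder-decoder denoiser: a heavy encoder whose
    output can be amortized across steps, plus a light per-step decoder.
    All sizes are made up for the sketch."""

    def __init__(self, vocab=1001, d=512, enc_layers=12, dec_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=8, batch_first=True),
            num_layers=enc_layers)          # expensive; run infrequently
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True),
            num_layers=dec_layers)          # cheap; run every denoising step
        self.head = nn.Linear(d, vocab)

    def encode(self, tokens):
        return self.encoder(self.embed(tokens))

    def denoise_step(self, tokens, memory):
        h = self.decoder(self.embed(tokens), memory)
        return self.head(h)                 # logits for every position

if __name__ == "__main__":
    m = EncoderDecoderDenoiser()
    toks = torch.full((1, 16), 1000)        # a fully masked toy sequence
    mem = m.encode(toks)                    # heavy pass, reusable across steps
    logits = m.denoise_step(toks, mem)      # cheap per-step refinement
    print(logits.shape)                     # (1, 16, 1001)
```

If the expensive encoder pass can be shared across several cheap decoder steps, per-step cost drops roughly with the depth ratio, which is the general kind of saving that could underlie the reported efficiency gains.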
