Lucid

Minecraft World Model

2024.11.11 • 25 fps on RTX 4090 • 8x H100 training

The first real-time interactive world model: a fully neural approach to generating playable Minecraft environments at 25 fps on consumer hardware.

Extreme Compression

The breakthrough lies in radical compression. While typical world models require hundreds of tokens per frame, Lucid v1 compresses each Minecraft frame to just 15 tokens. This roughly 600x reduction in computational cost enables real-time inference on consumer hardware without sacrificing visual fidelity or temporal consistency.
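A back-of-the-envelope sketch of how a per-frame token cut turns into a much larger compute saving. This is an illustration, not the authors' calculation: it assumes a 500-token baseline (the "500+" figure quoted under Performance) and full self-attention over a context holding the same number of frames, whose cost grows with the square of sequence length. All function names are hypothetical.

```python
# Back-of-the-envelope token-budget comparison (illustrative assumptions only).

def token_ratio(baseline_tokens: float, compressed_tokens: float) -> float:
    """Reduction in tokens per frame (a linear ratio)."""
    return baseline_tokens / compressed_tokens


def attention_cost_ratio(baseline_tokens: float, compressed_tokens: float) -> float:
    """Reduction in full self-attention cost for a context holding the same
    number of frames: attention is O(n^2) in sequence length, so the saving
    is the square of the per-frame token ratio."""
    return token_ratio(baseline_tokens, compressed_tokens) ** 2


def baseline_for_attention_ratio(target_ratio: float, compressed_tokens: float) -> float:
    """Inverse question: how many baseline tokens per frame would give a
    target attention-cost reduction under the same quadratic model?"""
    return compressed_tokens * target_ratio ** 0.5


if __name__ == "__main__":
    print(f"{token_ratio(500, 15):.1f}x fewer tokens per frame")           # 33.3x
    print(f"{attention_cost_ratio(500, 15):.0f}x less attention compute")  # 1111x
    print(f"{baseline_for_attention_ratio(600, 15):.0f} baseline tokens")  # 367
```

Under this simple quadratic model, a 600x attention saving would correspond to a baseline of roughly 370 tokens per frame. Linear layers (MLPs, projections) scale only with the ~33x token ratio, so the end-to-end saving depends on how compute splits between attention and the rest of the network.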

Our VAE architecture achieves this through aggressive latent space compression combined with GAN-based perceptual loss, preserving essential game mechanics while discarding redundant visual information.
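The post does not publish the VAE's architecture or loss weights, so the following is only a minimal sketch of ingredients a VAE trained with adversarial and perceptual losses commonly combines: a reparameterized Gaussian latent, a KL penalty toward N(0, 1), and a generator-side adversarial term added to reconstruction and perceptual losses. The hinge form of the GAN loss, the 15 x 16 latent shape, the loss weights, and every function name here are assumptions for illustration.

```python
import math
import random

NUM_TOKENS = 15   # tokens per frame, as stated in the post
TOKEN_DIM = 16    # hypothetical latent width per token (not stated in the post)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2).
    Written this way, the sample stays differentiable w.r.t. mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def hinge_generator_loss(disc_logits_on_fake):
    """Generator side of a hinge GAN loss (an assumed choice): -mean(D(fake))."""
    return -sum(disc_logits_on_fake) / len(disc_logits_on_fake)


def total_loss(recon, perceptual, kl, adversarial,
               w_perc=1.0, w_kl=1e-6, w_adv=0.1):
    """Weighted sum of the four terms; the weights are placeholders."""
    return recon + w_perc * perceptual + w_kl * kl + w_adv * adversarial


if __name__ == "__main__":
    rng = random.Random(0)
    latent_size = NUM_TOKENS * TOKEN_DIM  # 240 numbers per frame
    mu, logvar = [0.1] * latent_size, [-2.0] * latent_size
    z = reparameterize(mu, logvar, rng)
    kl = kl_to_standard_normal(mu, logvar)
    adv = hinge_generator_loss([0.3, -0.1])
    print(len(z), round(kl, 3), total_loss(0.05, 0.2, kl, adv))
```

A tiny KL weight lets the encoder pack detail into the few available tokens, while the adversarial and perceptual terms push the decoder toward sharp frames rather than blurry averages; how Lucid v1 actually balances these is not stated.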

Causal World Modeling

The model employs a causally trained diffusion transformer that learns not just visual patterns but the underlying physics and logic of the game world. By conditioning each frame on past observations and actions, it maintains coherent cause-and-effect relationships across extended gameplay sessions.
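One common way to make a transformer causal at the frame level is a block-causal attention mask: each token may attend to every token of its own frame and of earlier frames, but never to later frames. The post does not say whether Lucid v1 uses exactly this scheme, so treat this pure-Python sketch as an illustration; `block_causal_mask` is a hypothetical helper.

```python
def block_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return an n x n boolean mask (n = num_frames * tokens_per_frame) where
    mask[q][k] is True if query token q may attend to key token k.
    Attention is bidirectional within a frame and causal across frames."""
    n = num_frames * tokens_per_frame
    return [
        [(k // tokens_per_frame) <= (q // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


if __name__ == "__main__":
    mask = block_causal_mask(num_frames=3, tokens_per_frame=15)
    # same frame, future frame, past frame
    print(mask[0][14], mask[0][15], mask[44][0])  # True False True
```

Action conditioning would sit alongside such a mask, for example as a per-frame action embedding so that frame t only sees actions up to t; that detail is likewise an assumption.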

We generate rollouts autoregressively using diffusion forcing, which lets the model produce gameplay indefinitely.
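Diffusion forcing trains the model with an independent noise level for every frame. At inference time this lets it denoise a new frame from pure noise while conditioning on clean past frames, append the result, and repeat. The loop below sketches that rollout under stated assumptions: frames are flat lists of floats, `denoise_step` is a hypothetical stand-in for the trained transformer, `toy_denoiser` is a toy for demonstration only, and the bounded `deque` context is what keeps memory constant so generation can continue indefinitely.

```python
import random
from collections import deque
from typing import Callable, Sequence

Frame = list[float]
# Hypothetical signature: (noisy_frame, t_cur, t_next, context, action) -> less-noisy frame
Denoiser = Callable[[Frame, float, float, list[Frame], object], Frame]


def training_noise_levels(num_frames: int, rng: random.Random) -> list[float]:
    """Diffusion forcing: sample an independent noise level in [0, 1) per frame
    instead of one shared level for the whole sequence."""
    return [rng.random() for _ in range(num_frames)]


def rollout(denoise_step: Denoiser, seed_frames: Sequence[Frame], actions: Sequence[object],
            num_steps: int, context_len: int, rng: random.Random) -> list[Frame]:
    """Autoregressive generation: one new frame per action."""
    context = deque(seed_frames, maxlen=context_len)  # oldest frames fall off
    frame_size = len(seed_frames[0])
    generated = []
    for action in actions:
        x = [rng.gauss(0.0, 1.0) for _ in range(frame_size)]  # start from pure noise
        for step in range(num_steps, 0, -1):  # noise level goes 1 -> 0
            t_cur, t_next = step / num_steps, (step - 1) / num_steps
            x = denoise_step(x, t_cur, t_next, list(context), action)
        context.append(x)  # the finished frame becomes context for the next one
        generated.append(x)
    return generated


def toy_denoiser(x, t_cur, t_next, context, action):
    """Toy stand-in: blend toward the latest context frame, exact at t_next == 0."""
    keep = t_next / t_cur
    return [keep * xi + (1.0 - keep) * ci for xi, ci in zip(x, context[-1])]


if __name__ == "__main__":
    frames = rollout(toy_denoiser, [[0.5] * 15], ["forward"] * 100,
                     num_steps=4, context_len=50, rng=random.Random(0))
    print(len(frames))  # 100
```

The toy denoiser simply reproduces the latest context frame, so the example only checks the plumbing: 100 actions yield 100 frames while the context never holds more than `context_len` frames.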

Real-Time Demo

Live gameplay demonstration showing real-time neural world generation driven by action inputs. The model learns the complex game mechanics of Minecraft, but its memory spans only about 2 seconds, which gives the experience a dream-like quality.
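As a rough calculation of what a 2-second memory means in tokens, the helper below assumes the window is counted at the 25 fps playback rate (the post does not state the frame rate the model was trained at) and compares 15 tokens per frame with the 500-token baseline quoted under Performance. `context_tokens` is a hypothetical helper.

```python
def context_tokens(fps: float, memory_seconds: float, tokens_per_frame: int) -> int:
    """Tokens the model attends over for a given memory window."""
    frames = round(fps * memory_seconds)
    return frames * tokens_per_frame


if __name__ == "__main__":
    print(context_tokens(25, 2, 15))   # 750
    print(context_tokens(25, 2, 500))  # 25000
```

At 15 tokens per frame the whole window stays under a thousand tokens; the same two seconds at 500 tokens per frame would mean attending over 25,000 tokens.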

Performance

Consumer Hardware

25 FPS

Real-time playable on RTX 4090

Token Efficiency

15 Tokens

Per frame vs 500+ in previous models

Training Scale

8x H100

Distributed training cluster

Model Scale

1B Params

Balances performance against compute
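Two quick sanity checks on the headline numbers, under stated assumptions: at 25 fps each frame, including every denoising step, must be produced within 40 ms; and 1B parameters stored in a 16-bit format (an assumption, since the post does not state the precision) occupy about 2 GB, well inside the RTX 4090's 24 GB of VRAM before activations and caches are counted. The denoising step count below is hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per generated frame."""
    return 1000.0 / fps


def per_step_budget_ms(fps: float, denoising_steps: int) -> float:
    """Time per denoiser forward pass if each frame needs several steps
    (the step count is hypothetical; the post does not state it)."""
    return frame_budget_ms(fps) / denoising_steps


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone (no activations or caches), in GB of 1e9 bytes."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(frame_budget_ms(25))        # 40.0
    print(per_step_budget_ms(25, 4))  # 10.0
    print(weight_memory_gb(1e9, 2))   # 2.0
```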

Technical Innovation

Lucid v1 demonstrates that extreme compression and causal modeling can coexist. The model successfully captures physics-based interactions, player actions and their consequences, and environmental persistence—all while operating at a fraction of the computational cost of previous approaches.

This represents a fundamental shift in world modeling: from maximizing token count to maximizing information density. The result is the first neural game engine capable of true real-time interaction on consumer hardware.

Future Implications

This breakthrough opens entirely new possibilities for interactive AI systems. When game worlds can be generated in real-time by neural networks, the boundaries between creation and play dissolve completely.

Beyond gaming, this technology enables interactive simulations, training environments, and new forms of digital experience that exist only in the learned representations of neural networks—accessible to anyone with consumer hardware.
