Gemma 4 12B

Summary

Gemma 4 12B is a dense, open-weight multimodal model from Google DeepMind that processes text, image, and audio inputs through an encoder-free multimodal interface.

Role In The Wiki

Gemma 4 12B is the current production-release counterpart to encoder-free multimodal research sources such as Tuna-2. Its value to the wiki is not that it settles the semantic-encoder-versus-raw-input debate, but that it shows a major model release using lightweight projection frontends in place of separate vision and audio encoders.

Official Artifacts

Evidence

Relation To Foundation TSFM Agenda

Use the source-level agenda mapping in gemma-4-12b-2026 rather than duplicating verdict rows here.

At the entity level, Gemma 4 12B is a named production, open-weight model release that makes encoder-free multimodal routing operationally concrete. It serves as an analogy for time-series and world-model work, especially where numeric features, event streams, logs, audio, images, and text need to enter one model without separate heavy modality encoders.
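
To make the analogy concrete, the following is a minimal sketch of what a lightweight projection frontend could look like for time-series input. It is not Gemma 4 12B's implementation: the patch length, the model width D_MODEL, the random weights, and every function name (make_projection, project, patchify, build_sequence) are assumptions chosen for illustration. Each modality gets a single linear map from raw patches into a shared token width, and the projected tokens are concatenated into one sequence for a single downstream model.

```python
import random

D_MODEL = 8  # hypothetical shared token width, chosen small for readability


def make_projection(in_dim, d_model, seed):
    """Build a random in_dim x d_model weight matrix (stand-in for learned weights)."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(d_model)] for _ in range(in_dim)]


def project(rows, weight):
    """Apply one linear map to every input row; no deeper encoder is involved."""
    d_model = len(weight[0])
    out = []
    for row in rows:
        if len(row) != len(weight):
            raise ValueError("row width does not match projection input size")
        out.append(
            [sum(x * weight[i][j] for i, x in enumerate(row)) for j in range(d_model)]
        )
    return out


def patchify(series, patch_len):
    """Split a 1-D series into non-overlapping patches, dropping a ragged tail."""
    n = len(series) // patch_len
    return [series[k * patch_len:(k + 1) * patch_len] for k in range(n)]


# One lightweight frontend per modality (hypothetical input sizes).
frontends = {
    "series": make_projection(4, D_MODEL, seed=0),   # 4 numeric values per patch
    "image": make_projection(12, D_MODEL, seed=1),   # 12 raw pixel values per patch
}


def build_sequence(segments):
    """Project each (modality, rows) segment and concatenate into one token list."""
    tokens, modalities = [], []
    for modality, rows in segments:
        projected = project(rows, frontends[modality])
        tokens.extend(projected)
        modalities.extend([modality] * len(projected))
    return tokens, modalities


series = [float(t % 5) for t in range(10)]  # 10 values: 2 patches of 4, tail dropped
image_patches = [[0.1 * c for c in range(12)] for _ in range(3)]

tokens, modalities = build_sequence(
    [("series", patchify(series, 4)), ("image", image_patches)]
)
```

In this sketch the only modality-specific component is one weight matrix per input type, which is the property the analogy relies on. Whether such thin frontends are sufficient for numeric streams is exactly the open question the agenda mapping tracks, so the code should be read as a framing device rather than a result.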