Google launches Gemma 4: this is the new family of open models

  • Gemma 4 is a family of open AI models based on the same technology as Gemini 3, but released under the Apache 2.0 license, which permits full commercial use.
  • It includes four variants (E2B, E4B, 26B MoE and 31B Dense) designed to work from mobile and edge devices to high-end workstations and GPUs.
  • It offers multimodality (text and image across the family, plus video and audio in the smaller models), context windows of up to 256K tokens and native support for agents, JSON and function calls.
  • Its efficiency allows large models to run on a single 80GB NVIDIA H100 GPU and enables on-premises, cloud, and sovereign cloud deployments, which is key for regulated European companies.


Google has made a major shift in its open artificial intelligence strategy with the launch of Gemma 4, a new family of open-weight models that aims to combine high performance, hardware efficiency, and a truly open license for commercial use. Built on the same technological foundation as Gemini 3, this line targets both large enterprises and developers who want to deploy advanced AI without relying entirely on closed cloud services.

Far from being just another experimental model, Gemma 4 arrives as a complete lineup of four variants capable of running on mobile devices, edge devices, personal computers, and servers with high-performance GPUs. Google's strategy focuses on offering more intelligence per parameter, reducing infrastructure costs, and simultaneously giving the community and businesses the flexibility to adapt the models to their own needs.

A family of four models designed to cover everything from mobile to data center


The Gemma 4 family is organized in four main sizes: E2B, E4B, 26B MoE and 31B Dense. The first two are geared towards edge execution, while the 26-billion and 31-billion-parameter models target powerful workstations, including high-performance laptops, and server environments.

The Effective 2B (E2B) and Effective 4B (E4B) variants have been specifically designed for resource-constrained devices, such as Android phones, IoT boards, and embedded systems such as Raspberry Pi or hardware from manufacturers like Qualcomm and MediaTek. Their goal is to maintain good reasoning and multimodal capabilities while minimizing memory use, battery drain, and latency.

One step up, the 26B model with a Mixture of Experts (MoE) architecture is optimized to minimize response time: during inference it activates only around 3.8 billion parameters, allowing high-speed token generation on developer hardware or consumer GPUs, including custom AI chips. This makes it well suited to local programming assistants and development tools.
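To put the MoE figures in perspective, the short sketch below computes what share of the 26B model's parameters is active for a single token. It uses only the two numbers quoted above; the function name is illustrative, and the result says nothing about actual throughput, which depends on hardware and runtime.

```python
# Illustrative arithmetic only; both figures are the ones quoted in the article.
TOTAL_PARAMS_B = 26.0   # total parameters of the MoE variant, in billions
ACTIVE_PARAMS_B = 3.8   # approximate parameters activated per token, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used to process a single token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active share per token: {share:.1%}")
```

In other words, roughly 15% of the weights take part in each generation step, which is why the model can respond quickly even though all 26 billion parameters must still be held in memory.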

At the top end sits Gemma 4 31B Dense. This dense, task-oriented variant prioritizes quality and consistency over speed. The model has already placed among the top entries in open-model rankings such as the Arena AI text leaderboard, competing with systems that are twenty times larger in terms of parameters.

This combination of light and heavy models allows Gemma 4 to cover everything from everyday mobile uses to critical business workflows, giving systems architects leeway to choose between inference speed and depth of analysis depending on each project.

Extended multimodality and long context windows


One of the strengths of the new family is its ability to work with multiple content types natively. All Gemma 4 models can process text and images, supporting different resolutions and aspect ratios, which facilitates use cases such as scanned-document analysis, visual understanding of interfaces, or description generation.

In addition, the E2B and E4B versions extend multimodality to video and audio. This allows them to handle low-latency speech recognition tasks, video clip analysis, or augmented reality applications directly on the device. In mobile or IoT scenarios, this ability to run vision and audio without constantly relying on the cloud reduces connectivity issues and improves privacy.

Regarding the handling of large amounts of information, the Gemma 4 family introduces context windows of up to 256,000 tokens in the largest models. The edge device variants offer 128K contexts, while the 26B and 31B variants reach 256K tokens. This allows, for example, loading entire code repositories, large document collections, or very long conversation histories in a single query.
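As a rough illustration of what these limits mean in practice, the sketch below checks whether a set of documents would fit in a given variant's context window. Several things here are assumptions rather than facts from Google: the limits are taken as round 128,000 and 256,000 tokens, the four-characters-per-token ratio is a crude heuristic (a real check would use the model's tokenizer), and the function names and the output reserve are hypothetical.

```python
from math import ceil
from typing import Iterable

# Context limits per variant, taken as round figures from the article (assumption).
CONTEXT_LIMITS = {
    "E2B": 128_000,
    "E4B": 128_000,
    "26B MoE": 256_000,
    "31B Dense": 256_000,
}

# Crude heuristic, NOT the real tokenizer: about four characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a text from its character length."""
    return ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(documents: Iterable[str], variant: str,
                    reserved_for_output: int = 4_000) -> bool:
    """Check whether all documents plus an output reserve fit in the window."""
    limit = CONTEXT_LIMITS[variant]
    used = sum(estimate_tokens(doc) for doc in documents)
    return used + reserved_for_output <= limit


if __name__ == "__main__":
    repo_dump = "x" * 600_000  # stand-in for roughly 150,000 tokens of source code
    for name in CONTEXT_LIMITS:
        print(name, fits_in_context([repo_dump], name))
```

Under these assumptions the same input overflows the edge variants but fits comfortably in the 26B and 31B models, which is the practical difference between a 128K and a 256K window.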

This breadth of context is particularly useful for offline code generation, automated technical support, or legal document analysis. These areas are especially relevant for European companies that are subject to strict regulations and often need to keep information within their own systems.

Along with multimodality and extended context, Google highlights Gemma 4's support for more than 140 languages. This broad linguistic coverage makes it an attractive option for companies with a global presence, European public administrations, or startups that want to launch multilingual products without depending on several different models.

Autonomous agents, JSON, and function calls: Gemma 4 geared toward agentic workflows


Gemma 4 goes beyond traditional text generation. The entire family has been designed with a clear focus on agent-based workflows, an increasingly relevant trend in business and software development environments.

The models include native support for function calling as standard. This allows the system to invoke external APIs or specific tools in a controlled manner. In addition, they offer structured JSON output, facilitating integration with applications that require formatted responses for consumption by other services or microservices.
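The following sketch shows one way an application might consume a function call that a model emits as JSON. It is a minimal illustration under stated assumptions: the JSON shape (`name` plus `arguments`), the `get_order_status` tool, and the `dispatch` helper are all hypothetical and are not Gemma 4's documented format or any Google API. No model is called here; the input string stands in for model output.

```python
import json
from typing import Any, Callable, Dict


def get_order_status(order_id: str) -> Dict[str, Any]:
    """Hypothetical tool; a real one would query an internal service."""
    return {"order_id": order_id, "status": "shipped"}


# Registry of the tools the application is willing to expose to the model.
TOOLS: Dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}


def dispatch(model_output: str) -> Any:
    """Parse a JSON function call and run the matching registered tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("function call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**args)


if __name__ == "__main__":
    # Assumed shape of a model-emitted call; the real format may differ.
    output = '{"name": "get_order_status", "arguments": {"order_id": "A-17"}}'
    print(dispatch(output))
```

Keeping an explicit registry means the model can only trigger functions the application has deliberately exposed, which is the "controlled manner" that matters in production.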

Another key aspect is support for native system instructions. These allow for a precise definition of the system's role and the establishment of clear rules governing the model's behavior. This capability is particularly useful when building autonomous agents that manage customer service, automate internal processes, or coordinate various tools within a company.

According to Google Cloud executives, enterprise AI requires models capable of executing complex logic while keeping data within secure environments. In this sense, Gemma 4's agentic approach is combined with on-premises and controlled cloud deployment options to reduce risks and increase control over where and how data is processed.

The company accompanies these models with an Agent Development Kit (ADK), a modular framework designed to accelerate agent design, and with support for running intensive workloads serverlessly on Cloud Run with NVIDIA RTX PRO 6000 (Blackwell) GPUs, which lowers the initial investment needed to experiment with complex agents.

Apache License 2.0 and digital sovereignty: implications for Europe and Spain

One of the most significant changes compared to previous generations of Gemma is in the license. For the first time, a Gemma release is distributed under Apache 2.0, a fully permissive open license that allows commercial use without additional Google-specific restrictions.

In previous versions, the terms of use included conditions that raised concerns among corporate legal teams, especially in large companies and public administrations. With Apache 2.0, Google places Gemma 4 in the same licensing category as other open reference models such as Llama, facilitating its adoption in production projects without the need for individual negotiations.

This decision has a clear European reading. The combination of an open model, support for over 140 languages, and sovereign deployment options aligns with data residency regulations and the discussions surrounding the European Union's AI Regulation. Spanish and European companies can integrate Gemma 4 into their solutions while maintaining greater control over where data is stored and processed.

Google anticipates Gemma 4 will be available in Sovereign Cloud environments and air-gapped configurations, as well as in on-premises installations. For regulated sectors such as banking, healthcare, energy, or public administration, this opens the door to leveraging advanced AI without needing to send sensitive information to shared infrastructures outside the European area.

The flexibility of the license also encourages the creation of local and specialized variants. Examples have already been seen in the past, such as models adapted to specific languages and contexts (for example, BgGPT in Bulgaria or medical applications in North American universities), and Google's expectation is that Gemma 4 will strengthen this ecosystem, which some refer to as a "Gemmaverse" with tens of thousands of community variants.

Google Cloud integration, local execution, and required hardware

Beyond opening up the model, Google has prepared a support infrastructure focused on Vertex AI and Google Kubernetes Engine (GKE). Through these services, organizations can provision tailored resources, scale inference workloads, and adjust deployment to their security and compliance requirements.

In Vertex AI, Gemma 4 is integrated as part of the model catalog, allowing technical teams to test, fine-tune and deploy customized variants while maintaining control over computing resources. The combination with GKE enables dynamic scaling, adapting the number of inference service replicas to actual demand.
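The idea of adapting replicas to demand can be sketched with a simple calculation. This is a hand-rolled illustration, not the logic GKE uses: in a real cluster the Kubernetes autoscaler makes this decision from observed metrics. The function name, the per-replica capacity, and the minimum and maximum bounds are all hypothetical.

```python
from math import ceil


def replicas_needed(requests_per_second: float,
                    capacity_per_replica: float,
                    min_replicas: int = 1,
                    max_replicas: int = 10) -> int:
    """Return how many inference replicas cover the load, within fixed bounds."""
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if requests_per_second < 0:
        raise ValueError("requests_per_second must not be negative")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    wanted = ceil(requests_per_second / capacity_per_replica)
    # Clamp so the service never scales to zero or beyond the budgeted ceiling.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical capacity: each replica sustains 10 requests per second.
    for load in (0, 45, 500):
        print(load, "req/s ->", replicas_needed(load, 10), "replicas")
```

The upper bound is what keeps GPU spending predictable: demand beyond it has to queue rather than trigger more hardware.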

An important fact for medium-sized companies is that the bfloat16 weights of the 26B and 31B models fit on a single 80GB NVIDIA H100 GPU. This significantly reduces the minimum investment required to access high-end models, compared to alternatives that require multiple GPUs in parallel.
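The claim can be checked with back-of-the-envelope arithmetic: bfloat16 stores each parameter in 2 bytes, so the weights alone take about twice the parameter count in gigabytes. The sketch below does that calculation, treating the model names as exact parameter counts (an approximation) and using decimal gigabytes. It covers weights only; the KV cache and activations need additional memory, which grows with context length.

```python
BYTES_PER_PARAM_BF16 = 2   # bfloat16 uses 16 bits per value
H100_MEMORY_GB = 80        # memory of the GPU mentioned in the article


def weight_memory_gb(params_billion: float,
                     bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the memory taken by the weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bytes_per_param
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal sizes from the model names; real parameter counts may differ slightly.
    for name, size_b in (("26B MoE", 26.0), ("31B Dense", 31.0)):
        used = weight_memory_gb(size_b)
        headroom = H100_MEMORY_GB - used
        print(f"{name}: {used:.0f} GB of weights, {headroom:.0f} GB left")
```

That gives roughly 52 GB for the 26B model and 62 GB for the 31B model, leaving about 28 GB and 18 GB respectively for everything else on the card.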

At the same time, Gemma 4 is optimized to run on diverse hardware, from consumer GPUs to mobile solutions with 5G M2M connectivity. The E2B and E4B models leverage techniques such as Per-Layer Embeddings (PLE) to maximize per-parameter efficiency, allowing them to run on phones, Raspberry Pi, or edge devices with very low latencies.

Compatibility also extends to ecosystems such as Hugging Face, Ollama, vLLM, LM Studio or llama.cpp, as well as Google development platforms like AI Studio and AICore (for Android prototyping). This makes it easy for both independent developers and corporate teams to integrate Gemma 4 into their regular workflows without having to start from scratch.

Potential uses in business, education, and the public sector

Gemma 4's capabilities allow for the deployment of a wide range of practical applications that go beyond classic chatbots. In the business environment, the models can serve as a basis for internal virtual assistants that answer questions about corporate documentation, generate executive summaries, or automate repetitive tasks in multiple languages.

In the field of programming, the combination of wide context windows, code generation, and low latency makes Gemma 4 suitable for local development assistants, automated code review, or tools that analyze entire repositories in a single pass, keeping the code on the company's own infrastructure.

In education, Gemma 4 could be used to create personalized tutors that adapt content to the student's level, generate summaries of complex texts, or explain images and graphics, something especially useful for students with specific accessibility needs.

For the public sector and administrations in Spain and Europe, the possibility of deploying these models in controlled environments, with data residing in European territory, opens up options in citizen services, file analysis, or automation of procedures, provided they are integrated with the guarantees of transparency and human supervision required by the regulations.

In sectors such as manufacturing, precision agriculture, or infrastructure management, local execution on edge computing devices allows data to be analyzed in real time without relying on a permanent cloud connection. This reduces transmission costs, improves response times, and decreases the exposure of sensitive data to external networks.

Local AI, costs, and the gap between open and proprietary models

The launch of Gemma 4 reflects a clear trend in the industry: the priority is no longer just who has the biggest model, but who achieves the best balance between capacity, cost, and ease of deployment. Google insists on the idea of "intelligence per parameter" as the central metric of this new generation.

The ability to run advanced models locally, without always relying on large cloud services, points to a change in the way products and services are designed. For many everyday tasks (summarizing a text, creating a reminder, processing a simple image), it doesn't make much sense to send data to massive remote models if the task can be solved on the device itself.

Even so, Gemma 4 is not intended to replace Google's proprietary models, but to complement them. The company maintains Gemini as its most advanced and closed layer, reserved for use cases where maximum capacity is paramount. Gemma 4 sits a step below in terms of technological edge, but gains ground in openness, flexibility, and cost control.

For IT departments, this presents an increasingly visible choice: closed models, with greater ease of use but less control, versus open models that require more active management of the infrastructure in exchange for total sovereignty and greater economic optimization in the medium term.

In this context, the competitiveness of Spanish and European companies in the field of AI may depend, to a large extent, on their ability to integrate open models like Gemma 4 into their critical processes, combining them when necessary with proprietary services and always ensuring compliance with data protection regulations and future European regulations on artificial intelligence.

With Gemma 4, Google consolidates a firm commitment to efficient open models, capable of running on accessible hardware, adapting to different regulatory frameworks, and serving as a basis for a new generation of local agents and applications; those who know how to take advantage of this combination of openness, performance, and control will have an advantage when building sustainable AI solutions aligned with the demands of Europe.
