models Archives - AI News

Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost (25 April 2025)

Baidu has unveiled ERNIE X1 Turbo and 4.5 Turbo, two fast models that boast impressive performance alongside dramatic cost reductions.

Developed as enhancements to the existing ERNIE X1 and 4.5 models, both new Turbo versions highlight multimodal processing, robust reasoning skills, and aggressive pricing strategies designed to capture developer interest and market share.

Baidu ERNIE X1 Turbo: Deep reasoning meets cost efficiency

Positioned as a deep-thinking reasoning model, ERNIE X1 Turbo tackles complex tasks requiring sophisticated understanding. It enters a competitive field, claiming superior performance in some benchmarks against rivals like DeepSeek R1, V3, and OpenAI o1:

(Image: Benchmarks of Baidu ERNIE X1 Turbo compared to rival AI large language models like DeepSeek R1 and OpenAI o1.)

Key to X1 Turbo’s enhanced capabilities is an advanced “chain of thought” process, enabling more structured and logical problem-solving.

Furthermore, ERNIE X1 Turbo boasts improved multimodal functions – the ability to understand and process information beyond just text, potentially including images or other data types – alongside refined tool utilisation abilities. This makes it particularly well-suited for nuanced applications such as literary creation, complex logical reasoning challenges, code generation, and intricate instruction following.

ERNIE X1 Turbo achieves this performance while undercutting competitor pricing. Input token costs start at $0.14 per million tokens, with output tokens priced at $0.55 per million – approximately 25% of DeepSeek R1's pricing.
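
For a sense of what those rates mean per request, the short sketch below applies the quoted ERNIE X1 Turbo prices to a hypothetical prompt and completion size; the token counts are illustrative placeholders, not figures from Baidu.

```python
# Rough cost estimate at the quoted ERNIE X1 Turbo rates (USD per million tokens).
# The token counts below are hypothetical, for illustration only.
INPUT_RATE_PER_TOKEN = 0.14 / 1_000_000
OUTPUT_RATE_PER_TOKEN = 0.55 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    return input_tokens * INPUT_RATE_PER_TOKEN + output_tokens * OUTPUT_RATE_PER_TOKEN

# Example: a 2,000-token prompt with a 1,000-token completion
print(f"${request_cost(2_000, 1_000):.6f}")  # ≈ $0.000830
```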

Baidu ERNIE 4.5 Turbo: Multimodal muscle at a fraction of the cost

Sharing the spotlight is ERNIE 4.5 Turbo, which focuses on delivering upgraded multimodal features and significantly faster response times compared to its non-Turbo counterpart. The emphasis here is on providing a versatile, responsive AI experience while slashing operational costs.

The model achieves an 80% price reduction compared to the original ERNIE 4.5, with input set at $0.11 per million tokens and output at $0.44 per million tokens. This represents roughly 40% of the cost of the latest version of DeepSeek V3, again highlighting a deliberate strategy to attract users through cost-effectiveness.

Performance benchmarks further bolster its credentials. In multiple tests evaluating both multimodal and text capabilities, Baidu ERNIE 4.5 Turbo outperforms OpenAI’s highly-regarded GPT-4o model. 

In multimodal capability assessments, ERNIE 4.5 Turbo achieved an average score of 77.68, surpassing GPT-4o's 72.76 in the same tests.

(Image: Benchmarks of Baidu ERNIE 4.5 Turbo compared to rival AI large language models like DeepSeek R1 and OpenAI o1.)

While benchmark results always require careful interpretation, this suggests ERNIE 4.5 Turbo is a serious contender for tasks involving an integrated understanding of different data types.

Baidu continues to shake up the AI marketplace

The launch of ERNIE X1 Turbo and 4.5 Turbo signifies a growing trend in the AI sector: the democratisation of high-end capabilities. While foundational models continue to push the boundaries of performance, there is increasing demand for models that balance power with accessibility and affordability.

By lowering the price points for models with sophisticated reasoning and multimodal features, the Baidu ERNIE Turbo series could enable a wider range of developers and businesses to integrate advanced AI into their applications.

This competitive pricing puts pressure on established players like OpenAI and Anthropic, as well as emerging competitors like DeepSeek, potentially leading to further price adjustments across the market.

(Image Credit: Alpha Photo under CC BY-NC 2.0 license)

See also: China’s MCP adoption: AI assistants that actually do things

How does AI judge? Anthropic studies the values of Claude (23 April 2025)

AI models like Anthropic Claude are increasingly asked not just for factual recall, but for guidance involving complex human values. Whether it’s parenting advice, workplace conflict resolution, or help drafting an apology, the AI’s response inherently reflects a set of underlying principles. But how can we truly understand which values an AI expresses when interacting with millions of users?

In a research paper, the Societal Impacts team at Anthropic details a privacy-preserving methodology designed to observe and categorise the values Claude exhibits “in the wild.” This offers a glimpse into how AI alignment efforts translate into real-world behaviour.

The core challenge lies in the nature of modern AI. These aren’t simple programs following rigid rules; their decision-making processes are often opaque.

Anthropic says it explicitly aims to instil certain principles in Claude, striving to make it “helpful, honest, and harmless.” This is achieved through techniques like Constitutional AI and character training, where preferred behaviours are defined and reinforced.

However, the company acknowledges the uncertainty. “As with any aspect of AI training, we can’t be certain that the model will stick to our preferred values,” the research states.

“What we need is a way of rigorously observing the values of an AI model as it responds to users ‘in the wild’ […] How rigidly does it stick to the values? How much are the values it expresses influenced by the particular context of the conversation? Did all our training actually work?”

Analysing Anthropic Claude to observe AI values at scale

To answer these questions, Anthropic developed a sophisticated system that analyses anonymised user conversations. This system removes personally identifiable information before using language models to summarise interactions and extract the values being expressed by Claude. The process allows researchers to build a high-level taxonomy of these values without compromising user privacy.
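
A minimal sketch of how such a pipeline could be structured is shown below. The helper callables (`remove_pii`, `extract_values`, `categorise`) are hypothetical stand-ins for the anonymisation and language-model steps Anthropic describes; this illustrates the described workflow rather than Anthropic's actual implementation.

```python
from collections import Counter

def build_value_taxonomy(conversations, remove_pii, extract_values, categorise):
    """Illustrative pipeline: anonymise each conversation, use a language model
    to extract the values Claude expresses, then aggregate them into a
    high-level taxonomy. All three callables are hypothetical stand-ins."""
    taxonomy = Counter()
    for conversation in conversations:
        anonymised = remove_pii(conversation)        # strip personally identifiable information
        for value in extract_values(anonymised):     # e.g. ["clarity", "transparency"]
            taxonomy[categorise(value)] += 1         # e.g. "epistemic values"
    return taxonomy
```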

The study analysed a substantial dataset: 700,000 anonymised conversations from Claude.ai Free and Pro users over one week in February 2025, predominantly involving the Claude 3.5 Sonnet model. After filtering out purely factual or non-value-laden exchanges, 308,210 conversations (approximately 44% of the total) remained for in-depth value analysis.

The analysis revealed a hierarchical structure of values expressed by Claude. Five high-level categories emerged, ordered by prevalence:

  1. Practical values: Emphasising efficiency, usefulness, and goal achievement.
  2. Epistemic values: Relating to knowledge, truth, accuracy, and intellectual honesty.
  3. Social values: Concerning interpersonal interactions, community, fairness, and collaboration.
  4. Protective values: Focusing on safety, security, well-being, and harm avoidance.
  5. Personal values: Centred on individual growth, autonomy, authenticity, and self-reflection.

These top-level categories branched into more specific subcategories like “professional and technical excellence” or “critical thinking.” At the most granular level, frequently observed values included “professionalism,” “clarity,” and “transparency” – fitting for an AI assistant.

Critically, the research suggests Anthropic’s alignment efforts are broadly successful. The expressed values often map well onto the “helpful, honest, and harmless” objectives. For instance, “user enablement” aligns with helpfulness, “epistemic humility” with honesty, and values like “patient wellbeing” (when relevant) with harmlessness.

Nuance, context, and cautionary signs

However, the picture isn’t uniformly positive. The analysis identified rare instances where Claude expressed values starkly opposed to its training, such as “dominance” and “amorality.”

Anthropic suggests a likely cause: “The most likely explanation is that the conversations that were included in these clusters were from jailbreaks, where users have used special techniques to bypass the usual guardrails that govern the model’s behavior.”

Far from being solely a concern, this finding highlights a potential benefit: the value-observation method could serve as an early warning system for detecting attempts to misuse the AI.

The study also confirmed that, much like humans, Claude adapts its value expression based on the situation.

When users sought advice on romantic relationships, values like “healthy boundaries” and “mutual respect” were disproportionately emphasised. When asked to analyse controversial history, “historical accuracy” came strongly to the fore. This demonstrates a level of contextual sophistication beyond what static, pre-deployment tests might reveal.

Furthermore, Claude’s interaction with user-expressed values proved multifaceted:

  • Mirroring/strong support (28.2%): Claude often reflects or strongly endorses the values presented by the user (e.g., mirroring “authenticity”). While potentially fostering empathy, the researchers caution it could sometimes verge on sycophancy.
  • Reframing (6.6%): In some cases, especially when providing psychological or interpersonal advice, Claude acknowledges the user’s values but introduces alternative perspectives.
  • Strong resistance (3.0%): Occasionally, Claude actively resists user values. This typically occurs when users request unethical content or express harmful viewpoints (like moral nihilism). Anthropic posits these moments of resistance might reveal Claude’s “deepest, most immovable values,” akin to a person taking a stand under pressure.

Limitations and future directions

Anthropic is candid about the method’s limitations. Defining and categorising “values” is inherently complex and potentially subjective. Using Claude itself to power the categorisation might introduce bias towards its own operational principles.

This method is designed for monitoring AI behaviour post-deployment, requiring substantial real-world data and cannot replace pre-deployment evaluations. However, this is also a strength, enabling the detection of issues – including sophisticated jailbreaks – that only manifest during live interactions.

The research concludes that understanding the values AI models express is fundamental to the goal of AI alignment.

“AI models will inevitably have to make value judgments,” the paper states. “If we want those judgments to be congruent with our own values […] then we need to have ways of testing which values a model expresses in the real world.”

This work provides a powerful, data-driven approach to achieving that understanding. Anthropic has also released an open dataset derived from the study, allowing other researchers to further explore AI values in practice. This transparency marks a vital step in collectively navigating the ethical landscape of sophisticated AI.

See also: Google introduces AI reasoning control in Gemini 2.5 Flash

DolphinGemma: Google AI model understands dolphin chatter (14 April 2025)

Google has developed an AI model called DolphinGemma to decipher how dolphins communicate and one day facilitate interspecies communication.

The intricate clicks, whistles, and pulses echoing through the underwater world of dolphins have long fascinated scientists. The dream has been to understand and decipher the patterns within their complex vocalisations.

Google, collaborating with engineers at the Georgia Institute of Technology and leveraging the field research of the Wild Dolphin Project (WDP), has unveiled DolphinGemma to help realise that goal.

Announced around National Dolphin Day, the foundational AI model represents a new tool in the effort to comprehend cetacean communication. Trained specifically to learn the structure of dolphin sounds, DolphinGemma can even generate novel, dolphin-like audio sequences.

Over decades, the Wild Dolphin Project – operational since 1985 – has run the world’s longest continuous underwater study of dolphins to develop a deep understanding of context-specific sounds, such as:

  • Signature “whistles”: Serving as unique identifiers, akin to names, crucial for interactions like mothers reuniting with calves.
  • Burst-pulse “squawks”: Commonly associated with conflict or aggressive encounters.
  • Click “buzzes”: Often detected during courtship activities or when dolphins chase sharks.

WDP’s ultimate goal is to uncover the inherent structure and potential meaning within these natural sound sequences, searching for the grammatical rules and patterns that might signify a form of language.

This long-term, painstaking analysis has provided the essential grounding and labelled data crucial for training sophisticated AI models like DolphinGemma.

DolphinGemma: The AI ear for cetacean sounds

Analysing the sheer volume and complexity of dolphin communication is a formidable task ideally suited for AI.

DolphinGemma, developed by Google, employs specialised audio technologies to tackle this. It uses the SoundStream tokeniser to efficiently represent dolphin sounds, feeding this data into a model architecture adept at processing complex sequences.

Based on insights from Google’s Gemma family of lightweight, open models (which share technology with the powerful Gemini models), DolphinGemma functions as an audio-in, audio-out system.

Fed with sequences of natural dolphin sounds from WDP’s extensive database, DolphinGemma learns to identify recurring patterns and structures. Crucially, it can predict the likely subsequent sounds in a sequence—much like human language models predict the next word.
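
Conceptually, this is the same next-token objective used by text language models, applied to discrete audio tokens. The toy loop below sketches the idea with hypothetical tokeniser and model interfaces standing in for SoundStream and DolphinGemma; it illustrates the principle rather than Google's code.

```python
def continue_sequence(model, tokenizer, audio_clip, num_steps=10):
    """Toy next-token loop over audio tokens (hypothetical interfaces):
    encode a recording into discrete tokens, then repeatedly append the
    model's most likely next token, mirroring next-word prediction in text."""
    tokens = tokenizer.encode(audio_clip)              # waveform -> discrete audio tokens
    for _ in range(num_steps):
        next_token = model.most_likely_next(tokens)    # pick the highest-probability continuation
        tokens.append(next_token)
    return tokenizer.decode(tokens)                    # tokens -> dolphin-like audio
```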

With around 400 million parameters, DolphinGemma is optimised to run efficiently, even on the Google Pixel smartphones WDP uses for data collection in the field.

As WDP begins deploying the model this season, it promises to accelerate research significantly. By automatically flagging patterns and reliable sequences previously requiring immense human effort to find, it can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication.

The CHAT system and two-way interaction

While DolphinGemma focuses on understanding natural communication, a parallel project explores a different avenue: active, two-way interaction.

The CHAT (Cetacean Hearing Augmentation Telemetry) system – developed by WDP in partnership with Georgia Tech – aims to establish a simpler, shared vocabulary rather than directly translating complex dolphin language.

The concept relies on associating specific, novel synthetic whistles (created by CHAT, distinct from natural sounds) with objects the dolphins enjoy interacting with, like scarves or seaweed. Researchers demonstrate the whistle-object link, hoping the dolphins’ natural curiosity leads them to mimic the sounds to request the items.

As more natural dolphin sounds are understood through work with models like DolphinGemma, these could potentially be incorporated into the CHAT interaction framework.

Google Pixel enables ocean research

Underpinning both the analysis of natural sounds and the interactive CHAT system is crucial mobile technology. Google Pixel phones serve as the brains for processing the high-fidelity audio data in real-time, directly in the challenging ocean environment.

The CHAT system, for instance, relies on Google Pixel phones to:

  • Detect a potential mimic amidst background noise.
  • Identify the specific whistle used.
  • Alert the researcher (via underwater bone-conducting headphones) about the dolphin’s ‘request’.

This allows the researcher to respond quickly with the correct object, reinforcing the learned association. While a Pixel 6 initially handled this, the next generation CHAT system (planned for summer 2025) will utilise a Pixel 9, integrating speaker/microphone functions and running both deep learning models and template matching algorithms simultaneously for enhanced performance.

(Image: Google Pixel 9 phone that will be used for the next-generation DolphinGemma CHAT system.)

Using smartphones like the Pixel dramatically reduces the need for bulky, expensive custom hardware. It improves system maintainability, lowers power requirements, and shrinks the physical size. Furthermore, DolphinGemma’s predictive power integrated into CHAT could help identify mimics faster, making interactions more fluid and effective.

Recognising that breakthroughs often stem from collaboration, Google intends to release DolphinGemma as an open model later this summer. While trained on Atlantic spotted dolphins, its architecture holds promise for researchers studying other cetaceans, potentially requiring fine-tuning for different species' vocal repertoires.

The aim is to equip researchers globally with powerful tools to analyse their own acoustic datasets, accelerating the collective effort to understand these intelligent marine mammals. We are shifting from passive listening towards actively deciphering patterns, bringing the prospect of bridging the communication gap between our species perhaps just a little closer.

See also: IEA: The opportunities and challenges of AI for global energy

Deep Cogito open LLMs use IDA to outperform same size models (9 April 2025)

Deep Cogito has released several open large language models (LLMs) that outperform competitors and claim to represent a step towards achieving general superintelligence.

The San Francisco-based company, which states its mission is “building general superintelligence,” has launched preview versions of LLMs in 3B, 8B, 14B, 32B, and 70B parameter sizes. Deep Cogito asserts that “each model outperforms the best available open models of the same size, including counterparts from LLAMA, DeepSeek, and Qwen, across most standard benchmarks”.

Impressively, the 70B model from Deep Cogito even surpasses the performance of the recently released Llama 4 109B Mixture-of-Experts (MoE) model.   

Iterated Distillation and Amplification (IDA)

Central to this release is a novel training methodology called Iterated Distillation and Amplification (IDA). 

Deep Cogito describes IDA as “a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement”. This technique aims to overcome the inherent limitations of current LLM training paradigms, where model intelligence is often capped by the capabilities of larger “overseer” models or human curators.

The IDA process involves two key steps iterated repeatedly:

  • Amplification: Using more computation to enable the model to derive better solutions or capabilities, akin to advanced reasoning techniques.
  • Distillation: Internalising these amplified capabilities back into the model’s parameters.

Deep Cogito says this creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process, rather than being strictly bounded by overseer intelligence.
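
In the abstract, the loop can be sketched as follows; `amplify` (spend extra computation per task) and `distill` (train the model on the amplified outputs) are hypothetical placeholders for however Deep Cogito implements the two steps.

```python
def iterated_distillation_and_amplification(model, tasks, iterations, amplify, distill):
    """Abstract sketch of the IDA loop as described: amplification uses more
    computation to obtain better solutions, distillation internalises them
    back into the model's parameters. Both callables are hypothetical."""
    for _ in range(iterations):
        amplified = [amplify(model, task) for task in tasks]   # better solutions via extra compute
        model = distill(model, tasks, amplified)               # fold the gains into the weights
    return model
```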

“When we study superintelligent systems,” the research notes, referencing successes like AlphaGo, “we find two key ingredients enabled this breakthrough: Advanced Reasoning and Iterative Self-Improvement”. IDA is presented as a way to integrate both into LLM training.

Deep Cogito claims IDA is efficient, stating the new models were developed by a small team in approximately 75 days. They also highlight IDA’s potential scalability compared to methods like Reinforcement Learning from Human Feedback (RLHF) or standard distillation from larger models.

As evidence, the company points to their 70B model outperforming Llama 3.3 70B (distilled from a 405B model) and Llama 4 Scout 109B (distilled from a 2T parameter model).

Capabilities and performance of Deep Cogito models

The newly released Cogito models – based on Llama and Qwen checkpoints – are optimised for coding, function calling, and agentic use cases.

A key feature is their dual functionality: “Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models),” similar to capabilities seen in models like Claude 3.5. However, Deep Cogito notes they “have not optimised for very long reasoning chains,” citing user preference for faster answers and the efficiency of distilling shorter chains.

Extensive benchmark results are provided, comparing Cogito models against size-equivalent state-of-the-art open models in both direct (standard) and reasoning modes.

Across various benchmarks (MMLU, MMLU-Pro, ARC, GSM8K, MATH, etc.) and model sizes (3B, 8B, 14B, 32B, 70B), the Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.

For instance, the Cogito 70B model achieves 91.73% on MMLU in standard mode (+6.40% vs Llama 3.3 70B) and 91.00% in thinking mode (+4.40% vs Deepseek R1 Distill 70B). Livebench scores also show improvements.

Here are benchmarks of 14B models for a medium-sized comparison:

(Image: Benchmark comparison of medium-sized 14B large language models from Deep Cogito against Alibaba Qwen and DeepSeek R1.)

While acknowledging benchmarks don’t fully capture real-world utility, Deep Cogito expresses confidence in practical performance.

This release is labelled a preview, with Deep Cogito stating they are “still in the early stages of this scaling curve”. They plan to release improved checkpoints for the current sizes and introduce larger MoE models (109B, 400B, 671B) “in the coming weeks / months”. All future models will also be open-source.

(Photo by Pietro Mattia)

See also: Alibaba Cloud targets global AI growth with new models and tools

Alibaba Cloud targets global AI growth with new models and tools (8 April 2025)

Alibaba Cloud has expanded its AI portfolio for global customers with a raft of new models, platform enhancements, and Software-as-a-Service (SaaS) tools.

The announcements, made during its Spring Launch 2025 online event, underscore the drive by Alibaba to accelerate AI innovation and adoption on a global scale.

The digital technology and intelligence arm of Alibaba is focusing on meeting increasing demand for AI-driven digital transformation worldwide.

Selina Yuan, President of International Business at Alibaba Cloud Intelligence, said: “We are launching a series of Platform-as-a-Service (PaaS) and AI capability updates to meet the growing demand for digital transformation from across the globe.

“These upgrades allow us to deliver even more secure and high-performance services that empower businesses to scale and innovate in an AI-driven world.”

Alibaba expands access to foundational AI models

Central to the announcement is the broadened availability of Alibaba Cloud’s proprietary Qwen large language model (LLM) series for international clients, initially accessible via its Singapore availability zones.

This includes several specialised models:

  • Qwen-Max: A large-scale Mixture of Experts (MoE) model.
  • QwQ-Plus: An advanced reasoning model designed for complex analytical tasks, sophisticated question answering, and expert-level mathematical problem-solving.
  • QVQ-Max: A visual reasoning model capable of handling complex multimodal problems, supporting visual input and chain-of-thought output for enhanced accuracy.
  • Qwen2.5-Omni-7b: An end-to-end multimodal model.

These additions provide international businesses with more powerful and diverse tools for developing sophisticated AI applications.

Platform enhancements power AI scale

To support these advanced models, Alibaba Cloud’s Platform for AI (PAI) received significant upgrades aimed at delivering scalable, cost-effective, and user-friendly generative AI solutions.

Key enhancements include the introduction of distributed inference capabilities within the PAI-Elastic Algorithm Service (EAS). Utilising a multi-node architecture, this addresses the computational demands of super-large models – particularly those employing MoE structures or requiring ultra-long-text processing – to overcome limitations inherent in traditional single-node setups.

Furthermore, PAI-EAS now features a prefill-decode disaggregation function designed to boost performance and reduce operational costs.

Alibaba Cloud reported impressive results when deploying this with the Qwen2.5-72B model, achieving a 92% increase in concurrency and a 91% boost in tokens per second (TPS).

The PAI-Model Gallery has also been refreshed, now offering nearly 300 open-source models—including the complete range of Alibaba Cloud’s own open-source Qwen and Wan series. These are accessible via a no-code deployment and management interface.

Additional new PAI-Model Gallery features – like model evaluation and model distillation (transferring knowledge from large to smaller, more cost-effective models) – further enhance its utility.

Alibaba integrates AI into data management

Alibaba Cloud’s flagship cloud-native relational database, PolarDB, now incorporates native AI inference powered by Qwen.

PolarDB’s in-database machine learning capability eliminates the need to move data for inference workflows, which significantly cuts processing latency while improving efficiency and data security.

The feature is optimised for text-centric tasks such as developing conversational Retrieval-Augmented Generation (RAG) agents, generating text embeddings, and performing semantic similarity searches.
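
The semantic-similarity workload itself amounts to mapping queries and documents to embedding vectors and ranking by cosine similarity; the generic sketch below illustrates that idea (it is not PolarDB's API), with `embed` as a hypothetical text-to-vector function such as one backed by a Qwen embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_search(query, documents, embed, top_k=3):
    """Rank documents by cosine similarity to the query embedding.
    `embed` is a hypothetical text-to-vector function."""
    query_vec = embed(query)
    scored = sorted(((cosine_similarity(query_vec, embed(doc)), doc) for doc in documents),
                    reverse=True)
    return [doc for _, doc in scored[:top_k]]
```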

Additionally, the company’s data warehouse, AnalyticDB, is now integrated into Alibaba Cloud’s generative AI development platform Model Studio.

This integration serves as the recommended vector database for RAG solutions. This allows organisations to connect their proprietary knowledge bases directly with AI models on the platform to streamline the creation of context-aware applications.

New SaaS tools for industry transformation

Beyond infrastructure and platform layers, Alibaba Cloud introduced two new SaaS AI tools:

  • AI Doc: An intelligent document processing tool using LLMs to parse diverse documents (reports, forms, manuals) efficiently. It extracts specific information and can generate tailored reports, such as ESG reports when integrated with Alibaba Cloud’s Energy Expert sustainability solution.
  • Smart Studio: An AI-powered content creation platform supporting text-to-image, image-to-image, and text-to-video generation. It aims to enhance marketing and creative outputs in sectors like e-commerce, gaming, and entertainment, enabling features like virtual try-ons or generating visuals from text descriptions.

All these developments follow Alibaba’s announcement in February of a $53 billion investment over the next three years dedicated to advancing its cloud computing and AI infrastructure.

This colossal investment, noted as exceeding the company’s total AI and cloud expenditure over the previous decade, highlights a deep commitment to AI-driven growth and solidifies its position as a major global cloud provider.

“As cloud and AI become essential for global growth, we are committed to enhancing our core product offerings to address our customers’ evolving needs,” concludes Yuan.

See also: Amazon Nova Act: A step towards smarter, web-native AI agents

Midjourney V7: Faster AI image generation (4 April 2025)

Midjourney has announced the alpha release of its V7 image generation model for testing by the AI community. The new model packs improvements in text prompt understanding, image quality, and feature coherence.

“V7 is an amazing model. It’s much smarter with text prompts, image prompts look fantastic, image quality is noticeably higher with beautiful textures, and bodies, hands, and objects of all kinds have significantly better coherence on all details,” Midjourney explained.

A key innovation in V7 is the default activation of model personalisation. Users must initially unlock this feature, a process that takes approximately five minutes. This personalisation can be toggled on or off at any time and is intended to significantly improve the AI’s ability to interpret user desires and aesthetic preferences. Midjourney believes this feature sets a new benchmark for understanding user intent.

Midjourney is also introducing a feature alongside the V7 image generation model called ‘Draft Mode,’ which promises to generate images ten times faster and at half the cost.

This increased speed has enabled Midjourney to implement a unique “conversational mode” on its web interface. Users can now instruct the system to make changes, such as replacing a cat with an owl or altering the time of day to nighttime, and the AI will automatically adjust the prompt and initiate a new image generation task.

Draft Mode also incorporates voice input functionality. By pressing the microphone button, users can verbally articulate their ideas and observe the images as they are generated in near real-time:

(Image: Screenshot of voice input functionality in Draft Mode when using the Midjourney V7 AI image generation model.)

Midjourney believes that Draft Mode offers an unprecedented method for refining creative concepts. If a generated image is appealing, users can select the ‘enhance’ or ‘vary’ options to re-render it at full quality. While draft images are of a lower quality compared to the standard mode, their behaviour and aesthetic characteristics remain consistent.

The V7 image generation model from Midjourney will initially be available in two speed modes: Turbo and Relax. The standard speed mode is currently undergoing further optimisation and is expected to be released shortly. Midjourney has clarified that Turbo jobs will cost twice as much as a standard job, while draft jobs will cost half the amount.

The company also provided updates on other functionalities. Features such as upscaling, editing, and retexturing will initially revert to using the V6 model, with updates planned for the future. Functionality for mood boards and SREF is currently operational and performance is expected to improve with subsequent updates.

Looking to the near future, Midjourney has outlined an active development schedule. Users can expect new features every one to two weeks for the next 60 days. A significant upcoming feature will be a new V7 character and object reference capability.

Finally, Midjourney has advised users that V7 is an entirely new model with its own unique strengths and potential weaknesses. They encourage experimentation and feedback on its capabilities, reminding users that it may require different prompting techniques compared to previous versions.

(Image credit: Midjourney)

See also: Tony Blair Institute AI copyright report sparks backlash

Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date (26 March 2025)

Gemini 2.5 is being hailed by Google DeepMind as its “most intelligent AI model” to date.

The first model from this latest generation is an experimental version of Gemini 2.5 Pro, which DeepMind says has achieved state-of-the-art results across a wide range of benchmarks.

According to Koray Kavukcuoglu, CTO of Google DeepMind, the Gemini 2.5 models are “thinking models”.  This signifies their capability to reason through their thoughts before generating a response, leading to enhanced performance and improved accuracy.    

The capacity for “reasoning” extends beyond mere classification and prediction, Kavukcuoglu explains. It encompasses the system’s ability to analyse information, deduce logical conclusions, incorporate context and nuance, and ultimately, make informed decisions.

DeepMind has been exploring methods to enhance AI’s intelligence and reasoning capabilities for some time, employing techniques such as reinforcement learning and chain-of-thought prompting. This groundwork led to the recent introduction of their first thinking model, Gemini 2.0 Flash Thinking.    

“Now, with Gemini 2.5,” says Kavukcuoglu, “we’ve achieved a new level of performance by combining a significantly enhanced base model with improved post-training.”

Google plans to integrate these thinking capabilities directly into all of its future models—enabling them to tackle more complex problems and support more capable, context-aware agents.    

Gemini 2.5 Pro secures the LMArena leaderboard top spot

Gemini 2.5 Pro Experimental is positioned as DeepMind’s most advanced model for handling intricate tasks. As of writing, it has secured the top spot on the LMArena leaderboard – a key metric for assessing human preferences – by a significant margin, demonstrating a highly capable model with a high-quality style:

(Image: Screenshot of the LMArena leaderboard, where the new Gemini 2.5 Pro Experimental AI model from Google DeepMind has taken the top spot.)

Gemini 2.5 is a ‘pro’ at maths, science, coding, and reasoning

Gemini 2.5 Pro has demonstrated state-of-the-art performance across various benchmarks that demand advanced reasoning.

Notably, it leads in maths and science benchmarks – such as GPQA and AIME 2025 – without relying on test-time techniques that increase costs, like majority voting. It also achieved a state-of-the-art score of 18.8% on Humanity’s Last Exam, a dataset designed by subject matter experts to evaluate the human frontier of knowledge and reasoning.
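
For context, majority voting (often called self-consistency) samples several independent answers and returns the most common one, multiplying inference cost by the number of samples; a generic sketch, where `sample_answer` is a hypothetical function that queries a model once:

```python
from collections import Counter

def majority_vote(sample_answer, question, num_samples=8):
    """Generic self-consistency: draw several sampled answers and return the
    most frequent one. Costs roughly num_samples times a single query."""
    answers = [sample_answer(question) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]
```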

DeepMind has placed significant emphasis on coding performance, and Gemini 2.5 represents a substantial leap forward compared to its predecessor, 2.0, with further improvements in the pipeline. 2.5 Pro excels in creating visually compelling web applications and agentic code applications, as well as code transformation and editing.

On SWE-Bench Verified, the industry standard for agentic code evaluations, Gemini 2.5 Pro achieved a score of 63.8% using a custom agent setup. The model’s reasoning capabilities also enable it to create a video game by generating executable code from a single-line prompt.

Building on its predecessors’ strengths

Gemini 2.5 builds upon the core strengths of earlier Gemini models, including native multimodality and a long context window. 2.5 Pro launches with a one million token context window, with plans to expand this to two million tokens soon. This enables the model to comprehend vast datasets and handle complex problems from diverse information sources, spanning text, audio, images, video, and even entire code repositories.    

Developers and enterprises can now begin experimenting with Gemini 2.5 Pro in Google AI Studio. Gemini Advanced users can also access it via the model dropdown on desktop and mobile platforms. The model will be rolled out on Vertex AI in the coming weeks.    

Google DeepMind encourages users to provide feedback, which will be used to further enhance Gemini’s capabilities.

(Photo by Anshita Nair)

See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

DeepSeek V3-0324 tops non-reasoning AI models in open-source first (25 March 2025)

DeepSeek V3-0324 has become the highest-scoring non-reasoning model on the Artificial Analysis Intelligence Index in a landmark achievement for open-source AI.

The new model advanced seven points in the benchmark to surpass proprietary counterparts such as Google’s Gemini 2.0 Pro, Anthropic’s Claude 3.7 Sonnet, and Meta’s Llama 3.3 70B.

While V3-0324 trails behind reasoning models, including DeepSeek’s own R1 and offerings from OpenAI and Alibaba, the achievement highlights the growing viability of open-source solutions in latency-sensitive applications where immediate responses are critical.

DeepSeek V3-0324 represents a new era for open-source AI

Non-reasoning models – which generate answers instantly without deliberative “thinking” phases – are essential for real-time use cases like chatbots, customer service automation, and live translation. DeepSeek’s latest iteration now sets the standard for these applications, eclipsing even leading proprietary tools.

(Image: Benchmark results of DeepSeek V3-0324 in the Artificial Analysis Intelligence Index, showing a landmark achievement for non-reasoning open-source AI models.)

“This is the first time an open weights model is the leading non-reasoning model, a milestone for open source,” states Artificial Analysis. The model’s performance edges it closer to proprietary reasoning models, though the latter remain superior for tasks requiring complex problem-solving.

DeepSeek V3-0324 retains most specifications from its December 2024 predecessor, including:  

  • 128k context window (capped at 64k via DeepSeek’s API)
  • 671 billion total parameters, necessitating over 700GB of GPU memory for FP8 precision
  • 37 billion active parameters
  • Text-only functionality (no multimodal support) 
  • MIT License

“Still not something you can run at home!” Artificial Analysis quips, emphasising its enterprise-grade infrastructure requirements.
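
The memory requirement follows from simple arithmetic: FP8 stores one byte per parameter, so the weights alone occupy roughly 671 GB, and the 700GB+ figure leaves room for activations, KV cache, and other runtime overhead; a back-of-the-envelope check:

```python
total_params = 671e9           # 671 billion parameters
bytes_per_param_fp8 = 1        # FP8 = 8 bits = 1 byte per parameter
weights_gb = total_params * bytes_per_param_fp8 / 1e9
print(f"{weights_gb:.0f} GB for the weights alone")  # 671 GB, before KV cache and overhead
```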

Open-source AI is bringing the heat

While proprietary reasoning models like DeepSeek R1 maintain dominance in the broader Intelligence Index, the gap is narrowing.

Three months ago, DeepSeek V3 nearly matched Anthropic’s and Google’s proprietary models but fell short of surpassing them. Today, the updated V3-0324 not only leads open-source alternatives but also outperforms all proprietary non-reasoning rivals.

“This release is arguably even more impressive than R1,” says Artificial Analysis.

DeepSeek’s progress signals a shift in the AI sector, where open-source frameworks increasingly compete with closed systems. For developers and enterprises, the MIT-licensed V3-0324 offers a powerful, adaptable tool—though its computational costs may limit accessibility.

“DeepSeek are now driving the frontier of non-reasoning open weights models,” declares Artificial Analysis.

With R2 on the horizon, the community awaits another potential leap in AI performance.

(Photo by Paul Hanaoka)

See also: Hugging Face calls for open-source focus in the AI Action Plan

LG EXAONE Deep is a maths, science, and coding buff (18 March 2025)

LG AI Research has unveiled EXAONE Deep, a reasoning model that excels in complex problem-solving across maths, science, and coding.

The company highlighted the global challenge in creating advanced reasoning models, noting that currently, only a handful of organisations with foundational models are actively pursuing this complex area. EXAONE Deep aims to compete directly with these leading models, showcasing a competitive level of reasoning ability.

LG AI Research has focused its efforts on dramatically improving EXAONE Deep’s reasoning capabilities in core domains. The model also demonstrates a strong ability to understand and apply knowledge across a broader range of subjects.

The performance benchmarks released by LG AI Research are impressive:

  • Maths: The EXAONE Deep 32B model outperformed a competing model, despite being only 5% of its size, in a demanding mathematics benchmark. Furthermore, the 7.8B and 2.4B versions achieved first place in all major mathematics benchmarks for their respective model sizes.
  • Science and coding: In these areas, the EXAONE Deep models (7.8B and 2.4B) have secured the top spot across all major benchmarks.
  • MMLU (Massive Multitask Language Understanding): The 32B model achieved a score of 83.0 on the MMLU benchmark, which LG AI Research claims is the best performance among domestic Korean models.

The capabilities of the EXAONE Deep 32B model have already garnered international recognition.

Shortly after its release, it was included in the ‘Notable AI Models’ list by US-based non-profit research organisation Epoch AI. This listing places EXAONE Deep alongside its predecessor, EXAONE 3.5, making LG the only Korean entity with models featured on this prestigious list in the past two years.

Maths prowess

EXAONE Deep has demonstrated exceptional mathematical reasoning skills across its various model sizes (32B, 7.8B, and 2.4B). In assessments based on the 2025 academic year’s mathematics curriculum, all three models outperformed global reasoning models of comparable size.

The 32B model achieved a score of 94.5 in a general mathematics competency test and 90.0 in the American Invitational Mathematics Examination (AIME) 2024, a qualifying exam for the US Mathematical Olympiad.

In the AIME 2025, the 32B model matched the performance of DeepSeek-R1—a significantly larger 671B model. This result showcases EXAONE Deep’s efficient learning and strong logical reasoning abilities, particularly when tackling challenging mathematical problems.

The smaller 7.8B and 2.4B models also achieved top rankings in major benchmarks for lightweight and on-device models, respectively. The 7.8B model scored 94.8 on the MATH-500 benchmark and 59.6 on AIME 2025, while the 2.4B model achieved scores of 92.3 and 47.9 in the same evaluations.

Science and coding excellence

EXAONE Deep has also showcased remarkable capabilities in professional science reasoning and software coding.

The 32B model scored 66.1 on the GPQA Diamond test, which assesses problem-solving skills in doctoral-level physics, chemistry, and biology. In the LiveCodeBench evaluation, which measures coding proficiency, the model achieved a score of 59.5, indicating its potential for high-level applications in these expert domains.

The 7.8B and 2.4B models continued this trend of strong performance, both securing first place in the GPQA Diamond and LiveCodeBench benchmarks within their respective size categories. This achievement builds upon the success of the EXAONE 3.5 2.4B model, which previously topped Hugging Face’s LLM Leaderboard in the edge division.

Enhanced general knowledge

Beyond its specialised reasoning capabilities, EXAONE Deep has also demonstrated improved performance in general knowledge understanding.

The 32B model achieved an impressive score of 83.0 on the MMLU benchmark, positioning it as the top-performing domestic model in this comprehensive evaluation. This indicates that EXAONE Deep’s reasoning enhancements extend beyond specific domains and contribute to a broader understanding of various subjects.

LG AI Research believes that EXAONE Deep’s reasoning advancements represent a leap towards a future where AI can tackle increasingly complex problems and contribute to enriching and simplifying human lives through continuous research and innovation.

See also: Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1

Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1 (17 March 2025)

Baidu has launched its latest foundation AI models, ERNIE 4.5 and ERNIE X1, and is offering them free for individuals through ERNIE Bot.

The company says that it aims to “push the boundaries of multimodal and reasoning models” by providing advanced capabilities at a more accessible price point. Baidu plans to integrate these models into its broader product ecosystem, including Baidu Search and the Wenxiaoyan app, to enhance user experiences.

ERNIE 4.5, Baidu’s “new generation native multimodal foundation model,” features collaborative optimisation across multiple modalities, resulting in improved multimodal comprehension. It enhances language understanding, generation, reasoning, and memory, while also improving “hallucination prevention, logical reasoning, and coding abilities.”

A key feature of ERNIE 4.5 is its ability to integrate and understand various content types, including text, images, audio, and video. It can also grasp complex content such as internet memes and satirical cartoons, showcasing strong contextual awareness.

Baidu claims ERNIE 4.5 outperforms GPT-4.5 in several benchmarks while being significantly more affordable, priced at “just 1% of GPT-4.5.”

Benchmark comparing the ERNIE 4.5 foundation AI model from Baidu to rivals such as GPT-4.5, DeepSeek, and others.

The model’s advancements are attributed to technologies like ‘FlashMask’ dynamic attention masking, heterogeneous multimodal mixture-of-experts, spatiotemporal representation compression, knowledge-centric training data construction, and self-feedback enhanced post-training.

ERNIE X1, Baidu’s new deep-thinking reasoning model, focuses on enhanced understanding, planning, reflection, and evolution. As Baidu’s “first multimodal deep-thinking reasoning model capable of tool use,” X1 excels in areas like Chinese knowledge Q&A, literary creation, and complex calculations.

The model’s tool use includes features like advanced search, document Q&A, image understanding, AI image generation, and webpage reading.

ERNIE X1’s capabilities are supported by technologies such as the progressive reinforcement learning method, end-to-end training approach integrating chains of thought and action, and a unified multi-faceted reward system.

For enterprise users and developers, ERNIE 4.5 is accessible through APIs on Baidu AI Cloud’s Qianfan platform, with competitive pricing structures. ERNIE X1 will soon be available on the same platform.
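For a sense of what that access pattern could look like in practice, the sketch below calls a Qianfan-hosted ERNIE model through an OpenAI-compatible chat-completions interface. The base URL, environment-variable name, and model identifier ("ernie-4.5-8k-preview") are illustrative assumptions rather than confirmed details of Baidu's API; consult the Qianfan documentation for the actual values.

```python
import os
from openai import OpenAI  # any OpenAI-compatible client library will do

# Assumed endpoint and credentials -- placeholders, not confirmed values.
client = OpenAI(
    api_key=os.environ["QIANFAN_API_KEY"],       # hypothetical environment variable
    base_url="https://qianfan.baidubce.com/v2",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="ernie-4.5-8k-preview",  # assumed model identifier for ERNIE 4.5
    messages=[
        {"role": "user", "content": "Summarise the ERNIE 4.5 launch in one sentence."}
    ],
)

print(response.choices[0].message.content)
```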

Baidu anticipates that “2025 is set to be an important year for the development and iteration of large language models and technologies” and plans to continue investing in AI, data centres, and cloud infrastructure to advance its AI capabilities and develop next-generation models.

See also: OpenAI and Google call for US government action to secure AI lead

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1 appeared first on AI News.

Gemma 3: Google launches its latest open AI models
https://www.artificialintelligence-news.com/news/gemma-3-google-launches-its-latest-open-ai-models/
Wed, 12 Mar 2025 09:08:41 +0000

Google has launched Gemma 3, the latest version of its family of open AI models that aim to set a new benchmark for AI accessibility.

Built upon the foundations of the company’s Gemini 2.0 models, Gemma 3 is engineered to be lightweight, portable, and adaptable—enabling developers to create AI applications across a wide range of devices.  

This release comes hot on the heels of Gemma’s first birthday, an anniversary underscored by impressive adoption metrics. Gemma models have achieved more than 100 million downloads and spawned over 60,000 community-built variants. Dubbed the “Gemmaverse,” this ecosystem signals a thriving community aiming to democratise AI.

“The Gemma family of open models is foundational to our commitment to making useful AI technology accessible,” explained Google.

Gemma 3: Features and capabilities

Gemma 3 models are available in various sizes – 1B, 4B, 12B, and 27B parameters – allowing developers to select a model tailored to their specific hardware and performance requirements. These models promise faster execution, even on modest computational setups, without compromising functionality or accuracy.

Here are some of the standout features of Gemma 3:  

  • Single-accelerator performance: Gemma 3 sets a new benchmark for single-accelerator models. In preliminary human preference evaluations on the LMArena leaderboard, Gemma 3 outperformed rivals including Llama-405B, DeepSeek-V3, and o3-mini.
  • Multilingual support across 140 languages: Catering to diverse audiences, Gemma 3 comes with pretrained capabilities for over 140 languages. Developers can create applications that connect with users in their native tongues, expanding the global reach of their projects.  
  • Sophisticated text and visual analysis: With advanced text, image, and short video reasoning capabilities, developers can implement Gemma 3 to craft interactive and intelligent applications—addressing an array of use cases from content analysis to creative workflows.  
  • Expanded context window: Offering a 128k-token context window, Gemma 3 can analyse and synthesise large datasets, making it ideal for applications requiring extended content comprehension.
  • Function calling for workflow automation: With function calling support, developers can utilise structured outputs to automate processes and build agentic AI systems effortlessly.
  • Quantised models for lightweight efficiency: Gemma 3 introduces official quantised versions, significantly reducing model size while preserving output accuracy—a bonus for developers optimising for mobile or resource-constrained environments.

The model’s performance advantages are clearly illustrated in the Chatbot Arena Elo Score leaderboard. Despite requiring just a single NVIDIA H100 GPU, the flagship 27B version of Gemma 3 ranks among the top chatbots, achieving an Elo score of 1338. Many competitors demand up to 32 GPUs to deliver comparable performance.

Google Gemma 3 performance illustrated on benchmark against both open source and proprietary AI models in the Chatbot Arena Elo Score leaderboard.

One of Gemma 3’s strengths lies in its adaptability within developers’ existing workflows.  

  • Diverse tooling compatibility: Gemma 3 supports popular AI libraries and tools, including Hugging Face Transformers, JAX, PyTorch, and Google AI Edge. For optimised deployment, platforms such as Vertex AI or Google Colab are ready to help developers get started with minimal hassle.  
  • NVIDIA optimisations: Whether running on entry-level devices like the Jetson Nano or cutting-edge hardware like Blackwell chips, Gemma 3 ensures maximum performance, further simplified through the NVIDIA API Catalog.
  • Broadened hardware support: Beyond NVIDIA, Gemma 3 integrates with AMD GPUs via the ROCm stack and supports CPU execution with Gemma.cpp for added versatility.

For immediate experiments, users can access Gemma 3 models via platforms such as Hugging Face and Kaggle, or take advantage of the Google AI Studio for in-browser deployment.
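As a rough illustration of the Hugging Face route, the snippet below loads an instruction-tuned Gemma 3 checkpoint with the Transformers text-generation pipeline. The checkpoint name ("google/gemma-3-1b-it") is an assumption based on the family's usual naming; a recent Transformers release with Gemma 3 support is required, and the gated weights must be accepted and downloaded via a logged-in Hugging Face account.

```python
from transformers import pipeline

# Assumed checkpoint name; the 4B, 12B, and 27B variants follow the same pattern.
generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "In one sentence, what does a 128k-token context window allow?"}
]

# Recent Transformers pipelines accept chat-style messages and apply the
# model's chat template automatically.
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```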

Advancing responsible AI  

“We believe open models require careful risk assessment, and our approach balances innovation with safety,” explains Google.  

Gemma 3’s team adopted stringent governance policies, applying fine-tuning and robust benchmarking to align the model with ethical guidelines. Given the model’s enhanced capabilities in STEM fields, it underwent specific evaluations to mitigate the risk of misuse, such as assisting in the creation of harmful substances.

Google is pushing for collective efforts within the industry to create proportionate safety frameworks for increasingly powerful models.

To play its part, Google is launching ShieldGemma 2. The 4B image safety checker leverages Gemma 3’s architecture and outputs safety labels across categories such as dangerous content, explicit material, and violence. The tool works out of the box, but developers can also customise it to meet tailored safety requirements.

The “Gemmaverse” isn’t just a technical ecosystem; it’s a community-driven movement. Projects such as AI Singapore’s SEA-LION v3, INSAIT’s BgGPT, and Nexa AI’s OmniAudio are testament to the power of collaboration within it.

To bolster academic research, Google has also introduced the Gemma 3 Academic Program. Researchers can apply for $10,000 worth of Google Cloud credits to accelerate their AI-centric projects. Applications open today and will be accepted for four weeks.

With its accessibility, capabilities, and widespread compatibility, Gemma 3 makes a strong case for becoming a cornerstone in the AI development community.

(Image credit: Google)

See also: Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Gemma 3: Google launches its latest open AI models appeared first on AI News.

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase
https://www.artificialintelligence-news.com/news/alibaba-qwen-qwq-32b-scaled-reinforcement-learning-showcase/
Thu, 06 Mar 2025 09:14:13 +0000

The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team have successfully integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.

The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

Benchmark results:

  • AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1-671B’s 79.8 but significantly ahead of OpenAI o1-mini’s 63.6 and the distilled models.
  • LiveCodeBench: QwQ-32B scored 63.4, slightly behind DeepSeek-R1-671B’s 65.9 while surpassing the distilled models and OpenAI o1-mini’s 53.8.
  • LiveBench: QwQ-32B achieved 73.1, ahead of DeepSeek-R1-671B’s 71.6 and outperforming the distilled models and OpenAI o1-mini’s 57.5.
  • IFEval: QwQ-32B scored 83.9, very close to DeepSeek-R1-671B’s 83.3 and leading the distilled models and OpenAI o1-mini’s 59.1.
  • BFCL: QwQ-32B achieved 66.4, ahead of DeepSeek-R1-671B’s 62.8, demonstrating a lead over the distilled models and OpenAI o1-mini’s 49.3.

The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.
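Implementation details are not public, but the toy sketch below shows the general shape of outcome-based rewards of this kind: a maths answer is scored by an exact-match verifier, and a coding answer by running it against unit tests. The function names, scoring scheme, and in-process execution are illustrative assumptions only; a production setup would use dedicated, sandboxed code-execution servers.

```python
import subprocess
import sys
import tempfile


def math_reward(model_answer: str, reference_answer: str) -> float:
    """Accuracy-verifier style reward: 1.0 only when the final answer matches exactly."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0


def code_reward(candidate_code: str, test_code: str, timeout_s: int = 10) -> float:
    """Execution-based reward: 1.0 when the candidate passes the supplied tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(candidate_code + "\n\n" + test_code)
        script_path = handle.name
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0  # infinite loops and hangs earn no reward
    return 1.0 if result.returncode == 0 else 0.0


# Example: reward a generated function against a simple assertion-based test.
print(math_reward("42", "42"))                      # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))         # 1.0
```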

“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.

QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.
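As a minimal sketch of using those open weights, the snippet below loads the model with Hugging Face Transformers. The repository name ("Qwen/QwQ-32B") and chat-template usage follow the Qwen family's usual conventions but should be treated as assumptions, and a 32-billion-parameter model needs substantial GPU memory or quantisation to run locally.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # assumed Hugging Face repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "How many positive integers below 100 are divisible by 7?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit a long chain of thought before the final answer,
# so allow a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```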

“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.

See also: Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase appeared first on AI News.
