reasoning Archives - AI News
https://www.artificialintelligence-news.com/news/tag/reasoning/

Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost
Fri, 25 Apr 2025
https://www.artificialintelligence-news.com/news/baidu-ernie-x1-and-4-5-turbo-high-performance-low-cost/

Baidu has unveiled ERNIE X1 Turbo and 4.5 Turbo, two fast models that boast impressive performance alongside dramatic cost reductions.

Developed as enhancements to the existing ERNIE X1 and 4.5 models, both new Turbo versions highlight multimodal processing, robust reasoning skills, and aggressive pricing designed to capture developer interest and market share.

Baidu ERNIE X1 Turbo: Deep reasoning meets cost efficiency

Positioned as a deep-thinking reasoning model, ERNIE X1 Turbo tackles complex tasks requiring sophisticated understanding. It enters a competitive field, claiming superior performance in some benchmarks against rivals like DeepSeek R1, V3, and OpenAI o1:

(Image: Benchmarks of Baidu ERNIE X1 Turbo compared to rival AI large language models, including DeepSeek R1 and OpenAI o1.)

Key to X1 Turbo’s enhanced capabilities is an advanced “chain of thought” process, enabling more structured and logical problem-solving.

Furthermore, ERNIE X1 Turbo boasts improved multimodal functions – the ability to understand and process information beyond just text, potentially including images or other data types – alongside refined tool utilisation abilities. This makes it particularly well-suited for nuanced applications such as literary creation, complex logical reasoning challenges, code generation, and intricate instruction following.

ERNIE X1 Turbo achieves this performance while undercutting competitor pricing. Input tokens cost $0.14 per million, with output tokens priced at $0.55 per million. That is approximately 25% of DeepSeek R1's pricing.

Baidu ERNIE 4.5 Turbo: Multimodal muscle at a fraction of the cost

Sharing the spotlight is ERNIE 4.5 Turbo, which focuses on delivering upgraded multimodal features and significantly faster response times compared to its non-Turbo counterpart. The emphasis here is on providing a versatile, responsive AI experience while slashing operational costs.

The model achieves an 80% price reduction compared to the original ERNIE 4.5 with input set at $0.11 per million tokens and output at $0.44 per million tokens. This represents roughly 40% of the cost of the latest version of DeepSeek V3, again highlighting a deliberate strategy to attract users through cost-effectiveness.
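Taken at face value, the per-million-token prices quoted above make cost comparisons easy to sketch. The prices below are the article's figures; the monthly token volumes are hypothetical, purely for illustration:

```python
# Estimate spend from the per-million-token prices quoted in the article.
# Workload numbers below are hypothetical.

PRICES = {  # USD per 1M tokens: (input, output)
    "ERNIE X1 Turbo": (0.14, 0.55),
    "ERNIE 4.5 Turbo": (0.11, 0.44),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return cost in USD for the given token volumes."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Example: 50M input tokens and 10M output tokens per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50_000_000, 10_000_000):.2f}")
```

At that illustrative volume, X1 Turbo works out to $12.50 and 4.5 Turbo to $9.90 for the month.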

Performance benchmarks further bolster its credentials. In multiple tests evaluating both multimodal and text capabilities, Baidu ERNIE 4.5 Turbo outperforms OpenAI’s highly-regarded GPT-4o model. 

In multimodal capability assessments, ERNIE 4.5 Turbo achieved an average score of 77.68, surpassing GPT-4o’s 72.76 on the same tests.

(Image: Benchmarks of Baidu ERNIE 4.5 Turbo compared to rival AI large language models.)

While benchmark results always require careful interpretation, this suggests ERNIE 4.5 Turbo is a serious contender for tasks involving an integrated understanding of different data types.

Baidu continues to shake up the AI marketplace

The launch of ERNIE X1 Turbo and 4.5 Turbo signifies a growing trend in the AI sector: the democratisation of high-end capabilities. While foundational models continue to push the boundaries of performance, there is increasing demand for models that balance power with accessibility and affordability.

By lowering the price points for models with sophisticated reasoning and multimodal features, the Baidu ERNIE Turbo series could enable a wider range of developers and businesses to integrate advanced AI into their applications.

This competitive pricing puts pressure on established players like OpenAI and Anthropic, as well as emerging competitors like DeepSeek, potentially leading to further price adjustments across the market.

(Image Credit: Alpha Photo under CC BY-NC 2.0 license)

See also: China’s MCP adoption: AI assistants that actually do things

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost appeared first on AI News.

Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date
Wed, 26 Mar 2025
https://www.artificialintelligence-news.com/news/gemini-2-5-google-cooks-most-intelligent-ai-model-to-date/

Gemini 2.5 is being hailed by Google DeepMind as its “most intelligent AI model” to date.

The first model from this latest generation is an experimental version of Gemini 2.5 Pro, which DeepMind says has achieved state-of-the-art results across a wide range of benchmarks.

According to Koray Kavukcuoglu, CTO of Google DeepMind, the Gemini 2.5 models are “thinking models”: they reason through a problem before generating a response, leading to enhanced performance and improved accuracy.

The capacity for “reasoning” extends beyond mere classification and prediction, Kavukcuoglu explains. It encompasses the system’s ability to analyse information, deduce logical conclusions, incorporate context and nuance, and ultimately, make informed decisions.

DeepMind has been exploring methods to enhance AI’s intelligence and reasoning capabilities for some time, employing techniques such as reinforcement learning and chain-of-thought prompting. This groundwork led to the recent introduction of their first thinking model, Gemini 2.0 Flash Thinking.    

“Now, with Gemini 2.5,” says Kavukcuoglu, “we’ve achieved a new level of performance by combining a significantly enhanced base model with improved post-training.”

Google plans to integrate these thinking capabilities directly into all of its future models—enabling them to tackle more complex problems and support more capable, context-aware agents.    

Gemini 2.5 Pro secures the LMArena leaderboard top spot

Gemini 2.5 Pro Experimental is positioned as DeepMind’s most advanced model for handling intricate tasks. As of writing, it has secured the top spot on the LMArena leaderboard – a key metric for assessing human preferences – by a significant margin, demonstrating a highly capable model with a high-quality style:

(Image: Screenshot of the LMArena leaderboard, where Gemini 2.5 Pro Experimental from Google DeepMind has taken the top spot.)
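LMArena derives its rankings from pairwise human preference votes using an Elo-style rating system. As a rough sketch of the mechanism only (LMArena's actual methodology fits a statistical model over all votes; this is the textbook online Elo update, not their implementation):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)                 # winner gains what the loser sheds
    return rating_a + delta, rating_b - delta

# A lower-rated model that keeps winning votes climbs quickly:
a, b = 1200.0, 1300.0
for _ in range(10):
    a, b = elo_update(a, b, a_won=True)
print(round(a), round(b))
```

After ten straight wins the underdog overtakes the incumbent, which is why a decisive human-preference margin translates into a clear leaderboard gap.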

Gemini 2.5 is a ‘pro’ at maths, science, coding, and reasoning

Gemini 2.5 Pro has demonstrated state-of-the-art performance across various benchmarks that demand advanced reasoning.

Notably, it leads in maths and science benchmarks – such as GPQA and AIME 2025 – without relying on test-time techniques that increase costs, like majority voting. It also achieved a state-of-the-art score of 18.8% on Humanity’s Last Exam, a dataset designed by subject matter experts to evaluate the human frontier of knowledge and reasoning.
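Majority voting, the test-time technique mentioned above, means sampling several candidate answers and keeping the most common one, which buys accuracy at a multiple of the inference cost. A minimal sketch, with a stubbed-out sampler standing in for real model calls:

```python
import random
from collections import Counter

def sample_answer(rng: random.Random) -> str:
    """Stub for one stochastic model sample: answers "42" 60% of the
    time and one of two wrong answers otherwise (illustrative only)."""
    return "42" if rng.random() < 0.6 else rng.choice(["41", "43"])

def majority_vote(n_samples: int, seed: int = 0) -> str:
    """Sample n answers and return the most common one."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote(25))  # aggregating samples usually recovers "42"
```

Each vote is a full generation, so 25-way voting costs roughly 25x the compute of a single pass, which is exactly the overhead Gemini 2.5 Pro's scores avoid.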

DeepMind has placed significant emphasis on coding performance, and Gemini 2.5 represents a substantial leap forward compared to its predecessor, 2.0, with further improvements in the pipeline. 2.5 Pro excels in creating visually compelling web applications and agentic code applications, as well as code transformation and editing.

On SWE-Bench Verified, the industry standard for agentic code evaluations, Gemini 2.5 Pro achieved a score of 63.8% using a custom agent setup. The model’s reasoning capabilities also enable it to create a video game by generating executable code from a single-line prompt.

Building on its predecessors’ strengths

Gemini 2.5 builds upon the core strengths of earlier Gemini models, including native multimodality and a long context window. 2.5 Pro launches with a one million token context window, with plans to expand this to two million tokens soon. This enables the model to comprehend vast datasets and handle complex problems from diverse information sources, spanning text, audio, images, video, and even entire code repositories.    

Developers and enterprises can now begin experimenting with Gemini 2.5 Pro in Google AI Studio. Gemini Advanced users can also access it via the model dropdown on desktop and mobile platforms. The model will be rolled out on Vertex AI in the coming weeks.    

Google DeepMind encourages users to provide feedback, which will be used to further enhance Gemini’s capabilities.

(Photo by Anshita Nair)

See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date appeared first on AI News.

LG EXAONE Deep is a maths, science, and coding buff
Tue, 18 Mar 2025
https://www.artificialintelligence-news.com/news/lg-exaone-deep-maths-science-and-coding-buff/

LG AI Research has unveiled EXAONE Deep, a reasoning model that excels in complex problem-solving across maths, science, and coding.

The company highlighted the global challenge in creating advanced reasoning models, noting that currently, only a handful of organisations with foundational models are actively pursuing this complex area. EXAONE Deep aims to compete directly with these leading models, showcasing a competitive level of reasoning ability.

LG AI Research has focused its efforts on dramatically improving EXAONE Deep’s reasoning capabilities in core domains. The model also demonstrates a strong ability to understand and apply knowledge across a broader range of subjects.

The performance benchmarks released by LG AI Research are impressive:

  • Maths: The EXAONE Deep 32B model outperformed a competing model, despite being only 5% of its size, in a demanding mathematics benchmark. Furthermore, the 7.8B and 2.4B versions achieved first place in all major mathematics benchmarks for their respective model sizes.
  • Science and coding: In these areas, the EXAONE Deep models (7.8B and 2.4B) have secured the top spot across all major benchmarks.
  • MMLU (Massive Multitask Language Understanding): The 32B model achieved a score of 83.0 on the MMLU benchmark, which LG AI Research claims is the best performance among domestic Korean models.

The capabilities of the EXAONE Deep 32B model have already garnered international recognition.

Shortly after its release, it was included in the ‘Notable AI Models’ list by US-based non-profit research organisation Epoch AI. This listing places EXAONE Deep alongside its predecessor, EXAONE 3.5, making LG the only Korean entity with models featured on this prestigious list in the past two years.

Maths prowess

EXAONE Deep has demonstrated exceptional mathematical reasoning skills across its various model sizes (32B, 7.8B, and 2.4B). In assessments based on the 2025 academic year’s mathematics curriculum, all three models outperformed global reasoning models of comparable size.

The 32B model achieved a score of 94.5 in a general mathematics competency test and 90.0 in the American Invitational Mathematics Examination (AIME) 2024, a qualifying exam for the US Mathematical Olympiad.

In the AIME 2025, the 32B model matched the performance of DeepSeek-R1—a significantly larger 671B model. This result showcases EXAONE Deep’s efficient learning and strong logical reasoning abilities, particularly when tackling challenging mathematical problems.

The smaller 7.8B and 2.4B models also achieved top rankings in major benchmarks for lightweight and on-device models, respectively. The 7.8B model scored 94.8 on the MATH-500 benchmark and 59.6 on AIME 2025, while the 2.4B model achieved scores of 92.3 and 47.9 in the same evaluations.

Science and coding excellence

EXAONE Deep has also showcased remarkable capabilities in professional science reasoning and software coding.

The 32B model scored 66.1 on the GPQA Diamond test, which assesses problem-solving skills in doctoral-level physics, chemistry, and biology. In the LiveCodeBench evaluation, which measures coding proficiency, the model achieved a score of 59.5, indicating its potential for high-level applications in these expert domains.

The 7.8B and 2.4B models continued this trend of strong performance, both securing first place in the GPQA Diamond and LiveCodeBench benchmarks within their respective size categories. This achievement builds upon the success of the EXAONE 3.5 2.4B model, which previously topped Hugging Face’s LLM Leaderboard in the edge division.

Enhanced general knowledge

Beyond its specialised reasoning capabilities, EXAONE Deep has also demonstrated improved performance in general knowledge understanding.

The 32B model achieved an impressive score of 83.0 on the MMLU benchmark, positioning it as the top-performing domestic model in this comprehensive evaluation. This indicates that EXAONE Deep’s reasoning enhancements extend beyond specific domains and contribute to a broader understanding of various subjects.

LG AI Research believes that EXAONE Deep’s reasoning advancements represent a leap towards a future where AI can tackle increasingly complex problems and contribute to enriching and simplifying human lives through continuous research and innovation.

See also: Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post LG EXAONE Deep is a maths, science, and coding buff appeared first on AI News.

DeepSeek-R1 reasoning models rival OpenAI in performance
Mon, 20 Jan 2025
https://www.artificialintelligence-news.com/news/deepseek-r1-reasoning-models-rival-openai-in-performance/

DeepSeek has unveiled its first-generation DeepSeek-R1 and DeepSeek-R1-Zero models that are designed to tackle complex reasoning tasks.

DeepSeek-R1-Zero is trained solely through large-scale reinforcement learning (RL) without relying on supervised fine-tuning (SFT) as a preliminary step. According to DeepSeek, this approach has led to the natural emergence of “numerous powerful and interesting reasoning behaviours,” including self-verification, reflection, and the generation of extensive chains of thought (CoT).

“Notably, [DeepSeek-R1-Zero] is the first open research to validate that reasoning capabilities of LLMs can be incentivised purely through RL, without the need for SFT,” DeepSeek researchers explained. This milestone not only underscores the model’s innovative foundations but also paves the way for RL-focused advancements in reasoning AI.
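Incentivising reasoning purely through RL means the model is updated from a reward signal alone, with no labelled demonstrations. The core idea can be shown at toy scale with a REINFORCE-style policy-gradient update on a two-armed bandit; this is a generic sketch of the principle, not DeepSeek's actual training setup:

```python
import random
from math import exp

def softmax(prefs):
    exps = [exp(p) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

def train(steps: int = 2000, lr: float = 0.1, seed: int = 0):
    """REINFORCE on a 2-armed bandit: arm 1 pays reward 1.0, arm 0 pays
    0.2. The policy improves from reward alone -- no supervised labels."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]                      # policy parameters (logits)
    rewards = [0.2, 1.0]
    baseline = 0.0                          # running-average reward baseline
    for _ in range(steps):
        probs = softmax(prefs)
        arm = 0 if rng.random() < probs[0] else 1
        r = rewards[arm]
        baseline += 0.01 * (r - baseline)
        adv = r - baseline                  # advantage of this sample
        for a in range(2):                  # grad log pi(a) = 1[a=arm] - pi(a)
            grad = (1.0 if a == arm else 0.0) - probs[a]
            prefs[a] += lr * adv * grad
    return softmax(prefs)

probs = train()
print(round(probs[1], 3))  # probability of the high-reward arm, close to 1
```

The same principle, at vastly larger scale and with verifiable rewards for correct answers, is what lets reasoning behaviours emerge without SFT.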

However, DeepSeek-R1-Zero’s capabilities come with certain limitations. Key challenges include “endless repetition, poor readability, and language mixing,” which could pose significant hurdles in real-world applications. To address these shortcomings, DeepSeek developed its flagship model: DeepSeek-R1.

Introducing DeepSeek-R1

DeepSeek-R1 builds upon its predecessor by incorporating cold-start data prior to RL training. This additional pre-training step enhances the model’s reasoning capabilities and resolves many of the limitations noted in DeepSeek-R1-Zero.

Notably, DeepSeek-R1 achieves performance comparable to OpenAI’s much-lauded o1 system across mathematics, coding, and general reasoning tasks, cementing its place as a leading competitor.

DeepSeek has chosen to open-source both DeepSeek-R1-Zero and DeepSeek-R1 along with six smaller distilled models. Among these, DeepSeek-R1-Distill-Qwen-32B has demonstrated exceptional results—even outperforming OpenAI’s o1-mini across multiple benchmarks.

  • MATH-500 (Pass@1): DeepSeek-R1 achieved 97.3%, eclipsing OpenAI (96.4%) and other key competitors.  
  • LiveCodeBench (Pass@1-COT): The distilled version DeepSeek-R1-Distill-Qwen-32B scored 57.2%, a standout performance among smaller models.  
  • AIME 2024 (Pass@1): DeepSeek-R1 achieved 79.8%, setting an impressive standard in mathematical problem-solving.
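Pass@1, the metric quoted in the benchmark results above, is the probability that a single sampled solution is correct. When n samples are drawn and c of them pass, the standard unbiased estimator for pass@k is 1 - C(n-c, k)/C(n, k); the estimator comes from OpenAI's HumanEval work, not from DeepSeek's evaluation itself:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k from n samples with c correct.

    Equals 1 - C(n-c, k) / C(n, k); for k=1 this reduces to c/n.
    """
    if n - c < k:
        return 1.0  # too few failures to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(200, 120, 1))   # 0.6: with k=1 it is just c/n
print(pass_at_k(200, 120, 10))  # higher: ten tries beat one try
```

Reporting Pass@1 rather than a large-k variant makes the headline figures directly comparable across models without giving credit for repeated sampling.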

A pipeline to benefit the wider industry

DeepSeek has shared insights into its rigorous pipeline for reasoning model development, which integrates a combination of supervised fine-tuning and reinforcement learning.

According to the company, the process involves two SFT stages to establish the foundational reasoning and non-reasoning abilities, as well as two RL stages tailored for discovering advanced reasoning patterns and aligning these capabilities with human preferences.

“We believe the pipeline will benefit the industry by creating better models,” DeepSeek remarked, alluding to the potential of their methodology to inspire future advancements across the AI sector.

One standout achievement of their RL-focused approach is the ability of DeepSeek-R1-Zero to execute intricate reasoning patterns without prior human instruction—a first for the open-source AI research community.

Importance of distillation

DeepSeek researchers also highlighted the importance of distillation—the process of transferring reasoning abilities from larger models to smaller, more efficient ones, a strategy that has unlocked performance gains even for smaller configurations.

Smaller distilled iterations of DeepSeek-R1 – such as the 1.5B, 7B, and 14B versions – were able to hold their own in niche applications, and can outperform models of comparable size trained directly with RL.

For researchers, these distilled models are available in configurations spanning from 1.5 billion to 70 billion parameters, supporting Qwen2.5 and Llama3 architectures. This flexibility empowers versatile usage across a wide range of tasks, from coding to natural language understanding.
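Mechanically, distillation is commonly implemented by training the student to match the teacher's temperature-softened output distribution. A framework-free sketch of the distillation loss on toy logits (illustrative only, not DeepSeek's training code):

```python
from math import exp, log

def softmax(logits, temperature=1.0):
    """Temperature-softened probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions -- the signal the
    student is trained to minimise during distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]
aligned = [2.9, 1.1, 0.1]   # student close to the teacher's distribution
uniform = [0.1, 0.1, 0.1]   # student far from the teacher's distribution
print(distill_loss(teacher, aligned) < distill_loss(teacher, uniform))  # True
```

The temperature softens both distributions so the student also learns from the teacher's relative preferences among wrong answers, not just its top pick.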

DeepSeek has adopted the MIT License for its repository and weights, extending permissions for commercial use and downstream modifications. Derivative works, such as using DeepSeek-R1 to train other large language models (LLMs), are permitted. However, users of specific distilled models should ensure compliance with the licences of the original base models, such as Apache 2.0 and Llama3 licences.

(Photo by Prateek Katyal)

See also: Microsoft advances materials discovery with MatterGen

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post DeepSeek-R1 reasoning models rival OpenAI in performance  appeared first on AI News.
