llm Archives - AI News
https://www.artificialintelligence-news.com/news/tag/llm/

OpenAI’s latest LLM opens doors for China’s AI startups
https://www.artificialintelligence-news.com/news/openai-latest-llm-opens-doors-for-china-ai-startups/
Tue, 29 Apr 2025 16:41:59 +0000
At the Apsara Conference in Hangzhou, hosted by Alibaba Cloud, China’s AI startups emphasised their efforts to develop large language models.

The companies’ efforts follow the announcement of OpenAI’s latest LLMs, including the o1 generative pre-trained transformer model backed by Microsoft. The model is intended to tackle difficult tasks, paving the way for advances in science, coding, and mathematics.

During the conference, Yang Zhilin, founder of Moonshot AI, underlined the importance of the o1 model, adding that it has the potential to reshape various industries and create new opportunities for AI startups.

Zhilin stated that reinforcement learning and scalability might be pivotal for AI development. He spoke of the scaling law, which states that larger models with more training data perform better.

“This approach pushes the ceiling of AI capabilities,” Zhilin said, adding that OpenAI o1 has the potential to disrupt sectors and generate new opportunities for startups.
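The scaling law Zhilin refers to is usually expressed as a power law: loss falls predictably as training compute and data grow. A minimal illustrative sketch follows; the constants A and alpha are arbitrary placeholders chosen for illustration, not fitted values from any real model.

```python
# Illustrative power-law scaling curve: loss = A * compute^(-alpha).
# A and alpha are invented placeholders, not fitted scaling-law constants.
A, ALPHA = 10.0, 0.05

def predicted_loss(compute: float) -> float:
    """Toy scaling-law prediction: loss shrinks as a power of compute."""
    return A * compute ** (-ALPHA)

# More compute -> lower predicted loss, i.e. a higher capability ceiling.
for c in (1e20, 1e21, 1e22):
    print(f"compute={c:.0e} -> loss={predicted_loss(c):.3f}")
```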

OpenAI has also stressed the model’s ability to solve complex problems, saying it operates in a manner similar to human thinking. By refining its strategies and learning from mistakes, the model improves its problem-solving capabilities.

Zhilin said companies with enough computing power will be able to innovate not only in algorithms, but also in foundational AI models. He sees this as pivotal, as AI engineers rely increasingly on reinforcement learning to generate new data after exhausting available organic data sources.

StepFun CEO Jiang Daxin concurred with Zhilin but stated that computational power remains a big challenge for many start-ups, particularly due to US trade restrictions that hinder Chinese enterprises’ access to advanced semiconductors.

“The computational requirements are still substantial,” Daxin stated.

An insider at Baichuan AI has said that only a small group of Chinese AI start-ups — including Moonshot AI, Baichuan AI, Zhipu AI, and MiniMax — are in a position to make large-scale investments in reinforcement learning. These companies — collectively referred to as the “AI tigers” — are involved heavily in LLM development, pushing the next generation of AI.

More from the Apsara Conference

Also at the conference, Alibaba Cloud made several announcements, including the release of its Qwen 2.5 model family, which features advances in coding and mathematics. The models range from 0.5 billion to 72 billion parameters and support more than 29 languages, including Chinese, English, French, and Spanish.

Specialised models such as Qwen2.5-Coder and Qwen2.5-Math have already gained some traction, with over 40 million downloads across the Hugging Face and ModelScope platforms.

Alibaba Cloud also expanded its product portfolio, adding a text-to-video model to its Tongyi Wanxiang image-generation family. The model can create videos in realistic and animated styles, with possible uses in advertising and filmmaking.

Alibaba Cloud unveiled Qwen 2-VL, the latest version of its vision language model. It handles videos longer than 20 minutes, supports video-based question-answering, and is optimised for mobile devices and robotics.


(Photo by: @Guy_AI_Wise via X)

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost
https://www.artificialintelligence-news.com/news/baidu-ernie-x1-and-4-5-turbo-high-performance-low-cost/
Fri, 25 Apr 2025 12:28:01 +0000
Baidu has unveiled ERNIE X1 Turbo and 4.5 Turbo, two fast models that boast impressive performance alongside dramatic cost reductions.

Developed as enhancements to the existing ERNIE X1 and 4.5 models, both new Turbo versions highlight multimodal processing, robust reasoning skills, and aggressive pricing strategies designed to capture developer interest and market share.

Baidu ERNIE X1 Turbo: Deep reasoning meets cost efficiency

Positioned as a deep-thinking reasoning model, ERNIE X1 Turbo tackles complex tasks requiring sophisticated understanding. It enters a competitive field, claiming superior performance in some benchmarks against rivals like DeepSeek R1, V3, and OpenAI o1:

Benchmarks of Baidu ERNIE X1 Turbo compared to rival AI large language models like DeepSeek R1 and OpenAI o1.

Key to X1 Turbo’s enhanced capabilities is an advanced “chain of thought” process, enabling more structured and logical problem-solving.

Furthermore, ERNIE X1 Turbo boasts improved multimodal functions – the ability to understand and process information beyond just text, potentially including images or other data types – alongside refined tool utilisation abilities. This makes it particularly well-suited for nuanced applications such as literary creation, complex logical reasoning challenges, code generation, and intricate instruction following.

ERNIE X1 Turbo achieves this performance while undercutting competitor pricing. Input token costs start at $0.14 per million tokens, with output tokens priced at $0.55 per million, roughly 25% of DeepSeek R1’s pricing.

Baidu ERNIE 4.5 Turbo: Multimodal muscle at a fraction of the cost

Sharing the spotlight is ERNIE 4.5 Turbo, which focuses on delivering upgraded multimodal features and significantly faster response times compared to its non-Turbo counterpart. The emphasis here is on providing a versatile, responsive AI experience while slashing operational costs.

The model achieves an 80% price reduction compared to the original ERNIE 4.5 with input set at $0.11 per million tokens and output at $0.44 per million tokens. This represents roughly 40% of the cost of the latest version of DeepSeek V3, again highlighting a deliberate strategy to attract users through cost-effectiveness.
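Using only the per-million-token prices quoted above, a quick sketch shows how the two Turbo models compare on a given workload; the workload sizes below are hypothetical, invented purely for illustration.

```python
# Token-cost calculator using the per-million-token prices quoted in the
# article. The monthly workload figures are hypothetical.
PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "ernie-x1-turbo": (0.14, 0.55),
    "ernie-4.5-turbo": (0.11, 0.44),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a given token workload."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# Hypothetical monthly workload: 200M input tokens, 50M output tokens.
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 200_000_000, 50_000_000):.2f}")
```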

Performance benchmarks further bolster its credentials. In multiple tests evaluating both multimodal and text capabilities, Baidu ERNIE 4.5 Turbo outperforms OpenAI’s highly-regarded GPT-4o model. 

In multimodal capability assessments, ERNIE 4.5 Turbo achieved an average score of 77.68, surpassing GPT-4o’s 72.76 in the same tests.

Benchmarks of Baidu ERNIE 4.5 Turbo compared to rival AI large language models like DeepSeek R1 and OpenAI o1.

While benchmark results always require careful interpretation, this suggests ERNIE 4.5 Turbo is a serious contender for tasks involving an integrated understanding of different data types.

Baidu continues to shake up the AI marketplace

The launch of ERNIE X1 Turbo and 4.5 Turbo signifies a growing trend in the AI sector: the democratisation of high-end capabilities. While foundational models continue to push the boundaries of performance, there is increasing demand for models that balance power with accessibility and affordability.

By lowering the price points for models with sophisticated reasoning and multimodal features, the Baidu ERNIE Turbo series could enable a wider range of developers and businesses to integrate advanced AI into their applications.

This competitive pricing puts pressure on established players like OpenAI and Anthropic, as well as emerging competitors like DeepSeek, potentially leading to further price adjustments across the market.

(Image Credit: Alpha Photo under CC BY-NC 2.0 license)

See also: China’s MCP adoption: AI assistants that actually do things


DolphinGemma: Google AI model understands dolphin chatter
https://www.artificialintelligence-news.com/news/dolphingemma-google-ai-model-understands-dolphin-chatter/
Mon, 14 Apr 2025 14:18:49 +0000
Google has developed an AI model called DolphinGemma to decipher how dolphins communicate and one day facilitate interspecies communication.

The intricate clicks, whistles, and pulses echoing through the underwater world of dolphins have long fascinated scientists. The dream has been to understand and decipher the patterns within their complex vocalisations.

Google, collaborating with engineers at the Georgia Institute of Technology and leveraging the field research of the Wild Dolphin Project (WDP), has unveiled DolphinGemma to help realise that goal.

Announced around National Dolphin Day, the foundational AI model represents a new tool in the effort to comprehend cetacean communication. Trained specifically to learn the structure of dolphin sounds, DolphinGemma can even generate novel, dolphin-like audio sequences.

The Wild Dolphin Project – operational since 1985 – has run the world’s longest continuous underwater study of dolphins, developing a deep understanding of context-specific sounds, such as:

  • Signature “whistles”: Serving as unique identifiers, akin to names, crucial for interactions like mothers reuniting with calves.
  • Burst-pulse “squawks”: Commonly associated with conflict or aggressive encounters.
  • Click “buzzes”: Often detected during courtship activities or when dolphins chase sharks.

WDP’s ultimate goal is to uncover the inherent structure and potential meaning within these natural sound sequences, searching for the grammatical rules and patterns that might signify a form of language.

This long-term, painstaking analysis has provided the essential grounding and labelled data crucial for training sophisticated AI models like DolphinGemma.

DolphinGemma: The AI ear for cetacean sounds

Analysing the sheer volume and complexity of dolphin communication is a formidable task ideally suited for AI.

DolphinGemma, developed by Google, employs specialised audio technologies to tackle this. It uses the SoundStream tokeniser to efficiently represent dolphin sounds, feeding this data into a model architecture adept at processing complex sequences.

Based on insights from Google’s Gemma family of lightweight, open models (which share technology with the powerful Gemini models), DolphinGemma functions as an audio-in, audio-out system.

Fed with sequences of natural dolphin sounds from WDP’s extensive database, DolphinGemma learns to identify recurring patterns and structures. Crucially, it can predict the likely subsequent sounds in a sequence—much like human language models predict the next word.
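That prediction step can be caricatured with a deliberately simplified sketch: a bigram frequency table over invented sound-token IDs. The real model is a transformer operating on SoundStream tokens, not a lookup table, so this is an analogy only.

```python
from collections import Counter, defaultdict

# Toy next-token predictor over a tokenised sound sequence, analogous to how
# a language model predicts the next word. The token IDs are invented; the
# real DolphinGemma is a ~400M-parameter transformer, not a bigram table.
def train_bigram(sequence):
    """Count, for each token, which tokens follow it in the sequence."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequently observed follower of `token`."""
    return counts[token].most_common(1)[0][0]

seq = [3, 7, 3, 7, 9, 3, 7, 3, 7, 3]  # hypothetical whistle/click token IDs
model = train_bigram(seq)
print(predict_next(model, 3))  # 3 is most often followed by 7
```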

With around 400 million parameters, DolphinGemma is optimised to run efficiently, even on the Google Pixel smartphones WDP uses for data collection in the field.

As WDP begins deploying the model this season, it promises to accelerate research significantly. By automatically flagging patterns and reliable sequences previously requiring immense human effort to find, it can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication.

The CHAT system and two-way interaction

While DolphinGemma focuses on understanding natural communication, a parallel project explores a different avenue: active, two-way interaction.

The CHAT (Cetacean Hearing Augmentation Telemetry) system – developed by WDP in partnership with Georgia Tech – aims to establish a simpler, shared vocabulary rather than directly translating complex dolphin language.

The concept relies on associating specific, novel synthetic whistles (created by CHAT, distinct from natural sounds) with objects the dolphins enjoy interacting with, like scarves or seaweed. Researchers demonstrate the whistle-object link, hoping the dolphins’ natural curiosity leads them to mimic the sounds to request the items.

As more natural dolphin sounds are understood through work with models like DolphinGemma, these could potentially be incorporated into the CHAT interaction framework.

Google Pixel enables ocean research

Underpinning both the analysis of natural sounds and the interactive CHAT system is crucial mobile technology. Google Pixel phones serve as the brains for processing the high-fidelity audio data in real-time, directly in the challenging ocean environment.

The CHAT system, for instance, relies on Google Pixel phones to:

  • Detect a potential mimic amidst background noise.
  • Identify the specific whistle used.
  • Alert the researcher (via underwater bone-conducting headphones) about the dolphin’s ‘request’.

This allows the researcher to respond quickly with the correct object, reinforcing the learned association. While a Pixel 6 initially handled this, the next generation CHAT system (planned for summer 2025) will utilise a Pixel 9, integrating speaker/microphone functions and running both deep learning models and template matching algorithms simultaneously for enhanced performance.
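The detect, identify, and alert steps above might be sketched as a simple template-matching loop. The feature vectors, templates, and threshold below are invented for illustration; the real CHAT system combines deep-learning models with template matching on a Pixel.

```python
# Illustrative whistle identification by nearest-template matching.
# Templates and feature vectors are invented toy values.
WHISTLE_TEMPLATES = {
    "scarf": [0.9, 0.1, 0.4],
    "seaweed": [0.2, 0.8, 0.5],
}

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify_whistle(features, threshold=0.1):
    """Return the best-matching template name, or None if nothing is close."""
    name, d = min(
        ((n, distance(features, t)) for n, t in WHISTLE_TEMPLATES.items()),
        key=lambda item: item[1],
    )
    return name if d <= threshold else None

# Detected candidate whistle -> identify -> alert the researcher.
detected = identify_whistle([0.88, 0.12, 0.42])
if detected:
    print(f"alert researcher: dolphin 'requested' {detected}")
```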

Google Pixel 9 phone that will be used for the next generation DolphinGemma CHAT system.

Using smartphones like the Pixel dramatically reduces the need for bulky, expensive custom hardware. It improves system maintainability, lowers power requirements, and shrinks the physical size. Furthermore, DolphinGemma’s predictive power integrated into CHAT could help identify mimics faster, making interactions more fluid and effective.

Recognising that breakthroughs often stem from collaboration, Google intends to release DolphinGemma as an open model later this summer. While trained on Atlantic spotted dolphins, its architecture holds promise for researchers studying other cetaceans, potentially requiring fine-tuning for different species’ vocal repertoires.

The aim is to equip researchers globally with powerful tools to analyse their own acoustic datasets, accelerating the collective effort to understand these intelligent marine mammals. We are shifting from passive listening towards actively deciphering patterns, bringing the prospect of bridging the communication gap between our species perhaps just a little closer.

See also: IEA: The opportunities and challenges of AI for global energy


Deep Cogito open LLMs use IDA to outperform same size models
https://www.artificialintelligence-news.com/news/deep-cogito-open-llms-use-ida-outperform-same-size-models/
Wed, 09 Apr 2025 08:03:15 +0000
Deep Cogito has released several open large language models (LLMs) that it claims outperform same-size competitors and represent a step towards achieving general superintelligence.

The San Francisco-based company, which states its mission is “building general superintelligence,” has launched preview versions of LLMs in 3B, 8B, 14B, 32B, and 70B parameter sizes. Deep Cogito asserts that “each model outperforms the best available open models of the same size, including counterparts from LLAMA, DeepSeek, and Qwen, across most standard benchmarks”.

Impressively, the 70B model from Deep Cogito even surpasses the performance of the recently released Llama 4 109B Mixture-of-Experts (MoE) model.   

Iterated Distillation and Amplification (IDA)

Central to this release is a novel training methodology called Iterated Distillation and Amplification (IDA). 

Deep Cogito describes IDA as “a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement”. This technique aims to overcome the inherent limitations of current LLM training paradigms, where model intelligence is often capped by the capabilities of larger “overseer” models or human curators.

The IDA process involves two key steps iterated repeatedly:

  • Amplification: Using more computation to enable the model to derive better solutions or capabilities, akin to advanced reasoning techniques.
  • Distillation: Internalising these amplified capabilities back into the model’s parameters.

Deep Cogito says this creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process, rather than being strictly bounded by overseer intelligence.
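That feedback loop can be caricatured numerically. In the sketch below, amplification buys a capability gain with extra compute and distillation internalises a fraction of it; the gain and retention figures are invented, serving only to show how repeated amplify-then-distil steps compound.

```python
# Toy numeric caricature of the IDA loop: amplify, then distil the gain back
# into the model. The amplify_gain and retention values are invented.
def ida_iteration(skill: float, amplify_gain: float = 1.0,
                  retention: float = 0.8) -> float:
    amplified = skill + amplify_gain            # amplification: spend compute
    return skill + retention * (amplified - skill)  # distillation: internalise

skill = 1.0
for step in range(5):
    skill = ida_iteration(skill)
    print(f"iteration {step + 1}: skill={skill:.2f}")
# Skill grows each round: the ceiling tracks compute spent on amplification,
# not the strength of a fixed overseer.
```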

“When we study superintelligent systems,” the research notes, referencing successes like AlphaGo, “we find two key ingredients enabled this breakthrough: Advanced Reasoning and Iterative Self-Improvement”. IDA is presented as a way to integrate both into LLM training.

Deep Cogito claims IDA is efficient, stating the new models were developed by a small team in approximately 75 days. They also highlight IDA’s potential scalability compared to methods like Reinforcement Learning from Human Feedback (RLHF) or standard distillation from larger models.

As evidence, the company points to their 70B model outperforming Llama 3.3 70B (distilled from a 405B model) and Llama 4 Scout 109B (distilled from a 2T parameter model).

Capabilities and performance of Deep Cogito models

The newly released Cogito models – based on Llama and Qwen checkpoints – are optimised for coding, function calling, and agentic use cases.

A key feature is their dual functionality: “Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models),” similar to capabilities seen in models like Claude 3.5. However, Deep Cogito notes they “have not optimised for very long reasoning chains,” citing user preference for faster answers and the efficiency of distilling shorter chains.

Extensive benchmark results are provided, comparing Cogito models against size-equivalent state-of-the-art open models in both direct (standard) and reasoning modes.

Across various benchmarks (MMLU, MMLU-Pro, ARC, GSM8K, MATH, etc.) and model sizes (3B, 8B, 14B, 32B, and 70B), the Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.

For instance, the Cogito 70B model achieves 91.73% on MMLU in standard mode (+6.40% vs Llama 3.3 70B) and 91.00% in thinking mode (+4.40% vs DeepSeek R1 Distill 70B). LiveBench scores also show improvements.

Here are benchmarks of 14B models for a medium-sized comparison:

Benchmark comparison of medium 14B size large language models from Deep Cogito compared to Alibaba Qwen and DeepSeek R1

While acknowledging benchmarks don’t fully capture real-world utility, Deep Cogito expresses confidence in practical performance.

This release is labelled a preview, with Deep Cogito stating they are “still in the early stages of this scaling curve”. They plan to release improved checkpoints for the current sizes and introduce larger MoE models (109B, 400B, 671B) “in the coming weeks / months”. All future models will also be open-source.

(Photo by Pietro Mattia)

See also: Alibaba Cloud targets global AI growth with new models and tools


DeepSeek-R1 reasoning models rival OpenAI in performance
https://www.artificialintelligence-news.com/news/deepseek-r1-reasoning-models-rival-openai-in-performance/
Mon, 20 Jan 2025 14:36:16 +0000
DeepSeek has unveiled its first-generation DeepSeek-R1 and DeepSeek-R1-Zero models that are designed to tackle complex reasoning tasks.

DeepSeek-R1-Zero is trained solely through large-scale reinforcement learning (RL) without relying on supervised fine-tuning (SFT) as a preliminary step. According to DeepSeek, this approach has led to the natural emergence of “numerous powerful and interesting reasoning behaviours,” including self-verification, reflection, and the generation of extensive chains of thought (CoT).

“Notably, [DeepSeek-R1-Zero] is the first open research to validate that reasoning capabilities of LLMs can be incentivised purely through RL, without the need for SFT,” DeepSeek researchers explained. This milestone not only underscores the model’s innovative foundations but also paves the way for RL-focused advancements in reasoning AI.

However, DeepSeek-R1-Zero’s capabilities come with certain limitations. Key challenges include “endless repetition, poor readability, and language mixing,” which could pose significant hurdles in real-world applications. To address these shortcomings, DeepSeek developed its flagship model: DeepSeek-R1.

Introducing DeepSeek-R1

DeepSeek-R1 builds upon its predecessor by incorporating cold-start data prior to RL training. This additional pre-training step enhances the model’s reasoning capabilities and resolves many of the limitations noted in DeepSeek-R1-Zero.

Notably, DeepSeek-R1 achieves performance comparable to OpenAI’s much-lauded o1 system across mathematics, coding, and general reasoning tasks, cementing its place as a leading competitor.

DeepSeek has chosen to open-source both DeepSeek-R1-Zero and DeepSeek-R1 along with six smaller distilled models. Among these, DeepSeek-R1-Distill-Qwen-32B has demonstrated exceptional results—even outperforming OpenAI’s o1-mini across multiple benchmarks.

  • MATH-500 (Pass@1): DeepSeek-R1 achieved 97.3%, eclipsing OpenAI (96.4%) and other key competitors.  
  • LiveCodeBench (Pass@1-COT): The distilled version DeepSeek-R1-Distill-Qwen-32B scored 57.2%, a standout performance among smaller models.  
  • AIME 2024 (Pass@1): DeepSeek-R1 achieved 79.8%, setting an impressive standard in mathematical problem-solving.
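Pass@1 figures like those above are commonly computed with the unbiased pass@k estimator popularised by code-generation benchmarks: from n generated samples of which c are correct, estimate the probability that at least one of k draws passes.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n samples were generated and c of them are correct."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 generations, 4 correct, pass@1 is simply the correct fraction.
print(pass_at_k(10, 4, 1))
```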

A pipeline to benefit the wider industry

DeepSeek has shared insights into its rigorous pipeline for reasoning model development, which integrates a combination of supervised fine-tuning and reinforcement learning.

According to the company, the process involves two SFT stages to establish the foundational reasoning and non-reasoning abilities, as well as two RL stages tailored for discovering advanced reasoning patterns and aligning these capabilities with human preferences.

“We believe the pipeline will benefit the industry by creating better models,” DeepSeek remarked, alluding to the potential of their methodology to inspire future advancements across the AI sector.

One standout achievement of their RL-focused approach is the ability of DeepSeek-R1-Zero to execute intricate reasoning patterns without prior human instruction—a first for the open-source AI research community.

Importance of distillation

DeepSeek researchers also highlighted the importance of distillation—the process of transferring reasoning abilities from larger models to smaller, more efficient ones, a strategy that has unlocked performance gains even for smaller configurations.

Smaller distilled iterations of DeepSeek-R1 – such as the 1.5B, 7B, and 14B versions – were able to hold their own in niche applications. These distilled models can even outperform models of comparable size trained directly with RL.
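The core idea of distillation can be sketched in a few lines: train the student to match the teacher's softened output distribution, typically by minimising a KL-divergence loss at a raised temperature. The logits below are invented, and a real setup would use a deep-learning framework with gradient descent rather than a single loss evaluation.

```python
from math import exp, log

# Minimal sketch of the distillation objective: the student should match the
# teacher's temperature-softened distribution. Logits are invented toy values.
def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    exps = [exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): the core distillation loss term (temperature factor omitted)."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([2.0, 1.0, 0.1], temperature=2.0)  # softened teacher targets
student = softmax([1.5, 1.2, 0.3], temperature=2.0)
print(f"distillation loss: {kl_divergence(teacher, student):.4f}")
```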

For researchers, these distilled models are available in configurations spanning from 1.5 billion to 70 billion parameters, supporting Qwen2.5 and Llama3 architectures. This flexibility empowers versatile usage across a wide range of tasks, from coding to natural language understanding.

DeepSeek has adopted the MIT License for its repository and weights, extending permissions for commercial use and downstream modifications. Derivative works, such as using DeepSeek-R1 to train other large language models (LLMs), are permitted. However, users of specific distilled models should ensure compliance with the licences of the original base models, such as Apache 2.0 and Llama3 licences.

(Photo by Prateek Katyal)

See also: Microsoft advances materials discovery with MatterGen


Cisco: Securing enterprises in the AI era
https://www.artificialintelligence-news.com/news/cisco-securing-enterprises-in-the-ai-era/
Wed, 15 Jan 2025 16:02:18 +0000
As AI becomes increasingly integral to business operations, new safety concerns and security threats emerge at an unprecedented pace—outstripping the capabilities of traditional cybersecurity solutions.

The stakes are high with potentially significant repercussions. According to Cisco’s 2024 AI Readiness Index, only 29% of surveyed organisations feel fully equipped to detect and prevent unauthorised tampering with AI technologies.

Continuous model validation

DJ Sampath, Head of AI Software & Platform at Cisco, said: “When we talk about model validation, it is not just a one time thing, right? You’re doing the model validation on a continuous basis.


“So as you see changes happen to the model – if you’re doing any type of finetuning, or you discover new attacks that are starting to show up that you need the models to learn from – we’re constantly learning all of that information and revalidating the model to see how these models are behaving under these new attacks that we’ve discovered.

“The other very important point is that we have a really advanced threat research team which is constantly looking at these AI attacks and understanding how these attacks can further be enhanced. In fact, we’re contributing to the work groups inside of standards organisations like MITRE, OWASP, and NIST.”

Beyond preventing harmful outputs, Cisco addresses the vulnerabilities of AI models to malicious external influences that can change their behaviour. These risks include prompt injection attacks, jailbreaking, and training data poisoning—each demanding stringent preventive measures.

Evolution brings new complexities

Frank Dickson, Group VP for Security & Trust at IDC, gave his take on the evolution of cybersecurity over time and what advancements in AI mean for the industry.

“The first macro trend was that we moved from on-premise to the cloud and that introduced this whole host of new problem statements that we had to address. And then as applications move from monolithic to microservices, we saw this whole host of new problem sets.

Headshot of Frank Dickson from IDC for an article on securing enterprises in the AI era.

“AI and the addition of LLMs… same thing, whole host of new problem sets.”

The complexities of AI security are heightened as applications become multi-model. Vulnerabilities can arise at various levels – from models to apps – implicating different stakeholders such as developers, end-users, and vendors.

“Once an application moved from on-premise to the cloud, it kind of stayed there. Yes, we developed applications across multiple clouds, but once you put an application in AWS or Azure or GCP, you didn’t jump it across those various cloud environments monthly, quarterly, weekly, right?

“Once you move from monolithic application development to microservices, you stay there. Once you put an application in Kubernetes, you don’t jump back into something else.

“As you look to secure an LLM, the important thing to note is the model changes. And when we talk about model change, it’s not like it’s a revision … this week maybe [developers are] using Anthropic, next week they may be using Gemini.

“They’re completely different and the threat vectors of each model are completely different. They all have their strengths and they all have their dramatic weaknesses.”

Unlike conventional safety measures integrated into individual models, Cisco delivers controls for a multi-model environment through its newly-announced AI Defense. The solution is self-optimising, using Cisco’s proprietary machine learning algorithms to identify evolving AI safety and security concerns—informed by threat intelligence from Cisco Talos.

Adjusting to the new normal

Jeetu Patel, Executive VP and Chief Product Officer at Cisco, shared his view that major advancements in a short period of time always seem revolutionary but quickly feel normal.

Headshot of Jeetu Patel from Cisco for an article on securing enterprises in the AI era.

“Waymo is, you know, self-driving cars from Google. You get in, and there’s no one sitting in the car, and it takes you from point A to point B. It feels mind-bendingly amazing, like we are living in the future. The second time, you kind of get used to it. The third time, you start complaining about the seats.

“Even how quickly we’ve gotten used to AI and ChatGPT over the course of the past couple years, I think what will happen is any major advancement will feel exceptionally progressive for a short period of time. Then there’s a normalisation that happens where everyone starts getting used to it.”

Patel believes that normalisation will happen with AGI as well. However, he notes that “you cannot underestimate the progress that these models are starting to make” and, ultimately, the kind of use cases they are going to unlock.

“No-one had thought that we would have a smartphone that’s gonna have more compute capacity than the mainframe computer at your fingertips and be able to do thousands of things on it at any point in time and now it’s just another way of life. My 14-year-old daughter doesn’t even think about it.

“We ought to make sure that we as companies get adjusted to that very quickly.”

See also: Sam Altman, OpenAI: ‘Lucky and humbling’ to work towards superintelligence

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Cisco: Securing enterprises in the AI era appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/cisco-securing-enterprises-in-the-ai-era/feed/ 0
Alibaba Marco-o1: Advancing LLM reasoning capabilities https://www.artificialintelligence-news.com/news/alibaba-marco-o1-advancing-llm-reasoning-capabilities/ https://www.artificialintelligence-news.com/news/alibaba-marco-o1-advancing-llm-reasoning-capabilities/#respond Thu, 28 Nov 2024 17:07:03 +0000 https://www.artificialintelligence-news.com/?p=16579 Alibaba has announced Marco-o1, a large language model (LLM) designed to tackle both conventional and open-ended problem-solving tasks. Marco-o1, from Alibaba’s MarcoPolo team, represents another step forward in the ability of AI to handle complex reasoning challenges—particularly in maths, physics, coding, and areas where clear standards may be absent. Building upon OpenAI’s reasoning advancements with […]

The post Alibaba Marco-o1: Advancing LLM reasoning capabilities appeared first on AI News.

]]>
Alibaba has announced Marco-o1, a large language model (LLM) designed to tackle both conventional and open-ended problem-solving tasks.

Marco-o1, from Alibaba’s MarcoPolo team, represents another step forward in the ability of AI to handle complex reasoning challenges—particularly in maths, physics, coding, and areas where clear standards may be absent.

Building upon OpenAI’s reasoning advancements with its o1 model, Marco-o1 distinguishes itself by incorporating several advanced techniques, including Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), and novel reflection mechanisms. These components work in concert to enhance the model’s problem-solving capabilities across various domains.

The development team has implemented a comprehensive fine-tuning strategy using multiple datasets, including a filtered version of the Open-O1 CoT Dataset, a synthetic Marco-o1 CoT Dataset, and a specialised Marco Instruction Dataset. In total, the training corpus comprises over 60,000 carefully curated samples.

The model has demonstrated particularly impressive results in multilingual applications. In testing, Marco-o1 achieved notable accuracy improvements of 6.17% on the English MGSM dataset and 5.60% on its Chinese counterpart. The model has shown particular strength in translation tasks, especially when handling colloquial expressions and cultural nuances.

One of the model’s most innovative features is its implementation of varying action granularities within the MCTS framework. This approach allows the model to explore reasoning paths at different levels of detail, from broad steps to more precise “mini-steps” of 32 or 64 tokens. The team has also introduced a reflection mechanism that prompts the model to self-evaluate and reconsider its reasoning, leading to improved accuracy in complex problem-solving scenarios.
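
The mechanics of searching over reasoning paths at different granularities can be sketched with a toy MCTS loop. Everything here is an illustrative assumption — the node structure, the UCB1 constant, and especially the stand-in reward function (the real system would score reasoning chains with the model itself, which Alibaba has not detailed):

```python
import math
import random

class Node:
    def __init__(self, path=(), parent=None):
        self.path = path        # granularities chosen so far, e.g. (64, 32)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(node, c=1.4):
    # Standard UCB1: average value plus an exploration bonus.
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def rollout(path):
    # Stand-in reward so the search has something to optimise; in this toy,
    # the finer 32-token mini-steps simply score higher than 64-token steps.
    return sum(1.0 if s == 32 else 0.3 for s in path) / max(len(path), 1)

def mcts(step_choices=(32, 64), iterations=200, depth=3, seed=0):
    random.seed(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        while node.children:                     # selection
            node = max(node.children, key=ucb1)
        if len(node.path) < depth:               # expansion
            node.children = [Node(node.path + (s,), node) for s in step_choices]
            node = random.choice(node.children)
        reward = rollout(node.path)              # simulation
        while node is not None:                  # backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).path[0]

print(mcts())  # the granularity the search concentrates on for the first step
```

With the toy reward above, the search concentrates its visits on the finer granularity; swapping in a learned reward model is precisely the open research question the team describes.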

The MCTS integration has proven particularly effective, with all MCTS-enhanced versions of the model showing significant improvements over the base Marco-o1-CoT version. The team’s experiments with different action granularities have revealed interesting patterns, though they note that determining the optimal strategy requires further research and more precise reward models.

Benchmark comparison of the latest Marco-o1 LLM with MCTS integration against previous AI models and variations.
(Credit: MarcoPolo Team, AI Business, Alibaba International Digital Commerce)

The development team has been transparent about the model’s current limitations, acknowledging that while Marco-o1 exhibits strong reasoning characteristics, it still falls short of a fully realised “o1” model. They emphasise that this release represents an ongoing commitment to improvement rather than a finished product.

Looking ahead, the Alibaba team has announced plans to incorporate reward models, including Outcome Reward Modeling (ORM) and Process Reward Modeling (PRM), to enhance the decision-making capabilities of Marco-o1. They are also exploring reinforcement learning techniques to further refine the model’s problem-solving abilities.

The Marco-o1 model and associated datasets have been made available to the research community through Alibaba’s GitHub repository, complete with comprehensive documentation and implementation guides. The release includes installation instructions and example scripts for both direct model usage and deployment via FastAPI.

(Photo by Alina Grubnyak)

See also: New AI training techniques aim to overcome current challenges


The post Alibaba Marco-o1: Advancing LLM reasoning capabilities appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/alibaba-marco-o1-advancing-llm-reasoning-capabilities/feed/ 0
Ai2 OLMo 2: Raising the bar for open language models https://www.artificialintelligence-news.com/news/ai2-olmo-2-raising-bar-open-language-models/ https://www.artificialintelligence-news.com/news/ai2-olmo-2-raising-bar-open-language-models/#respond Wed, 27 Nov 2024 18:43:42 +0000 https://www.artificialintelligence-news.com/?p=16566 Ai2 is releasing OLMo 2, a family of open-source language models that advances the democratisation of AI and narrows the gap between open and proprietary solutions. The new models, available in 7B and 13B parameter versions, are trained on up to 5 trillion tokens and demonstrate performance levels that match or exceed comparable fully open […]

The post Ai2 OLMo 2: Raising the bar for open language models appeared first on AI News.

]]>
Ai2 is releasing OLMo 2, a family of open-source language models that advances the democratisation of AI and narrows the gap between open and proprietary solutions.

The new models, available in 7B and 13B parameter versions, are trained on up to 5 trillion tokens and demonstrate performance levels that match or exceed comparable fully open models whilst remaining competitive with open-weight models such as Llama 3.1 on English academic benchmarks.

“Since the release of the first OLMo in February 2024, we’ve seen rapid growth in the open language model ecosystem, and a narrowing of the performance gap between open and proprietary models,” explained Ai2.

The development team achieved these improvements through several innovations, including enhanced training stability measures, staged training approaches, and state-of-the-art post-training methodologies derived from their Tülu 3 framework. Notable technical improvements include the switch from nonparametric layer norm to RMSNorm and the implementation of rotary positional embedding.

OLMo 2 model training breakthrough

The training process employed a sophisticated two-stage approach. The initial stage utilised the OLMo-Mix-1124 dataset of approximately 3.9 trillion tokens, sourced from DCLM, Dolma, Starcoder, and Proof Pile II. The second stage incorporated a carefully curated mixture of high-quality web data and domain-specific content through the Dolmino-Mix-1124 dataset.

Particularly noteworthy is the OLMo 2-Instruct-13B variant, which is the most capable model in the series. The model demonstrates superior performance compared to Qwen 2.5 14B instruct, Tülu 3 8B, and Llama 3.1 8B instruct models across various benchmarks.

Benchmarks comparing the OLMo 2 open large language model to other models such as Mistral, Qwen, Llama, Gemma, and more.
(Credit: Ai2)

Committing to open science

Reinforcing its commitment to open science, Ai2 has released comprehensive documentation including weights, data, code, recipes, intermediate checkpoints, and instruction-tuned models. This transparency allows for full inspection and reproduction of results by the wider AI community.

The release also introduces an evaluation framework called OLMES (Open Language Modeling Evaluation System), comprising 20 benchmarks designed to assess core capabilities such as knowledge recall, commonsense reasoning, and mathematical reasoning.
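
At its core, a suite like OLMES is a loop that scores a model callable across named task sets. The benchmark items and the dummy predictor below are invented to show the shape, not actual OLMES tasks:

```python
def evaluate(predict, benchmarks):
    """Return per-benchmark accuracy for a predict(question, choices) callable."""
    results = {}
    for name, items in benchmarks.items():
        correct = sum(predict(question, choices) == answer
                      for question, choices, answer in items)
        results[name] = correct / len(items)
    return results

# Dummy benchmark items: (question, answer choices, correct answer).
benchmarks = {
    "commonsense": [("Ice is", ["hot", "cold"], "cold")],
    "maths": [("2 + 2 =", ["3", "4"], "4"), ("3 * 3 =", ["9", "6"], "9")],
}

def always_first(question, choices):
    # Trivial baseline model: always pick the first answer choice.
    return choices[0]

print(evaluate(always_first, benchmarks))  # {'commonsense': 0.0, 'maths': 0.5}
```

A real harness adds prompt formatting, few-shot examples, and log-likelihood scoring, but the accounting — accuracy per named benchmark — is the same.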

OLMo 2 raises the bar in open-source AI development, potentially accelerating the pace of innovation in the field whilst maintaining transparency and accessibility.

(Photo by Rick Barrett)

See also: OpenAI enhances AI safety with new red teaming methods


The post Ai2 OLMo 2: Raising the bar for open language models appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/ai2-olmo-2-raising-bar-open-language-models/feed/ 0
OpenAI faces diminishing returns with latest AI model https://www.artificialintelligence-news.com/news/openai-faces-diminishing-returns-latest-ai-model/ https://www.artificialintelligence-news.com/news/openai-faces-diminishing-returns-latest-ai-model/#respond Tue, 12 Nov 2024 18:33:37 +0000 https://www.artificialintelligence-news.com/?p=16469 OpenAI is facing diminishing returns with its latest AI model while navigating the pressures of recent investments. According to The Information, OpenAI’s next AI model – codenamed Orion – is delivering smaller performance gains compared to its predecessors. In employee testing, Orion reportedly achieved the performance level of GPT-4 after completing just 20% of its […]

The post OpenAI faces diminishing returns with latest AI model appeared first on AI News.

]]>
OpenAI is facing diminishing returns with its latest AI model while navigating the pressures of recent investments.

According to The Information, OpenAI’s next AI model – codenamed Orion – is delivering smaller performance gains compared to its predecessors.

In employee testing, Orion reportedly achieved the performance level of GPT-4 after completing just 20% of its training. However, the transition from GPT-4 to the anticipated GPT-5 is said to exhibit smaller quality improvements than the leap from GPT-3 to GPT-4.

“Some researchers at the company believe Orion isn’t reliably better than its predecessor in handling certain tasks,” stated employees in the report. “Orion performs better at language tasks but may not outperform previous models at tasks such as coding, according to an OpenAI employee.”

Early stages of AI training usually yield the most significant improvements, while subsequent phases typically result in smaller performance gains. Consequently, the remaining 80% of training is unlikely to deliver advancements on par with previous generational improvements.
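
This dynamic can be made concrete with a toy power-law loss curve. The curve and its exponent are arbitrary assumptions for illustration, not measurements of any OpenAI model:

```python
def loss(t, exponent=0.5):
    """Toy reducible loss after completing a fraction t of training (0 < t <= 1)."""
    return t ** -exponent

# Loss removed by the first 20% of training vs the remaining 80%.
gain_early = loss(0.01) - loss(0.20)   # from 1% trained to 20% trained
gain_late = loss(0.20) - loss(1.00)    # from 20% trained to fully trained

print(round(gain_early, 2), round(gain_late, 2))  # 7.76 1.24
```

Under any curve of this shape, the first slice of training removes several times more loss than everything that follows — which is why Orion matching GPT-4 at 20% of training does not imply proportionate gains from the remaining 80%.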

This situation with its latest AI model emerges at a pivotal time for OpenAI, following a recent funding round that saw the company raise $6.6 billion. With this financial backing comes increased expectations from investors, as well as technical challenges that complicate traditional scaling methodologies in AI development.

If these early versions do not meet expectations, OpenAI’s future fundraising prospects may not attract the same level of interest.

The limitations highlighted in the report underline a significant challenge confronting the entire AI industry: the diminishing availability of high-quality training data and the necessity to maintain relevance in an increasingly competitive field.

According to a paper (PDF) published in June, AI firms will deplete the pool of publicly available human-generated text data between 2026 and 2032. The Information notes that developers have “largely squeezed as much out of” the data that has been used for enabling the rapid AI advancements we’ve seen in recent years.

To address these challenges, OpenAI is fundamentally rethinking its AI development strategy.

“In response to the recent challenge to training-based scaling laws posed by slowing GPT improvements, the industry appears to be shifting its effort to improving models after their initial training, potentially yielding a different type of scaling law,” explains The Information.

As OpenAI navigates these challenges, the company must balance innovation with practical application and investor expectations. However, the ongoing exodus of leading figures from the company won’t help matters.

(Photo by Jukan Tateisi)

See also: ASI Alliance launches AIRIS that ‘learns’ in Minecraft


The post OpenAI faces diminishing returns with latest AI model appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/openai-faces-diminishing-returns-latest-ai-model/feed/ 0
Anthropic unveils new Claude AI models and ‘computer control’ https://www.artificialintelligence-news.com/news/anthropic-new-claude-ai-models-and-computer-control/ https://www.artificialintelligence-news.com/news/anthropic-new-claude-ai-models-and-computer-control/#respond Tue, 22 Oct 2024 16:07:38 +0000 https://www.artificialintelligence-news.com/?p=16365 Anthropic has announced upgrades to its AI portfolio, including an enhanced Claude 3.5 Sonnet model and the introduction of Claude 3.5 Haiku, alongside a “computer control” feature in public beta. The upgraded Claude 3.5 Sonnet demonstrates substantial improvements across all metrics, with particularly notable advances in coding capabilities. The model achieved an impressive 49.0% on […]

The post Anthropic unveils new Claude AI models and ‘computer control’ appeared first on AI News.

]]>
Anthropic has announced upgrades to its AI portfolio, including an enhanced Claude 3.5 Sonnet model and the introduction of Claude 3.5 Haiku, alongside a “computer control” feature in public beta.

The upgraded Claude 3.5 Sonnet demonstrates substantial improvements across all metrics, with particularly notable advances in coding capabilities. The model achieved an impressive 49.0% on the SWE-bench Verified benchmark, surpassing all publicly available models, including OpenAI’s offerings and specialist coding systems.

In a pioneering development, Anthropic has introduced computer use functionality that enables Claude to interact with computers similarly to humans: viewing screens, controlling cursors, clicking, and typing. This capability, currently in public beta, marks Claude 3.5 Sonnet as the first frontier AI model to offer such functionality.

Several major technology firms have already begun implementing these new capabilities.

“The upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding,” reports GitLab, which noted up to 10% stronger reasoning across use cases without additional latency.

The new Claude 3.5 Haiku model, set for release later this month, matches the performance of the previous Claude 3 Opus whilst maintaining cost-effectiveness and speed. It notably achieved 40.6% on SWE-bench Verified, outperforming many competitive models including the original Claude 3.5 Sonnet and GPT-4o.

Model benchmarks comparing new Claude AI models from Anthropic.
(Credit: Anthropic)

Regarding computer control capabilities, Anthropic has taken a measured approach, acknowledging current limitations whilst highlighting potential. On the OSWorld benchmark, which evaluates computer interface navigation, Claude 3.5 Sonnet achieved 14.9% in screenshot-only tests, significantly outperforming the next-best system’s 7.8%.

The developments have undergone rigorous safety evaluations, with pre-deployment testing conducted in partnership with both the US and UK AI Safety Institutes. Anthropic maintains that the ASL-2 Standard, as detailed in their Responsible Scaling Policy, remains appropriate for these models.

(Image Credit: Anthropic)

See also: IBM unveils Granite 3.0 AI models with open-source commitment


The post Anthropic unveils new Claude AI models and ‘computer control’ appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/anthropic-new-claude-ai-models-and-computer-control/feed/ 0
IBM unveils Granite 3.0 AI models with open-source commitment https://www.artificialintelligence-news.com/news/ibm-granite-3-ai-models-open-source-commitment/ https://www.artificialintelligence-news.com/news/ibm-granite-3-ai-models-open-source-commitment/#respond Mon, 21 Oct 2024 12:16:08 +0000 https://www.artificialintelligence-news.com/?p=16341 IBM has taken the wraps off its most sophisticated family of AI models to date, dubbed Granite 3.0, at the company’s annual TechXchange event. The Granite 3.0 lineup includes a range of models designed for various applications: IBM claims that its new 8B and 2B language models can match or surpass the performance of similarly […]

The post IBM unveils Granite 3.0 AI models with open-source commitment appeared first on AI News.

]]>
IBM has taken the wraps off its most sophisticated family of AI models to date, dubbed Granite 3.0, at the company’s annual TechXchange event.

The Granite 3.0 lineup includes a range of models designed for various applications:

  • General purpose/language: 8B and 2B variants in both Instruct and Base configurations
  • Safety: Guardian models in 8B and 2B sizes, designed to implement guardrails
  • Mixture-of-Experts: A series of models optimised for different deployment scenarios

IBM claims that its new 8B and 2B language models can match or surpass the performance of similarly sized offerings from leading providers across numerous academic and industry benchmarks. These models are positioned as versatile workhorses for enterprise AI, excelling in tasks such as Retrieval Augmented Generation (RAG), classification, summarisation, and entity extraction.
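
The retrieval half of RAG can be sketched independently of any particular model. The bag-of-words retriever and two-document corpus below are illustrative stand-ins, not IBM’s pipeline:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(docs, reverse=True,
                    key=lambda d: cosine(q, Counter(d.lower().split())))
    return ranked[:k]

corpus = [
    "granite models support retrieval augmented generation for enterprises",
    "the weather today is sunny with light winds",
]
print(retrieve("retrieval augmented generation", corpus))
```

In a full RAG setup, the retrieved passages are prepended to the prompt before generation; production systems replace the bag-of-words scoring with learned embeddings, but the retrieve-then-generate flow is the same.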

A key differentiator for the Granite 3.0 family is IBM’s commitment to open-source AI. The models are released under the permissive Apache 2.0 licence, offering a unique combination of performance, flexibility, and autonomy to both enterprise clients and the broader AI community.

IBM believes that by combining a compact Granite model with proprietary enterprise data, particularly using its novel InstructLab alignment technique, businesses can achieve task-specific performance rivalling larger models at a fraction of the cost. Early proofs-of-concept suggest potential costs up to 23x lower than those of large frontier models.

According to IBM, transparency and safety remain at the forefront of its AI strategy. The company has published a technical report and responsible use guide for Granite 3.0, detailing the datasets used, data processing steps, and benchmark results. Additionally, IBM offers IP indemnity for all Granite models on its watsonx.ai platform, providing enterprises with greater confidence when integrating these models with their own data.

The Granite 3.0 8B Instruct model has shown particularly promising results, outperforming similar-sized open-source models from Meta and Mistral on standard academic benchmarks. It also leads across all measured safety dimensions on IBM’s AttaQ safety benchmark.

IBM is also introducing the Granite Guardian 3.0 models, designed to implement safety guardrails by checking user prompts and LLM responses for various risks. These models offer a comprehensive set of risk and harm detection capabilities, including unique checks for RAG-specific issues such as groundedness and context relevance.
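
The control flow of a guardrail model — screen the prompt, then screen the response — can be sketched with a naive keyword check standing in for the learned Guardian classifier (the real checks are model-based, nothing like this):

```python
RISKY_TERMS = {"bomb", "exploit"}   # naive stand-in for a learned risk classifier

def flags_risk(text):
    return any(term in text.lower() for term in RISKY_TERMS)

def guarded_generate(prompt, generate):
    """Check the user prompt, call the model, then check the LLM response."""
    if flags_risk(prompt):
        return "[blocked: unsafe prompt]"
    response = generate(prompt)
    if flags_risk(response):
        return "[blocked: unsafe response]"
    return response

def echo(prompt):
    # Trivial stand-in generator.
    return f"You said: {prompt}"

print(guarded_generate("hello there", echo))          # passes both checks
print(guarded_generate("how to build a bomb", echo))  # blocked at the prompt
```

RAG-specific checks such as groundedness fit the same wrapper: an extra classifier comparing the response against the retrieved context before it is returned.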

The entire suite of Granite 3.0 models is available for download on HuggingFace, with commercial use options on IBM’s watsonx platform. IBM has also collaborated with ecosystem partners to integrate Granite models into various offerings, providing greater choice for enterprises worldwide.

As IBM continues to advance its AI portfolio, the company says it’s focusing on developing more sophisticated AI agent technologies capable of greater autonomy and complex problem-solving. This includes plans to introduce new AI agent features in IBM watsonx Orchestrate and build agent capabilities across its portfolio in 2025.

See also: Scoring AI models: Endor Labs unveils evaluation tool


The post IBM unveils Granite 3.0 AI models with open-source commitment appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/ibm-granite-3-ai-models-open-source-commitment/feed/ 0
China Telecom trains AI model with 1 trillion parameters on domestic chips https://www.artificialintelligence-news.com/news/china-telecom-trains-ai-model-with-1-trillion-parameters-on-domestic-chips/ https://www.artificialintelligence-news.com/news/china-telecom-trains-ai-model-with-1-trillion-parameters-on-domestic-chips/#respond Thu, 10 Oct 2024 13:32:31 +0000 https://www.artificialintelligence-news.com/?p=16265 China Telecom, one of the country’s state-owned telecom giants, has created two LLMs that were trained solely on domestically-produced chips. This breakthrough represents a significant step in China’s ongoing efforts to become self-reliant in AI technology, especially in light of escalating US limitations on access to advanced semiconductors for its competitors. According to the company’s […]

The post China Telecom trains AI model with 1 trillion parameters on domestic chips appeared first on AI News.

]]>
China Telecom, one of the country’s state-owned telecom giants, has created two LLMs that were trained solely on domestically-produced chips.

This breakthrough represents a significant step in China’s ongoing efforts to become self-reliant in AI technology, especially in light of escalating US limitations on access to advanced semiconductors for its competitors.

According to the company’s Institute of AI, one of the models, TeleChat2-115B, and another, unnamed model were trained on tens of thousands of Chinese-made chips. This achievement is especially noteworthy given the tighter US export rules that have limited China’s ability to purchase high-end processors from Nvidia and other foreign companies. In a statement shared on WeChat, the AI institute claimed that this accomplishment demonstrated China’s capability to independently train LLMs and signalled a new era of innovation and self-reliance in AI technology.

The scale of these models is remarkable. China Telecom stated that the unnamed LLM has one trillion parameters. In AI terminology, parameters are the variables a model adjusts during training in order to learn. The more parameters there are, the more complex and capable the model becomes.
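
To make “parameters” concrete: for a single fully-connected layer, the count is simply weights plus biases. The layer sizes below are arbitrary examples:

```python
def dense_layer_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output unit.
    return n_in * n_out + n_out

# A toy two-layer network: 512 -> 1024 -> 512.
total = dense_layer_params(512, 1024) + dense_layer_params(1024, 512)
print(total)  # 1050112
```

A trillion-parameter model is the same bookkeeping repeated across attention and feed-forward blocks until the total reaches 10^12.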

Chinese companies are striving to keep pace with global leaders in AI based outside the country. Washington’s export restrictions on Nvidia’s latest AI chips, such as the A100 and H100, have compelled China to seek alternatives. As a result, Chinese companies have developed their own processors to reduce reliance on Western technologies. The TeleChat2-115B model, for instance, has approximately 100 billion parameters and, according to China Telecom, performs on par with mainstream platforms.

China Telecom did not specify which company supplied the domestically-designed chips used to train its models. However, as previously discussed on these pages, Huawei’s Ascend chips play a key part in the country’s AI plans.

Huawei, which has faced US penalties in recent years, is also increasing its efforts in the artificial intelligence field. The company has recently started testing its latest AI processor, the Ascend 910C, with potential clients in the domestic market. Large Chinese server companies, as well as internet giants that have previously used Nvidia chips, are apparently testing the new chip’s performance. Huawei’s Ascend processors, as one of the few viable alternatives to Nvidia hardware, are viewed as a key component of China’s strategy to lessen its reliance on foreign technology.

In addition to Huawei, China Telecom is collaborating with other domestic chipmakers such as Cambricon, a Chinese start-up specialising in AI processors. The partnerships reflect a broader tendency in China’s tech industry to build a homegrown ecosystem of AI solutions, further shielding the country from the effects of US export controls.

By developing its own AI chips and technology, China is gradually reducing its dependence on foreign-made hardware, especially Nvidia’s highly sought-after and therefore expensive GPUs. While US sanctions make it difficult for Chinese companies to obtain the latest Nvidia hardware, a black market for foreign chips has emerged. Rather than risk operating in the grey market, many Chinese companies prefer to purchase lower-powered alternatives such as previous-gen models to maintain access to Nvidia’s official support and services.

China’s achievement reflects a broader shift in its approach to AI and semiconductor technology, emphasising self-sufficiency and resilience in an increasingly competitive global economy and in the face of American protectionist trade policies.

(Photo by Mark Kuiper)

See also: Has Huawei outsmarted Apple in the AI race?


The post China Telecom trains AI model with 1 trillion parameters on domestic chips appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/china-telecom-trains-ai-model-with-1-trillion-parameters-on-domestic-chips/feed/ 0