AI Deep & Reinforcement Learning News | AI News

RAGEN: AI framework tackles LLM agent instability (24 April 2025)

Researchers have introduced RAGEN, an AI framework designed to counter LLM agent instability when handling complex situations.

Training these AI agents presents significant hurdles, particularly when decisions span multiple steps and involve unpredictable feedback from the environment. While reinforcement learning (RL) has shown promise in static tasks like solving maths problems or generating code, its application to dynamic, multi-turn agent training has been less explored.   

Addressing this gap, a collaborative team from institutions including Northwestern University, Stanford University, Microsoft, and New York University has proposed StarPO (State-Thinking-Actions-Reward Policy Optimisation).

StarPO offers a generalised approach for training agents at the trajectory level (i.e. it optimises the entire sequence of interactions, not just individual actions).

Accompanying this is RAGEN, a modular system built to implement StarPO. This enables the training and evaluation of LLM agents, particularly focusing on their reasoning capabilities under RL. RAGEN provides the necessary infrastructure for rollouts, reward assignment, and optimisation within multi-turn, stochastic (randomly determined) environments.
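
To make the trajectory-level idea concrete, here is a minimal sketch – not RAGEN’s actual code – of how a single multi-turn rollout might be collected and scored as one unit, so the optimiser sees the whole interaction rather than isolated actions. The `env` interface and `agent_act` callable are hypothetical placeholders.

```python
def collect_trajectory(env, agent_act, max_turns=8):
    """Roll out one multi-turn episode and return it as a single trajectory."""
    state = env.reset()
    trajectory = []  # list of (state, action, reward) tuples
    for _ in range(max_turns):
        action = agent_act(state)                # the LLM agent proposes an action
        state, reward, done = env.step(action)   # stochastic environment feedback
        trajectory.append((state, action, reward))
        if done:
            break
    return trajectory


def trajectory_return(trajectory, gamma=1.0):
    """Trajectory-level signal: the (discounted) sum of rewards over the episode."""
    return sum((gamma ** t) * reward for t, (_, _, reward) in enumerate(trajectory))
```

An optimiser in the StarPO mould would then update the policy against this whole-episode return rather than rewarding each action in isolation.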

Minimalist environments, maximum insight

To isolate the core learning challenges from confounding factors like extensive pre-existing knowledge or task-specific engineering, the researchers tested LLMs using RAGEN in three deliberately minimalistic, controllable symbolic gaming environments:   

  1. Bandit: A single-turn, stochastic task testing risk-sensitive symbolic reasoning. The agent chooses between options (like ‘Phoenix’ or ‘Dragon’ arms) with different, initially unknown, reward profiles.
  2. Sokoban: A multi-turn, deterministic puzzle requiring foresight and planning, as actions (pushing boxes) are irreversible.
  3. Frozen Lake: A multi-turn, stochastic grid navigation task where movement attempts can randomly fail, demanding planning under uncertainty.

These environments allow for clear analysis of how agents learn decision-making policies purely through interaction.   

Key findings: Stability, rollouts, and reasoning

The study yielded three significant findings concerning the training of self-evolving LLM agents:

The ‘Echo Trap’ and the need for stability

A recurring problem observed during multi-turn RL training was dubbed the “Echo Trap”. Agents would initially improve but then suffer performance collapse, overfitting to locally rewarded reasoning patterns. 

This was marked by collapsing reward variance, falling entropy (a measure of randomness/exploration), and sudden spikes in gradients (indicating training instability). Early signs included drops in reward standard deviation and output entropy.   

To combat this, the team developed StarPO-S, a stabilised version of the framework. StarPO-S incorporates:   

  • Variance-based trajectory filtering: Focusing training on task instances where the agent’s behaviour shows higher uncertainty (higher reward variance), discarding low-variance, less informative rollouts. This improved stability and efficiency.   
  • Critic incorporation: Using methods like PPO (Proximal Policy Optimisation), which employ a ‘critic’ to estimate value, generally showed better stability than critic-free methods like GRPO (Group Relative Policy Optimisation) in most tests.   
  • Decoupled clipping and KL removal: Techniques adapted from other research (DAPO) involving asymmetric clipping (allowing more aggressive learning from positive rewards) and removing KL divergence penalties (encouraging exploration) further boosted stability and performance.   

StarPO-S consistently delayed collapse and improved final task performance compared to vanilla StarPO.   
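
As an illustration of the variance-based trajectory filtering described above, the sketch below keeps only the task instances whose sampled rollouts show the highest reward variance; it is a simplified reconstruction under stated assumptions, not the StarPO-S implementation.

```python
import statistics

def filter_rollouts_by_variance(grouped_rollouts, keep_fraction=0.5):
    """Keep the task instances whose rollout rewards vary the most.

    grouped_rollouts maps a task/prompt id to a list of episode rewards
    (several rollouts sampled per task). Low-variance groups carry little
    learning signal and are discarded before the policy update.
    """
    variances = {
        task_id: statistics.pvariance(rewards)
        for task_id, rewards in grouped_rollouts.items()
        if len(rewards) > 1
    }
    n_keep = max(1, int(len(variances) * keep_fraction))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return {task_id: grouped_rollouts[task_id] for task_id in ranked[:n_keep]}
```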

Rollout quality is crucial

The characteristics of the ‘rollouts’ (simulated interaction trajectories used for training) significantly impact learning. Key factors identified include:   

  • Task diversity: Training on a diverse set of initial states (prompts), while generating multiple responses per prompt, aids generalisation. The sweet spot seemed to be moderate diversity, enabling contrast between different outcomes in similar scenarios.
  • Interaction granularity: Allowing multiple actions per turn (around 5-6 proved optimal) enables better planning within a fixed turn limit, without introducing the noise associated with excessively long action sequences.   
  • Rollout frequency: Using fresh, up-to-date rollouts that reflect the agent’s current policy is vital. More frequent sampling (approaching an ‘online’ setting) leads to faster convergence and better generalisation by reducing policy-data mismatch.

Maintaining freshness, alongside appropriate action budgets and task diversity, is key for stable training.   

Reasoning requires careful reward design

Simply prompting models to ‘think’ doesn’t guarantee meaningful reasoning emerges, especially in multi-turn tasks. The study found:

  • Reasoning traces helped generalisation in the simpler, single-turn Bandit task, even when symbolic cues conflicted with rewards.   
  • In multi-turn tasks like Sokoban, reasoning benefits were limited, and the length of ‘thinking’ segments consistently declined during training. Agents often regressed to direct action selection or produced “hallucinated reasoning” if rewards only tracked task success, revealing a “mismatch between thoughts and environment states.”

This suggests that standard trajectory-level rewards (often sparse and outcome-based) are insufficient. 

“Without fine-grained, reasoning-aware reward signals, agent reasoning hardly emerge[s] through multi-turn RL.”

The researchers propose that future work should explore rewards that explicitly evaluate the quality of intermediate reasoning steps, perhaps using format-based penalties or rewarding explanation quality, rather than just final outcomes.   
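
To illustrate the direction the researchers suggest, here is a hypothetical reward function that blends the usual outcome signal with simple format and reasoning-length checks; the `<think>` tags, weights, and thresholds are assumptions for this sketch, not values from the paper.

```python
def shaped_reward(task_success, response_text,
                  outcome_weight=1.0, format_bonus=0.1, min_reasoning_tokens=20):
    """Combine an outcome-based reward with crude reasoning-aware bonuses."""
    reward = outcome_weight * (1.0 if task_success else 0.0)

    # Format-based check: did the agent produce an explicit reasoning section?
    if "<think>" in response_text and "</think>" in response_text:
        reward += format_bonus
        thinking = response_text.split("<think>", 1)[1].split("</think>", 1)[0]
        # Penalise degenerate, near-empty reasoning traces.
        if len(thinking.split()) < min_reasoning_tokens:
            reward -= format_bonus
    return reward
```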

RAGEN and StarPO: A step towards self-evolving AI

The RAGEN system and StarPO framework represent a step towards training LLM agents that can reason and adapt through interaction in complex, unpredictable environments.

This research highlights the unique stability challenges posed by multi-turn RL and offers concrete strategies – like StarPO-S’s filtering and stabilisation techniques – to mitigate them. It also underscores the critical role of rollout generation strategies and the need for more sophisticated reward mechanisms to cultivate genuine reasoning, rather than superficial strategies or hallucinations.

While acknowledging limitations – including the need to test on larger models and optimise for domains without easily verifiable rewards – the work opens “a scalable and principled path for building AI systems” in areas demanding complex interaction and verifiable outcomes, such as theorem proving, software engineering, and scientific discovery.

(Image by Gerd Altmann)

See also: How does AI judge? Anthropic studies the values of Claude

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase (6 March 2025)

The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team has successfully integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.

The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distill-Qwen-32B, DeepSeek-R1-Distill-Llama-70B, o1-mini, and the original DeepSeek-R1.

Benchmark results:

  • AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1’s 79.8, but significantly ahead of OpenAI’s o1-mini at 63.6 and the distilled models.
  • LiveCodeBench: QwQ-32B scored 63.4, close behind DeepSeek-R1’s 65.9, and surpassing the distilled models and OpenAI’s o1-mini at 53.8.
  • LiveBench: QwQ-32B achieved 73.1, with DeepSeek-R1 scoring 71.6, outperforming the distilled models and OpenAI’s o1-mini at 57.5.
  • IFEval: QwQ-32B scored 83.9, very close to DeepSeek-R1’s 83.3, leading the distilled models and OpenAI’s o1-mini at 59.1.
  • BFCL: QwQ-32B achieved 66.4, with DeepSeek-R1 scoring 62.8, demonstrating a lead over the distilled models and OpenAI’s o1-mini at 49.3.

The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.

“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.

QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.
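
For readers who want to experiment with the open weights, the snippet below shows a typical Hugging Face Transformers loading pattern for the released checkpoint; the generation settings are illustrative assumptions rather than Qwen’s recommended configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"  # open-weight checkpoint released under Apache 2.0
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```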

“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.

See also: Deepgram Nova-3 Medical: AI speech model cuts healthcare transcription errors

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

DeepSeek-R1 reasoning models rival OpenAI in performance (20 January 2025)

DeepSeek has unveiled its first-generation DeepSeek-R1 and DeepSeek-R1-Zero models that are designed to tackle complex reasoning tasks.

DeepSeek-R1-Zero is trained solely through large-scale reinforcement learning (RL) without relying on supervised fine-tuning (SFT) as a preliminary step. According to DeepSeek, this approach has led to the natural emergence of “numerous powerful and interesting reasoning behaviours,” including self-verification, reflection, and the generation of extensive chains of thought (CoT).

“Notably, [DeepSeek-R1-Zero] is the first open research to validate that reasoning capabilities of LLMs can be incentivised purely through RL, without the need for SFT,” DeepSeek researchers explained. This milestone not only underscores the model’s innovative foundations but also paves the way for RL-focused advancements in reasoning AI.

However, DeepSeek-R1-Zero’s capabilities come with certain limitations. Key challenges include “endless repetition, poor readability, and language mixing,” which could pose significant hurdles in real-world applications. To address these shortcomings, DeepSeek developed its flagship model: DeepSeek-R1.

Introducing DeepSeek-R1

DeepSeek-R1 builds upon its predecessor by incorporating cold-start data prior to RL training. This preliminary fine-tuning step enhances the model’s reasoning capabilities and resolves many of the limitations noted in DeepSeek-R1-Zero.

Notably, DeepSeek-R1 achieves performance comparable to OpenAI’s much-lauded o1 system across mathematics, coding, and general reasoning tasks, cementing its place as a leading competitor.

DeepSeek has chosen to open-source both DeepSeek-R1-Zero and DeepSeek-R1 along with six smaller distilled models. Among these, DeepSeek-R1-Distill-Qwen-32B has demonstrated exceptional results—even outperforming OpenAI’s o1-mini across multiple benchmarks.

  • MATH-500 (Pass@1): DeepSeek-R1 achieved 97.3%, eclipsing OpenAI (96.4%) and other key competitors.  
  • LiveCodeBench (Pass@1-COT): The distilled version DeepSeek-R1-Distill-Qwen-32B scored 57.2%, a standout performance among smaller models.  
  • AIME 2024 (Pass@1): DeepSeek-R1 achieved 79.8%, setting an impressive standard in mathematical problem-solving.

A pipeline to benefit the wider industry

DeepSeek has shared insights into its rigorous pipeline for reasoning model development, which integrates a combination of supervised fine-tuning and reinforcement learning.

According to the company, the process involves two SFT stages to establish the foundational reasoning and non-reasoning abilities, as well as two RL stages tailored for discovering advanced reasoning patterns and aligning these capabilities with human preferences.

“We believe the pipeline will benefit the industry by creating better models,” DeepSeek remarked, alluding to the potential of their methodology to inspire future advancements across the AI sector.

One standout achievement of their RL-focused approach is the ability of DeepSeek-R1-Zero to execute intricate reasoning patterns without prior human instruction—a first for the open-source AI research community.

Importance of distillation

DeepSeek researchers also highlighted the importance of distillation—the process of transferring reasoning abilities from larger models to smaller, more efficient ones, a strategy that has unlocked performance gains even for smaller configurations.

Smaller distilled iterations of DeepSeek-R1 – such as the 1.5B, 7B, and 14B versions – were able to hold their own in niche applications. The distilled models can outperform results achieved via RL training on models of comparable sizes.
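
DeepSeek’s distillation works by fine-tuning smaller models on reasoning data generated by DeepSeek-R1. As a more generic illustration of the underlying idea of transferring a teacher’s behaviour to a student, the sketch below shows the classic soft-target (logit-matching) formulation of knowledge distillation; it is not DeepSeek’s recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Classic soft-target distillation: blend a KL term against the teacher's
    softened output distribution with ordinary cross-entropy on the labels."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, log_target=True,
                       reduction="batchmean") * (temperature ** 2)
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```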

For researchers, these distilled models are available in configurations spanning from 1.5 billion to 70 billion parameters, supporting Qwen2.5 and Llama3 architectures. This flexibility empowers versatile usage across a wide range of tasks, from coding to natural language understanding.

DeepSeek has adopted the MIT License for its repository and weights, extending permissions for commercial use and downstream modifications. Derivative works, such as using DeepSeek-R1 to train other large language models (LLMs), are permitted. However, users of specific distilled models should ensure compliance with the licences of the original base models, such as Apache 2.0 and Llama3 licences.

(Photo by Prateek Katyal)

See also: Microsoft advances materials discovery with MatterGen

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

New AI training techniques aim to overcome current challenges (28 November 2024)

OpenAI and other leading AI companies are developing new training techniques to overcome limitations of current methods. Addressing unexpected delays and complications in the development of larger, more powerful language models, these fresh techniques focus on human-like behaviour to teach algorithms to ‘think’.

According to reports citing a dozen AI researchers, scientists, and investors, the new training techniques, which underpin OpenAI’s recent ‘o1’ model (formerly Q* and Strawberry), have the potential to transform the landscape of AI development. The reported advances may influence the types and quantities of resources AI companies will need on an ongoing basis, including specialised hardware and the energy required to develop AI models.

The o1 model is designed to approach problems in a way that mimics human reasoning and thinking, breaking down numerous tasks into steps. The model also utilises specialised data and feedback provided by experts in the AI industry to enhance its performance.

Since ChatGPT was unveiled by OpenAI in 2022, there has been a surge in AI innovation, and many technology companies claim existing AI models require expansion, be it through greater quantities of data or improved computing resources. Only then can AI models consistently improve.

Now, AI experts have reported limitations in scaling up AI models. The 2010s were a revolutionary period for scaling, but Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, says that the training of AI models, particularly in understanding language structures and patterns, has levelled off.

“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Scaling the right thing matters more now,” he said.

In recent times, AI lab researchers have experienced delays in and challenges to developing and releasing large language models (LLM) that are more powerful than OpenAI’s GPT-4 model.

First, there is the cost of training large models, often running into tens of millions of dollars. And, because of complications that arise, such as hardware failures caused by system complexity, a final analysis of how these models run can take months.

In addition to these challenges, training runs require substantial amounts of energy, often resulting in power shortages that can disrupt processes and impact the wider electricity grid. Another issue is the colossal amount of data large language models use, so much so that AI models have reportedly used up all accessible data worldwide.

Researchers are exploring a technique known as ‘test-time compute’ to improve current AI models during training or at inference time. The method can involve generating multiple answers in real time and selecting the best of them, allowing the model to allocate greater processing resources to difficult tasks that require human-like decision-making and reasoning. The aim – to make the model more accurate and capable.
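
A simple way to picture test-time compute is best-of-N sampling: draw several candidate answers for a hard prompt and keep the one a scorer prefers. The sketch below assumes hypothetical `generate` and `score` callables (the scorer could be a reward model, a verifier, or a majority vote).

```python
def best_of_n(prompt, generate, score, n=8):
    """Spend extra inference-time compute by sampling n candidates
    and returning the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```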

Noam Brown, a researcher at OpenAI who helped develop the o1 model, shared an example of how a new approach can achieve surprising results. At the TED AI conference in San Francisco last month, Brown explained that “having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer.”

Rather than simply increasing the model size and training time, this can change how AI models process information and lead to more powerful, efficient systems.

It is reported that other AI labs have been developing versions of the o1 technique. These include xAI, Google DeepMind, and Anthropic. Competition in the AI world is nothing new, but we could see a significant impact on the AI hardware market as a result of new techniques. Companies like Nvidia, which currently dominates the supply of AI chips due to the high demand for their products, may be particularly affected by updated AI training techniques.

Nvidia became the world’s most valuable company in October, and its rise in fortunes can be largely attributed to the widespread use of its chips in AI systems. New techniques may impact Nvidia’s market position, forcing the company to adapt its products to meet the evolving AI hardware demand. Potentially, this could open more avenues for new competitors in the inference market.

A new age of AI development may be on the horizon, driven by evolving hardware demands and more efficient training methods such as those deployed in the o1 model. The future of both AI models and the companies behind them could be reshaped, unlocking unprecedented possibilities and greater competition.

See also: Anthropic urges AI regulation to avoid catastrophes

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

How cold hard data science harnesses AI with Wolfram Research (30 September 2024)

It’s sometimes difficult to distinguish the reality of technology from the hype and marketing messages that bombard our inboxes daily. In just the last five years, we’ve probably heard too much about the metaverse, blockchain and virtual reality, for example. At present, we’re in the midst of a furore about the much-abused term ‘AI’, and time will tell whether this particular storm will be seen as a teacup resident.

Artificial Intelligence News spoke exclusively to Jon McLoone, the Director of Technical Communication and Strategy at one of the most mature organisations in the computational intelligence and scientific innovation space, Wolfram Research, to help us put our present concepts of AI and their practical uses into a deeper context.

Jon has worked at Wolfram Research for 32 years in various roles, currently leading the European Technical Services team. A mathematician by training and a skilled practitioner in many aspects of data analysis, we began our interview by having him describe Wolfram’s work in an elevator pitch format.

Jon McLoone

“Our value proposition is that we know computation and Wolfram technology. We tailor our technology to the problem that an organisation has. That’s across a broad range of things. So, we don’t have a typical customer. What they have in common is they’re doing something innovative.”

“We’re doing problem-solving, the type of things that use computation and data science. We’re building out a unified platform for computation, and when we talk about computation, we mean the kinds of technical computing, like engineering calculations, data science and machine learning. It’s things like social network analysis, biosciences, actuarial science, and financial computations. Abstractly, these are all fundamentally mathematical things.”

“Our world is all those structured areas where we’ve spent 30 years building out different ontologies. We have a symbolic representation of the maths, but also things like graphs and networks, documents, videos, images, audio, time series, entities in the real world, like cities, rivers, and mountains. My team is doing the fun stuff of actually making it do something useful!”

“AI we just see as another kind of computation. There were different algorithms that have been developed over years, some of them hundreds of years ago, some of them only tens of years ago. Gen AI just adds to this list.”

Claims made about AI in 2024 can sometimes be overoptimistic, so we need to be realistic about its capabilities and consider what it excels at and where it falls short.

“There’s still human intelligence, which still remains as the strategic element. You’re not going to say, in the next five years AI will run my company and make decisions. Generative AI is very fluent but is unreliable. Its job is to be plausible, not to be correct. And particularly when you get into the kinds of things Wolfram does, it’s terrible because it will tell you the kinds of things that your mathematical answer would look like.” (Artificial Intelligence News‘ italics.)

The work of Wolfram Research in this context focuses on what Jon terms ‘symbolic AI’. To differentiate generative and symbolic AI, he gave us the analogy of modelling the trajectory of a thrown ball. A generative AI would learn how the ball travels by examining many thousands of such throws and then be able to produce a description of the trajectory. “That description would be plausible. That kind of model is data-rich, understanding poor.”

A symbolic representation of the thrown ball, on the other hand, would involve differential equations for projectile motion and representations of elements: mass, viscosity of the atmosphere, friction, and many other factors. “It could then be asked, ‘What happens if I throw the ball on Mars?’ It’ll say something accurate. It’s not going to fail.”
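
The contrast can be made concrete with a few lines of physics: the same closed-form projectile equation answers the ‘on Mars’ question simply by changing the gravity parameter, which a model trained only on Earth throws cannot do reliably. This is a textbook, drag-free sketch, not Wolfram code.

```python
import math

def projectile_range(speed, angle_deg, gravity):
    """Ideal (drag-free) range from the closed form R = v^2 * sin(2*theta) / g."""
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / gravity

v, theta = 20.0, 45.0                      # launch speed (m/s) and angle (degrees)
print(projectile_range(v, theta, 9.81))    # Earth: roughly 40.8 m
print(projectile_range(v, theta, 3.71))    # Mars:  roughly 107.8 m
```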

The ideal way to solve business (or scientific, medical, or engineering) problems is a combination of human intelligence, symbolic reasoning, as epitomised in Wolfram Language, and what we now term AI acting as the glue between them. AI is a great technology for interpreting meaning and acting as an interface between the component parts.

“Some of the interesting crossovers are where we take natural language and turn that into some structured information that you can then compute with. Human language is very messy and ambiguous, and generative AI is very good at mapping that to some structure. Once you’re in a structured world of something that is syntactically formal, then you can do things on it.”

A recent example of combining ‘traditional’ AI with the work of Wolfram involved medical records:

“We did a project recently taking medical reports, which were handwritten, typed and digital. But they contain words, and trying to do statistics on those isn’t possible. And so, you’ve got to use the generative AI part for mapping all of these words to things like classes: was this an avoidable death? Yes. No. That’s a nice, structured key value pair. And then once we’ve got that information in structured form (for example a piece of JSON or XML, or whatever your chosen structure), we can then do classical statistics to start saying, ‘Is there a trend? Can we project? Was there an impact from COVID on hospital harms?’ Clear-cut questions that you can approach symbolically with things like means and medians and models.”
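
The workflow described in the quote can be sketched as a two-stage pipeline: an LLM (represented here by a hypothetical `llm_extract` callable returning JSON) maps each free-text report to a structured record, and classical statistics then run over the structured fields. This mirrors the described pattern; it is not Wolfram’s code.

```python
import json
from collections import Counter

def structure_reports(reports, llm_extract):
    """Stage 1: map messy free-text reports to structured key-value records.
    llm_extract is assumed to return JSON such as
    {"avoidable_death": "yes", "year": 2021, "harm_category": "medication"}."""
    return [json.loads(llm_extract(text)) for text in reports]

def summarise(records):
    """Stage 2: classical statistics over the structured fields."""
    avoidable = Counter(r["avoidable_death"] for r in records)
    by_year = Counter(r["year"] for r in records)
    return {"avoidable_death_counts": dict(avoidable),
            "reports_per_year": dict(by_year)}
```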

During our interview, Jon also gave a précis of a presentation, which took as its example of his organisation’s work, an imaginary peanut butter cup manufacturing plant. What might be the effects of changing out a particular ingredient or altering some detail of the recipe and the effects of that change on the product’s shelf life?

“LLMs (large language models) will say, ‘Oh, they’ll probably last a few weeks because peanut butter cups usually sit on the shelf a few weeks. But going to a computational model that can plug into the ingredients, and compute, and you’ll know this thing should last for eight weeks before it goes off. Or what that change might do to the manufacturing process? A computational model can connect to the digital twin of your manufacturing plant and learn, ‘That will slow things down by 3%, so your productivity will fall by 20% because it creates a bottleneck here.’ LLMs are great at connecting you and your question to the model, maths, data science or the database. And that’s really an interesting three-way meeting of minds.”

You can catch Wolfram Research at the upcoming TechEx event in Amsterdam, October 1-2, at stand 166 of the AI & Big Data strand. We can’t guarantee any peanut butter-related discussion at the event, but to discover how powerful modelling and generative AI can be harnessed to solve your specific problems and quandaries, contact the company via its website.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Enhancing healthcare documentation with IDP (26 September 2024)

Healthcare documentation is an integral part of the sector that ensures the delivery of high-quality care and maintains the continuity of patient information. However, as healthcare providers have to deal with excessive amounts of data, managing it can feel overwhelming. With the advent of intelligent document processing technology, a new solution can now be implemented. This article explores how such technology works, its role in healthcare documentation, and its benefits, limitations, and implications for the future.

Intelligent document processing and its importance

Intelligent document processing (IDP) is a more advanced type of automation that uses AI technology, machine learning, natural language processing, and optical character recognition to collect, process, and organise data from multiple forms of paperwork. Unlike traditional document systems, IDP can handle the unstructured and semi-structured data found in the many kinds of documents healthcare organisations manage. Because it is built on advanced algorithms and artificial intelligence tools, IDP can enhance the work of healthcare providers and assist them in the care delivery process.

IDP’s role in healthcare documentation

Healthcare providers deal daily with many kinds of documents: health, employment, and insurance records, reports, notes, forms, and social documents. IDP can reduce the need for inefficient data management processes through the following (a toy extraction sketch follows the list):

  • Automating data extraction by capturing the essential information from documents, reducing manual effort and improving performance,
  • Improving data accuracy with AI algorithms, ensuring that captured data is accurate and consistent, which is crucial for patient safety and care quality,
  • Organising data in a searchable format to allow better data access,
  • Ensuring compliance with regulations such as HIPAA by securely managing sensitive patient data and providing audit trails.
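
As a toy illustration of the extraction step referenced above, the snippet below pulls a few key fields out of raw (for example OCR’d) document text and emits a structured, searchable record. The field names and patterns are invented for this sketch and do not correspond to any particular IDP product.

```python
import json
import re

FIELD_PATTERNS = {
    "patient_id": r"Patient ID:\s*(\w+)",
    "date_of_visit": r"Date of Visit:\s*([\d/-]+)",
    "diagnosis": r"Diagnosis:\s*(.+)",
}

def extract_record(document_text):
    """Capture key fields from raw document text into a structured record."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, document_text)
        record[field] = match.group(1).strip() if match else None
    return json.dumps(record)  # searchable, machine-readable output
```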

Benefits of IDP in healthcare

The implementation of IDP in healthcare comes with several benefits:

  • Increased efficiency: By automating routine tasks, healthcare providers can focus more on patient care rather than paperwork,
  • Cost reduction: IDP reduces the need for manual data entry and paper-based processes, leading to significant cost savings,
  • Better patient experience: Quick access to patient history and records leads to more informed decision-making and personalised care,
  • Scalability: As healthcare facilities grow, IDP systems can easily scale to manage increased data volumes without compromising performance.

Challenges in implementing IDP

While IDP offers many advantages, there are challenges to its adoption:

  • Integration with existing systems: Integrating IDP with current healthcare IT ecosystems can be complex and requires careful planning,
  • Data privacy concerns: Protecting patient data is paramount, and IDP must adhere to stringent security standards,
  • Change management: Staff may resist shifting from manual to automated processes, necessitating adequate training and change management strategies.

Future of IDP in healthcare

In the future, IDP is likely to increase its impact in the healthcare field. Given the rise of AI and machine learning, the corresponding systems will become increasingly sophisticated, likely providing predictive analytics and decision support services. This could help improve diagnostic precision and create a more personalised patient treatment plan, eventually leading to better outcomes. In addition, IDP may facilitate data exchange between different healthcare systems.

Conclusion

Intelligent document processing is a practical solution that is bound to become increasingly impactful in healthcare, helping healthcare professionals deal more effectively with the contemporary challenges of patient data. Although challenges exist, the potential benefits of improved patient care, reduced expenses, and more accurate data make IDP an invaluable asset. Intelligent document processing should therefore be considered one of the healthcare industry’s key solutions in its move toward digitalisation.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Primate Labs launches Geekbench AI benchmarking tool (16 August 2024)

Primate Labs has officially launched Geekbench AI, a benchmarking tool designed specifically for machine learning and AI-centric workloads.

The release of Geekbench AI 1.0 marks the culmination of years of development and collaboration with customers, partners, and the AI engineering community. The benchmark, previously known as Geekbench ML during its preview phase, has been rebranded to align with industry terminology and ensure clarity about its purpose.

Geekbench AI is now available for Windows, macOS, and Linux through the Primate Labs website, as well as on the Google Play Store and Apple App Store for mobile devices.

Primate Labs’ latest benchmarking tool aims to provide a standardised method for measuring and comparing AI capabilities across different platforms and architectures. The benchmark offers a unique approach by providing three overall scores, reflecting the complexity and heterogeneity of AI workloads.

“Measuring performance is, put simply, really hard,” explained Primate Labs. “That’s not because it’s hard to run an arbitrary test, but because it’s hard to determine which tests are the most important for the performance you want to measure – especially across different platforms, and particularly when everyone is doing things in subtly different ways.”

The three-score system accounts for the varied precision levels and hardware optimisations found in modern AI implementations. This multi-dimensional approach allows developers, hardware vendors, and enthusiasts to gain deeper insights into a device’s AI performance across different scenarios.

A notable addition to Geekbench AI is the inclusion of accuracy measurements for each test. This feature acknowledges that AI performance isn’t solely about speed but also about the quality of results. By combining speed and accuracy metrics, Geekbench AI provides a more holistic view of AI capabilities, helping users understand the trade-offs between performance and precision.
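
Conceptually, a benchmark that reports both speed and accuracy might fold them together along these lines; this is a hypothetical illustration of the trade-off, not Geekbench AI’s actual scoring formula.

```python
def workload_score(inferences_per_second, accuracy, reference_ips, reference_accuracy):
    """Normalise throughput against a reference device and weight it by how
    closely output quality matches the reference accuracy."""
    speed_ratio = inferences_per_second / reference_ips
    accuracy_ratio = min(accuracy / reference_accuracy, 1.0)
    return 100 * speed_ratio * accuracy_ratio  # 100 = parity with the reference

# Example: 20% faster than the reference, with a small accuracy drop.
print(workload_score(600, 0.97, 500, 0.99))   # about 117.6
```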

Geekbench AI 1.0 introduces support for a wide range of AI frameworks, including OpenVINO on Linux and Windows, and vendor-specific TensorFlow Lite delegates like Samsung ENN, ArmNN, and Qualcomm QNN on Android. This broad framework support ensures that the benchmark reflects the latest tools and methodologies used by AI developers.

The benchmark also utilises more extensive and diverse datasets, which not only enhance the accuracy evaluations but also better represent real-world AI use cases. All workloads in Geekbench AI 1.0 run for a minimum of one second, allowing devices to reach their maximum performance levels during testing while still reflecting the bursty nature of real-world applications.

Primate Labs has published detailed technical descriptions of the workloads and models used in Geekbench AI 1.0, emphasising their commitment to transparency and industry-standard testing methodologies. The benchmark is integrated with the Geekbench Browser, facilitating easy cross-platform comparisons and result sharing.

The company anticipates regular updates to Geekbench AI to keep pace with market changes and emerging AI features. However, Primate Labs believes that Geekbench AI has already reached a level of reliability that makes it suitable for integration into professional workflows, with major tech companies like Samsung and Nvidia already utilising the benchmark.

(Image Credit: Primate Labs)

See also: xAI unveils Grok-2 to challenge the AI hierarchy

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Qwen2-Math: A new era for AI maths whizzes (9 August 2024)

Alibaba Cloud’s Qwen team has unveiled Qwen2-Math, a series of large language models specifically designed to tackle complex mathematical problems.

These new models – built upon the existing Qwen2 foundation – demonstrate remarkable proficiency in solving arithmetic and mathematical challenges, and outperform former industry leaders.

The Qwen team crafted Qwen2-Math using a vast and diverse Mathematics-specific Corpus. This corpus comprises a rich tapestry of high-quality resources, including web texts, books, code, exam questions, and synthetic data generated by Qwen2 itself.

Rigorous evaluation on both English and Chinese mathematical benchmarks – including GSM8K, MATH, MMLU-STEM, CMATH, and GaoKao Math – revealed the exceptional capabilities of Qwen2-Math. Notably, the flagship model, Qwen2-Math-72B-Instruct, surpassed the performance of proprietary models such as GPT-4o and Claude 3.5 in various mathematical tasks.

“Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly in the 1.5B and 7B models,” the Qwen team noted.

This superior performance is attributed to the effective implementation of a math-specific reward model during the development process.
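
The RM@8 and Maj@8 figures refer to two ways of aggregating eight sampled answers: reranking with a reward model versus simple majority voting. A minimal sketch of both strategies follows, with a hypothetical `reward_model` callable standing in for the math-specific reward model.

```python
from collections import Counter

def maj_at_n(answers):
    """Maj@N: return the most frequent final answer among N samples."""
    return Counter(answers).most_common(1)[0][0]

def rm_at_n(answers, reward_model):
    """RM@N: return the sample the reward model scores highest."""
    return max(answers, key=reward_model)

samples = ["42", "42", "41", "42", "40", "42", "41", "42"]
print(maj_at_n(samples))  # "42"
```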

Further showcasing its prowess, Qwen2-Math demonstrated impressive results in challenging mathematical competitions like the American Invitational Mathematics Examination (AIME) 2024 and the American Mathematics Contest (AMC) 2023.

To ensure the model’s integrity and prevent contamination, the Qwen team implemented robust decontamination methods during both the pre-training and post-training phases. This rigorous approach involved removing duplicate samples and identifying overlaps with test sets to maintain the model’s accuracy and reliability.

Looking ahead, the Qwen team plans to expand Qwen2-Math’s capabilities beyond English, with bilingual and multilingual models in the pipeline.  This commitment to inclusivity aims to make advanced mathematical problem-solving accessible to a global audience.

“We will continue to enhance our models’ ability to solve complex and challenging mathematical problems,” affirmed the Qwen team.

You can find the Qwen2 models on Hugging Face here.

See also: Paige and Microsoft unveil next-gen AI models for cancer diagnosis

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Baidu deploys its ERNIE Bot generative AI to the public (31 August 2023)

Chinese tech giant Baidu has announced that its generative AI product ERNIE Bot is now open to the public through various app stores and its website.

ERNIE Bot can generate text, images, and videos based on natural language inputs. It is powered by ERNIE (Enhanced Representation through Knowledge Integration), a powerful deep learning model.

The first version of ERNIE was introduced and open-sourced in 2019 by researchers at Tsinghua University to demonstrate the natural language understanding capabilities of a model that combines both text and knowledge graph data.

Later that year, Baidu released ERNIE 2.0, which became the first model to score higher than 90 on the GLUE benchmark for evaluating natural language understanding systems.

In 2021, Baidu’s researchers posted a paper on ERNIE 3.0 in which they claim the model exceeds human performance on the SuperGLUE natural language benchmark. ERNIE 3.0 set a new top score on SuperGLUE and displaced efforts from Google and Microsoft.

According to Baidu’s CEO Robin Li, opening up ERNIE Bot to the public will enable the company to obtain more human feedback and improve the user experience. He said that ERNIE Bot is a showcase of the four core abilities of generative AI: understanding, generation, reasoning, and memory. He also said that ERNIE Bot can help users with various tasks such as writing, learning, entertainment, and work.

Baidu first unveiled ERNIE Bot in March this year, demonstrating its capabilities in different domains such as literature, art, and science. For example, ERNIE Bot can summarise a sci-fi novel and offer suggestions on how to continue the story in an expanded universe. It can also generate images and videos based on text inputs, such as creating a portrait of a fictional character or a scene from a movie.

Earlier this month, Baidu revealed that ERNIE Bot’s training throughput had increased three-fold since March and that it had achieved new milestones in data analysis and visualisation. ERNIE Bot can now generate results more quickly and handle image inputs as well. For instance, ERNIE Bot can analyse an image of a pie chart and generate a summary of the data in natural language.

Baidu is one of the first Chinese companies to obtain approval from authorities to release generative AI experiences to the public, according to Bloomberg. The report suggests that officials see AI as a “business and political imperative” for China and want to ensure that the technology is used in a responsible and ethical manner.

Beijing is keen on putting guardrails in place to prevent the spread of harmful or illegal content while still enabling Chinese companies to compete with overseas rivals in the field of AI.

Beijing’s AI guardrails

The “guardrails” include the rules published by the Chinese authorities in July 2023 that govern generative AI in China.

China’s rules go substantially beyond current regulations in other parts of the world and aim to ensure that generative AI is used in a responsible and ethical manner. The rules cover various aspects of generative AI, such as content, data, technology, fairness, and licensing.

One notable requirement is that operators of generative AI must ensure that their services adhere to the core values of socialism, while also avoiding content that incites subversion of state power, secession, terrorism, or any actions undermining national unity and social stability.

Generative AI services within China are also prohibited from promoting content that provokes ethnic hatred and discrimination, violence, obscenity, or false and harmful information.

Furthermore, the regulations reveal China’s interest in developing digital public goods for generative AI. The document emphasises the promotion of public training data resource platforms and the collaborative sharing of model-making hardware to enhance utilisation rates. The authorities also aim to encourage the orderly opening of public data classification and the expansion of high-quality public training data resources.

In terms of technology development, the rules stipulate that AI should be developed using secure and proven tools, including chips, software, tools, computing power, and data resources.

Intellectual property rights – an often contentious issue – must be respected when using data for model development, and the consent of individuals must be obtained before incorporating personal information. There is also a focus on improving the quality, authenticity, accuracy, objectivity, and diversity of training data.

To ensure fairness and non-discrimination, developers are required to create algorithms that do not discriminate based on factors such as ethnicity, belief, country, region, gender, age, occupation, or health. Moreover, operators of generative AI must obtain licenses for their services under most circumstances, adding a layer of regulatory oversight.

China’s rules not only have implications for domestic AI operators but also serve as a benchmark for international discussions on AI governance and ethical practices.

(Image Credit: Alpha Photo under CC BY-NC 2.0 license)

See also: OpenAI launches ChatGPT Enterprise to accelerate business operations

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

IBM Research unveils breakthrough analog AI chip for efficient deep learning (11 August 2023)

IBM Research has unveiled a groundbreaking analog AI chip that demonstrates remarkable efficiency and accuracy in performing complex computations for deep neural networks (DNNs).

This breakthrough, published in a recent paper in Nature Electronics, signifies a significant stride towards achieving high-performance AI computing while substantially conserving energy.

The traditional approach of executing deep neural networks on conventional digital computing architectures poses limitations in terms of performance and energy efficiency. These digital systems entail constant data transfer between memory and processing units, which slows down computations and wastes energy.

To tackle these challenges, IBM Research has harnessed the principles of analog AI, which emulates the way neural networks function in biological brains. This approach involves storing synaptic weights using nanoscale resistive memory devices, specifically Phase-change memory (PCM).

PCM devices change their conductance in response to electrical pulses, enabling a continuum of values for synaptic weights. Because computations are executed directly in memory, this analog approach avoids much of the costly data movement between memory and processor, improving efficiency.
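The principle can be sketched in a few lines of code. The following is a minimal NumPy simulation of analog in-memory computing, not a model of IBM’s actual chip: weights are encoded as device conductances, inputs as voltages, and the matrix-vector product emerges from currents summing along the crossbar columns, with a small Gaussian term standing in for PCM programming noise. The layer sizes and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def program_conductances(weights, noise_std=0.02):
    """Map trained weights onto PCM conductances, with write noise.

    Conductances are non-negative, so each signed weight is represented
    as the difference between a 'positive' and a 'negative' device.
    """
    g_pos = np.clip(weights, 0, None)
    g_neg = np.clip(-weights, 0, None)
    noise = rng.normal(0.0, noise_std, size=(2,) + weights.shape)
    return g_pos + noise[0], g_neg + noise[1]

def analog_matvec(g_pos, g_neg, voltages):
    """In-memory multiply: currents (conductance x voltage) sum along each column."""
    i_pos = g_pos.T @ voltages
    i_neg = g_neg.T @ voltages
    return i_pos - i_neg  # differential read-out recovers the signed result

# Illustrative layer: 256 inputs -> 128 outputs
weights = rng.normal(0.0, 0.1, size=(256, 128))
x = rng.normal(size=256)

g_pos, g_neg = program_conductances(weights)
analog_out = analog_matvec(g_pos, g_neg, x)
digital_out = weights.T @ x

print("max deviation vs exact digital result:", np.abs(analog_out - digital_out).max())
```

The multiply-accumulate happens where the weights are stored, so no weight data has to cross a memory bus, which is where the efficiency gain comes from.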

The newly introduced chip is a cutting-edge analog AI solution composed of 64 analog in-memory compute cores.

Each core integrates a crossbar array of synaptic unit cells alongside compact analog-to-digital converters, seamlessly transitioning between analog and digital domains. Furthermore, digital processing units within each core manage nonlinear neuronal activation functions and scaling operations. The chip also boasts a global digital processing unit and digital communication pathways for interconnectivity.
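To make that division of labour concrete, here is a continuation of the toy simulation above: a larger layer is split across several in-memory cores, each core’s analog partial result passes through a crude analog-to-digital converter model, and the partial sums are combined digitally. The tile size, core count, and ADC resolution are assumptions for illustration only, not the real chip’s parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def adc_quantise(values, bits=8):
    """Toy uniform ADC: map the analog range onto 2**bits discrete levels."""
    lo, hi = values.min(), values.max()
    step = (hi - lo) / (2**bits - 1) or 1.0
    return np.round((values - lo) / step) * step + lo

def tiled_matvec(weights, x, tile_rows=256):
    """Split the input dimension across cores; aggregate partial sums digitally."""
    partials = []
    for start in range(0, weights.shape[0], tile_rows):
        w_tile = weights[start:start + tile_rows]      # one crossbar's worth of weights
        x_tile = x[start:start + tile_rows]
        analog_partial = w_tile.T @ x_tile             # in-memory multiply inside one core
        partials.append(adc_quantise(analog_partial))  # per-core ADC read-out
    # The global digital unit would also apply activation functions and scaling here.
    return np.sum(partials, axis=0)

weights = rng.normal(0.0, 0.05, size=(1024, 512))      # 1,024 inputs split across 4 tiles
x = rng.normal(size=1024)

approx = tiled_matvec(weights, x)
exact = weights.T @ x
print("relative error from ADC quantisation:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```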

The research team demonstrated the chip’s capabilities by achieving 92.81 percent accuracy on the CIFAR-10 image dataset, an unprecedented level of accuracy for an analog AI chip.

Its areal throughput, measured in giga-operations per second (GOPS) per unit area, underscored its superior compute efficiency compared with previous in-memory computing chips. The chip’s energy-efficient design, coupled with its performance, makes it a milestone in AI hardware.

The analog AI chip’s unique architecture and impressive capabilities lay the foundation for a future where energy-efficient AI computation is accessible across a diverse range of applications.

IBM Research’s breakthrough marks a pivotal moment that will help to catalyse advancements in AI-powered technologies for years to come.

(Image Credit: IBM Research)

See also: Azure and NVIDIA deliver next-gen GPU acceleration for AI

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post IBM Research unveils breakthrough analog AI chip for efficient deep learning appeared first on AI News.

Azure and NVIDIA deliver next-gen GPU acceleration for AI
https://www.artificialintelligence-news.com/news/azure-nvidia-deliver-next-gen-gpu-acceleration-ai/
Wed, 09 Aug 2023 15:47:51 +0000

Microsoft Azure users are now able to harness the latest advancements in NVIDIA’s accelerated computing technology, revolutionising the training and deployment of their generative AI applications.

The integration of Azure ND H100 v5 virtual machines (VMs) with NVIDIA H100 Tensor Core GPUs and Quantum-2 InfiniBand networking promises seamless scaling of generative AI and high-performance computing applications, all at the click of a button.

This cutting-edge collaboration comes at a pivotal moment when developers and researchers are actively exploring the potential of large language models (LLMs) and accelerated computing to unlock novel consumer and business use cases.

NVIDIA’s H100 GPU achieves supercomputing-class performance through an array of architectural innovations. These include fourth-generation Tensor Cores, a new Transformer Engine for enhanced LLM acceleration, and NVLink technology that propels inter-GPU communication to unprecedented speeds of 900GB/sec.

The integration of NVIDIA Quantum-2 CX7 InfiniBand – boasting 3,200 Gbps of cross-node bandwidth – ensures strong performance across GPUs even at massive scale, putting the platform on par with the world’s most advanced supercomputers.
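To give a sense of how developers typically drive this kind of multi-GPU, multi-node hardware from code, below is a minimal PyTorch DistributedDataParallel training loop. It is a generic sketch rather than anything Azure- or NVIDIA-specific: the NCCL backend it initialises is what normally carries gradient traffic over NVLink within a node and InfiniBand between nodes, and the model, batch size, and launch command are placeholder assumptions.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL routes collectives over NVLink intra-node and InfiniBand inter-node.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for a real LLM.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 4096),
    ).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(32, 4096, device=local_rank)
        loss = model(x).pow(2).mean()  # dummy objective for illustration
        opt.zero_grad()
        loss.backward()                # gradients are all-reduced across every GPU here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=8 --nnodes=2 train.py`, each process drives one GPU and the all-reduce traffic flows over the interconnects described above.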

The newly introduced ND H100 v5 VMs hold immense potential for training and inferring increasingly intricate LLMs and computer vision models. These neural networks power the most complex and compute-intensive generative AI applications, spanning from question answering and code generation to audio, video, image synthesis, and speech recognition.

A standout feature of the ND H100 v5 VMs is their ability to achieve up to a 2x speedup in LLM inference, demonstrated with the BLOOM 175B model compared with previous-generation instances. This performance boost underscores their capacity to further optimise AI applications, fuelling innovation across industries.

The synergy between NVIDIA H100 Tensor Core GPUs and Microsoft Azure empowers enterprises with unparalleled AI training and inference capabilities. This partnership also streamlines the development and deployment of production AI, bolstered by the integration of the NVIDIA AI Enterprise software suite and Azure Machine Learning for MLOps.

The combined efforts have led to groundbreaking AI performance, as validated by industry-standard MLPerf benchmarks.

The integration of the NVIDIA Omniverse platform with Azure extends the reach of this collaboration further, providing users with everything they need for industrial digitalisation and AI supercomputing.

(Image Credit: Uwe Hoh from Pixabay)

See also: Gcore partners with UbiOps and Graphcore to empower AI teams

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The event is co-located with Digital Transformation Week.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Azure and NVIDIA deliver next-gen GPU acceleration for AI appeared first on AI News.

Damian Bogunowicz, Neural Magic: On revolutionising deep learning with CPUs
https://www.artificialintelligence-news.com/news/damian-bogunowicz-neural-magic-revolutionising-deep-learning-cpus/
Mon, 24 Jul 2023 11:27:02 +0000

AI News spoke with Damian Bogunowicz, a machine learning engineer at Neural Magic, to shed light on the company’s innovative approach to deep learning model optimisation and inference on CPUs.

One of the key challenges in developing and deploying deep learning models lies in their size and computational requirements. However, Neural Magic tackles this issue head-on through a concept called compound sparsity.

Compound sparsity combines techniques such as unstructured pruning, quantisation, and distillation to significantly reduce the size of neural networks while maintaining their accuracy. 
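As a rough, generic illustration of those ingredients (not Neural Magic’s own tooling), the sketch below applies them to a toy PyTorch model: unstructured magnitude pruning to 90 percent sparsity, a knowledge-distillation loss that trains the small sparse ‘student’ to mimic a larger ‘teacher’, and post-training dynamic quantisation to INT8 for CPU inference. The architecture, sparsity level, and training schedule are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import prune

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# 1) Unstructured pruning: mask the 90% smallest-magnitude weights in each Linear layer.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)

# 2) Distillation: train the masked student to match the teacher's soft predictions.
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(64, 128)
    with torch.no_grad():
        teacher_logits = teacher(x)
    loss = F.kl_div(F.log_softmax(student(x) / 2.0, dim=-1),
                    F.softmax(teacher_logits / 2.0, dim=-1),
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

# Bake the pruning masks into the weight tensors once training is done.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

# 3) Quantisation: convert the Linear layers to INT8 for CPU inference.
quantised = torch.ao.quantization.quantize_dynamic(student, {nn.Linear}, dtype=torch.qint8)

print(f"layer-0 sparsity: {(student[0].weight == 0).float().mean().item():.0%}")
print(quantised(torch.randn(1, 128)).shape)
```

This is only the recipe in miniature; it does not reproduce the sparsity-aware CPU kernels that actually turn those zeros into speed.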

“We have developed our own sparsity-aware runtime that leverages CPU architecture to accelerate sparse models. This approach challenges the notion that GPUs are necessary for efficient deep learning,” explains Bogunowicz.

Bogunowicz emphasised the benefits of the approach, highlighting that more compact models lead to faster deployments and can run on ubiquitous CPU-based machines. The ability to optimise and run sparsified networks efficiently without relying on specialised hardware is a game-changer for machine learning practitioners, empowering them to overcome the limitations and costs associated with GPU usage.

When asked about the suitability of sparse neural networks for enterprises, Bogunowicz explained that the vast majority of companies can benefit from using sparse models.

By removing up to 90 percent of parameters without impacting accuracy, enterprises can achieve more efficient deployments. While extremely critical domains like autonomous driving or autonomous aeroplanes may require maximum accuracy and minimal sparsity, the advantages of sparse models outweigh the limitations for the majority of businesses.
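A back-of-the-envelope sketch shows why removing 90 percent of parameters pays off on a CPU: storing the surviving weights in compressed sparse row (CSR) format shrinks the memory footprint, and a matrix-vector product only touches the non-zero entries. The layer size below is an arbitrary assumption, and real sparsity-aware runtimes use far more sophisticated, cache-aware kernels than SciPy’s general-purpose ones.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# A toy fully connected layer: 4096 x 4096 weights with 90% pruned to zero.
dense_w = rng.normal(size=(4096, 4096)).astype(np.float32)
dense_w[rng.random(dense_w.shape) < 0.9] = 0.0

csr_w = sparse.csr_matrix(dense_w)
x = rng.normal(size=4096).astype(np.float32)

dense_mb = dense_w.nbytes / 1e6
sparse_mb = (csr_w.data.nbytes + csr_w.indices.nbytes + csr_w.indptr.nbytes) / 1e6
print(f"dense storage:  {dense_mb:.1f} MB")
print(f"sparse storage: {sparse_mb:.1f} MB")  # ~10% of the values, plus index overhead

# Same activations either way; the sparse product skips the pruned weights entirely.
y_dense = dense_w @ x
y_sparse = csr_w @ x
print("max difference:", np.abs(y_dense - y_sparse).max())
```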

Looking ahead, Bogunowicz expressed his excitement about the future of large language models (LLMs) and their applications.

“I’m particularly excited about the future of large language models (LLMs). Mark Zuckerberg discussed enabling AI agents, acting as personal assistants or salespeople, on platforms like WhatsApp,” says Bogunowicz.

One example that caught his attention was a chatbot used by Khan Academy—an AI tutor that guides students to solve problems by providing hints rather than revealing solutions outright. This application demonstrates the value that LLMs can bring to the education sector, facilitating the learning process while empowering students to develop problem-solving skills.

“Our research has shown that you can optimise LLMs efficiently for CPU deployment. We have published a research paper on SparseGPT that demonstrates the removal of around 100 billion parameters using one-shot pruning without compromising model quality,” explains Bogunowicz.

“This means there may not be a need for GPU clusters in the future of AI inference. Our goal is to soon provide open-source LLMs to the community and empower enterprises to have control over their products and models, rather than relying on big tech companies.”

As for Neural Magic’s future, Bogunowicz revealed two exciting developments they will be sharing at the upcoming AI & Big Data Expo Europe.

Firstly, they will showcase their support for running AI models on edge devices, specifically x86 and ARM architectures. This expands the possibilities for AI applications in various industries.

Secondly, they will unveil their model optimisation platform, Sparsify, which enables the seamless application of state-of-the-art pruning, quantisation, and distillation algorithms through a user-friendly web app and simple API calls. Sparsify aims to accelerate inference without sacrificing accuracy, providing enterprises with an elegant and intuitive solution.

Neural Magic’s commitment to democratising machine learning infrastructure by leveraging CPUs is impressive. Their focus on compound sparsity and their upcoming advancements in edge computing demonstrate their dedication to empowering businesses and researchers alike.

As we eagerly await the developments presented at AI & Big Data Expo Europe, it’s clear that Neural Magic is poised to make a significant impact in the field of deep learning.

You can watch our full interview with Bogunowicz below:

(Photo by Google DeepMind on Unsplash)

Neural Magic is a key sponsor of this year’s AI & Big Data Expo Europe, which is being held in Amsterdam between 26-27 September 2023.

Swing by Neural Magic’s booth at stand #178 to learn more about how the company enables organisations to use compute-heavy models in a cost-efficient and scalable way.

The post Damian Bogunowicz, Neural Magic: On revolutionising deep learning with CPUs appeared first on AI News.
