development Archives - AI News https://www.artificialintelligence-news.com/news/tag/development/ Artificial Intelligence News Wed, 30 Apr 2025 13:35:24 +0000 en-GB hourly 1 https://wordpress.org/?v=6.8.1 https://www.artificialintelligence-news.com/wp-content/uploads/2020/09/cropped-ai-icon-32x32.png development Archives - AI News https://www.artificialintelligence-news.com/news/tag/development/ 32 32 Meta beefs up AI security with new Llama tools  https://www.artificialintelligence-news.com/news/meta-beefs-up-ai-security-new-llama-tools/ https://www.artificialintelligence-news.com/news/meta-beefs-up-ai-security-new-llama-tools/#respond Wed, 30 Apr 2025 13:35:22 +0000 https://www.artificialintelligence-news.com/?p=106233 If you’re building with AI, or trying to defend against the less savoury side of the technology, Meta just dropped new Llama security tools. The improved security tools for the Llama AI models arrive alongside fresh resources from Meta designed to help cybersecurity teams harness AI for defence. It’s all part of their push to […]

The post Meta beefs up AI security with new Llama tools  appeared first on AI News.

]]>
If you’re building with AI, or trying to defend against the less savoury side of the technology, Meta just dropped new Llama security tools.

The improved security tools for the Llama AI models arrive alongside fresh resources from Meta designed to help cybersecurity teams harness AI for defence. It’s all part of their push to make developing and using AI a bit safer for everyone involved.

Developers working with the Llama family of models now have some upgraded kit to play with. You can grab these latest Llama Protection tools directly from Meta’s own Llama Protections page, or find them where many developers live: Hugging Face and GitHub.

First up is Llama Guard 4. Think of it as an evolution of Meta’s customisable safety filter for AI. The big news here is that it’s now multimodal so it can understand and apply safety rules not just to text, but to images as well. That’s crucial as AI applications get more visual. This new version is also being baked into Meta’s brand-new Llama API, which is currently in a limited preview.

Then there’s LlamaFirewall. This is a new piece of the puzzle from Meta, designed to act like a security control centre for AI systems. It helps manage different safety models working together and hooks into Meta’s other protection tools. Its job? To spot and block the kind of risks that keep AI developers up at night – things like clever ‘prompt injection’ attacks designed to trick the AI, potentially dodgy code generation, or risky behaviour from AI plug-ins.

Meta has also given its Llama Prompt Guard a tune-up. The main Prompt Guard 2 (86M) model is now better at sniffing out those pesky jailbreak attempts and prompt injections. More interestingly, perhaps, is the introduction of Prompt Guard 2 22M.

Prompt Guard 2 22M is a much smaller, nippier version. Meta reckons it can slash latency and compute costs by up to 75% compared to the bigger model, without sacrificing too much detection power. For anyone needing faster responses or working on tighter budgets, that’s a welcome addition.

But Meta isn’t just focusing on the AI builders; they’re also looking at the cyber defenders on the front lines of digital security. They’ve heard the calls for better AI-powered tools to help in the fight against cyberattacks, and they’re sharing some updates aimed at just that.

The CyberSec Eval 4 benchmark suite has been updated. This open-source toolkit helps organisations figure out how good AI systems actually are at security tasks. This latest version includes two new tools:

  • CyberSOC Eval: Built with the help of cybersecurity experts CrowdStrike, this framework specifically measures how well AI performs in a real Security Operation Centre (SOC) environment. It’s designed to give a clearer picture of AI’s effectiveness in threat detection and response. The benchmark itself is coming soon.
  • AutoPatchBench: This benchmark tests how good Llama and other AIs are at automatically finding and fixing security holes in code before the bad guys can exploit them.

To help get these kinds of tools into the hands of those who need them, Meta is kicking off the Llama Defenders Program. This seems to be about giving partner companies and developers special access to a mix of AI solutions – some open-source, some early-access, some perhaps proprietary – all geared towards different security challenges.

As part of this, Meta is sharing an AI security tool they use internally: the Automated Sensitive Doc Classification Tool. It automatically slaps security labels on documents inside an organisation. Why? To stop sensitive info from walking out the door, or to prevent it from being accidentally fed into an AI system (like in RAG setups) where it could be leaked.

They’re also tackling the problem of fake audio generated by AI, which is increasingly used in scams. The Llama Generated Audio Detector and Llama Audio Watermark Detector are being shared with partners to help them spot AI-generated voices in potential phishing calls or fraud attempts. Companies like ZenDesk, Bell Canada, and AT&T are already lined up to integrate these.

Finally, Meta gave a sneak peek at something potentially huge for user privacy: Private Processing. This is new tech they’re working on for WhatsApp. The idea is to let AI do helpful things like summarise your unread messages or help you draft replies, but without Meta or WhatsApp being able to read the content of those messages.

Meta is being quite open about the security side, even publishing their threat model and inviting security researchers to poke holes in the architecture before it ever goes live. It’s a sign they know they need to get the privacy aspect right.

Overall, it’s a broad set of AI security announcements from Meta. They’re clearly trying to put serious muscle behind securing the AI they build, while also giving the wider tech community better tools to build safely and defend effectively.

See also: Alarming rise in AI-powered scams: Microsoft reveals $4B in thwarted fraud

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Meta beefs up AI security with new Llama tools  appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/meta-beefs-up-ai-security-new-llama-tools/feed/ 0
Meta will train AI models using EU user data https://www.artificialintelligence-news.com/news/meta-will-train-ai-models-using-eu-user-data/ https://www.artificialintelligence-news.com/news/meta-will-train-ai-models-using-eu-user-data/#respond Tue, 15 Apr 2025 16:32:02 +0000 https://www.artificialintelligence-news.com/?p=105325 Meta has confirmed plans to utilise content shared by its adult users in the EU (European Union) to train its AI models. The announcement follows the recent launch of Meta AI features in Europe and aims to enhance the capabilities and cultural relevance of its AI systems for the region’s diverse population.    In a statement, […]

The post Meta will train AI models using EU user data appeared first on AI News.

]]>
Meta has confirmed plans to utilise content shared by its adult users in the EU (European Union) to train its AI models.

The announcement follows the recent launch of Meta AI features in Europe and aims to enhance the capabilities and cultural relevance of its AI systems for the region’s diverse population.   

In a statement, Meta wrote: “Today, we’re announcing our plans to train AI at Meta using public content – like public posts and comments – shared by adults on our products in the EU.

“People’s interactions with Meta AI – like questions and queries – will also be used to train and improve our models.”

Starting this week, users of Meta’s platforms (including Facebook, Instagram, WhatsApp, and Messenger) within the EU will receive notifications explaining the data usage. These notifications, delivered both in-app and via email, will detail the types of public data involved and link to an objection form.

“We have made this objection form easy to find, read, and use, and we’ll honor all objection forms we have already received, as well as newly submitted ones,” Meta explained.

Meta explicitly clarified that certain data types remain off-limits for AI training purposes.

The company says it will not “use people’s private messages with friends and family” to train its generative AI models. Furthermore, public data associated with accounts belonging to users under the age of 18 in the EU will not be included in the training datasets.

Meta wants to build AI tools designed for EU users

Meta positions this initiative as a necessary step towards creating AI tools designed for EU users. Meta launched its AI chatbot functionality across its messaging apps in Europe last month, framing this data usage as the next phase in improving the service.

“We believe we have a responsibility to build AI that’s not just available to Europeans, but is actually built for them,” the company explained. 

“That means everything from dialects and colloquialisms, to hyper-local knowledge and the distinct ways different countries use humor and sarcasm on our products.”

This becomes increasingly pertinent as AI models evolve with multi-modal capabilities spanning text, voice, video, and imagery.   

Meta also situated its actions in the EU within the broader industry landscape, pointing out that training AI on user data is common practice.

“It’s important to note that the kind of AI training we’re doing is not unique to Meta, nor will it be unique to Europe,” the statement reads. 

“We’re following the example set by others including Google and OpenAI, both of which have already used data from European users to train their AI models.”

Meta further claimed its approach surpasses others in openness, stating, “We’re proud that our approach is more transparent than many of our industry counterparts.”   

Regarding regulatory compliance, Meta referenced prior engagement with regulators, including a delay initiated last year while awaiting clarification on legal requirements. The company also cited a favourable opinion from the European Data Protection Board (EDPB) in December 2024.

“We welcome the opinion provided by the EDPB in December, which affirmed that our original approach met our legal obligations,” wrote Meta.

Broader concerns over AI training data

While Meta presents its approach in the EU as transparent and compliant, the practice of using vast swathes of public user data from social media platforms to train large language models (LLMs) and generative AI continues to raise significant concerns among privacy advocates.

Firstly, the definition of “public” data can be contentious. Content shared publicly on platforms like Facebook or Instagram may not have been posted with the expectation that it would become raw material for training commercial AI systems capable of generating entirely new content or insights. Users might share personal anecdotes, opinions, or creative works publicly within their perceived community, without envisaging its large-scale, automated analysis and repurposing by the platform owner.

Secondly, the effectiveness and fairness of an “opt-out” system versus an “opt-in” system remain debatable. Placing the onus on users to actively object, often after receiving notifications buried amongst countless others, raises questions about informed consent. Many users may not see, understand, or act upon the notification, potentially leading to their data being used by default rather than explicit permission.

Thirdly, the issue of inherent bias looms large. Social media platforms reflect and sometimes amplify societal biases, including racism, sexism, and misinformation. AI models trained on this data risk learning, replicating, and even scaling these biases. While companies employ filtering and fine-tuning techniques, eradicating bias absorbed from billions of data points is an immense challenge. An AI trained on European public data needs careful curation to avoid perpetuating stereotypes or harmful generalisations about the very cultures it aims to understand.   

Furthermore, questions surrounding copyright and intellectual property persist. Public posts often contain original text, images, and videos created by users. Using this content to train commercial AI models, which may then generate competing content or derive value from it, enters murky legal territory regarding ownership and fair compensation—issues currently being contested in courts worldwide involving various AI developers.

Finally, while Meta highlights its transparency relative to competitors, the actual mechanisms of data selection, filtering, and its specific impact on model behaviour often remain opaque. Truly meaningful transparency would involve deeper insights into how specific data influences AI outputs and the safeguards in place to prevent misuse or unintended consequences.

The approach taken by Meta in the EU underscores the immense value technology giants place on user-generated content as fuel for the burgeoning AI economy. As these practices become more widespread, the debate surrounding data privacy, informed consent, algorithmic bias, and the ethical responsibilities of AI developers will undoubtedly intensify across Europe and beyond.

(Photo by Julio Lopez)

See also: Apple AI stresses privacy with synthetic and anonymised data

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Meta will train AI models using EU user data appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/meta-will-train-ai-models-using-eu-user-data/feed/ 0
Alibaba Cloud targets global AI growth with new models and tools https://www.artificialintelligence-news.com/news/alibaba-cloud-global-ai-growth-new-models-and-tools/ https://www.artificialintelligence-news.com/news/alibaba-cloud-global-ai-growth-new-models-and-tools/#respond Tue, 08 Apr 2025 17:56:13 +0000 https://www.artificialintelligence-news.com/?p=105235 Alibaba Cloud has expanded its AI portfolio for global customers with a raft of new models, platform enhancements, and Software-as-a-Service (SaaS) tools. The announcements, made during its Spring Launch 2025 online event, underscore the drive by Alibaba to accelerate AI innovation and adoption on a global scale. The digital technology and intelligence arm of Alibaba […]

The post Alibaba Cloud targets global AI growth with new models and tools appeared first on AI News.

]]>
Alibaba Cloud has expanded its AI portfolio for global customers with a raft of new models, platform enhancements, and Software-as-a-Service (SaaS) tools.

The announcements, made during its Spring Launch 2025 online event, underscore the drive by Alibaba to accelerate AI innovation and adoption on a global scale.

The digital technology and intelligence arm of Alibaba is focusing on meeting increasing demand for AI-driven digital transformation worldwide.

Selina Yuan, President of International Business at Alibaba Cloud Intelligence, said: “We are launching a series of Platform-as-a-Service(PaaS) and AI capability updates to meet the growing demand for digital transformation from across the globe.

“These upgrades allow us to deliver even more secure and high-performance services that empower businesses to scale and innovate in an AI-driven world.”

Alibaba expands access to foundational AI models

Central to the announcement is the broadened availability of Alibaba Cloud’s proprietary Qwen large language model (LLM) series for international clients, initially accessible via its Singapore availability zones.

This includes several specialised models:

  • Qwen-Max: A large-scale Mixture of Experts (MoE) model.
  • QwQ-Plus: An advanced reasoning model designed for complex analytical tasks, sophisticated question answering, and expert-level mathematical problem-solving.
  • QVQ-Max: A visual reasoning model capable of handling complex multimodal problems, supporting visual input and chain-of-thought output for enhanced accuracy.
  • Qwen2.5-Omni-7b: An end-to-end multimodal model.

These additions provide international businesses with more powerful and diverse tools for developing sophisticated AI applications.

Platform enhancements power AI scale

To support these advanced models, Alibaba Cloud’s Platform for AI (PAI) received significant upgrades aimed at delivering scalable, cost-effective, and user-friendly generative AI solutions.

Key enhancements include the introduction of distributed inference capabilities within the PAI-Elastic Algorithm Service (EAS). Utilising a multi-node architecture, this addresses the computational demands of super-large models – particularly those employing MoE structures or requiring ultra-long-text processing – to overcome limitations inherent in traditional single-node setups.

Furthermore, PAI-EAS now features a prefill-decode disaggregation function designed to boost performance and reduce operational costs.

Alibaba Cloud reported impressive results when deploying this with the Qwen2.5-72B model, achieving a 92% increase in concurrency and a 91% boost in tokens per second (TPS).

The PAI-Model Gallery has also been refreshed, now offering nearly 300 open-source models—including the complete range of Alibaba Cloud’s own open-source Qwen and Wan series. These are accessible via a no-code deployment and management interface.

Additional new PAI-Model Gallery features – like model evaluation and model distillation (transferring knowledge from large to smaller, more cost-effective models) – further enhance its utility.

Alibaba integrates AI into data management

Alibaba Cloud’s flagship cloud-native relational database, PolarDB, now incorporates native AI inference powered by Qwen.

PolarDB’s in-database machine learning capability eliminates the need to move data for inference workflows, which significantly cuts processing latency while improving efficiency and data security.

The feature is optimised for text-centric tasks such as developing conversational Retrieval-Augmented Generation (RAG) agents, generating text embeddings, and performing semantic similarity searches.

Additionally, the company’s data warehouse, AnalyticDB, is now integrated into Alibaba Cloud’s generative AI development platform Model Studio.

This integration serves as the recommended vector database for RAG solutions. This allows organisations to connect their proprietary knowledge bases directly with AI models on the platform to streamline the creation of context-aware applications.

New SaaS tools for industry transformation

Beyond infrastructure and platform layers, Alibaba Cloud introduced two new SaaS AI tools:

  • AI Doc: An intelligent document processing tool using LLMs to parse diverse documents (reports, forms, manuals) efficiently. It extracts specific information and can generate tailored reports, such as ESG reports when integrated with Alibaba Cloud’s Energy Expert sustainability solution.
  • Smart Studio: An AI-powered content creation platform supporting text-to-image, image-to-image, and text-to-video generation. It aims to enhance marketing and creative outputs in sectors like e-commerce, gaming, and entertainment, enabling features like virtual try-ons or generating visuals from text descriptions.

All these developments follow Alibaba’s announcement in February of a $53 billion investment over the next three years dedicated to advancing its cloud computing and AI infrastructure.

This colossal investment, noted as exceeding the company’s total AI and cloud expenditure over the previous decade, highlights a deep commitment to AI-driven growth and solidifies its position as a major global cloud provider.

“As cloud and AI become essential for global growth, we are committed to enhancing our core product offerings to address our customers’ evolving needs,” concludes Yuan.

See also: Amazon Nova Act: A step towards smarter, web-native AI agents

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Alibaba Cloud targets global AI growth with new models and tools appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/alibaba-cloud-global-ai-growth-new-models-and-tools/feed/ 0
Study claims OpenAI trains AI models on copyrighted data https://www.artificialintelligence-news.com/news/study-claims-openai-trains-ai-models-copyrighted-data/ https://www.artificialintelligence-news.com/news/study-claims-openai-trains-ai-models-copyrighted-data/#respond Wed, 02 Apr 2025 09:04:28 +0000 https://www.artificialintelligence-news.com/?p=105119 A new study from the AI Disclosures Project has raised questions about the data OpenAI uses to train its large language models (LLMs). The research indicates the GPT-4o model from OpenAI demonstrates a “strong recognition” of paywalled and copyrighted data from O’Reilly Media books. The AI Disclosures Project, led by technologist Tim O’Reilly and economist […]

The post Study claims OpenAI trains AI models on copyrighted data appeared first on AI News.

]]>
A new study from the AI Disclosures Project has raised questions about the data OpenAI uses to train its large language models (LLMs). The research indicates the GPT-4o model from OpenAI demonstrates a “strong recognition” of paywalled and copyrighted data from O’Reilly Media books.

The AI Disclosures Project, led by technologist Tim O’Reilly and economist Ilan Strauss, aims to address the potentially harmful societal impacts of AI’s commercialisation by advocating for improved corporate and technological transparency. The project’s working paper highlights the lack of disclosure in AI, drawing parallels with financial disclosure standards and their role in fostering robust securities markets.

The study used a legally-obtained dataset of 34 copyrighted O’Reilly Media books to investigate whether LLMs from OpenAI were trained on copyrighted data without consent. The researchers applied the DE-COP membership inference attack method to determine if the models could differentiate between human-authored O’Reilly texts and paraphrased LLM versions.

Key findings from the report include:

  • GPT-4o shows “strong recognition” of paywalled O’Reilly book content, with an AUROC score of 82%. In contrast, OpenAI’s earlier model, GPT-3.5 Turbo, does not show the same level of recognition (AUROC score just above 50%)
  • GPT-4o exhibits stronger recognition of non-public O’Reilly book content compared to publicly accessible samples (82% vs 64% AUROC scores respectively)
  • GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples than non-public ones (64% vs 54% AUROC scores)
  • GPT-4o Mini, a smaller model, showed no knowledge of public or non-public O’Reilly Media content when tested (AUROC approximately 50%)

The researchers suggest that access violations may have occurred via the LibGen database, as all of the O’Reilly books tested were found there. They also acknowledge that newer LLMs have an improved ability to distinguish between human-authored and machine-generated language, which does not reduce the method’s ability to classify data.

The study highlights the potential for “temporal bias” in the results, due to language changes over time. To account for this, the researchers tested two models (GPT-4o and GPT-4o Mini) trained on data from the same period.

The report notes that while the evidence is specific to OpenAI and O’Reilly Media books, it likely reflects a systemic issue around the use of copyrighted data. It argues that uncompensated training data usage could lead to a decline in the internet’s content quality and diversity, as revenue streams for professional content creation diminish.

The AI Disclosures Project emphasises the need for stronger accountability in AI companies’ model pre-training processes. They suggest that liability provisions that incentivise improved corporate transparency in disclosing data provenance may be an important step towards facilitating commercial markets for training data licensing and remuneration.

The EU AI Act’s disclosure requirements could help trigger a positive disclosure-standards cycle if properly specified and enforced. Ensuring that IP holders know when their work has been used in model training is seen as a crucial step towards establishing AI markets for content creator data.

Despite evidence that AI companies may be obtaining data illegally for model training, a market is emerging in which AI model developers pay for content through licensing deals. Companies like Defined.ai facilitate the purchasing of training data, obtaining consent from data providers and stripping out personally identifiable information.

The report concludes by stating that using 34 proprietary O’Reilly Media books, the study provides empirical evidence that OpenAI likely trained GPT-4o on non-public, copyrighted data.

(Image by Sergei Tokmakov)

See also: Anthropic provides insights into the ‘AI biology’ of Claude

AI & Big Data Expo banner, a show where attendees will hear more about issues such as OpenAI allegedly using copyrighted data to train its new models.

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Study claims OpenAI trains AI models on copyrighted data appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/study-claims-openai-trains-ai-models-copyrighted-data/feed/ 0
Amazon Nova Act: A step towards smarter, web-native AI agents https://www.artificialintelligence-news.com/news/amazon-nova-act-step-towards-smarter-web-native-ai-agents/ https://www.artificialintelligence-news.com/news/amazon-nova-act-step-towards-smarter-web-native-ai-agents/#respond Tue, 01 Apr 2025 16:57:43 +0000 https://www.artificialintelligence-news.com/?p=105105 Amazon has introduced Nova Act, an advanced AI model engineered for smarter agents that can execute tasks within web browsers. While large language models popularised the concept of “agents” as tools that answer queries or retrieve information via methods such as Retrieval-Augmented Generation (RAG), Amazon envisions something more robust. The company defines agents not just […]

The post Amazon Nova Act: A step towards smarter, web-native AI agents appeared first on AI News.

]]>
Amazon has introduced Nova Act, an advanced AI model engineered for smarter agents that can execute tasks within web browsers.

While large language models popularised the concept of “agents” as tools that answer queries or retrieve information via methods such as Retrieval-Augmented Generation (RAG), Amazon envisions something more robust. The company defines agents not just as responders but as entities capable of performing tangible, multi-step tasks in diverse digital and physical environments.

“Our dream is for agents to perform wide-ranging, complex, multi-step tasks like organising a wedding or handling complex IT tasks to increase business productivity,” said Amazon.

Current market offerings often fall short, with many agents requiring continuous human supervision and their functionality dependent on comprehensive API integration—something not feasible for all tasks. Nova Act is Amazon’s answer to these limitations.

Alongside the model, Amazon is releasing a research preview of the Amazon Nova Act SDK. Using the SDK, developers can create agents capable of automating web tasks like submitting out-of-office notifications, scheduling calendar holds, or enabling automatic email replies.

The SDK aims to break down complex workflows into dependable “atomic commands” such as searching, checking out, or interacting with specific interface elements like dropdowns or popups. Detailed instructions can be added to refine these commands, allowing developers to, for instance, instruct an agent to bypass an insurance upsell during checkout.

To further enhance accuracy, the SDK supports browser manipulation via Playwright, API calls, Python integrations, and parallel threading to overcome web page load delays.

Nova Act: Exceptional performance on benchmarks

Unlike other generative models that showcase middling accuracy on complex tasks, Nova Act prioritises reliability. Amazon highlights its model’s impressive scores of over 90% on internal evaluations for specific capabilities that typically challenge competitors. 

Nova Act achieved a near-perfect 0.939 on the ScreenSpot Web Text benchmark, which measures natural language instructions for text-based interactions, such as adjusting font sizes. Competing models such as Claude 3.7 Sonnet (0.900) and OpenAI’s CUA (0.883) trail behind by significant margins.

Similarly, Nova Act scored 0.879 in the ScreenSpot Web Icon benchmark, which tests interactions with visual elements like rating stars or icons. While the GroundUI Web test, designed to assess an AI’s proficiency in navigating various user interface elements, showed Nova Act slightly trailing competitors, Amazon sees this as an area ripe for improvement as the model evolves.

Amazon stresses its focus on delivering practical reliability. Once an agent built using Nova Act functions as expected, developers can deploy it headlessly, integrate it as an API, or even schedule it to run tasks asynchronously. In one demonstrated use case, an agent automatically orders a salad for delivery every Tuesday evening without requiring ongoing user intervention.

Amazon sets out its vision for scalable and smart AI agents

One of Nova Act’s standout features is its ability to transfer its user interface understanding to new environments with minimal additional training. Amazon shared an instance where Nova Act performed admirably in browser-based games, even though its training had not included video game experiences. This adaptability positions Nova Act as a versatile agent for diverse applications.

This capability is already being leveraged in Amazon’s own ecosystem. Within Alexa+, Nova Act enables self-directed web navigation to complete tasks for users, even when API access is not comprehensive enough. This represents a step towards smarter AI assistants that can function independently, harnessing their skills in more dynamic ways.

Amazon is clear that Nova Act represents the first stage in a broader mission to craft intelligent, reliable AI agents capable of handling increasingly complex, multi-step tasks. 

Expanding beyond simple instructions, Amazon’s focus is on training agents through reinforcement learning across varied, real-world scenarios rather than overly simplistic demonstrations. This foundational model serves as a checkpoint in a long-term training curriculum for Nova models, indicating the company’s ambition to reshape the AI agent landscape.

“The most valuable use cases for agents have yet to be built,” Amazon noted. “The best developers and designers will discover them. This research preview of our Nova Act SDK enables us to iterate alongside these builders through rapid prototyping and iterative feedback.”

Nova Act is a step towards making AI agents truly useful for complex, digital tasks. From rethinking benchmarks to emphasising reliability, its design philosophy is centred around empowering developers to move beyond what’s possible with current-generation tools. 

See also: Anthropic provides insights into the ‘AI biology’ of Claude

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Amazon Nova Act: A step towards smarter, web-native AI agents appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/amazon-nova-act-step-towards-smarter-web-native-ai-agents/feed/ 0
Anthropic provides insights into the ‘AI biology’ of Claude https://www.artificialintelligence-news.com/news/anthropic-provides-insights-ai-biology-of-claude/ https://www.artificialintelligence-news.com/news/anthropic-provides-insights-ai-biology-of-claude/#respond Fri, 28 Mar 2025 17:40:13 +0000 https://www.artificialintelligence-news.com/?p=105076 Anthropic has provided a more detailed look into the complex inner workings of their advanced language model, Claude. This work aims to demystify how these sophisticated AI systems process information, learn strategies, and ultimately generate human-like text. As the researchers initially highlighted, the internal processes of these models can be remarkably opaque, with their problem-solving […]

The post Anthropic provides insights into the ‘AI biology’ of Claude appeared first on AI News.

]]>
Anthropic has provided a more detailed look into the complex inner workings of their advanced language model, Claude. This work aims to demystify how these sophisticated AI systems process information, learn strategies, and ultimately generate human-like text.

As the researchers initially highlighted, the internal processes of these models can be remarkably opaque, with their problem-solving methods often “inscrutable to us, the model’s developers.”

Gaining a deeper understanding of this “AI biology” is paramount for ensuring the reliability, safety, and trustworthiness of these increasingly powerful technologies. Anthropic’s latest findings, primarily focused on their Claude 3.5 Haiku model, offer valuable insights into several key aspects of its cognitive processes.

One of the most fascinating discoveries suggests that Claude operates with a degree of conceptual universality across different languages. Through analysis of how the model processes translated sentences, Anthropic found evidence of shared underlying features. This indicates that Claude might possess a fundamental “language of thought” that transcends specific linguistic structures, allowing it to understand and apply knowledge learned in one language when working with another.

Anthropic’s research also challenged previous assumptions about how language models approach creative tasks like poetry writing.

Instead of a purely sequential, word-by-word generation process, Anthropic revealed that Claude actively plans ahead. In the context of rhyming poetry, the model anticipates future words to meet constraints like rhyme and meaning—demonstrating a level of foresight that goes beyond simple next-word prediction.

However, the research also uncovered potentially concerning behaviours. Anthropic found instances where Claude could generate plausible-sounding but ultimately incorrect reasoning, especially when grappling with complex problems or when provided with misleading hints. The ability to “catch it in the act” of fabricating explanations underscores the importance of developing tools to monitor and understand the internal decision-making processes of AI models.

Anthropic emphasises the significance of their “build a microscope” approach to AI interpretability. This methodology allows them to uncover insights into the inner workings of these systems that might not be apparent through simply observing their outputs. As they noted, this approach allows them to learn many things they “wouldn’t have guessed going in,” a crucial capability as AI models continue to evolve in sophistication.

The implications of this research extend beyond mere scientific curiosity. By gaining a better understanding of how AI models function, researchers can work towards building more reliable and transparent systems. Anthropic believes that this kind of interpretability research is vital for ensuring that AI aligns with human values and warrants our trust.

Their investigations delved into specific areas:

  • Multilingual understanding: Evidence points to a shared conceptual foundation enabling Claude to process and connect information across various languages.
  • Creative planning: The model demonstrates an ability to plan ahead in creative tasks, such as anticipating rhymes in poetry.
  • Reasoning fidelity: Anthropic’s techniques can help distinguish between genuine logical reasoning and instances where the model might fabricate explanations.
  • Mathematical processing: Claude employs a combination of approximate and precise strategies when performing mental arithmetic.
  • Complex problem-solving: The model often tackles multi-step reasoning tasks by combining independent pieces of information.
  • Hallucination mechanisms: The default behaviour in Claude is to decline answering if unsure, with hallucinations potentially arising from a misfiring of its “known entities” recognition system.
  • Vulnerability to jailbreaks: The model’s tendency to maintain grammatical coherence can be exploited in jailbreaking attempts.

Anthropic’s research provides detailed insights into the inner mechanisms of advanced language models like Claude. This ongoing work is crucial for fostering a deeper understanding of these complex systems and building more trustworthy and dependable AI.

(Photo by Bret Kavanaugh)

See also: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Anthropic provides insights into the ‘AI biology’ of Claude appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/anthropic-provides-insights-ai-biology-of-claude/feed/ 0
Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date https://www.artificialintelligence-news.com/news/gemini-2-5-google-cooks-most-intelligent-ai-model-to-date/ https://www.artificialintelligence-news.com/news/gemini-2-5-google-cooks-most-intelligent-ai-model-to-date/#respond Wed, 26 Mar 2025 17:17:26 +0000 https://www.artificialintelligence-news.com/?p=105017 Gemini 2.5 is being hailed by Google DeepMind as its “most intelligent AI model” to date. The first model from this latest generation is an experimental version of Gemini 2.5 Pro, which DeepMind says has achieved state-of-the-art results across a wide range of benchmarks. According to Koray Kavukcuoglu, CTO of Google DeepMind, the Gemini 2.5 […]

The post Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date appeared first on AI News.

]]>
Gemini 2.5 is being hailed by Google DeepMind as its “most intelligent AI model” to date.

The first model from this latest generation is an experimental version of Gemini 2.5 Pro, which DeepMind says has achieved state-of-the-art results across a wide range of benchmarks.

According to Koray Kavukcuoglu, CTO of Google DeepMind, the Gemini 2.5 models are “thinking models”.  This signifies their capability to reason through their thoughts before generating a response, leading to enhanced performance and improved accuracy.    

The capacity for “reasoning” extends beyond mere classification and prediction, Kavukcuoglu explains. It encompasses the system’s ability to analyse information, deduce logical conclusions, incorporate context and nuance, and ultimately, make informed decisions.

DeepMind has been exploring methods to enhance AI’s intelligence and reasoning capabilities for some time, employing techniques such as reinforcement learning and chain-of-thought prompting. This groundwork led to the recent introduction of their first thinking model, Gemini 2.0 Flash Thinking.    

“Now, with Gemini 2.5,” says Kavukcuoglu, “we’ve achieved a new level of performance by combining a significantly enhanced base model with improved post-training.”

Google plans to integrate these thinking capabilities directly into all of its future models—enabling them to tackle more complex problems and support more capable, context-aware agents.    

Gemini 2.5 Pro secures the LMArena leaderboard top spot

Gemini 2.5 Pro Experimental is positioned as DeepMind’s most advanced model for handling intricate tasks. As of writing, it has secured the top spot on the LMArena leaderboard – a key metric for assessing human preferences – by a significant margin, demonstrating a highly capable model with a high-quality style:

Screenshot of LMArena leaderboard where the new Gemini 2.5 Pro Experimental AI model from Google DeepMind has just taken the top spot.

Gemini 2.5 is a ‘pro’ at maths, science, coding, and reasoning

Gemini 2.5 Pro has demonstrated state-of-the-art performance across various benchmarks that demand advanced reasoning.

Notably, it leads in maths and science benchmarks – such as GPQA and AIME 2025 – without relying on test-time techniques that increase costs, like majority voting. It also achieved a state-of-the-art score of 18.8% on Humanity’s Last Exam, a dataset designed by subject matter experts to evaluate the human frontier of knowledge and reasoning.

DeepMind has placed significant emphasis on coding performance, and Gemini 2.5 represents a substantial leap forward compared to its predecessor, 2.0, with further improvements in the pipeline. 2.5 Pro excels in creating visually compelling web applications and agentic code applications, as well as code transformation and editing.

On SWE-Bench Verified, the industry standard for agentic code evaluations, Gemini 2.5 Pro achieved a score of 63.8% using a custom agent setup. The model’s reasoning capabilities also enable it to create a video game by generating executable code from a single-line prompt.

Building on its predecessors’ strengths

Gemini 2.5 builds upon the core strengths of earlier Gemini models, including native multimodality and a long context window. 2.5 Pro launches with a one million token context window, with plans to expand this to two million tokens soon. This enables the model to comprehend vast datasets and handle complex problems from diverse information sources, spanning text, audio, images, video, and even entire code repositories.    

Developers and enterprises can now begin experimenting with Gemini 2.5 Pro in Google AI Studio. Gemini Advanced users can also access it via the model dropdown on desktop and mobile platforms. The model will be rolled out on Vertex AI in the coming weeks.    

Google DeepMind encourages users to provide feedback, which will be used to further enhance Gemini’s capabilities.

(Photo by Anshita Nair)

See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/gemini-2-5-google-cooks-most-intelligent-ai-model-to-date/feed/ 0
ARC Prize launches its toughest AI benchmark yet: ARC-AGI-2 https://www.artificialintelligence-news.com/news/arc-prize-launches-toughest-ai-benchmark-yet-arc-agi-2/ https://www.artificialintelligence-news.com/news/arc-prize-launches-toughest-ai-benchmark-yet-arc-agi-2/#respond Tue, 25 Mar 2025 16:43:12 +0000 https://www.artificialintelligence-news.com/?p=104994 ARC Prize has launched the hardcore ARC-AGI-2 benchmark, accompanied by the announcement of their 2025 competition with $1 million in prizes. As AI progresses from performing narrow tasks to demonstrating general, adaptive intelligence, the ARC-AGI-2 challenges aim to uncover capability gaps and actively guide innovation. “Good AGI benchmarks act as useful progress indicators. Better AGI […]

The post ARC Prize launches its toughest AI benchmark yet: ARC-AGI-2 appeared first on AI News.

]]>
ARC Prize has launched the hardcore ARC-AGI-2 benchmark, accompanied by the announcement of their 2025 competition with $1 million in prizes.

As AI progresses from performing narrow tasks to demonstrating general, adaptive intelligence, the ARC-AGI-2 challenges aim to uncover capability gaps and actively guide innovation.

“Good AGI benchmarks act as useful progress indicators. Better AGI benchmarks clearly discern capabilities. The best AGI benchmarks do all this and actively inspire research and guide innovation,” the ARC Prize team states.

ARC-AGI-2 is setting out to achieve the “best” category.

Beyond memorisation

Since its inception in 2019, ARC Prize has served as a “North Star” for researchers striving toward AGI by creating enduring benchmarks. 

Benchmarks like ARC-AGI-1 leaned into measuring fluid intelligence (i.e., the ability to adapt learning to new unseen tasks.) It represented a clear departure from datasets that reward memorisation alone.

ARC Prize’s mission is also forward-thinking, aiming to accelerate timelines for scientific breakthroughs. Its benchmarks are designed not just to measure progress but to inspire new ideas.

Researchers observed a critical shift with the debut of OpenAI’s o3 in late 2024, evaluated using ARC-AGI-1. Combining deep learning-based large language models (LLMs) with reasoning synthesis engines, o3 marked a breakthrough where AI transitioned beyond rote memorisation.

Yet, despite progress, systems like o3 remain inefficient and require significant human oversight during training processes. To challenge these systems for true adaptability and efficiency, ARC Prize introduced ARC-AGI-2.

ARC-AGI-2: Closing the human-machine gap

The ARC-AGI-2 benchmark is tougher for AI yet retains its accessibility for humans. While frontier AI reasoning systems continue to score in single-digit percentages on ARC-AGI-2, humans can solve every task in under two attempts.

So, what sets ARC-AGI apart? Its design philosophy chooses tasks that are “relatively easy for humans, yet hard, or impossible, for AI.”

The benchmark includes datasets with varying visibility and the following characteristics:

  • Symbolic interpretation: AI struggles to assign semantic significance to symbols, instead focusing on shallow comparisons like symmetry checks.
  • Compositional reasoning: AI falters when it needs to apply multiple interacting rules simultaneously.
  • Contextual rule application: Systems fail to apply rules differently based on complex contexts, often fixating on surface-level patterns.

Most existing benchmarks focus on superhuman capabilities, testing advanced, specialised skills at scales unattainable for most individuals. 

ARC-AGI flips the script and highlights what AI can’t yet do; specifically the adaptability that defines human intelligence. When the gap between tasks that are easy for humans but difficult for AI eventually reaches zero, AGI can be declared achieved.

However, achieving AGI isn’t limited to the ability to solve tasks; efficiency – the cost and resources required to find solutions – is emerging as a crucial defining factor.

The role of efficiency

Measuring performance by cost per task is essential to gauge intelligence as not just problem-solving capability but the ability to do so efficiently.

Real-world examples are already showing efficiency gaps between humans and frontier AI systems:

  • Human panel efficiency: Passes ARC-AGI-2 tasks with 100% accuracy at $17/task.
  • OpenAI o3: Early estimates suggest a 4% success rate at an eye-watering $200 per task.

These metrics underline disparities in adaptability and resource consumption between humans and AI. ARC Prize has committed to reporting on efficiency alongside scores across future leaderboards.

The focus on efficiency prevents brute-force solutions from being considered “true intelligence.”

Intelligence, according to ARC Prize, encompasses finding solutions with minimal resources—a quality distinctly human but still elusive for AI.

ARC Prize 2025

ARC Prize 2025 launches on Kaggle this week, promising $1 million in total prizes and showcasing a live leaderboard for open-source breakthroughs. The contest aims to drive progress toward systems that can efficiently tackle ARC-AGI-2 challenges. 

Among the prize categories, which have increased from 2024 totals, are:

  • Grand prize: $700,000 for reaching 85% success within Kaggle efficiency limits.
  • Top score prize: $75,000 for the highest-scoring submission.
  • Paper prize: $50,000 for transformative ideas contributing to solving ARC-AGI tasks.
  • Additional prizes: $175,000, with details pending announcements during the competition.

These incentives ensure fair and meaningful progress while fostering collaboration among researchers, labs, and independent teams.

Last year, ARC Prize 2024 saw 1,500 competitor teams, resulting in 40 papers of acclaimed industry influence. This year’s increased stakes aim to nurture even greater success.

ARC Prize believes progress hinges on novel ideas rather than merely scaling existing systems. The next breakthrough in efficient general systems might not originate from current tech giants but from bold, creative researchers embracing complexity and curious experimentation.

(Image credit: ARC Prize)

See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post ARC Prize launches its toughest AI benchmark yet: ARC-AGI-2 appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/arc-prize-launches-toughest-ai-benchmark-yet-arc-agi-2/feed/ 0
Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1 https://www.artificialintelligence-news.com/news/baidu-undercuts-rival-ai-models-ernie-4-5-and-ernie-x1/ https://www.artificialintelligence-news.com/news/baidu-undercuts-rival-ai-models-ernie-4-5-and-ernie-x1/#respond Mon, 17 Mar 2025 10:07:40 +0000 https://www.artificialintelligence-news.com/?p=104813 Baidu has launched its latest foundation AI models, ERNIE 4.5 and ERNIE X1, and is offering them free for individuals through ERNIE Bot. The company says that it aims to “push the boundaries of multimodal and reasoning models” by providing advanced capabilities at a more accessible price point. Baidu plans to integrate these models into […]

The post Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1 appeared first on AI News.

]]>
Baidu has launched its latest foundation AI models, ERNIE 4.5 and ERNIE X1, and is offering them free for individuals through ERNIE Bot.

The company says that it aims to “push the boundaries of multimodal and reasoning models” by providing advanced capabilities at a more accessible price point. Baidu plans to integrate these models into its broader product ecosystem, including Baidu Search and the Wenxiaoyan app, to enhance user experiences.

ERNIE 4.5, Baidu’s “new generation native multimodal foundation model,” features collaborative optimisation across multiple modalities, resulting in improved multimodal comprehension. It enhances language understanding, generation, reasoning, and memory, while also improving “hallucination prevention, logical reasoning, and coding abilities.”

A key feature of ERNIE 4.5 is its ability to integrate and understand various content types, including text, images, audio, and video. It can also grasp complex content such as internet memes and satirical cartoons, showcasing strong contextual awareness.

Baidu claims ERNIE 4.5 outperforms GPT-4.5 in several benchmarks while being significantly more affordable, priced at “just 1% of GPT-4.5.”

Benchmark comparing the ERNIE 4.5 foundation AI model from Baidu to rivals such as GPT-4.5, DeepSeek, and others.

The model’s advancements are attributed to technologies like ‘FlashMask’ dynamic attention masking, heterogeneous multimodal mixture-of-experts, spatiotemporal representation compression, knowledge-centric training data construction, and self-feedback enhanced post-training.

ERNIE X1, Baidu’s new deep-thinking reasoning model, focuses on enhanced understanding, planning, reflection, and evolution. As Baidu’s “first multimodal deep-thinking reasoning model capable of tool use,” X1 excels in areas like Chinese knowledge Q&A, literary creation, and complex calculations.

The model’s tool use includes features like advanced search, document Q&A, image understanding, AI image generation, and webpage reading.

ERNIE X1’s capabilities are supported by technologies such as the progressive reinforcement learning method, end-to-end training approach integrating chains of thought and action, and a unified multi-faceted reward system.

For enterprise users and developers, ERNIE 4.5 is accessible through APIs on Baidu AI Cloud’s Qianfan platform, with competitive pricing structures. ERNIE X1 will soon be available on the same platform.

Baidu anticipates that “2025 is set to be an important year for the development and iteration of large language models and technologies” and plans to continue investing in AI, data centres, and cloud infrastructure to advance its AI capabilities and develop next-generation models.

See also: OpenAI and Google call for US government action to secure AI lead

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1 appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/baidu-undercuts-rival-ai-models-ernie-4-5-and-ernie-x1/feed/ 0
Gemma 3: Google launches its latest open AI models https://www.artificialintelligence-news.com/news/gemma-3-google-launches-its-latest-open-ai-models/ https://www.artificialintelligence-news.com/news/gemma-3-google-launches-its-latest-open-ai-models/#respond Wed, 12 Mar 2025 09:08:41 +0000 https://www.artificialintelligence-news.com/?p=104758 Google has launched Gemma 3, the latest version of its family of open AI models that aim to set a new benchmark for AI accessibility. Built upon the foundations of the company’s Gemini 2.0 models, Gemma 3 is engineered to be lightweight, portable, and adaptable—enabling developers to create AI applications across a wide range of […]

The post Gemma 3: Google launches its latest open AI models appeared first on AI News.

]]>
Google has launched Gemma 3, the latest version of its family of open AI models that aim to set a new benchmark for AI accessibility.

Built upon the foundations of the company’s Gemini 2.0 models, Gemma 3 is engineered to be lightweight, portable, and adaptable—enabling developers to create AI applications across a wide range of devices.  

This release comes hot on the heels of Gemma’s first birthday, an anniversary underscored by impressive adoption metrics. Gemma models have achieved more than 100 million downloads and spawned the creation of over 60,000 community-built variants. Dubbed the “Gemmaverse,” this ecosystem signals a thriving community aiming to democratise AI.  

“The Gemma family of open models is foundational to our commitment to making useful AI technology accessible,” explained Google.

Gemma 3: Features and capabilities

Gemma 3 models are available in various sizes – 1B, 4B, 12B, and 27B parameters – allowing developers to select a model tailored to their specific hardware and performance requirements. These models promise faster execution, even on modest computational setups, without compromising functionality or accuracy.

Here are some of the standout features of Gemma 3:  

  • Single-accelerator performance: Gemma 3 sets a new benchmark for single-accelerator models. In preliminary human preference evaluations on the LMArena leaderboard, Gemma 3 outperformed rivals including Llama-405B, DeepSeek-V3, and o3-mini.
  • Multilingual support across 140 languages: Catering to diverse audiences, Gemma 3 comes with pretrained capabilities for over 140 languages. Developers can create applications that connect with users in their native tongues, expanding the global reach of their projects.  
  • Sophisticated text and visual analysis: With advanced text, image, and short video reasoning capabilities, developers can implement Gemma 3 to craft interactive and intelligent applications—addressing an array of use cases from content analysis to creative workflows.  
  • Expanded context window: Offering a 128k-token context window, Gemma 3 can analyse and synthesise large datasets, making it ideal for applications requiring extended content comprehension.
  • Function calling for workflow automation: With function calling support, developers can utilise structured outputs to automate processes and build agentic AI systems effortlessly.
  • Quantised models for lightweight efficiency: Gemma 3 introduces official quantised versions, significantly reducing model size while preserving output accuracy—a bonus for developers optimising for mobile or resource-constrained environments.

The model’s performance advantages are clearly illustrated in the Chatbot Arena Elo Score leaderboard. Despite requiring just a single NVIDIA H100 GPU, the flagship 27B version of Gemma 3 ranks among the top chatbots, achieving an Elo score of 1338. Many competitors demand up to 32 GPUs to deliver comparable performance.

Google Gemma 3 performance illustrated on benchmark against both open source and proprietary AI models in the Chatbot Arena Elo Score leaderboard.

One of Gemma 3’s strengths lies in its adaptability within developers’ existing workflows.  

  • Diverse tooling compatibility: Gemma 3 supports popular AI libraries and tools, including Hugging Face Transformers, JAX, PyTorch, and Google AI Edge. For optimised deployment, platforms such as Vertex AI or Google Colab are ready to help developers get started with minimal hassle.  
  • NVIDIA optimisations: Whether using entry-level GPUs like Jetson Nano or cutting-edge hardware like Blackwell chips, Gemma 3 ensures maximum performance, further simplified through the NVIDIA API Catalog.  
  • Broadened hardware support: Beyond NVIDIA, Gemma 3 integrates with AMD GPUs via the ROCm stack and supports CPU execution with Gemma.cpp for added versatility.

For immediate experiments, users can access Gemma 3 models via platforms such as Hugging Face and Kaggle, or take advantage of the Google AI Studio for in-browser deployment.

Advancing responsible AI  

“We believe open models require careful risk assessment, and our approach balances innovation with safety,” explains Google.  

Gemma 3’s team adopted stringent governance policies, applying fine-tuning and robust benchmarking to align the model with ethical guidelines. Given the models enhanced capabilities in STEM fields, it underwent specific evaluations to mitigate risks of misuse, such as generating harmful substances.

Google is pushing for collective efforts within the industry to create proportionate safety frameworks for increasingly powerful models.

To play its part, Google is launching ShieldGemma 2. The 4B image safety checker leverages Gemma 3’s architecture and outputs safety labels across categories such as dangerous content, explicit material, and violence. While offering out-of-the-box solutions, developers can customise the tool to meet tailored safety requirements.

The “Gemmaverse” isn’t just a technical ecosystem, it’s a community-driven movement. Projects such as AI Singapore’s SEA-LION v3, INSAIT’s BgGPT, and Nexa AI’s OmniAudio are testament to the power of collaboration within this ecosystem.  

To bolster academic research, Google has also introduced the Gemma 3 Academic Program. Researchers can apply for $10,000 worth of Google Cloud credits to accelerate their AI-centric projects. Applications open today and remain available for four weeks.  

With its accessibility, capabilities, and widespread compatibility, Gemma 3 makes a strong case for becoming a cornerstone in the AI development community.

(Image credit: Google)

See also: Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Gemma 3: Google launches its latest open AI models appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/gemma-3-google-launches-its-latest-open-ai-models/feed/ 0
Endor Labs: AI transparency vs ‘open-washing’ https://www.artificialintelligence-news.com/news/endor-labs-ai-transparency-vs-open-washing/ https://www.artificialintelligence-news.com/news/endor-labs-ai-transparency-vs-open-washing/#respond Mon, 24 Feb 2025 18:15:45 +0000 https://www.artificialintelligence-news.com/?p=104605 As the AI industry focuses on transparency and security, debates around the true meaning of “openness” are intensifying. Experts from open-source security firm Endor Labs weighed in on these pressing topics. Andrew Stiefel, Senior Product Marketing Manager at Endor Labs, emphasised the importance of applying lessons learned from software security to AI systems. “The US […]

The post Endor Labs: AI transparency vs ‘open-washing’ appeared first on AI News.

]]>
As the AI industry focuses on transparency and security, debates around the true meaning of “openness” are intensifying. Experts from open-source security firm Endor Labs weighed in on these pressing topics.

Andrew Stiefel, Senior Product Marketing Manager at Endor Labs, emphasised the importance of applying lessons learned from software security to AI systems.

“The US government’s 2021 Executive Order on Improving America’s Cybersecurity includes a provision requiring organisations to produce a software bill of materials (SBOM) for each product sold to federal government agencies.”

An SBOM is essentially an inventory detailing the open-source components within a product, helping detect vulnerabilities. Stiefel argued that “applying these same principles to AI systems is the logical next step.”  

“Providing better transparency for citizens and government employees not only improves security,” he explained, “but also gives visibility into a model’s datasets, training, weights, and other components.”

What does it mean for an AI model to be “open”?  

Julien Sobrier, Senior Product Manager at Endor Labs, added crucial context to the ongoing discussion about AI transparency and “openness.” Sobrier broke down the complexity inherent in categorising AI systems as truly open.

“An AI model is made of many components: the training set, the weights, and programs to train and test the model, etc. It is important to make the whole chain available as open source to call the model ‘open’. It is a broad definition for now.”  

Sobrier noted the lack of consistency across major players, which has led to confusion about the term.

“Among the main players, the concerns about the definition of ‘open’ started with OpenAI, and Meta is in the news now for their LLAMA model even though that’s ‘more open’. We need a common understanding of what an open model means. We want to watch out for any ‘open-washing,’ as we saw it with free vs open-source software.”  

One potential pitfall, Sobrier highlighted, is the increasingly common practice of “open-washing,” where organisations claim transparency while imposing restrictions.

“With cloud providers offering a paid version of open-source projects (such as databases) without contributing back, we’ve seen a shift in many open-source projects: The source code is still open, but they added many commercial restrictions.”  

“Meta and other ‘open’ LLM providers might go this route to keep their competitive advantage: more openness about the models, but preventing competitors from using them,” Sobrier warned.

DeepSeek aims to increase AI transparency

DeepSeek, one of the rising — albeit controversial — players in the AI industry, has taken steps to address some of these concerns by making portions of its models and code open-source. The move has been praised for advancing transparency while providing security insights.  

“DeepSeek has already released the models and their weights as open-source,” said Andrew Stiefel. “This next move will provide greater transparency into their hosted services, and will give visibility into how they fine-tune and run these models in production.”

Such transparency has significant benefits, noted Stiefel. “This will make it easier for the community to audit their systems for security risks and also for individuals and organisations to run their own versions of DeepSeek in production.”  

Beyond security, DeepSeek also offers a roadmap on how to manage AI infrastructure at scale.

“From a transparency side, we’ll see how DeepSeek is running their hosted services. This will help address security concerns that emerged after it was discovered they left some of their Clickhouse databases unsecured.”

Stiefel highlighted that DeepSeek’s practices with tools like Docker, Kubernetes (K8s), and other infrastructure-as-code (IaC) configurations could empower startups and hobbyists to build similar hosted instances.  

Open-source AI is hot right now

DeepSeek’s transparency initiatives align with the broader trend toward open-source AI. A report by IDC reveals that 60% of organisations are opting for open-source AI models over commercial alternatives for their generative AI (GenAI) projects.  

Endor Labs research further indicates that organisations use, on average, between seven and twenty-one open-source models per application. The reasoning is clear: leveraging the best model for specific tasks and controlling API costs.

“As of February 7th, Endor Labs found that more than 3,500 additional models have been trained or distilled from the original DeepSeek R1 model,” said Stiefel. “This shows both the energy in the open-source AI model community, and why security teams need to understand both a model’s lineage and its potential risks.”  

For Sobrier, the growing adoption of open-source AI models reinforces the need to evaluate their dependencies.

“We need to look at AI models as major dependencies that our software depends on. Companies need to ensure they are legally allowed to use these models but also that they are safe to use in terms of operational risks and supply chain risks, just like open-source libraries.”

He emphasised that any risks can extend to training data: “They need to be confident that the datasets used for training the LLM were not poisoned or had sensitive private information.”  

Building a systematic approach to AI model risk  

As open-source AI adoption accelerates, managing risk becomes ever more critical. Stiefel outlined a systematic approach centred around three key steps:  

  1. Discovery: Detect the AI models your organisation currently uses.  
  2. Evaluation: Review these models for potential risks, including security and operational concerns.  
  3. Response: Set and enforce guardrails to ensure safe and secure model adoption.  

“The key is finding the right balance between enabling innovation and managing risk,” Stiefel said. “We need to give software engineering teams latitude to experiment but must do so with full visibility. The security team needs line-of-sight and the insight to act.”  

Sobrier further argued that the community must develop best practices for safely building and adopting AI models. A shared methodology is needed to evaluate AI models across parameters such as security, quality, operational risks, and openness.

Beyond transparency: Measures for a responsible AI future  

To ensure the responsible growth of AI, the industry must adopt controls that operate across several vectors:  

  • SaaS models: Safeguarding employee use of hosted models.
  • API integrations: Developers embedding third-party APIs like DeepSeek into applications, which, through tools like OpenAI integrations, can switch deployment with just two lines of code.
  • Open-source models: Developers leveraging community-built models or creating their own models from existing foundations maintained by companies like DeepSeek.

Sobrier warned of complacency in the face of rapid AI progress. “The community needs to build best practices to develop safe and open AI models,” he advised, “and a methodology to rate them along security, quality, operational risks, and openness.”  

As Stiefel succinctly summarised: “Think about security across multiple vectors and implement the appropriate controls for each.”

See also: AI in 2025: Purpose-driven models, human integration, and more

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Endor Labs: AI transparency vs ‘open-washing’ appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/endor-labs-ai-transparency-vs-open-washing/feed/ 0
Grok 3: The next-gen ‘truth-seeking’ AI model https://www.artificialintelligence-news.com/news/grok-3-next-gen-truth-seeking-ai-model/ https://www.artificialintelligence-news.com/news/grok-3-next-gen-truth-seeking-ai-model/#respond Tue, 18 Feb 2025 12:20:39 +0000 https://www.artificialintelligence-news.com/?p=104551 xAI unveiled its Grok 3 AI model on Monday, alongside new capabilities such as image analysis and refined question answering. The company harnessed an immense data centre equipped with approximately 200,000 GPUs to develop Grok 3. According to xAI owner Elon Musk, this project utilised “10x” more computing power than its predecessor, Grok 2, with […]

The post Grok 3: The next-gen ‘truth-seeking’ AI model appeared first on AI News.

]]>
xAI unveiled its Grok 3 AI model on Monday, alongside new capabilities such as image analysis and refined question answering.

The company harnessed an immense data centre equipped with approximately 200,000 GPUs to develop Grok 3. According to xAI owner Elon Musk, this project utilised “10x” more computing power than its predecessor, Grok 2, with an expanded dataset that reportedly includes information from legal case filings.

Musk claimed that Grok 3 is a “maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically-correct.”

The Grok 3 rollout includes a family of models designed for different needs. Grok 3 mini, for example, prioritises faster response times over absolute accuracy. However, particularly noteworthy are the new reasoning-focused Grok 3 models.

Dubbed Grok 3 Reasoning and Grok 3 mini Reasoning, these variants aim to emulate human-like cognitive processes by “thinking through” problems. Comparable to models like OpenAI’s o3-mini and DeepSeek’s R1, these reasoning systems attempt to fact-check their responses—reducing the likelihood of errors or missteps.

Grok 3: The benchmark results

xAI asserts that Grok 3 surpasses OpenAI’s GPT-4o in certain benchmarks, including AIME and GPQA, which assess the model’s proficiency in tackling complex problems across mathematics, physics, biology, and chemistry.

The early version of Grok 3 is also currently leading on Chatbot Arena, a crowdsourced evaluation platform where users pit AI models against one another and rank their outputs. The model is the first to break the Arena’s 1400 score.

According to xAI, Grok 3 Reasoning outperforms its rivals on a variety of prominent benchmarks:

Reasoning benchmark results of the Grok 3 AI model from xAI compared to other leading artificial intelligence models from Google, DeepSeek, and OpenAI.

These reasoning models are already integrated into features available via the Grok app. Users can select commands like “Think” or activate the more computationally-intensive “Big Brain” mode for tackling particularly challenging questions.

xAI has positioned the reasoning models as ideal tools for STEM (science, technology, engineering, and mathematics) applications, including mathematics, science, and coding challenges.

Guarding against AI distillation

Interestingly, not all of Grok 3’s internal processes are laid bare to users. Musk explained that some of the reasoning models’ “thoughts” are intentionally obscured to prevent distillation—a controversial practice where competing AI developers extract knowledge from proprietary models.

The practice was thrust into the spotlight in recent weeks after Chinese AI firm DeepSeek faced allegations of distilling OpenAI’s models to develop its latest model, R-1.

xAI’s new reasoning models serve as the foundation for a new Grok app feature called DeepSearch. The feature uses Grok models to scan the internet and Musk’s social platform, X, for relevant information before synthesising a detailed abstract in answer to user queries.

Accessing Grok 3 and committing to open-source

Access to the latest Grok model is currently tied to X’s subscription tiers. Premium+ subscribers, who pay $50 (~£41) per month, will receive priority access to the latest functionalities. 

xAI is also introducing a SuperGrok subscription plan, reportedly priced at either $30 per month or $300 annually. SuperGrok subscribers will benefit from enhanced reasoning capabilities, more DeepSearch queries, and unlimited image generation features.

The company also teased upcoming features. Within a week, the Grok app is expected to introduce a voice mode—enabling users to interact with the AI through a synthesised voice similar to Gemini Live.

Musk further revealed plans to release Grok 3 models via an enterprise-ready API in the coming weeks, with DeepSearch functionality included.

Although Grok 3 is still fresh, xAI intends to open-source its predecessor in the coming months. Musk claims that xAI will continue to open-source the last version of Grok.

“When Grok 3 is mature and stable, which is probably within a few months, then we’ll open-source Grok 2,” explains Musk.

The ‘anti-woke’ AI model

Grok has long been marketed as unfiltered, bold, and willing to engage with queries that competitors might avoid. Musk previously described the AI as “anti-woke,” presenting it as a model unafraid to touch on controversial topics. 

True to its promise, early models like Grok and Grok 2 embraced politically-charged queries, even veering into colourful language when prompted. Yet, these versions also revealed some biases when delving deep into political discourse.

“We’re working to shift Grok closer to politically-neutral,” said Musk.

However, whether Grok 3 achieves this goal remains to be seen. With such changes at play, analysts are already highlighting the potential societal impacts of introducing increasingly “truth-seeking” yet politically-sensitive AI systems.

With Grok 3, Musk and xAI have made a bold statement, pushing their technology forward while potentially fuelling debates around bias, transparency, and the ethics of AI deployment.

As competitors like OpenAI, Google, and DeepSeek refine their offerings, Grok 3’s success will hinge on its ability to balance accuracy, user demand, and societal responsibility.

See also: AI in 2025: Purpose-driven models, human integration, and more

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Grok 3: The next-gen ‘truth-seeking’ AI model appeared first on AI News.

]]>
https://www.artificialintelligence-news.com/news/grok-3-next-gen-truth-seeking-ai-model/feed/ 0