Google AMIE: AI doctor learns to ‘see’ medical images

Google is giving its diagnostic AI the ability to understand visual medical information with its latest research on AMIE (Articulate Medical Intelligence Explorer).

Imagine chatting with an AI about a health concern, and instead of just processing your words, it could actually look at the photo of that worrying rash or make sense of your ECG printout. That’s what Google is aiming for.

We already knew AMIE showed promise in text-based medical chats, thanks to earlier work published in Nature. But let’s face it, real medicine isn’t just about words.

Doctors rely heavily on what they can see – skin conditions, readings from machines, lab reports. As the Google team rightly points out, even simple instant messaging platforms “allow static multimodal information (e.g., images and documents) to enrich discussions.”

Text-only AI was missing a huge piece of the puzzle. The big question, as the researchers put it, was “Whether LLMs can conduct diagnostic clinical conversations that incorporate this more complex type of information.”

Google teaches AMIE to look and reason

Google’s engineers have beefed up AMIE using their Gemini 2.0 Flash model as the brains of the operation. They’ve combined this with what they call a “state-aware reasoning framework.” In plain English, this means the AI doesn’t just follow a script; it adapts its conversation based on what it’s learned so far and what it still needs to figure out.

It’s close to how a human clinician works: gathering clues, forming ideas about what might be wrong, and then asking for more specific information – including visual evidence – to narrow things down.

“This enables AMIE to request relevant multimodal artifacts when needed, interpret their findings accurately, integrate this information seamlessly into the ongoing dialogue, and use it to refine diagnoses,” Google explains.

Think of the conversation flowing through stages: first gathering the patient’s history, then moving towards diagnosis and management suggestions, and finally follow-up. The AI constantly assesses its own understanding, asking for that skin photo or lab result if it senses a gap in its knowledge.
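
Google hasn’t released AMIE’s code, but the state-aware loop is easy to picture. Below is a minimal, purely illustrative Python sketch – the phase names, the gap check, and the `llm` helper are our own stand-ins, not anything from the research:

```python
# Illustrative sketch of a state-aware diagnostic dialogue loop. The phase names,
# the gap check, and the `llm` callable are assumptions, not Google's implementation.
from dataclasses import dataclass, field

@dataclass
class ConsultationState:
    phase: str = "history_taking"        # -> "diagnosis_and_management" -> "follow_up"
    transcript: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)    # uploaded photos, ECGs, lab reports
    differential: list = field(default_factory=list)

def next_turn(state: ConsultationState, patient_message: str, llm) -> str:
    state.transcript.append(("patient", patient_message))
    # The model critiques its own understanding and names the evidence it still lacks.
    gaps = llm(f"Conversation so far: {state.transcript}. "
               "What information is still missing to refine the diagnosis?")
    if "photo" in gaps.lower() or "image" in gaps.lower():
        reply = "Could you upload a photo of the affected area?"
    else:
        reply = llm(f"Phase: {state.phase}. Continue the consultation and update "
                    f"this differential diagnosis: {state.differential}")
    state.transcript.append(("amie", reply))
    return reply
```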

To get this right without endless trial-and-error on real people, Google built a detailed simulation lab.

Google created lifelike patient cases, pulling realistic medical images and data from sources like the PTB-XL ECG database and the SCIN dermatology image set, adding plausible backstories using Gemini. Then, they let AMIE ‘chat’ with simulated patients within this setup and automatically check how well it performed on things like diagnostic accuracy and avoiding errors (or ‘hallucinations’).

The virtual OSCE: Google puts AMIE through its paces

The real test came in a setup designed to mirror how medical students are assessed: the Objective Structured Clinical Examination (OSCE).

Google ran a remote study involving 105 different medical scenarios. Real actors, trained to portray patients consistently, interacted either with the new multimodal AMIE or with actual human primary care physicians (PCPs). These chats happened through an interface where the ‘patient’ could upload images, just like you might in a modern messaging app.

Afterwards, specialist doctors (in dermatology, cardiology, and internal medicine) and the patient actors themselves reviewed the conversations.

The human doctors scored everything from how well history was taken, the accuracy of the diagnosis, the quality of the suggested management plan, right down to communication skills and empathy—and, of course, how well the AI interpreted the visual information.

Surprising results from the simulated clinic

Here’s where it gets really interesting. In this head-to-head comparison within the controlled study environment, Google found AMIE didn’t just hold its own—it often came out ahead.

The AI was rated as being better than the human PCPs at interpreting the multimodal data shared during the chats. It also scored higher on diagnostic accuracy, producing differential diagnosis lists (the ranked list of possible conditions) that specialists deemed more accurate and complete based on the case details.

Specialist doctors reviewing the transcripts tended to rate AMIE’s performance higher across most areas. They particularly noted “the quality of image interpretation and reasoning,” the thoroughness of its diagnostic workup, the soundness of its management plans, and its ability to flag when a situation needed urgent attention.

Perhaps one of the most surprising findings came from the patient actors: they often found the AI to be more empathetic and trustworthy than the human doctors in these text-based interactions.

And, on a critical safety note, the study found no statistically significant difference between how often AMIE made errors based on the images (hallucinated findings) compared to the human physicians.

Technology never stands still, so Google also ran some early tests swapping out the Gemini 2.0 Flash model for the newer Gemini 2.5 Flash.

Using their simulation framework, the results hinted at further gains, particularly in getting the diagnosis right (Top-3 Accuracy) and suggesting appropriate management plans.
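
For context, Top-3 accuracy simply measures how often the correct diagnosis appears among the model’s three highest-ranked candidates. A generic helper (not Google’s evaluation code) might look like this:

```python
# Generic top-k accuracy over a batch of cases; k=3 gives the Top-3 figure cited above.
def top_k_accuracy(ranked_diagnoses, ground_truth, k=3):
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_diagnoses, ground_truth))
    return hits / len(ground_truth)

# Toy example: the correct condition sits in the top three for two of three cases.
preds = [["eczema", "psoriasis", "tinea"],
         ["angina", "GERD", "costochondritis"],
         ["gout", "cellulitis", "septic arthritis"]]
truth = ["psoriasis", "myocardial infarction", "gout"]
print(round(top_k_accuracy(preds, truth), 2))  # 0.67
```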

While promising, the team is quick to add a dose of realism: these are just automated results, and “rigorous assessment through expert physician review is essential to confirm these performance benefits.”

Important reality checks

Google is commendably upfront about the limitations here. “This study explores a research-only system in an OSCE-style evaluation using patient actors, which substantially under-represents the complexity… of real-world care,” they state clearly. 

Simulated scenarios, however well-designed, aren’t the same as dealing with the unique complexities of real patients in a busy clinic. They also stress that the chat interface doesn’t capture the richness of a real video or in-person consultation.

So, what’s the next step? Moving carefully towards the real world. Google is already partnering with Beth Israel Deaconess Medical Center for a research study to see how AMIE performs in actual clinical settings with patient consent.

The researchers also acknowledge the need to eventually move beyond text and static images towards handling real-time video and audio—the kind of interaction common in telehealth today.

Giving AI the ability to ‘see’ and interpret the kind of visual evidence doctors use every day offers a glimpse of how AI might one day assist clinicians and patients. However, the path from these promising findings to a safe and reliable tool for everyday healthcare is still a long one that requires careful navigation.

(Photo by Alexander Sinn)

See also: Are AI chatbots really changing the world of work?

Claude Integrations: Anthropic adds AI to your favourite work tools

Anthropic has launched ‘Integrations’ for Claude, a feature that enables the AI to talk directly to your favourite daily work tools. In addition, the company has launched a beefed-up ‘Advanced Research’ feature for digging deeper than ever before.

Starting with Integrations, the feature builds on a technical standard Anthropic released last year (the Model Context Protocol, or MCP), but makes it much easier to use. Previously, setting this up required technical know-how and was limited to local connections. Now, developers can build secure bridges that allow Claude to connect safely with apps over the web or on your desktop.

For end-users of Claude, this means you can now hook it up to a growing list of popular work software. Right out of the gate, they’ve included support for ten big names: Atlassian’s Jira and Confluence (hello, project managers and dev teams!), the automation powerhouse Zapier, Cloudflare, customer comms tool Intercom, plus Asana, Square, Sentry, PayPal, Linear, and Plaid. Stripe and GitLab are joining the party soon.

So, what’s the big deal? The real advantage here is context. When Claude can see your project history in Jira, read your team’s knowledge base in Confluence, or check task updates in Asana, it stops guessing and starts understanding what you’re working on.

“When you connect your tools to Claude, it gains deep context about your work—understanding project histories, task statuses, and organisational knowledge—and can take actions across every surface,” explains Anthropic.

They add, “Claude becomes a more informed collaborator, helping you execute complex projects in one place with expert assistance at every step.”

Let’s look at what this means in practice. Connect Zapier, and you suddenly give Claude the keys to thousands of apps linked by Zapier’s workflows. You could just ask Claude, conversationally, to trigger a complex sequence – maybe grab the latest sales numbers from HubSpot, check your calendar, and whip up some meeting notes, all without you lifting a finger in those apps.

For teams using Atlassian’s Jira and Confluence, Claude could become a serious helper. Think drafting product specs, summarising long Confluence documents so you don’t have to wade through them, or even creating batches of linked Jira tickets at once. It might even spot potential roadblocks by analysing project data.

And if you use Intercom for customer chats, this integration could be a game-changer. Intercom’s own AI assistant, Fin, can now work with Claude to do things like automatically create a bug report in Linear if a customer flags an issue. You could also ask Claude to sift through your Intercom chat history to spot patterns, help debug tricky problems, or summarise what customers are saying – making the whole journey from feedback to fix much smoother.

Anthropic is also making it easier for developers to build even more of these connections. They reckon that using their tools (or platforms like Cloudflare that handle the tricky bits like security and setup), developers can whip up a custom Integration with Claude in about half an hour. This could mean connecting Claude to your company’s unique internal systems or specialised industry software.
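
To make that concrete, a bare-bones custom Integration could look roughly like the sketch below, assuming the open-source Python MCP SDK’s FastMCP helper. The server name, the `lookup_ticket` tool, and the ticket data are invented for illustration, and the SDK’s exact API may differ:

```python
# Hypothetical sketch of a custom MCP server exposing an internal ticketing tool.
# Assumes the open-source Python MCP SDK (`pip install mcp`); exact APIs may differ,
# and `lookup_ticket` plus the ticket data are invented for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-tickets")

FAKE_DB = {"TCK-42": {"status": "open", "owner": "dana", "summary": "Login page 500s"}}

@mcp.tool()
def lookup_ticket(ticket_id: str) -> dict:
    """Return status, owner, and summary for an internal ticket."""
    return FAKE_DB.get(ticket_id, {"error": f"{ticket_id} not found"})

if __name__ == "__main__":
    mcp.run()  # once connected, Claude can call lookup_ticket() by name mid-conversation
```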

Beyond tool integrations, Claude gets a serious research upgrade

Alongside these new connections, Anthropic has given Claude’s Research feature a serious boost. It could already search the web and your Google Workspace files, but the new ‘Advanced Research’ mode is built for when you need to dig really deep.

Flip the switch for this advanced mode, and Claude tackles big questions differently. Instead of just one big search, it intelligently breaks your request down into smaller chunks, investigates each part thoroughly – using the web, your Google Docs, and now tapping into any apps you’ve connected via Integrations – before pulling it all together into a detailed report.
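
Anthropic hasn’t published how Advanced Research works under the hood, but the decompose-investigate-synthesise pattern it describes can be sketched as follows – every helper function here is an invented stand-in:

```python
# Conceptual sketch of a decompose -> investigate -> synthesise research loop.
# All helpers (search_web, search_google_workspace, search_connected_apps, llm)
# are illustrative stand-ins, not Anthropic's implementation.
def advanced_research(question: str, llm, tools) -> str:
    subquestions = llm(f"Break this request into focused sub-questions: {question}").splitlines()
    findings = []
    for sub in subquestions:
        evidence = []
        evidence += tools.search_web(sub)
        evidence += tools.search_google_workspace(sub)
        evidence += tools.search_connected_apps(sub)   # Jira, Confluence, etc. via Integrations
        findings.append(llm(f"Summarise what this evidence says about '{sub}', "
                            f"keeping source links: {evidence}"))
    return llm(f"Combine these findings into a cited report on '{question}': {findings}")
```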

Now, this deeper digging takes a bit more time. While many reports might only take five to fifteen minutes, Anthropic says the really complex investigations could have Claude working away for up to 45 minutes. That might sound like a while, but compare it to the hours you might spend grinding through that research manually, and it starts to look pretty appealing.

Importantly, you can trust the results. When Claude uses information from any source – whether it’s a website, an internal doc, a Jira ticket, or a Confluence page – it gives you clear links straight back to the original. No more wondering where the AI got its information from; you can check it yourself.

These shiny new Integrations and the Advanced Research mode are rolling out now in beta for folks on Anthropic’s paid Max, Team, and Enterprise plans. If you’re on the Pro plan, don’t worry – access is coming your way soon.

Also worth noting: the standard web search feature inside Claude is now available everywhere, for everyone on any paid Claude.ai plan (Pro and up). No more geographical restrictions on that front.

Putting it all together, these updates and integrations show Anthropic is serious about making Claude genuinely useful in a professional context. By letting it plug directly into the tools we already use and giving it more powerful ways to analyse information, they’re pushing Claude towards being less of a novelty and more of an essential part of the modern toolkit.

(Image credit: Anthropic)

See also: Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost

Meta beefs up AI security with new Llama tools

If you’re building with AI, or trying to defend against the less savoury side of the technology, Meta just dropped new Llama security tools.

The improved security tools for the Llama AI models arrive alongside fresh resources from Meta designed to help cybersecurity teams harness AI for defence. It’s all part of their push to make developing and using AI a bit safer for everyone involved.

Developers working with the Llama family of models now have some upgraded kit to play with. You can grab these latest Llama Protection tools directly from Meta’s own Llama Protections page, or find them where many developers live: Hugging Face and GitHub.

First up is Llama Guard 4. Think of it as an evolution of Meta’s customisable safety filter for AI. The big news here is that it’s now multimodal, so it can understand and apply safety rules not just to text, but to images as well. That’s crucial as AI applications get more visual. This new version is also being baked into Meta’s brand-new Llama API, which is currently in a limited preview.
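
In practice, calling a Llama Guard checkpoint from Hugging Face typically looks something like the snippet below. Treat it as a sketch: the model ID is our guess at the Llama Guard 4 checkpoint name, access is gated behind Meta’s licence, and the image (multimodal) path needs the model’s processor rather than the plain tokenizer – check the official model card for the exact input format:

```python
# Rough sketch of text-only moderation with a Llama Guard checkpoint via transformers.
# The model ID below is an assumption; the real Llama Guard 4 name and its image-input
# handling should be taken from Meta's model card.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-Guard-4-12B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

chat = [{"role": "user", "content": "Tell me how to make a convincing phishing email."}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=30)
# Llama Guard models reply with a verdict such as "safe" or "unsafe" plus violated categories.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```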

Then there’s LlamaFirewall. This is a new piece of the puzzle from Meta, designed to act like a security control centre for AI systems. It helps manage different safety models working together and hooks into Meta’s other protection tools. Its job? To spot and block the kind of risks that keep AI developers up at night – things like clever ‘prompt injection’ attacks designed to trick the AI, potentially dodgy code generation, or risky behaviour from AI plug-ins.

Meta has also given its Llama Prompt Guard a tune-up. The main Prompt Guard 2 (86M) model is now better at sniffing out those pesky jailbreak attempts and prompt injections. More interesting, perhaps, is the introduction of Prompt Guard 2 22M.

Prompt Guard 2 22M is a much smaller, nippier version. Meta reckons it can slash latency and compute costs by up to 75% compared to the bigger model, without sacrificing too much detection power. For anyone needing faster responses or working on tighter budgets, that’s a welcome addition.

But Meta isn’t just focusing on the AI builders; they’re also looking at the cyber defenders on the front lines of digital security. They’ve heard the calls for better AI-powered tools to help in the fight against cyberattacks, and they’re sharing some updates aimed at just that.

The CyberSec Eval 4 benchmark suite has been updated. This open-source toolkit helps organisations figure out how good AI systems actually are at security tasks. This latest version includes two new tools:

  • CyberSOC Eval: Built with the help of cybersecurity experts CrowdStrike, this framework specifically measures how well AI performs in a real Security Operation Centre (SOC) environment. It’s designed to give a clearer picture of AI’s effectiveness in threat detection and response. The benchmark itself is coming soon.
  • AutoPatchBench: This benchmark tests how good Llama and other AIs are at automatically finding and fixing security holes in code before the bad guys can exploit them.

To help get these kinds of tools into the hands of those who need them, Meta is kicking off the Llama Defenders Program. This seems to be about giving partner companies and developers special access to a mix of AI solutions – some open-source, some early-access, some perhaps proprietary – all geared towards different security challenges.

As part of this, Meta is sharing an AI security tool they use internally: the Automated Sensitive Doc Classification Tool. It automatically slaps security labels on documents inside an organisation. Why? To stop sensitive info from walking out the door, or to prevent it from being accidentally fed into an AI system (like in RAG setups) where it could be leaked.

They’re also tackling the problem of fake audio generated by AI, which is increasingly used in scams. The Llama Generated Audio Detector and Llama Audio Watermark Detector are being shared with partners to help them spot AI-generated voices in potential phishing calls or fraud attempts. Companies like Zendesk, Bell Canada, and AT&T are already lined up to integrate these.

Finally, Meta gave a sneak peek at something potentially huge for user privacy: Private Processing. This is new tech they’re working on for WhatsApp. The idea is to let AI do helpful things like summarise your unread messages or help you draft replies, but without Meta or WhatsApp being able to read the content of those messages.

Meta is being quite open about the security side, even publishing their threat model and inviting security researchers to poke holes in the architecture before it ever goes live. It’s a sign they know they need to get the privacy aspect right.

Overall, it’s a broad set of AI security announcements from Meta. They’re clearly trying to put serious muscle behind securing the AI they build, while also giving the wider tech community better tools to build safely and defend effectively.

See also: Alarming rise in AI-powered scams: Microsoft reveals $4B in thwarted fraud

OpenAI’s latest LLM opens doors for China’s AI startups

At the Apsara Conference in Hangzhou, hosted by Alibaba Cloud, China’s AI startups emphasised their efforts to develop large language models.

The companies’ efforts follow the announcement of OpenAI’s latest LLMs, including the o1 generative pre-trained transformer model backed by Microsoft. The model is intended to tackle difficult tasks, paving the way for advances in science, coding, and mathematics.

During the conference, Yang Zhilin, founder of Moonshot AI, underlined the importance of the o1 model, adding that it has the potential to reshape various industries and create new opportunities for AI startups.

Zhilin stated that reinforcement learning and scalability might be pivotal for AI development. He spoke of the scaling law, which states that larger models with more training data perform better.
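
That scaling law is usually written in the Chinchilla form from Hoffmann et al. (2022) – a published result rather than anything presented at the conference:

```latex
% Chinchilla-style scaling law: expected loss L as a function of parameter
% count N and training tokens D, with fitted constants E (irreducible loss),
% A, B, \alpha and \beta. Loss falls -- i.e. capability rises -- as N or D grows.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```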

“This approach pushes the ceiling of AI capabilities,” Zhilin said, adding that OpenAI o1 has the potential to disrupt sectors and generate new opportunities for startups.

OpenAI has also stressed the model’s ability to solve complex problems, saying it operates in a manner similar to human thinking. By refining its strategies and learning from mistakes, the model improves its problem-solving capabilities.

Zhilin said companies with enough computing power will be able to innovate not only in algorithms, but also in foundational AI models. He sees this as pivotal, as AI engineers rely increasingly on reinforcement learning to generate new data after exhausting available organic data sources.

StepFun CEO Jiang Daxin concurred with Zhilin but stated that computational power remains a big challenge for many start-ups, particularly due to US trade restrictions that hinder Chinese enterprises’ access to advanced semiconductors.

“The computational requirements are still substantial,” Daxin stated.

An insider at Baichuan AI has said that only a small group of Chinese AI start-ups — including Moonshot AI, Baichuan AI, Zhipu AI, and MiniMax — are in a position to make large-scale investments in reinforcement learning. These companies — collectively referred to as the “AI tigers” — are involved heavily in LLM development, pushing the next generation of AI.

More from the Apsara Conference

Also at the conference, Alibaba Cloud made several announcements, including the release of its Qwen 2.5 model family, which features advances in coding and mathematics. The models range from 0.5 billion to 72 billion parameters and support approximately 29 languages, including Chinese, English, French, and Spanish.

Specialised models such as Qwen2.5-Coder and Qwen2.5-Math have already gained some traction, with over 40 million downloads on the Hugging Face and ModelScope platforms.

Alibaba Cloud added to its product portfolio, delivering a text-to-video model in its picture generator, Tongyi Wanxiang. The model can create videos in realistic and animated styles, with possible uses in advertising and filmmaking.

Alibaba Cloud unveiled Qwen 2-VL, the latest version of its vision language model. It handles videos longer than 20 minutes, supports video-based question-answering, and is optimised for mobile devices and robotics.

(Photo by: @Guy_AI_Wise via X)

Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost

Baidu has unveiled ERNIE X1 Turbo and 4.5 Turbo, two fast models that boast impressive performance alongside dramatic cost reductions.

Developed as enhancements to the existing ERNIE X1 and 4.5 models, both new Turbo versions highlight multimodal processing, robust reasoning skills, and aggressive pricing strategies designed to capture developer interest and market share.

Baidu ERNIE X1 Turbo: Deep reasoning meets cost efficiency

Positioned as a deep-thinking reasoning model, ERNIE X1 Turbo tackles complex tasks requiring sophisticated understanding. It enters a competitive field, claiming superior performance in some benchmarks against rivals like DeepSeek R1, V3, and OpenAI o1:

(Chart: benchmarks of Baidu ERNIE X1 Turbo compared to rival AI large language models like DeepSeek R1 and OpenAI o1.)

Key to X1 Turbo’s enhanced capabilities is an advanced “chain of thought” process, enabling more structured and logical problem-solving.

Furthermore, ERNIE X1 Turbo boasts improved multimodal functions – the ability to understand and process information beyond just text, potentially including images or other data types – alongside refined tool utilisation abilities. This makes it particularly well-suited for nuanced applications such as literary creation, complex logical reasoning challenges, code generation, and intricate instruction following.

ERNIE X1 Turbo achieves this performance while undercutting competitor pricing. Input token costs start at $0.14 per million tokens, with output tokens priced at $0.55 per million. This pricing structure is approximately 25% of DeepSeek R1’s.

Baidu ERNIE 4.5 Turbo: Multimodal muscle at a fraction of the cost

Sharing the spotlight is ERNIE 4.5 Turbo, which focuses on delivering upgraded multimodal features and significantly faster response times compared to its non-Turbo counterpart. The emphasis here is on providing a versatile, responsive AI experience while slashing operational costs.

The model achieves an 80% price reduction compared to the original ERNIE 4.5, with input priced at $0.11 per million tokens and output at $0.44 per million tokens. This represents roughly 40% of the cost of the latest version of DeepSeek V3, again highlighting a deliberate strategy to attract users through cost-effectiveness.
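
Plugging those per-million-token prices into a quick back-of-envelope calculation shows how the two Turbo tiers compare on a hypothetical workload (the 50M-input/10M-output monthly volume is invented for illustration):

```python
# Back-of-envelope cost check using the per-million-token prices quoted above.
# The 50M-in / 10M-out monthly workload is an invented example.
PRICES = {                      # (input $/M tokens, output $/M tokens)
    "ERNIE X1 Turbo":  (0.14, 0.55),
    "ERNIE 4.5 Turbo": (0.11, 0.44),
}

def monthly_cost(model, input_m=50, output_m=10):
    inp, out = PRICES[model]
    return input_m * inp + output_m * out

for name in PRICES:
    print(f"{name}: ${monthly_cost(name):.2f}/month")
# ERNIE X1 Turbo: $12.50/month
# ERNIE 4.5 Turbo: $9.90/month
```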

Performance benchmarks further bolster its credentials. In multiple tests evaluating both multimodal and text capabilities, Baidu ERNIE 4.5 Turbo outperforms OpenAI’s highly-regarded GPT-4o model. 

In multimodal capability assessments, ERNIE 4.5 Turbo achieved an average score of 77.68 to surpass GPT-4o’s score of 72.76 in the same tests.

(Chart: benchmarks of Baidu ERNIE 4.5 Turbo compared to rival AI large language models.)

While benchmark results always require careful interpretation, this suggests ERNIE 4.5 Turbo is a serious contender for tasks involving an integrated understanding of different data types.

Baidu continues to shake up the AI marketplace

The launch of ERNIE X1 Turbo and 4.5 Turbo signifies a growing trend in the AI sector: the democratisation of high-end capabilities. While foundational models continue to push the boundaries of performance, there is increasing demand for models that balance power with accessibility and affordability.

By lowering the price points for models with sophisticated reasoning and multimodal features, the Baidu ERNIE Turbo series could enable a wider range of developers and businesses to integrate advanced AI into their applications.

This competitive pricing puts pressure on established players like OpenAI and Anthropic, as well as emerging competitors like DeepSeek, potentially leading to further price adjustments across the market.

(Image Credit: Alpha Photo under CC BY-NC 2.0 license)

See also: China’s MCP adoption: AI assistants that actually do things

Alarming rise in AI-powered scams: Microsoft reveals $4B in thwarted fraud

AI-powered scams are evolving rapidly as cybercriminals use new technologies to target victims, according to Microsoft’s latest Cyber Signals report.

Over the past year, the tech giant says it has prevented $4 billion in fraud attempts, blocking approximately 1.6 million bot sign-up attempts every hour – showing the scale of this growing threat.

The ninth edition of Microsoft’s Cyber Signals report, titled “AI-powered deception: Emerging fraud threats and countermeasures,” reveals how artificial intelligence has lowered the technical barriers for cybercriminals, enabling even low-skilled actors to generate sophisticated scams with minimal effort.

What previously took scammers days or weeks to create can now be accomplished in minutes.

The democratisation of fraud capabilities represents a shift in the criminal landscape that affects consumers and businesses worldwide.

The evolution of AI-enhanced cyber scams

Microsoft’s report highlights how AI tools can now scan and scrape the web for company information, helping cybercriminals build detailed profiles of potential targets for highly-convincing social engineering attacks.

Bad actors can lure victims into complex fraud schemes using fake AI-enhanced product reviews and AI-generated storefronts, which come complete with fabricated business histories and customer testimonials.

According to Kelly Bissell, Corporate Vice President of Anti-Fraud and Product Abuse at Microsoft Security, the threat numbers continue to increase. “Cybercrime is a trillion-dollar problem, and it’s been going up every year for the past 30 years,” Bissell said in the report.

“I think we have an opportunity today to adopt AI faster so we can detect and close the gap of exposure quickly. Now we have AI that can make a difference at scale and help us build security and fraud protections into our products much faster.”

The Microsoft anti-fraud team reports that AI-powered fraud attacks happen globally, with significant activity originating from China and Europe – particularly Germany, due to its status as one of the largest e-commerce markets in the European Union.

The report notes that the larger a digital marketplace is, the more likely a proportional degree of attempted fraud will occur.

E-commerce and employment scams leading

Two particularly concerning areas of AI-enhanced fraud include e-commerce and job recruitment scams. In the e-commerce space, fraudulent websites can now be created in minutes using AI tools with minimal technical knowledge.

Sites often mimic legitimate businesses, using AI-generated product descriptions, images, and customer reviews to fool consumers into believing they’re interacting with genuine merchants.

Adding another layer of deception, AI-powered customer service chatbots can interact convincingly with customers, delay chargebacks by stalling with scripted excuses, and manipulate complaints with AI-generated responses that make scam sites appear professional.

Job seekers are equally at risk. According to the report, generative AI has made it significantly easier for scammers to create fake listings on various employment platforms. Criminals generate fake profiles with stolen credentials, fake job postings with auto-generated descriptions, and AI-powered email campaigns to phish job seekers.

AI-powered interviews and automated emails enhance the credibility of these scams, making them harder to identify. “Fraudsters often ask for personal information, like resumes or even bank account details, under the guise of verifying the applicant’s information,” the report says.

Red flags include unsolicited job offers, requests for payment and communication through informal platforms like text messages or WhatsApp.

Microsoft’s countermeasures to AI fraud

To combat emerging threats, Microsoft says it has implemented a multi-pronged approach across its products and services. Microsoft Defender for Cloud provides threat protection for Azure resources, while Microsoft Edge, like many browsers, features website typo protection and domain impersonation protection. Edge is noted by the Microsoft report as using deep learning technology to help users avoid fraudulent websites.

The company has also enhanced Windows Quick Assist with warning messages to alert users about possible tech support scams before they grant access to someone claiming to be from IT support. Microsoft now blocks an average of 4,415 suspicious Quick Assist connection attempts daily.

Microsoft has also introduced a new fraud prevention policy as part of its Secure Future Initiative (SFI). As of January 2025, Microsoft product teams must perform fraud prevention assessments and implement fraud controls as part of their design process, ensuring products are “fraud-resistant by design.”

As AI-powered scams continue to evolve, consumer awareness remains important. Microsoft advises users to be cautious of urgency tactics, verify website legitimacy before making purchases, and never provide personal or financial information to unverified sources.

For enterprises, implementing multi-factor authentication and deploying deepfake-detection algorithms can help mitigate risk.

See also: Wozniak warns AI will power next-gen scams

Coalition opposes OpenAI shift from nonprofit roots

A coalition of experts, including former OpenAI employees, has voiced strong opposition to the company’s shift away from its nonprofit roots.

In an open letter addressed to the Attorneys General of California and Delaware, the group – which also includes legal experts, corporate governance specialists, AI researchers, and nonprofit representatives – argues that the proposed changes fundamentally threaten OpenAI’s original charitable mission.   

OpenAI was founded with a unique structure. Its core purpose, enshrined in its Articles of Incorporation, is “to ensure that artificial general intelligence benefits all of humanity” rather than serving “the private gain of any person.”

The letter’s signatories contend that the planned restructuring – transforming the current for-profit subsidiary (OpenAI-profit) controlled by the original nonprofit entity (OpenAI-nonprofit) into a Delaware public benefit corporation (PBC) – would dismantle crucial governance safeguards.

This shift, the signatories argue, would transfer ultimate control over the development and deployment of potentially transformative Artificial General Intelligence (AGI) from a charity focused on humanity’s benefit to a for-profit enterprise accountable to shareholders.

Original vision of OpenAI: Nonprofit control as a bulwark

OpenAI defines AGI as “highly autonomous systems that outperform humans at most economically valuable work”. While acknowledging AGI’s potential to “elevate humanity,” OpenAI’s leadership has also warned of “serious risk of misuse, drastic accidents, and societal disruption.”

Co-founder Sam Altman and others have even signed statements equating mitigating AGI extinction risks with preventing pandemics and nuclear war.   

The company’s founders – including Altman, Elon Musk, and Greg Brockman – were initially concerned about AGI being developed by purely commercial entities like Google. They established OpenAI as a nonprofit specifically “unconstrained by a need to generate financial return”. As Altman stated in 2017, “The only people we want to be accountable to is humanity as a whole.”

Even when OpenAI introduced a “capped-profit” subsidiary in 2019 to attract necessary investment, it emphasised that the nonprofit parent would retain control and that the mission remained paramount. Key safeguards included:   

  • Nonprofit control: The for-profit subsidiary was explicitly “controlled by OpenAI Nonprofit’s board”.   
  • Capped profits: Investor returns were capped, with excess value flowing back to the nonprofit for humanity’s benefit.   
  • Independent board: A majority of nonprofit board members were required to be independent, holding no financial stake in the subsidiary.   
  • Fiduciary duty: The board’s legal duty was solely to the nonprofit’s mission, not to maximising investor profit.   
  • AGI ownership: AGI technologies were explicitly reserved for the nonprofit to govern.

Altman himself testified to Congress in 2023 that this “unusual structure” “ensures it remains focused on [its] long-term mission.”

A threat to the mission?

The critics argue the move to a PBC structure would jeopardise these safeguards:   

  • Subordination of mission: A PBC board – while able to consider public benefit – would also have duties to shareholders, potentially balancing profit against the mission rather than prioritising the mission above all else.   
  • Loss of enforceable duty: The current structure gives Attorneys General the power to enforce the nonprofit’s duty to the public. Under a PBC, this direct public accountability – enforceable by regulators – would likely vanish, leaving shareholder derivative suits as the primary enforcement mechanism.   
  • Uncapped profits?: Reports suggest the profit cap might be removed, potentially reallocating vast future wealth from the public benefit mission to private shareholders.   
  • Board independence uncertain: Commitments to a majority-independent board overseeing AI development could disappear.   
  • AGI control shifts: Ownership and control of AGI would likely default to the PBC and its investors, not the mission-focused nonprofit. Reports even suggest OpenAI and Microsoft have discussed removing contractual restrictions on Microsoft’s access to future AGI.   
  • Charter commitments at risk: Commitments like the “stop-and-assist” clause (pausing competition to help a safer, aligned AGI project) might not be honoured by a profit-driven entity.  

OpenAI has publicly cited competitive pressures (i.e. attracting investment and talent against rivals with conventional equity structures) as reasons for the change.

However, the letter counters that competitive advantage isn’t the charitable purpose of OpenAI and that its unique nonprofit structure was designed to impose certain competitive costs in favour of safety and public benefit. 

“Obtaining a competitive advantage by abandoning the very governance safeguards designed to ensure OpenAI remains true to its mission is unlikely to, on balance, advance the mission,” the letter states.   

The authors also question why OpenAI abandoning nonprofit control is necessary merely to simplify the capital structure, suggesting the core issue is the subordination of investor interests to the mission. They argue that while the nonprofit board can consider investor interests if it serves the mission, the restructuring appears aimed at allowing these interests to prevail at the expense of the mission.

Many of these arguments have also been pushed by Elon Musk in his legal action against OpenAI. Earlier this month, OpenAI counter-sued Musk for allegedly orchestrating a “relentless” and “malicious” campaign designed to “take down OpenAI” after he left the company years ago and started rival AI firm xAI.

Call for intervention

The signatories of the open letter urge intervention, demanding answers from OpenAI about how the restructuring away from a nonprofit serves its mission and why safeguards previously deemed essential are now obstacles.

Furthermore, the signatories request a halt to the restructuring, preservation of nonprofit control and other safeguards, and measures to ensure the board’s independence and ability to oversee management effectively in line with the charitable purpose.

“The proposed restructuring would eliminate essential safeguards, effectively handing control of, and profits from, what could be the most powerful technology ever created to a for-profit entity with legal duties to prioritise shareholder returns,” the signatories conclude.

See also: How does AI judge? Anthropic studies the values of Claude

How does AI judge? Anthropic studies the values of Claude

AI models like Anthropic Claude are increasingly asked not just for factual recall, but for guidance involving complex human values. Whether it’s parenting advice, workplace conflict resolution, or help drafting an apology, the AI’s response inherently reflects a set of underlying principles. But how can we truly understand which values an AI expresses when interacting with millions of users?

In a research paper, the Societal Impacts team at Anthropic details a privacy-preserving methodology designed to observe and categorise the values Claude exhibits “in the wild.” This offers a glimpse into how AI alignment efforts translate into real-world behaviour.

The core challenge lies in the nature of modern AI. These aren’t simple programs following rigid rules; their decision-making processes are often opaque.

Anthropic says it explicitly aims to instil certain principles in Claude, striving to make it “helpful, honest, and harmless.” This is achieved through techniques like Constitutional AI and character training, where preferred behaviours are defined and reinforced.

However, the company acknowledges the uncertainty. “As with any aspect of AI training, we can’t be certain that the model will stick to our preferred values,” the research states.

“What we need is a way of rigorously observing the values of an AI model as it responds to users ‘in the wild’ […] How rigidly does it stick to the values? How much are the values it expresses influenced by the particular context of the conversation? Did all our training actually work?”

Analysing Anthropic Claude to observe AI values at scale

To answer these questions, Anthropic developed a sophisticated system that analyses anonymised user conversations. This system removes personally identifiable information before using language models to summarise interactions and extract the values being expressed by Claude. The process allows researchers to build a high-level taxonomy of these values without compromising user privacy.
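
Anthropic’s tooling isn’t public, but the pipeline it describes – strip identifying details, summarise, extract the expressed values, then roll them up into a taxonomy – can be sketched as follows, with every helper (`strip_pii`, `llm`, `cluster`) an invented stand-in:

```python
# Schematic of a privacy-preserving value-extraction pipeline like the one described.
# All helpers (strip_pii, llm, cluster) are illustrative stand-ins, not Anthropic's code.
from collections import Counter

def extract_values(conversations, strip_pii, llm, cluster):
    observed = []
    for convo in conversations:
        clean = strip_pii(convo)                   # drop personally identifiable info first
        summary = llm(f"Summarise this exchange: {clean}")
        values = llm("List the values the assistant expresses here "
                     f"(e.g. 'transparency', 'patient wellbeing'): {summary}")
        observed.extend(v.strip().lower() for v in values.split(","))
    # Group granular values into a higher-level taxonomy and count their prevalence.
    taxonomy = cluster(observed)                   # e.g. practical, epistemic, social...
    return taxonomy, Counter(observed)
```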

The study analysed a substantial dataset: 700,000 anonymised conversations from Claude.ai Free and Pro users over one week in February 2025, predominantly involving the Claude 3.5 Sonnet model. After filtering out purely factual or non-value-laden exchanges, 308,210 conversations (approximately 44% of the total) remained for in-depth value analysis.

The analysis revealed a hierarchical structure of values expressed by Claude. Five high-level categories emerged, ordered by prevalence:

  1. Practical values: Emphasising efficiency, usefulness, and goal achievement.
  2. Epistemic values: Relating to knowledge, truth, accuracy, and intellectual honesty.
  3. Social values: Concerning interpersonal interactions, community, fairness, and collaboration.
  4. Protective values: Focusing on safety, security, well-being, and harm avoidance.
  5. Personal values: Centred on individual growth, autonomy, authenticity, and self-reflection.

These top-level categories branched into more specific subcategories like “professional and technical excellence” or “critical thinking.” At the most granular level, frequently observed values included “professionalism,” “clarity,” and “transparency” – fitting for an AI assistant.

Critically, the research suggests Anthropic’s alignment efforts are broadly successful. The expressed values often map well onto the “helpful, honest, and harmless” objectives. For instance, “user enablement” aligns with helpfulness, “epistemic humility” with honesty, and values like “patient wellbeing” (when relevant) with harmlessness.

Nuance, context, and cautionary signs

However, the picture isn’t uniformly positive. The analysis identified rare instances where Claude expressed values starkly opposed to its training, such as “dominance” and “amorality.”

Anthropic suggests a likely cause: “The most likely explanation is that the conversations that were included in these clusters were from jailbreaks, where users have used special techniques to bypass the usual guardrails that govern the model’s behavior.”

Far from being solely a concern, this finding highlights a potential benefit: the value-observation method could serve as an early warning system for detecting attempts to misuse the AI.

The study also confirmed that, much like humans, Claude adapts its value expression based on the situation.

When users sought advice on romantic relationships, values like “healthy boundaries” and “mutual respect” were disproportionately emphasised. When asked to analyse controversial history, “historical accuracy” came strongly to the fore. This demonstrates a level of contextual sophistication beyond what static, pre-deployment tests might reveal.

Furthermore, Claude’s interaction with user-expressed values proved multifaceted:

  • Mirroring/strong support (28.2%): Claude often reflects or strongly endorses the values presented by the user (e.g., mirroring “authenticity”). While potentially fostering empathy, the researchers caution it could sometimes verge on sycophancy.
  • Reframing (6.6%): In some cases, especially when providing psychological or interpersonal advice, Claude acknowledges the user’s values but introduces alternative perspectives.
  • Strong resistance (3.0%): Occasionally, Claude actively resists user values. This typically occurs when users request unethical content or express harmful viewpoints (like moral nihilism). Anthropic posits these moments of resistance might reveal Claude’s “deepest, most immovable values,” akin to a person taking a stand under pressure.

Limitations and future directions

Anthropic is candid about the method’s limitations. Defining and categorising “values” is inherently complex and potentially subjective. Using Claude itself to power the categorisation might introduce bias towards its own operational principles.

This method is designed for monitoring AI behaviour post-deployment, requiring substantial real-world data and cannot replace pre-deployment evaluations. However, this is also a strength, enabling the detection of issues – including sophisticated jailbreaks – that only manifest during live interactions.

The research concludes that understanding the values AI models express is fundamental to the goal of AI alignment.

“AI models will inevitably have to make value judgments,” the paper states. “If we want those judgments to be congruent with our own values […] then we need to have ways of testing which values a model expresses in the real world.”

This work provides a powerful, data-driven approach to achieving that understanding. Anthropic has also released an open dataset derived from the study, allowing other researchers to further explore AI values in practice. This transparency marks a vital step in collectively navigating the ethical landscape of sophisticated AI.

See also: Google introduces AI reasoning control in Gemini 2.5 Flash

China’s MCP adoption: AI assistants that actually do things

China’s tech companies will drive adoption of the MCP (Model Context Protocol) standard that transforms AI assistants from simple chatbots into powerful digital helpers.

MCP works like a universal connector that lets AI assistants interact directly with favourite apps and services – enabling them to make payments, book appointments, check maps, and access information on different platforms on users’ behalf.

As reported by the South China Morning Post, companies like Ant Group, Alibaba Cloud, and Baidu are deploying MCP-based services and positioning AI agents as the next step, after chatbots and large language models. But will China’s MCP adoption truly transform the AI landscape, or is it simply another step in the technology’s evolution?

Why China’s MCP adoption matters for AI’s evolution

The Model Context Protocol was initially introduced by Anthropic in November 2024, at the time described as a standard that connects AI agents “to the systems where data lives, including content repositories, business tools and development environments.”

MCP serves as what Ant Group calls a “USB-C port for AI applications” – a universal connector allowing AI agents to integrate with multiple systems.

The standardisation is particularly significant for AI agents like Butterfly Effect’s Manus, which are designed to autonomously perform tasks by creating plans consisting of specific subtasks using available resources.

Unlike traditional chatbots that just respond to queries, AI agents can actively interact with different systems, collect feedback, and incorporate that feedback into new actions.
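
To make the connector idea concrete, the sketch below shows roughly what an MCP-style integration looks like from a developer’s perspective: a server describes a tool with a machine-readable schema so an AI agent can discover it and then call it with structured arguments. This is an illustrative Python sketch only; the payment-status tool, its parameters, and the in-process dispatch are hypothetical stand-ins, not code from Anthropic’s MCP SDK or Ant Group’s actual Alipay integration.

```python
import json
from typing import Any, Callable, Dict

# Illustrative MCP-style tool server: each tool is described with a schema so an
# AI agent can discover available actions ("list_tools") and invoke one ("call_tool").
# The payment-status tool is hypothetical, not Alipay's real API.

TOOLS: Dict[str, Dict[str, Any]] = {}
HANDLERS: Dict[str, Callable[..., Any]] = {}

def tool(name: str, description: str, parameters: Dict[str, Any]):
    """Register a function as a callable tool with a machine-readable schema."""
    def decorator(fn: Callable[..., Any]):
        TOOLS[name] = {"name": name, "description": description, "parameters": parameters}
        HANDLERS[name] = fn
        return fn
    return decorator

@tool(
    name="check_payment_status",
    description="Look up the status of a payment by order ID.",
    parameters={"order_id": {"type": "string"}},
)
def check_payment_status(order_id: str) -> Dict[str, str]:
    # A real server would query the payment platform here.
    return {"order_id": order_id, "status": "PAID"}

def list_tools() -> str:
    """What the agent sees when it asks which actions this server offers."""
    return json.dumps(list(TOOLS.values()), indent=2)

def call_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Dispatch a tool call requested by the agent and return a JSON result."""
    return json.dumps(HANDLERS[name](**arguments))

if __name__ == "__main__":
    print(list_tools())
    print(call_tool("check_payment_status", {"order_id": "2025-0423-001"}))
```

The design point the protocol standardises is that the agent never calls Alipay, Amap, or any other service directly; it only sees the advertised schemas and routes every action through the same interface.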

Chinese tech giants lead the MCP movement

China’s MCP adoption by tech leaders highlights the importance placed on AI agents as the next evolution in artificial intelligence:

  • Ant Group, Alibaba’s fintech affiliate, has unveiled its “MCP server for payment services,” which lets AI agents connect with Alipay’s payment platform. The integration allows users to “easily make payments, check payment statuses and initiate refunds using simple natural language commands,” according to Ant Group’s statement.
  • Additionally, Ant Group’s AI agent development platform, Tbox, now supports deployment of more than 30 MCP services currently on the market, including those for Alipay, Amap Maps, Google MCP, and Amazon Web Services’ knowledge base retrieval server.
  • Alibaba Cloud launched an MCP marketplace through its AI model hosting platform ModelScope, offering more than 1,000 services connecting to mapping tools, office collaboration platforms, online storage services, and various Google services.
  • Baidu, China’s leading search and AI company, has indicated that its support for MCP would foster “abundant use cases for [AI] applications and solutions.”

Beyond chatbots: Why AI agents represent the next frontier

China’s MCP adoption signals a shift in focus from large language models and chatbots to more capable AI agents. As Red Xiao Hong, founder and CEO of Butterfly Effect, described it, an AI agent is “more like a human being” in how it operates, compared with a chatbot.

The agents not only respond to questions but “interact with the environment, collect feedback and use the feedback as a new prompt.” Companies driving progress in AI consider this distinction important.

While chatbots and LLMs can generate text and respond to queries, AI agents can take actions on multiple platforms and services. They represent an advance from the limited capabilities of conventional AI applications toward autonomous systems capable of completing more complex tasks with less human intervention.

The rapid embrace of MCP by Chinese tech companies suggests they view AI agents as a new avenue for innovation and commercial opportunity that goes beyond what’s possible with existing chatbots and language models.

China’s MCP adoption could position its tech companies at the forefront of practical AI implementation. By creating standardised ways for AI agents to interact with services, Chinese companies are building ecosystems where AI could deliver more comprehensive experiences.

Challenges and considerations of China’s MCP adoption

Despite the developments in China’s MCP adoption, several factors may influence the standard’s longer-term impact:

  1. International standards competition. While Chinese tech companies are racing to implement MCP, its global success depends on widespread adoption. Originally developed by Anthropic, the protocol faces potential competition from alternative standards that might emerge from other major AI players like OpenAI, Google, or Microsoft.
  2. Regulatory environments. As AI agents gain more autonomy in performing tasks, especially those involving payments and sensitive user data, regulatory scrutiny will inevitably increase. China’s regulatory landscape for AI is still evolving, and how authorities respond to these advancements will significantly impact MCP’s trajectory.
  3. Security and privacy. The integration of AI agents with multiple systems via MCP creates new potential vulnerabilities. Ensuring robust security measures across all connected platforms will be important for maintaining user trust.
  4. Technical integration challenges. While the concept of universal connectivity is appealing, achieving integration across diverse systems with varying architectures, data structures, and security protocols presents significant technical challenges.

The outlook for China’s AI ecosystem

China’s MCP adoption represents a strategic bet on AI agents as the next evolution in artificial intelligence. If successful, it could accelerate the practical implementation of AI in everyday applications, potentially transforming how users interact with digital services.

As Red Xiao Hong noted, AI agents are designed to interact with their environment in ways that more closely resemble human behaviour than traditional AI applications. The capacity for interaction and adaptation could be what finally bridges the gap between narrow AI tools and the more generalised assistants that tech companies have long promised.

See also: Manus AI agent: breakthrough in China’s agentic AI

The post China’s MCP adoption: AI assistants that actually do things appeared first on AI News.

Google introduces AI reasoning control in Gemini 2.5 Flash (23 April 2025)

Google has introduced an AI reasoning control mechanism for its Gemini 2.5 Flash model that allows developers to limit how much processing power the system expends on problem-solving.

Released on April 17, this “thinking budget” feature responds to a growing industry challenge: advanced AI models frequently overanalyse straightforward queries, consuming unnecessary computational resources and driving up operational and environmental costs.

While not revolutionary, the development represents a practical step toward addressing efficiency concerns that have emerged as reasoning capabilities become standard in commercial AI software.

The new mechanism enables precise calibration of processing resources before generating responses, potentially changing how organisations manage financial and environmental impacts of AI deployment.

“The model overthinks,” acknowledges Tulsee Doshi, Director of Product Management at Gemini. “For simple prompts, the model does think more than it needs to.”

The admission reveals the challenge facing advanced reasoning models – the equivalent of using industrial machinery to crack a walnut.

The shift toward reasoning capabilities has created unintended consequences. Where traditional large language models primarily matched patterns from training data, newer iterations attempt to work through problems logically, step by step. While this approach yields better results for complex tasks, it introduces significant inefficiency when handling simpler queries.

Balancing cost and performance

The financial implications of unchecked AI reasoning are substantial. According to Google’s technical documentation, when full reasoning is activated, generating outputs becomes approximately six times more expensive than standard processing. The cost multiplier creates a powerful incentive for fine-tuned control.

Nathan Habib, an engineer at Hugging Face who studies reasoning models, describes the problem as endemic across the industry. “In the rush to show off smarter AI, companies are reaching for reasoning models like hammers even where there’s no nail in sight,” he explained to MIT Technology Review.

The waste isn’t merely theoretical. Habib demonstrated how a leading reasoning model, when attempting to solve an organic chemistry problem, became trapped in a recursive loop, repeating “Wait, but…” hundreds of times – essentially experiencing a computational breakdown and consuming processing resources.

Kate Olszewska, who evaluates Gemini models at DeepMind, confirmed Google’s systems sometimes experience similar issues, getting stuck in loops that drain computing power without improving response quality.

Granular control mechanism

Google’s AI reasoning control provides developers with precise control over how much the model deliberates. The system offers a flexible spectrum ranging from zero (minimal reasoning) to 24,576 tokens of “thinking budget” – the computational units representing the model’s internal processing. The granular approach allows for customised deployment based on specific use cases.
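
In practice, the budget is just another request parameter. The snippet below is a minimal sketch of how a per-request thinking budget might be set using the google-genai Python SDK; the preview model name and the exact ThinkingConfig field names are assumptions that may differ from the current API, so treat this as a sketch rather than a definitive integration.

```python
# Hedged sketch: setting a per-request "thinking budget" with the google-genai
# Python SDK. The model name and ThinkingConfig fields are assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder credential

def ask(prompt: str, thinking_budget: int) -> str:
    """Send a prompt with an explicit cap on reasoning tokens (0 = minimal reasoning)."""
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",  # assumed preview model identifier
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=thinking_budget)
        ),
    )
    return response.text

# Cheap path for a simple lookup, larger budget for a genuinely hard problem.
print(ask("What is the capital of France?", thinking_budget=0))
print(ask("Plan a test strategy for a distributed payments service.", thinking_budget=8192))
```

Setting the budget to zero approximates standard, non-reasoning behaviour for simple lookups, while larger budgets (up to the 24,576-token ceiling) leave headroom for multi-step problems.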

Jack Rae, principal research scientist at DeepMind, says that defining optimal reasoning levels remains challenging: “It’s really hard to draw a boundary on, like, what’s the perfect task right now for thinking.”

Shifting development philosophy

The introduction of AI reasoning control potentially signals a change in how artificial intelligence evolves. Since 2019, companies have pursued improvements by building larger models with more parameters and training data. Google’s approach suggests an alternative path focusing on efficiency rather than scale.

“Scaling laws are being replaced,” says Habib, indicating that future advances may emerge from optimising reasoning processes rather than continuously expanding model size.

The environmental implications are equally significant. As reasoning models proliferate, their energy consumption grows proportionally. Research indicates that inferencing – generating AI responses – now contributes more to the technology’s carbon footprint than the initial training process. Google’s reasoning control mechanism offers a potential mitigating factor for this concerning trend.

Competitive dynamics

Google isn’t operating in isolation. The “open weight” DeepSeek R1 model, which emerged earlier this year, demonstrated powerful reasoning capabilities at potentially lower costs, triggering market volatility that reportedly wiped close to a trillion dollars off stock market valuations.

Unlike Google’s proprietary approach, DeepSeek makes its internal settings publicly available for developers to implement locally.

Despite the competition, Google DeepMind’s chief technology officer, Koray Kavukcuoglu, maintains that proprietary models will retain advantages in specialised domains requiring exceptional precision: “Coding, math, and finance are cases where there’s high expectation from the model to be very accurate, to be very precise, and to be able to understand really complex situations.”

Industry maturation signs

The development of AI reasoning control reflects an industry now confronting practical limitations beyond technical benchmarks. While companies continue to push reasoning capabilities forward, Google’s approach acknowledges an important reality: efficiency matters as much as raw performance in commercial applications.

The feature also highlights tensions between technological advancement and sustainability concerns. Leaderboards tracking reasoning model performance show that single tasks can cost upwards of $200 to complete – raising questions about scaling such capabilities in production environments.

By allowing developers to dial reasoning up or down based on actual need, Google addresses both financial and environmental aspects of AI deployment.

“Reasoning is the key capability that builds up intelligence,” states Kavukcuoglu. “The moment the model starts thinking, the agency of the model has started.” The statement reveals both the promise and the challenge of reasoning models – their autonomy creates both opportunities and resource management challenges.

For organisations deploying AI solutions, the ability to fine-tune reasoning budgets could democratise access to advanced capabilities while maintaining operational discipline.

Google claims Gemini 2.5 Flash delivers “comparable metrics to other leading models for a fraction of the cost and size” – a value proposition strengthened by the ability to optimise reasoning resources for specific applications.

Practical implications

The AI reasoning control feature has immediate practical applications. Developers building commercial applications can now make informed trade-offs between processing depth and operational costs.

For simple applications like basic customer queries, minimal reasoning settings preserve resources while still using the model’s capabilities. For complex analysis requiring deep understanding, the full reasoning capacity remains available.

Google’s reasoning ‘dial’ provides a mechanism for establishing cost certainty while maintaining performance standards.

See also: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date

The post Google introduces AI reasoning control in Gemini 2.5 Flash appeared first on AI News.

Huawei to begin mass shipments of Ascend 910C amid US curbs (23 April 2025)

Huawei is expected to begin large-scale shipments of the Ascend 910C AI chip as early as next month, according to people familiar with the matter.

While limited quantities have already been delivered, mass deployment would mark an important step for Chinese firms seeking domestic alternatives to US-made semiconductors.

The move comes at a time when Chinese developers face tighter restrictions on access to Nvidia hardware. The US government recently informed Nvidia that sales of its H20 AI chip to China require an export licence. That’s left developers in China looking for options that can support large-scale training and inference workloads.

The Huawei Ascend 910C chip isn’t built on the most advanced process nodes, but it represents a workaround. The chip is essentially a dual-package version of the earlier 910B, combining two processors in one package to double performance and memory. Sources familiar with the chip say it performs comparably to Nvidia’s H100.

Rather than relying on cutting-edge manufacturing, Huawei has adopted a brute-force approach, combining multiple chips and high-speed optical interconnects to scale up performance. This approach is central to Huawei’s CloudMatrix 384 system, a full rack-scale AI platform for training large models.

The CloudMatrix 384 features 384 Huawei Ascend 910C chips deployed across 16 racks: 12 compute racks and four networking racks. Unlike copper-based systems, Huawei’s platform uses optical interconnects, enabling high-bandwidth communication between components of the system. According to analysis from SemiAnalysis, the architecture includes 6,912 800G LPO optical transceivers to form an optical all-to-all mesh network.

This allows Huawei’s system to deliver approximately 300 petaFLOPs of BF16 compute power – outpacing Nvidia’s GB200 NVL72 system, which reaches around 180 BF16 petaFLOPs. The CloudMatrix also claims advantages in higher memory bandwidth and capacity, offering more than double the bandwidth and over 3.6 times the high-bandwidth memory (HBM) capacity.

The gains, however, are not without drawbacks. The Huawei system is predicted to be 2.3 times less efficient per floating point operation than Nvidia’s GB200 and has lower power efficiency per unit of memory bandwidth and capacity. Despite the lower performance per watt, Huawei’s system still provides the infrastructure needed to train advanced AI models at scale.
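
A back-of-envelope calculation using only the figures quoted above (plus the fact that a GB200 NVL72 rack contains 72 GPUs) makes the brute-force trade-off explicit: Huawei reaches a higher system total by deploying roughly five times as many, individually weaker accelerators.

```python
# Back-of-envelope comparison using the figures cited in this article.
# System totals in BF16 petaFLOPs; GB200 NVL72 is assumed to contain 72 GPUs.
huawei_total_pflops, huawei_chips = 300, 384
nvidia_total_pflops, nvidia_gpus = 180, 72

huawei_per_chip = huawei_total_pflops / huawei_chips  # ~0.78 petaFLOPs per Ascend 910C
nvidia_per_gpu = nvidia_total_pflops / nvidia_gpus    # ~2.5 petaFLOPs per Blackwell GPU

print(f"Per-accelerator BF16: Ascend 910C ~ {huawei_per_chip:.2f} PF, GB200 GPU ~ {nvidia_per_gpu:.2f} PF")
print(f"System-level lead for CloudMatrix: {huawei_total_pflops / nvidia_total_pflops:.2f}x "
      f"using {huawei_chips / nvidia_gpus:.1f}x as many accelerators")
```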

Sources indicate that China’s largest chip foundry, SMIC, is producing some of the main components for the 910C using its 7nm N+2 process. Yield levels remain a concern, however, and some of the 910C units reportedly include chips produced by TSMC for Chinese firm Sophgo. Huawei has denied using TSMC-made parts.

The US Commerce Department is currently investigating the relationship between TSMC and Sophgo after a Sophgo-designed chip was found in Huawei’s earlier 910B processor. TSMC has maintained that it has not supplied Huawei since 2020 and continues to comply with export regulations.

In late 2024, Huawei began distributing early samples of the 910C to selected technology firms and opened its order books. Consulting firm Albright Stonebridge Group suggested the chip is likely to become the go-to choice for Chinese companies building large AI models or deploying inference capacity, given the ongoing export controls on US-made chips.

While the Huawei Ascend 910C may not match Nvidia in power efficiency or process technology, it signals a broader trend. Chinese technology firms are developing homegrown alternatives to foreign components, even if it means using less advanced methods to achieve similar outcomes.

As global AI demand surges and export restrictions tighten, Huawei’s ability to deliver a scalable AI hardware solution domestically could help shape China’s artificial intelligence future – especially as developers look to secure long-term supply chains and reduce exposure to geopolitical risk.

(Photo via Unsplash)

See also: Huawei’s AI hardware breakthrough challenges Nvidia’s dominance

The post Huawei to begin mass shipments of Ascend 910C amid US curbs appeared first on AI News.

Meta FAIR advances human-like AI with five major releases (17 April 2025)

The Fundamental AI Research (FAIR) team at Meta has announced five projects advancing the company’s pursuit of advanced machine intelligence (AMI).

The latest releases from Meta focus heavily on enhancing AI perception – the ability for machines to process and interpret sensory information – alongside advancements in language modelling, robotics, and collaborative AI agents.

Meta stated its goal involves creating machines “that are able to acquire, process, and interpret sensory information about the world around us and are able to use this information to make decisions with human-like intelligence and speed.”

The five new releases represent diverse but interconnected efforts towards achieving this ambitious goal.

Perception Encoder: Meta sharpens the ‘vision’ of AI

Central to the new releases is the Perception Encoder, described as a large-scale vision encoder designed to excel across various image and video tasks.

Vision encoders function as the “eyes” for AI systems, allowing them to understand visual data.

Meta highlights the increasing challenge of building encoders that meet the demands of advanced AI, requiring capabilities that bridge vision and language, handle both images and videos effectively, and remain robust under challenging conditions, including potential adversarial attacks.

The ideal encoder, according to Meta, should recognise a wide array of concepts while distinguishing subtle details; the company cites examples like spotting “a stingray burrowed under the sea floor, identifying a tiny goldfinch in the background of an image, or catching a scampering agouti on a night vision wildlife camera.”

Meta claims the Perception Encoder achieves “exceptional performance on image and video zero-shot classification and retrieval, surpassing all existing open source and proprietary models for such tasks.”

Furthermore, its perceptual strengths reportedly translate well to language tasks. 

When aligned with a large language model (LLM), the encoder is said to outperform other vision encoders in areas like visual question answering (VQA), captioning, document understanding, and grounding (linking text to specific image regions). It also reportedly boosts performance on tasks traditionally difficult for LLMs, such as understanding spatial relationships (e.g., “if one object is behind another”) or camera movement relative to an object.
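
“Zero-shot classification” here means scoring an image against arbitrary text labels with no task-specific training. The snippet below illustrates that general recipe using an open-source CLIP-style model from the Hugging Face transformers library as a stand-in; it is not the Perception Encoder’s own API, and the image path is a placeholder.

```python
# Generic zero-shot image classification with a CLIP-style dual encoder.
# An open-source stand-in model is used here purely to illustrate the technique;
# it is not Meta's Perception Encoder.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("wildlife_camera_frame.jpg")  # placeholder image path
labels = [
    "a stingray burrowed under the sea floor",
    "a tiny goldfinch in the background of an image",
    "a scampering agouti on a night vision wildlife camera",
]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores converted to probabilities over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze()
for label, p in zip(labels, probs.tolist()):
    print(f"{p:.2%}  {label}")
```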

“As Perception Encoder begins to be integrated into new applications, we’re excited to see how its advanced vision capabilities will enable even more capable AI systems,” Meta said.

Perception Language Model (PLM): Open research in vision-language

Complementing the encoder is the Perception Language Model (PLM), an open and reproducible vision-language model aimed at complex visual recognition tasks. 

PLM was trained using large-scale synthetic data combined with open vision-language datasets, explicitly without distilling knowledge from external proprietary models.

Recognising gaps in existing video understanding data, the FAIR team collected 2.5 million new, human-labelled samples focused on fine-grained video question answering and spatio-temporal captioning. Meta claims this forms the “largest dataset of its kind to date.”

PLM is offered in 1, 3, and 8 billion parameter versions, catering to academic research needs requiring transparency.

Alongside the models, Meta is releasing PLM-VideoBench, a new benchmark specifically designed to test capabilities often missed by existing benchmarks, namely “fine-grained activity understanding and spatiotemporally grounded reasoning.”

Meta hopes the combination of open models, the large dataset, and the challenging benchmark will empower the open-source community.

Meta Locate 3D: Giving robots situational awareness

Bridging the gap between language commands and physical action is Meta Locate 3D. This end-to-end model aims to allow robots to accurately localise objects in a 3D environment based on open-vocabulary natural language queries.

Meta Locate 3D processes 3D point clouds directly from RGB-D sensors (like those found on some robots or depth-sensing cameras). Given a textual prompt, such as “flower vase near TV console,” the system considers spatial relationships and context to pinpoint the correct object instance, distinguishing it from, say, a “vase on the table.”

The system comprises three main parts: a preprocessing step converting 2D features to 3D featurised point clouds; the 3D-JEPA encoder (a pretrained model creating a contextualised 3D world representation); and the Locate 3D decoder, which takes the 3D representation and the language query to output bounding boxes and masks for the specified objects.
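
Read as a pipeline, that description maps onto three stages: lift 2D features into a featurised point cloud, encode the scene with 3D-JEPA, then decode boxes and masks for the queried object. The sketch below outlines those stages with dummy data; every function name, array shape, and return value is a hypothetical illustration, not Meta’s released interface.

```python
import numpy as np
from dataclasses import dataclass
from typing import List

# Illustrative outline of the three-stage Locate 3D pipeline described above.
# All names and shapes here are hypothetical stand-ins.

@dataclass
class Detection:
    box_3d: np.ndarray   # (8, 3) corners of a 3D bounding box
    mask: np.ndarray     # boolean mask over the input point cloud
    score: float

def lift_2d_features_to_3d(rgbd_frames: List[np.ndarray]) -> np.ndarray:
    """Stage 1: convert 2D features from RGB-D frames into a featurised 3D point cloud."""
    return np.random.rand(4096, 3 + 256)  # dummy cloud: xyz plus per-point features

def encode_scene_3d_jepa(featurised_points: np.ndarray) -> np.ndarray:
    """Stage 2: the pretrained 3D-JEPA encoder builds a contextualised scene representation."""
    return featurised_points.mean(axis=0)  # stand-in for a learned scene embedding

def locate_3d_decode(scene_repr: np.ndarray, query: str) -> List[Detection]:
    """Stage 3: decode bounding boxes and masks for the object instance named in the query."""
    return [Detection(box_3d=np.zeros((8, 3)), mask=np.zeros(4096, dtype=bool), score=0.0)]

def locate(rgbd_frames: List[np.ndarray], query: str) -> List[Detection]:
    points = lift_2d_features_to_3d(rgbd_frames)
    scene = encode_scene_3d_jepa(points)
    return locate_3d_decode(scene, query)

detections = locate([np.zeros((480, 640, 4))], "flower vase near TV console")
print(len(detections), "candidate object(s) located")
```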

Alongside the model, Meta is releasing a substantial new dataset for object localisation based on referring expressions. It includes 130,000 language annotations across 1,346 scenes from the ARKitScenes, ScanNet, and ScanNet++ datasets, effectively doubling existing annotated data in this area.

Meta sees this technology as crucial for developing more capable robotic systems, including its own PARTNR robot project, enabling more natural human-robot interaction and collaboration.

Dynamic Byte Latent Transformer: Efficient and robust language modelling

Following research published in late 2024, Meta is now releasing the model weights for its 8-billion parameter Dynamic Byte Latent Transformer.

This architecture represents a shift away from traditional tokenisation-based language models, operating instead at the byte level. Meta claims this approach achieves comparable performance at scale while offering significant improvements in inference efficiency and robustness.

Traditional LLMs break text into ‘tokens’, an approach that can struggle with misspellings, novel words, or adversarial inputs. Byte-level models process raw bytes instead, potentially offering greater resilience.
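
The difference is easiest to see on a concrete string: a subword tokeniser maps text onto a fixed vocabulary that typos and rare words can fall outside of, whereas a byte-level model only ever sees UTF-8 byte values between 0 and 255. The snippet below shows the byte view; it is a generic illustration rather than the Dynamic Byte Latent Transformer’s own preprocessing.

```python
# Byte-level view of text: every string reduces to a sequence of values 0-255,
# so misspellings, accents, and made-up words never fall outside the "vocabulary".
samples = ["transformer", "transfromer", "naïve", "zrqxblat"]  # typo, accent, invented word

for text in samples:
    byte_ids = list(text.encode("utf-8"))
    print(f"{text!r:>14} -> {len(byte_ids):2d} bytes: {byte_ids}")
```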

Meta reports that the Dynamic Byte Latent Transformer “outperforms tokeniser-based models across various tasks, with an average robustness advantage of +7 points (on perturbed HellaSwag), and reaching as high as +55 points on tasks from the CUTE token-understanding benchmark.”

By releasing the weights alongside the previously shared codebase, Meta encourages the research community to explore this alternative approach to language modelling.

Collaborative Reasoner: Meta advances socially-intelligent AI agents

The final release, Collaborative Reasoner, tackles the complex challenge of creating AI agents that can effectively collaborate with humans or other AIs.

Meta notes that human collaboration often yields superior results, and aims to imbue AI with similar capabilities for tasks like helping with homework or job interview preparation.

Such collaboration requires not just problem-solving but also social skills like communication, empathy, providing feedback, and understanding others’ mental states (theory-of-mind), often unfolding over multiple conversational turns.

Current LLM training and evaluation methods often neglect these social and collaborative aspects. Furthermore, collecting relevant conversational data is expensive and difficult.

Collaborative Reasoner provides a framework to evaluate and enhance these skills. It includes goal-oriented tasks requiring multi-step reasoning achieved through conversation between two agents. The framework tests abilities like disagreeing constructively, persuading a partner, and reaching a shared best solution.

Meta’s evaluations revealed that current models struggle to consistently leverage collaboration for better outcomes. To address this, they propose a self-improvement technique using synthetic interaction data where an LLM agent collaborates with itself.
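
Conceptually, generating that synthetic interaction data resembles a self-play loop: the same underlying model plays both collaborators, the pair talks a task through to a shared answer over a few turns, and the resulting transcripts become training data. The sketch below outlines that loop under those assumptions; query_llm is a hypothetical stand-in for a model-serving call (Meta describes using its Matrix engine for this at scale), and the stopping rule is deliberately crude.

```python
import random
from typing import Dict, List

def query_llm(system: str, transcript: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for a call to a model-serving engine."""
    return random.choice([
        "Here is my proposal and the reasoning behind it ...",
        "I am not convinced yet; consider this counterexample ...",
        "Agreed, let's finalise that answer.",
    ])

def self_collaboration(task: str, max_turns: int = 6) -> List[Dict[str, str]]:
    """Two personas backed by the same model talk a task through to a shared solution."""
    personas = [
        "Agent A: propose a solution and defend it.",
        "Agent B: probe weaknesses and agree only when convinced.",
    ]
    transcript: List[Dict[str, str]] = [{"role": "task", "content": task}]
    for turn in range(max_turns):
        speaker = personas[turn % 2]
        reply = query_llm(system=speaker, transcript=transcript)
        transcript.append({"role": speaker.split(":")[0], "content": reply})
        if "finalise" in reply.lower():  # deliberately crude stopping rule for the sketch
            break
    return transcript

# Transcripts like this, generated at scale, become the synthetic training data.
for message in self_collaboration("Agree on a fair way to split a restaurant bill of 96 euros among 4 people."):
    print(f"[{message['role']}] {message['content']}")
```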

Generating this data at scale is enabled by a new high-performance model serving engine called Matrix. Using this approach on maths, scientific, and social reasoning tasks reportedly yielded improvements of up to 29.4% compared to the standard ‘chain-of-thought’ performance of a single LLM.

By open-sourcing the data generation and modelling pipeline, Meta aims to foster further research into creating truly “social agents that can partner with humans and other agents.”

These five releases collectively underscore Meta’s continued heavy investment in fundamental AI research, particularly focusing on building blocks for machines that can perceive, understand, and interact with the world in more human-like ways. 

See also: Meta will train AI models using EU user data

The post Meta FAIR advances human-like AI with five major releases appeared first on AI News.
