Google AMIE: AI doctor learns to ‘see’ medical images

Google is giving its diagnostic AI the ability to understand visual medical information with its latest research on AMIE (Articulate Medical Intelligence Explorer).

Imagine chatting with an AI about a health concern, and instead of just processing your words, it could actually look at the photo of that worrying rash or make sense of your ECG printout. That’s what Google is aiming for.

We already knew AMIE showed promise in text-based medical chats, thanks to earlier work published in Nature. But let’s face it, real medicine isn’t just about words.

Doctors rely heavily on what they can see – skin conditions, readings from machines, lab reports. As the Google team rightly points out, even simple instant messaging platforms “allow static multimodal information (e.g., images and documents) to enrich discussions.”

Text-only AI was missing a huge piece of the puzzle. The big question, as the researchers put it, was whether “LLMs can conduct diagnostic clinical conversations that incorporate this more complex type of information.”

Google teaches AMIE to look and reason

Google’s engineers have beefed up AMIE using their Gemini 2.0 Flash model as the brains of the operation. They’ve combined this with what they call a “state-aware reasoning framework.” In plain English, this means the AI doesn’t just follow a script; it adapts its conversation based on what it’s learned so far and what it still needs to figure out.

It’s close to how a human clinician works: gathering clues, forming ideas about what might be wrong, and then asking for more specific information – including visual evidence – to narrow things down.

“This enables AMIE to request relevant multimodal artifacts when needed, interpret their findings accurately, integrate this information seamlessly into the ongoing dialogue, and use it to refine diagnoses,” Google explains.

Think of the conversation flowing through stages: first gathering the patient’s history, then moving towards diagnosis and management suggestions, and finally follow-up. The AI constantly assesses its own understanding, asking for that skin photo or lab result if it senses a gap in its knowledge.
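Google has not published AMIE’s internals, but the behaviour described above can be pictured as a state-driven loop that tracks unresolved information gaps and requests artefacts to close them. The sketch below is purely illustrative; the phase names and every helper (the `llm` callable, `assess_gaps`, `next_question`, `summarise`, `patient.provide`) are hypothetical stand-ins, not AMIE’s actual implementation.

```python
# Illustrative sketch of a state-aware diagnostic dialogue loop (hypothetical,
# not AMIE's actual code). An LLM callable reviews the transcript, reports
# what it still needs, and may request multimodal artefacts such as a rash
# photo or an ECG printout.

PHASES = ["history_taking", "diagnosis_and_management", "follow_up"]

def run_consultation(llm, patient):
    transcript = []
    for phase in PHASES:
        while True:
            # The model assesses its own understanding for this phase.
            gaps = llm(task="assess_gaps", phase=phase, transcript=transcript)
            if not gaps:
                break  # enough information gathered; move to the next phase
            if gaps.get("needs_artifact"):
                # e.g. "please upload a photo of the rash"
                artifact = patient.provide(gaps["artifact_type"])
                transcript.append({"role": "patient", "artifact": artifact})
            else:
                question = llm(task="next_question", phase=phase,
                               transcript=transcript)
                transcript.append({"role": "patient",
                                   "text": patient.respond(question)})
    # Output a ranked differential diagnosis and a management plan.
    return llm(task="summarise", transcript=transcript)
```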

To get this right without endless trial-and-error on real people, Google built a detailed simulation lab.

Google created lifelike patient cases, pulling realistic medical images and data from sources like the PTB-XL ECG database and the SCIN dermatology image set, adding plausible backstories using Gemini. Then, they let AMIE ‘chat’ with simulated patients within this setup and automatically check how well it performed on things like diagnostic accuracy and avoiding errors (or ‘hallucinations’).

The virtual OSCE: Google puts AMIE through its paces

The real test came in a setup designed to mirror how medical students are assessed: the Objective Structured Clinical Examination (OSCE).

Google ran a remote study involving 105 different medical scenarios. Real actors, trained to portray patients consistently, interacted either with the new multimodal AMIE or with actual human primary care physicians (PCPs). These chats happened through an interface where the ‘patient’ could upload images, just like you might in a modern messaging app.

Afterwards, specialist doctors (in dermatology, cardiology, and internal medicine) and the patient actors themselves reviewed the conversations.

The human doctors scored everything from how well history was taken, the accuracy of the diagnosis, the quality of the suggested management plan, right down to communication skills and empathy—and, of course, how well the AI interpreted the visual information.

Surprising results from the simulated clinic

Here’s where it gets really interesting. In this head-to-head comparison within the controlled study environment, Google found AMIE didn’t just hold its own—it often came out ahead.

The AI was rated as being better than the human PCPs at interpreting the multimodal data shared during the chats. It also scored higher on diagnostic accuracy, producing differential diagnosis lists (the ranked list of possible conditions) that specialists deemed more accurate and complete based on the case details.

Specialist doctors reviewing the transcripts tended to rate AMIE’s performance higher across most areas. They particularly noted “the quality of image interpretation and reasoning,” the thoroughness of its diagnostic workup, the soundness of its management plans, and its ability to flag when a situation needed urgent attention.

Perhaps one of the most surprising findings came from the patient actors: they often found the AI to be more empathetic and trustworthy than the human doctors in these text-based interactions.

And, on a critical safety note, the study found no statistically significant difference between how often AMIE made errors based on the images (hallucinated findings) compared to the human physicians.

Technology never stands still, so Google also ran some early tests swapping out the Gemini 2.0 Flash model for the newer Gemini 2.5 Flash.

Using their simulation framework, the results hinted at further gains, particularly in getting the diagnosis right (Top-3 Accuracy) and suggesting appropriate management plans.

While promising, the team is quick to add a dose of realism: these are just automated results, and “rigorous assessment through expert physician review is essential to confirm these performance benefits.”

Important reality checks

Google is commendably upfront about the limitations here. “This study explores a research-only system in an OSCE-style evaluation using patient actors, which substantially under-represents the complexity… of real-world care,” they state clearly. 

Simulated scenarios, however well-designed, aren’t the same as dealing with the unique complexities of real patients in a busy clinic. They also stress that the chat interface doesn’t capture the richness of a real video or in-person consultation.

So, what’s the next step? Moving carefully towards the real world. Google is already partnering with Beth Israel Deaconess Medical Center for a research study to see how AMIE performs in actual clinical settings with patient consent.

The researchers also acknowledge the need to eventually move beyond text and static images towards handling real-time video and audio—the kind of interaction common in telehealth today.

Giving AI the ability to ‘see’ and interpret the kind of visual evidence doctors use every day offers a glimpse of how AI might one day assist clinicians and patients. However, the path from these promising findings to a safe and reliable tool for everyday healthcare is still a long one that requires careful navigation.

(Photo by Alexander Sinn)

See also: Are AI chatbots really changing the world of work?

Google introduces AI reasoning control in Gemini 2.5 Flash

Google has introduced an AI reasoning control mechanism for its Gemini 2.5 Flash model that allows developers to limit how much processing power the system expends on problem-solving.

Released on April 17, this “thinking budget” feature responds to a growing industry challenge: advanced AI models frequently overanalyse straightforward queries, consuming unnecessary computational resources and driving up operational and environmental costs.

While not revolutionary, the development represents a practical step toward addressing efficiency concerns that have emerged as reasoning capabilities become standard in commercial AI software.

The new mechanism enables precise calibration of processing resources before generating responses, potentially changing how organisations manage financial and environmental impacts of AI deployment.

“The model overthinks,” acknowledges Tulsee Doshi, Director of Product Management at Gemini. “For simple prompts, the model does think more than it needs to.”

The admission reveals the challenge facing advanced reasoning models – the equivalent of using industrial machinery to crack a walnut.

The shift toward reasoning capabilities has created unintended consequences. Where traditional large language models primarily matched patterns from training data, newer iterations attempt to work through problems logically, step by step. While this approach yields better results for complex tasks, it introduces significant inefficiency when handling simpler queries.

Balancing cost and performance

The financial implications of unchecked AI reasoning are substantial. According to Google’s technical documentation, when full reasoning is activated, generating outputs becomes approximately six times more expensive than standard processing. The cost multiplier creates a powerful incentive for fine-tuned control.

Nathan Habib, an engineer at Hugging Face who studies reasoning models, describes the problem as endemic across the industry. “In the rush to show off smarter AI, companies are reaching for reasoning models like hammers even where there’s no nail in sight,” he explained to MIT Technology Review.

The waste isn’t merely theoretical. Habib demonstrated how a leading reasoning model, when attempting to solve an organic chemistry problem, became trapped in a recursive loop, repeating “Wait, but…” hundreds of times – essentially experiencing a computational breakdown and consuming processing resources.

Kate Olszewska, who evaluates Gemini models at DeepMind, confirmed Google’s systems sometimes experience similar issues, getting stuck in loops that drain computing power without improving response quality.

Granular control mechanism

Google’s AI reasoning control provides developers with a degree of precision. The system offers a flexible spectrum ranging from zero (minimal reasoning) to 24,576 tokens of “thinking budget” – the computational units representing the model’s internal processing. The granular approach allows for customised deployment based on specific use cases.
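For developers, that budget is exposed as a single request parameter. The snippet below is a rough sketch using the google-genai Python SDK; the exact field names (`thinking_config`, `thinking_budget`) and the model identifier are assumptions based on my reading of the SDK and may differ from Google’s current documentation.

```python
# Hedged sketch: capping Gemini 2.5 Flash's reasoning with a thinking budget.
# Field names and the model id are assumptions; verify against the Gemini API docs.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash-preview",           # assumed model identifier
    contents="What is the capital of France?",  # simple query needing little reasoning
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=0  # 0 = minimal reasoning; up to 24,576 tokens available
        ),
    ),
)
print(response.text)
```

Raising the budget for harder prompts trades cost for reasoning depth – the same dial described above.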

Jack Rae, principal research scientist at DeepMind, says that defining optimal reasoning levels remains challenging: “It’s really hard to draw a boundary on, like, what’s the perfect task right now for thinking.”

Shifting development philosophy

The introduction of AI reasoning control potentially signals a change in how artificial intelligence evolves. Since 2019, companies have pursued improvements by building larger models with more parameters and training data. Google’s approach suggests an alternative path focusing on efficiency rather than scale.

“Scaling laws are being replaced,” says Habib, indicating that future advances may emerge from optimising reasoning processes rather than continuously expanding model size.

The environmental implications are equally significant. As reasoning models proliferate, their energy consumption grows proportionally. Research indicates that inferencing – generating AI responses – now contributes more to the technology’s carbon footprint than the initial training process. Google’s reasoning control mechanism offers a potential mitigating factor for this concerning trend.

Competitive dynamics

Google isn’t operating in isolation. The “open weight” DeepSeek R1 model, which emerged earlier this year, demonstrated powerful reasoning capabilities at potentially lower costs, triggering market volatility that reportedly caused nearly a trillion-dollar stock market fluctuation.

Unlike Google’s proprietary approach, DeepSeek makes its internal settings publicly available for developers to implement locally.

Despite the competition, Google DeepMind’s chief technical officer Koray Kavukcuoglu maintains that proprietary models will maintain advantages in specialised domains requiring exceptional precision: “Coding, math, and finance are cases where there’s high expectation from the model to be very accurate, to be very precise, and to be able to understand really complex situations.”

Industry maturation signs

The development of AI reasoning control reflects an industry now confronting practical limitations beyond technical benchmarks. While companies continue to push reasoning capabilities forward, Google’s approach acknowledges an important reality: efficiency matters as much as raw performance in commercial applications.

The feature also highlights tensions between technological advancement and sustainability concerns. Leaderboards tracking reasoning model performance show that single tasks can cost upwards of $200 to complete – raising questions about scaling such capabilities in production environments.

By allowing developers to dial reasoning up or down based on actual need, Google addresses both financial and environmental aspects of AI deployment.

“Reasoning is the key capability that builds up intelligence,” states Kavukcuoglu. “The moment the model starts thinking, the agency of the model has started.” The statement reveals both the promise and the challenge of reasoning models – their autonomy creates both opportunities and resource management challenges.

For organisations deploying AI solutions, the ability to fine-tune reasoning budgets could democratise access to advanced capabilities while maintaining operational discipline.

Google claims Gemini 2.5 Flash delivers “comparable metrics to other leading models for a fraction of the cost and size” – a value proposition strengthened by the ability to optimise reasoning resources for specific applications.

Practical implications

The AI reasoning control feature has immediate practical applications. Developers building commercial applications can now make informed trade-offs between processing depth and operational costs.

For simple applications like basic customer queries, minimal reasoning settings preserve resources while still using the model’s capabilities. For complex analysis requiring deep understanding, the full reasoning capacity remains available.

Google’s reasoning ‘dial’ provides a mechanism for establishing cost certainty while maintaining performance standards.

See also: Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date

Machines Can See 2025 – Dubai AI event

An AI investment and networking event, Machines Can See, will take place April 23-24 in Dubai at the iconic Museum of the Future, as part of Dubai AI week.

Machines Can See is staged by the Polynome Group, a machine vision, AI, robotic, and industrial design company based in the city.

This is the third year of the event, and will bring investors, business leaders, and policymakers together to explore AI-centric expansion opportunities. Machines Can See, as the name suggests, will have a particular focus on computer vision.

Each discussion and keynote is designed to be firmly rooted in practical applications of AI technology, but organisers hope that the show will be permeated with a sense of discovery and that attendees will be able to explore the possibilities of the tech on show. “We are not just shaping the future of AI, we are defining how AI shapes the world,” said Alexander Khanin, head of the Polynome Group.

UAE Government officials attending the event include H.E. Omar Sultan Al Olama, UAE Minister of State for Artificial Intelligence, Digital Economy, and Remote Work Applications, and H.E. Hamad Obaid Al Mansoori, the Director General of Digital Dubai.

Polynome Group has said that X will be the official streaming partner for Machines Can See 2025, and the US company will host workshops titled “X and AI” to show solutions that merge AI and streaming technologies, with X’s Grok central to those sessions. Via interactive demos, attendees will gain firsthand experience of Grok’s potential in AI delivery, analysis, and optimisation.

Investment and business opportunities

UAE’s AI market is projected to grow by $8.4 billion in the next two years, and the summit is designed to serve as a venue for investors to engage with AI startups, existing enterprises, and government decision-makers. Attendees at Machines Can See will get to meet with investors and venture capital firms, be given the opportunity to meet executives from AI companies (including IBM and Amazon), and connect with startups seeking investment.

The summit is supported by Amazon Prime Video & Studios, Amazon Web Services, Dubai Police, MBZUAI, IBM, SAP, Adia Lab, QuantumBlack and Yango. The involvement of so many organisations and large-scale enterprises should provide ample opportunities for funding and collaborations that extend the commercial use of AI.

Local and international investors include Eddy Farhat, Executive Director at e& capital, Faris Al Mazrui, Head of Growth Investments at Mubadala, Major General Khalid Nasser Alrazooqi, General Director of Artificial Intelligence at Dubai Police, UAE, and Dr. Najwa Aaraj, the CEO of TII.

Speakers and insights

The summit will feature several US-based AI professionals, including Namik Hrle, IBM Fellow and Vice President of Development at the IBM Software Group, Michael Bronstein, DeepMind Professor of AI at Oxford University, Marc Pollefeys, Professor of Computer Science at ETH Zurich, Gerard Medioni, VP and Distinguished Scientist at Amazon Prime Video & Studio, and Deva Ramanan, Professor at the Robotics Institute of Carnegie Mellon University.

The event will feature a ministerial session composed of international government representatives to discuss the role of national IT development.

Among speakers already confirmed for the event are Gobind Singh Deo, Malaysia’s Minister of Digital, H.E. Zhaslan Madiyev, Minister of Digital Development, Innovation, and Aerospace Industry of Kazakhstan, and H.E. Omar Sultan Al Olama, UAE Minister of State for Artificial Intelligence, Digital Economy, and Remote Work Applications.

Event organisers expect to announce more representatives from overseas in the coming days.

DolphinGemma: Google AI model understands dolphin chatter

Google has developed an AI model called DolphinGemma to decipher how dolphins communicate and one day facilitate interspecies communication.

The intricate clicks, whistles, and pulses echoing through the underwater world of dolphins have long fascinated scientists. The dream has been to understand and decipher the patterns within their complex vocalisations.

Google, collaborating with engineers at the Georgia Institute of Technology and leveraging the field research of the Wild Dolphin Project (WDP), has unveiled DolphinGemma to help realise that goal.

Announced around National Dolphin Day, the foundational AI model represents a new tool in the effort to comprehend cetacean communication. Trained specifically to learn the structure of dolphin sounds, DolphinGemma can even generate novel, dolphin-like audio sequences.

Over decades, the Wild Dolphin Project – operational since 1985 – has run the world’s longest continuous underwater study of dolphins to develop a deep understanding of context-specific sounds, such as:

  • Signature “whistles”: Serving as unique identifiers, akin to names, crucial for interactions like mothers reuniting with calves.
  • Burst-pulse “squawks”: Commonly associated with conflict or aggressive encounters.
  • Click “buzzes”: Often detected during courtship activities or when dolphins chase sharks.

WDP’s ultimate goal is to uncover the inherent structure and potential meaning within these natural sound sequences, searching for the grammatical rules and patterns that might signify a form of language.

This long-term, painstaking analysis has provided the essential grounding and labelled data crucial for training sophisticated AI models like DolphinGemma.

DolphinGemma: The AI ear for cetacean sounds

Analysing the sheer volume and complexity of dolphin communication is a formidable task ideally suited for AI.

DolphinGemma, developed by Google, employs specialised audio technologies to tackle this. It uses the SoundStream tokeniser to efficiently represent dolphin sounds, feeding this data into a model architecture adept at processing complex sequences.

Based on insights from Google’s Gemma family of lightweight, open models (which share technology with the powerful Gemini models), DolphinGemma functions as an audio-in, audio-out system.

Fed with sequences of natural dolphin sounds from WDP’s extensive database, DolphinGemma learns to identify recurring patterns and structures. Crucially, it can predict the likely subsequent sounds in a sequence—much like human language models predict the next word.

With around 400 million parameters, DolphinGemma is optimised to run efficiently, even on the Google Pixel smartphones WDP uses for data collection in the field.
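Conceptually, the pipeline mirrors a text language model with audio tokens standing in for words. The sketch below is a simplified illustration of that audio-in, audio-out loop; the `tokenizer` and `model` interfaces are hypothetical placeholders rather than Google’s actual DolphinGemma code.

```python
# Conceptual sketch of an audio-in, audio-out next-token loop (hypothetical).
# `tokenizer` stands in for a SoundStream-style audio tokeniser and `model`
# for a ~400M-parameter Gemma-style sequence model.

def continue_dolphin_audio(tokenizer, model, recording, max_new_tokens=256):
    # 1. Discretise the waveform into a sequence of audio tokens.
    tokens = list(tokenizer.encode(recording))

    # 2. Autoregressively predict likely subsequent sound tokens, much like
    #    a text LLM predicts the next word.
    for _ in range(max_new_tokens):
        tokens.append(model.predict_next(tokens))

    # 3. Decode the extended token sequence back into dolphin-like audio.
    return tokenizer.decode(tokens)
```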

As WDP begins deploying the model this season, it promises to accelerate research significantly. By automatically flagging patterns and reliable sequences previously requiring immense human effort to find, it can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication.

The CHAT system and two-way interaction

While DolphinGemma focuses on understanding natural communication, a parallel project explores a different avenue: active, two-way interaction.

The CHAT (Cetacean Hearing Augmentation Telemetry) system – developed by WDP in partnership with Georgia Tech – aims to establish a simpler, shared vocabulary rather than directly translating complex dolphin language.

The concept relies on associating specific, novel synthetic whistles (created by CHAT, distinct from natural sounds) with objects the dolphins enjoy interacting with, like scarves or seaweed. Researchers demonstrate the whistle-object link, hoping the dolphins’ natural curiosity leads them to mimic the sounds to request the items.

As more natural dolphin sounds are understood through work with models like DolphinGemma, these could potentially be incorporated into the CHAT interaction framework.

Google Pixel enables ocean research

Underpinning both the analysis of natural sounds and the interactive CHAT system is crucial mobile technology. Google Pixel phones serve as the brains for processing the high-fidelity audio data in real-time, directly in the challenging ocean environment.

The CHAT system, for instance, relies on Google Pixel phones to:

  • Detect a potential mimic amidst background noise.
  • Identify the specific whistle used.
  • Alert the researcher (via underwater bone-conducting headphones) about the dolphin’s ‘request’.

This allows the researcher to respond quickly with the correct object, reinforcing the learned association. While a Pixel 6 initially handled this, the next generation CHAT system (planned for summer 2025) will utilise a Pixel 9, integrating speaker/microphone functions and running both deep learning models and template matching algorithms simultaneously for enhanced performance.

(Image: The Google Pixel 9 phone that will be used for the next-generation CHAT system.)
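Pieced together from the description above, the real-time loop running on the phone looks roughly like the sketch below. Every helper here (`next_audio_frame`, `match_whistle`, `notify_researcher`) and the confidence threshold are hypothetical placeholders; the real system combines deep learning models with template matching on the Pixel.

```python
# Hypothetical sketch of the CHAT mimic-detection loop described above.
# All helpers and constants are placeholders, not Google's or WDP's actual code.

CONFIDENCE_THRESHOLD = 0.9  # assumed value for illustration

def chat_listen_loop(next_audio_frame, match_whistle, notify_researcher):
    while True:
        frame = next_audio_frame()    # streamed hydrophone audio
        match = match_whistle(frame)  # compare against the synthetic whistles
        if match and match.confidence > CONFIDENCE_THRESHOLD:
            # Alert the researcher (via bone-conducting headphones) which
            # object the dolphin appears to be 'requesting'.
            notify_researcher(object_name=match.label)
```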

Using smartphones like the Pixel dramatically reduces the need for bulky, expensive custom hardware. It improves system maintainability, lowers power requirements, and shrinks the physical size. Furthermore, DolphinGemma’s predictive power integrated into CHAT could help identify mimics faster, making interactions more fluid and effective.

Recognising that breakthroughs often stem from collaboration, Google intends to release DolphinGemma as an open model later this summer. While trained on Atlantic spotted dolphins, its architecture holds promise for researchers studying other cetaceans, potentially requiring fine-tuning for different species’ vocal repertoires.

The aim is to equip researchers globally with powerful tools to analyse their own acoustic datasets, accelerating the collective effort to understand these intelligent marine mammals. We are shifting from passive listening towards actively deciphering patterns, bringing the prospect of bridging the communication gap between our species perhaps just a little closer.

See also: IEA: The opportunities and challenges of AI for global energy

Deep Cogito open LLMs use IDA to outperform same size models

Deep Cogito has released several open large language models (LLMs) that outperform competitors and claim to represent a step towards achieving general superintelligence.

The San Francisco-based company, which states its mission is “building general superintelligence,” has launched preview versions of LLMs in 3B, 8B, 14B, 32B, and 70B parameter sizes. Deep Cogito asserts that “each model outperforms the best available open models of the same size, including counterparts from LLAMA, DeepSeek, and Qwen, across most standard benchmarks”.

Impressively, the 70B model from Deep Cogito even surpasses the performance of the recently released Llama 4 109B Mixture-of-Experts (MoE) model.   

Iterated Distillation and Amplification (IDA)

Central to this release is a novel training methodology called Iterated Distillation and Amplification (IDA). 

Deep Cogito describes IDA as “a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement”. This technique aims to overcome the inherent limitations of current LLM training paradigms, where model intelligence is often capped by the capabilities of larger “overseer” models or human curators.

The IDA process involves two key steps iterated repeatedly:

  • Amplification: Using more computation to enable the model to derive better solutions or capabilities, akin to advanced reasoning techniques.
  • Distillation: Internalising these amplified capabilities back into the model’s parameters.

Deep Cogito says this creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process, rather than being strictly bounded by overseer intelligence.
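Deep Cogito describes IDA only at a conceptual level, but the amplify-then-distil structure can be sketched as below. All of the functions passed in (`amplify`, `distil_into`, `evaluate`) are hypothetical placeholders; this illustrates the shape of the loop, not Deep Cogito’s actual training code.

```python
# Conceptual sketch of Iterated Distillation and Amplification (IDA).
# `amplify`, `distil_into` and `evaluate` are hypothetical placeholders.

def run_ida(model, prompts, amplify, distil_into, evaluate, rounds=5):
    for round_idx in range(rounds):
        # Amplification: spend extra inference-time compute (longer reasoning,
        # search, self-critique) to produce better answers than the model
        # gives directly.
        amplified = [amplify(model, prompt) for prompt in prompts]

        # Distillation: train the parameters to reproduce those amplified
        # answers directly, internalising the capability.
        model = distil_into(model, prompts, amplified)

        print(f"round {round_idx}: score={evaluate(model):.3f}")
    return model  # each iteration feeds the next, forming the feedback loop
```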

“When we study superintelligent systems,” the research notes, referencing successes like AlphaGo, “we find two key ingredients enabled this breakthrough: Advanced Reasoning and Iterative Self-Improvement”. IDA is presented as a way to integrate both into LLM training.

Deep Cogito claims IDA is efficient, stating the new models were developed by a small team in approximately 75 days. They also highlight IDA’s potential scalability compared to methods like Reinforcement Learning from Human Feedback (RLHF) or standard distillation from larger models.

As evidence, the company points to their 70B model outperforming Llama 3.3 70B (distilled from a 405B model) and Llama 4 Scout 109B (distilled from a 2T parameter model).

Capabilities and performance of Deep Cogito models

The newly released Cogito models – based on Llama and Qwen checkpoints – are optimised for coding, function calling, and agentic use cases.

A key feature is their dual functionality: “Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models),” similar to capabilities seen in models like Claude 3.5. However, Deep Cogito notes they “have not optimised for very long reasoning chains,” citing user preference for faster answers and the efficiency of distilling shorter chains.

Extensive benchmark results are provided, comparing Cogito models against size-equivalent state-of-the-art open models in both direct (standard) and reasoning modes.

Across various benchmarks (MMLU, MMLU-Pro, ARC, GSM8K, MATH, etc.) and model sizes (3B, 8B, 14B, 32B, 70B), the Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.

For instance, the Cogito 70B model achieves 91.73% on MMLU in standard mode (+6.40% vs Llama 3.3 70B) and 91.00% in thinking mode (+4.40% vs DeepSeek R1 Distill 70B). Livebench scores also show improvements.

Here are benchmarks of 14B models for a medium-sized comparison:

(Image: Benchmark comparison of medium-sized 14B models from Deep Cogito against Alibaba Qwen and DeepSeek R1.)

While acknowledging benchmarks don’t fully capture real-world utility, Deep Cogito expresses confidence in practical performance.

This release is labelled a preview, with Deep Cogito stating they are “still in the early stages of this scaling curve”. They plan to release improved checkpoints for the current sizes and introduce larger MoE models (109B, 400B, 671B) “in the coming weeks / months”. All future models will also be open-source.

(Photo by Pietro Mattia)

See also: Alibaba Cloud targets global AI growth with new models and tools

Gemini 2.5: Google cooks up its ‘most intelligent’ AI model to date

Gemini 2.5 is being hailed by Google DeepMind as its “most intelligent AI model” to date.

The first model from this latest generation is an experimental version of Gemini 2.5 Pro, which DeepMind says has achieved state-of-the-art results across a wide range of benchmarks.

According to Koray Kavukcuoglu, CTO of Google DeepMind, the Gemini 2.5 models are “thinking models”.  This signifies their capability to reason through their thoughts before generating a response, leading to enhanced performance and improved accuracy.    

The capacity for “reasoning” extends beyond mere classification and prediction, Kavukcuoglu explains. It encompasses the system’s ability to analyse information, deduce logical conclusions, incorporate context and nuance, and ultimately, make informed decisions.

DeepMind has been exploring methods to enhance AI’s intelligence and reasoning capabilities for some time, employing techniques such as reinforcement learning and chain-of-thought prompting. This groundwork led to the recent introduction of their first thinking model, Gemini 2.0 Flash Thinking.    

“Now, with Gemini 2.5,” says Kavukcuoglu, “we’ve achieved a new level of performance by combining a significantly enhanced base model with improved post-training.”

Google plans to integrate these thinking capabilities directly into all of its future models—enabling them to tackle more complex problems and support more capable, context-aware agents.    

Gemini 2.5 Pro secures the LMArena leaderboard top spot

Gemini 2.5 Pro Experimental is positioned as DeepMind’s most advanced model for handling intricate tasks. As of writing, it has secured the top spot on the LMArena leaderboard – a key metric for assessing human preferences – by a significant margin, demonstrating a highly capable model with a high-quality style:

(Image: Screenshot of the LMArena leaderboard, where the new Gemini 2.5 Pro Experimental model from Google DeepMind has taken the top spot.)

Gemini 2.5 is a ‘pro’ at maths, science, coding, and reasoning

Gemini 2.5 Pro has demonstrated state-of-the-art performance across various benchmarks that demand advanced reasoning.

Notably, it leads in maths and science benchmarks – such as GPQA and AIME 2025 – without relying on test-time techniques that increase costs, like majority voting. It also achieved a state-of-the-art score of 18.8% on Humanity’s Last Exam, a dataset designed by subject matter experts to evaluate the human frontier of knowledge and reasoning.

DeepMind has placed significant emphasis on coding performance, and Gemini 2.5 represents a substantial leap forward compared to its predecessor, 2.0, with further improvements in the pipeline. 2.5 Pro excels in creating visually compelling web applications and agentic code applications, as well as code transformation and editing.

On SWE-Bench Verified, the industry standard for agentic code evaluations, Gemini 2.5 Pro achieved a score of 63.8% using a custom agent setup. The model’s reasoning capabilities also enable it to create a video game by generating executable code from a single-line prompt.

Building on its predecessors’ strengths

Gemini 2.5 builds upon the core strengths of earlier Gemini models, including native multimodality and a long context window. 2.5 Pro launches with a one million token context window, with plans to expand this to two million tokens soon. This enables the model to comprehend vast datasets and handle complex problems from diverse information sources, spanning text, audio, images, video, and even entire code repositories.    

Developers and enterprises can now begin experimenting with Gemini 2.5 Pro in Google AI Studio. Gemini Advanced users can also access it via the model dropdown on desktop and mobile platforms. The model will be rolled out on Vertex AI in the coming weeks.    
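For developers wanting to try it, access follows the usual Gemini API pattern. The snippet below is a hedged sketch using the google-genai Python SDK; the experimental model identifier shown is an assumption and should be checked against Google AI Studio’s model list.

```python
# Hedged sketch: calling the experimental Gemini 2.5 Pro model via the Gemini API.
# The model id string is an assumption; confirm it in Google AI Studio.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-pro-exp",  # assumed identifier for the experimental release
    contents="Write a minimal playable Snake game in a single HTML file.",
)
print(response.text)
```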

Google DeepMind encourages users to provide feedback, which will be used to further enhance Gemini’s capabilities.

(Photo by Anshita Nair)

See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

Gemini 2.0: Google ushers in the agentic AI era

Google CEO Sundar Pichai has announced the launch of Gemini 2.0, a model that represents the next step in Google’s ambition to revolutionise AI.

A year after introducing the Gemini 1.0 model, this major upgrade incorporates enhanced multimodal capabilities, agentic functionality, and innovative user tools designed to push boundaries in AI-driven technology.

Leap towards transformational AI  

Reflecting on Google’s 26-year mission to organise and make the world’s information accessible, Pichai remarked, “If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful.”

Gemini 1.0, released in December 2023, was notable for being Google’s first natively multimodal AI model. The first iteration excelled at understanding and processing text, video, images, audio, and code. Its enhanced 1.5 version became widely embraced by developers for its long-context understanding, enabling applications such as the productivity-focused NotebookLM.

Now, with Gemini 2.0, Google aims to accelerate the role of AI as a universal assistant capable of native image and audio generation, better reasoning and planning, and real-world decision-making capabilities. In Pichai’s words, the development represents the dawn of an “agentic era.”

“We have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision,” Pichai explained.

Gemini 2.0: Core features and availability

At the heart of today’s announcement is the experimental release of Gemini 2.0 Flash, the flagship model of Gemini’s second generation. It builds upon the foundations laid by its predecessors while delivering faster response times and advanced performance.

Gemini 2.0 Flash supports multimodal inputs and outputs, including the ability to generate native images in conjunction with text and produce steerable text-to-speech multilingual audio. Additionally, users can benefit from native tool integration such as Google Search and even third-party user-defined functions.

Developers and businesses will gain access to Gemini 2.0 Flash via the Gemini API in Google AI Studio and Vertex AI, while larger model sizes are scheduled for broader release in January 2025.

For global accessibility, the Gemini app now features a chat-optimised version of the 2.0 Flash experimental model. Early adopters can experience this updated assistant on desktop and mobile, with a mobile app rollout imminent.

Products such as Google Search are also being enhanced with Gemini 2.0, unlocking the ability to handle complex queries like advanced math problems, coding enquiries, and multimodal questions.

Comprehensive suite of AI innovations  

The launch of Gemini 2.0 comes with compelling new tools that showcase its capabilities.

One such feature, Deep Research, functions as an AI research assistant, simplifying the process of investigating complex topics by compiling information into comprehensive reports. Another upgrade enhances Search with Gemini-enabled AI Overviews that tackle intricate, multi-step user queries.

The model was trained using Google’s sixth-generation Tensor Processing Units (TPUs), known as Trillium, which Pichai notes “powered 100% of Gemini 2.0 training and inference.”

Trillium is now available for external developers, allowing them to benefit from the same infrastructure that supports Google’s own advancements.

Pioneering agentic experiences  

Accompanying Gemini 2.0 are experimental “agentic” prototypes built to explore the future of human-AI collaboration, including:

  • Project Astra: A universal AI assistant

First introduced at I/O earlier this year, Project Astra taps into Gemini 2.0’s multimodal understanding to improve real-world AI interactions. Trusted testers have trialled the assistant on Android, offering feedback that has helped refine its multilingual dialogue, memory retention, and integration with Google tools like Search, Lens, and Maps. Astra has also demonstrated near-human conversational latency, with further research underway for its application in wearable technology, such as prototype AI glasses.

  • Project Mariner: Redefining web automation 

Project Mariner is an experimental web-browsing assistant that uses Gemini 2.0’s ability to reason across text, images, and interactive elements like forms within a browser. In initial tests, it achieved an 83.5% success rate on the WebVoyager benchmark for completing end-to-end web tasks. Early testers using a Chrome extension are helping to refine Mariner’s capabilities while Google evaluates safety measures that ensure the technology remains user-friendly and secure.

  • Jules: A coding agent for developers  

Jules, an AI-powered assistant built for developers, integrates directly into GitHub workflows to address coding challenges. It can autonomously propose solutions, generate plans, and execute code-based tasks—all under human supervision. This experimental endeavour is part of Google’s long-term goal to create versatile AI agents across various domains.

  • Gaming applications and beyond  

Extending Gemini 2.0’s reach into virtual environments, Google DeepMind is working with gaming partners like Supercell on intelligent game agents. These experimental AI companions can interpret game actions in real-time, suggest strategies, and even access broader knowledge via Search. Research is also being conducted into how Gemini 2.0’s spatial reasoning could support robotics, opening doors for physical-world applications in the future.

Addressing responsibility in AI development

As AI capabilities expand, Google emphasises the importance of prioritising safety and ethical considerations.

Google claims Gemini 2.0 underwent extensive risk assessments, bolstered by the Responsibility and Safety Committee’s oversight to mitigate potential risks. Additionally, its embedded reasoning abilities allow for advanced “red-teaming,” enabling developers to evaluate security scenarios and optimise safety measures at scale.

Google is also exploring safeguards to address user privacy, prevent misuse, and ensure AI agents remain reliable. For instance, Project Mariner is designed to prioritise user instructions while resisting malicious prompt injections, preventing threats like phishing or fraudulent transactions. Meanwhile, privacy controls in Project Astra make it easy for users to manage session data and deletion preferences.

Pichai reaffirmed the company’s commitment to responsible development, stating, “We firmly believe that the only way to build AI is to be responsible from the start.”

With the Gemini 2.0 Flash release, Google is edging closer to its vision of building a universal assistant capable of transforming interactions across domains.

See also: Machine unlearning: Researchers make AI models ‘forget’ data

Google launches Veo and Imagen 3 generative AI models

Google Cloud has launched two generative AI models on its Vertex AI platform, Veo and Imagen 3, amid reports of surging revenue growth among enterprises leveraging the technology.

According to Google Cloud’s data, 86% of enterprise companies currently using generative AI in production environments have witnessed increased revenue, with an estimated average growth of 6%. 

This metric has driven the tech giant’s latest innovation push, resulting in the introduction of Veo – its most sophisticated video generation model to date – and Imagen 3, an advanced text-to-image generation system.

Breaking ground

Veo, now available in private preview on Vertex AI, represents a milestone as Google becomes the first hyperscaler to offer an image-to-video model. The technology enables businesses to generate high-quality videos from simple text or image prompts, potentially revolutionising video production workflows across industries.

Imagen 3 – scheduled for release to all Vertex AI customers next week – promises unprecedented realism in generated images, with marked improvements in detail, lighting, and artifact reduction. The model includes new features for enterprise customers on an allowlist, including advanced editing capabilities and brand customisation options.

(Image: Example images generated by Google’s Imagen 3 model, available on the Vertex AI platform.)
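To give a sense of what calling these models looks like in practice, here is a hedged sketch using the Vertex AI Python SDK. The model identifier and some parameter names are assumptions on my part and may not match Google Cloud’s current documentation.

```python
# Hedged sketch: generating an image with Imagen 3 on Vertex AI.
# The model id and parameter names are assumptions; check the Vertex AI docs.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholders

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")  # assumed id
images = model.generate_images(
    prompt="A photorealistic product shot of a chocolate bar on a marble counter",
    number_of_images=1,
    aspect_ratio="1:1",
)
images[0].save("imagen3_output.png")
```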

Transforming operations

Several major firms have begun implementing these technologies into their operations.

Mondelez International, the company behind brands such as Oreo, Cadbury, and Chips Ahoy!, is using the technology to accelerate campaign content creation across its global portfolio of brands.

Jon Halvorson, SVP of Consumer Experience & Digital Commerce at Mondelez International, explained: “Our collaboration with Google Cloud has been instrumental in harnessing the power of generative AI, notably through Imagen 3, to revolutionise content production.

“This technology has enabled us to produce hundreds of thousands of customised assets, enhancing creative quality while significantly reducing both time to market and costs.”

Knowledge sharing platform Quora has developed Poe, a platform that enables users to interact with generative AI models. Veo and Imagen are now integrated with Poe.

Spencer Chan, Product Lead for Poe at Quora, commented: “We created Poe to democratise access to the world’s best gen AI models. With Veo, we’re now enabling millions of users to bring their ideas to life through stunning, high-quality generative video.”

Safety and security

In response to growing concerns about AI-generated content, Google has implemented robust safety features in both models. These include:

  • Digital watermarking through Google DeepMind’s SynthID.
  • Built-in safety filters to prevent harmful content creation.
  • Strict data governance policies to ensure customer data protection.
  • Industry-first copyright indemnity for generative AI services.

The launch of these new models signals Google’s growing influence in the enterprise AI space and suggests a shift toward more sophisticated, integrated AI solutions for business applications.

(Imagery Credit: Google Cloud)

See also: Alibaba Marco-o1: Advancing LLM reasoning capabilities

New AI training techniques aim to overcome current challenges

OpenAI and other leading AI companies are developing new training techniques to overcome the limitations of current methods. Addressing unexpected delays and complications in the development of larger, more powerful language models, these fresh techniques focus on human-like behaviour to teach algorithms to ‘think’.

According to reports citing a dozen AI researchers, scientists, and investors, the new training techniques – which underpin OpenAI’s recent ‘o1’ model (formerly Q* and Strawberry) – have the potential to transform the landscape of AI development. The reported advances may also influence the types and quantities of resources AI companies will need on an ongoing basis, including specialised hardware and the energy required to develop AI models.

The o1 model is designed to approach problems in a way that mimics human reasoning and thinking, breaking down numerous tasks into steps. The model also utilises specialised data and feedback provided by experts in the AI industry to enhance its performance.

Since ChatGPT was unveiled by OpenAI in 2022, there has been a surge in AI innovation, and many technology companies claim existing AI models require expansion, be it through greater quantities of data or improved computing resources. Only then can AI models consistently improve.

Now, AI experts have reported limitations in scaling up AI models. The 2010s were a revolutionary period for scaling, but Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, says that the training of AI models, particularly in understanding language structures and patterns, has levelled off.

“The 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again. Scaling the right thing matters more now,” Sutskever said.

In recent times, AI lab researchers have experienced delays in and challenges to developing and releasing large language models (LLM) that are more powerful than OpenAI’s GPT-4 model.

First, there is the cost of training large models, which often runs into tens of millions of dollars. And, due to complications that arise – such as hardware failures caused by system complexity – a final analysis of how these models perform can take months.

In addition to these challenges, training runs require substantial amounts of energy, often resulting in power shortages that can disrupt processes and affect the wider electricity grid. Another issue is the colossal amount of data that large language models consume; they have reportedly used up all of the accessible data worldwide.

Researchers are exploring a technique known as ‘test-time compute’ to improve current AI models during the inference phase, when the model is actually being used. The method can involve generating multiple candidate answers in real time and selecting the most promising of them, allowing the model to allocate greater processing resources to difficult tasks that require human-like decision-making and reasoning. The aim is to make the model more accurate and capable.
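
OpenAI has not published how o1 allocates this extra inference-time compute, so the snippet below is only a toy illustration of best-of-N sampling, one common form of test-time compute: sample several candidate answers, score each with a verifier, and return the best. The generate_candidate and score functions are stand-ins for a language model sampler and a learned verifier.

# Toy best-of-N sketch of test-time compute; not OpenAI's implementation.
# generate_candidate stands in for sampling an answer from a language model,
# and score stands in for a learned verifier or reward model.
import random

def generate_candidate(question: str) -> int:
    """Stand-in for a noisy model sample: the right answer, perturbed some of the time."""
    correct = 7 * 8  # pretend the question is "What is 7 x 8?"
    return correct + random.choice([0, 0, 0, -1, 1, 10])

def score(question: str, answer: int) -> float:
    """Stand-in verifier: higher is better (here, closeness to a checkable result)."""
    return -abs(answer - 7 * 8)

def best_of_n(question: str, n: int = 16) -> int:
    """Spend more inference-time compute (a larger n) to pick a better final answer."""
    candidates = [generate_candidate(question) for _ in range(n)]
    return max(candidates, key=lambda a: score(question, a))

print(best_of_n("What is 7 x 8?", n=32))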

Noam Brown, a researcher at OpenAI who helped develop the o1 model, shared an example of how a new approach can achieve surprising results. At the TED AI conference in San Francisco last month, Brown explained that “having a bot think for just 20 seconds in a hand of poker got the same boosting performance as scaling up the model by 100,000x and training it for 100,000 times longer.”

Rather than simply increasing model size and training time, this approach changes how AI models process information during inference and could lead to more powerful, more efficient systems.

It is reported that other AI labs, including xAI, Google DeepMind, and Anthropic, have been developing their own versions of the o1 technique. Competition in the AI world is nothing new, but the new techniques could have a significant impact on the AI hardware market. Companies like Nvidia, which currently dominates the supply of AI chips thanks to the high demand for its products, may be particularly affected by updated AI training techniques.

Nvidia became the world’s most valuable company in October, a rise in fortunes largely attributable to the widespread use of its chips in AI data centres. New techniques may affect Nvidia’s market position, forcing the company to adapt its products to meet evolving AI hardware demands, and could open more avenues for new competitors in the inference market.

A new age of AI development may be on the horizon, driven by evolving hardware demands and more efficient training methods such as those deployed in the o1 model. The future of both AI models and the companies behind them could be reshaped, unlocking unprecedented possibilities and greater competition.

See also: Anthropic urges AI regulation to avoid catastrophes

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post New AI training techniques aim to overcome current challenges appeared first on AI News.

Google announces restructuring to accelerate AI initiatives https://www.artificialintelligence-news.com/news/google-announces-restructuring-accelerate-ai-initiatives/ Fri, 18 Oct 2024 15:50:30 +0000

Google CEO Sundar Pichai has announced a series of structural changes and leadership appointments aimed at accelerating the company’s AI initiatives.

The restructuring sees the Gemini app team, led by Sissie Hsiao, joining Google DeepMind under the leadership of Demis Hassabis.

“Bringing the teams closer together will improve feedback loops, enable fast deployment of our new models in the Gemini app, make our post-training work proceed more efficiently and build on our great product momentum,” Pichai explained.

Additionally, the Assistant teams focusing on devices and home experiences will be integrated into the Platforms & Devices division. This reorganisation aims to align these teams more closely with the product surfaces they are developing for and consolidate AI smart home initiatives at Google under one umbrella.

Prabhakar Raghavan, a 12-year Google veteran, will transition from his current role to become the Chief Technologist at Google. Pichai praised Raghavan’s contributions, highlighting his leadership across various divisions including Research, Workspace, Ads, and Knowledge & Information (K&I).

“Prabhakar’s leadership journey at Google has been remarkable,” Pichai noted. “He led the Gmail team in launching Smart Reply and Smart Compose as early examples of using AI to improve products, and took Gmail and Drive past one billion users.”

Taking the helm of the K&I division will be Nick Fox, a long-standing Googler and member of Raghavan’s leadership team. Fox’s appointment as SVP of K&I comes on the back of his extensive experience across various facets of the company, including Product and Design in Search and Assistant, as well as Shopping, Travel, and Payments products.

“Nick has been instrumental in shaping Google’s AI product roadmap and collaborating closely with Prabhakar and his leadership team on K&I’s strategy,” comments Pichai. “I frequently turn to Nick to tackle our most challenging product questions and he consistently delivers progress with tenacity, speed, and optimism.”

The restructuring comes amid a flurry of AI-driven innovations across Google’s product lineup. Recent developments include the viral success of NotebookLM with Audio Overviews, enhancements to information discovery in Search and Lens, the launch of a revamped Google Shopping platform tailored for the AI era, advancements like AlphaProteo that could revolutionise protein design, and updates to the Gemini family of models.

Pichai also highlighted a significant milestone in Google’s healthcare AI initiatives, revealing that their AI system for detecting diabetic retinopathy has conducted 600,000 screenings to date. The company plans to expand access to this technology across India and Thailand.

“AI moves faster than any technology before it. To keep increasing the pace of progress, we’ve been making shifts to simplify our structures along the way,” Pichai explained.

(Photo by Mitchell Luo)

See also: Telefónica’s Wayra backs AI answer engine Perplexity

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post Google announces restructuring to accelerate AI initiatives appeared first on AI News.

AlphaProteo: Google DeepMind unveils protein design system https://www.artificialintelligence-news.com/news/alphaproteo-google-deepmind-protein-design-system/ Fri, 06 Sep 2024 14:55:36 +0000

Google DeepMind has unveiled an AI system called AlphaProteo that can design novel proteins that successfully bind to target molecules, potentially revolutionising drug design and disease research.

AlphaProteo can generate new protein binders for diverse target proteins, including VEGF-A, which is associated with cancer and diabetes complications. Notably, this is the first time an AI tool has successfully designed a protein binder for VEGF-A.

The system’s performance is particularly impressive, achieving higher experimental success rates and binding affinities that are up to 300 times better than existing methods across seven target proteins tested:

Chart demonstrating AlphaProteo’s experimental success rates across target proteins (Credit: Google DeepMind)

Trained on vast amounts of protein data from the Protein Data Bank and over 100 million predicted structures from AlphaFold, AlphaProteo has learned the intricacies of molecular binding. Given the structure of a target molecule and preferred binding locations, the system generates a candidate protein designed to bind at those specific sites.
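
AlphaProteo has not been released as a programmable tool, so the interface below is hypothetical and serves only to illustrate the workflow described above: take a target structure and preferred binding ‘hotspot’ residues, generate many candidate binder sequences, and keep the top candidates according to an in-silico score. The propose_binder_sequence and predict_binding_score functions are stand-ins, not real AlphaProteo or AlphaFold APIs.

# Hypothetical workflow sketch only: AlphaProteo's real interface is not public.
# Both helper functions are simple stand-ins so the script runs end-to-end.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_binder_sequence(target_pdb: str, hotspots: list[int], length: int = 60) -> str:
    """Stand-in for a trained generative model; the real system would condition on the
    target structure and preferred binding sites, which this stub ignores."""
    return "".join(random.choice(AMINO_ACIDS) for _ in range(length))

def predict_binding_score(binder: str, target_pdb: str, hotspots: list[int]) -> float:
    """Stand-in for an in-silico filter (e.g. a structure-prediction-based confidence score)."""
    return random.random()

def design_binders(target_pdb: str, hotspots: list[int], n_candidates: int = 100, top_k: int = 5) -> list[str]:
    """Generate many candidate binders and keep the top-scoring few for wet-lab testing."""
    candidates = [propose_binder_sequence(target_pdb, hotspots) for _ in range(n_candidates)]
    ranked = sorted(candidates, key=lambda b: predict_binding_score(b, target_pdb, hotspots), reverse=True)
    return ranked[:top_k]

# Example with a hypothetical target structure file and hotspot residue numbers.
for seq in design_binders("vegfa_target.pdb", hotspots=[48, 52, 91]):
    print(seq)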

To validate AlphaProteo’s capabilities, the team designed binders for a diverse range of target proteins, including viral proteins involved in infection and proteins associated with cancer, inflammation, and autoimmune diseases. The results were promising, with high binding success rates and best-in-class binding strengths observed across the board.

For instance, when targeting the viral protein BHRF1, 88% of AlphaProteo’s candidate molecules bound successfully in wet lab testing. On average, AlphaProteo binders exhibited 10 times stronger binding than the best existing design methods across the targets tested.

The system’s performance suggests it could significantly reduce the time required for initial experiments involving protein binders across a wide range of applications. However, the team acknowledges that AlphaProteo has limitations; it was unable to design successful binders against TNFɑ, a protein associated with autoimmune diseases like rheumatoid arthritis.

To ensure responsible development, Google DeepMind is collaborating with external experts to inform their phased approach to sharing this work and contributing to community efforts in developing best practices—including the NTI’s new AI Bio Forum.

As the technology evolves, the team plans to work with the scientific community to leverage AlphaProteo on impactful biology problems and understand its limitations. They are also exploring drug design applications at Isomorphic Labs.

While AlphaProteo represents a significant step forward in protein design, achieving strong binding is typically just the first step in designing proteins for practical applications. There remain many bioengineering challenges to overcome in the research and development process.

Nevertheless, Google DeepMind’s advancement holds tremendous potential for accelerating progress across a broad spectrum of research, including drug development, cell and tissue imaging, disease understanding and diagnosis, and even crop resistance to pests.

You can find the full AlphaProteo whitepaper here (PDF)

See also: Paige and Microsoft unveil next-gen AI models for cancer diagnosis

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

The post AlphaProteo: Google DeepMind unveils protein design system appeared first on AI News.

As AI improves, what does it mean for user-generated content? https://www.artificialintelligence-news.com/news/as-ai-improves-what-does-it-mean-for-user-generated-content/ Thu, 22 Aug 2024 14:18:46 +0000

The rise of the creator economy was one of the most disruptive forces to emerge from the internet, paving the way for independent writers, artists, musicians, podcasters, YouTubers and social media influencers to connect with audiences directly and earn money from doing so. 

Creators have flocked to platforms such as Facebook, Instagram, Vimeo, Substack, TikTok and more, where they can not only create but also publish and share their user-generated content. Social media enables individuals to become self-publishers and independent producers of content, disrupting existing business models and enabling an entire generation of creative minds to establish their own path to success. 

Until recently, the creativity such individuals express was always thought to be a uniquely human quality and therefore invulnerable to disruption by advancing technology. However, the rise of generative AI, which comes so soon after the emergence of the creator economy, threatens to disrupt this nascent industry and significantly alter the way new content is produced. With generative AI models, anyone can churn out paragraphs of text, lines of software code, high quality images, audio, video and more, using simple prompts. 

How does AI aid with user-generated content?

Generative AI burst into the public consciousness with the arrival of ChatGPT in late 2022, taking the internet by storm, and since then tech companies have rushed to create all manner of consumer-friendly applications that can aid in content creation. 

For instance, there’s ChatGPT itself, which focuses on text generation and is capable of writing blog posts, essays, marketing copy, email pitches, documents and more, based on a simple prompt in which the user tells it what to write. 

More impressive forms of content generation include image-generating models such as Midjourney, which can create dramatic pictures based on users’ ideas of what they want to see, and there are now even video generators, such as OpenAI’s Sora, Google DeepMind’s Veo and Runway, that can do the same. 

Generative AI is also having an impact on video game content generation. Take the novel technology developed by AMGI Studios for its hit Web3 game My Pet Hooligan, which uses proprietary motion capture and AI algorithms to capture the gamer’s facial expressions and replicate them on their in-game avatars. It further uses generative AI to provide each user character (which is a unique NFT) with its own distinctive personality that users can learn about through a chat interface. 

Other ways people use generative AI to enhance creativity include BuzzFeed’s personalised content creation tools, which enable users to quickly create customised quizzes tailored to each individual, and its generative AI recipe creator, which can serve up ideas for meals based on whatever the user has in the fridge. 
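
BuzzFeed has not published the internals of its recipe creator, so the snippet below is only a generic sketch of the pattern such tools follow: turn a list of ingredients into a structured prompt and ask a general-purpose model for meal ideas. It assumes the OpenAI Python SDK with an API key configured, and the model name is a placeholder.

# Generic illustration of an ingredients-to-recipe prompt; not BuzzFeed's actual tool.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

def suggest_meals(fridge_contents: list[str], n_ideas: int = 3) -> str:
    """Build a prompt from whatever the user has in the fridge and ask for meal ideas."""
    prompt = (
        f"I have these ingredients: {', '.join(fridge_contents)}. "
        f"Suggest {n_ideas} meals I could make, each with a short method and any "
        "common pantry staples I would need to add."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(suggest_meals(["eggs", "spinach", "feta", "leftover rice"]))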

Three ways this can go

In the eyes of some, AI-generated content has emerged as a major threat to user-generated content, but not everyone sees it that way. It’s unclear what kind of impact generative AI will ultimately have on the creator economy, but there are a number of possible scenarios that may unfold. 

Scenario 1: AI enhances creativity

In the first scenario, it’s possible to imagine a world in which there’s an explosion of AI-assisted innovation, in which content creators themselves adopt AI to improve their performance and productivity. For instance, designers can use AI to quickly generate basic ideas and outlines, before using their human expertise to fine-tune those creations, be it a logo or a product design or something else. Rather than replace designers entirely, generative AI simply becomes a tool that they use to improve their output and get more work done. 

An example of this is GitHub’s coding assistant Copilot, which is a generative AI tool that acts as a kind of programming assistant, helping developers to generate code. It doesn’t replace their role entirely, but simply assists them in generating code snippets – such as the lines of code required to program an app to perform standard actions. But the developer is the one who oversees this and uses their creativity to design all of the intricacies of the app. 

AMGI’s in-game content generation tools are another example of how AI augments human creativity, creating unique in-game characters and situations that are ultimately based on the user’s actions. 

Such a scenario isn’t a threat to creative workers and user-generated content. Rather than taking people’s jobs, AI will simply support the people who do those jobs and make them better at it. They’ll be able to work faster and more efficiently, getting more work done in shorter time frames, spending more of their time prompting the AI tools they use and editing their outputs. It will enable creative projects to move forward much faster, accelerating innovation. 

Scenario 2: AI monopolises creativity 

A more dystopian scenario is one where algorithmic models leverage their unfair advantage to totally dominate the world of content creation. It’s a future where human designers, writers, coders and perhaps even highly skilled professionals like physicists are drowned out by AI models that not only work faster but also cost far less than human workers.

From a business perspective, replacing costly human creators with cheap and cheerful AI translates to greater profitability. But there are concerns, not only for the humans who lose their livelihoods, but also about the impact on creativity itself. 

As impressive as generative AI-created content sometimes is, the outputs of these algorithms are all based on existing content – namely the data they’re trained on. Most AI models have a habit of regurgitating similar content. Take an AI writer that always seems to write prose in the same, instantly recognisable and impersonal way, or AI image generators that constantly churn out images with the same aesthetic.

An even more alarming example of this is the AI music generators Suno and Udio (developed by Uncharted Labs), whose tools are said to have been trained on millions of music videos posted on YouTube. Record labels represented by the Recording Industry Association of America recently filed lawsuits against those companies, accusing them of copyright infringement. Their evidence? Numerous examples of supposedly original songs that sound awfully similar to existing ones created by humans. 

For instance, the lawsuit describes a song generated using Suno, called “Deep down in Louisiana close to New Orle”, which seems to mirror the lyrics and style of Chuck Berry’s “Johnny B. Goode.” It also highlights a second track, “Prancing Queen”, which seems to be a blatant rip-off of the ABBA hit “Dancing Queen.”

These examples raise questions over AI’s ability to create truly original content. If AI were to monopolise creativity, it could result in true innovation and creativity screeching to a halt, leading to a future that’s sterile and bland. 

Scenario 3: Human creativity stands out

Given AI’s lack of true authenticity and originality, a third possible way this could play out is that there is a kind of backlash against it. With consumers being overwhelmed by a sea of mundane, synthetic imagery and prose, those with an eye for flair will likely be able to identify true, human creativity and pay a premium for that content. After all, humans have always shown a preference for true originality, and such a scenario could well play into the hands of the most talented content creators. 

It’s a future where being human gives creators a competitive edge over their algorithmic rivals, with their unparalleled ability to come up with truly original ideas setting their work apart. Human culture, fashions and trends seem to evolve faster than generative AI models are created, and that means that the most original thinkers will always be one step ahead. It’s a more reassuring future where humans will continue to create and be rewarded for their work, and where machines will only ever be able to copy and iterate on existing ideas.  

This is perhaps the most likely scenario and, reassuringly, it means there will always be a need for humans in the mix. Humans, after all, are characterised by their creativity – everything that exists in the modern world today was created by someone, whether it’s the shoes on your feet, the device you’re reading this article with, or the language you speak. They’re all human creations, inspired by original ideas rooted in the human brain, and humans – especially those who find AI can do their jobs for them – will have more time to sit and think and potentially come up with even better ideas than the ones we’ve had so far. 

The post As AI improves, what does it mean for user-generated content? appeared first on AI News.
