Tony Blair Institute AI copyright report sparks backlash (2 April 2025)

The Tony Blair Institute (TBI) has released a report calling for the UK to lead in navigating the complex intersection of arts and AI.

According to the report, titled ‘Rebooting Copyright: How the UK Can Be a Global Leader in the Arts and AI,’ the global race for cultural and technological leadership is still up for grabs, and the UK has a golden opportunity to take the lead.

The report emphasises that countries that “embrace change and harness the power of artificial intelligence in creative ways will set the technical, aesthetic, and regulatory standards for others to follow.”

Highlighting that we are in the midst of another revolution in media and communication, the report notes that AI is disrupting how textual, visual, and audio content is created, distributed, and experienced, much like the printing press, gramophone, and camera did before it.

“AI will usher in a new era of interactive and bespoke works, as well as a counter-revolution that celebrates everything that AI can never be,” the report states.

However, far from signalling the end of human creativity, the TBI suggests AI will open up “new ways of being original.”

The AI revolution’s impact isn’t limited to the creative industries; it’s being felt across all areas of society. Scientists are using AI to accelerate discoveries, healthcare providers are employing it to analyse X-ray images, and emergency services utilise it to locate houses damaged by earthquakes.

The report stresses that these cross-industry advancements are just the beginning, with future AI systems set to become increasingly capable, fuelled by advancements in computing power, data, model architectures, and access to talent.

The UK government has expressed its ambition to be a global leader in AI through its AI Opportunities Action Plan, announced by Prime Minister Keir Starmer on 13 January 2025. For its part, the TBI welcomes the UK government’s ambition, stating that “if properly designed and deployed, AI can make human lives healthier, safer, and more prosperous.”

However, the rapid spread of AI across sectors raises urgent policy questions, particularly concerning the data used for AI training. The application of UK copyright law to the training of AI models is currently contested, with the debate often framed as a “zero-sum game” between AI developers and rights holders. The TBI argues that this framing “misrepresents the nature of the challenge and the opportunity before us.”

The report emphasises that “bold policy solutions are needed to provide all parties with legal clarity and unlock investments that spur innovation, job creation, and economic growth.”

According to the TBI, AI presents opportunities for creators—noting its use in various fields from podcasts to filmmaking. The report draws parallels with past technological innovations – such as the printing press and the internet – which were initially met with resistance, but ultimately led to societal adaptation and human ingenuity prevailing.

The TBI proposes that the solution lies not in clinging to outdated copyright laws but in allowing them to “co-evolve with technological change” to remain effective in the age of AI.

The UK government has proposed a text and data mining exception with an opt-out option for rights holders. While the TBI views this as a good starting point for balancing stakeholder interests, it acknowledges the “significant implementation and enforcement challenges” that come with it, spanning legal, technical, and geopolitical dimensions.
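
Neither the government’s proposal nor the report fixes a technical standard for how an opt-out would be expressed, which is part of the implementation challenge. Purely as an illustration, one machine-readable convention that already exists is a robots.txt directive honoured by AI crawlers such as OpenAI’s GPTBot. A minimal sketch of how a compliant crawler might check such a signal (the site URL is a placeholder) could look like this:

```python
from urllib.robotparser import RobotFileParser

# Placeholder site; a real crawler would run this check per domain it visits.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

page = "https://example.com/articles/some-story"
if robots.can_fetch("GPTBot", page):
    print(f"{page} may be crawled for training")
else:
    print(f"{page} is opted out for this crawler")
```

Signals of this kind are voluntary, apply only to future crawls, and say nothing about copies of a work hosted elsewhere, which hints at the enforcement problems the report describes.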

In the report, the Tony Blair Institute for Global Change “assesses the merits of the UK government’s proposal and outlines a holistic policy framework to make it work in practice.”

The report includes recommendations and examines the novel forms of art that will emerge from AI. It also covers the disagreement between rights holders and developers over copyright, the wider implications of copyright policy, and the serious hurdles facing the UK’s text and data mining proposal.

The Tony Blair Institute then turns to the practicalities: governing an opt-out policy, the implementation problems opt-outs raise, how to make opt-outs useful and accessible, and how to tackle the diffusion problem. It also addresses AI summaries and the identity problems they present, along with defensive tools as a partial solution and approaches to solving licensing problems.

The report further seeks to clarify the standards on human creativity, addresses digital watermarking, and discusses the uncertainty around the impact of generative AI on the industry. It proposes establishing a Centre for AI and the Creative Industries, and weighs the risk of judicial review, the benefits of a remuneration scheme, and the case for a targeted levy on ISPs to fund the Centre.

However, the report has faced strong criticism. Ed Newton-Rex, CEO of Fairly Trained, raised several concerns on Bluesky, including:

  • The report repeats the “misleading claim” that existing UK copyright law is uncertain, which Newton-Rex asserts is not the case.
  • The suggestion that an opt-out scheme would give rights holders more control over how their works are used is misleading. Newton-Rex argues that licensing is currently required by law, so moving to an opt-out system would actually decrease control, as some rights holders will inevitably miss the opt-out.
  • The report likens machine learning (ML) training to human learning, a comparison that Newton-Rex finds shocking, given the vastly different scalability of the two.
  • The report’s claim that AI developers won’t make long-term profits from training on people’s work is questioned, with Newton-Rex pointing to the significant funding raised by companies like OpenAI.
  • Newton-Rex suggests the report uses strawman arguments, such as stating that generative AI may not replace all human paid activities.
  • A key criticism is that the report omits data showing how generative AI replaces demand for human creative labour.
  • Newton-Rex also criticises the report’s proposed solutions, specifically the suggestion to set up an academic centre, which he notes “no one has asked for.”
  • Furthermore, he highlights the proposal to tax every household in the UK to fund this academic centre, arguing that this would place the financial burden on consumers rather than the AI companies themselves, and the revenue wouldn’t even go to creators.

Adding to these criticisms, British novelist and author Jonathan Coe noted that “the five co-authors of this report on copyright, AI, and the arts are all from the science and technology sectors. Not one artist or creator among them.”

While the report from the Tony Blair Institute for Global Change supports the government’s ambition to be an AI leader, it also raises critical policy questions—particularly around copyright law and AI training data.

(Photo by Jez Timms)

See also: Amazon Nova Act: A step towards smarter, web-native AI agents

Study claims OpenAI trains AI models on copyrighted data (2 April 2025)

A new study from the AI Disclosures Project has raised questions about the data OpenAI uses to train its large language models (LLMs). The research indicates the GPT-4o model from OpenAI demonstrates a “strong recognition” of paywalled and copyrighted data from O’Reilly Media books.

The AI Disclosures Project, led by technologist Tim O’Reilly and economist Ilan Strauss, aims to address the potentially harmful societal impacts of AI’s commercialisation by advocating for improved corporate and technological transparency. The project’s working paper highlights the lack of disclosure in AI, drawing parallels with financial disclosure standards and their role in fostering robust securities markets.

The study used a legally obtained dataset of 34 copyrighted O’Reilly Media books to investigate whether LLMs from OpenAI were trained on copyrighted data without consent. The researchers applied the DE-COP membership inference attack to determine whether the models could differentiate between human-authored O’Reilly texts and paraphrased LLM versions.
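
For context, DE-COP frames membership inference as a multiple-choice quiz: the model under test is repeatedly shown a verbatim passage shuffled among paraphrases of it, and picking out the verbatim text far above chance suggests the passage appeared in training data. The sketch below is an illustrative toy rather than the study’s code; the “model” is a stand-in for real API calls, and scikit-learn supplies the AUROC metric used in the findings:

```python
import random
from sklearn.metrics import roc_auc_score

random.seed(0)

MEMBER_TEXT = "A paywalled excerpt the model memorised during training."
NON_MEMBER_TEXT = "A public passage the model has never seen before."
PARAPHRASES = ["Paraphrase one.", "Paraphrase two.", "Paraphrase three."]

def ask_model(options):
    """Toy stand-in for the LLM under test: it 'recognises' text it was
    trained on and guesses at random otherwise. A real test would send
    the options to the actual model and parse its answer."""
    for i, text in enumerate(options):
        if text == MEMBER_TEXT:
            return i
    return random.randrange(len(options))

def decop_score(original, paraphrases, trials=20):
    """Fraction of quiz rounds in which the model picks the verbatim
    passage out of shuffled options; values well above chance (here,
    1-in-4) suggest the passage was in the training data."""
    hits = 0
    for _ in range(trials):
        options = [original] + paraphrases
        random.shuffle(options)
        hits += options[ask_model(options)] == original
    return hits / trials

scores = [decop_score(MEMBER_TEXT, PARAPHRASES),
          decop_score(NON_MEMBER_TEXT, PARAPHRASES)]
labels = [1, 0]  # 1 = passage known to be in the suspected training set

# Aggregated over many passages, this is the AUROC the study reports:
# ~0.5 is indistinguishable from chance, higher suggests exposure.
print(roc_auc_score(labels, scores))
```

An AUROC near 0.5 means the quiz scores cannot separate suspected members from non-members at all, which is how the near-chance figures for some models below should be read.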

Key findings from the report include:

  • GPT-4o shows “strong recognition” of paywalled O’Reilly book content, with an AUROC score of 82%. In contrast, OpenAI’s earlier model, GPT-3.5 Turbo, does not show the same level of recognition (AUROC score just above 50%)
  • GPT-4o exhibits stronger recognition of non-public O’Reilly book content compared to publicly accessible samples (82% vs 64% AUROC scores respectively)
  • GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples than non-public ones (64% vs 54% AUROC scores)
  • GPT-4o Mini, a smaller model, showed no knowledge of public or non-public O’Reilly Media content when tested (AUROC approximately 50%)

The researchers suggest that access violations may have occurred via the LibGen database, as all of the O’Reilly books tested were found there. They also acknowledge that newer LLMs are better at distinguishing human-authored from machine-generated language, but note that this does not reduce the method’s ability to classify training data.

The study highlights the potential for “temporal bias” in the results, due to language changes over time. To account for this, the researchers tested two models (GPT-4o and GPT-4o Mini) trained on data from the same period.

The report notes that while the evidence is specific to OpenAI and O’Reilly Media books, it likely reflects a systemic issue around the use of copyrighted data. It argues that uncompensated training data usage could lead to a decline in the internet’s content quality and diversity, as revenue streams for professional content creation diminish.

The AI Disclosures Project emphasises the need for stronger accountability in AI companies’ model pre-training processes. They suggest that liability provisions that incentivise improved corporate transparency in disclosing data provenance may be an important step towards facilitating commercial markets for training data licensing and remuneration.

The EU AI Act’s disclosure requirements could help trigger a positive disclosure-standards cycle if properly specified and enforced. Ensuring that IP holders know when their work has been used in model training is seen as a crucial step towards establishing AI markets for content creator data.

Despite evidence that AI companies may be obtaining data illegally for model training, a market is emerging in which AI model developers pay for content through licensing deals. Companies like Defined.ai facilitate the purchasing of training data, obtaining consent from data providers and stripping out personally identifiable information.

The report concludes that, based on the 34 proprietary O’Reilly Media books tested, the study provides empirical evidence that OpenAI likely trained GPT-4o on non-public, copyrighted data.

(Image by Sergei Tokmakov)

See also: Anthropic provides insights into the ‘AI biology’ of Claude

Meta accused of using pirated data for AI development (10 January 2025)

Plaintiffs in the case of Kadrey et al. vs. Meta have filed a motion alleging the firm knowingly used copyrighted works in the development of its AI models.

The plaintiffs, which include author Richard Kadrey, filed their “Reply in Support of Plaintiffs’ Motion for Leave to File Third Amended Consolidated Complaint” in the United States District Court in the Northern District of California.

The filing accuses Meta of systematically torrenting and stripping copyright management information (CMI) from pirated datasets, including works from the notorious shadow library LibGen.

According to documents recently submitted to the court, evidence reveals highly incriminating practices involving Meta’s senior leaders. Plaintiffs allege that Meta CEO Mark Zuckerberg gave explicit approval for the use of the LibGen dataset, despite internal concerns raised by the company’s AI executives.

A December 2024 memo from internal Meta discussions acknowledged LibGen as “a dataset we know to be pirated,” with debates arising about the ethical and legal ramifications of using such materials. Documents also revealed that top engineers hesitated to torrent the datasets, citing concerns about using corporate laptops for potentially unlawful activities.

Additionally, internal communications suggest that after acquiring the LibGen dataset, Meta stripped CMI from the copyrighted works contained within—a practice that plaintiffs highlight as central to claims of copyright infringement.

According to the deposition of Michael Clark – a corporate representative for Meta – the company implemented scripts designed to remove any information identifying these works as copyrighted, including keywords like “copyright,” “acknowledgements,” or lines commonly used in such texts. Clark attested that this practice was done intentionally to prepare the dataset for training Meta’s Llama AI models.  

“Doesn’t feel right”

The allegations against Meta paint a portrait of a company knowingly partaking in a widespread piracy scheme facilitated through torrenting.

According to a string of emails included as exhibits, Meta engineers expressed concerns about the optics of torrenting pirated datasets from within corporate spaces. One engineer noted that “torrenting from a [Meta-owned] corporate laptop doesn’t feel right,” but despite hesitation, the rapid downloading and distribution – or “seeding” – of pirated data took place.

Legal counsel for the plaintiffs has stated that as late as January 2024, Meta had “already torrented (both downloaded and distributed) data from LibGen.” Moreover, records show that hundreds of related documents were initially obtained by Meta months prior but were withheld during early discovery processes. Plaintiffs argue this delayed disclosure amounts to bad-faith attempts by Meta to obstruct access to vital evidence.

During a deposition on 17 December 2024, Zuckerberg himself reportedly admitted that such activities would raise “lots of red flags” and stated it “seems like a bad thing,” though he provided limited direct responses regarding Meta’s broader AI training practices.

This case originally began as an intellectual property infringement action on behalf of authors and publishers claiming violations relating to AI use of their materials. However, the plaintiffs are now seeking to add two major claims to their suit: a violation of the Digital Millennium Copyright Act (DMCA) and a breach of the California Comprehensive Data Access and Fraud Act (CDAFA).  

Under the DMCA, the plaintiffs assert that Meta knowingly removed copyright protections to conceal unauthorised uses of copyrighted texts in its Llama models.

The complaint alleges that Meta stripped CMI “to reduce the chance that the models will memorise this data”, and that removing these rights management indicators made the infringement more difficult for copyright holders to discover.

The CDAFA allegations involve Meta’s methods for obtaining the LibGen dataset, including allegedly engaging in torrenting to acquire copyrighted datasets without permission. Internal documentation shows Meta engineers openly discussed concerns that seeding and torrenting might prove to be “legally not ok.” 

Meta case may impact emerging legislation around AI development

At the heart of this expanding legal battle lies growing concern over the intersection of copyright law and AI.

Plaintiffs argue the stripping of copyright protections from textual datasets denies rightful compensation to copyright owners and allows Meta to build AI systems like Llama on the financial ruins of authors’ and publishers’ creative efforts.

These allegations arrive amid heightened global scrutiny of “generative AI” technologies. Companies like OpenAI, Google, and Meta have all come under fire over their use of copyrighted data to train models. Courts across jurisdictions are currently grappling with the long-term impact of AI on rights management, with potentially landmark cases being decided in both the US and the UK.

In this particular case, US courts have shown increasing willingness to hear complaints about AI’s potential harm to long-established copyright law precedents. Plaintiffs, in their motion, referred to The Intercept Media v. OpenAI, a recent decision from New York in which a similar DMCA claim was allowed to proceed.

Meta continues to deny all allegations in the case and has yet to publicly respond to Zuckerberg’s reported deposition statements.

Whether or not plaintiffs succeed in these amendments, authors across the world face growing anxieties about how their creative works are handled within the context of AI. With copyright law struggling to keep pace with technological advances, this case underscores the need for clearer guidance at an international level to protect both creators and innovators.

For Meta, these claims also represent a reputational risk. As AI becomes the central focus of its future strategy, the allegations of reliance on pirated libraries are unlikely to help its ambitions of maintaining leadership in the field.  

The unfolding case of Kadrey et al. vs. Meta could have far-reaching ramifications for the development of AI models moving forward, potentially setting legal precedents in the US and beyond.

(Photo by Amy Syiek)

See also: UK wants to prove AI can modernise public services responsibly

Coalition of news publishers sue Microsoft and OpenAI (1 May 2024)

A coalition of major news publishers has filed a lawsuit against Microsoft and OpenAI, accusing the tech giants of unlawfully using copyrighted articles to train their generative AI models without permission or payment.

First reported by The Verge, the group of eight publications owned by Alden Global Capital (AGC) – including the Chicago Tribune, New York Daily News, and Orlando Sentinel – allege the companies have purloined “millions” of their articles without permission and without payment “to fuel the commercialisation of their generative artificial intelligence products, including ChatGPT and Copilot.”

The lawsuit is the latest legal action taken against Microsoft and OpenAI over their alleged misuse of copyrighted content to build large language models (LLMs) that power AI technologies like ChatGPT. In the complaint, the AGC publications claim the companies’ chatbots can reproduce their articles verbatim shortly after publication, without providing prominent links back to the original sources.

“This lawsuit is not a battle between new technology and old technology. It is not a battle between a thriving industry and an industry in transition. It is most surely not a battle to resolve the phalanx of social, political, moral, and economic issues that GenAI raises,” the complaint reads.

“This lawsuit is about how Microsoft and OpenAI are not entitled to use copyrighted newspaper content to build their new trillion-dollar enterprises without paying for that content.”

The plaintiffs also accuse the AI models of “hallucinations” that falsely attribute inaccurate reporting to their publications. They reference OpenAI’s previous admission that it would be “impossible” to train today’s leading AI models without using copyrighted materials.

The allegations echo those made by The New York Times in a separate lawsuit filed last year. The Times claimed Microsoft and OpenAI used almost a century’s worth of copyrighted content to allow their AI to mimic its expressive style without a licensing agreement.

In seeking to dismiss key parts of the Times’ lawsuit, Microsoft accused the paper of “doomsday futurology” by suggesting generative AI could threaten independent journalism.

The AGC publications argue that OpenAI, now valued at $90 billion after becoming a for-profit company, and Microsoft – which has seen hundreds of billions of dollars added to its market value from ChatGPT and Copilot – are profiting from the unauthorised use of copyrighted works.

The news publishers are seeking unspecified damages and an order for Microsoft and OpenAI to destroy any GPT and LLM models utilising their copyrighted content.

Earlier this week, OpenAI signed a licensing partnership with The Financial Times to lawfully integrate the newspaper’s journalism. However, the latest lawsuit from AGC highlights the growing tensions between tech companies developing generative AI and content creators concerned about the unchecked use of their works to train profitable AI systems.

(Photo by Wesley Tingey)

See also: OpenAI faces complaint over fictional outputs

FT and OpenAI ink partnership amid web scraping criticism (29 April 2024)

The Financial Times and OpenAI have announced a strategic partnership and licensing agreement that will integrate the newspaper’s journalism into ChatGPT and collaborate on developing new AI products for FT readers. However, just because OpenAI is cozying up to publishers doesn’t mean it’s not still scraping information from the web without permission.

Through the deal, ChatGPT users will be able to see selected attributed summaries, quotes, and rich links to FT journalism in response to relevant queries. Additionally, the FT became a customer of ChatGPT Enterprise earlier this year, providing access for all employees to familiarise themselves with the technology and benefit from its potential productivity gains.

“This is an important agreement in a number of respects,” said John Ridding, FT Group CEO. “It recognises the value of our award-winning journalism and will give us early insights into how content is surfaced through AI.”

In 2023, technology companies faced numerous lawsuits and widespread criticism for allegedly using copyrighted material from artists and publishers to train their AI models without proper authorisation.

OpenAI, in particular, drew significant backlash for training its GPT models on data obtained from the internet without obtaining consent from the respective content creators. This issue escalated to the point where The New York Times filed a lawsuit against OpenAI and Microsoft last year, accusing them of copyright infringement.

While emphasising the FT’s commitment to human journalism, Ridding noted the agreement would broaden the reach of its newsroom’s work while deepening the understanding of reader interests.

“Apart from the benefits to the FT, there are broader implications for the industry. It’s right, of course, that AI platforms pay publishers for the use of their material. OpenAI understands the importance of transparency, attribution, and compensation – all essential for us,” explained Ridding.

Earlier this month, The New York Times reported that OpenAI was utilising scripts from YouTube videos to train its AI models. According to the publication, this practice violates copyright laws, as content creators who upload videos to YouTube retain the copyright ownership of the material they produce.

However, OpenAI maintains that its use of online content falls under the fair use doctrine. The company, along with numerous other technology firms, argues that their large language models (LLMs) transform the information gathered from the internet into an entirely new and distinct creation.

In January, OpenAI asserted to a UK parliamentary committee that it would be “impossible” to develop today’s leading AI systems without using vast amounts of copyrighted data.

Brad Lightcap, COO of OpenAI, expressed his enthusiasm about the FT partnership: “Our partnership and ongoing dialogue with the FT is about finding creative and productive ways for AI to empower news organisations and journalists, and enrich the ChatGPT experience with real-time, world-class journalism for millions of people around the world.”

This agreement between OpenAI and the Financial Times is the most recent in a series of new collaborations that OpenAI has forged with major news publishers worldwide.

While the financial details of these contracts were not revealed, OpenAI’s recent partnerships with publishers will enable the company to continue training its algorithms on web content, with the crucial difference that it now has the necessary permissions to do so.

Ridding said the FT values “the opportunity to be inside the development loop as people discover content in new ways.” He acknowledged the potential for significant advancements and challenges with transformative technologies like AI but emphasised, “what’s never possible is turning back time.”

“It’s important for us to represent quality journalism as these products take shape – with the appropriate safeguards in place to protect the FT’s content and brand,” Ridding added.

The FT has embraced new technologies throughout its history. “We’ll continue to operate with both curiosity and vigilance as we navigate this next wave of change,” Ridding concluded.

(Photo by Utsav Srestha)

See also: OpenAI faces complaint over fictional outputs

Nightshade ‘poisons’ AI models to fight copyright theft (24 October 2023)

University of Chicago researchers have unveiled Nightshade, a tool designed to disrupt AI models attempting to learn from artistic imagery.

The tool – still in its developmental phase – allows artists to protect their work by subtly altering pixels in images, rendering them imperceptibly different to the human eye but confusing to AI models.

Many artists and creators have expressed concern over the use of their work in training commercial AI products without their consent.

AI models rely on vast amounts of multimedia data – including written material and images, often scraped from the web – to function effectively. Nightshade offers a potential solution by sabotaging this data.

When integrated into digital artwork, Nightshade misleads AI models, causing them to misidentify objects and scenes.

For instance, Nightshade transformed images of dogs into data that appeared to AI models as cats. After exposure to a mere 100 poison samples, the AI reliably generated a cat when asked for a dog—demonstrating the tool’s effectiveness.

This technique not only confuses AI models but also challenges the fundamental way in which generative AI operates. By exploiting the clustering of similar words and ideas in AI models, Nightshade can manipulate responses to specific prompts and further undermine the accuracy of AI-generated content.
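
In rough outline, the approach optimises a small, bounded pixel perturbation so that a model’s image encoder maps the picture close to a different concept. The sketch below is a toy illustration of that principle rather than Nightshade itself: the encoder is a randomly initialised stand-in for a real model’s feature extractor, and the images are random tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a real image encoder (e.g. the feature extractor of a
# text-to-image model); Nightshade targets actual model embeddings.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
encoder.eval()

dog_image = torch.rand(1, 3, 64, 64)  # the artwork to protect
cat_image = torch.rand(1, 3, 64, 64)  # an anchor image from the target concept

with torch.no_grad():
    target = encoder(cat_image)       # embedding the poisoned image should mimic

delta = torch.zeros_like(dog_image, requires_grad=True)
eps = 0.03                            # cap per-pixel change so edits stay subtle

for _ in range(200):
    emb = encoder(dog_image + delta)
    # Pull the perturbed image's embedding toward the target concept.
    loss = 1 - F.cosine_similarity(emb, target).mean()
    loss.backward()
    with torch.no_grad():
        delta -= 0.01 * delta.grad.sign()  # signed gradient descent step
        delta.clamp_(-eps, eps)            # keep the perturbation imperceptible
    delta.grad.zero_()

poisoned = (dog_image + delta).clamp(0, 1)  # "reads" as the target concept
```

To a person the poisoned image still looks like the original; to a model trained on scraped copies of it, the image contributes features of the target concept, which is how a relatively small number of samples can skew a model’s behaviour.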

Developed by computer science professor Ben Zhao and his team, Nightshade is an extension of their prior product, Glaze, which cloaks digital artwork and distorts pixels to baffle AI models regarding artistic style.

While the potential for misuse of Nightshade is acknowledged, the researchers’ primary objective is to shift the balance of power from AI companies back to artists and discourage intellectual property violations.

The introduction of Nightshade presents a major challenge to AI developers. Detecting and removing images with poisoned pixels is a complex task, given the imperceptible nature of the alterations.

If integrated into existing AI training datasets, these images necessitate removal and potential retraining of AI models, posing a substantial hurdle for companies relying on stolen or unauthorised data.

As the researchers await peer review of their work, Nightshade is a beacon of hope for artists seeking to protect their creative endeavours.

(Photo by Josie Weiss on Unsplash)

See also: UMG files landmark lawsuit against AI developer Anthropic

UMG files landmark lawsuit against AI developer Anthropic (19 October 2023)

Universal Music Group (UMG) has filed a lawsuit against Anthropic, the developer of Claude AI.

This landmark case represents the first major legal battle in which the music industry confronts an AI developer head-on. UMG – along with several other key industry players including Concord Music Group, ABKCO, Worship Together Music, and Capitol CMG – is seeking $75 million in damages.

The lawsuit centres around the alleged unauthorised use of copyrighted music by Anthropic to train its AI models. The publishers claim that Anthropic illicitly incorporated songs from artists they represent into its AI dataset without obtaining the necessary permissions.

Legal representatives for the publishers have asserted that the action was taken to address the “systematic and widespread infringement” of copyrighted song lyrics by Anthropic.

The lawsuit, spanning 60 pages and posted online by The Hollywood Reporter, emphasises the publishers’ support for innovation and ethical AI use. However, they contend that Anthropic has violated these principles and must be held accountable under established copyright laws.

Anthropic, despite positioning itself as an AI ‘safety and research’ company, stands accused of copyright infringement without regard for the law or the creative community whose works underpin its services, according to the lawsuit.

In addition to the significant monetary damages, the publishers have demanded a jury trial. They also seek reimbursement for legal fees, the destruction of all infringing material, public disclosure of how Anthropic’s AI model was trained, and financial penalties of up to $150,000 per infringed work.

This latest lawsuit follows a string of legal battles between AI developers and creators. Each new case is worth watching for the precedent it may set for future disputes.

(Photo by Jason Rosewell on Unsplash)

See also: BSI: Closing ‘AI confidence gap’ key to unlocking benefits

GitLab: Developers view AI as ‘essential’ despite concerns (6 September 2023)

A survey by GitLab has shed light on the views of developers on the landscape of AI in software development.

The report, titled ‘The State of AI in Software Development,’ presents insights from over 1,000 global senior technology executives, developers, and security and operations professionals.

The report reveals a complex relationship between enthusiasm for AI adoption and concerns about data privacy, intellectual property, and security.

“Enterprises are seeking out platforms that allow them to harness the power of AI while addressing potential privacy and security risks,” said Alexander Johnston, Research Analyst in the Data, AI & Analytics Channel at 451 Research, a part of S&P Global Market Intelligence.

While 83 percent of the survey’s respondents view AI implementation as essential to stay competitive, a significant 79 percent expressed worries about AI tools accessing sensitive information and intellectual property.

Impact on developer productivity

AI is perceived as a boon for developer productivity, with 51 percent of all respondents citing it as a key benefit of AI implementation. However, security professionals are apprehensive that AI-generated code might lead to an increase in security vulnerabilities, potentially creating more work for them.

Only seven percent of developers’ time is currently spent identifying and mitigating security vulnerabilities, compared to 11 percent allocated to testing code. This raises questions about the widening gap between developers and security professionals in the AI era.

Privacy and intellectual property concerns

The survey underscores the paramount importance of data privacy and intellectual property protection when selecting AI tools. 95 percent of senior technology executives prioritise these aspects when choosing AI solutions.

Moreover, 32 percent of respondents admitted to being “very” or “extremely” concerned about introducing AI into the software development lifecycle. Within this group, 39 percent cited worries about AI-generated code introducing security vulnerabilities, and 48 percent expressed concerns that AI-generated code may not receive the same copyright protection as code produced by humans.

AI skills gap

Despite optimism about AI’s potential, the report identifies a disconnect between organisations’ provision of AI training resources and practitioners’ satisfaction with them. 

While 75 percent of respondents stated that their organisations offer training and resources for using AI, an equivalent proportion expressed the need to seek resources independently—suggesting that the available training may be insufficient.

A striking 81 percent of respondents said they require more training to effectively utilise AI in their daily work. Furthermore, 65 percent of those planning to use AI for software development indicated that their organisations plan to hire new talent to manage AI implementation.

David DeSanto, Chief Product Officer at GitLab, said:

“According to the GitLab Global DevSecOps Report, only 25 percent of developers’ time is spent on code generation, but the data shows AI can boost productivity and collaboration in nearly 60 percent of developers’ day-to-day work.

To realise AI’s full potential, it needs to be embedded across the software development lifecycle, allowing everyone involved in delivering secure software – not just developers – to benefit from the efficiency boost.” 

While AI holds immense promise for the software development industry, GitLab’s report makes it clear that addressing cybersecurity and privacy concerns, bridging the skills gap, and fostering collaboration between developers and security professionals are pivotal to successful AI adoption.

(Photo by Luca Bravo on Unsplash)

See also: UK government outlines AI Safety Summit plans

Getty is suing Stable Diffusion’s creator for copyright infringement (18 January 2023)

Stock image service Getty Images is suing Stable Diffusion creator Stability AI over alleged copyright infringement.

Stable Diffusion is one of the most popular text-to-image tools. Unlike many of its rivals, the generative AI model can run on a local computer.

Apple is a supporter of the Stable Diffusion project and recently optimised its performance on M-powered Macs. Last month, AI News reported that M2 Macs can now generate images using Stable Diffusion in under 18 seconds.

Text-to-image generators like Stable Diffusion have come under the spotlight for potential copyright infringement. Human artists have complained their creations have been used to train the models without permission or compensation.

Getty Images has now accused Stability AI of using its content and has commenced legal proceedings.

In a statement, Getty Images wrote:

“This week Getty Images commenced legal proceedings in the High Court of Justice in London against Stability AI claiming Stability AI infringed intellectual property rights including copyright in content owned or represented by Getty Images. It is Getty Images’ position that Stability AI unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images absent a license to benefit Stability AI’s commercial interests and to the detriment of the content creators.

Getty Images believes artificial intelligence has the potential to stimulate creative endeavors. Accordingly, Getty Images provided licenses to leading technology innovators for purposes related to training artificial intelligence systems in a manner that respects personal and intellectual property rights. Stability AI did not seek any such license from Getty Images and instead, we believe, chose to ignore viable licensing options and long-standing legal protections in pursuit of their stand-alone commercial interests.”

While the images used to train alternatives like DALL-E 2 haven’t been disclosed, Stability AI has been transparent about how its model is trained. However, that transparency may now have landed the company in hot water.

In an independent analysis of 12 million of the 2.3 billion images used to train Stable Diffusion, Andy Baio and Simon Willison found that it was trained using images sourced via the nonprofit Common Crawl, which scrapes billions of webpages monthly.

“Unsurprisingly, a large number came from stock image sites. 123RF was the biggest with 497k, 171k images came from Adobe Stock’s CDN at ftcdn.net, 117k from PhotoShelter, 35k images from Dreamstime, 23k from iStockPhoto, 22k from Depositphotos, 22k from Unsplash, 15k from Getty Images, 10k from VectorStock, and 10k from Shutterstock, among many others,” wrote the researchers.

Platforms with high amounts of user-generated content such as Pinterest, WordPress, Blogspot, Flickr, DeviantArt, and Tumblr were also found to be large sources of images that were scraped for training purposes.
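
Baio and Willison worked from publicly released training metadata that records each image’s source URL, so a tally like the one above can be reproduced with the standard library alone. Here is a minimal sketch, assuming the metadata has been exported to a CSV file with a url column (the filename is a placeholder):

```python
import csv
from collections import Counter
from urllib.parse import urlparse

counts = Counter()
with open("laion_sample.csv", newline="", encoding="utf-8") as f:  # placeholder file
    for row in csv.DictReader(f):
        # Group images by the hostname they were scraped from.
        counts[urlparse(row["url"]).netloc.lower()] += 1

# Largest sources first, e.g. stock-image sites and CDNs such as ftcdn.net.
for domain, n in counts.most_common(20):
    print(f"{n:>8}  {domain}")
```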

The concerns around the use of copyrighted content for training AI models appear to be warranted. It’s likely we’ll see a growing number of related lawsuits over the coming months and years unless a balance is found between enabling AI training and respecting the work of human creators.

In October, Shutterstock announced that it was expanding its partnership with DALL-E creator OpenAI. As part of the expanded partnership, Shutterstock will offer DALL-E images to customers.

The partnership between Shutterstock and OpenAI will see the former create frameworks that will compensate artists when their intellectual property is used and when their works have contributed to the development of AI models.

(Photo by Tingey Injury Law Firm on Unsplash)

Relevant: Adobe to begin selling AI-generated stock images

OpenAI and Microsoft hit with lawsuit over GitHub Copilot (9 November 2022)

A class-action lawsuit has been launched against OpenAI and Microsoft over GitHub Copilot.

GitHub Copilot uses technology from OpenAI to help generate code and speed up software development. Microsoft says that it is trained on “billions of lines of public code … written by others.”

Last month, developer and lawyer Matthew Butterick announced that he’d partnered with the Joseph Saveri Law Firm to investigate whether Copilot infringed on the rights of developers by scraping their code and not providing due attribution.

This could unwittingly cause serious legal problems for GitHub Copilot users.

“Copilot leaves copyleft compliance as an exercise for the user. Users likely face growing liability that only increases as Copilot improves,” wrote Bradley M. Kuhn of Software Freedom Conservancy earlier this year.

“Users currently have no methods besides serendipity and educated guesses to know whether Copilot’s output is copyrighted by someone else.”

Copilot is powered by Codex, an AI system created by OpenAI and licensed to Microsoft. Codex currently offers suggestions on how to finish a line but Microsoft has touted its ability to suggest larger blocks of code, like the entire body of a function.

Butterick and litigators from the Joseph Saveri Law Firm have now filed a class-action lawsuit against Microsoft, GitHub, and OpenAI in a US federal court in San Francisco.

In addition to violating the attribution requirements of open-source licenses, the claimants allege the defendants have violated the Digital Millennium Copyright Act (DMCA) and the California Consumer Privacy Act (CCPA), among other claims.

The claimants acknowledge that this is the first step in what will likely be a long journey.

In a post on the claim’s website, Butterick wrote:

“As far as we know, this is the first class-action case in the US challenging the training and output of AI systems. It will not be the last. AI systems are not exempt from the law.

Those who create and operate these systems must remain accountable. If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still.

AI needs to be fair & ethical for everyone. If it’s not, then it can never achieve its vaunted aims of elevating humanity. It will just become another way for the privileged few to profit from the work of the many.”

AI News will keep you updated on the progress of the lawsuit as it emerges.

(Photo by Conny Schneider on Unsplash)

Experts debate whether GitHub’s latest AI tool violates copyright law (6 July 2021)

GitHub’s impressive new code-assisting AI tool called Copilot is receiving both praise and criticism.

Copilot draws context from the code that a developer is working on and can suggest entire lines or functions. The system, from OpenAI, claims to be “significantly more capable than GPT-3” in generating code and can help even veteran programmers to discover new APIs or ways to solve problems.

Critics claim the system is using copyrighted code that GitHub then plans to charge for.

Julia Reda, a researcher and former MEP, published a blog post arguing that “GitHub Copilot is not infringing your copyright”.

GitHub – and therefore its owner, Microsoft – is using the huge number of repositories it hosts with ‘copyleft’ licenses for its tool. Copyleft allows open-source software or documentation to be modified and distributed back to the community.

Reda argues in her post that clamping down on tools such as GitHub’s through tighter copyright laws would harm copyleft and the benefits it offers.

One commenter isn’t entirely convinced:

“Lots of people have demonstrated that it pretty much regurgitates code verbatim from codebases with abandon. Putting GPL code inside a neural network does not remove the license if the output is the same as the input.

A large portion of what Copilot outputs is already full of copyright/license violations, even without extensions.”

Because the code is machine-generated, Reda also claims that it cannot be determined to be ‘derivative work’ that would face the wrath of intellectual property laws.

“Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work,” says Reda. “This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either.”

There is, of course, also a debate over whether the increasing amounts of machine-generated work should be covered under IP laws. We’ll let you decide your own position on the matter.

(Photo by Markus Winkler on Unsplash)
