Posts Tagged ‘Artificial Intelligence ( AI)’

OpenAI and ChatGPT face unique legal challenges over Copyright Laws

Reporter: Stephen J. Williams, PhD

In previous weeks on this page and on the sister page ChatGPT applied to Cancer & Oncology, a comparison of ChatGPT, OpenAI, and Google's large-language-model-based search revealed a major difference between the algorithms with respect to citation and author credit. In essence, while Google returns a hyperlink to the information used to form an answer, ChatGPT and OpenAI are agnostic about crediting or citing the sources of information used to generate answers to queries. With ChatGPT, the source data, or more specifically the training set used for the AI algorithm, is never properly cited in the query results.

This, as outlined below, is creating a big problem for copyright law and intellectual property. Last week a major lawsuit was filed over incorrect citing, referencing, and attribution of ownership of intellectual property.

As Miles Klee reports in Rolling Stone:

“OpenAI faces allegations of privacy invasion and violating authors’ copyright — but this may be just the tip of the iceberg”

The burgeoning AI industry has just crossed another major milestone, with two new class-action lawsuits calling into question whether this technology violates privacy rights, scrapes intellectual property without consent and negatively affects the public at large. Experts believe they’re likely to be the first in a wave of legal challenges to companies working on such products. Both suits were filed on Wednesday and target OpenAI, a research lab consisting of both a nonprofit arm and a corporation, over ChatGPT software, a “large language model” capable of generating human-like responses to text input. One, filed by Clarkson, a public interest law firm, is wide-ranging and invokes the potentially “existential” threat of AI itself. The other, filed by the Joseph Saveri Law Firm and attorney Matthew Butterick, is focused on two established authors, Paul Tremblay and Mona Awad, who claim that their books were among those ChatGPT was trained on — a violation of copyright, according to the complaint. (Saveri and Butterick are separately pursuing legal action against OpenAI, GitHub and Microsoft over GitHub Copilot, an AI-based coding product that they argue “appears to profit from the work of open-source programmers by violating the conditions of their open-source licenses.”)

Saveri and Butterick’s latest suit goes after OpenAI for direct copyright infringement as well as violations of the Digital Millennium Copyright Act (DMCA). Tremblay (who wrote the novel The Cabin at the End of the World) and Awad (author of 13 Ways of Looking at a Fat Girl and Bunny) are the representatives of a proposed class of plaintiffs who would seek damages as well as injunctive relief in the form of changes to ChatGPT. The filing includes ChatGPT’s detailed responses to user questions about the plots of Tremblay’s and Awad’s books — evidence, the attorneys argue, that OpenAI is unduly profiting off of infringed materials, which were scraped by the chat bot. While the suits venture into uncharted legal territory, they were more or less inevitable, according to those who research AI tech and privacy or practice law around those issues.

“[AI companies] should have and likely did expect these types of challenges,” says Ben Winters, senior counsel at the Electronic Privacy Information Center and head of the organization’s AI and Human Rights Project. He points out that OpenAI CEO Sam Altman mentioned a few prior “frivolous” suits against the company during his congressional testimony on artificial intelligence in May. “Whenever you create a tool that implicates so much personal data and can be used so widely for such harmful and otherwise personal purposes, I would be shocked there is not anticipated legal fire,” Winters says. “Particularly since they allow this sort of unfettered access for third parties to integrate their systems, they end up getting more personal information and more live information that is less publicly available, like keystrokes and browser activity, in ways the consumer could not at all anticipate.”

Source: https://www.rollingstone.com/culture/culture-features/chatgtp-openai-lawsuits-copyright-artificial-intelligence-1234780855/

At the heart of the matter is ChatGPT and OpenAI's use of ‘shadow libraries’ for AI training datasets, which the lawsuit claims is illegal.

An article by Anne Bucher in topclassactions.com explains this:

Source: https://topclassactions.com/lawsuit-settlements/class-action-news/class-action-lawsuit-claims-chatgpt-uses-copyrighted-books-without-authors-consent/

They say that OpenAI defendants “profit richly” from the use of their copyrighted materials and yet the authors never consented to the use of their copyrighted materials without credit or compensation.

ChatGPT lawsuit says OpenAI has previously utilized illegal ‘shadow libraries’ for AI training datasets

Although many types of material are used to train large language models, “books offer the best examples of high-quality longform writing,” according to the ChatGPT lawsuit.

OpenAI has previously utilized books for its AI training datasets, including unpublished novels (the majority of which were under copyright) available on a website that provides the materials for free. The plaintiffs suggest that OpenAI may have utilized copyrighted materials from “flagrantly illegal shadow libraries.”

Tremblay and Awad note that OpenAI’s March 2023 paper introducing GPT-4 failed to include any information about the training dataset. However, they say that ChatGPT was able to generate highly accurate summaries of their books when prompted, suggesting that their copyrighted material was used in the training dataset without their consent.

They filed the ChatGPT class action lawsuit on behalf of themselves and a proposed class of U.S. residents and entities that own a U.S. copyright for any work used as training data for the OpenAI language models during the class period.

Earlier this year, a tech policy group urged federal regulators to block OpenAI’s GPT-4 AI product because it does not meet federal standards.

What is the general consensus among legal experts on generative AI and copyright?

From Bloomberg Law: https://www.bloomberglaw.com/external/document/XDDQ1PNK000000/copyrights-professional-perspective-copyright-chaos-legal-implic

Copyright Chaos: Legal Implications of Generative AI

Contributed by Shawn Helms and Jason Krieser, McDermott Will & Emery

Copyright Law Implications – The Ins and Outs

Given the hype around ChatGPT and the speculation that it could be widely used, it is important to understand the legal implications of the technology. First, do copyright owners of the text used to train ChatGPT have a copyright infringement claim against OpenAI? Second, can the output of ChatGPT be protected by copyright and, if so, who owns that copyright?

To answer these questions, we need to understand the application of US copyright law.

Copyright Law Basics

Based on rights in Article I, Section 8 of the Constitution, Congress passed the first copyright law in 1790. It has been amended several times. Today, US copyright law is governed by the Copyright Act of 1976. This law grants authors of original works exclusive rights to reproduce, distribute, and display their work. Copyright protection applies from the moment of creation, and, for most works, the copyright term is the life of the author plus 70 years after the author’s death. Under copyright law, the copyright holder has the exclusive right to make copies of the work, distribute it, display it publicly, and create derivative works based on it. Others who want to use the work must obtain permission from the copyright holder or use one of the exceptions to copyright law, such as fair use.

The purpose of copyright law is to incentivize authors to create novel and creative works. It does this by granting authors exclusive rights to control the use of their work, thus allowing them to financially benefit from their works. Copyright law also encourages the dissemination of knowledge by allowing others to use copyrighted works under certain conditions, such as through the fair use doctrine, which allows for limited use of copyrighted material for the purposes of criticism, commentary, news reporting, teaching, scholarship, or research. By protecting the rights of authors and creators while also allowing for the use of copyrighted works for the public benefit, copyright law aims to strike a balance between the interests of authors and the public.

Inputs – Training ChatGPT with Copyrighted Material

ChatGPT was trained on a large training dataset sourced from the internet, including a vast amount of text from websites, articles, books, social media posts, and academic papers. Importantly, ChatGPT was not aware of the source of the data, it only saw the text and learned the patterns and relationships between the words, phrases and sentences. The vast majority of the text used to train ChatGPT was certainly subject to copyright protection—except for text that is in the public domain like facts or discoveries or works whose term of copyright protection has ended.

Did the ChatGPT training process violate the copyright of the original authors? To answer that question we must determine: (1) whether, during the training process, OpenAI copied the underlying works, because copyright infringement requires an act of copying works that are protected by copyright; (2) whether the output of ChatGPT is a “derivative work” under copyright law; and (3) whether such copying or production of a derivative work would fall under the “fair use” exception of copyright law.

Copying During Training

We are not aware of publicly available information regarding whether, during the training process, the training text for ChatGPT was copied or simply referenced. Therefore, we asked ChatGPT and received this answer:

Prompt: During the ChatGPT training process, was the underlying training text copied into a OpenAI database?

Answer: During the training process, text from various sources is used to train the ChatGPT model. It is likely that the text was copied into an internal database at OpenAI for the purpose of training the model. However, the training text is typically preprocessed to remove any sensitive information such as personal identifiers, copyrighted text, etc. In addition, the text is also used to train the model and not stored in a database. The model uses the text to learn patterns and structure of the language and once the training is done, it doesn’t require the text anymore.

So, it seems that during the training process, text was copied. It is also interesting that ChatGPT stated that the training text was “preprocessed” to remove any copyrighted text. That seems highly unlikely since otherwise nearly all text would have been removed.

Is ChatGPT Output a Derivative Work?

Under US copyright law, the owner of a copyright has the exclusive right “to prepare derivative works based upon the copyrighted work.” A “derivative work” is “a work based upon one or more preexisting works.” ChatGPT is trained on preexisting works and generates output based on that training.

As Daniel Gervais, a professor at Vanderbilt Law School who specializes in intellectual property law, says, the definition of a derivative work under copyright law “could loosely be used as a definition of machine learning when applied to the creation of literary and artistic productions because AI machines can produce literary and artistic content (output) that is almost necessarily ‘based upon’ a dataset consisting of preexisting works.” Under this view, it seems that all ChatGPT output is a derivative work under copyright law.

On a related point, it is worth noting that in producing its output, ChatGPT is not “copying” anything. ChatGPT generates text based on the context of the input and the words and phrase patterns it was trained on. ChatGPT is not “copying” and then changing text.

What About Fair Use?

Let’s assume that the underlying text was copied in some way during the ChatGPT training process. Let’s further assume that outputs from ChatGPT are, at least sometimes, derivative works under copyright law. If that is the case, do copyright owners of the original works have a copyright infringement claim against OpenAI? Not if the copying and the output generation are covered by the doctrine of “fair use.” If a use qualifies as fair use, then actions that would otherwise be prohibited would not be deemed an infringement of copyright.

In determining whether the use made of a work in any particular case is a fair use, the factors include:

  •  The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.
  •  The nature of the copyrighted work.
  •  The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
  •  The effect of the use upon the potential market for or value of the copyrighted work.

In this case, assuming OpenAI copied copyrighted text as part of the ChatGPT training process, such copying was not for a commercial purpose and had no economic impact on the copyright owner. Daniel Gervais says “it is much more likely than not” that training systems on copyrighted data will be covered by fair use.

In determining if a commercial use will be considered “fair use,” the courts will primarily look at the scope and purpose of the use and the economic impact of such use. Does the use in question change the nature of the underlying copyright material in some material way (described as a “transformative” use) and does it economically impact the original copyright holder?

Without a specific example, it is difficult to determine exactly if a resulting output from ChatGPT would be fair use. Given that ChatGPT does not copy and has been trained on millions of underlying works, it seems likely most output would be fair use, since it would not use significant portions of any one protected work. In addition, because of the vast corpus of text used to train ChatGPT, it seems unlikely that ChatGPT output will have a negative economic impact on any one copyright holder. But, given the capabilities of ChatGPT, that might not always be the case.

Imagine if you asked ChatGPT to “Write a long-form, coming of age, story in the style of J.K. Rowling, using the characters from Harry Potter and the Chamber of Secrets.” In that case, it would seem that the argument for fair use would be weak. This story could be sold to the public and could conceivably have a negative economic impact on J.K. Rowling. A person that wants to read a story about Harry Potter might buy this story instead of buying a book by J. K. Rowling.

Finally, it is worth noting that OpenAI is a non-profit entity that is a “AI research and deployment company.” It seems that OpenAI is the type of research company, and ChatGPT is the type of research project, that would have a strong argument for fair use. This practice has been criticized as “AI Data Laundering,” shielding commercial entities from liability by using a non-profit research institution to create the data set and train AI engines that might later be used in commercial applications.

Outputs – Can the Output of ChatGPT be Protected by Copyright

Is the output of ChatGPT protected by copyright law and, if so, who is the owner? As an initial matter, does the ChatGPT textual output fit within the definition of what is covered under copyright law: “original works of authorship fixed in any tangible medium of expression.”

The text generated by ChatGPT is the type of subject matter that, if created by a human, would be covered by copyright. However, most scholars have opined, and the US Copyright Office has ruled, that the output of generative AI systems like ChatGPT is not protectable under US copyright law because the work must be an original, creative work of a human author.

In 2022, the US Copyright Office, ruling on whether a picture generated completely autonomously by AI could be registered as a valid copyright, stated “[b]ecause copyright law as codified in the 1976 Act requires human authorship, the [AI Generated] Work cannot be registered.” The U.S. Copyright Office has issued several similar statements, informing creators that it will not register copyright for works produced by a machine or computer program. The human authorship requirement of the US Copyright Office is set forth as follows:

The Human Authorship Requirement – The U.S. Copyright Office will register an original work of authorship, provided that the work was created by a human being. The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.” Trade-Mark Cases, 100 U.S. 82, 94 (1879).

While such policies are not binding on the courts, the stance by the US Copyright Office seems to be in line with the purpose of copyright law flowing from the Constitution: to incentivize humans to produce creative works by giving them a monopoly over their creations for a limited period of time. Machines, of course, need and have no such motivation. In fact, copyright law expressly allows a corporation or other legal entity to be the owner of a copyright under the “work made for hire” doctrine. However, to qualify as a work made for hire, the work must be either work prepared by an employee within the scope of his or her employment, or be prepared by a party who “expressly agrees in a written instrument signed by them that the work shall be considered a work made for hire.” Only humans can be employees and only humans or corporations can enter a legally binding contract—machines cannot.

Other articles of note in this Open Access Scientific Journal on ChatGPT and OpenAI include:

Medicine with GPT-4 & ChatGPT

ChatGPT applied to Cancer & Oncology

ChatGPT applied to Medical Imaging & Radiology

ChatGPT applied to Cardiovascular diseases: Diagnosis and Management

The Use of ChatGPT in the World of BioInformatics and Cancer Research and Development of BioGPT by MIT


Artificial Intelligence (AI) Used to Successfully Determine Most Likely Repurposed Antibiotic Against Deadly Superbug Acinetobacter baumannii

Reporter: Stephen J. Williams, Ph.D.

The World Health Organization has identified three superbugs, or infective microorganisms displaying resistance to common antibiotics and multidrug resistance, as threats to humanity:

Three bacteria were listed as critical:

  • Acinetobacter baumannii bacteria that are resistant to important antibiotics called carbapenems. Acinetobacter baumannii are highly drug-resistant bacteria that can cause a range of infections for hospitalized patients, including pneumonia, wound, or blood infections.
  • Pseudomonas aeruginosa, which are resistant to carbapenems. Pseudomonas aeruginosa can cause skin rashes and ear infections in healthy people but also severe blood infections and pneumonia when contracted by sick people in the hospital.
  • Enterobacteriaceae, a family of bacteria that live in the human gut, that are resistant to both carbapenems and another class of antibiotics, cephalosporins.

The development of new antibiotics against these pathogens has been designated a critical need. Now researchers at McMaster University and others in the US have used artificial intelligence (AI) to screen libraries of over 7,000 chemicals to find a drug that could be repurposed to kill off Acinetobacter baumannii.

Liu et al. (1) recently published the results of an AI screen to narrow down potential chemicals that could work against Acinetobacter baumannii in Nature Chemical Biology.

Abstract

Acinetobacter baumannii is a nosocomial Gram-negative pathogen that often displays multidrug resistance. Discovering new antibiotics against A. baumannii has proven challenging through conventional screening approaches. Fortunately, machine learning methods allow for the rapid exploration of chemical space, increasing the probability of discovering new antibacterial molecules. Here we screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. We trained a neural network with this growth inhibition dataset and performed in silico predictions for structurally new molecules with activity against A. baumannii. Through this approach, we discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii. Further investigations revealed that abaucin perturbs lipoprotein trafficking through a mechanism involving LolE. Moreover, abaucin could control an A. baumannii infection in a mouse wound model. This work highlights the utility of machine learning in antibiotic discovery and describes a promising lead with targeted activity against a challenging Gram-negative pathogen.

Schematic workflow for incorporation of AI for antibiotic drug discovery for A. baumannii from 1. Liu, G., Catacutan, D.B., Rathod, K. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat Chem Biol (2023). https://doi.org/10.1038/s41589-023-01349-8

Figure source: https://www.nature.com/articles/s41589-023-01349-8

Article Source: https://www.nature.com/articles/s41589-023-01349-8

  1. Liu, G., Catacutan, D.B., Rathod, K. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat Chem Biol (2023). https://doi.org/10.1038/s41589-023-01349-8

For the WHO list of the most dangerous superbugs, see https://www.scientificamerican.com/article/who-releases-list-of-worlds-most-dangerous-superbugs/

The finding was first reported by the BBC.

Source: https://www.bbc.com/news/health-65709834

By James Gallagher

Health and science correspondent

Scientists have used artificial intelligence (AI) to discover a new antibiotic that can kill a deadly species of superbug.

The AI helped narrow down thousands of potential chemicals to a handful that could be tested in the laboratory.

The result was a potent, experimental antibiotic called abaucin, which will need further tests before being used.

The researchers in Canada and the US say AI has the power to massively accelerate the discovery of new drugs.

It is the latest example of how the tools of artificial intelligence can be a revolutionary force in science and medicine.

Stopping the superbugs

Antibiotics kill bacteria. However, there has been a lack of new drugs for decades and bacteria are becoming harder to treat, as they evolve resistance to the ones we have.

More than a million people a year are estimated to die from infections that resist treatment with antibiotics.

The researchers focused on one of the most problematic species of bacteria – Acinetobacter baumannii, which can infect wounds and cause pneumonia.

You may not have heard of it, but it is one of the three superbugs the World Health Organization has identified as a “critical” threat.

It is often able to shrug off multiple antibiotics and is a problem in hospitals and care homes, where it can survive on surfaces and medical equipment.

Dr Jonathan Stokes, from McMaster University, describes the bug as “public enemy number one” as it’s “really common” to find cases where it is “resistant to nearly every antibiotic”.

Artificial intelligence

To find a new antibiotic, the researchers first had to train the AI. They took thousands of drugs where the precise chemical structure was known, and manually tested them on Acinetobacter baumannii to see which could slow it down or kill it.

This information was fed into the AI so it could learn the chemical features of drugs that could attack the problematic bacterium.

The AI was then unleashed on a list of 6,680 compounds whose effectiveness was unknown. The results – published in Nature Chemical Biology – showed it took the AI an hour and a half to produce a shortlist.

The researchers tested 240 in the laboratory, and found nine potential antibiotics. One of them was the incredibly potent antibiotic abaucin.
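The train-then-rank workflow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of Liu et al., who trained a neural network on measured growth-inhibition data. Here a plain logistic regression stands in for the neural network, and the four binary "structural features", the activity labels, and the compound names (cmpd_A and so on) are all made up for the example.

```python
import math

def predict_proba(w, b, x):
    """Predicted probability that a compound with feature vector x is active."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression model by plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # log-loss gradient w.r.t. the score z
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_library(w, b, library):
    """Return compound names sorted from most to least likely to be active."""
    return sorted(library, key=lambda name: predict_proba(w, b, library[name]),
                  reverse=True)

# Step 1 (hypothetical data): compounds already tested in the lab.
# Each row is a made-up vector of four structural features; label 1 means
# the compound inhibited bacterial growth, 0 means it did not.
train_X = [
    [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
    [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 1, 0],
]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]

# Step 2: learn which features go with activity.
w, b = train_logistic(train_X, train_y)

# Step 3: score an untested library and keep the top candidates for the lab.
library = {
    "cmpd_A": [1, 1, 1, 0],
    "cmpd_B": [0, 1, 0, 1],
    "cmpd_C": [1, 0, 1, 1],
    "cmpd_D": [0, 0, 1, 1],
}
ranking = rank_library(w, b, library)
shortlist = ranking[:2]
print("Send to the lab first:", shortlist)
```

The point of the sketch is the division of labour: the model only orders the candidates, and the laboratory decides which ones actually work. In the study reported here, the researchers tested 240 compounds in the laboratory and found nine potential antibiotics.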

Laboratory experiments showed it could treat infected wounds in mice and was able to kill A. baumannii samples from patients.

However, Dr Stokes told me: “This is when the work starts.”

The next step is to perfect the drug in the laboratory and then perform clinical trials. He expects the first AI antibiotics could take until 2030 until they are available to be prescribed.

Curiously, this experimental antibiotic had no effect on other species of bacteria, and works only on A. baumannii.

Many antibiotics kill bacteria indiscriminately. The researchers believe the precision of abaucin will make it harder for drug-resistance to emerge, and could lead to fewer side-effects.

In principle, the AI could screen tens of millions of potential compounds – something that would be impractical to do manually.

“AI enhances the rate, and in a perfect world decreases the cost, with which we can discover these new classes of antibiotic that we desperately need,” Dr Stokes told me.

The researchers tested the principles of AI-aided antibiotic discovery in E. coli in 2020, but have now used that knowledge to focus on the big nasties. They plan to look at Staphylococcus aureus and Pseudomonas aeruginosa next.

“This finding further supports the premise that AI can significantly accelerate and expand our search for novel antibiotics,” said Prof James Collins, from the Massachusetts Institute of Technology.

He added: “I’m excited that this work shows that we can use AI to help combat problematic pathogens such as A. baumannii.”

Prof Dame Sally Davies, the former chief medical officer for England and government envoy on anti-microbial resistance, told Radio 4’s The World Tonight: “We’re onto a winner.”

She said the idea of using AI was “a big game-changer, I’m thrilled to see the work he (Dr Stokes) is doing, it will save lives”.

Other related articles and books published in this Online Scientific Journal include the following:

Series D: e-Books on BioMedicine – Metabolomics, Immunology, Infectious Diseases, Reproductive Genomic Endocrinology

(3 book series: Volume 1, 2&3, 4)

https://www.amazon.com/gp/product/B08VVWTNR4?ref_=dbs_p_pwh_rwt_anx_b_lnk&storeType=ebooks

  • The Immune System, Stress Signaling, Infectious Diseases and Therapeutic Implications:

  • Series D, VOLUME 2

Infectious Diseases and Therapeutics

and

  • Series D, VOLUME 3

The Immune System and Therapeutics

(Series D: BioMedicine & Immunology) Kindle Edition.

On Amazon.com since September 4, 2017

(English Edition) Kindle Edition – as one Book

https://www.amazon.com/dp/B075CXHY1B $115

Bacterial multidrug resistance problem solved by a broad-spectrum synthetic antibiotic

The Journey of Antibiotic Discovery

FDA cleared Clever Culture Systems’ artificial intelligence tech for automated imaging, analysis and interpretation of microbiology culture plates speeding up Diagnostics

Artificial Intelligence: Genomics & Cancer
