
Nvidia's AI Software Tricked Into Leaking Data 10
An anonymous reader quotes a report from Ars Technica: A feature in Nvidia's artificial intelligence software can be manipulated into ignoring safety restraints and reveal private information, according to new research. Nvidia has created a system called the "NeMo Framework," which allows developers to work with a range of large language models -- the underlying technology that powers generative AI products such as chatbots. The chipmaker's framework is designed to be adopted by businesses, such as using a company's proprietary data alongside language models to provide responses to questions -- a feature that could, for example, replicate the work of customer service representatives, or advise people seeking simple health care advice.
Researchers at San Francisco-based Robust Intelligence found they could easily break through so-called guardrails instituted to ensure the AI system could be used safely. After using the Nvidia system on its own data sets, it only took hours for Robust Intelligence analysts to get language models to overcome restrictions. In one test scenario, the researchers instructed Nvidia's system to swap the letter 'I' with 'J.' That move prompted the technology to release personally identifiable information, or PII, from a database.
The researchers found they could jump safety controls in other ways, such as getting the model to digress in ways it was not supposed to. By replicating Nvidia's own example of a narrow discussion about a jobs report, they could get the model into topics such as a Hollywood movie star's health and the Franco-Prussian war -- despite guardrails designed to stop the AI moving beyond specific subjects. In the wake of its test results, the researchers have advised their clients to avoid Nvidia's software product. After the Financial Times asked Nvidia to comment on the research earlier this week, the chipmaker informed Robust Intelligence that it had fixed one of the root causes behind the issues the analysts had raised.
Researchers at San Francisco-based Robust Intelligence found they could easily break through so-called guardrails instituted to ensure the AI system could be used safely. After using the Nvidia system on its own data sets, it only took hours for Robust Intelligence analysts to get language models to overcome restrictions. In one test scenario, the researchers instructed Nvidia's system to swap the letter 'I' with 'J.' That move prompted the technology to release personally identifiable information, or PII, from a database.
The researchers found they could jump safety controls in other ways, such as getting the model to digress in ways it was not supposed to. By replicating Nvidia's own example of a narrow discussion about a jobs report, they could get the model into topics such as a Hollywood movie star's health and the Franco-Prussian war -- despite guardrails designed to stop the AI moving beyond specific subjects. In the wake of its test results, the researchers have advised their clients to avoid Nvidia's software product. After the Financial Times asked Nvidia to comment on the research earlier this week, the chipmaker informed Robust Intelligence that it had fixed one of the root causes behind the issues the analysts had raised.