By Peter Lunk

May 15, 2023 – In recent years, the rapid advancement of large language models (LLMs) such as ChatGPT, Google Bard, and Copilot has transformed various industries and revolutionized the way we interact with technology. These models, powered by artificial intelligence, can generate human-like text based on the data on which they have been trained. However, as organizations begin to harness the power of LLMs, it is crucial to understand the potential security risks associated with entering company-confidential information into these models. This blog post aims to shed light on these risks and provide insights into mitigating them.

LLMs, like OpenAI's GPT-3.5, possess impressive capabilities in generating coherent and contextually relevant responses. They learn from vast amounts of data, including books, articles, websites, and other textual sources. While this enables them to generate high-quality text, it also means they have no inherent ability to distinguish between confidential and non-confidential information.

One of the primary security risks of entering company-confidential information into an LLM is data leakage. Currently, any data provided as input to an LLM service can become part of the corpus used to train future versions of the model. As a result, this data could be inadvertently exposed by the LLM in subsequent responses to users outside of your organization, posing a significant risk to the organization's confidentiality.

While the security risks of entering company-confidential information into an LLM are concerning and have already materialized in the real world, there are measures organizations can take to mitigate them. An end-to-end process for reducing the risk of exposing company-confidential information through these tools involves three key steps: discovering the LLM tools in use, detecting and alerting on risky behavior, and reducing the risk of confidential data being exposed to the model.

LLM Usage Discovery: New LLM tools are being introduced at an unprecedented scale, creating blind spots for IT security teams. To shine a light on where these are being used, organizations should implement tools that continuously discover usage of unsanctioned SaaS applications.
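To make this concrete, here is a minimal sketch of what discovery might look like when driven from web proxy logs. The log format (CSV with timestamp, user, and domain columns), the domain list, and the discover_llm_usage helper are illustrative assumptions, not a complete catalog or a product API.

```python
# Minimal sketch: flag LLM usage in web proxy logs by matching known AI domains.
# The domain list and log format are illustrative assumptions.
import csv
from collections import defaultdict

LLM_DOMAINS = {
    "chat.openai.com",
    "bard.google.com",
    "copilot.microsoft.com",
}

def discover_llm_usage(proxy_log_path: str) -> dict:
    """Return a map of user -> set of LLM domains that user accessed."""
    usage = defaultdict(set)
    with open(proxy_log_path, newline="") as f:
        # Assumed columns: timestamp, user, domain
        for row in csv.DictReader(f):
            if row["domain"] in LLM_DOMAINS:
                usage[row["user"]].add(row["domain"])
    return usage

if __name__ == "__main__":
    for user, domains in discover_llm_usage("proxy.csv").items():
        print(f"{user}: {', '.join(sorted(domains))}")
```

A real discovery tool would keep this domain list continuously updated, since new LLM services appear far faster than any static list can track.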

Detecting and Alerting on Risky User Behavior: To properly detect risky behavior that may result in sensitive data exposure, a reporting tool needs to be able to correlate user identity and user activity with the app in which the activity takes place. Users often upload, copy, and paste company-confidential data in the normal course of their day-to-day work. The key is to be able to provide contextual visibility into a sequence of events like a user accessing GitHub, copying code, and then minutes later, the same user accessing an LLM and performing a paste operation. Providing context and visibility into these indicators of data exposure allows a security team to audit and alert on risky activities.
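The correlation logic described above can be sketched as a simple event-matching rule. In this hypothetical example, the Event shape, the app lists, and the ten-minute window are all assumptions chosen for illustration; a production tool would tune these to its own telemetry.

```python
# Minimal sketch: pair a "copy" from a sensitive app with a later "paste"
# into an LLM by the same user within a short window. Event shape, app
# lists, and the 10-minute threshold are illustrative assumptions.
from dataclasses import dataclass

SENSITIVE_APPS = {"github.com"}
LLM_APPS = {"chat.openai.com"}
WINDOW_SECONDS = 600  # flag copy->paste pairs within 10 minutes

@dataclass
class Event:
    ts: float    # epoch seconds
    user: str
    app: str     # domain where the action happened
    action: str  # "copy" or "paste"

def find_risky_sequences(events: list[Event]) -> list[tuple[Event, Event]]:
    """Return (copy, paste) pairs indicating possible data exposure."""
    alerts = []
    events = sorted(events, key=lambda e: e.ts)
    for i, copy in enumerate(events):
        if copy.action != "copy" or copy.app not in SENSITIVE_APPS:
            continue
        for paste in events[i + 1:]:
            if paste.ts - copy.ts > WINDOW_SECONDS:
                break  # later events are outside the window
            if (paste.user == copy.user and paste.action == "paste"
                    and paste.app in LLM_APPS):
                alerts.append((copy, paste))
    return alerts
```

Each alert carries the full context a security team needs to audit the sequence: who acted, where the data came from, and where it was pasted.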

Reducing Model Exposure: Once the security tools have the context of which users are accessing the LLMs, it becomes possible to limit the scope of sensitive information shared. Security controls like preventing file uploads or copying and pasting into these models will minimize the exposure of sensitive data to the model and dramatically reduce the risk of data leakage.
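As a rough illustration, a managed browser could consult a per-domain policy before allowing a paste or file upload. The policy structure and function below are hypothetical assumptions for the sketch, not the configuration format of any specific product.

```python
# Minimal sketch: a policy check run before allowing a paste or upload on an
# LLM domain. The policy structure is an illustrative assumption.
LLM_POLICY = {
    "chat.openai.com": {"allow_paste": False, "allow_upload": False},
    "bard.google.com": {"allow_paste": False, "allow_upload": False},
}

def is_action_allowed(domain: str, action: str) -> bool:
    """Return True unless policy explicitly blocks this action on this domain."""
    rules = LLM_POLICY.get(domain)
    if rules is None:
        return True  # no LLM policy defined for this domain
    return rules.get(f"allow_{action}", True)

# Usage: pastes into a governed LLM are blocked; other sites are unaffected.
assert is_action_allowed("chat.openai.com", "paste") is False
assert is_action_allowed("example.com", "upload") is True
```

Enforcing the check at the point of interaction, rather than after the fact, is what turns detection into actual prevention.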

While LLMs like ChatGPT will undoubtedly transform a variety of industries, organizations must be cautious about the security risks associated with entering confidential information into these models. Data leakage through the models is a hazard that requires proactive measures. As the browser is the key interaction point with any of these models, it makes sense to enforce security measures directly in the browser. New security tools like the Mammoth Enterprise Access Browser can discover LLM usage, provide detailed contextual reporting and alerting on risky behavior, and dramatically reduce the amount of confidential information entering these LLMs. By implementing a secure Enterprise Browser to enforce these controls, organizations can mitigate these risks and leverage the power of LLMs without compromising the security of their sensitive information.