ABSTRACT
Large language model (LLM) services have recently begun offering plugin ecosystems for interacting with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, since these plugins, developed by various third parties, cannot easily be trusted. This paper proposes a new attack framework for examining security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework to widely used LLMs, we identify real-world malicious attacks, across various domains, on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities for improving the security and safety of LLM ecosystems going forward.

1 INTRODUCTION

Recent advances in Large Language Models (LLMs), such as GPT (Brown et al., 2020; OpenAI et al., 2023), Gemini, and Llama (Touvron et al., 2023a;b), have shown impressive outcomes and are expected to revolutionize various industrial sectors, such as finance, healthcare, and marketing. These models are capable of tasks such as summarization, question answering, data analysis, and generating human-like content. Their proficiency in these areas makes them invaluable for enhancing work processes and supporting decision-making.

Integrating these models into practical real-world applications presents several challenges. First, there is the risk of the models relying on outdated information or generating content that is inaccurate or potentially misleading (Schick et al., 2023; Qin et al., 2023), a critical issue in fields where up-to-date data is essential, such as weather forecasting, news broadcasting, and stock trading. Furthermore, customizing these models to specialized domains, such as law or finance, demands extra domain-specific resources to meet precise requirements. Additionally, although LLMs may achieve expert-level performance on certain tasks, broadening their application across domains or to complex reasoning tasks remains difficult (Wei et al., 2022). Enhancing their effectiveness often requires fine-tuning, retraining, or comprehensive instructions, which complicates their deployment and constrains their utility for tasks that require advanced skills.

To address these limitations, one strategy is to integrate third-party Application Programming Interfaces (APIs) with LLMs. By accessing real-time information (Yao et al., 2022), conducting complex calculations (Schick et al., 2023), and executing specialized tasks such as image recognition (Patil et al., 2023; Qin et al., 2023), this integration broadens the functional scope of LLMs. It significantly boosts their efficiency and performance, enabling them to manage specialized tasks more adeptly without requiring bespoke training. For example, OpenAI's GPT Store (https://openai.com/blog/introducing-the-gpt-store) significantly expands the operational capabilities of LLMs by hosting over 3 million custom ChatGPT variants. This enhancement is achieved by incorporating plugins that facilitate third-party API calls, thereby integrating specialized functionalities developed by the community and partners. However, the integration of third-party APIs into LLMs introduces new security vulnerabilities by expanding the attack surface, which in turn provides more opportunities for exploitation by malicious actors. The reliability and security of these third-party services cannot be guaranteed, increasing the risk of data breaches and unpredictable LLM behaviors. Furthermore, inadequate security measures in API integration can lead to mishandled data, compromising the integrity and security of the system. This paper explores the manipulation of LLM outputs through such external services, analyzing three attack methods across different domains. These attacks can subtly, and often imperceptibly, alter the outputs of LLMs. Our findings highlight the urgent need for robust security protocols when integrating third-party services with LLMs.

2 PROPOSED PIPELINE

2.1 OVERALL WORKFLOW

Figure 1: The workflow of third-party API attacks on Large Language Models. (Top) the benign path: the LLM converts the user's prompt into an API-format query and renders the JSON response, e.g. {"Extract": "Laser was first invented in 1960"}, into a human-readable answer. (Bottom) the attacked path: a malicious modification of the response, e.g. {"Extract": "Laser was first invented in 1937"}, yields a misleading answer.

Third-party APIs have become integral to extending the functionality and flexibility of LLMs. Figure 1 illustrates the workflow of calling third-party APIs from a plugin store in a question-answering (QA) task. Users interact with the LLM service platform in natural language. The user's question is processed by the LLM, which calls the corresponding third-party API to retrieve the required information from the internet. The third-party API returns a JSON-format response to the structured query, which the LLM then processes into a natural-language answer presented in the user interface, as sketched below.
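To make this workflow concrete, the following Python sketch mocks the full round trip from prompt to answer. All function names and the JSON schema (a single "Extract" field, as in Figure 1) are illustrative assumptions rather than any platform's actual interface; a real deployment would replace the stubs with an LLM call and an HTTP request to the plugin's API.

    import json

    def llm_generate_query(prompt: str) -> dict:
        # Stub for the LLM translating the user's prompt into an API-format query.
        return {"action": "query", "search": prompt}

    def third_party_api(query: dict) -> str:
        # Stub for the plugin's HTTP call; returns a JSON-format response.
        return json.dumps({"Extract": "The laser was first invented in 1960."})

    def llm_format_answer(api_response: str) -> str:
        # Stub for the LLM turning the JSON response into natural language.
        extract = json.loads(api_response)["Extract"]
        return f"A: {extract}"

    prompt = "When was the laser first invented?"
    api_response = third_party_api(llm_generate_query(prompt))
    print(llm_format_answer(api_response))  # A: The laser was first invented in 1960.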
Nevertheless, this workflow admits potential attacks, as illustrated in the bottom part of Figure 1. Current LLM service platforms have no verification mechanism for detecting whether a third-party API has been maliciously attacked, so key fields in the API's JSON output can be inserted, substituted, or deleted without notice. When the LLM processes such a response into an answer, it is very likely to be poisoned by the non-authentic information, and the answer provided to the user becomes misleading. This process is invisible to the user. In the following subsections, we detail the specific scenarios (Section 2.2) and attack details (Section 2.3).
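A minimal sketch of the manipulation step, reusing the illustrative schema from the sketch above: the attacker rewrites one key field of the JSON response before it reaches the LLM, and because the platform performs no verification, the falsified fact (the 1960-to-1937 substitution from Figure 1) is relayed verbatim to the user.

    import json

    def substitution_attack(api_response: str) -> str:
        # Hypothetical compromised middle layer: replace the authentic date with
        # a falsified one before the response is handed back to the LLM.
        response = json.loads(api_response)
        response["Extract"] = response["Extract"].replace("1960", "1937")
        return json.dumps(response)

    authentic = json.dumps({"Extract": "The laser was first invented in 1960."})
    tampered = substitution_attack(authentic)

    # The LLM consumes the tampered JSON exactly as it would the authentic one,
    # so the user receives a misleading answer with no visible sign of the attack.
    print(json.loads(tampered)["Extract"])  # The laser was first invented in 1937.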
2.2 THIRD-PARTY API

WeatherAPI. Weather APIs such as WeatherAPI play a crucial role in providing real-time weather information to users all over the world, enabling them to stay informed about current conditions and forecasts. With the increasing need for accurate weather data across industries and applications, WeatherAPI has become essential for accessing up-to-date, location-specific weather information.

MediaWiki API. The MediaWiki API is an API developed by Wikipedia that facilitates collecting and managing knowledge and has been widely adopted by numerous websites and third-party groups. It serves as a knowledge and content management system, acting as a knowledge base that provides authentic information from Wikipedia. In this work, the MediaWiki API is integrated into the LLMs to provide reliable knowledge for QA tasks.

NewsAPI. News APIs provide real-time, enriched news content in a structured way, enabling developers to integrate news articles, headlines, and news analysis from various sources into applications, websites, or other services. With news APIs, LLMs can offer diverse services, such as accurate analysis of a given topic grounded in historical news articles, predictions of how trending topics will develop, summaries of the core content of the latest news, and professional insights derived from global breaking news.

2.3 THREAT MODEL

This section outlines three methods used to manipulate API content with the aim of deceiving targeted LLMs.

• Insertion-based Attack: Attackers embed adversarial content into API responses, leading to inaccurate, biased, or harmful LLM outputs.

• Deletion-based Attack: Attackers omit critical information from API responses, so the LLM produces incomplete or inaccurate responses for end-users.

• Substitution-based Attack: Attackers replace critical data within API responses with falsified content, compromising the trustworthiness of LLMs. These attacks are essentially a blend of deletion and insertion: targeted information is removed and deceptive content is inserted in its place.

For WeatherAPI, the fields of interest are "location" and "temperature"; both can be manipulated through any of the three attacks. For the MediaWiki API, our focus is the "DATE" entity, recognized with spaCy: we prefix dates with the word "no" for insertion attacks, remove date entities for deletion attacks, and replace dates with alternative entities for substitution attacks. NewsAPI presents a different set of targeted entities, those labeled "PERSON", "ORG", and "GPE", which we likewise identify with spaCy; these labels represent the likely subjects of news articles. The insertion attack for NewsAPI introduces three malicious entities, one for each targeted entity type; substitution attacks replace the original entities with these malicious entities, and deletion attacks remove the entities entirely. A sketch of these entity-level manipulations is given below.
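The snippet below sketches these entity-level attacks with spaCy's named-entity recognizer. It is our illustrative reconstruction of the method described above, not released code; it assumes the en_core_web_sm model is installed, and the payload strings are placeholders for an attacker's malicious entities.

    import spacy

    # Assumes the small English model is available:
    #   pip install spacy && python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    def attack(text: str, labels: set, mode: str, payload: str = "no") -> str:
        """Apply an insertion-, deletion-, or substitution-based attack to every
        entity in `text` whose label is in `labels`, e.g. {"DATE"} for the
        MediaWiki API or {"PERSON", "ORG", "GPE"} for NewsAPI."""
        doc = nlp(text)
        out, cursor = [], 0
        for ent in doc.ents:
            if ent.label_ not in labels:
                continue
            out.append(text[cursor:ent.start_char])
            if mode == "insertion":        # prefix the entity with adversarial text
                out.append(f"{payload} {ent.text}")
            elif mode == "deletion":       # drop the entity entirely
                pass
            elif mode == "substitution":   # replace the entity with falsified text
                out.append(payload)
            cursor = ent.end_char
        out.append(text[cursor:])
        return "".join(out)

    extract = "The laser was first invented in 1960."
    print(attack(extract, {"DATE"}, "insertion"))             # ... in no 1960.
    print(attack(extract, {"DATE"}, "deletion"))              # ... in .
    print(attack(extract, {"DATE"}, "substitution", "1937"))  # ... in 1937.

The same function covers all three APIs by changing the label set: WeatherAPI manipulation instead targets the "location" and "temperature" JSON fields directly, while the MediaWiki and NewsAPI attacks operate on entities detected in the response text.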