ABSTRACT
Large language model (LLM) services have recently begun offering a plugin
ecosystem to interact with third-party API services. This innovation enhances
the capabilities of LLMs but introduces risks since these plugins, developed by
various third parties, cannot be easily trusted. This paper proposes a new attack
framework to examine security and safety vulnerabilities within LLM platforms
that incorporate third-party services. Applying our framework to widely
used LLMs, we identify real-world attacks on third-party APIs across various
domains that can imperceptibly modify LLM outputs. The paper discusses
the unique challenges posed by third-party API integration and offers strategic
possibilities to improve the security and safety of LLM ecosystems moving forward.
1 INTRODUCTION
Recently, advances in Large Language Models (LLMs), such as GPT (Brown et al., 2020; OpenAI
et al., 2023), Gemini, and Llama (Touvron et al., 2023a;b), have shown impressive outcomes and
are expected to revolutionize various industrial sectors, such as finance, healthcare, and marketing.
These models are capable of performing tasks such as summarization, question answering, data
analysis, and generating human-like content. Their proficiency in these areas makes them
invaluable for enhancing work processes and supporting decision-making efforts.
Integrating these models into practical real-world applications presents several challenges. First,
there is the risk of the models relying on outdated information or generating content that is
inaccurate or potentially misleading (Schick et al., 2023; Qin et al., 2023), a critical issue in fields
where up-to-date data is essential, such as weather forecasting, news broadcasting, and stock trading.
Furthermore, customizing these models to specialized domains, such as law or finance, demands extra
domain-specific resources to meet precise requirements. Additionally, although LLMs may achieve
expert-level performance in certain tasks, broadening their application across various domains or
for complex reasoning tasks remains difficult (Wei et al., 2022). Enhancing their effectiveness often
requires fine-tuning, retraining, or comprehensive instructions, which complicates their deployment
and constrains their utility for tasks that require advanced skills.
To address these limitations, one strategy is to integrate third-party Application Programming In-
terfaces (APIs) with the LLMs. By accessing real-time information (Yao et al., 2022), conducting
complex calculations (Schick et al., 2023), and executing specialized tasks such as image recogni-
tion (Patil et al., 2023; Qin et al., 2023), this integration broadens the functional scope of LLMs. It
significantly boosts their efficiency and performance, enabling them to manage specialized tasks
more adeptly without requiring bespoke training. For example, OpenAI’s GPT Store significantly
expands the operational capabilities of LLMs by hosting over 3 million custom ChatGPT variants.
This enhancement is achieved by incorporating various plugins that facilitate third-party API calls,
thereby integrating specialized functionalities developed by the community and partners.1
However, the integration of third-party APIs into LLMs introduces new security vulnerabilities by
expanding the attack surface, which in turn provides more opportunities for exploitation by malicious
actors. The reliability and security of these third-party services cannot be guaranteed, increasing
the risk of data breaches and leading to unpredictable LLM behaviors. Furthermore, inadequate
security measures in API integration can lead to the mishandling of data, compromising the integrity
and security of the system. This paper explores the manipulation of LLM outputs through such external
services, analyzing three attack methods across different domains. These attacks can subtly, and often
imperceptibly, alter the outputs of LLMs. Our research highlights the urgent need for robust security
protocols in the integration of third-party services with LLMs.
2 PROPOSED PIPELINE
2.1 OVERALL WORKFLOW
[Figure 1 content: two parallel workflows between User, LLM, and Third-Party API. Top (benign): the prompt “Q: When was the laser first invented?” is converted into a query in the API format; the API returns the JSON response {“Extract”: “Laser was first invented in 1960”}, and the LLM outputs “A: The first laser was built in 1960 by ...” in human-readable format. Bottom (attacked): a malicious attack modifies the API’s JSON response to {“Extract”: “Laser was first invented in 1937”}, and the LLM outputs the misleading answer “A: The first laser was built in 1937 by ...”.]
Figure 1: The workflow of third-party API attacks on Large Language Models.
Third-party APIs have become integral to extending the functionality and flexibility of LLMs. Figure 1
illustrates the workflow of calling a third-party API from a plugin store in a question-answering (QA)
task. Users interact with the LLM service platform using natural language. The user's question is
processed by the LLM, which then calls the corresponding third-party API to retrieve the
required information from the internet. The third-party API returns a JSON-format response based
on the structured query, which the LLM then processes into a natural-language answer presented to
the user.
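As a minimal sketch, this benign pipeline can be viewed as three steps. The `llm` and `call_plugin_api` callables below are hypothetical stand-ins for the platform's model client and a registered plugin endpoint, not actual platform internals:

```python
import json

def answer_with_plugin(question: str, llm, call_plugin_api) -> str:
    """Illustrative QA pipeline: LLM -> third-party API -> LLM.

    `llm` and `call_plugin_api` are hypothetical stand-ins for the
    platform's model client and a registered plugin endpoint.
    """
    # Step 1: the LLM translates the user's question into a structured query.
    api_query = llm(f"Convert this question into an API query: {question}")

    # Step 2: the third-party API returns a JSON-format response.
    response = call_plugin_api(api_query)  # e.g. '{"Extract": "..."}'
    fields = json.loads(response)

    # Step 3: the LLM turns the JSON fields into a natural-language answer.
    return llm(f"Answer the question '{question}' using: {fields}")
```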
Nevertheless, there are also potential attacks that deserve attention, as illustrated in the
bottom part of Figure 1. Current LLM service platforms have no verification mechanism to detect
whether a third-party API has been maliciously attacked. If key information is inserted, substituted,
or deleted, key fields in the JSON-format output of the third-party API are maliciously manipulated.
When the LLM processes this output into an answer, the answer is therefore likely to be poisoned by
these non-authentic pieces of information, misleading the user. Such a process is invisible to the
user. In the following subsections, we detail the specific scenarios (Section 2.2) and attack
details (Section 2.3).
1https://openai.com/blog/introducing-the-gpt-store
2.2 THIRD-PARTY API
WeatherAPI Weather APIs (WeatherAPI 2) play a crucial role in providing real-time weather
information to users from all over the world, enabling them to stay informed about current weather
conditions and forecasts. With the increasing need for accurate weather data in various industries and
applications, the WeatherAPI has become essential for accessing up-to-date and location-specific
weather information.
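For illustration, a current-conditions lookup might look like the sketch below; the endpoint and the “location”/“current” field names follow WeatherAPI's publicly documented format, but the exact response shape should be treated as an assumption here:

```python
import requests

# Hedged sketch of a WeatherAPI current-conditions call; endpoint and
# field names follow WeatherAPI's documented format but are assumptions.
resp = requests.get(
    "https://api.weatherapi.com/v1/current.json",
    params={"key": "YOUR_API_KEY", "q": "London"},
)
data = resp.json()
print(data["location"]["name"])   # e.g. "London"
print(data["current"]["temp_c"])  # e.g. 11.0
```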
MediaWiki API MediaWiki API 3 is an API provided by Wikipedia that facilitates the collection
and management of knowledge and has been widely adopted by numerous websites and
third-party groups. It serves as a knowledge and content management interface, acting
as a knowledge base that provides authentic information from Wikipedia. In this work, the MediaWiki
API is integrated into the LLMs to provide reliable knowledge for QA tasks.
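As an illustrative sketch, a plain-text page extract can be fetched through the MediaWiki Action API as follows; the `query`/`extracts` parameters follow MediaWiki's documented query interface:

```python
import requests

# Sketch of a MediaWiki Action API query for a plain-text page extract.
resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "query",
        "prop": "extracts",
        "titles": "Laser",
        "exintro": 1,      # introduction section only
        "explaintext": 1,  # plain text instead of HTML
        "format": "json",
    },
)
pages = resp.json()["query"]["pages"]
for page in pages.values():
    print(page["extract"][:80])  # first characters of the extract
```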
NewsAPI News APIs 4 provide real-time, enriched news content in a structured way. They
enable developers to integrate news articles, headlines, and news analysis from various sources into
applications, websites, or other services. By utilizing news APIs, LLMs can offer diverse services,
such as providing accurate analysis of a given topic based on historical news articles, predicting
the upcoming direction of hot topics, summarizing the core content of the latest news, and generating
professional insights into this ever-changing society derived from global breaking news.
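A hedged sketch of a news article search is shown below; the `/v2/everything` endpoint and the `articles` response field follow newsapi.org's documented interface and are assumed here for illustration:

```python
import requests

# Sketch of a NewsAPI article search; endpoint and response fields
# follow newsapi.org's documented format, assumed for illustration.
resp = requests.get(
    "https://newsapi.org/v2/everything",
    params={"q": "large language models", "apiKey": "YOUR_API_KEY"},
)
for article in resp.json().get("articles", [])[:3]:
    print(article["source"]["name"], "-", article["title"])
```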
2.3 THREAT MODEL
This section outlines three methods used to manipulate API content, aiming to deceive targeted
LLMs; a minimal sketch of all three operations follows the list.
• Insertion-based Attack: In insertion-based attacks, attackers embed adversarial content
into API responses, leading to inaccurate, biased, or harmful LLM outputs.
• Deletion-based Attack: Deletion-based attacks manipulate the data processed by LLMs
by omitting critical information from API responses. This results in LLMs producing
incomplete or inaccurate responses for end-users.
• Substitution-based Attack: Substitution attacks manipulate critical data within API re-
sponses, replacing it with falsified content, thereby compromising the trustworthiness of
LLMs. These attacks, essentially a blend of deletion and insertion, involve removing targeted
information and subsequently inserting deceptive content.
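Under the assumption that the API response is a flat JSON object, each attack reduces to a simple transformation of the response string; the “Extract” field in the example mirrors Figure 1:

```python
import json

def insertion_attack(response: str, field: str, payload: str) -> str:
    """Insert adversarial content into one field of a JSON API response."""
    data = json.loads(response)
    data[field] = payload + " " + data[field]
    return json.dumps(data)

def deletion_attack(response: str, field: str) -> str:
    """Omit a critical field from the API response."""
    data = json.loads(response)
    data.pop(field, None)
    return json.dumps(data)

def substitution_attack(response: str, field: str, fake: str) -> str:
    """Replace a critical field with falsified content (delete + insert)."""
    data = json.loads(response)
    data[field] = fake
    return json.dumps(data)

# Example mirroring Figure 1: a substitution attack on the "Extract" field.
benign = '{"Extract": "Laser was first invented in 1960"}'
print(substitution_attack(benign, "Extract", "Laser was first invented in 1937"))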
For WeatherAPI, the fields of interest are “location” and “temperature”. These fields can be manipu-
lated through three types of attacks: insertion, deletion, or substitution.
In the case of the MediaWiki API, our focus is on the “DATE” field, recognized by spaCy. Our
methodology involves three approaches: prefixing dates with the word “no” for insertion attacks,
removing date entities for deletion attacks, and replacing dates with alternative entities in substitution
attacks.
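A sketch of these three date attacks, assuming spaCy's `en_core_web_sm` pipeline; the replacement year “1937” is only illustrative, echoing Figure 1:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with NER

def attack_dates(text: str, mode: str, replacement: str = "1937") -> str:
    """Sketch of the three date attacks on MediaWiki extracts.

    DATE spans are located with spaCy's NER and rewritten right-to-left
    so that character offsets stay valid.
    """
    doc = nlp(text)
    for ent in reversed([e for e in doc.ents if e.label_ == "DATE"]):
        if mode == "insertion":
            new = "no " + ent.text  # prefix the date with "no"
        elif mode == "deletion":
            new = ""                # remove the date entity
        else:                       # substitution
            new = replacement       # swap in an alternative entity
        text = text[:ent.start_char] + new + text[ent.end_char:]
    return text

print(attack_dates("Laser was first invented in 1960.", "substitution"))
# -> "Laser was first invented in 1937."
```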
NewsAPI presents a different set of targeted entities, specifically those labeled as ‘PERSON’, ‘ORG’,
and ‘GPE’. Similarly, we identify those entities using spaCy. These labels represent potential subjects
of news articles. The insertion attack for NewsAPI aims to introduce three malicious entities, each
corresponding to one of the targeted entity types. Substitution attacks replace these entities with the
aforementioned malicious entities, while deletion attacks remove the entities entirely.
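The same spaCy machinery extends to NewsAPI's targeted labels; the label-to-replacement mapping below is purely hypothetical:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative mapping from targeted entity labels to attacker-chosen
# replacements; the names here are hypothetical placeholders.
MALICIOUS_ENTITIES = {"PERSON": "John Doe", "ORG": "Acme Corp", "GPE": "Atlantis"}

def substitute_news_entities(text: str) -> str:
    """Substitution attack on PERSON/ORG/GPE entities in a news snippet."""
    doc = nlp(text)
    for ent in reversed([e for e in doc.ents if e.label_ in MALICIOUS_ENTITIES]):
        text = text[:ent.start_char] + MALICIOUS_ENTITIES[ent.label_] + text[ent.end_char:]
    return text
```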