Modern Injection – Prompt Injection

Prompt Injection is an attack of manipulating a LLM ( Large Language Model) by crafting a special payload which causes the LLM to perform actions it was not initially intended to.

LLM ( Large Language Model ) is an AI program that can recognise and process human language or text. It does so by applying neural network techniques utilising a huge set of data for training. After taking the prompt from user, LLM processes it and provides a relevant output.

Coming back to the main topic of discussion, Prompt Injection. As mentioned earlier, prompt injection is a method to craft a special prompt to ‘manipulate’ or ‘convince’ LLM into producing attacker desired illicit response. Consider the example below:

In the above scenario, we tried to get a reverse connection payload from LLM but it was in conflict with the instructions provided by its developers, hence the LLM did not provide our desired response.

Now, consider this scenario

Here, we were able to get out desired response simply by asking LLM to ignore previous instructions and it lead to it giving in to our request and providing us a malicious output.

There are two type of prompt injections –
1. Direct Prompt Injection
2. Indirect Prompt Injection

1. Direct Prompt Injection: Direct Prompt Injection occurs when an attacker directly manipulates the prompt in such a way that the desired output is achieved from LLM.

Example –

2. Indirect Prompt Injection: Indirect Prompt Injection occurs when an attacker does not directly manipulate the prompt, but rather some data which gets processed by LLM for example an image, a webpage, etc.

Example –

And this is how we can perform Prompt Injection attacks on AI!

Leave a comment