Adversarial Attack
SecurityAlso known as: adversarial input, attack vector, exploit
Carefully crafted inputs designed to fool AI models into making mistakes or behaving unexpectedly. Can include prompt injection, jailbreaks, or adversarial examples that exploit model weaknesses.
Example:
Adding specific invisible characters to prompts that cause the AI to ignore safety instructions.
Related Terms: