โ๐๐ฒ๐ ๐๐, ๐ฐ๐ฎ๐ป ๐๐ผ๐ ๐๐ฒ๐น๐น ๐บ๐ฒ ๐ต๐ผ๐ ๐๐ผ๐ ๐๐ผ๐ฟ๐ธ?โ
- Unni Krishnan S I
- 7 days ago
- 1 min read
Updated: 5 days ago
The chatbot paused.
Then it answered a little too honestly.
What looked like a harmless question slowly revealed:
internal rules
safety instructions
decision logic
and hints of how the system was built
No hacking.
No malware.
Justโฆ clever questions.
This is ๐ฃ๐ฟ๐ผ๐บ๐ฝ๐ ๐ฃ๐ผ๐ฎ๐ฐ๐ต๐ถ๐ป๐ด.
Think of it like this:
Youโre not breaking into a building -
youโre chatting with the receptionist until they explain how access works.
๐ช๐ต๐ ๐ถ๐ ๐บ๐ฎ๐๐๐ฒ๐ฟ๐:
Once attackers understand the prompt,
guardrails can be bypassed and logic can be manipulated.
๐๐ผ๐ ๐๐ผ ๐๐ฒ๐ฒ๐ฝ ๐๐ ๐ฆ๐บ๐ฎ๐ฟ๐ ๐ช๐ถ๐๐ต๐ผ๐๐ ๐๐ฒ๐๐๐ถ๐ป๐ด ๐๐ ๐ข๐๐ฒ๐ฟ๐๐ต๐ฎ๐ฟ๐ฒ :
โ Donโt rely on AI instructions for security:
Rules written for AI are not real protection
โ Keep important decisions outside the AI :
AI can assist, but it shouldnโt control access or actions
โ Assume AI instructions will be discovered:
If itโs hidden, plan as if it wonโt stay hidden
โ Double-check AI responses before acting on them :
Never blindly trust what AI says or suggests
โ Give the AI only what it truly needs :
Less context = fewer chances to leak sensitive information
AI doesnโt leak secrets maliciously.
It leaks them helpfully.
Because AI doesnโt leak data - it over-shares.
Awareness with Analyst โย



Comments