You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.
