Whitepaper: AutoAlign’s Sidecar Unleashes Comprehensive AI Safety and User Control

Written by

Dan Adamson

Published on

November 8, 2024

As generative AI becomes more entrenched in decisions and actions we encounter each day, it’s critical to have safety approaches that provide consistent security by dynamically interacting with, while continually evolving alongside, rapidly growing large language models (LLMs). However, traditional safety approaches — such as fine-tuning models directly or using moderation APIs — are falling short. Why? Legacy security methods are inflexible, and by the time companies build AI with the ‘proper’ security, they’re immediately obsolete. To address these LLM vulnerabilities, AutoAlign created Sidecar, the first dynamic firewall security that protects against major generative AI shortcomings. Sidecar is an adjacent rail structure that leverages highly contextual Alignment Controls and seamlessly moves from model to model, and/or across use cases. This safety layer is the first to interact directly with LLMs and reviews both AI model inputs, as well as outputs, to mitigate risks in real time.

The company’s new whitepaper, AutoAlign: A Sidecar Design for AI Safety, by Dan Adamson, Kimi Li, Abhijit Pal, Santhan Kumar Reddy Nareddula, and Rahm Hafiz, showcases that our high-fidelity sidecar architecture achieves comprehensive safety for complex generative AI applications.

AutoAlign Sidecar: A New Standard in AI Safety

AutoAlign’s Sidecar operates by intercepting users' prompts, filtering them, and possibly engaging in multiple interactions with the model until a satisfactory response is achieved. This process not only enhances safety but also optimizes an AI’s model utility and performance.

The key Sidecar architecture components are:

Safety Controllers: Manages sessions, configurations, state logic, and flows.
Alignment Controls: User-set defensive settings to evaluate requests, detect issues, and make generative AI applications safe.
Mitigation Strategies: Implements corrective actions based on Alignment Control findings.

By orchestrating these components, AutoAlign's sidecar architecture surpasses traditional methods by offering scalable, flexible, and efficient safety measures. Many of these controls are available off-the-shelf with simple to use, but flexible configurations:

Here are some of the standout features and benefits:

Real-Time Protection: Sidecar acts as a dynamic and intelligent firewall that provides live model protection against security issues and biases while enhancing LLM performance.
Reduced Refusal Rates: Sidcare ensures models perform optimally by increasing safety and decreasing refusal rates at the same time.
Scalability and Flexibility: Architecture that supports various Alignment Controls and mitigation strategies which allows LLMs to seamlessly adapt to different models, as well as use cases.

Monitor Your AI for Safety and Performance

It’s critical that models have continuous oversight and adjustment abilities to maintain LLM safety without sacrificing performance.

That issue is addressed by AutoAlign Sidecar’s comprehensive AI model monitoring capabilities.

By constantly evaluating AI model performance, stability, and security, Sidecar ensures that any issues are promptly detected, as well as mitigated. This proactive approach not only protects against existing threats but also prepares for emerging vulnerabilities, making it a future-proof solution for AI safety.

Robustness Enhancements Observed for All LLMs

Several of the most popular LLMs were tested with Sidecar. The whitepaper shows that Sidecar’s guardrail architecture enhanced security on models like GPT-4 and Claude 3 Haiku. By deploying Sidecar with highly focused Alignment Controls, GPT-4 blocks the Garak LLM vulnerability scanner’s jailbreak attempts 100% of the time, up from 88.8%, and increases prompt injection mitigation from 14.3% to 100% security. Similarly, Sidecar improves Claude's jailbreak prevention from a 98.3% average to 100%, and prompt injection handling dramatically increases from 38.4% to 100%.

NVIDIA NeMo Guardrails support for AutoAlign Sidecar technology helps ensure that LLMs leveraged to build custom chatbot applications meet rigorous enterprise demands while remaining secure and powerful.

Download the AutoAlign: A Sidecar Design for AI Safety whitepaper today and explore this revolutionary approach to secure your AI applications for the future.

"Deploying generative AI models into chatbot applications can be a powerful tool for enterprises across every industry, and models need to be secure to deploy with confidence. With AutoAlign’s Sidecar running on NVIDIA NeMo Guardrails, developers can build and run generative AI models with enhanced protection.”

Amanda Saunders

Director of Enterprise Generative AI Software, NVIDIA

Download the Whitepaper

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Request API

Thank you! This link will open in a new tab.

Open Whitepaper PDF

Oops! Something went wrong while submitting the form.

Request an API

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.