Using NGINX as an AI Proxy

Over the last few years, Artificial Intelligence (AI) has taken the world by storm. The rapid growth of AI models and services has created a complex landscape in which organizations combine multiple Large Language Model (LLM) providers and manage different model endpoints with varying API specifications to build their AI-powered applications. We are now witnessing the rise of AI gateways and LLM routers: specialized infrastructure components that sit between applications and AI models, orchestrating and securing the flow of AI requests.

What is an AI Proxy?

AI proxies are a simpler implementation of AI gateways, focusing on AI traffic control, model transformation, authentication & authorization, model failover mechanisms, and AI model usage logging. A fully featured AI gateway would seamlessly integrate all these features with native AI security guardrails to protect against threats specific to LLMs like prompt injection or data exfiltration attacks.

At the core of an AI proxy is traffic control, which forms the foundation of any AI traffic management tool and implements authentication and authorization mechanisms alongside rate limiting. This helps prevent model abuse and ensures fair resource allocation.
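
Rate limiting itself needs nothing AI-specific. As a point of reference, a minimal, purely illustrative NGINX sketch keyed on a caller-identifying header could look like the following (the zone name, header, and limits are placeholders you would tune for your environment):

# Illustrative only: limit each caller (keyed on an X-User header) to 10 requests per minute
limit_req_zone $http_x_user zone=ai_users:10m rate=10r/m;

server {
    location /v1/chat/completions {
        # Allow small bursts and return 429 when the limit is exceeded
        limit_req zone=ai_users burst=5 nodelay;
        limit_req_status 429;
        # ... proxy configuration for the AI backend ...
    }
}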

AI model API translation is another valuable feature. Model translation enables seamless integration across diverse AI models and providers by exposing a unified API entrypoint into different AI models, abstracting that complexity away from client applications. Another important capability is ensuring the reliability of prompt requests by providing high availability through failover systems that gracefully handle models becoming unavailable or rate limited.

Traffic observability is also mission-critical and offers comprehensive monitoring and logging to maintain operational visibility across the AI pipeline. Traffic audits of request and response payloads go hand in hand with observability and are more important than ever in the world of AI for compliance, debugging, and model performance analysis.

Configuring NGINX as an OpenAI and Anthropic AI Proxy

All the following use case examples build on each other, with the code snippets provided for guidance and with no guarantee that they will work in your environment as-is. If you want to test a functional proof of concept (PoC) NGINX AI proxy deployment, we recommend you first deploy the PoC detailed in the “See an NGINX AI Proxy Working for Yourself” section below or in the NGINX demos AI Proxy GitHub repo, and then come back to this section. The examples provided below are based on the code contained in the PoC.

Use Case 1: AI Model/LLM Routing and Model Transformation

In the absence of a unified standard API for interacting with different LLMs, routing to distinctly different models requires transforming both requests and responses as they pass through NGINX. This ensures compatibility between the incoming requests and the specific requirements of each backend LLM. For this example, we’ll assume that incoming API requests to NGINX follow the OpenAI chat completion API specification, given its widespread adoption.
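
For reference, an incoming request in the OpenAI chat completion format looks roughly like this (model name and values are purely illustrative):

{
    "model": "gpt-5",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.7,
    "max_tokens": 256
}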

To transform incoming OpenAI-compatible requests, we will start by implementing an NJS transformation script that converts these requests into the format required by the Anthropic messages API when routing to the Anthropic backend, while also transforming Anthropic responses back to the OpenAI format. Requests routed to OpenAI endpoints will pass through unchanged.

First things first, you will need to create an NJS file with the functions that handle the model transformations. These functions transform requests into Anthropic's format and Anthropic responses back into an OpenAI-compatible format, and determine whether a request needs to be transformed based on the provider being targeted:

aiproxy.js
// Convert an OpenAI compatible request to Anthropic's request format
function transformAnthropicRequest(requestBody) {
    // Anthropic requires max_tokens, but our API may not always specify it -> fallback to defaults if not provided
    let maxTokens = requestBody.max_completion_tokens || requestBody.max_tokens || 512;

    const anthropicRequest = {
        model: requestBody.model,
        max_tokens: maxTokens,
        stream: requestBody.stream || false,
        temperature: requestBody.temperature || 1.0,
        top_p: requestBody.top_p
    };

    // Scale Anthropic temperature based on its acceptable range (0-1) vs OpenAI (0-2)
    if (anthropicRequest.temperature > 1.0) {
        anthropicRequest.temperature = requestBody.temperature / 2.0;
    }

    // Convert stop sequences to Anthropic's format
    if (requestBody.stop) {
        anthropicRequest.stop_sequences = Array.isArray(requestBody.stop) ? requestBody.stop : [requestBody.stop];
    }

    // Separate system messages from user/assistant messages
    const systemMessages = [];
    const messages = [];

    for (let i = 0; i < requestBody.messages.length; i++) {
        const msg = requestBody.messages[i];
        if (msg.role === "system") {
            systemMessages.push({text: msg.content, type: "text"});
        } else {
            messages.push({role: msg.role, content: msg.content});
        }
    }

    // Attach system messages if present
    if (systemMessages.length > 0) {
        anthropicRequest.system = systemMessages;
    }
    anthropicRequest.messages = messages;

    return anthropicRequest;
}

// Convert an Anthropic response to an OpenAI response format
function transformAnthropicResponse(anthropicResponse) {
    const response = JSON.parse(anthropicResponse);

    // Handle error responses from Anthropic
    if (response.error) {
        return {
            error: {
                type: response.error.type,
                message: response.error.message,
                code: response.error.code
            }
        };
    }

    // Map Anthropic's successful response to OpenAI's expected structure
    const openaiResponse = {
        id: response.id,
        object: "chat.completion", // Standardize object type
        model: response.model,
        choices: [],
        usage: {
            prompt_tokens: response.usage.input_tokens,
            completion_tokens: response.usage.output_tokens,
            total_tokens: response.usage.input_tokens + response.usage.output_tokens
        }
    };

    // Convert content to choices format
    for (let i = 0; i < response.content.length; i++) {
        const content = response.content[i];
        openaiResponse.choices.push({
            index: i,
            finish_reason: response.stop_reason,
            message: {
                role: response.role,
                content: content.text
            }
        });
    }

    return openaiResponse;
}

// Attempts to call the specified model provider (Anthropic or OpenAI)
// Transforms the request as needed and issues a subrequest to the provider's location
async function tryModel(r, modelConfig, requestBody) {
    const location = modelConfig.location;
    let subrequestBody;

    // Transform request body for Anthropic, or pass through for OpenAI
    if (modelConfig.provider === "anthropic") {
        const transformedRequest = transformAnthropicRequest(requestBody);
        subrequestBody = JSON.stringify(transformedRequest);
    } else if (modelConfig.provider === "openai") {
        // For OpenAI, pass the request as-is (no transformation needed)
        subrequestBody = JSON.stringify(requestBody);
    } else {
        throw new Error(`Provider '${modelConfig.provider}' not supported`);
    }

    // Issue subrequest to the model provider
    return await r.subrequest(location, {
        method: 'POST',
        body: subrequestBody
    });
}

// Returns the response body in the correct format for the client
// Transforms Anthropic responses to OpenAI format, passes OpenAI through
function getResponseBody(modelConfig, serviceReply) {
    if (modelConfig.provider === "anthropic") {
        const transformedResponse = transformAnthropicResponse(serviceReply.responseText);
        return JSON.stringify(transformedResponse);
    } else {
        return serviceReply.responseText; // Pass through as-is for OpenAI
    }
}
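
For illustration, an OpenAI-style request targeting the Anthropic model with a system message and a user message would be converted by transformAnthropicRequest into roughly the following Anthropic messages API body (values are illustrative):

// Incoming OpenAI-style request
{
    "model": "claude-sonnet-4-20250514",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"}
    ],
    "max_tokens": 256
}

// Resulting Anthropic messages API body
{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    "stream": false,
    "temperature": 1,
    "system": [
        {"text": "You are a helpful assistant.", "type": "text"}
    ],
    "messages": [
        {"role": "user", "content": "Hello"}
    ]
}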

You will then need to define your AI model endpoints and import the NJS script into your NGINX config. An example config is provided below, but note that you will need to set your OpenAI and Anthropic API keys where indicated:

aiproxy.conf
# Import custom AI proxy NJS module
js_import /etc/njs/aiproxy.js;

resolver 8.8.8.8;

upstream openai {
    zone openai 64k;
    server api.openai.com:443 resolve;
}

upstream anthropic {
    zone anthropic 64k;
    server api.anthropic.com:443 resolve;
}

server {
    listen 4242;
    default_type application/json;
    js_set $ai_proxy_config aiproxy.load_rbac;

    location  /v1/chat/completions {
        set $aiproxy_user $http_x_user;
        js_content aiproxy.route;
    }

    # Internal locations
    # Those locations are not public
    location /openai {
        internal;

        rewrite ^ /v1/chat/completions;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.openai.com";
        proxy_set_header Content-Type "application/json";

        proxy_set_header Authorization 'Bearer ${OPENAI_API_KEY}'; # replace me to set the OpenAI API key

        proxy_method POST;
        proxy_pass https://openai;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.openai.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }

    location /anthropic {
        internal;

        rewrite ^ /v1/messages;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.anthropic.com";
        proxy_set_header Content-Type "application/json";
        proxy_set_header anthropic-version "2023-06-01"; # required by Anthropic API

        proxy_set_header x-api-key '${ANTHROPIC_API_KEY}'; # replace me to set the Anthropic API key

        proxy_method POST;
        proxy_pass https://anthropic;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.anthropic.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
}

Use Case 2: Access Control

Access control is mission critical for most applications, but it is even more important when dealing with access to different AI models. Some models might be approved for use with internal data, while others might prove more useful for day-to-day work but should not interact with internal data. Similarly, some models might have a higher cost and thus warrant tighter access control conditioned on explicit approval.

To enable access control to the backend LLM models through NGINX, start by creating a JSON file with your access control configuration, mapping user identifiers to the models they have access to:

rbac.json
{
    "users": {
        "user-a": {
            "models": [
                {
                    "name": "gpt-5",
                },
                {
                    "name": "claude-sonnet-4-20250514"
                }
            ]
        },
        "user-b": {
            "models": [
                {
                    "name": "gpt-5"
                }
            ]
        }
    },
    "models": {
        "gpt-5": {
            "provider": "openai",
            "location": "/openai"
        },
        "claude-sonnet-4-20250514": {
            "provider": "anthropic",
            "location": "/anthropic"
        }
    }
}

You will then need to create a function within your NJS script to load the JSON data into a variable. Note that it uses the njs `fs` module, which must be imported at the top of the file with `import fs from 'fs';`:

aiproxy.js
...
// Loads RBAC configuration from a JSON file and sets it to an NGINX variable
function load_rbac() {
    try {
        // Adjust the path as needed
        let config = fs.readFileSync('/etc/nginx/rbac.json', 'utf8');
        return config;
    } catch (e) {
        return JSON.stringify({
            error: "Failed to load RBAC: " + e.message
        });
    }
}
...

And finally, you will need to import the NJS function into your NGINX config as follows:

aiproxy.conf
# Import custom AI proxy NJS module
js_import /etc/njs/aiproxy.js;

# Declare variable to hold RBAC configuration
js_var $ai_proxy_config "";

resolver 8.8.8.8;

upstream openai {
    zone openai 64k;
    server api.openai.com:443 resolve;
}

upstream anthropic {
    zone anthropic 64k;
    server api.anthropic.com:443 resolve;
}

server {
    listen 4242;
    default_type application/json;
    js_set $ai_proxy_config aiproxy.load_rbac;

    location  /v1/chat/completions {
        set $aiproxy_user $http_x_user;
        js_content aiproxy.route;
    }

    # Internal locations
    # Those locations are not public
    location /openai {
        internal;

        rewrite ^ /v1/chat/completions;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.openai.com";
        proxy_set_header Content-Type "application/json";

        proxy_set_header Authorization 'Bearer ${OPENAI_API_KEY}'; # replace me to set the OpenAI API key

        proxy_method POST;
        proxy_pass https://openai;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.openai.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }

    location /anthropic {
        internal;

        rewrite ^ /v1/messages;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.anthropic.com";
        proxy_set_header Content-Type "application/json";
        proxy_set_header anthropic-version "2023-06-01"; # required by Anthropic API

        proxy_set_header x-api-key '${ANTHROPIC_API_KEY}'; # replace me to set the Anthropic API key

        proxy_method POST;
        proxy_pass https://anthropic;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.anthropic.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
}

Based on our JSON file, only User A will have access to both OpenAI and Anthropic, whilst User B will be limited to OpenAI. To test it, try querying as different users. Queries from User A should work for both models, but queries from User B should only succeed for the OpenAI model:

curl commands
curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Success

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}]}'

// Success

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-b' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Success

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-b' \
  -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}]}'

// Failure

Note: You could also define your access control configuration and user identifiers as a variable within NGINX by using the `set` directive, as sketched below. We are opting to load a JSON file since this is more representative of how this data might be available in a real-world scenario.
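
A trimmed-down, purely illustrative inline version (which would replace the js_var/js_set declarations rather than complement them) could look like this:

set $ai_proxy_config '{"users":{"user-b":{"models":[{"name":"gpt-5"}]}},"models":{"gpt-5":{"provider":"openai","location":"/openai"}}}';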

Use Case 3: Model Failover and Backup

To improve the availability of your AI applications, incoming requests that target an unavailable model, whether it has exhausted its available tokens, its API key is mid-rotation, or the model has gone down (among other reasons), should be redirected to a backup model that is still available. This can be implemented in various ways, but for this blog post, we will assume that the model failover mechanism is tied to access control. This is a common pattern since it ties the failover behavior to the models a user can access.

We will start by expanding the JSON file created in the previous use case to include failover data:

rbac.json
{
    "users": {
        "user-a": {
            "models": [
                {
                    "name": "gpt-5",
                    "failover": "claude-sonnet-4-20250514"
                },
                {
                    "name": "claude-sonnet-4-20250514"
                }
            ]
        },
        "user-b": {
            "models": [
                {
                    "name": "gpt-5"
                }
            ]
        }
    },
    "models": {
        "gpt-5": {
            "provider": "openai",
            "location": "/openai"
        },
        "claude-sonnet-4-20250514": {
            "provider": "anthropic",
            "location": "/anthropic"
        }
    }
}

We will then need to add a routing function to our NJS script that checks whether the requested model is available and, if not, redirects the user request to the backup model:

aiproxy.js
...
// Main routing function for the AI proxy
// Handles user authentication, model selection, failover, and response transformation
async function route(r) {
    try {
        // Parse the AI proxy configuration from NGINX variable
        const configStr = r.variables.ai_proxy_config;
        if (!configStr) {
            r.return(500, JSON.stringify({
                error: {
                    message: "AI proxy configuration was not found"
                }
            }));
            return;
        }

        // Parse the configuration JSON
        let config;
        try {
            config = JSON.parse(configStr);
        } catch (e) {
            r.return(500, JSON.stringify({
                error: {
                    message: "Invalid AI proxy configuration JSON"
                }
            }));
            return;
        }

        // Extract the user from NGINX variable (set by header)
        const user = r.variables.aiproxy_user;
        if (!user) {
            r.return(401, JSON.stringify({
                error: {
                    message: "User not specified"
                }
            }));
            return;
        }

        // Check if user exists in configuration
        if (!config.users || !config.users[user]) {
            r.return(403, JSON.stringify({
                error: {
                    message: "User not authorized"
                }
            }));
            return;
        }

        // Check the JSON validity of the AI proxy request body
        let requestBody;
        try {
            requestBody = JSON.parse(r.requestText);
        } catch (e) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Invalid JSON in request body"
                }
            }));
            return;
        }

        // Extract the model from the request
        const requestedModel = requestBody.model;
        if (!requestedModel) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Model not specified in request"
                }
            }));
            return;
        }

        // Check if the requested model is available to the user
        const userModels = config.users[user].models;
        const userModel = userModels.find(m => m.name === requestedModel);

        if (!userModel) {
            r.return(404, JSON.stringify({
                error: {
                    message: `The model '${requestedModel}' was not found or is not accessible to this user`
                }
            }));
            return;
        }

        // Get the model configuration from the global config
        const modelConfig = config.models[requestedModel];
        if (!modelConfig) {
            r.return(500, JSON.stringify({
                error: {
                    message: `Model '${requestedModel}' configuration not found`
                }
            }));
            return;
        }

        // Try primary model first
        let serviceReply = await tryModel(r, modelConfig, requestBody);
        let usedModelConfig = modelConfig;

        // If primary model failed (status code is not 200) and failover is configured, try failover
        if (serviceReply.status !== 200 && userModel.failover) {
            r.log(`Primary model '${requestedModel}' failed with status ${serviceReply.status}, trying failover model '${userModel.failover}'`);

            // Get failover model configuration
            const failoverModelConfig = config.models[userModel.failover];
            if (!failoverModelConfig) {
                r.error(`Failover model '${userModel.failover}' configuration not found`);
                // Return the original error since failover is misconfigured
                let responseBody = getResponseBody(modelConfig, serviceReply);
                r.return(serviceReply.status, responseBody);
                return;
            }

            // Update the request body to use the failover model
            const failoverRequestBody = Object.assign({}, requestBody, {model: userModel.failover});

            // Try the failover model
            serviceReply = await tryModel(r, failoverModelConfig, failoverRequestBody);
            usedModelConfig = failoverModelConfig;
        }

        // Transform and return response body based on provider that was actually used
        let responseBody = getResponseBody(usedModelConfig, serviceReply);
        r.return(serviceReply.status, responseBody);

    } catch (e) {
        r.log(`Error: ${e.toString()}`);
        r.return(500, JSON.stringify({
            error: {
                message: "Internal server error",
            }
        }));
    }
}
...

Based on this configuration, if User A tries to use OpenAI and OpenAI is unavailable, NGINX will redirect the request to Anthropic. User B only has access to OpenAI, so if OpenAI is unavailable, their requests will fail altogether.

To test it, you will need to change the OpenAI API key in your NGINX config to ensure it’s no longer valid. Once that’s done, try running:

curl commands
curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Response comes from Anthropic

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-b' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

// Failure

Use Case 4: Token Usage Logging

The last use case we will be covering in this blog post is token usage logging. Most readily available LLMs run on a token system, with different requests consuming different amounts of tokens. Billing is usually tied to these tokens, so it's important to track consumption and ensure it does not get out of hand.
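
For reference, OpenAI-compatible chat completion responses report this consumption in a usage block like the one below (values are illustrative), which is what the proxy will extract:

"usage": {
    "prompt_tokens": 13,
    "completion_tokens": 39,
    "total_tokens": 52
}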

To log token usage within NGINX, we will add a few lines of NJS code to extract the token usage information from the LLM response body and save it to NGINX variables declared with `js_var`. To do this, add the following code to the route function defined in the previous step:

aiproxy.js
...
// Main routing function for the AI proxy
// Handles user authentication, model selection, failover, and response transformation
async function route(r) {
    try {
        // Parse the AI proxy configuration from NGINX variable
        const configStr = r.variables.ai_proxy_config;
        if (!configStr) {
            r.return(500, JSON.stringify({
                error: {
                    message: "AI proxy configuration was not found"
                }
            }));
            return;
        }

        // Parse the configuration JSON
        let config;
        try {
            config = JSON.parse(configStr);
        } catch (e) {
            r.return(500, JSON.stringify({
                error: {
                    message: "Invalid AI proxy configuration JSON"
                }
            }));
            return;
        }

        // Extract the user from NGINX variable (set by header)
        const user = r.variables.aiproxy_user;
        if (!user) {
            r.return(401, JSON.stringify({
                error: {
                    message: "User not specified"
                }
            }));
            return;
        }

        // Check if user exists in configuration
        if (!config.users || !config.users[user]) {
            r.return(403, JSON.stringify({
                error: {
                    message: "User not authorized"
                }
            }));
            return;
        }

        // Check the JSON validity of the AI proxy request body
        let requestBody;
        try {
            requestBody = JSON.parse(r.requestText);
        } catch (e) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Invalid JSON in request body"
                }
            }));
            return;
        }

        // Extract the model from the request
        const requestedModel = requestBody.model;
        if (!requestedModel) {
            r.return(400, JSON.stringify({
                error: {
                    message: "Model not specified in request"
                }
            }));
            return;
        }

        // Check if the requested model is available to the user
        const userModels = config.users[user].models;
        const userModel = userModels.find(m => m.name === requestedModel);

        if (!userModel) {
            r.return(404, JSON.stringify({
                error: {
                    message: `The model '${requestedModel}' was not found or is not accessible to this user`
                }
            }));
            return;
        }

        // Get the model configuration from the global config
        const modelConfig = config.models[requestedModel];
        if (!modelConfig) {
            r.return(500, JSON.stringify({
                error: {
                    message: `Model '${requestedModel}' configuration not found`
                }
            }));
            return;
        }

        // Try primary model first
        let serviceReply = await tryModel(r, modelConfig, requestBody);
        let usedModelConfig = modelConfig;

        // If primary model failed (status code is not 200) and failover is configured, try failover
        if (serviceReply.status !== 200 && userModel.failover) {
            r.log(`Primary model '${requestedModel}' failed with status ${serviceReply.status}, trying failover model '${userModel.failover}'`);

            // Get failover model configuration
            const failoverModelConfig = config.models[userModel.failover];
            if (!failoverModelConfig) {
                r.error(`Failover model '${userModel.failover}' configuration not found`);
                // Return the original error since failover is misconfigured
                let responseBody = getResponseBody(modelConfig, serviceReply);
                r.return(serviceReply.status, responseBody);
                return;
            }

            // Update the request body to use the failover model
            const failoverRequestBody = Object.assign({}, requestBody, {model: userModel.failover});

            // Try the failover model
            serviceReply = await tryModel(r, failoverModelConfig, failoverRequestBody);
            usedModelConfig = failoverModelConfig;
        }

        // Transform and return response body based on provider that was actually used
        let responseBody = getResponseBody(usedModelConfig, serviceReply);

        // Extract token usage information from response and set NGINX variables for logging
        if (serviceReply.status === 200) {
            try {
                const parsedResponse = JSON.parse(responseBody);
                if (parsedResponse.usage) {
                    r.variables.ai_proxy_response_prompt_tokens = parsedResponse.usage.prompt_tokens || "";
                    r.variables.ai_proxy_response_completion_tokens = parsedResponse.usage.completion_tokens || "";
                    r.variables.ai_proxy_response_total_tokens = parsedResponse.usage.total_tokens || "";
                }
            } catch (e) {
                r.log(`Warning: Failed to parse response body for token extraction: ${e.toString()}`);
            }
        }

        r.return(serviceReply.status, responseBody);

    } catch (e) {
        r.log(`Error: ${e.toString()}`);
        r.return(500, JSON.stringify({
            error: {
                message: "Internal server error",
            }
        }));
    }
}
...

We will then declare those variables within the NGINX config:

aiproxy.conf
# Import custom AI proxy NJS module
js_import /etc/njs/aiproxy.js;

# Declare variable to hold RBAC configuration
js_var $ai_proxy_config "";
# Declare variables for token tracking
js_var $ai_proxy_response_prompt_tokens "";
js_var $ai_proxy_response_completion_tokens "";
js_var $ai_proxy_response_total_tokens "";

resolver 8.8.8.8;

upstream openai {
    zone openai 64k;
    server api.openai.com:443 resolve;
}

upstream anthropic {
    zone anthropic 64k;
    server api.anthropic.com:443 resolve;
}

server {
    listen 4242;
    default_type application/json;
    js_set $ai_proxy_config aiproxy.load_rbac;

    location  /v1/chat/completions {
        set $aiproxy_user $http_x_user;
        js_content aiproxy.route;
    }

    # Internal locations
    # Those locations are not public
    location /openai {
        internal;

        rewrite ^ /v1/chat/completions;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.openai.com";
        proxy_set_header Content-Type "application/json";

        proxy_set_header Authorization 'Bearer ${OPENAI_API_KEY}'; # replace me to set the OpenAI API key

        proxy_method POST;
        proxy_pass https://openai;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.openai.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }

    location /anthropic {
        internal;

        rewrite ^ /v1/messages;
        break;

        proxy_pass_request_headers off;

        proxy_set_header Host "api.anthropic.com";
        proxy_set_header Content-Type "application/json";
        proxy_set_header anthropic-version "2023-06-01"; # required by Anthropic API

        proxy_set_header x-api-key '${ANTHROPIC_API_KEY}'; # replace me to set the Anthropic API key

        proxy_method POST;
        proxy_pass https://anthropic;

        proxy_ssl_verify on;
        proxy_ssl_server_name on;
        proxy_ssl_name "api.anthropic.com";
        proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
}

And finally, modify the NGINX access log format to log those variables whenever NGINX processes a request:

nginx.conf
user nginx;
worker_processes 1;

error_log /var/log/nginx/error.log info;
pid /var/run/nginx.pid;

load_module /usr/lib/nginx/modules/ngx_http_js_module.so;

events {
    worker_connections 1024;
}

http {
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'prompt_tokens=$ai_proxy_response_prompt_tokens '
                    'completion_tokens=$ai_proxy_response_completion_tokens '
                    'total_tokens=$ai_proxy_response_total_tokens';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    keepalive_timeout 65;

    include /etc/nginx/aiproxy.conf;
}

To test it, run any of the previous curl commands and check the NGINX access log:

access.log
... 401 ... prompt_tokens= completion_tokens= total_tokens= // Failed request
... 200 ... prompt_tokens=13 completion_tokens=39 total_tokens=52 // Successful request

Note: Extracting current token usage can also be useful in other scenarios, such as rate-limiting access to any given model based on current demand.
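
As a rough sketch of that idea (assuming njs 0.8.0 or later; the zone name, budget, and helper functions here are hypothetical and not part of the PoC), per-user totals could be accumulated in a shared dictionary and checked before routing a request:

nginx.conf
# Shared dictionary (http context) for accumulating token counts across worker processes
js_shared_dict_zone zone=token_usage:1M type=number;

aiproxy.js
// Hypothetical helpers, called from route() once the usage values have been extracted
const DAILY_TOKEN_BUDGET = 100000; // illustrative per-user budget

function recordTokenUsage(user, totalTokens) {
    // Add this request's total to the user's running count, starting at 0 if unseen
    ngx.shared.token_usage.incr(user, totalTokens, 0);
}

function isOverTokenBudget(user) {
    return (ngx.shared.token_usage.get(user) || 0) > DAILY_TOKEN_BUDGET;
}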

See an NGINX AI Proxy Working for Yourself

A comprehensive demo covering all of the above use cases can be found in the NGINX demos AI Proxy GitHub repo. To get it up and running, you will need:

  • An OpenAI API key
  • An Anthropic API key
  • Docker

Once you have all the prerequisites handy, clone the repo by running:

git clone https://github.com/nginx/nginx-demos

Open a terminal session inside the cloned repo and change directory to the nginx-demos/nginx/ai-proxy directory:

cd nginx-demos/nginx/ai-proxy

You will need to run all the following commands from this directory to get this setup working:

1. Ensure you have downloaded the latest version of the NGINX OSS Docker image:

docker pull nginx:1.29.1

2. Export the OpenAI and Anthropic API keys into variables (note: to test the failover scenario, do not export an OpenAI API key in this step or export an invalid API key):

export OPENAI_API_KEY=<API_KEY>
export ANTHROPIC_API_KEY=<API_KEY>

3. Create a persistent Docker volume for generated key snippets:

docker volume create nginx-keys

4. Launch a new NGINX Docker container using the following command:

docker run -it --rm -p 4242:4242 \
  -v $(pwd)/config:/etc/nginx \
  -v $(pwd)/njs:/etc/njs \
  -v $(pwd)/templates:/etc/nginx-ai-proxy/templates \
  -v nginx-keys:/etc/nginx-ai-proxy/keys \
  -e NGINX_ENVSUBST_TEMPLATE_DIR=/etc/nginx-ai-proxy/templates \
  -e NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx-ai-proxy/keys \
  -e OPENAI_API_KEY \
  -e ANTHROPIC_API_KEY \
  --name nginx-ai-proxy \
  nginx:1.29.1

Finally, to test NGINX as an AI proxy, use the following command to query the OpenAI model as User A:

curl -s -X POST http://localhost:4242/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-User: user-a' \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Hello"}]}'

If you decided not to export an OpenAI API key and wish to test the failover mechanism, launch the container with this command instead:

docker run -it --rm -p 4242:4242 \
  -v $(pwd)/config:/etc/nginx \
  -v $(pwd)/njs:/etc/njs \
  -v $(pwd)/templates:/etc/nginx-ai-proxy/templates \
  -v nginx-keys:/etc/nginx-ai-proxy/keys \
  -e NGINX_ENVSUBST_TEMPLATE_DIR=/etc/nginx-ai-proxy/templates \
  -e NGINX_ENVSUBST_OUTPUT_DIR=/etc/nginx-ai-proxy/keys \
  -e OPENAI_API_KEY=bad \
  -e ANTHROPIC_API_KEY \
  --name nginx-ai-proxy \
  nginx:1.29.1

You can then test the failover mechanism by querying the OpenAI model again as User A. This time the response should come from Anthropic instead!

Note: A more thorough set of example queries and expected responses is provided in the NGINX demos AI Proxy GitHub repo README.

Final Thoughts

This blog post only covers a few of the AI proxy use cases you can implement with NGINX and NJS, but there are many more. Using NJS opens many doors when it comes to extending NGINX for AI use cases, and whilst the demo uses a Docker container, you could also deploy a similar setup in your Kubernetes cluster. The most significant limitation of using NGINX as an AI proxy is the absence of dedicated AI security guardrails, as these typically require specialized AI security solutions for maximum effectiveness and cannot be implemented through NJS.

For now, we want to hear from you! Are you already using NGINX as an AI proxy? Is NGINX involved in your AI pipelines in any way, shape, or form? Please let us know in the NGINX Community Forum, and who knows, we might just showcase your implementation on the blog!

Community Feedback

Your feedback is invaluable in shaping the future development of NGINX. As always, if you have suggestions, encounter issues, or want to request additional features, please share them through GitHub Issues, and if you want to discuss anything NGINX, come join us in the NGINX Community Forum!
