vLLM

Pass-through endpoints for vLLM - call provider-specific endpoints, in native format (no translation).

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ❌ | Not supported |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ❌ | Tell us if you need this |
| Streaming | ✅ | |

Just replace https://my-vllm-server.com with LITELLM_PROXY_BASE_URL/vllm 🚀

Example Usage

curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234'

Supports ALL vLLM endpoints (including streaming).
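
For instance, a streaming chat completion can be sent straight through the pass-through route. A minimal sketch, assuming your vLLM server exposes the OpenAI-compatible /chat/completions route and serves the qwen2.5-7b-instruct model used in the examples below:

# Request a streamed (Server-Sent Events) response via the pass-through route
curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "model": "qwen2.5-7b-instruct",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}'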

Quick Start

Let's call the vLLM /metrics endpoint

  1. Add HOSTED_VLLM_API_BASE to your environment
export HOSTED_VLLM_API_BASE="https://my-vllm-server.com"
  2. Start LiteLLM Proxy
litellm

# RUNNING on http://0.0.0.0:4000
  3. Test it!

Let's call the vLLM /metrics endpoint

curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234'

Examples

Anything after http://0.0.0.0:4000/vllm is treated as a provider-specific route and forwarded to your vLLM server as-is.
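
For instance, a request to /vllm/health is forwarded to https://my-vllm-server.com/health. A quick sketch, assuming your vLLM server exposes the standard /health liveness route:

# GET /vllm/health on the proxy maps to GET /health on the vLLM server
curl -L -X GET 'http://0.0.0.0:4000/vllm/health' \
-H 'Authorization: Bearer sk-1234'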

Key Changes:

| Original Endpoint | Replace With |
|---|---|
| https://my-vllm-server.com | http://0.0.0.0:4000/vllm (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000") |
| bearer $VLLM_API_KEY | bearer anything (use bearer LITELLM_VIRTUAL_KEY if Virtual Keys are set up on the proxy) |

Example 1: Metrics endpoint

LiteLLM Proxy Call

curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY'

Direct vLLM API Call

curl -L -X GET 'https://my-vllm-server.com/metrics' \
-H 'Content-Type: application/json'
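
vLLM's /metrics endpoint returns Prometheus-format text, so standard shell tools work on the output. A small sketch, assuming vLLM's default vllm:-prefixed metric names:

# Filter the Prometheus output for vLLM-specific metrics
curl -s 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' | grep 'vllm'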

Example 2: Chat API

LiteLLM Proxy Call

curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' \
-d '{
  "messages": [
    {
      "role": "user",
      "content": "I am going to Paris, what should I see?"
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.8,
  "top_p": 0.1,
  "model": "qwen2.5-7b-instruct"
}'

Direct vLLM API Call

curl -L -X POST 'https://my-vllm-server.com/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
  "messages": [
    {
      "role": "user",
      "content": "I am going to Paris, what should I see?"
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.8,
  "top_p": 0.1,
  "model": "qwen2.5-7b-instruct"
}'
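
The same pattern covers any other OpenAI-compatible route your server exposes. A sketch for /embeddings, assuming you are also serving an embedding model (the name my-embedding-model here is a placeholder):

# POST /vllm/embeddings maps to POST /embeddings on the vLLM server
curl -L -X POST 'http://0.0.0.0:4000/vllm/embeddings' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' \
-d '{
  "model": "my-embedding-model",
  "input": ["I am going to Paris, what should I see?"]
}'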

Advanced - Use with Virtual Keys

Use this to avoid giving developers the raw vLLM API key, while still letting them use vLLM endpoints.

Pre-requisites

LiteLLM Proxy set up with a database (virtual keys are stored in the DB, hence the DATABASE_URL below)

Usage

  1. Set up environment
export DATABASE_URL=""
export LITELLM_MASTER_KEY=""
export HOSTED_VLLM_API_BASE=""
litellm

# RUNNING on http://0.0.0.0:4000
  2. Generate virtual key
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'

Expected Response

{
  ...
  "key": "sk-1234ewknldferwedojwojw"
}
  3. Test it!
curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
--data '{
  "messages": [
    {
      "role": "user",
      "content": "I am going to Paris, what should I see?"
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.8,
  "top_p": 0.1,
  "model": "qwen2.5-7b-instruct"
}'
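
Virtual keys can also carry constraints at generation time. A sketch, assuming your LiteLLM Proxy version supports the duration parameter on /key/generate:

# Generate a virtual key that automatically expires after 30 days
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"duration": "30d"}'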