# vLLM

Pass-through endpoints for vLLM - call provider-specific endpoints in their native format (no translation).
| Feature | Supported | Notes |
|---------|-----------|-------|
| Cost Tracking | ❌ | Not supported |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ❌ | Tell us if you need this |
| Streaming | ✅ | |
Just replace `https://my-vllm-server.com` with `LITELLM_PROXY_BASE_URL/vllm` 🚀
## Example Usage
```bash
curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234'
```
Supports ALL vLLM endpoints (including streaming).
## Quick Start
Let's call the vLLM `/metrics` endpoint.
- Add `HOSTED_VLLM_API_BASE` to your environment

```bash
export HOSTED_VLLM_API_BASE="https://my-vllm-server.com"
```
- Start LiteLLM Proxy

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```
- Test it!
Let's call the vLLM `/metrics` endpoint.

```bash
curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234'
```
## Examples
Anything after `http://0.0.0.0:4000/vllm` is treated as a provider-specific route, and handled accordingly.
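For instance, assuming your vLLM server exposes vLLM's standard `/health` endpoint, it can be reached through the proxy at the same relative path:

```bash
# hits https://my-vllm-server.com/health via the proxy
curl -L -X GET 'http://0.0.0.0:4000/vllm/health' \
-H 'Authorization: Bearer sk-1234'
```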
Key Changes:
| Original Endpoint | Replace With |
|-------------------|--------------|
| https://my-vllm-server.com | http://0.0.0.0:4000/vllm (`LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000"`) |
| bearer $VLLM_API_KEY | bearer anything (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy) |
### Example 1: Metrics endpoint
**LiteLLM Proxy Call**
```bash
curl -L -X GET 'http://0.0.0.0:4000/vllm/metrics' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY'
```
**Direct vLLM API Call**
```bash
curl -L -X GET 'https://my-vllm-server.com/metrics' \
-H 'Content-Type: application/json'
```
### Example 2: Chat API
**LiteLLM Proxy Call**
```bash
curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "qwen2.5-7b-instruct"
}'
```
**Direct vLLM API Call**
```bash
curl -L -X POST 'https://my-vllm-server.com/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "qwen2.5-7b-instruct"
}'
```
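Streaming is supported on pass-through routes as well (see the feature table above). A minimal sketch, reusing the same `qwen2.5-7b-instruct` model as above: setting `"stream": true` asks vLLM's OpenAI-compatible chat endpoint to stream response chunks, and curl's `-N` flag disables output buffering so they print as they arrive.

```bash
curl -L -N -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer $LITELLM_VIRTUAL_KEY' \
-d '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "stream": true,
    "model": "qwen2.5-7b-instruct"
}'
```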
## Advanced - Use with Virtual Keys
### Pre-requisites

- LiteLLM Proxy set up with a database (required for generating virtual keys)

Use this to avoid giving developers the raw vLLM API key while still letting them use vLLM endpoints.
### Usage
- Setup environment

```bash
export DATABASE_URL=""          # Postgres connection string used by LiteLLM for virtual keys
export LITELLM_MASTER_KEY=""
export HOSTED_VLLM_API_BASE=""  # e.g. https://my-vllm-server.com

litellm

# RUNNING on http://0.0.0.0:4000
```
- Generate virtual key

```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'
```
Expected Response

```json
{
    ...
    "key": "sk-1234ewknldferwedojwojw"
}
```
- Test it!

```bash
curl -L -X POST 'http://0.0.0.0:4000/vllm/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234ewknldferwedojwojw' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "I am going to Paris, what should I see?"
        }
    ],
    "max_tokens": 2048,
    "temperature": 0.8,
    "top_p": 0.1,
    "model": "qwen2.5-7b-instruct"
}'
```