Instrument your LLM calls in minutes. Start monitoring immediately. No DevOps required.
First, create an API key to authenticate with GuardLayer. This key authorizes your application to send LLM call data.
curl -X POST https://guardlayer.polsia.app/api/v1/api-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-app-production",
    "description": "Production API key for my AI application"
  }'
Your API key is shown only once. Store it in your environment variables or secrets manager. You'll use it to authenticate all LLM call data sent to GuardLayer.
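For example, read the key from the environment at startup rather than hardcoding it in source control. A minimal sketch, assuming you've exported a variable named GUARDLAYER_API_KEY (the name is just a convention, not required by GuardLayer):

import os

# Fails fast with a KeyError if the key isn't set in the environment
GUARDLAYER_API_KEY = os.environ["GUARDLAYER_API_KEY"]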
Wrap your existing LLM calls with GuardLayer's ingestion API. We support any LLM provider — OpenAI, Anthropic, Cohere, and more.
import requests
from datetime import datetime, timezone

# Your GuardLayer API key (load from an environment variable in production)
GUARDLAYER_API_KEY = "gl_xxxxxxxxxxxxxxxxxxxx"

def track_llm_call(
    model, provider, endpoint, prompt, response,
    latency_ms, tokens_used, cost_usd
):
    """Send LLM call data to GuardLayer for monitoring."""
    payload = {
        "model": model,
        "provider": provider,
        "endpoint": endpoint,
        "prompt": prompt,
        "response": response,
        "latency_ms": latency_ms,
        "tokens_used": tokens_used,
        "cost_usd": cost_usd,
        # datetime.utcnow() is deprecated; use a timezone-aware timestamp
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    }
    requests.post(
        "https://guardlayer.polsia.app/api/v1/ingest",
        headers={
            "Authorization": f"Bearer {GUARDLAYER_API_KEY}",
            "Content-Type": "application/json"
        },
        json=payload,
        timeout=10
    )

# Example: Track an OpenAI call
track_llm_call(
    model="gpt-4",
    provider="openai",
    endpoint="/v1/chat/completions",
    prompt="What is the capital of France?",
    response="The capital of France is Paris.",
    latency_ms=850,
    tokens_used=45,
    cost_usd=0.002
)
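In practice, you time the provider call yourself and pull token counts from the provider's response. Here's a minimal sketch using the official openai Python client; the per-token rate is illustrative only, since actual pricing depends on your model and plan:

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Time the provider call to get real latency
start = time.perf_counter()
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
latency_ms = int((time.perf_counter() - start) * 1000)

answer = completion.choices[0].message.content
tokens = completion.usage.total_tokens

track_llm_call(
    model="gpt-4",
    provider="openai",
    endpoint="/v1/chat/completions",
    prompt="What is the capital of France?",
    response=answer,
    latency_ms=latency_ms,
    tokens_used=tokens,
    cost_usd=tokens * 0.00003,  # illustrative rate; substitute your actual pricing
)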
// Node 18+ ships a global fetch; on older versions, use: import fetch from 'node-fetch'

// Your GuardLayer API key
const GUARDLAYER_API_KEY = 'gl_xxxxxxxxxxxxxxxxxxxx';

async function trackLLMCall({
  model, provider, endpoint, prompt, response,
  latencyMs, tokensUsed, costUsd
}) {
  const payload = {
    model,
    provider,
    endpoint,
    prompt,
    response,
    latency_ms: latencyMs,
    tokens_used: tokensUsed,
    cost_usd: costUsd,
    timestamp: new Date().toISOString()
  };
  await fetch('https://guardlayer.polsia.app/api/v1/ingest', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${GUARDLAYER_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  });
}

// Example: Track a Claude call (top-level await requires an ES module;
// in CommonJS, wrap this call in an async function)
await trackLLMCall({
  model: 'claude-3-opus',
  provider: 'anthropic',
  endpoint: '/v1/messages',
  prompt: 'Explain quantum computing in simple terms.',
  response: 'Quantum computing uses quantum mechanics...',
  latencyMs: 1200,
  tokensUsed: 150,
  costUsd: 0.008
});
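In production, avoid awaiting the ingestion call on your request's critical path: fire it in the background, or queue records and flush them periodically so a slow or failed monitoring call never delays your users.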
curl -X POST https://guardlayer.polsia.app/api/v1/ingest \
  -H "Authorization: Bearer gl_xxxxxxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "provider": "openai",
    "endpoint": "/v1/chat/completions",
    "prompt": "What is machine learning?",
    "response": "Machine learning is a subset of AI...",
    "latency_ms": 950,
    "tokens_used": 68,
    "cost_usd": 0.003,
    "timestamp": "2026-03-04T14:30:00Z"
  }'
For high-volume applications, use the /api/v1/ingest/batch endpoint to send up to 100 calls per request. This reduces network overhead and improves performance.
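A minimal sketch of a batched submission, reusing the payload shape from the single-call example above. The exact batch envelope (here, a top-level "calls" array) is an assumption; check the API reference for the authoritative schema:

import requests

def track_llm_calls_batch(calls):
    """Send up to 100 call records in one request (envelope shape assumed)."""
    assert len(calls) <= 100, "batch endpoint accepts at most 100 calls per request"
    requests.post(
        "https://guardlayer.polsia.app/api/v1/ingest/batch",
        headers={
            "Authorization": f"Bearer {GUARDLAYER_API_KEY}",
            "Content-Type": "application/json"
        },
        json={"calls": calls},  # assumed field name; see the API reference
        timeout=10
    )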
Once you've sent your first LLM call, head to the dashboard to see real-time metrics, cost tracking, and violation alerts.
Monitor total requests, p95 latency, token usage, error rates, and total cost in real time. View detailed charts by model, endpoint, and time period. Open Dashboard →
Configure alerts to notify your team when LLM costs spike, latency degrades, or error rates exceed your thresholds. We're building this now.
Budget alerts — Get notified when you hit 80%, 90%, or 100% of your monthly budget.
Latency alerts — Know immediately when p95 latency exceeds your SLA.
Violation alerts — Get flagged for unsafe content, policy violations, or prompt injection attempts.