99 - ai-proxy.cm.itcollege.ee

School provides some limited LLM api access for students.
There is proxy server that authenticates students and forwards requests to LLM api. Proxy is designed to do minimal changes to original api.
Proxy is located at: https://ai-proxy.cm.itcollege.ee/

Main functionality is:

manage student accounts
monitor usage
cost and credit control
handle students personal api keys
llm api configuration
removes student api key and replaces with school api key
proxies requests as-is, including SSE support
can do some request/response transformations - query, headers, etc.

Currently main LLM provider is Azure, using Central Sweden region (EU, GDPR, etc).

Configuration

Mainly 2 options are used - OpenAI chat completions or OpenAi responses (newer).
gpt-5.2-codex is using responses, other models are mostly using chat completions.
Azure does not have models endpoint, so we need to hardcode model names (names are case sensitive and sometimes different from original model names).
If configuration does not allow to custom model name, proxy can be configured to change model name during every request/response.
In theory there is also Anthropic models (Opus) with their own messages api - ask for access if you need it (expensive).

Example costs are experimental, analyzing ai-proxy codebase (ask mode). Prompt used was:

Analyze this codebase thoroughly. Produce a structured report covering:

## 1. Purpose & Domain
- What does this application do? Core business domain.
- Target users/consumers (API clients, end users, both?)
- Key workflows and use cases

## 2. Architecture
- Overall pattern (Clean/Onion, N-tier, Vertical Slices, CQRS, etc.)
- Project/solution structure - list all projects and their responsibilities
- Dependency graph between projects
- DI registration patterns and service lifetimes
- Middleware pipeline order
- Any architectural violations or circular dependencies

## 3. Data Layer & ERD
- ORM used (EF Core version, Dapper, etc.)
- List all entities and their relationships
- Generate a Mermaid ERD diagram
- Migration strategy (code-first, DB-first?)
- Identify: soft deletes, audit fields, tenant isolation, concurrency tokens
- Raw SQL or stored procedures usage
- Connection/DbContext management (single, multi-context, pooling)

## 4. API Surface
- List all controllers/endpoints with HTTP methods and routes
- Authentication/authorization scheme (JWT, cookies, Identity, external providers)
- API versioning strategy
- Request validation approach (FluentValidation, DataAnnotations, manual)
- Response patterns (envelope/wrapper, ProblemDetails, raw)
- Rate limiting, CORS config

## 5. Test Coverage
- Test projects and frameworks used (xUnit, NUnit, MSTest)
- Approximate coverage: unit, integration, E2E
- What's tested well vs. what's NOT tested (be specific)
- Test data strategy (fixtures, builders, AutoFixture, Bogus)
- Integration test infrastructure (WebApplicationFactory, Testcontainers, in-memory DB)

## 6. Coding Style & Patterns
- C# version features used (nullable refs, primary constructors, records, etc.)
- Naming conventions (consistent? violations?)
- Error handling strategy (exceptions, Result pattern, both?)
- Logging approach and structured logging usage
- Mapping strategy (AutoMapper, Mapster, manual)
- Async/await correctness (ConfigureAwait, fire-and-forget, deadlock risks)

## 7. Configuration & Deployment
- Configuration sources (appsettings, env vars, user secrets, key vault)
- Environment-specific configs
- Docker support (Dockerfile quality, compose setup)
- Health checks
- HTTPS/TLS setup

## 8. Production Readiness Assessment
Rate each 1-5 with justification:
- Security (auth, input validation, secrets management, OWASP top 10)
- Performance (N+1 queries, missing indexes hints, caching, pagination)
- Observability (logging, metrics, tracing, correlation IDs)
- Error handling (global handler, graceful degradation)
- Scalability (stateless?, session affinity, background jobs)
- Maintainability (code duplication, dead code, TODOs, tech debt)

## 9. Red Flags & Debt
- List specific code smells, anti-patterns, security vulnerabilities
- Hardcoded values, magic strings/numbers
- Missing null checks in nullable context
- Synchronous over async or vice versa
- Any God classes or 500+ line methods

## 10. Summary
- One-paragraph executive summary
- Top 5 things to fix immediately
- Top 5 strengths

Kilo Code configurations (tested on 2026 feb)

NB! Model names are case sensitive.

grok-code-fast-1

Config params

Api Provider: OpenAI Compatible
Base URL: https://ai-proxy.cm.itcollege.ee/azure-models
Api Key: Your personal key from ai-proxy
Model: grok-code-fast-1 (use custom model name)
Enable Streaming: true
Enable reasoning effort: medium, high, extra high (possible, but costly)
Context windows size: 256k
Image support: false
Input price: 0.17 mtok
Output price: 1.26 mtok

Demo task cost (no reasoning effort set): 0.14. Time: 1 minutes. 20 requests.

Kimi-K2.5

Config params

Api Provider: OpenAI Compatible
Base URL: https://ai-proxy.cm.itcollege.ee/azure-models
Api Key: Your personal key from ai-proxy
Model: Kimi-K2.5 (use custom model name)
Enable Streaming: true
Enable reasoning effort: medium, high, extra high (possible, but costly)
Context windows size: 256k
Image support: true
Input price: 0.45 mtok
Output price: 2.25 mtok

Demo task cost (no reasoning effort set): 0.60. Time: 12 minutes. 30 requests

gpt-5.2-codex

Config params

Api Provider: OpenAI Compatible (Responses)
Base URL: https://ai-proxy.cm.itcollege.ee/azure-openai
Api Key: Your personal key from ai-proxy
Model: gpt-5.2-codex (use custom model name)
Enable Streaming: true
Enable reasoning effort: medium, high, extra high (possible, but costly)
Context windows size: 400k
Image support: true
Input price: 1.75 mtok
Output price: 14.00 mtok

Demo task cost (no reasoning effort set): 1.28. Time: 3 minutes. 18 requests.

Opus 4.6

Context windows size: 200k, 1m possible
Input price: 5.00 mtok
Output price: 25.00 mtok

Demo task cost (no reasoning effort set): 1.33. Time: 2 minutes. 23 requests.

Current ai-proxy Pipeline Flow

Models and their example requests

Models are (2026-05-28):

Qwen 3.6 27b - FREE (hosted in university, vllm 2x RTX4090)
Open AI: gpt-5.2-codex, gpt-5.3-codex, gpt-5.3-chat, gpt-5.4
Anthropic models: claude-opus-4-6, claude-sonnet-4-6
codestral-latest
DeepSeek-V3.2
grok-code-fast-1
Kimi-K2.5

Play around with model specific params: streaming, thinking, tokens allowed etc. Use model specific documentation - azure uses their api mostly as-is.

Tested, working request examples

Qwen 3.6 - FREE

Pi agent conf

{
    "providers": {
        "anthropic-proxy": {
            "baseUrl": "https://ai-proxy.cm.itcollege.ee/azure-anthropic",
            "api": "anthropic-messages",
            "apiKey": "your-key-here",
            "compat": {
                "supportsEagerToolInputStreaming": true,
                "supportsLongCacheRetention": true
            },
            "models": [
                {
                    "id": "claude-opus-4-7",
                    "reasoning": true,
                    "contextWindow": 1000000,
                    "input": [
                        "text",
                        "image"
                    ]
                },
                {
                    "id": "claude-opus-4-6",
                    "reasoning": true,
                    "contextWindow": 1000000,
                    "input": [
                        "text",
                        "image"
                    ]
                },
                {
                    "id": "claude-sonnet-4-6",
                    "reasoning": true,
                    "contextWindow": 1000000,
                    "input": [
                        "text",
                        "image"
                    ]
                }
            ]
        },
        "vllm": {
            "baseUrl": "https://ai-proxy.cm.itcollege.ee/vllm/v1",
            "api": "openai-completions",
            "apiKey": "your-key-here",
            "compat": {
                "supportsUsageInStreaming": true,
                "maxTokensField": "max_tokens",
                "supportsDeveloperRole": false
            },
            "models": [
                {
                    "id": "cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4",
                    "reasoning": true,
                    "contextWindow": 262144,
                    "input": [
                        "text",
                        "image"
                    ]
                }
            ]
        },
        "openai-proxy": {
            "baseUrl": "https://ai-proxy.cm.itcollege.ee/azure-openai",
            "api": "openai-responses",
            "apiKey": "your-key-here",
            "compat": {
                "supportsUsageInStreaming": true,
                "maxTokensField": "max_tokens"
            },
            "models": [
                {
                    "id": "gpt-5.5",
                    "reasoning": true,
                    "contextWindow": 1000000,
                    "input": [
                        "text",
                        "image"
                    ]
                },
                {
                    "id": "gpt-5.3-codex",
                    "reasoning": true,
                    "contextWindow": 1000000,
                    "input": [
                        "text",
                        "image"
                    ]
                },
                {
                    "id": "gpt-5.4",
                    "reasoning": true,
                    "contextWindow": 1000000,
                    "input": [
                        "text",
                        "image"
                    ]
                }
            ]
        }
    }
}

Opencode conf

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "openai-azure": {
      "npm": "@ai-sdk/openai",
      "options": {
        "baseURL": "https://ai-proxy.cm.itcollege.ee/azure-openai/",
        "apiKey": "your-key-here"
      },
      "models": {
        "gpt-5.3-codex": {},
        "gpt-5.4": {},
        "gpt-5.5": {}
      }
    },
    "openai-vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "https://ai-proxy.cm.itcollege.ee/vllm/v1/",
        "apiKey": "your-key-here"
      },
      "models": {
        "cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4": {
          "maxTokens": 262144
        }
      }
    }
  }
}

OpenAI responses
Url: https://ai-proxy.cm.itcollege.ee/azure-openai/responses (seems, that azure openai doesn not have the older chat/completions anymore)

Header:

x-api-key - your_api_key_from_ai-proxy OR 'Authorization' - 'Bearer your_api_key_from_ai-proxy'

Url is mapped to https://foundry-akaver.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview

Request body:

{
  "input": [
    {
      "role": "user",
      "content": "7+3-9 is?"
    }
  ],
  "stream": false,
  "model": "gpt-5.2-codex"
}

Response body:

{
  "id": "resp_0aef0006835a7fb00069c90996ae1881979274809bdfc12109",
  "object": "response",
  "created_at": 1774782870,
  "status": "completed",
  "background": false,
  "completed_at": 1774782872,
  "content_filters": [
    {
      "blocked": false,
      "source_type": "prompt",
      "content_filter_raw": [],
      "content_filter_results": {
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "content_filter_offsets": {
        "start_offset": 30,
        "end_offset": 39,
        "check_offset": 0
      }
    },
    {
      "blocked": false,
      "source_type": "completion",
      "content_filter_raw": [],
      "content_filter_results": {
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        },
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        }
      },
      "content_filter_offsets": {
        "start_offset": 0,
        "end_offset": 44,
        "check_offset": 0
      }
    }
  ],
  "error": null,
  "frequency_penalty": 0.0,
  "incomplete_details": null,
  "instructions": null,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "model": "gpt-5.2-codex",
  "output": [
    {
      "id": "rs_0aef0006835a7fb00069c9099777ac8197a9046bf579e1d473",
      "type": "reasoning",
      "summary": []
    },
    {
      "id": "msg_0aef0006835a7fb00069c90997e9d48197a24cf7cda24eeb86",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "logprobs": [],
          "text": "1"
        }
      ],
      "role": "assistant"
    }
  ],
  "parallel_tool_calls": true,
  "presence_penalty": 0.0,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "prompt_cache_retention": null,
  "reasoning": {
    "effort": "medium",
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "auto",
  "store": true,
  "temperature": 1.0,
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "medium"
  },
  "tool_choice": "auto",
  "tools": [],
  "top_logprobs": 0,
  "top_p": 0.98,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 13,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 30,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 43
  },
  "user": null,
  "metadata": {}
}

Anthropic

Url: https://ai-proxy.cm.itcollege.ee/azure-anthropic/v1/messages

Header:

x-api-key - your_api_key_from_ai-proxy
anthropic-version - 2023-06-01

Url is mapped to https://foundry-akaver.openai.azure.com/anthropic/v1/messages

Request body:

{
  "messages": [
    {
      "role": "user",
      "content": "2+3 is?"
    }
  ],
  "max_tokens": 1000,
  "model": "claude-opus-4-6"
}

Response:

{
  "model": "claude-opus-4-6",
  "id": "msg_013ovwTFQ7ptkEZb65p5uiDW",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "2 + 3 = **5**"
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 14,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 14,
    "service_tier": "standard",
    "inference_geo": "not_available"
  }
}

Codestral
Url: https://ai-proxy.cm.itcollege.ee/azure-models/chat/completions
Header:

x-api-key - your_api_key_from_ai-proxy

Url is mapped to https://foundry-akaver.services.ai.azure.com/models/chat/completions

Model name is mapped from codestral-latest to codestral-2501 for azure (was needed for kilo code, kilo code did not allow to change codestral provider model name)

Request body:

{
  "messages": [
    {
      "content": "Hi",
      "role": "user"
    }
  ],
  "model": "codestral-latest"
}

Response:

{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Hello! How can I assist you today? If you're up for it, let's chat about something interesting. Here are a few topics to get us started:\n\n1. **Movies and TV Shows**: What's the last movie or TV show you watched?\n2. **Books**: Any good books you've read recently?\n3. **Travel**: If you could travel anywhere in the world, where would it be?\n4. **Food**: What's your favorite cuisine or dish?\n5. **Hobbies**: What do you enjoy doing in your free time?\n\nOr, if you have a specific question or topic in mind, feel free to share!",
        "role": "assistant",
        "tool_calls": null
      }
    }
  ],
  "created": 1774784702,
  "id": "344f275d8fea49ffa70253602540fe82",
  "model": "codestral-2501",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "usage": {
    "audio_prompt_tokens": 0,
    "completion_tokens": 133,
    "prompt_tokens": 4,
    "total_tokens": 137
  }
}

DeepSeek-V3.2
Url: https://ai-proxy.cm.itcollege.ee/azure-models/chat/completions
Header:

x-api-key - your_api_key_from_ai-proxy

Url is mapped to https://foundry-akaver.services.ai.azure.com/models/chat/completions

Request body:

{
  "messages": [
    {
      "content": "Hi",
      "role": "user"
    }
  ],
  "model": "DeepSeek-V3.2"
}

grok-code-fast-1
Url: https://ai-proxy.cm.itcollege.ee/azure-models/chat/completions
Header:

x-api-key - your_api_key_from_ai-proxy

Url is mapped to https://foundry-akaver.services.ai.azure.com/models/chat/completions

Request body:

{
  "messages": [
    {
      "content": "Hi",
      "role": "user"
    }
  ],
  "model": "grok-code-fast-1"
}

Kimi-K2.5
Url: https://ai-proxy.cm.itcollege.ee/azure-models/chat/completions
Header:

x-api-key - your_api_key_from_ai-proxy

Url is mapped to https://foundry-akaver.services.ai.azure.com/models/chat/completions

Request body:

{
  "messages": [
    {
      "content": "Hi",
      "role": "user"
    }
  ],
  "model": "Kimi-K2.5"
}

Result:

{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello! How can I help you today?",
        "reasoning_content": "The user just said \"Hi\". This is a simple greeting. I should respond in a friendly, helpful manner while being ready to assist with whatever they need next. Since this is the beginning of the conversation, I should keep it open-ended to encourage them to share what they need help with.\n\nI should:\n1. Greet them back politely\n2. Ask how I can help them\n3. Keep it concise but welcoming\n\nI don't need to overthink this - it's a standard greeting exchange to start the conversation. I'll respond naturally and invite them to ask whatever they need. ",
        "role": "assistant",
        "tool_calls": null
      },
      "stop_reason": null
    }
  ],
  "created": 1774806413,
  "id": "464274dc551645ae954a587e24a5c8e0",
  "model": "Kimi-K2.5",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "usage": {
    "audio_prompt_tokens": 0,
    "completion_tokens": 128,
    "prompt_tokens": 9,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0,
    "total_tokens": 137
  }
}

Configuration​

Kilo Code configurations (tested on 2026 feb)​

grok-code-fast-1​

Kimi-K2.5​

gpt-5.2-codex​

Opus 4.6​

Current ai-proxy Pipeline Flow​

Models and their example requests​

Tested, working request examples​

Configuration

Kilo Code configurations (tested on 2026 feb)

grok-code-fast-1

Kimi-K2.5

gpt-5.2-codex

Opus 4.6

Current ai-proxy Pipeline Flow

Models and their example requests

Tested, working request examples