If you want to check out the code for this project, start with the overview below.

A high-performance LLM inference server with structured and unstructured output validation, designed for distributed systems and optimized for large-scale LLM operations with high throughput and low latency.

The core components of the project are listed below; a short illustrative sketch of each follows the list:

  • Connection pool

  • Batching strategies

  • Structured vs. unstructured output generation (results can be checked here)
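
A minimal sketch of the connection-pool idea, assuming an asyncio-based server; `ConnectionPool`, `factory`, and the pool size are illustrative names, not the project's actual API:

```python
import asyncio
from contextlib import asynccontextmanager

class ConnectionPool:
    """Bounded pool of reusable connections backed by an asyncio.Queue."""

    def __init__(self, factory, size=8):
        self._factory = factory                     # async callable that opens one connection
        self._queue = asyncio.Queue(maxsize=size)   # idle connections ready for reuse
        self._size = size
        self._created = 0

    @asynccontextmanager
    async def acquire(self):
        # Prefer an idle connection; otherwise open a new one up to the cap,
        # and beyond the cap block until a connection is returned.
        try:
            conn = self._queue.get_nowait()
        except asyncio.QueueEmpty:
            if self._created < self._size:
                self._created += 1
                conn = await self._factory()
            else:
                conn = await self._queue.get()
        try:
            yield conn
        finally:
            await self._queue.put(conn)             # hand the connection back to the pool
```

Callers would use it as `async with pool.acquire() as conn: ...`, so connections to model backends are reused rather than reopened per request.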
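
One common batching strategy is dynamic batching: queue incoming requests and flush a batch when either a size cap or a time window is hit. A sketch under that assumption (`max_batch` and `max_wait_s` are illustrative parameters; the project may use a different strategy, such as continuous batching):

```python
import asyncio
import time

async def collect_batch(queue, max_batch=16, max_wait_s=0.010):
    """Dynamic batching: flush when the batch is full or the window expires."""
    batch = [await queue.get()]                 # block until the first request arrives
    deadline = time.monotonic() + max_wait_s    # window opens with the first request
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break
    return batch
```

The time window bounds the worst-case latency added by waiting, while the size cap bounds per-batch memory; tuning the two is the throughput/latency trade-off the description above refers to.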
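
For structured vs. unstructured output, a minimal validation sketch, assuming Pydantic v2 and a hypothetical `Answer` schema; the unstructured path simply passes the raw text through:

```python
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    """Hypothetical schema for a structured response."""
    label: str
    confidence: float

def validate_output(raw: str, structured: bool):
    # Unstructured path: hand the raw text back untouched.
    if not structured:
        return raw
    # Structured path: parse the JSON and check it against the schema in one step.
    try:
        return Answer.model_validate_json(raw)
    except ValidationError as exc:
        # A real server would retry generation or return a structured error here.
        raise ValueError(f"output failed schema validation: {exc}") from exc
```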