A high-performance LLM inference server with structured and unstructured output validation, designed for distributed systems and optimized for high-throughput, low-latency operation at large scale.
The core components of the project are:
Connection pool
Batching strategies
Structured vs. unstructured output generation (results can be checked here; a validation sketch follows this list)
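
As a rough illustration of the structured vs. unstructured distinction, the sketch below validates a model response either against a simple field/type schema (structured) or with a basic non-empty check (unstructured). The schema shape, function names, and the `validate_structured` / `validate_unstructured` entry points are illustrative assumptions, not the project's actual API.

```python
import json
from typing import Any

# Illustrative schema: expected top-level fields and their Python types.
# This is an assumed format, not the project's actual schema definition.
EXAMPLE_SCHEMA: dict[str, type] = {"answer": str, "confidence": float}


def validate_structured(raw: str, schema: dict[str, type]) -> dict[str, Any]:
    """Parse a model response as JSON and check field names and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("structured output must be a JSON object")
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data


def validate_unstructured(raw: str) -> str:
    """Minimal check for free-form text output: non-empty after stripping."""
    text = raw.strip()
    if not text:
        raise ValueError("empty unstructured output")
    return text


if __name__ == "__main__":
    # Structured path: the response must parse as JSON and match the schema.
    print(validate_structured('{"answer": "42", "confidence": 0.9}', EXAMPLE_SCHEMA))
    # Unstructured path: plain text is accepted as long as it is non-empty.
    print(validate_unstructured("The answer is 42."))
```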