Yoann Abriel
fr

All projects

2024-2026

KARL

In production at Orange Business, for product and sales teams

RAG cloud-intelligence chatbot in production at Orange Business. Local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) via vLLM on H100 NVL and L40S GPUs, LangChain + ChromaDB orchestration. Built for auditable answers, not for the demo.

KARL is the RAG cloud-intelligence chatbot I built end to end inside Orange Business's Cloud Avenue product team. It runs a local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) served via vLLM on H100 NVL and L40S GPUs, with LangChain orchestration and a ChromaDB vector store. The goal was never a demo that impresses: it was answers grounded in internal sources, auditable and reliable, for product and sales teams.

Challenges

  • Serving multiple LLMs locally on GPUs (H100 NVL, L40S) via vLLM
  • Getting auditable, reliable answers rather than ones that just look good in a demo
  • Orchestrating ChromaDB vector search with LangChain over internal cloud sources

Solutions

  • Local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) served via vLLM
  • LangChain + ChromaDB RAG pipeline to ground answers in internal sources
  • Evaluating outputs against real cases instead of trusting vibes

Results

  • Deployed in production for Orange Business product and sales teams
  • Local multi-LLM inference on H100 NVL and L40S GPUs via vLLM
  • Grounded, auditable answers via RAG (LangChain + ChromaDB)

Technologies

LangChain · ChromaDB · vLLM · H100 NVL · Llama 3.3 70B · DeepSeek R1 · RAG · Python