EXPERTISE · DEEP DIVES

Expertise that's been pressure-tested in production.

Six core domains. One proprietary stack. 50 million users' worth of evidence that it works under real load, real cost pressure, and real compliance constraints.

CORE DOMAINS

50M+ USERS OF PROOF

10y+ PRODUCTION AI

INDIC LANGUAGES

01 MULTIMODAL · VLM

Multimodal & Vision-Language Models

Vision-Language Models are the strategic high ground of this decade — the point where the CV era and the LLM era converge into systems that see, read, and reason in a single pass. We build VLM systems for scene reasoning, visual Q&A, multimodal agents, image-text retrieval, and grounded multimodal reasoning.

Our methodology pairs foundation-model adaptation (CLIP, BLIP-2, LLaVA, Qwen-VL families) with vertical fine-tuning, contrastive alignment on domain data, and rigorous multimodal evaluation harnesses — including red-team protocols for hallucination, refusal, and modality leakage.

VLM workloads sit on our shared inference plane — vLLM, TGI, Triton — with quantization tuned to the task: bf16 where reasoning matters, int8/int4 where throughput does. Where the use case demands edge, we cascade to a smaller distilled VLM on-device.

CLIP, BLIP-2, LLaVA, Qwen-VL, LoRA, DPO, vLLM, Triton, EVAL HARNESS

PROOF POINT

LightX uses multimodal pipelines to drive image-conditioned generation at 5M+ designs/month. The same systems are productized for enterprise creative ops.

02 COMPUTER VISION

Computer Vision at Production Scale

Computer Vision isn't an adjacent capability for us — it's the discipline our founder began at UC Berkeley and the system class that powers our consumer products. Detection, segmentation, instance tracking, OCR, defect inspection, video analytics: we ship them at consumer cost and consumer latency.

We design pipelines around the deployment target first. Server-side detection runs on Ultralytics YOLO and SAM derivatives with TensorRT acceleration. On-device segmentation cascades to distilled student models compiled to CoreML, TFLite, or ONNX Runtime — quantized, pruned, and budgeted to phone-class hardware.

Production CV is as much about evaluation as architecture. We build per-class precision/recall dashboards, drift detectors, and edge-case mining pipelines so that the model in production is the same model your QA team approved.
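
As a rough illustration of the evaluation pieces named above, the sketch below computes per-class precision/recall from parallel label lists and a simple drift score over predicted-class frequencies. The function names are hypothetical, and the Population Stability Index is an assumed choice of drift metric, not necessarily the one used in production.

```python
import math
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for two parallel label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    results = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        results[label] = (precision, recall)
    return results


def population_stability_index(reference, live, eps=1e-6):
    """PSI between class frequencies of a reference window and a live window.

    Assumed drift metric for illustration; 0.0 means identical distributions.
    """
    if not reference or not live:
        raise ValueError("both windows must be non-empty")
    ref_counts, live_counts = Counter(reference), Counter(live)
    psi = 0.0
    for label in set(reference) | set(live):
        r = max(ref_counts[label] / len(reference), eps)
        l = max(live_counts[label] / len(live), eps)
        psi += (l - r) * math.log(l / r)
    return psi
```

A dashboard would typically recompute these per release and per traffic window, and alert when the drift score crosses a threshold chosen for the workload.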

YOLO, SAM, TIMM, TensorRT, ONNX RUNTIME, CoreML, TFLite, Quantization, DRIFT DETECTION

PROOF POINT

PhotoCut processes 30M+ background removals every month — the production baseline our enterprise CV systems inherit.

03 GENERATIVE VISUAL

Generative AI for Visual Content

Generative visual AI is where consumer scale taught us the most. The cost ceiling and brand-safety floor are non-negotiable when you serve millions of users, and those constraints map directly onto enterprise creative ops.

Our generative stack pairs diffusion backbones (SDXL, FLUX-class architectures) with brand-locked LoRA adapters, ControlNet conditioning, and inpainting/relighting modules for catalog automation, virtual try-on, and brand-consistent creative. Generation is orchestrated through cost-aware schedulers that pick the right model and the right step count per request.
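
One way to picture a cost-aware scheduler is as a lookup over pre-profiled configurations: pick the cheapest model and step count that clears a quality floor within the request's budget. The sketch below assumes exactly that; the tier names, costs, and quality scores are made-up placeholders, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenConfig:
    model: str            # hypothetical model tier name
    steps: int            # diffusion sampling steps
    cost_per_step: float  # assumed cost units per step
    quality: float        # assumed offline quality score in [0, 1]

    @property
    def cost(self) -> float:
        return self.steps * self.cost_per_step


def pick_config(configs, min_quality, budget):
    """Return the cheapest config meeting the quality floor within budget, or None."""
    eligible = [c for c in configs if c.quality >= min_quality and c.cost <= budget]
    return min(eligible, key=lambda c: c.cost, default=None)


# Placeholder profile table; real numbers would come from offline benchmarking.
CONFIGS = [
    GenConfig("small-distilled", steps=8, cost_per_step=1.0, quality=0.70),
    GenConfig("base", steps=20, cost_per_step=2.0, quality=0.85),
    GenConfig("base", steps=40, cost_per_step=2.0, quality=0.90),
]
```

For example, `pick_config(CONFIGS, min_quality=0.8, budget=100)` selects the 20-step `base` entry, while a request that no configuration can satisfy returns `None` so the caller can reject or degrade explicitly.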

Generation infrastructure is hosted on vLLM and Triton with autoscaling tuned to creative ops traffic shapes — long tails of bursty requests, not steady throughput.

DIFFUSERS, SDXL, FLUX, ControlNet, LoRA · BRAND-LOCK, vLLM, SCHEDULERS, A/B HARNESS

PROOF POINT

5M+ creative designs generated every month across LightX, Photoleaf, and StoryZ. Same pipelines, productized for your catalog.

04 VERNACULAR

Multilingual & Vernacular Intelligence

India ships in 22 constitutionally scheduled languages and a long tail of regional scripts; most enterprise AI doesn't. We've built proprietary OCR and NLU stacks for Devanagari, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Punjabi, Odia, and Urdu, among others, at character-accuracy parity with English baselines on production data.

Our pipelines combine script-specific detection (recognizing the visual logic of conjuncts and ligatures), domain-tuned recognition models, and multilingual NLU heads built on Indic-aligned encoders. For generation, we adapt instruction-tuned multilingual models with vernacular DPO and human preference loops.

Where data is sparse, we use synthetic augmentation and cross-lingual transfer — and we share our preprocessing standards openly with customer teams so the work is portable.

INDIC OCR, CTC + ATTN, Devanagari, Bengali, Tamil, Telugu, CROSS-LINGUAL, DPO

PROOF POINT

Deployed in production KYC and creative pipelines processing Indic-script forms and content at population scale.

05 LLMS · RAG · AGENTS

LLMs, RAG & Agentic Systems

We deploy LLMs where they earn their keep: domain fine-tuning, retrieval pipelines, and agentic workflows that hit real enterprise traffic — often on-prem or inside customer VPCs, with no provider lock-in.

Fine-tuning is selected by data size and behavior delta: LoRA and QLoRA for narrow capability shifts, full SFT where the gap is wider, RLHF/DPO where preference data is available. Retrieval uses hybrid sparse-dense pipelines (BM25 + dense) on Qdrant, Weaviate, FAISS, or OpenSearch, with re-ranking and chunking strategies tuned to the document corpus.
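
To make the hybrid sparse-dense idea concrete, here is a minimal sketch of merging a BM25 ranking and a dense-vector ranking into one list. It assumes reciprocal rank fusion (RRF) as the merge step, which is one common option and not necessarily the fusion used in any given deployment; the document IDs are placeholders.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is the conventional default for RRF; ties break by document ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Placeholder results from a sparse (BM25) and a dense retriever.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that both retrievers agree on rise to the top of `fused`; a re-ranker would then rescore only this short merged list.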

Agent runtimes are LangGraph-based or fully custom, with explicit tool use, deterministic state machines, and evaluation harnesses that test the agent, not just the model. Inference runs on vLLM, TGI, or Triton — your call.
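
The sketch below shows what "explicit tool use" on a "deterministic state machine" can look like in plain Python. It does not use the LangGraph API; the states, the `policy` callable, and the `add` tool are hypothetical stand-ins for a model call and real tools.

```python
from enum import Enum, auto


class State(Enum):
    PLAN = auto()
    CALL_TOOL = auto()
    RESPOND = auto()
    DONE = auto()


def run_agent(policy, tools, query, max_steps=8):
    """Drive one agent episode; return (answer, trace of visited states).

    policy(query, observations) returns ("tool", name, arg) or ("answer", text).
    Only registered tools can run, and the step budget bounds the loop.
    """
    state, observations, answer, pending, trace = State.PLAN, [], None, None, []
    for _ in range(max_steps):
        trace.append(state)
        if state is State.PLAN:
            action = policy(query, observations)
            if action[0] == "tool":
                pending, state = action[1:], State.CALL_TOOL
            else:
                answer, state = action[1], State.RESPOND
        elif state is State.CALL_TOOL:
            name, arg = pending
            if name not in tools:
                raise KeyError(f"unknown tool: {name}")
            observations.append(tools[name](arg))
            state = State.PLAN
        elif state is State.RESPOND:
            state = State.DONE
        else:  # State.DONE
            return answer, trace
    raise RuntimeError("step budget exhausted before reaching DONE")


def scripted_policy(query, observations):
    """Stand-in for a model: call the tool once, then answer."""
    if not observations:
        return ("tool", "add", (2, 3))
    return ("answer", f"The sum is {observations[-1]}")


TOOLS = {"add": lambda args: args[0] + args[1]}
```

Because the function returns the trace alongside the answer, an evaluation harness can assert on the path the agent took, not only on the final text.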

Llama, Mistral, Qwen, Gemma, LoRA, QLoRA, SFT, RLHF · DPO, Qdrant, Weaviate, FAISS, LangGraph, vLLM, TGI

06 NLP · DOCUMENT AI

Document & Text Intelligence

Most enterprise value is locked inside semi-structured documents — invoices, contracts, claims, clinical reports, KYC packets. We build extraction pipelines that combine layout understanding (DETR-derived detectors), OCR, and structured prediction heads to lift fields, tables, and relationships at audit-grade accuracy.

On the language side, we run NER for domain vocabularies, summarization, topic modeling, sentiment, and conversation analytics — fine-tuned where general-purpose models miss your terms. Enterprise search and Q&A pipelines combine RAG with structured filters so answers cite real documents and stay inside permission boundaries.

Everything ships with human-in-the-loop where the work demands it: exception queues, confidence-thresholded escalation, and audit trails your compliance team can read.
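
Confidence-thresholded escalation can be sketched as a small routing function: each extracted field is either auto-accepted or sent to an exception queue, and every decision is logged. The field names, thresholds, and record layout below are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model confidence in [0, 1]


def route_fields(fields, thresholds, default_threshold=0.9):
    """Split fields into (accepted, exceptions) and build an audit log.

    thresholds maps field name -> minimum confidence for auto-acceptance;
    fields without an entry use default_threshold.
    """
    accepted, exceptions, audit = [], [], []
    for field in fields:
        limit = thresholds.get(field.name, default_threshold)
        auto = field.confidence >= limit
        (accepted if auto else exceptions).append(field)
        audit.append({
            "field": field.name,
            "confidence": field.confidence,
            "threshold": limit,
            "decision": "auto_accept" if auto else "escalate",
        })
    return accepted, exceptions, audit


# Placeholder extraction output for a single invoice.
FIELDS = [
    ExtractedField("invoice_total", "1,240.00", 0.97),
    ExtractedField("iban", "DE00 0000 0000", 0.93),
    ExtractedField("vendor", "Acme", 0.80),
]
accepted, exceptions, audit = route_fields(FIELDS, thresholds={"iban": 0.99})
```

Setting a stricter threshold on high-risk fields (here the hypothetical `iban`) sends them to human review even when the model is fairly confident.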

DocumentAI, LayoutLM, DETR, NER, RAG, HYBRID SEARCH, HITL, AUDIT TRAIL

TECH STACK · TRANSPARENT

What we actually build with.

Show, don't hide. No mystery proprietary box — just a sharp, opinionated stack we operate every day in production.

MODELS · FRAMEWORKS

PyTorch, HF Transformers, Diffusers, TIMM, Ultralytics YOLO, SAM, CLIP, BLIP-2, LLaVA, Llama, Mistral, Qwen, Gemma

FINE-TUNING

LoRA, QLoRA, Full SFT, RLHF, DPO, Preference data

VECTOR · RETRIEVAL

Qdrant, Weaviate, FAISS, OpenSearch hybrid

ORCHESTRATION

LangGraph, LlamaIndex, Custom agent runtimes

SERVING · INFERENCE

vLLM, TGI, Triton, ONNX Runtime, TensorRT, CoreML, TFLite

MLOPS

MLflow, Weights & Biases, Prefect, Airflow, Argo

CLOUD · INFRA

AWS (primary), GCP, Azure, Kubernetes, Terraform

COMPLIANCE POSTURE

GDPR, EU AI Act readiness, India DPDP, SOC2-aligned, HIPAA-deployable

RESEARCH LINEAGE EST. 2002

UC Berkeley × IIT Kanpur

Autonomous-navigation research, computer vision foundations, multimodal perception.

SLAM, Stereo vision, Sensor fusion, Path planning

RESEARCH LINEAGE

Academic discipline.
Consumer-scale rigor.

Our founder's autonomous-navigation work at UC Berkeley still informs how we design vision systems today — the discipline of building perception that works under real-world noise, latency budgets, and failure modes.

That academic foundation, plus a decade of shipping AI to 50 million users, is what we bring to your problem. Not a lab demo. Not a slide deck. A built thing.

NEXT STEP

Want a deep technical session with our senior architects?

Book a technical review: hello@andor.in
