
About Auxerta

We build the data that makes AI models expert. Not general-purpose — domain-specific, human-validated, production-grade.

The Thesis

Most AI companies are building models. We're building what models need to actually work.

The current generation of large language models can do almost anything — summarize, translate, generate code, hold a conversation. But "almost anything" isn't good enough when the output goes into a medical record, a legal filing, a defense system, or an autonomous vehicle's decision loop. In those domains, you need models that are specifically trained on high-fidelity, domain-pure data — data that captures the edge cases, the regulatory constraints, and the semantic nuances that generic training corpora miss entirely.

That data doesn't appear on the internet. It doesn't exist in CommonCrawl. It has to be deliberately constructed — curated by domain experts, annotated to rigorous standards, validated against real-world constraints, and continuously refreshed as the domain evolves.

Auxerta exists to build that layer. We work directly with organizations that need domain-specific AI and provide the training data infrastructure that makes it possible — from NER annotation and instruction tuning to synthetic data generation with constrained decoding and red-team evaluation.

What We Deliver

1M+ Labeled Samples
9+ Specialized Domains
↓40% Avg. Loss Reduction
0 Tolerance for Error

Founders


Philip Abao

Co-Founder

At Auxerta, he drives business strategy, product direction, and client acquisition — while staying deep in the code, building the data pipelines and AI systems that clients depend on. Equal parts strategist and engineer.

Software Engineering · Business Strategy · Data Pipelines · AI Systems

Soraya Johnson

Co-Founder

At Auxerta, she leads domain strategy and expert partnerships — bridging technical depth and the regulatory landscape through patent strategy and scientific research.

Patent Strategy · Scientific Research · Domain Analysis

The Team

Auxerta operates as a distributed network of researchers, software engineers, and domain experts — each contributing their specialization to a shared mission: build AI that actually works in fields where getting it wrong has consequences.

There is no traditional org chart. Instead, we function as a collaborative network where every contributor — whether focused on NLP pipelines, clinical annotation, or patent strategy — is connected to the same objective. Every person we bring on makes the network sharper. Every domain we enter makes the data more robust.

We are expanding internationally. With a new presence in the Yokohama area of Japan, Auxerta is growing across the Pacific — bringing our data infrastructure closer to the industries and research communities in the Asia-Pacific region.

United States
Headquarters — distributed across the US with founding operations and core engineering.
Japan — Yokohama
Expanding operations in the Asia-Pacific region, with a new office in the greater Yokohama area.

Why Domain-Specific

General-purpose models are trained on the internet. The internet is broad but shallow — it contains surface-level information about many topics, but rarely the deep, structured, expert-validated knowledge that production AI systems require.

A model trained on CommonCrawl can describe what a drug interaction is. A model trained on Auxerta's clinical dataset can identify that a specific combination of metformin and glyburide at elevated dosages requires renal function monitoring — and flag when a synthetic clinical record omits that check.
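The kind of check described above can be encoded as an explicit validation rule over structured clinical records. The sketch below is purely illustrative — the record schema, drug names, dose thresholds, and function names are hypothetical assumptions, not clinical guidance and not Auxerta's actual rules.

```python
def requires_renal_monitoring(record: dict) -> bool:
    """Flag a record that combines metformin and glyburide at elevated
    doses without a documented renal function check.

    Illustrative only: thresholds and schema are hypothetical.
    """
    meds = {m["name"].lower(): m for m in record.get("medications", [])}
    has_combo = "metformin" in meds and "glyburide" in meds
    elevated = has_combo and (
        meds["metformin"].get("dose_mg", 0) >= 2000
        or meds["glyburide"].get("dose_mg", 0) >= 10
    )
    renal_checked = any(
        lab.get("type") == "renal_function" for lab in record.get("labs", [])
    )
    return elevated and not renal_checked


# A synthetic record that omits the renal function check gets flagged.
record = {
    "medications": [
        {"name": "Metformin", "dose_mg": 2000},
        {"name": "Glyburide", "dose_mg": 10},
    ],
    "labs": [],
}
print(requires_renal_monitoring(record))
```

A rule like this is what "validated against real-world constraints" looks like in practice: domain knowledge made executable, so it can run over every sample rather than live only in an expert's head.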

That difference — between knowing about a domain and knowing within a domain — is the difference between a model that demos well and a model that ships.

How We Work

Domain experts first

Every dataset starts with the people who know the domain — clinicians, engineers, attorneys, analysts. We encode their knowledge into annotation schemas, validation rules, and quality benchmarks.
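One way expert knowledge becomes machine-checkable is an annotation schema with built-in validation. The sketch below assumes a hypothetical clinical NER schema — the entity types, class names, and rules are illustrative, not Auxerta's production tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    name: str
    description: str


@dataclass
class AnnotationSchema:
    """A minimal schema: allowed entity types plus span-level validation."""

    entity_types: list

    def validate(self, annotation: dict) -> list:
        errors = []
        allowed = {e.name for e in self.entity_types}
        if annotation["label"] not in allowed:
            errors.append(f"unknown label: {annotation['label']}")
        if annotation["end"] <= annotation["start"]:
            errors.append("span end must follow span start")
        return errors


# Hypothetical clinical NER schema encoded from expert input.
clinical_ner = AnnotationSchema(entity_types=[
    EntityType("DRUG", "A medication name, brand or generic"),
    EntityType("DOSAGE", "An administered amount with units"),
    EntityType("CONDITION", "A diagnosis or clinical finding"),
])

print(clinical_ner.validate({"label": "DRUG", "start": 10, "end": 19}))
```

The point of making the schema executable is that annotator mistakes surface at labeling time, not after a model has already trained on them.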

Constraints at generation

We don't generate data and hope it's correct. We use constrained decoding, knowledge-grounded validation, and multi-layer quality checks to bound the error rate before data enters a training pipeline.
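The idea of bounding error before data enters training can be sketched as a multi-layer quality gate: every generated sample must clear every check or it is dropped. The check functions below are hypothetical placeholders standing in for the richer, domain-specific validators the text describes.

```python
def schema_check(sample: dict) -> bool:
    """Layer 1: the sample has the required fields."""
    return {"prompt", "response"}.issubset(sample)


def length_check(sample: dict) -> bool:
    """Layer 2: the response is non-empty and within bounds."""
    return 0 < len(sample.get("response", "")) <= 4096


def no_placeholder_check(sample: dict) -> bool:
    """Layer 3: no unfinished placeholder text slipped through."""
    return "TODO" not in sample.get("response", "")


QUALITY_GATES = [schema_check, length_check, no_placeholder_check]


def admit(samples):
    """Yield only samples that clear every gate."""
    for s in samples:
        if all(gate(s) for gate in QUALITY_GATES):
            yield s


batch = [
    {"prompt": "Define NER.", "response": "Named entity recognition is ..."},
    {"prompt": "Broken", "response": "TODO"},
]
print(len(list(admit(batch))))
```

Stacking cheap deterministic gates ahead of expensive expert review means the humans only see samples that already survived the automated layers.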

Pipelines, not projects

Domains change. Regulations update. New edge cases surface. Our data infrastructure is designed for continuous delivery — not one-time collection that decays the moment it's delivered.

Want to work with us?

We're always looking for people and organizations who take data seriously.