AI / ML · 02 of 05

Generation, on your brand. At your scale.

Text, code, image and multimodal pipelines that produce output worth shipping. We fine-tune to your voice, ground to your sources, and bolt on an evaluation harness so "it sounded fine" never has to be the QA answer again.

140k Generations / day · top client
78% Pass eval on first try
−61% Cost vs. naive GPT-4
prompt: generate(brand="orbital", modality="mixed")
// onboarding email
// `render` and `send` are assumed to be app-provided helpers.
export const mailer = async (user) => {
  const tpl = await render("welcome", user);
  return send(tpl);
};
BLEU 0.41 · brand-fit 9.2/10 · $0.003 / gen
What you get

Six generative capabilities, one pipeline.

We pick the smallest model that does the job, ground it in your data, and ship it with evals you can read in a meeting.
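As a rough illustration of "smallest model that does the job": a minimal sketch that picks the cheapest candidate clearing a pass-rate bar on the golden set. The model labels, costs and pass rates below are invented for the example, and `pick_smallest_passing` is a hypothetical helper, not a description of any specific tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str            # hypothetical model label
    cost_per_gen: float  # USD per generation (illustrative figure)
    pass_rate: float     # fraction of golden-set cases passed


def pick_smallest_passing(candidates, min_pass_rate):
    """Return the cheapest candidate that meets the bar, or None if none do."""
    passing = [c for c in candidates if c.pass_rate >= min_pass_rate]
    return min(passing, key=lambda c: c.cost_per_gen, default=None)


# Illustrative numbers only.
candidates = [
    Candidate("frontier-large", 0.030, 0.93),
    Candidate("mid-tier", 0.008, 0.88),
    Candidate("distilled-small", 0.002, 0.71),
]

choice = pick_smallest_passing(candidates, min_pass_rate=0.85)
print(choice.name)  # mid-tier
```

The bar comes first and cost second: a cheaper model that misses the pass rate is never selected.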

Text & long-form

On-brand articles, RFP responses, knowledge-base content. Style transfer trained on your past best work.

LoRA · style-fit

Code generation

In-editor copilots, scaffolding agents, code review bots — tuned to your repo conventions and your test suite.

repo-aware · typed

Image & product

SDXL / FLUX pipelines for catalog, lifestyle and creative variants, consistent across generations.

SDXL · FLUX · IP-adapter

Video & motion

Short-form social, product demos, training reels with Runway and Sora-class APIs.

Runway · Sora · ComfyUI

Retrieval-augmented

Hybrid (BM25 + vector) retrieval over your docs, tickets and tables. Citations on every answer.

hybrid · rerank · cite
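One common way to merge a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion step, which the card above does not specify; the document IDs are made up, and `k=60` is the conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken by doc ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers, best hit first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because the fused list carries document IDs through unchanged, the same IDs can be attached to the answer as citations.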

Evaluation harness

Per-task golden sets, model-graded rubrics, regression alerts in CI.

golden set · rubric · CI
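A golden set plus a weighted rubric can be scored in a few lines. This is a minimal sketch: the rubric checks are plain Python stand-ins for what would be model-graded judgments, and `fake_generate`, the criteria and the 0.8 threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    reference: str  # a phrase the output is expected to contain


# Rubric: criterion name -> (weight, check). Checks stand in for a model grader.
Rubric = dict[str, tuple[float, Callable[[str, GoldenCase], bool]]]


def score(output: str, case: GoldenCase, rubric: Rubric) -> float:
    """Weighted fraction of rubric criteria the output satisfies."""
    total = sum(weight for weight, _ in rubric.values())
    earned = sum(weight for weight, check in rubric.values() if check(output, case))
    return earned / total


def pass_rate(generate, golden_set, rubric, threshold=0.8) -> float:
    """Share of golden cases whose rubric score reaches the threshold."""
    passed = sum(score(generate(c.prompt), c, rubric) >= threshold for c in golden_set)
    return passed / len(golden_set)


rubric: Rubric = {
    "mentions_reference": (2.0, lambda out, case: case.reference.lower() in out.lower()),
    "concise": (1.0, lambda out, case: len(out.split()) <= 20),
}

golden_set = [
    GoldenCase("Greet a new Orbital user", "welcome"),
    GoldenCase("Confirm a refund", "refund"),
]


def fake_generate(prompt: str) -> str:
    # Hypothetical generator; a real pipeline would call a model here.
    return "Welcome to Orbital!" if "Greet" in prompt else "Your request has been received."


rate = pass_rate(fake_generate, golden_set, rubric)
print(rate)  # 0.5
```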
How it works

Ground first, generate second, evaluate always.

A three-lane pipeline that's survived the boring middle of dozens of generative projects.

01 · Ground
Docs · Tickets · Brand assets · Embeddings
02 · Generate
Base LLM · LoRA / SFT · Tools · Constraints
03 · Evaluate
Golden set · Rubric · Human review · Regression CI

The first 70% is grounding. The last 30% is taste.

Most "generative AI" failures aren't model failures — they're grounding failures. We start with the documents, examples and constraints that anchor output to your reality.

  • 01
    Curate the source-of-truth

    Ingest pipelines, chunking strategy, freshness rules — the boring part that determines quality.

  • 02
    Pick the smallest model that wins

    Frontier models for hard tasks, distilled SLMs for the easy 80%.

  • 03
    Fine-tune only what's worth it

    LoRA / SFT for style and structured output.

  • 04
    Eval is a CI gate, not a milestone

    No model ships if the regression suite fails.
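Step 04 can be pictured as a small comparison between a baseline run and a candidate run. This is a hedged sketch only: the task names, pass rates and the 0.02 tolerance are invented, and `regression_gate` is a hypothetical helper.

```python
def regression_gate(baseline, candidate, tolerance=0.02):
    """Return the tasks where the candidate's pass rate fell by more than tolerance.

    A task missing from the candidate run is treated as a pass rate of 0.0.
    """
    regressions = []
    for task, base_rate in baseline.items():
        cand_rate = candidate.get(task, 0.0)
        if cand_rate < base_rate - tolerance:
            regressions.append(task)
    return regressions


# Illustrative pass rates per task.
baseline = {"welcome_email": 0.90, "rfp_response": 0.80}
candidate = {"welcome_email": 0.91, "rfp_response": 0.70}

failed = regression_gate(baseline, candidate)
print(failed)  # ['rfp_response']

# In CI, a non-empty list would map to a non-zero exit code, for example:
#   raise SystemExit(1 if failed else 0)
```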

Tech stack

Picks we'd defend at 3am.

Generative tooling moves weekly. We track it so you don't have to — and we only ship what survives a quarter of production traffic.

Foundation models

GPT-4o / o1 · Claude 3.5 · Gemini 1.5 · Llama 3.1 · Mistral Large

Image & video

SDXL · FLUX · DALL-E 3 · Stable Video · Runway Gen-3 · ComfyUI

Fine-tuning

Axolotl · Unsloth · HF PEFT · DPO / ORPO · LoRA / QLoRA

Retrieval

pgvector · Weaviate · Vespa · Cohere Rerank · BM25

Orchestration

LlamaIndex · LangChain · DSPy · Instructor

Eval

Braintrust · LangSmith · Promptfoo · Ragas

Serving

vLLM · TGI · Modal · Replicate · Bedrock

Workflow

Temporal · Airflow · Prefect · Argo
From vision to victory

From prompt to production pipeline, in five steps.

No 9-month transformations. A real generative feature, in production, with eval gates — in six weeks.

01
Week 1
Sharpen the brief

Pick a single generation task. Define what good output looks like using a rubric.

02
Week 2
Build the golden set

30–80 input/output examples. This becomes your permanent quality anchor.

03
Weeks 3–4
Generate & ground

RAG pipeline + prompt structure + optional LoRA. Iterate against the golden set, not opinions.

04
Week 5
Ship behind eval gate

CI blocks regressions. Cost and latency budgets enforced automatically.
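Budget enforcement can be as simple as a check over recent generation logs. The sketch below assumes a mean-cost ceiling and a nearest-rank p95 latency ceiling; the budget values and sample data are illustrative, and `check_budget` is a hypothetical helper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_cost_per_gen_usd: float
    max_p95_latency_ms: float


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def check_budget(costs_usd, latencies_ms, budget):
    """Return human-readable violations; an empty list means within budget."""
    violations = []
    mean_cost = sum(costs_usd) / len(costs_usd)
    if mean_cost > budget.max_cost_per_gen_usd:
        violations.append(f"cost: mean ${mean_cost:.4f} per generation exceeds budget")
    latency = p95(latencies_ms)
    if latency > budget.max_p95_latency_ms:
        violations.append(f"latency: p95 {latency} ms exceeds budget")
    return violations


# Illustrative sample of ten generations.
costs = [0.003] * 10
latencies = [400, 420, 450, 480, 500, 520, 550, 600, 700, 2400]

violations = check_budget(costs, latencies, Budget(0.005, 1500))
print(violations)  # one latency violation, no cost violation
```

Using a tail percentile rather than the mean keeps one slow outlier from hiding behind many fast generations.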

05
Ongoing
Improve continuously

Production failures feed back into the golden set. Improvements compound over time.

Where it pays back

Three pipelines we've scaled.

Patterns we've run hot — with the cost, quality and throughput curves to prove it.

Pattern · Marketing engine

A content factory that sounds human.

120k SKUs × 7 locales of product copy. LoRA + retrieval + reviewer queue.

−84% Cost / SKU
9.2 / 10 Brand-fit score
Claude 3.5 · Llama LoRA · pgvector
Pattern · Code copilot

A copilot that knows your repo.

Repo-aware assistant grounded in your codebase, conventions and test suite.

+38% PRs merged in 1 review
−27% Bug regressions
GPT-4o · Instructor · Tree-sitter
Pattern · Product photography

Lifestyle imagery without a studio day.

SDXL + IP-adapter pipeline for scaling a furniture catalog.

32 hr → 11 min Per SKU
1.4M Images / month
SDXL · IP-adapter · ComfyUI · Modal
Why ETY

Pipelines, not parlor tricks.

2.1M Generations / day across all client pipelines we operate.
78% Average eval-pass rate across our generative deployments.
11 Models we've fine-tuned & deployed in production over 18 months.
−61% Median cost reduction vs. baseline frontier-only setup.

One pipeline. Receipts on the rubric.

Send us a single generation task that's slow, costly or off-brand today. We'll come back with the golden set, the pipeline and the eval gate — or tell you the smaller fix that solves it cheaper.