We formalize the tacit judgment scientists use every day.

What to investigate, which evidence to trust, where to allocate the next experiment: the hardest research decisions stay implicit. We build the systems that make them explicit.

What We Build

Software that carries the criteria scientists apply at each stage of research, from ideation and literature synthesis to evaluation, planning, and experimental execution.

i.

Judgment, Written Down

We capture how a scientist evaluates ideas, evidence, and directions, then encode it as a framework you can audit, share, and re-run on new inputs.

ii.

A Loop Around the Researcher

Discovery, hypothesis, evaluation, planning, execution. Calibrated judgment models sit at each stage as directional filters, keeping the researcher in the driver's seat.

iii.

Standards That Update

Reward models freeze the day they ship; bibliometric proxies lag by years. Our criteria are re-fitted after every round of human feedback, so they track the field as it moves.

Artifacts From the Lab

Three recurring shapes our judgment systems take in practice. Client-specific instances are redacted; the patterns below are what we keep reaching for across domains.

Pattern 01

Research Taste, Calibrated

An evaluation function trained on rounds of expert feedback. Each disagreement between the system's score and the scientist's is one data point; over time the system's scores approach the scientist's, and the remaining gaps mark the dimensions where intuition is still doing the work.

Scoring axes: novelty, feasibility, impact.

Pattern 02

Problems as First-Class Artifacts

A structured card format for research problems: the claim, the supporting evidence, rubric scores, reviewer commentary, and iteration history held in one file. Judgment moves from meeting notes into a versioned record any reviewer can revisit.

Scoring axes: novelty, feasibility, impact, evidence, risk, cost.
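As a rough illustration of the card idea, here is a minimal Python sketch. The class names, field names, score scale, and JSON layout are all assumptions made for this example, not the format we ship; only the list of contents (claim, evidence, rubric scores, commentary, iteration history) comes from the description above.

```python
import json
from dataclasses import dataclass, field, asdict

# Axis names taken from the rubric labels on this page; the 0-1 scale is an assumption.
AXES = ("novelty", "feasibility", "impact", "evidence", "risk", "cost")


@dataclass
class Revision:
    """One entry in the iteration history (hypothetical structure)."""
    note: str
    scores: dict


@dataclass
class ProblemCard:
    """Hypothetical card: everything about one research problem in one record."""
    claim: str
    evidence: list
    scores: dict
    commentary: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def revise(self, new_scores, note):
        # Archive the scores being replaced, then apply the update.
        self.history.append(Revision(note=note, scores=dict(self.scores)))
        self.scores = {**self.scores, **new_scores}

    def to_json(self):
        # A stable key order keeps diffs small under version control.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


card = ProblemCard(
    claim="Hypothetical claim text",
    evidence=["hypothetical-source-1"],
    scores={axis: 0.5 for axis in AXES},
)
card.revise({"novelty": 0.8}, note="Reviewer raised novelty after a literature check")
```

Writing `card.to_json()` to a file under version control is one way to get the revisitable record described above: each revision archives the scores it replaced.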

Pattern 03

Multi-Axis Evaluation

A rubric that splits the single verdict into independent dimensions. Human and system scores line up per axis, making it obvious which dimension each disagreement sits on, and which one deserves the next round of calibration.
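The per-axis comparison can be sketched in a few lines of Python. This is an illustration under assumptions: the three axis names are borrowed from the rubric labels on this page, while the 1-5 scale, the sample scores, and the choice of mean absolute gap as the disagreement measure are invented for the example.

```python
AXES = ("novelty", "feasibility", "impact")


def per_axis_gap(human, system):
    """Mean absolute gap between human and system scores, computed per axis."""
    gaps = {}
    for axis in AXES:
        diffs = [abs(h[axis] - s[axis]) for h, s in zip(human, system)]
        gaps[axis] = sum(diffs) / len(diffs)
    return gaps


# Hypothetical scores for two candidate problems on a 1-5 scale.
human = [
    {"novelty": 4, "feasibility": 2, "impact": 5},
    {"novelty": 3, "feasibility": 4, "impact": 2},
]
system = [
    {"novelty": 4, "feasibility": 3, "impact": 2},
    {"novelty": 2, "feasibility": 4, "impact": 1},
]

gaps = per_axis_gap(human, system)
# The axis with the largest gap is the candidate for the next calibration round.
worst_axis = max(gaps, key=gaps.get)
```

With these sample numbers the gap is largest on impact, so that axis would be the one to revisit with the expert first.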

How It Works

The research decisions that matter most are the ones scientists cannot fully articulate. We make those decisions legible in three steps, then put them back in the loop.

01

Extract

Sit with the domain scientist. Turn the criteria, heuristics, and gut checks they apply in practice into an explicit, testable draft.

02

Calibrate

Run the draft on real candidates; every disagreement with the expert becomes a training signal. Agreement with the expert improves round over round, and the remaining gaps point at what we missed.
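One simple way to picture "disagreement as training signal" is an error-driven weight update. The sketch below assumes a linear rubric (a weighted sum over axes) and a plain delta-rule update; both are stand-ins chosen for illustration, and the feature values, expert scores, and learning rate are made up.

```python
def predict(weights, features):
    """Score a candidate as a weighted sum over rubric axes (assumed model form)."""
    return sum(weights[axis] * features[axis] for axis in weights)


def update(weights, features, expert_score, lr=0.1):
    """Nudge each weight in proportion to the expert/system disagreement."""
    error = expert_score - predict(weights, features)
    new_weights = {a: w + lr * error * features[a] for a, w in weights.items()}
    return new_weights, error


weights = {"novelty": 0.0, "feasibility": 0.0, "impact": 0.0}

# Hypothetical candidates: (axis features, expert's overall score).
candidates = [
    ({"novelty": 1.0, "feasibility": 0.0, "impact": 0.0}, 0.6),
    ({"novelty": 0.0, "feasibility": 1.0, "impact": 0.0}, 0.1),
    ({"novelty": 0.0, "feasibility": 0.0, "impact": 1.0}, 0.3),
]

round_errors = []
for _ in range(50):  # 50 feedback rounds over the same candidates
    total = 0.0
    for features, expert_score in candidates:
        weights, error = update(weights, features, expert_score)
        total += abs(error)
    round_errors.append(total)
```

In this toy setup the total disagreement shrinks every round and the learned weights drift toward the expert's implied weighting. Real feedback is noisier, which is why the residual gaps are worth inspecting rather than averaging away.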

03

Evolve

Scientific standards shift as the field produces new evidence. Feedback keeps coming in; the rubric is re-fitted on a schedule, so the system reflects current practice instead of last year's.

Track Record

01 · Research: peer-reviewed work in CVPR, ICCV, NeurIPS, ECCV, IEEE TPAMI

Citations: across a decade of applied ML research

Papers: including top CV/ML venues

h-Index: sustained citation impact

02 · Practice: work shipped into production systems and public tooling

US Patents: granted at Microsoft, Pinterest, Samsara

Community: practitioners enrolled in AI-first engineering programs

GitHub Stars: across open-source research tooling

Who We Are

Yan Wang

Founder & Chief Scientist

A decade of shipping production ML at scale. The current focus: can the intuition a working scientist applies day-to-day be written down, calibrated against their own feedback, and run as software?

QuackTech is currently embedded with experimental physics groups at Yale, co-developing judgment systems for research taste, hypothesis evaluation, and problem selection.

Prior: Staff Applied Scientist, Samsara · Staff ML Engineer, Pinterest · Senior SDE, Microsoft · PhD, Columbia University