Developers

Document extraction, from your own code.

Define a workflow once, then run it against any batch of documents and pull back structured, citation-backed results — all over a small HTTP API authenticated with API keys.

What you can build

Three building blocks, one base URL.

Everything lives under https://datasnipe.app/api. Authenticate with a Bearer API key and the workflow and artifact endpoints are yours to automate.

Workflow API

Create, read, update, and delete reusable extraction workflows — a field schema, a model, and an optional context prompt — then run a workflow against one or many uploaded files.

Artifacts API

Poll a run to completion and download the collated results as CSV or TSV, with control over how rows are grouped and a confidence cutoff for low-quality cells.

API keys

Scoped, revocable keys you create in the dashboard. Grant read, write, or run access, attach a key to yourself or your organization, and rotate it whenever you need to.

Authentication

Bearer keys, scoped on purpose.

Create an API key from the API keys page in your dashboard. The secret is shown once at creation and stored only as a hash — copy it then, because it can never be retrieved again.

Send the key on every request

Pass the key in the Authorization header. Keys are prefixed with dsk_.

HTTP header
Authorization: Bearer dsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Keys can't manage keys. API keys authenticate the Workflow and Artifacts endpoints only. Creating, listing, and revoking keys is a dashboard-only action, so a leaked key can never mint or revoke others.
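A script can catch a missing or mistyped key before it sends anything. The sketch below uses a hypothetical helper, require_api_key, which is not part of DataSnipe. It only checks that DATASNIPE_API_KEY is set and starts with dsk_. It cannot tell whether the key is valid or revoked; only a 401 from the API reveals that.

```shell
# Hypothetical helper (not part of DataSnipe): fail fast when the key is missing
# or malformed. This is a local check only; a revoked or mistyped key still gets
# 401 Unauthorized from the API.
require_api_key() {
  if [ -z "${DATASNIPE_API_KEY:-}" ]; then
    echo "DATASNIPE_API_KEY is not set" >&2
    return 1
  fi
  case "$DATASNIPE_API_KEY" in
    dsk_?*) return 0 ;;
    *)
      echo "DATASNIPE_API_KEY does not start with dsk_" >&2
      return 1
      ;;
  esac
}

# Usage, at the top of a script:
# require_api_key || exit 1
```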

Scopes

Each key carries an explicit set of scopes. Calling an endpoint whose scope the key lacks returns 403 with the error insufficient_scope.

workflows:read

List and fetch workflows, poll job groups, and download CSV/TSV artifacts.

workflows:write

Create, update, and delete workflows.

workflows:run

Run a workflow against uploaded files.

Ownership

A key belongs to either you or your organization. A personal key sees your personal workflows; an organization key sees the organization's. Workflows and their results are always scoped to the key's owner.

Workflow API

Define once, run anywhere.

A workflow bundles an extraction schema, a model, and an optional context prompt under a name. Manage and run workflows with these endpoints.

GET     /api/workflows           List your workflows                          workflows:read
POST    /api/workflows           Create a workflow                            workflows:write
GET     /api/workflows/:id       Fetch one workflow                           workflows:read
PATCH   /api/workflows/:id       Update name, schema, model, or prompt        workflows:write
DELETE  /api/workflows/:id       Delete a workflow                            workflows:write
POST    /api/workflows/:id/runs  Run a saved workflow against uploaded files  workflows:run
POST    /api/runs                Run a one-off config (no saved workflow)     workflows:run
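Because each endpoint needs exactly one scope, a script can work out the smallest scope set its key needs. The sketch below uses a hypothetical helper, scope_for, that encodes the endpoint lists on this page. Update it if the endpoints or scopes change.

```shell
# Hypothetical helper (not part of DataSnipe): print the scope this page lists
# for a method and path. Paths are matched by prefix, so ids are wildcards.
scope_for() {
  case "$1 $2" in
    "POST /api/workflows/"*"/runs" | "POST /api/runs")
      echo "workflows:run" ;;
    "GET /api/workflows" | "GET /api/workflows/"* | "GET /api/job-groups/"*)
      echo "workflows:read" ;;
    "POST /api/workflows" | "PATCH /api/workflows/"* | "DELETE /api/workflows/"*)
      echo "workflows:write" ;;
    *)
      echo "no documented scope for: $1 $2" >&2
      return 1 ;;
  esac
}

# Usage: the de-duplicated scopes for a script that calls these three endpoints.
# { scope_for POST /api/workflows
#   scope_for POST /api/workflows/ID/runs
#   scope_for GET /api/job-groups/ID/artifact.csv; } | sort -u
```

Granting only the printed scopes keeps a leaked script key from doing more than the script itself does.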

Create a workflow

Send a JSON body with a name and an inline extraction schema. Alternatively, pass a sourceGroupId to clone the schema, model, and prompt from a previous run instead of an inline schema.

Body

Field             Type       Required   Description
name              string     required   1–120 characters. Must be unique within the owner.
extractionSchema  Field[]    required*  At least one field. *Required unless sourceGroupId is given.
model             string     optional   Defaults to claude-sonnet-4-6. See Models below.
contextPrompt     string     optional   Extra context passed to the extraction phase.
fewShotExamples   Example[]  optional   Each { input, output } pair guides the model.
ownerType         string     optional   user or organization. Must match the key's owner.
sourceGroupId     string     optional   Clone the schema, model, and prompt from a prior job group instead of passing an inline schema.

Extraction field

Each field is explicit: a name, a type (string, number, boolean, date, or list), and an optional description. A list field also needs itemFields, the scalar columns of each item.
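When a schema is assembled in a shell script, a small builder catches unknown types before the request is sent. The helpers below, json_escape and scalar_field, are hypothetical and not part of DataSnipe. They handle scalar fields only, because list fields also need itemFields. The escaping covers backslashes and double quotes, but not control characters.

```shell
# Hypothetical helpers (not part of DataSnipe) for writing extractionSchema by hand.
# json_escape handles backslashes and double quotes only, not control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# scalar_field NAME TYPE [DESCRIPTION] prints one field object. List fields
# are rejected because they also need itemFields.
scalar_field() {
  field_name=$(json_escape "$1")
  field_type=$2
  case "$field_type" in
    string | number | boolean | date) ;;
    list) echo "list fields need itemFields; write them by hand" >&2; return 1 ;;
    *) echo "unknown field type: $field_type" >&2; return 1 ;;
  esac
  if [ -n "${3:-}" ]; then
    printf '{ "name": "%s", "type": "%s", "description": "%s" }\n' \
      "$field_name" "$field_type" "$(json_escape "$3")"
  else
    printf '{ "name": "%s", "type": "%s" }\n' "$field_name" "$field_type"
  fi
}

# Usage: an extractionSchema array for two scalar fields from the example below.
# schema="[ $(scalar_field title string 'Paper title'), $(scalar_field double_blind boolean) ]"
```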

POST /api/workflows
curl https://datasnipe.app/api/workflows \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Clinical trial extractor",
    "model": "claude-sonnet-4-6",
    "extractionSchema": [
      { "name": "title", "type": "string", "description": "Paper title" },
      { "name": "sample_size", "type": "number", "description": "Participants enrolled" },
      { "name": "double_blind", "type": "boolean" },
      {
        "name": "arms",
        "type": "list",
        "description": "Each treatment arm",
        "itemFields": [
          { "name": "label", "type": "string" },
          { "name": "dose_mg", "type": "number" }
        ]
      }
    ]
  }'

Responds 201 with { "workflow": { … } }. The workflow object includes id, name, ownerType, ownerId, extractionSchema, model, contextPrompt, fewShotExamples, createdAt, and updatedAt.

Run a workflow

Runs are multipart/form-data: attach one or more parts named files, one per document. DataSnipe processes the bytes in memory and returns a group id plus one job per file.

POST /api/workflows/:id/runs
curl https://datasnipe.app/api/workflows/$WORKFLOW_ID/runs \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" \
  -F "files=@paper-1.pdf" \
  -F "files=@paper-2.pdf"

202 Accepted
{
  "workflowId": "0b1c…",
  "groupId": "9f2a…",
  "jobs": [
    { "jobId": "a1…", "fileName": "paper-1.pdf" },
    { "jobId": "b2…", "fileName": "paper-2.pdf" }
  ]
}

Hold on to groupId — it's how you poll the run and download artifacts.
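Each document becomes its own files part, so a larger batch only needs a longer argument list. The following sketch runs a saved workflow against every PDF in a directory. run_workflow_on_dir is a hypothetical helper, and it prints the raw 202 response without parsing it. curl gives ';' and ',' special meaning inside -F values, so file names containing those characters may be misparsed.

```shell
# Hypothetical helper (not part of DataSnipe): run a saved workflow against
# every *.pdf in a directory, one -F "files=@path" part per file, and print
# the raw 202 response. Needs DATASNIPE_API_KEY in the environment.
run_workflow_on_dir() {
  workflow_id=$1
  dir=$2
  set --                      # reuse the positional parameters as curl arguments
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # with no matches the glob stays literal; skip it
    set -- "$@" -F "files=@$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "no PDFs found in $dir" >&2
    return 1
  fi
  curl -s "https://datasnipe.app/api/workflows/$workflow_id/runs" \
    -H "Authorization: Bearer $DATASNIPE_API_KEY" \
    "$@"
}

# Usage:
# run_workflow_on_dir "$WORKFLOW_ID" ./papers
```

Very large batches may run into the 429 in-flight job limit described under Errors.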

One-off runs

To extract without saving a workflow, post to /api/runs with a config part containing JSON alongside one or more files parts. The config takes the same extractionSchema, model, contextPrompt, and fewShotExamples fields as a workflow. The response carries a groupId and jobs but no workflowId, since nothing is persisted.

POST /api/runs
curl https://datasnipe.app/api/runs \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" \
  -F 'config={
        "extractionSchema": [ { "name": "title", "type": "string" } ],
        "model": "claude-sonnet-4-6"
      };type=application/json' \
  -F "files=@paper-1.pdf"

Poll the returned groupId and download artifacts with the same job-group endpoints used for saved runs.
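Inline JSON inside a -F argument is easy to break with shell quoting. The following sketch keeps the config in a file instead and uses curl's -F "name=<file" form, which sends the file's contents as a text part. write_config and run_oneoff are hypothetical helpers, and the config mirrors the example above.

```shell
# Hypothetical sketch (not part of DataSnipe): keep the one-off config in its
# own file so the JSON needs no shell quoting.
write_config() {
  cat > "$1" <<'EOF'
{
  "extractionSchema": [ { "name": "title", "type": "string" } ],
  "model": "claude-sonnet-4-6"
}
EOF
}

# run_oneoff CONFIG_FILE FILE... posts the config part plus one files part
# per document to /api/runs and prints the response.
run_oneoff() {
  config_file=$1
  shift
  count=$#
  # Append one -F "files=@path" per document, then drop the original paths.
  for f in "$@"; do
    set -- "$@" -F "files=@$f"
  done
  shift "$count"
  # "<file" makes curl send the file's contents as the config text part.
  curl -s https://datasnipe.app/api/runs \
    -H "Authorization: Bearer $DATASNIPE_API_KEY" \
    -F "config=<$config_file;type=application/json" \
    "$@"
}

# Usage:
# write_config config.json && run_oneoff config.json paper-1.pdf paper-2.pdf
```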

Models

Pass any of these ids as model. The default is claude-sonnet-4-6.

Supported model ids
claude-opus-4-7            gpt-5.4               gemini-3.1-pro-preview
claude-sonnet-4-6          gpt-5.4-mini          gemini-3.5-flash
claude-haiku-4-5-20251001  gpt-4.1               gemini-3.1-flash-lite
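Model ids are easy to mistype (claude-sonnet-4.6 instead of claude-sonnet-4-6). The following sketch adds a local check. is_supported_model is hypothetical, and its list is copied from this page, so it will go stale when supported models change. The API remains the authority on which ids it accepts.

```shell
# Hypothetical check (not part of DataSnipe): catch a mistyped model id locally.
# The list is copied from this page and will go stale if supported models change.
is_supported_model() {
  case "$1" in
    claude-opus-4-7 | claude-sonnet-4-6 | claude-haiku-4-5-20251001 | \
    gpt-5.4 | gpt-5.4-mini | gpt-4.1 | \
    gemini-3.1-pro-preview | gemini-3.5-flash | gemini-3.1-flash-lite)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage:
# is_supported_model "$MODEL" || { echo "unsupported model: $MODEL" >&2; exit 1; }
```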

Artifacts API

Poll, then download the table.

A run is asynchronous. Poll the job group until every job is done or failed, then download the collated results.

GET  /api/job-groups/:groupId               Poll status & per-job results  workflows:read
GET  /api/job-groups/:groupId/artifact.csv  Download collated CSV          workflows:read
GET  /api/job-groups/:groupId/artifact.tsv  Download collated TSV          workflows:read

Poll a run

Each job moves through queued → ready → summarizing → extracting → done, or ends as failed with an error. The response carries per-job extractions, token usage, and cost.

GET /api/job-groups/:groupId
curl https://datasnipe.app/api/job-groups/$GROUP_ID \
  -H "Authorization: Bearer $DATASNIPE_API_KEY"
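The job states above map onto a polling loop that keeps waiting while any job is queued, ready, summarizing, or extracting. The following sketch assumes jq is installed. It also assumes the poll response lists jobs under .jobs[] with a status field, which is the shape the quickstart's jq filter uses. group_statuses and wait_for_group are hypothetical names.

```shell
# Hypothetical polling helpers (not part of DataSnipe). group_statuses assumes
# the poll response lists jobs under .jobs[] with a status field, as the
# quickstart's jq filter does, and needs jq installed.
group_statuses() {
  curl -s "https://datasnipe.app/api/job-groups/$1" \
    -H "Authorization: Bearer $DATASNIPE_API_KEY" |
    jq -r '.jobs[].status'
}

# wait_for_group GROUP_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 once every job is done or failed, 1 on error or timeout.
wait_for_group() {
  group_id=$1
  interval=${2:-5}
  attempts=${3:-120}
  while [ "$attempts" -gt 0 ]; do
    statuses=$(group_statuses "$group_id") || return 1
    # Count lines that are not a terminal state; an empty response counts as pending.
    pending=$(printf '%s\n' "$statuses" | grep -cvxE 'done|failed' || true)
    if [ "$pending" -eq 0 ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  echo "timed out waiting for job group $group_id" >&2
  return 1
}
```

Anything other than done or failed counts as pending, so an unexpected intermediate state cannot end the wait early. Keeping the fetch in group_statuses also makes the loop testable: redefine group_statuses to return fixed statuses and no request is sent.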

Download artifacts

Both artifact.csv and artifact.tsv accept the same query parameters.

Param             Values                  Default   Description
collateBy         document · page · none  document  One row per file, per page, or per individual extraction occurrence.
confidenceCutoff  0–1                     0.3       Drops collated cells whose confidence is below the cutoff. Ignored when collateBy=none.
format            standard · normalized   standard  normalized explodes a single list field into one row per item. Requires exactly one list field in the schema; otherwise the request returns 422.
GET /api/job-groups/:groupId/artifact.csv
curl "https://datasnipe.app/api/job-groups/$GROUP_ID/artifact.csv?collateBy=document&confidenceCutoff=0.5" \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" \
  -o results.csv
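Query strings are easy to get subtly wrong, for example by sending a misspelled collateBy value or a cutoff outside 0–1. The following sketch builds the artifact URL and checks each parameter against the table above. artifact_url is a hypothetical helper, and its defaults restate the documented ones.

```shell
# Hypothetical helper (not part of DataSnipe): build an artifact URL, rejecting
# values the parameter table does not list.
# artifact_url GROUP_ID [csv|tsv] [collateBy] [confidenceCutoff] [format]
artifact_url() {
  group_id=$1
  ext=${2:-csv}
  collate=${3:-document}
  cutoff=${4:-0.3}
  format=${5:-standard}

  case "$ext" in
    csv | tsv) ;;
    *) echo "extension must be csv or tsv" >&2; return 1 ;;
  esac
  case "$collate" in
    document | page | none) ;;
    *) echo "collateBy must be document, page, or none" >&2; return 1 ;;
  esac
  case "$format" in
    standard | normalized) ;;
    *) echo "format must be standard or normalized" >&2; return 1 ;;
  esac
  # Accept 0, 1, 1.0, 1.00, or a decimal such as 0.5 or .5.
  if ! awk -v c="$cutoff" 'BEGIN { exit !(c ~ /^([01]|0?\.[0-9]+|1\.0+)$/) }'; then
    echo "confidenceCutoff must be between 0 and 1" >&2
    return 1
  fi

  printf 'https://datasnipe.app/api/job-groups/%s/artifact.%s?collateBy=%s&confidenceCutoff=%s&format=%s\n' \
    "$group_id" "$ext" "$collate" "$cutoff" "$format"
}

# Usage:
# url=$(artifact_url "$GROUP_ID" csv page 0.5) || exit 1
# curl -s "$url" -H "Authorization: Bearer $DATASNIPE_API_KEY" -o results.csv
```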

Quickstart

Zero to a CSV in four calls.

Create a key in the dashboard, then run the full loop from your shell.

End-to-end
# 0. Create a key at https://datasnipe.app/api-keys with the
#    workflows:read, workflows:write, and workflows:run scopes.
export DATASNIPE_API_KEY="dsk_…"

# 1. Create a workflow.
WORKFLOW_ID=$(curl -s https://datasnipe.app/api/workflows \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Quickstart", "extractionSchema": [
        { "name": "title", "type": "string" } ] }' \
  | jq -r '.workflow.id')

# 2. Run it against your documents.
GROUP_ID=$(curl -s https://datasnipe.app/api/workflows/$WORKFLOW_ID/runs \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" \
  -F "files=@paper-1.pdf" -F "files=@paper-2.pdf" \
  | jq -r '.groupId')

# 3. Poll every 5 seconds until every job is done or failed.
until curl -s https://datasnipe.app/api/job-groups/$GROUP_ID \
    -H "Authorization: Bearer $DATASNIPE_API_KEY" \
    | jq -e 'all(.jobs[]; .status == "done" or .status == "failed")' > /dev/null; do
  sleep 5
done

# 4. Download the collated table.
curl -s "https://datasnipe.app/api/job-groups/$GROUP_ID/artifact.csv" \
  -H "Authorization: Bearer $DATASNIPE_API_KEY" -o results.csv

Errors

Predictable status codes.

Errors return a JSON body with an error field and, where useful, extra context.

Status  Meaning                                                        Body
400     Invalid request body or query parameters.                      { error: [ … ] }
401     Missing or invalid API key.                                    { error: "Unauthorized" }
402     Not enough credits to start the run.                           { error: "insufficient_credits", available, required }
403     The key lacks the required scope.                              { error: "insufficient_scope", required }
404     Workflow or job group not found (or not visible to the key).   { error: "…" }
409     A workflow with that name already exists for the owner.        { error: "…" }
422     Normalized export needs exactly one list field in the schema.  { error: "…" }
429     Too many in-flight jobs for the account. Respect Retry-After.  { error, limit, outstanding, retryAfterSeconds }
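A 429 means the run was not started, so waiting and retrying the same request is safe. The following sketch wraps curl, reads Retry-After from a header dump, and retries up to five times. retry_after_seconds and request_with_retry are hypothetical helpers. The 5-second fallback is an arbitrary choice for a missing header or one given as an HTTP date rather than seconds.

```shell
# Hypothetical sketch (not part of DataSnipe): on 429, wait for Retry-After and
# try again, up to 5 retries. Extra arguments go to curl, so the same wrapper
# works for -F "files=@..." run requests.

# Print the last Retry-After value from a curl -D header dump, in seconds.
# Falls back to 5 when the header is missing or is an HTTP date.
retry_after_seconds() {
  value=$(tr -d '\r' < "$1" |
    awk -F': *' 'tolower($1) == "retry-after" { v = $2 } END { print v }')
  case "$value" in
    '' | *[!0-9]*) echo 5 ;;
    *) echo "$value" ;;
  esac
}

# request_with_retry URL [CURL_ARGS...]: prints the response body; returns 0
# on a 2xx status and 1 otherwise (the status is reported on stderr).
request_with_retry() {
  url=$1
  shift
  headers=$(mktemp)
  body=$(mktemp)
  retries=0
  while :; do
    status=$(curl -s -D "$headers" -o "$body" -w '%{http_code}' "$url" \
      -H "Authorization: Bearer $DATASNIPE_API_KEY" "$@")
    if [ "$status" = 429 ] && [ "$retries" -lt 5 ]; then
      retries=$((retries + 1))
      sleep "$(retry_after_seconds "$headers")"
      continue
    fi
    break
  done
  cat "$body"
  rm -f "$headers" "$body"
  case "$status" in
    2??) return 0 ;;
    *) echo "HTTP $status" >&2; return 1 ;;
  esac
}

# Usage:
# request_with_retry "https://datasnipe.app/api/workflows/$WORKFLOW_ID/runs" \
#   -F "files=@paper-1.pdf"
```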