Document extraction, from your own code.
Define a workflow once, then run it against any batch of documents and pull back structured, citation-backed results — all over a small HTTP API authenticated with API keys.
Three building blocks, one base URL.
Everything lives under https://datasnipe.app/api.
Authenticate with a Bearer API key and the workflow and artifact endpoints
are yours to automate.
Workflow API
Create, read, update, and delete reusable extraction workflows — a field schema, a model, and an optional context prompt — then run a workflow against one or many uploaded files.
Artifacts API
Poll a run to completion and download the collated results as CSV or TSV, with control over how rows are grouped and a confidence cutoff for low-quality cells.
API keys
Scoped, revocable keys you create in the dashboard. Grant read, write, or run access, attach a key to yourself or your organization, and rotate it whenever you need.
Bearer keys, scoped on purpose.
Create an API key from the API keys page in your dashboard. The secret is shown once at creation and stored only as a hash — copy it then, because it can never be retrieved again.
Send the key on every request
Pass the key in the Authorization header.
Keys are prefixed with dsk_.
Authorization: Bearer dsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
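For callers working from Python rather than curl, the header above could be attached as in this minimal sketch. The helper name authed_request is hypothetical, and the dsk_ prefix check is only a local sanity check, not something the API is known to require of clients.

```python
import urllib.request

BASE_URL = "https://datasnipe.app/api"


def authed_request(path, api_key, method="GET", data=None, headers=None):
    """Build (but do not send) a request that carries the Bearer API key.

    `authed_request` is a hypothetical helper, not part of any DataSnipe SDK.
    """
    # Local sanity check only: the docs say keys are prefixed with dsk_.
    if not api_key.startswith("dsk_"):
        raise ValueError("expected an API key starting with 'dsk_'")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req
```

Passing the returned object to urllib.request.urlopen sends the request; keeping construction separate from sending makes the header logic easy to check without network access.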
Scopes
Each key carries an explicit set of scopes. A request to an endpoint
whose scope the key lacks returns 403 with
insufficient_scope.
workflows:read
List and fetch workflows, poll job groups, and download CSV/TSV artifacts.
workflows:write
Create, update, and delete workflows.
workflows:run
Run a workflow against uploaded files.
Ownership
A key belongs to either you or your organization. A personal key sees your personal workflows; an organization key sees the organization's. Workflows and their results are always scoped to the key's owner.
Define once, run anywhere.
A workflow bundles an extraction schema, a model, and an optional context prompt under a name. Manage workflows with these endpoints.
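| Operation | Scope |
|---|---|
| List workflows | workflows:read |
| Create a workflow | workflows:write |
| Fetch a workflow | workflows:read |
| Update a workflow | workflows:write |
| Delete a workflow | workflows:write |
| Run a workflow | workflows:run |
| One-off run | workflows:run |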
Create a workflow
Send a JSON body with a name and an inline
extraction schema. Alternatively, pass a sourceGroupId
to clone the schema, model, and prompt from a previous run instead of
an inline schema.
Body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | required | 1–120 characters. Must be unique within the owner. |
| extractionSchema | Field[] | required* | At least one field. *Required unless sourceGroupId is given. |
| model | string | optional | Defaults to claude-sonnet-4-6. See models below. |
| contextPrompt | string | optional | Extra context passed to the extraction phase. |
| fewShotExamples | Example[] | optional | Each { input, output } pair guides the model. |
| ownerType | string | optional | user or organization. Must match the key's owner. |
| sourceGroupId | string | optional | Clone metadata from a prior job group instead of an inline schema. |
Extraction field
Fields are explicit: a name, a type of string, number, boolean, date, or list, and an optional description. A list field also needs itemFields, the scalar columns of each item.
curl https://datasnipe.app/api/workflows \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Clinical trial extractor",
"model": "claude-sonnet-4-6",
"extractionSchema": [
{ "name": "title", "type": "string", "description": "Paper title" },
{ "name": "sample_size", "type": "number", "description": "Participants enrolled" },
{ "name": "double_blind", "type": "boolean" },
{
"name": "arms",
"type": "list",
"description": "Each treatment arm",
"itemFields": [
{ "name": "label", "type": "string" },
{ "name": "dose_mg", "type": "number" }
]
}
]
}'
Responds 201 with { "workflow": { … } }. The workflow object includes id, name, ownerType, ownerId, extractionSchema, model, contextPrompt, fewShotExamples, createdAt, and updatedAt.
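The body and field rules described above can be checked client-side before a request is sent, which avoids a round trip that would end in a 400. The sketch below is an assumption-laden illustration: validate_field and validate_workflow_body are hypothetical names, and the server may enforce more rules than the ones documented here.

```python
SCALAR_TYPES = {"string", "number", "boolean", "date"}


def validate_field(field, allow_list=True):
    """Return a list of problems with one extraction field (empty if none)."""
    label = field.get("name") or "<unnamed>"
    errors = []
    if not isinstance(field.get("name"), str) or not field["name"]:
        errors.append("field is missing a name")
    ftype = field.get("type")
    if ftype in SCALAR_TYPES:
        return errors
    if ftype == "list" and allow_list:
        # A list field needs itemFields, and each item column must be scalar.
        items = field.get("itemFields") or []
        if not items:
            errors.append(f"{label}: a list field needs itemFields")
        for item in items:
            errors.extend(validate_field(item, allow_list=False))
        return errors
    errors.append(f"{label}: unsupported type {ftype!r}")
    return errors


def validate_workflow_body(body):
    """Check the documented constraints on a create-workflow body."""
    errors = []
    name = body.get("name", "")
    if not isinstance(name, str) or not 1 <= len(name) <= 120:
        errors.append("name must be 1-120 characters")
    schema = body.get("extractionSchema")
    if schema:
        for field in schema:
            errors.extend(validate_field(field))
    elif not body.get("sourceGroupId"):
        errors.append("extractionSchema is required unless sourceGroupId is given")
    return errors
```

An empty list means the body passed every local check; uniqueness of the name within the owner can only be confirmed by the server.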
Run a workflow
Runs are multipart/form-data: attach one or
more files parts. DataSnipe processes the
bytes in memory and returns a group id plus a job per file.
curl https://datasnipe.app/api/workflows/$WORKFLOW_ID/runs \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
-F "files=@paper-1.pdf" \
-F "files=@paper-2.pdf"
{
"workflowId": "0b1c…",
"groupId": "9f2a…",
"jobs": [
{ "jobId": "a1…", "fileName": "paper-1.pdf" },
{ "jobId": "b2…", "fileName": "paper-2.pdf" }
]
}
Hold on to groupId — it's how you poll the
run and download artifacts.
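For callers without curl, the same multipart upload could be assembled with the Python standard library, roughly as sketched below. The function names are hypothetical, file names containing double quotes are not handled, and whether the server inspects each part's Content-Type is an assumption left untested here.

```python
import json
import mimetypes
import urllib.request
import uuid


def encode_files(files):
    """Encode (file_name, content_bytes) pairs as repeated `files` parts.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for file_name, content in files:
        part_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{file_name}"\r\n'
            f"Content-Type: {part_type}\r\n\r\n"
        )
        chunks.append(header.encode("utf-8"))
        chunks.append(content)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def run_workflow(workflow_id, api_key, files):
    """POST the files to the workflow's runs endpoint and return the parsed JSON."""
    body, content_type = encode_files(files)
    req = urllib.request.Request(
        f"https://datasnipe.app/api/workflows/{workflow_id}/runs",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The groupId in the returned dictionary is the value to keep for polling and artifact downloads.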
One-off runs
To extract without saving a workflow, post to /api/runs with a JSON config part alongside the files parts. The config takes the same extractionSchema, model, contextPrompt, and fewShotExamples fields as a workflow. The response is a groupId and jobs, but no workflowId, since nothing is persisted.
curl https://datasnipe.app/api/runs \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
-F 'config={
"extractionSchema": [ { "name": "title", "type": "string" } ],
"model": "claude-sonnet-4-6"
};type=application/json' \
-F "files=@paper-1.pdf"
Poll the returned groupId and download
artifacts with the same job-group endpoints used for saved runs.
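Because the config part shares its fields with a saved workflow, one way to build it is to start from a workflow-shaped dictionary and keep only the four documented keys. This is a sketch under assumptions: one_off_config is a hypothetical helper, and treating extractionSchema as mandatory for one-off runs is inferred from the workflow rules rather than stated for this endpoint.

```python
import json

# The four fields the config part is documented to accept.
CONFIG_KEYS = ("extractionSchema", "model", "contextPrompt", "fewShotExamples")


def one_off_config(workflow):
    """Serialize the run-relevant fields of a workflow-shaped dict as JSON.

    Keys such as id, name, or ownerType are dropped, as are unset values.
    """
    config = {key: workflow[key] for key in CONFIG_KEYS if workflow.get(key) is not None}
    if not config.get("extractionSchema"):
        # Assumption: a one-off run needs an inline schema with at least one field.
        raise ValueError("extractionSchema with at least one field is required")
    return json.dumps(config)
```

The returned string is what would be sent as the config part, with type application/json, next to the files parts.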
Models
Pass any of these ids as model. The default
is claude-sonnet-4-6.
claude-opus-4-7 gpt-5.4 gemini-3.1-pro-preview
claude-sonnet-4-6 gpt-5.4-mini gemini-3.5-flash
claude-haiku-4-5-20251001 gpt-4.1 gemini-3.1-flash-lite
Poll, then download the table.
A run is asynchronous. Poll the job group until every job is
done or failed,
then download the collated results.
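| Operation | Scope |
|---|---|
| Poll a job group | workflows:read |
| Download artifact.csv | workflows:read |
| Download artifact.tsv | workflows:read |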
Poll a run
Each job moves through queued → ready → summarizing → extracting → done, or ends as failed with an error. The response carries per-job extractions, token usage, and cost.
curl https://datasnipe.app/api/job-groups/$GROUP_ID \
-H "Authorization: Bearer $DATASNIPE_API_KEY"
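A single request only reports the current state, so a client usually wraps it in a loop. The sketch below polls until every job is done or failed. It assumes the response has a jobs array whose entries carry a status field; the function names, the five-second interval, and the ten-minute timeout are illustrative choices, not documented values.

```python
import json
import time
import urllib.request

TERMINAL_STATUSES = {"done", "failed"}


def fetch_group(group_id, api_key):
    """GET the job group and return the parsed JSON body."""
    req = urllib.request.Request(
        f"https://datasnipe.app/api/job-groups/{group_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_group(fetch, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch() repeatedly until every job reaches a terminal status.

    `fetch` is any zero-argument callable returning the job-group dict,
    which keeps the loop testable without network access.
    """
    deadline = time.monotonic() + timeout
    while True:
        group = fetch()
        if all(job["status"] in TERMINAL_STATUSES for job in group["jobs"]):
            return group
        if time.monotonic() >= deadline:
            raise TimeoutError("job group did not finish before the timeout")
        sleep(interval)
```

A typical call would be wait_for_group(lambda: fetch_group(group_id, api_key)). Jobs that end as failed still count as finished, so the caller should inspect each job's status and error afterwards.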
Download artifacts
Both artifact.csv and
artifact.tsv accept the same query
parameters.
| Param | Values | Default | Description |
|---|---|---|---|
| collateBy | document · page · none | document | One row per file, per page, or per individual extraction occurrence. |
| confidenceCutoff | 0–1 | 0.3 | Drops collated cells below the cutoff. Ignored when collateBy=none. |
| format | standard · normalized | standard | normalized explodes a single list field into one row per item. Requires the schema to have exactly one list field, otherwise 422. |
curl "https://datasnipe.app/api/job-groups/$GROUP_ID/artifact.csv?collateBy=document&confidenceCutoff=0.5" \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
-o results.csv
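The query parameters in the table can be validated locally before the download URL is built. The following is a small sketch with a hypothetical artifact_url helper; it mirrors the documented values and defaults and does not check the server-side rule that normalized needs exactly one list field.

```python
from urllib.parse import urlencode

COLLATE_BY = ("document", "page", "none")
FORMATS = ("standard", "normalized")


def artifact_url(group_id, ext="csv", collate_by="document",
                 confidence_cutoff=0.3, fmt="standard"):
    """Build the artifact download URL, rejecting undocumented parameter values."""
    if ext not in ("csv", "tsv"):
        raise ValueError("ext must be 'csv' or 'tsv'")
    if collate_by not in COLLATE_BY:
        raise ValueError(f"collateBy must be one of {COLLATE_BY}")
    if not 0 <= confidence_cutoff <= 1:
        raise ValueError("confidenceCutoff must be between 0 and 1")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    query = urlencode({
        "collateBy": collate_by,
        "confidenceCutoff": confidence_cutoff,
        "format": fmt,
    })
    return f"https://datasnipe.app/api/job-groups/{group_id}/artifact.{ext}?{query}"
```

The returned URL still needs the Authorization header when it is fetched.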
Zero to a CSV in four calls.
Create a key in the dashboard, then run the full loop from your shell.
# 0. Create a key at https://datasnipe.app/api-keys with the
# workflows:read, workflows:write, and workflows:run scopes.
export DATASNIPE_API_KEY="dsk_…"
# 1. Create a workflow.
WORKFLOW_ID=$(curl -s https://datasnipe.app/api/workflows \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "Quickstart", "extractionSchema": [
{ "name": "title", "type": "string" } ] }' \
| jq -r '.workflow.id')
# 2. Run it against your documents.
GROUP_ID=$(curl -s https://datasnipe.app/api/workflows/$WORKFLOW_ID/runs \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
-F "files=@paper-1.pdf" -F "files=@paper-2.pdf" \
| jq -r '.groupId')
# 3. Check job status; repeat this call until every job is done or failed.
curl -s https://datasnipe.app/api/job-groups/$GROUP_ID \
-H "Authorization: Bearer $DATASNIPE_API_KEY" \
| jq '.jobs[] | { fileName, status }'
# 4. Download the collated table.
curl -s "https://datasnipe.app/api/job-groups/$GROUP_ID/artifact.csv" \
-H "Authorization: Bearer $DATASNIPE_API_KEY" -o results.csv
Predictable status codes.
Errors return a JSON body with an error
field and, where useful, extra context.
| Status | Meaning | Body |
|---|---|---|
| 400 | Invalid request body or query parameters. | { error: [ … ] } |
| 401 | Missing or invalid API key. | { error: "Unauthorized" } |
| 402 | Not enough credits to start the run. | { error: "insufficient_credits", available, required } |
| 403 | The key lacks the required scope. | { error: "insufficient_scope", required } |
| 404 | Workflow or job group not found (or not visible to the key). | { error: "…" } |
| 409 | A workflow with that name already exists for the owner. | { error: "…" } |
| 422 | Normalized export needs exactly one list field in the schema. | { error: "…" } |
| 429 | Too many in-flight jobs for the account. Respect Retry-After. | { error, limit, outstanding, retryAfterSeconds } |
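Client code can branch on these status codes and body fields directly. The sketch below shows one possible way to decide whether to retry and to produce a readable message. Both function names are hypothetical, the five-second fallback is an arbitrary choice, and preferring the Retry-After header over retryAfterSeconds is an assumption rather than documented behavior.

```python
def retry_delay(status, headers, body, default=5.0):
    """Return seconds to wait before retrying, or None if the error is not retryable.

    Only 429 is treated as retryable here. `headers` is any mapping with a
    .get method, such as the headers of a urllib.error.HTTPError.
    """
    if status != 429:
        return None
    try:
        # Retry-After may also be an HTTP date; that form falls through below.
        return float(headers.get("Retry-After"))
    except (TypeError, ValueError):
        pass
    seconds = body.get("retryAfterSeconds") if isinstance(body, dict) else None
    if isinstance(seconds, (int, float)):
        return float(seconds)
    return default


def explain(status, body):
    """Turn a documented error response into a short human-readable message."""
    body = body if isinstance(body, dict) else {}
    if status == 402:
        return (f"insufficient credits: {body.get('available')} available, "
                f"{body.get('required')} required")
    if status == 403:
        return f"key lacks scope {body.get('required')}"
    if status == 429:
        return f"too many in-flight jobs: {body.get('outstanding')} of {body.get('limit')}"
    return f"HTTP {status}: {body.get('error')}"
```

With urllib, the status, headers, and body come from the caught HTTPError: its code attribute, its headers attribute, and the JSON parsed from its read() result.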