Image understanding · OCR · Docs

Vision (VLM / OCR)

TBC

Multimodal chat that accepts images, plus document OCR and table extraction.

Overview

Planned uses: screenshot analysis, document/PDF OCR, table and figure extraction, layout analysis, receipt and form auto-entry, image Q&A, and screen understanding for UI automation.

Endpoint: /v1/chat/completions
Example model: vlm-7b

API

API example

curl

curl https://api.openalchemy.io/v1/chat/completions \
  -H "Authorization: Bearer $OPENALCHEMY_API_KEY" \
  -H "X-Project-Id: $YOUR_PROJECT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vlm-7b",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/receipt.jpg"}}
      ]}
    ]
  }'
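
The same request can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: the helper name build_vision_request is hypothetical, and the environment variable names are simply carried over from the curl example above. The response schema is not documented on this page, so the sketch stops at building the request.

```python
import json
import os
import urllib.request

# Endpoint and model name are taken from the curl example above.
API_URL = "https://api.openalchemy.io/v1/chat/completions"


def build_vision_request(prompt, image_url, model="vlm-7b",
                         api_key=None, project_id=None):
    """Build (but do not send) a chat request carrying one text part and one image.

    Hypothetical helper: it mirrors the curl example's headers and JSON body.
    """
    # Fall back to the same environment variables the curl example reads.
    api_key = api_key or os.environ.get("OPENALCHEMY_API_KEY", "")
    project_id = project_id or os.environ.get("YOUR_PROJECT_ID", "")
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Project-Id": project_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen(req, timeout=30) sends the request; decoding the response body with json.load is a reasonable first step for inspecting whatever the service returns.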

Status

Quota and pricing (per project)

Quotas and rate limits are enforced per project: usage counts against the project that issued the API key. A second, domain-scoped policy layer restricts which origins may invoke each modality.
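
Because limits are applied per project, every client sharing a project's key draws on the same budget, so a client may want to back off when it is throttled. The sketch below computes a retry delay. It assumes, without confirmation from this page, that throttled requests are signalled with HTTP 429 and possibly a Retry-After value in seconds; the function name backoff_delay and its defaults are hypothetical.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Return the number of seconds to wait before retry `attempt` (0-based).

    Assumption: the server may supply a Retry-After value in seconds.
    If it does, honor it (bounded by `cap`); otherwise use exponential
    backoff with full jitter.
    """
    if retry_after is not None:
        return min(float(retry_after), cap)
    # Full jitter: pick uniformly between 0 and the capped exponential bound.
    upper = min(cap, base * (2 ** attempt))
    return random.uniform(0, upper)
```

A caller would sleep for backoff_delay(attempt, retry_after) after each throttled response and give up after a small, fixed number of attempts.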

START TODAY

Ready to turn inference cost into something closer to alchemy?

The free tier lets you spin up one project and run your first 1,000 requests with no credit card.
