Image understanding · OCR · Docs
Vision (VLM / OCR)
TBC
Multimodal chat that accepts images, plus document OCR and table extraction.
Overview
Planned uses: screenshot analysis, document/PDF OCR, table and figure extraction, layout analysis, receipt and form auto-entry, image Q&A, and screen understanding for UI automation.
- Endpoint: /v1/chat/completions
- Example model: vlm-7b
API
API example
curl
curl https://api.openalchemy.io/v1/chat/completions \
  -H "Authorization: Bearer $OPENALCHEMY_API_KEY" \
  -H "X-Project-Id: $YOUR_PROJECT_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vlm-7b",
    "messages": [
      {"role": "user", "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/receipt.jpg"}}
      ]}
    ]
  }'
Status
Quota and pricing (per project)
Allotments and rate limits apply to the project that issued the API key. A second domain-scoped policy layer constrains which origins may invoke each modality.
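Because limits are counted per project, a client will typically want to send the project headers on every call and slow down when a limit is hit. The following is a minimal Python sketch of that pattern, not an official client. It rebuilds the request from the curl example using only the standard library and retries with exponential backoff. It assumes the API signals a rate limit with HTTP 429, which this page does not confirm. The function names and the retry parameters are hypothetical, and YOUR_PROJECT_ID is the same placeholder variable used in the curl example.

```python
import json
import os
import time
import urllib.error
import urllib.request

API_URL = "https://api.openalchemy.io/v1/chat/completions"


def build_request(prompt: str, image_url: str, model: str = "vlm-7b") -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent here)."""
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENALCHEMY_API_KEY', '')}",
        # Placeholder variable name, taken from the curl example.
        "X-Project-Id": os.environ.get("YOUR_PROJECT_ID", ""),
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


def backoff_delays(max_retries: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def send_with_retry(request, max_retries=4, opener=urllib.request.urlopen, sleep=time.sleep) -> dict:
    """Send the request, retrying only on HTTP 429.

    ASSUMPTION: the service reports rate limiting with status 429.
    Any other HTTP error is re-raised immediately.
    """
    for delay in backoff_delays(max_retries):
        try:
            with opener(request) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            sleep(delay)
    # Final attempt after the last wait; errors propagate to the caller.
    with opener(request) as response:
        return json.load(response)
```

A call would look like send_with_retry(build_request("What is in this image?", "https://example.com/receipt.jpg")). The opener and sleep parameters exist only so the retry logic can be exercised without network access.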
The free tier lets you spin up one project and run your first 1,000 requests with no credit card.