Showing posts with label JSON. Show all posts

Wednesday, June 24, 2026

Mistral OCR + Sparrow: Document to JSON

I integrated Mistral OCR into Sparrow, an open-source document extraction platform, as a new cloud inference backend. This gives Sparrow a full cloud option alongside its existing local backends (MLX, vLLM), so users without GPU infrastructure can still run enterprise-grade document extraction.

Pipeline: Mistral OCR converts the document to structured HTML, then Mistral Small extracts and transforms the data into JSON based on a defined schema with field-level hints.

In this video, I extract a bond portfolio table using hint-driven rules:

  • Instrument name normalization (extracting issuer brand from full fund names)
  • European number formatting (period as thousands separator, comma as decimal)
  • Percentage formatting with sign preservation
  • Derived risk classification computed from profit/loss percentage

Same Sparrow API, same schema and hint format as the local backends: just switch the backend flag to run on Mistral Cloud instead of MLX or vLLM.
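
In the video these rules are given to the model as field-level hints, so the model applies them itself. To make the rules concrete, here is a minimal plain-Python sketch of what three of them amount to. It is not Sparrow code: the function names are made up for illustration, and the risk thresholds are assumptions, since the post does not state the actual cut-offs.

```python
from decimal import Decimal


def parse_european_number(text: str) -> Decimal:
    """Parse '1.234,56' (period = thousands, comma = decimal) into a Decimal."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def format_european_number(value: Decimal) -> str:
    """Format with two decimals, '.' as thousands separator and ',' as decimal."""
    us_style = f"{value:,.2f}"  # e.g. '1,234.56'
    # Swap the two separators in a single pass.
    return us_style.translate(str.maketrans(",.", ".,"))


def format_signed_percent(value: Decimal) -> str:
    """Format a percentage with an explicit sign, e.g. '+2,10%' or '-3,50%'."""
    return f"{value:+.2f}".replace(".", ",") + "%"


def classify_risk(profit_loss_pct: Decimal) -> str:
    """Derive a risk label from profit/loss percentage.

    The thresholds below are hypothetical, chosen only for this example.
    """
    if profit_loss_pct <= Decimal("-5"):
        return "high"
    if profit_loss_pct < Decimal("0"):
        return "medium"
    return "low"
```

A sketch like this can also serve as a post-check: parse the values the model returned and confirm they follow the format the hints asked for.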

Sparrow is open source and local-first by design — documents never leave your infrastructure unless you choose the cloud backend.

⭐ GitHub: github.com/katanaml/sparrow
🌐 Live demo: sparrow.katanaml.io 

 

Monday, March 16, 2026

Qwen 3.5 Test for JSON Structured Data Extraction

A quick test of the new Qwen 3.5 models on JSON structured data extraction from images. I test and compare results for 9B FP16, 27B Q8, and A3B 35B Q8. The 35B Q8 model wins on both speed and accuracy. The test was run on MLX-VLM using a Mac Mini M4 Pro with 64GB RAM.

 

Thursday, March 12, 2026

Fast Large Table Extraction: Sparrow + dots.ocr to JSON

Sparrow provides a table processing mode optimized for large tables. It comes with a separate template script (new templates can be easily added) that processes dots.ocr Markdown output into structured JSON with field mapping.
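
To illustrate the idea of turning a Markdown table into JSON with field mapping, here is a minimal sketch. It is not Sparrow's template script: the function name, the sample table, and the mapping are all assumptions made up for this example, and it only handles simple pipe-delimited tables.

```python
import json


def markdown_table_to_records(markdown: str, field_map: dict[str, str]) -> list[dict[str, str]]:
    """Convert a pipe-delimited Markdown table into a list of dicts.

    Column headers are renamed through field_map; unmapped headers are kept as-is.
    """
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    keys = [field_map.get(name, name) for name in header]
    return [dict(zip(keys, row)) for row in body]


# Hypothetical sample input and mapping, for illustration only.
SAMPLE = """
| Instrument | Market Value |
|---|---|
| Bond A | 1.234,56 |
| Bond B | 789,00 |
"""
FIELD_MAP = {"Instrument": "instrument_name", "Market Value": "market_value"}

records = markdown_table_to_records(SAMPLE, FIELD_MAP)
print(json.dumps(records, indent=2))
```

Keeping the mapping in a separate dictionary is what makes it cheap to add a new template: only the mapping changes, not the parsing.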

 

Monday, February 16, 2026

GLM-OCR vs DeepSeek OCR 2: Which One Wins at Markdown Extraction?

I compare two OCR models using real test cases: GLM-OCR and DeepSeek OCR 2. Both are evaluated on their ability to extract document content and convert it into well-structured Markdown. I demonstrate which model performs better and which one is faster.

 

Monday, February 9, 2026

Get Vision LLMs to Follow Your Rules: Prompt-Guided JSON Formatting

A JSON query helps to fetch structured output from a Vision LLM and extract document data. I describe how to improve that output with additional rules provided through the LLM prompt. In this video I share an example of number formatting: based on the applied rule, the LLM outputs values in the requested format.
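
As a rough sketch of the idea, the snippet below combines a JSON structure with a list of formatting rules into one prompt string. This is a generic illustration, not Sparrow's actual query format: the function name, the prompt wording, the field names, and the example rule are all assumptions.

```python
import json


def build_extraction_prompt(schema: dict, rules: list[str]) -> str:
    """Build a prompt that asks for JSON output and appends numbered rules."""
    lines = [
        "Extract data from the document image and return JSON only.",
        "The output must match this JSON structure:",
        json.dumps(schema, indent=2),
    ]
    if rules:
        lines.append("Apply these formatting rules:")
        lines.extend(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return "\n".join(lines)


# Hypothetical schema and rule, for illustration only.
prompt = build_extraction_prompt(
    {"instrument_name": "str", "market_value": "str"},
    ["Format market_value with a period as thousands separator and a comma as decimal."],
)
print(prompt)
```

Numbering the rules keeps them easy to reference when checking which rule the model did or did not follow.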

 

Tuesday, November 11, 2025

Comparing Qwen3-VL AI Models for OCR Task

I'm comparing the Qwen3-VL 8B BF16 and Qwen3-VL 30B Q8 models for OCR and structured data extraction tasks. Based on my findings, the quantized 30B model runs faster and with better accuracy than the 8B BF16 model, despite using more memory. 

 

Tuesday, October 21, 2025

Qwen3-VL New Models Comparison and Performance on Mac Mini M4

I run and compare the newest Qwen3-VL models in Sparrow. Qwen3-VL models run fast and provide good accuracy.

 

Friday, October 10, 2025

Ollama Support in Sparrow and Update to Latest MLX

I explain what's new in Sparrow and what was updated in the recent version.

 

Monday, March 31, 2025

Extract Structured Data from Documents with Sparrow (Free Tier Available)

I built Sparrow for document data extraction 🚀 It's fully open source and runs locally on your machine. You can extract structured data from any document using the powerful Mistral 24B 8-bit and Qwen 2.5 72B 4-bit models. It's free to try with no registration (3 calls per 6 hours, max 3-page documents) and doesn't send your documents to third parties.

 

Wednesday, March 12, 2025

Building AI Agent for Local Structured JSON Output

I explain the key steps of building an AI agent that processes a document and extracts structured JSON data locally. I'm running it with Sparrow and using a Qwen VL model as the vision processing backend and for OCR. The steps are explained with a Sparrow code walkthrough.

 

Monday, February 10, 2025

Structured Data Extraction with Sparrow Agent: Vision LLM & Prefect in Action

Discover how to streamline your data extraction process with Sparrow Agent! In this tutorial, I showcase how Sparrow Agent leverages Vision LLM to intelligently handle complex data tasks, while Prefect ensures every step is logged and monitored for maximum transparency and efficiency. Join me as I break down the process and share tips for optimizing your automated workflows. 

 

Tuesday, January 28, 2025

Improving Qwen-VL Structured Output with Image Cropping

I explain how I'm improving structured output results from Qwen-VL with image cropping in Sparrow.
 
 

Monday, December 23, 2024

Stateless MLX Inference with FastAPI in Sparrow

I show how to run inference with MLX in stateless mode, where the loaded model is released after inference completes. This is useful when inference requests are infrequent, and it helps to reclaim resources reserved by MLX.
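
The load, infer, release pattern can be sketched with a small context manager. To avoid guessing at MLX or FastAPI calls, this example uses a stand-in model class and only the standard library; `DummyModel`, `loaded_model`, and `run_stateless_inference` are hypothetical names, and a real backend may need extra framework-specific cleanup beyond dropping references.

```python
import gc
from contextlib import contextmanager
from typing import Callable, Iterator


class DummyModel:
    """Stand-in for a loaded model; a real backend would hold weights here."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


@contextmanager
def loaded_model(loader: Callable[[], DummyModel]) -> Iterator[DummyModel]:
    """Load a model for one request and drop it when the block exits."""
    model = loader()
    try:
        yield model
    finally:
        # Drop this function's reference and ask the garbage collector to run.
        # Memory is only reclaimed once the caller's reference is gone too.
        del model
        gc.collect()


def run_stateless_inference(prompt: str, loader: Callable[[], DummyModel] = DummyModel) -> str:
    """Run one inference call; no model reference survives after returning."""
    with loaded_model(loader) as model:
        return model.generate(prompt)


print(run_stateless_inference("hello"))
```

In an API service, the request handler would call something like `run_stateless_inference` once per request, trading a model load on every call for an idle process that holds no model memory.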

 

Tuesday, December 17, 2024

Streamlined Table Data Extraction with Sparrow | Table Transformer, Qwen2 VL, MLX, & Mac Mini M4 Pro

Learn how to streamline table data extraction with Sparrow, Table Transformer, Qwen2 VL, and MLX on the Mac Mini M4 Pro. Simplify your workflow and get accurate results! 

 

Monday, April 29, 2024

LLM JSON Output with Instructor RAG and WizardLM-2

With the Instructor library you can implement simple RAG without a vector DB or dependencies on other LLM libraries. The key RAG components are good data pre-processing and cleaning, a powerful local LLM (such as WizardLM-2, Nous Hermes 2 PRO or Llama3), and an Ollama or MLX backend.

Monday, January 8, 2024

Transforming Invoice Data into JSON: Local LLM with LlamaIndex & Pydantic

This is Sparrow, our open-source solution for document processing with local LLMs. I'm running the local Starling LLM with Ollama. I explain how to get structured JSON output with LlamaIndex and a dynamic Pydantic class. This helps to implement the use case of data extraction from invoice documents. The solution runs on the local machine, thanks to Ollama. I'm using a MacBook Air M1 with 8GB RAM.