Andrej Baranovskij Blog: VisionLLM

Showing posts with label VisionLLM.

Monday, May 4, 2026

Large Table Extraction to JSON with dots.ocr — No Vision LLM Hallucinations

Sparrow now supports a dedicated table mode for extracting large, complex tables into structured JSON — without Vision LLM hallucinations.

Vision LLMs struggle with dense tabular data: they hallucinate values, misalign rows, and lose precision at scale. Sparrow's table mode solves this by using dots.ocr to capture the full table structure as HTML, then applying a generic Sparrow template to convert that HTML into clean, structured JSON.
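The HTML-to-JSON step can be pictured with a minimal sketch. It assumes a plain table whose first row is the header, with no merged cells. The class and function names are illustrative only; this is not Sparrow's template mechanism or the dots.ocr output format.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html):
    """Use the first row as the header and map each later row onto it."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample table, not real OCR output.
sample_html = (
    "<table>"
    "<tr><th>Item</th><th>Qty</th></tr>"
    "<tr><td>Bolt</td><td>12</td></tr>"
    "<tr><td>Nut</td><td>30</td></tr>"
    "</table>"
)
records = html_table_to_records(sample_html)
print(json.dumps(records, indent=2))
```

Because the values are copied from the HTML cells rather than generated by a model, the JSON can only contain what the OCR step captured. Tables with rowspan or colspan would need extra handling that this sketch leaves out.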

Monday, March 16, 2026

Qwen 3.5 Test for JSON Structured Data Extraction

A quick test of the new Qwen 3.5 models on structured JSON data extraction from images. I test and compare results for 9B FP16, 27B Q8, and A3B 35B Q8. The 35B Q8 model wins on both speed and accuracy. The test was run on MLX-VLM using a Mac Mini M4 Pro with 64GB RAM.

Monday, February 16, 2026

GLM-OCR vs DeepSeek OCR 2: Which One Wins at Markdown Extraction?

I compare two OCR models, GLM-OCR and DeepSeek OCR 2, using real test cases. Both are evaluated on their ability to extract document content and convert it into well-structured Markdown. I demonstrate which model performs better and which one is faster.

Monday, February 9, 2026

Get Vision LLMs to Follow Your Rules: Prompt-Guided JSON Formatting

A JSON query helps fetch structured output from a Vision LLM when extracting document data. I describe how to improve that output with additional rules provided through the LLM prompt. In this video I share a number formatting example: based on the applied rule, the LLM outputs values in the requested format.
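As a rough illustration of the idea, the sketch below appends plain-language formatting rules to a JSON query before the prompt is sent. The function name, field names, and rule wording are assumptions made for this example, not the prompt Sparrow actually builds.

```python
import json


def build_prompt(query, rules):
    """Combine a JSON query with plain-language formatting rules."""
    lines = [
        "Retrieve the fields described by this JSON query from the document image:",
        json.dumps(query),
        "Return only valid JSON.",
    ]
    if rules:
        lines.append("Follow these formatting rules:")
        lines.extend(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return "\n".join(lines)


# Hypothetical field names and rules, chosen only for illustration.
query = {"invoice_total": "str", "tax_amount": "str"}
rules = [
    "Use a dot as the decimal separator.",
    "Output numbers with exactly two decimal places and no thousands separator.",
]
prompt = build_prompt(query, rules)
print(prompt)
```

Keeping the rules in a list separate from the query means the same query can be reused with different formatting requirements.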

Tuesday, January 27, 2026

Vision LLM Output Control for Better OCR with Prompt Hints

I explain my approach to enforcing better OCR output from vision LLMs with prompt hints. The hints make it possible to set rules for output data validation and formatting.
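One way to check whether the model respected such hints is to validate its answer after the fact. The sketch below does this with regular expressions. The field names and patterns are hypothetical, and the post-check itself is an assumption added for illustration rather than something the post describes.

```python
import re

# Hypothetical hints: field name -> regular expression the value must match.
FIELD_PATTERNS = {
    "invoice_date": r"\d{4}-\d{2}-\d{2}",
    "total": r"\d+\.\d{2}",
}


def find_violations(record, patterns):
    """Return the names of fields that are missing or do not match their pattern."""
    violations = []
    for field, pattern in patterns.items():
        value = record.get(field)
        if not isinstance(value, str) or re.fullmatch(pattern, value) is None:
            violations.append(field)
    return violations


good = {"invoice_date": "2026-01-27", "total": "1250.00"}
bad = {"invoice_date": "27/01/2026", "total": "1,250.00"}
print(find_violations(good, FIELD_PATTERNS))
print(find_violations(bad, FIELD_PATTERNS))
```

A non-empty result could be used to retry the request with a stricter hint or to flag the document for review.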

Wednesday, December 3, 2025

Structured Data Retrieval with Sparrow using OCR and Vision LLM [Improved Accuracy]

I explain the improvements I'm adding to Sparrow to achieve better accuracy for structured data. The method runs an OCR step first, then constructs an advanced prompt with the OCR data injected. This prompt is sent along with the image to the Vision LLM for structured data retrieval. All of this happens as part of a single pipeline.
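The flow can be sketched as a single function that takes the OCR engine and the model as callables. Everything named here (`run_pipeline`, the stand-in functions, the prompt wording) is hypothetical and only mirrors the order of steps described above; it is not Sparrow's code.

```python
from typing import Callable


def run_pipeline(
    image_path: str,
    query: str,
    run_ocr: Callable[[str], str],
    run_vision_llm: Callable[[str, str], str],
) -> str:
    """OCR first, then send the image plus an OCR-enriched prompt to the model."""
    ocr_text = run_ocr(image_path)
    prompt = (
        f"Query: {query}\n"
        "Text recognized by OCR on the same image:\n"
        f"{ocr_text}\n"
        "Answer with JSON only."
    )
    return run_vision_llm(image_path, prompt)


# Stand-ins so the sketch runs without any OCR engine or model installed.
def fake_ocr(image_path: str) -> str:
    return "Total: 42.00"


def fake_vision_llm(image_path: str, prompt: str) -> str:
    return '{"total": "42.00"}' if "Total: 42.00" in prompt else "{}"


result = run_pipeline("invoice.png", '{"total": "str"}', fake_ocr, fake_vision_llm)
print(result)
```

Passing the two steps in as callables keeps the pipeline independent of any particular OCR engine or model backend.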

Wednesday, July 23, 2025

PaddleOCR 3.1 Setup in FastAPI

I explain how to run PaddleOCR 3.1 from a FastAPI app.

Monday, July 14, 2025

Structured Data Query with Sparrow AI Agent

Sparrow comes with an option to extract structured data with a query. In this video I explain how you can define such a query to fetch array and field data.
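To make the difference between field data and array data concrete, here is a small sketch. It assumes a query shaped like the expected output, with placeholder values marking each field. The query layout, field names, and `matches_shape` helper are illustrative assumptions, not a specification of Sparrow's query syntax.

```python
import json

# Hypothetical query: scalar fields at the top level, plus an array whose single
# element describes the shape of every row.
query = {
    "invoice_number": "str",
    "items": [{"description": "str", "amount": 0.0}],
}


def matches_shape(template, value):
    """Check that value has the same keys and nesting as the query template."""
    if isinstance(template, dict):
        return (
            isinstance(value, dict)
            and set(value) == set(template)
            and all(matches_shape(template[k], value[k]) for k in template)
        )
    if isinstance(template, list):
        if not isinstance(value, list):
            return False
        if not template:
            return True  # an empty template list places no constraint on rows
        return all(matches_shape(template[0], item) for item in value)
    return True  # leaf placeholders only mark where a value is expected


response = json.loads(
    '{"invoice_number": "A-17", "items": ['
    '{"description": "Bolt", "amount": 3.5},'
    '{"description": "Nut", "amount": 1.25}]}'
)
print(matches_shape(query, response))
```

The check only compares keys and nesting; it does not verify value types.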

Tuesday, July 8, 2025

Vision LLM with MLX: Extracting Electric Meter Data in Production

In this video, I share my experience using the MLX backend to run Vision LLM (with MLX-VLM) for structured data extraction in a production environment. See how I used Sparrow to accurately read electric meter data and learn practical tips for deploying similar solutions.

Monday, June 23, 2025

How to Extract Financial Statement Data with Sparrow & Vision LLM

Extract financial statement data with Sparrow and Vision LLM in this quick tutorial! Sparrow auto-detects tables, builds clear grids, and uses OCR for accurate Vision LLM results, preventing errors. Runs locally with no cloud dependency, making it great for private financial documents. Perfect for anyone handling sensitive financial data.

Monday, June 16, 2025

Boost Vision LLM Accuracy with OCR Text Integration

I show an interesting approach where I send both an image and OCR text to a Vision LLM. The prompt is constructed to instruct the Vision LLM to prioritize the OCR text. This allows the use of a Vision LLM for structured output construction while relying on external OCR text, giving you more control over the results.

Tuesday, April 22, 2025

Running Vision Models on Apple Silicon with MLX-VLM

I show and explain how to run Qwen and Mistral vision models on Apple Silicon with MLX-VLM. I share technical tips for running both models and show how to pass a query prompt.

Wednesday, March 12, 2025

Building AI Agent for Local Structured JSON Output

I explain the key steps of building an AI agent to process a document and extract structured JSON data locally. I'm running it with Sparrow and using the Qwen VL model as the backend for vision processing and OCR. The steps are explained with a Sparrow code walkthrough.

Monday, March 3, 2025

Querying Non Existing Fields with Qwen2.5 Vision LLM

I describe how Sparrow helps to query non-existing fields with the Qwen2.5 Vision LLM. I run it locally with MLX and MLX-VLM.

Monday, February 10, 2025

Structured Data Extraction with Sparrow Agent: Vision LLM & Prefect in Action

Discover how to streamline your data extraction process with Sparrow Agent! In this tutorial, I showcase how Sparrow Agent leverages Vision LLM to intelligently handle complex data tasks, while Prefect ensures every step is logged and monitored for maximum transparency and efficiency. Join me as I break down the process and share tips for optimizing your automated workflows.

Tuesday, January 28, 2025

Improving Qwen-VL Structured Output with Image Cropping

I explain how I'm improving structured output results from Qwen-VL with image cropping in Sparrow.

Monday, January 20, 2025

Apple MLX Vision LLM Server with Ngrok, FastAPI and Sparrow

I show how I run the Apple MLX backend on my local Mac Mini M4 Pro 64GB and access it from the Web through Ngrok, with an automatically provisioned HTTPS certificate.

Tuesday, January 14, 2025

Vision LLM Structured Output with Sparrow

I show how Sparrow UI Shell works with both image and PDF docs to process and extract structured data with Vision LLM (Qwen2) in the MLX backend.

Monday, December 23, 2024

Stateless MLX Inference with FastAPI in Sparrow

I show how to run inference with MLX in stateless mode, where the loaded model is released after inference completes. This is useful when inference requests are infrequent, and it helps to reclaim resources reserved by MLX.
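The general pattern can be sketched in plain Python: load the model inside the request, run it, and drop the last reference before returning. `FakeModel`, `load_fake_model`, and `infer_stateless` are stand-ins invented for this example. A real MLX backend may also need backend-specific cache cleanup, which is not shown here.

```python
import gc
import weakref


class FakeModel:
    """Stand-in for a real model object; not an MLX class."""

    def generate(self, prompt):
        return f"echo: {prompt}"


tracked = []  # weak references, used only to observe that the model is freed


def load_fake_model():
    model = FakeModel()
    tracked.append(weakref.ref(model))
    return model


def infer_stateless(load_fn, prompt):
    """Load the model, run one request, then drop the model before returning."""
    model = load_fn()
    try:
        return model.generate(prompt)
    finally:
        del model      # remove the only strong reference
        gc.collect()   # collect anything kept alive by reference cycles


answer = infer_stateless(load_fake_model, "read the meter")
print(answer)
print(tracked[0]() is None)
```

The trade-off is that every request pays the model load time again, which is why this mode suits infrequent requests.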

Monday, December 9, 2024

Structured Output from Multipage PDF with Sparrow (Qwen2 Vision LLM and MLX)

I explain how multipage PDFs are handled in Sparrow to extract structured data in a single call.