How to Run DeepSeek-Janus-Pro-1B on Google Colab: Complete Guide

Last Updated: 23 Jul, 2025

DeepSeek-Janus-Pro-1B is the smaller of DeepSeek's two Janus-Pro models (the other is Janus-Pro-7B). It is a multimodal model that can answer questions about images and generate images from text prompts. This guide walks you through running it on Google Colab step by step.

Running large AI models on Google Colab is challenging due to memory limitations. However, with the right approach, you can successfully deploy and test DeepSeek-Janus-Pro-1B on Colab without requiring a high-end local machine.


Prerequisites to Run DeepSeek-Janus-Pro-1B on Google Colab

Before we begin, ensure you have:

  • A Google account for accessing Colab.
  • Basic familiarity with Python and Jupyter notebooks.
  • Understanding of Hugging Face Transformers and PyTorch.

How to Run DeepSeek-Janus-Pro-1B on Google Colab

Learn how to run DeepSeek-Janus-Pro-1B on Google Colab with this step-by-step guide covering repository setup, dependency installation, model loading, and multimodal inference, plus troubleshooting tips for common errors.

Step 1: Setting Up Google Colab

Google Colab provides free GPU access. DeepSeek-Janus-Pro-1B can also run on a CPU, but inference is much slower, so a GPU runtime is strongly recommended.

Enable GPU in Google Colab

  • Open Google Colab (colab.research.google.com) and create a new notebook.
  • Click on Runtime > Change runtime type.
  • Under "Hardware accelerator", select a GPU (on the free tier this is typically the T4 GPU).
  • Click Save.
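Once the runtime restarts, you can confirm that a GPU is actually attached. A minimal sketch, where describe_accelerator is a hypothetical helper (not part of Colab or the Janus repository) that asks PyTorch first and falls back to nvidia-smi if PyTorch is not installed yet:

```python
import importlib.util
import shutil
import subprocess


def describe_accelerator() -> str:
    """Return a short description of the GPU visible to this runtime, if any."""
    # Ask PyTorch first, since that is the library that will actually use the GPU.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            props = torch.cuda.get_device_properties(0)
            return f"CUDA GPU: {props.name} ({props.total_memory / 1024**3:.1f} GiB)"
        return "PyTorch is installed but sees no CUDA GPU; check the runtime type"

    # Fall back to nvidia-smi when PyTorch is not installed yet.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0 and result.stdout.strip():
            return f"GPU (nvidia-smi): {result.stdout.strip()}"
    return "No GPU detected"


print(describe_accelerator())
```

If it reports no GPU, revisit the runtime settings before continuing.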

Step 2: Clone the DeepSeek-Janus Repository

  • Start by cloning the official DeepSeek-Janus repository from GitHub. Replace the placeholders with the repository URL and the folder name that git creates:
Python
# Clone the DeepSeek-Janus repository
!git clone <GitHub_Repository_URL>

# Navigate into the cloned directory
%cd <Repository_Folder_Name>
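To check that %cd landed in the repository root before installing anything, a small sketch like the following lists which dependency manifests are present. find_install_manifests is a hypothetical helper, and the file names it looks for are common packaging conventions, not a guarantee of what the repository contains:

```python
from pathlib import Path


def find_install_manifests(repo_dir: str = ".") -> list[str]:
    """List the dependency manifests present in repo_dir (empty if none)."""
    candidates = ["requirements.txt", "pyproject.toml", "setup.py"]
    root = Path(repo_dir)
    return [name for name in candidates if (root / name).is_file()]


manifests = find_install_manifests()
print("Current directory:", Path.cwd())
print("Found:", manifests or "none; did %cd point at the repository root?")
```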

Step 2 (continued): Install Required Dependencies

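  • Once inside the cloned repository, install the required libraries. Use requirements.txt if the repository provides one; if it ships a pyproject.toml instead, pip install -e . installs the project together with its declared dependencies.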
Python
# Install dependencies from requirements.txt
!pip install -r requirements.txt

# Core libraries used in the following steps
!pip install torch transformers accelerate

# Install the repository's janus package in editable mode so `from janus...` imports work
!pip install -e .

# (Optional) flash-attn targets Ampere-or-newer GPUs and may compile for a long time; on a free-tier T4 you can skip it
!pip install flash-attn
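After installation, a quick import check can catch missing packages before model loading fails with a less obvious error. A minimal sketch: check_packages is a hypothetical helper, and the assumption that the repository's pip distribution is named janus may not hold, in which case its version is simply reported as unknown:

```python
import importlib.metadata
import importlib.util


def check_packages(modules: dict[str, str]) -> dict[str, str]:
    """Map each import name to its installed version, or a 'MISSING' marker.

    `modules` maps the import name (e.g. "torch") to the pip distribution
    name, since the two differ for some packages.
    """
    report = {}
    for import_name, dist_name in modules.items():
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = "MISSING"
            continue
        try:
            report[import_name] = importlib.metadata.version(dist_name)
        except importlib.metadata.PackageNotFoundError:
            report[import_name] = "importable (version unknown)"
    return report


required = {
    "torch": "torch",
    "transformers": "transformers",
    "accelerate": "accelerate",
    "janus": "janus",  # assumption: the repository's distribution is also named "janus"
}
for name, status in check_packages(required).items():
    print(f"{name:>12}: {status}")
```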

Step 3: Load the Model and Move It to GPU

  • Now, import the necessary libraries and load the model. Make sure to move the model to GPU for faster performance.
Python
import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images

# Define model path (update with correct model identifier if needed)
model_path = "<Model_Identifier_or_Path>"

# Load processor and tokenizer
vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

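# Load the model; trust_remote_code lets transformers use the model's custom architecture code
vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

# Cast to bfloat16 (half the memory of float32), move to GPU, and switch to inference mode
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()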
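The .cuda() call above raises an error when no GPU is attached. If you would rather fall back to CPU than stop, a minimal sketch like this picks the device first. pick_device is a hypothetical helper; the commented line shows how it could replace the .cuda() call while keeping bfloat16 as in the original code, and the later .to(vl_gpt.device) call on the inputs then follows whichever device the model is on:

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # PyTorch not installed yet
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Using device:", device)

# Hypothetical replacement for the .cuda() line above (model loaded as shown earlier):
# vl_gpt = vl_gpt.to(dtype=torch.bfloat16, device=device).eval()
```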

Step 4: Pass an Image for Processing

  • Use the model to analyze an image and generate a response based on the provided question.
Python
# Provide the path to the image you want to analyze
image_path = "/content/<Image_File_Name>.png"
question = "What's in the image?"

# Create a conversation format
conversation = [
    {"role": "<|User|>", "content": f"<image_placeholder>\n{question}", "images": [image_path]},
    {"role": "<|Assistant|>", "content": ""}
]

# Load and process the image
pil_images = load_pil_images(conversation)

# Prepare inputs for the model
prepare_inputs = vl_chat_processor(conversations=conversation, images=pil_images, force_batchify=True).to(vl_gpt.device)
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)

# Generate the response
outputs = vl_gpt.language_model.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=prepare_inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True,
)

# Decode and display the response
answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
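If you plan to ask several questions, wrapping the steps above in one function saves re-running cells by hand. This sketch reuses exactly the calls shown above; ask_about_image is a hypothetical name, and torch.inference_mode() is an addition that disables gradient tracking during the forward pass:

```python
def ask_about_image(vl_gpt, vl_chat_processor, tokenizer, image_path, question, max_new_tokens=512):
    """Run one image question through the loaded model and return the decoded answer.

    Imports are local so the function can be defined before janus is installed.
    """
    import torch
    from janus.utils.io import load_pil_images

    # Same single-turn conversation format as above
    conversation = [
        {"role": "<|User|>", "content": f"<image_placeholder>\n{question}", "images": [image_path]},
        {"role": "<|Assistant|>", "content": ""},
    ]
    pil_images = load_pil_images(conversation)
    prepare_inputs = vl_chat_processor(
        conversations=conversation, images=pil_images, force_batchify=True
    ).to(vl_gpt.device)

    with torch.inference_mode():  # no autograd bookkeeping during generation
        inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
        outputs = vl_gpt.language_model.generate(
            inputs_embeds=inputs_embeds,
            attention_mask=prepare_inputs.attention_mask,
            pad_token_id=tokenizer.eos_token_id,
            bos_token_id=tokenizer.bos_token_id,
            eos_token_id=tokenizer.eos_token_id,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            use_cache=True,
        )
    return tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
```

For example, ask_about_image(vl_gpt, vl_chat_processor, tokenizer, image_path, "Describe the scene.", max_new_tokens=256) returns the answer as a string; lowering max_new_tokens is also a simple lever against out-of-memory errors.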

Step 5: Run Predefined Inference Scripts (Optional)

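  • If the repository includes predefined scripts such as inference.py or generation_inference.py, you can run them directly. Command-line arguments differ between scripts (some may hard-code the model path and prompt instead of accepting flags), so check the script's source before passing any:
Python
# Run an inference script from the repository (--prompt is illustrative; use the flags the script actually defines)
!python <Script_Name>.py --prompt "Describe the image."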
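To see which inference scripts the repository actually ships before running one, a short sketch like this lists candidates in the current directory. list_inference_scripts is a hypothetical helper, and matching on "inference" in the filename is an assumption based on the script names mentioned above:

```python
from pathlib import Path


def list_inference_scripts(repo_dir: str = ".") -> list[str]:
    """Return Python scripts in repo_dir whose file name contains 'inference'."""
    return sorted(p.name for p in Path(repo_dir).glob("*inference*.py"))


print(list_inference_scripts())
```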

Step 6: Authenticate with Hugging Face (If Required)

  • If the model is private or gated on Hugging Face, authenticate with your Hugging Face token before loading the model.
  1. Go to huggingface.co/settings/tokens and create an access token (read access is enough to download models).
  2. Run the following code to authenticate:
Python
from huggingface_hub import notebook_login

# This will prompt you to enter your Hugging Face token
notebook_login()
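notebook_login() prompts interactively every time the runtime restarts. As an alternative, here is a sketch that reads the token from an HF_TOKEN environment variable or from Colab's Secrets panel (google.colab.userdata). get_hf_token is a hypothetical helper, and HF_TOKEN is an assumed secret name that you would create yourself:

```python
import os


def get_hf_token(name: str = "HF_TOKEN"):
    """Return a Hugging Face token from an environment variable or Colab Secrets.

    Returns None when no token is configured.
    """
    token = os.environ.get(name)
    if token:
        return token
    try:
        from google.colab import userdata  # only available inside Colab
        return userdata.get(name)
    except Exception:  # not in Colab, secret missing, or notebook access not granted
        return None


token = get_hf_token()
if token is None:
    print("No token found; use notebook_login() instead.")
else:
    from huggingface_hub import login
    login(token=token)  # makes the token available to later downloads
```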

Troubleshooting Guide for Running DeepSeek-Janus-Pro-1B on Google Colab

1. 401 Unauthorized Error (Authentication Issue)

Main Cause: The model is private or gated, or the notebook is not logged in to Hugging Face.

How to Fix:

  • Generate a token at huggingface.co/settings/tokens.
  • Authenticate in Colab:
Python
from huggingface_hub import notebook_login
notebook_login()

2. Model Not Found Error

Main Cause: Incorrect model path or private model.

How to Fix:

  • Verify the model name on Hugging Face.
  • Use the GitHub repo if not on Hugging Face:
Python
!git clone <GitHub_Repository_URL>
%cd <Repository_Folder_Name>
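When the error persists, it helps to know whether model_path is being treated as a local folder or as a Hub id. A rough sketch: diagnose_model_path is a hypothetical helper, and both the owner/name pattern and the config.json check are simplifications of how from_pretrained resolves paths:

```python
import re
from pathlib import Path

# Hub repo ids look like "owner/name"; this pattern is a simplification.
_HUB_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def diagnose_model_path(model_path: str) -> str:
    """Give a rough diagnosis of how from_pretrained is likely to read model_path."""
    path = Path(model_path)
    if path.is_dir():
        if (path / "config.json").is_file():
            return "local directory with config.json"
        return "local directory WITHOUT config.json (incomplete download or wrong folder?)"
    if _HUB_ID.match(model_path):
        return "looks like a Hub id; check spelling, case, and access on huggingface.co"
    return "neither an existing directory nor an 'owner/name' Hub id"


# Replace the placeholder with the model_path you passed to from_pretrained
print(diagnose_model_path("<Model_Identifier_or_Path>"))
```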

3. Dependency Issues

Main Cause: Missing packages.

How to Fix:

Python
!pip install -r requirements.txt
!pip install flash-attn transformers accelerate

4. CUDA/GPU Issues

Main Cause: GPU not enabled or CUDA errors.

How to Fix:

  • Enable GPU: Runtime > Change runtime type > GPU.
  • Verify GPU:
Python
import torch
print(torch.cuda.is_available())
  • Force CPU if needed:
Python
vl_gpt = vl_gpt.to(torch.bfloat16).cpu().eval()

5. Out of Memory (OOM) Errors

Main Cause: Too many tokens or large model size.

How to Fix:

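  • Reduce the number of generated tokens:
Python
# Same generate() call as in the image-processing step, with a smaller max_new_tokens
outputs = vl_gpt.language_model.generate(inputs_embeds=inputs_embeds, attention_mask=prepare_inputs.attention_mask, pad_token_id=tokenizer.eos_token_id, eos_token_id=tokenizer.eos_token_id, max_new_tokens=256)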
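  • Free memory: torch.cuda.empty_cache() only releases cached blocks that no live tensor uses, so first delete large variables you no longer need (for example, del outputs):
Python
import gc
import torch
gc.collect()              # collect unreferenced objects so their GPU tensors are freed
torch.cuda.empty_cache()  # return the now-unused cached memory to the driver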
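To judge whether the weights alone fit in memory, multiply the parameter count by the bytes per parameter. A minimal sketch, assuming roughly one billion parameters as the model name suggests (the real count may differ); weight_memory_gib is a hypothetical helper, and activations plus the generation cache need memory on top of this:

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumes about 1e9 parameters, as the model name suggests.
for dtype in ("float32", "bfloat16"):
    print(f"{dtype:>8}: {weight_memory_gib(1e9, dtype):.2f} GiB")
```

Under that assumption it prints about 3.73 GiB for float32 and 1.86 GiB for bfloat16, which is why loading in bfloat16 halves the weight footprint.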

6. File Not Found Error

Main Cause: Missing image or incorrect path.

How to Fix:

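  • Upload the image (Colab saves uploaded files to the current working directory, so check where it lands if you changed directories with %cd):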
Python
from google.colab import files
uploaded = files.upload()
  • Make sure image_path points to where the file was actually saved:
Python
image_path = "/content/<Image_File_Name>.png"
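A FileNotFoundError from load_pil_images often just means the path and the upload location disagree. A sketch that checks the path and, if it is missing, lists nearby images; resolve_image is a hypothetical helper, and the searched directories and image suffixes are assumptions:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}


def resolve_image(image_path: str, search_dirs=(".", "/content")) -> str:
    """Return image_path if it exists; otherwise raise, listing images found nearby.

    Uploads land in the current working directory, which may be the cloned
    repository rather than /content, so both are searched for hints.
    """
    if Path(image_path).is_file():
        return image_path
    found = sorted(
        str(p)
        for d in search_dirs
        if Path(d).is_dir()
        for p in Path(d).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    raise FileNotFoundError(f"{image_path!r} not found. Images nearby: {found or 'none'}")
```

Calling image_path = resolve_image(image_path) before building the conversation either confirms the path or shows where the upload actually went.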

Conclusion

To successfully run DeepSeek-Janus-Pro-1B on Google Colab, make sure to clone the correct repository, install all required dependencies, and authenticate with Hugging Face if needed. Enable GPU in Colab for faster performance and adjust model settings to avoid memory issues. By following these steps and troubleshooting common errors, you can easily leverage the model's powerful text and image processing capabilities for your projects.


How do I set up Google Colab for DeepSeek-Janus-Pro-1B?

To set up Google Colab for DeepSeek-Janus-Pro-1B:

  1. Enable GPU: Go to Runtime → Change runtime type → Select GPU.
  2. Install required libraries like torch and transformers.
  3. Clone the DeepSeek repository (if available).

How do I load the model?

To load the model:

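  1. Load the processor (which also provides the tokenizer) with VLChatProcessor.from_pretrained and the model with AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True), using the Hugging Face model id or a local path.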
  2. Move the model to the GPU for faster processing.

Why am I getting an "Out of Memory" error?

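To understand and fix "Out of Memory" errors:

  1. Colab's free-tier GPU (typically a T4) has 16 GB of VRAM, about 15 GB of it usable, which must hold the model weights, activations, and the generation cache.
  2. Load the model in bfloat16 and reduce the batch size, prompt length, or max_new_tokens to save memory.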

How do I process images with DeepSeek-Janus-Pro-1B?

To process images with DeepSeek-Janus-Pro-1B:

  1. Load the images referenced in the conversation with load_pil_images, then let VLChatProcessor preprocess them together with the text prompt.
  2. Combine image and text inputs for multimodal tasks.

How do I generate text responses?

To generate text responses:

  1. Convert the processed inputs to embeddings with vl_gpt.prepare_inputs_embeds, then pass them to vl_gpt.language_model.generate.
  2. Decode the output tokens using the tokenizer to get the final response.

What if the DeepSeek repository is private?

To access the DeepSeek repository if it’s private:

  1. Contact DeepSeek for access or check their official website for instructions.
  2. Use similar open-source models like LLaVA or InstructBLIP in the meantime.