DeepSeek V4 Support (WIP) #22376
Created a draft PR with no intention of merging it, just so the diff is easy to see: #22378
I could not get tecaprovn/deepseek-v4-flash-gguf to work. Do you have a GGUF model that can be used? Thanks.
@wuwenthink I can upload a bf16 GGUF ...
https://github.com/Fringe210/llama.cpp-deepseek-v4-flash-cuda <-- it works (18 tokens/sec on a single RTX 6000 96 GB). It is 100% Minimax+clina+vstudio, so it needs a lot of love. Maybe it helps.
I got CUDA working on NVIDIA GB10 (128 GB unified memory) with antirez's fork and GGUF. The crash ( PR on antirez's fork: antirez/llama.cpp-deepseek-v4-flash#4
So I went pretty deep into optimizing DeepSeek V4 on my experimental branch before I realized it wasn't based on the upstream version. I went back and ported the work to the upstream base, but it performs a little slower than my experimental branch.
I will share both.
Here is the Work in Progress, based on the upstream version: https://github.com/nisparks/llama.cpp/tree/wip/deepseek-v4-support
The experimental branch is still a work in progress; I will share it later.
Uploaded the GGUF here: https://huggingface.co/nsparks/DeepSeek-V4-Flash-FP4-FP8-GGUF