vllm.models.deepseek_v4.nvidia

Modules:

Name               Description
flashinfer_sparse  DeepSeek V4 FlashInfer TRTLLM-gen sparse MLA backend.
flashmla
model
mtp                MTP draft model for DeepSeek V4 (internal codename: DeepseekV4).
ops                NVIDIA-only (cutedsl/cutlass) kernels for DeepSeek V4.
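
The submodules listed above can be addressed by their fully qualified names. The following is a minimal sketch, not part of vLLM itself: the helper names `qualified_names` and `is_available` are hypothetical, and the only thing taken from this page is the package path and the five submodule names. It probes whether each module can be located in the current environment without assuming that vLLM or an NVIDIA toolchain is installed.

```python
import importlib.util

# Package path and submodule names as listed on this page.
PACKAGE = "vllm.models.deepseek_v4.nvidia"
SUBMODULES = ["flashinfer_sparse", "flashmla", "model", "mtp", "ops"]


def qualified_names(package=PACKAGE, submodules=SUBMODULES):
    """Hypothetical helper: build the dotted import path of each submodule."""
    return [f"{package}.{name}" for name in submodules]


def is_available(qualified_name):
    """Hypothetical helper: return True if the module can be located.

    find_spec imports parent packages while resolving a dotted name, so a
    missing or broken parent (for example, vLLM not installed, or an import
    that fails on a machine without the required GPU stack) is reported as
    "not available" instead of raising.
    """
    try:
        return importlib.util.find_spec(qualified_name) is not None
    except Exception:
        return False


if __name__ == "__main__":
    for name in qualified_names():
        status = "available" if is_available(name) else "not found"
        print(f"{name}: {status}")
```

Because `ops` is described as NVIDIA-only, a check of this kind is one way to guard optional imports on other platforms; whether vLLM performs such a check internally is not stated here.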