# deepseek-v3.1
DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long-context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases: the 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
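The UE8M0 scale format mentioned above is an exponent-only encoding from the microscaling family: 8 unsigned exponent bits, 0 mantissa bits, so each block scale is a power of two. As a rough illustration (a minimal sketch assuming the standard bias-127 convention for E8M0 block scales; the helper names are illustrative, not DeepSeek's code):

```python
import math

def to_ue8m0(scale: float) -> int:
    """Encode a positive scale as a UE8M0 byte: an 8-bit unsigned,
    biased exponent representing the nearest power of two."""
    assert scale > 0, "UE8M0 encodes positive scales only"
    e = round(math.log2(scale))     # nearest power-of-two exponent
    e = max(-127, min(127, e))      # clamp to the representable range
    return e + 127                  # apply bias 127

def from_ue8m0(byte: int) -> float:
    """Decode a UE8M0 byte back to its power-of-two scale."""
    return 2.0 ** (byte - 127)

# A per-block scale of 0.04 rounds to the nearest power of two, 2**-5
b = to_ue8m0(0.04)
print(b, from_ue8m0(b))  # 122 0.03125
```

Restricting scales to powers of two keeps FP8 quantization hardware-friendly: rescaling becomes an exponent add rather than a multiply.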
| Provider | Source | Input Price ($/1M) | Output Price ($/1M) | Description | Free |
|---|---|---|---|---|---|
| vercel | vercel | Input: $0.30 | Output: $1.00 | DeepSeek-V3.1 is post-trained on top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long-context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases: the 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats. | |
| poe | poe | Input: $7,800.00 | Output: - | Latest update (Terminus): addresses key user-reported issues while maintaining all original capabilities, namely language consistency (reduced instances of mixed Chinese-English text and abnormal characters) and enhanced agent capabilities (optimized Code Agent and Search Agent performance). Core capabilities: DeepSeek-V3.1 is a hybrid model supporting both thinking and non-thinking modes, built upon the original V3 base checkpoint through a two-phase long-context extension approach. Technical specifications: context window of 128k tokens; accepts PDF, DOC, and XLSX files; does not accept audio or video files. | |
| nvidia | models-dev | Input: $0.00 | Output: $0.00 | Provider: Nvidia, Context: 128000, Output Limit: 8192 | |
| siliconflowcn | models-dev | Input: $0.27 | Output: $1.00 | Provider: SiliconFlow (China), Context: 164000, Output Limit: 164000 | |
| chutes | models-dev | Input: $0.20 | Output: $0.80 | Provider: Chutes, Context: 163840, Output Limit: 65536 | |
| azure | models-dev | Input: $0.56 | Output: $1.68 | Provider: Azure, Context: 131072, Output Limit: 131072 | |
| siliconflow | models-dev | Input: $0.27 | Output: $1.00 | Provider: SiliconFlow, Context: 164000, Output Limit: 164000 | |
| iflowcn | models-dev | Input: $0.00 | Output: $0.00 | Provider: iFlow, Context: 128000, Output Limit: 64000 | |
| synthetic | models-dev | Input: $0.56 | Output: $1.68 | Provider: Synthetic, Context: 128000, Output Limit: 128000 | |
| submodel | models-dev | Input: $0.20 | Output: $0.80 | Provider: submodel, Context: 75000, Output Limit: 163840 | |
| azurecognitiveservices | models-dev | Input: $0.56 | Output: $1.68 | Provider: Azure Cognitive Services, Context: 131072, Output Limit: 131072 | |
| deepinfra | litellm | Input: $0.27 | Output: $1.00 | Source: deepinfra, Context: 163840 | |
| sambanova | litellm | Input: $3.00 | Output: $4.50 | Source: sambanova, Context: 32768 | |
| wandb | litellm | Input: $55,000.00 | Output: $165,000.00 | Source: wandb, Context: 128000 |
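Since all prices above are quoted in dollars per million tokens, the cost of a single request follows directly from its token counts. A minimal sketch (prices taken from the table; the function name is illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given $/1M-token input and output prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: 50K input + 2K output tokens on deepinfra ($0.27 in, $1.00 out)
cost = request_cost(50_000, 2_000, 0.27, 1.00)
print(f"${cost:.4f}")  # $0.0155
```

The same arithmetic makes outliers easy to sanity-check: at wandb's listed rates, the identical request would come to over $3,000, suggesting those figures are quoted in different units.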