# DeepSeek V4 Is Live: API Models, 1M Context, Pricing
Key takeaways
- DeepSeek’s official API pricing page now lists deepseek-v4-flash and deepseek-v4-pro, confirming the V4 API lineup is live in documentation.
- Both V4 models support a 1 million token context length, a maximum output of 384K tokens, JSON output, tool calls, and thinking/non-thinking modes.
- The pricing is aggressive: V4 Flash starts at $0.14 per 1M input tokens on cache miss and $0.28 per 1M output tokens, while V4 Pro is priced higher for more capable workloads.
DeepSeek V4 is no longer just a rumor or pre-launch benchmark story. DeepSeek’s official API documentation now lists two V4 models — deepseek-v4-flash and deepseek-v4-pro — with published pricing, context length, output limits, and feature support.
That makes this a major update for developers comparing frontier AI models on cost, long-context capability, and production API access. DeepSeek has already pressured the AI market with cheaper model pricing, and V4 appears designed to extend that pressure by pairing a very large context window with low token costs.
The headline number is context length: both DeepSeek V4 Flash and DeepSeek V4 Pro are listed with a 1M token context. The docs also show a maximum output of 384K tokens and support for JSON output, tool calls, chat prefix completion (beta), and fill-in-the-middle (FIM) completion (beta, non-thinking mode only).
What DeepSeek V4 models are available?
DeepSeek’s API docs list two V4 model names:
| Model | Official model version | Positioning |
|---|---|---|
| deepseek-v4-flash | DeepSeek-V4-Flash | Lower-cost V4 model for high-volume workloads. |
| deepseek-v4-pro | DeepSeek-V4-Pro | Higher-priced V4 model for more demanding tasks. |
The docs also say the older model names deepseek-chat and deepseek-reasoner will be deprecated in the future. For compatibility, they correspond to the non-thinking and thinking modes of deepseek-v4-flash, respectively.
That compatibility note matters for existing developers. If your app currently calls deepseek-chat or deepseek-reasoner, you may not need an immediate rewrite, but you should plan a migration path to explicit V4 model names.
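As a starting point for that migration, the compatibility note can be written down as a small lookup. This is a minimal sketch: the mapping comes from the docs as summarized above, while `LEGACY_MODEL_MAP`, `resolve_model`, and the returned dictionary shape are hypothetical names chosen for illustration, not part of any DeepSeek SDK.

```python
# Mapping taken from the compatibility note above; all names in this
# snippet are illustrative, not DeepSeek-provided.
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", "non-thinking"),
    "deepseek-reasoner": ("deepseek-v4-flash", "thinking"),
}

V4_MODELS = {"deepseek-v4-flash", "deepseek-v4-pro"}


def resolve_model(name: str) -> dict:
    """Report which V4 model (and mode) a configured model name corresponds to."""
    if name in LEGACY_MODEL_MAP:
        model, mode = LEGACY_MODEL_MAP[name]
        return {"model": model, "mode": mode, "legacy": True}
    if name in V4_MODELS:
        # Explicit V4 names need no translation; the mode is chosen per request.
        return {"model": name, "mode": None, "legacy": False}
    raise ValueError(f"Unknown DeepSeek model name: {name!r}")
```

Running a helper like this over your configuration files is one way to list every call site that still depends on a name scheduled for deprecation.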
What are the key DeepSeek V4 specs?
DeepSeek’s API documentation lists the following shared V4 capabilities:
- Context length: 1 million tokens
- Maximum output: 384K tokens
- Thinking mode: supports both non-thinking and thinking modes, with thinking enabled by default
- JSON output: supported
- Tool calls: supported
- Chat prefix completion: beta support
- FIM completion: beta support in non-thinking mode only
- API formats: OpenAI-compatible base URL and Anthropic-compatible base URL
The combination of 1M context and 384K max output is especially important for document-heavy workflows. It could make DeepSeek V4 useful for analyzing large codebases, long legal or policy documents, research collections, and multi-file technical projects — assuming real-world quality holds up under production testing.
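Before sending a large corpus, it helps to estimate whether it fits at all. The sketch below rests on two loudly stated assumptions: a rough heuristic of about 4 characters per token (not DeepSeek's actual tokenizer), and the idea that prompt and output share the same 1M window, which the docs as summarized here do not confirm. The function names are hypothetical.

```python
CONTEXT_LIMIT_TOKENS = 1_000_000  # listed context length
MAX_OUTPUT_TOKENS = 384_000       # listed maximum output
CHARS_PER_TOKEN = 4               # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate; swap in a real tokenizer for production use."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: list[str], reserved_output: int = 8_000) -> bool:
    """Check whether the documents plus a reserved output budget fit in the window.

    Assumption: input and output tokens count against the same 1M limit.
    """
    if reserved_output > MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output exceeds the listed maximum output")
    prompt_tokens = sum(estimate_tokens(doc) for doc in documents)
    return prompt_tokens + reserved_output <= CONTEXT_LIMIT_TOKENS
```

A check like this only answers whether the material fits; it says nothing about whether the model will use it well.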
How much does DeepSeek V4 cost?
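DeepSeek lists prices per 1 million tokens. The price gap between Flash and Pro is large: Pro costs roughly 5 times as much as Flash on cache-hit input and roughly 12 times as much on cache-miss input and on output.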
| Price item | DeepSeek V4 Flash | DeepSeek V4 Pro |
|---|---|---|
| 1M input tokens, cache hit | $0.028 | $0.145 |
| 1M input tokens, cache miss | $0.14 | $1.74 |
| 1M output tokens | $0.28 | $3.48 |
For high-volume apps, V4 Flash is the obvious pricing story. At $0.14 per 1M input tokens on cache miss and $0.28 per 1M output tokens, it is positioned for large-scale inference where even small per-token differences can change the budget.
V4 Pro is much more expensive, but it may still be attractive if it delivers better reasoning, instruction following, or coding quality. Teams should not choose based on model name alone. A better test is to run both models on the same internal workload and compare quality per dollar.
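To make the gap concrete, the listed prices can be turned into a small estimator. The prices below are copied from the table above; the `estimate_cost` function, its parameters, and the idea of a single `cache_hit_ratio` are illustrative assumptions, and real bills depend on how DeepSeek actually counts cached tokens.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"cache_hit": 0.145, "cache_miss": 1.74, "output": 3.48},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of a workload from the listed per-1M-token prices."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    price = PRICES[model]
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * price["cache_hit"]
             + miss_tokens * price["cache_miss"]
             + output_tokens * price["output"])
    return total / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        cost = estimate_cost(name, 10_000_000, 2_000_000, cache_hit_ratio=0.5)
        print(f"{name}: ${cost:.3f}")
```

For a hypothetical workload of 10M input tokens (half served from cache) and 2M output tokens, this works out to about $1.40 on V4 Flash and about $16.385 on V4 Pro. Dividing that cost by the number of tasks each model completes correctly gives the quality-per-dollar figure worth comparing.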
Why does 1M context matter?
A 1 million token context window changes what developers can attempt. Instead of slicing documents into many small chunks, a long-context model can inspect more of the original material at once. That can reduce retrieval mistakes and make it easier to ask questions across large files.
For software engineering, the obvious use cases are repository analysis, migration planning, security review, and debugging across many related files. For business users, the strongest use cases are contract comparison, policy review, research synthesis, and large knowledge-base summarization.
But context length is not the same as accuracy. A model can accept a million tokens and still miss details, over-focus on recent text, or fail to cite the right evidence. DeepSeek V4 needs real tests on long-context retrieval and multi-step reasoning before teams rely on it for critical work.
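One common way to probe this is a "needle in a haystack" check: plant a single fact at a known depth in a long filler document, ask the model for it, and see whether the answer contains it. The sketch below only builds the test document and scores an answer; `build_haystack` and `recalled` are hypothetical helper names, and the model call itself is left out because it depends on your client setup.

```python
def build_haystack(needle: str, filler: str, n_lines: int, position: float) -> str:
    """Return a long document with `needle` inserted at a relative depth (0.0 to 1.0)."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    # Number the filler lines so they are not byte-for-byte identical.
    lines = [f"{filler} #{i}" for i in range(n_lines)]
    lines.insert(int(position * n_lines), needle)
    return "\n".join(lines)


def recalled(answer: str, expected: str) -> bool:
    """Case-insensitive check that the model's answer contains the planted fact."""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    doc = build_haystack(
        needle="The project codename is heliotrope.",
        filler="Routine log entry",
        n_lines=1_000,
        position=0.25,
    )
    # Send `doc` plus a question such as "What is the project codename?" to the
    # model, then score the reply with recalled(reply, "heliotrope").
    print(len(doc.splitlines()))
```

Repeating the check at several depths and document sizes shows whether recall degrades as the planted fact moves away from the start or end of the prompt.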
What should developers do now?
If you already use DeepSeek’s API, the first step is to check which model names your code sends. The docs say deepseek-chat and deepseek-reasoner correspond to the non-thinking and thinking modes of deepseek-v4-flash, respectively, so existing calls may already be running on V4 Flash behavior.
For new projects, start with a controlled test:
- Run the same prompt set on deepseek-v4-flash and deepseek-v4-pro.
- Include at least one long-context task, one coding task, and one structured JSON output task.
- Track cost, latency, format reliability, and hallucination rate.
- Compare results against your current model, not only against DeepSeek’s own pricing table.
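The checklist above can be sketched as a small harness. It is deliberately client-agnostic: `call_model` is a hypothetical function you supply that takes a model name and a prompt and returns the response text, so nothing here assumes a particular SDK. It records latency and whether each reply parses as JSON; cost and hallucination tracking would need to be added for your own workload.

```python
import json
import time
from typing import Callable


def run_eval(call_model: Callable[[str, str], str],
             models: list[str], prompts: list[str]) -> list[dict]:
    """Run every prompt against every model, recording latency and JSON validity."""
    results = []
    for model in models:
        for prompt in prompts:
            start = time.perf_counter()
            text = call_model(model, prompt)  # hypothetical user-supplied client call
            latency = time.perf_counter() - start
            try:
                json.loads(text)
                valid_json = True
            except json.JSONDecodeError:
                valid_json = False
            results.append({
                "model": model,
                "prompt": prompt,
                "latency_s": latency,
                "valid_json": valid_json,
            })
    return results


def json_success_rate(results: list[dict], model: str) -> float:
    """Fraction of a model's responses that parsed as valid JSON."""
    rows = [r for r in results if r["model"] == model]
    return sum(r["valid_json"] for r in rows) / len(rows) if rows else 0.0
```

Passing `["deepseek-v4-flash", "deepseek-v4-pro"]` as `models` together with your own prompt set gives a side-by-side table of format reliability and latency to set against the pricing figures.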
Developers should also review DeepSeek’s data handling and compliance requirements before sending sensitive business or customer data to any third-party model API.
Why this matters in the AI model race
DeepSeek V4 puts renewed pressure on the frontier AI market because it combines three things buyers care about: long context, tool support, and low pricing. OpenAI, Anthropic, Google, Meta, and DeepSeek are now competing not only on benchmark scores but also on how cheaply models can complete real workflows.
For Hubkub readers, the main question is practical: can DeepSeek V4 reduce AI operating costs without causing quality or trust problems? If V4 Flash is good enough for everyday automation, it could become a default low-cost model for many developers. If V4 Pro performs closer to frontier closed models, it could become a serious option for heavier workloads.
For wider context, see Hubkub’s AI tools and guides hub and the existing canonical DeepSeek V4 guide.
Common Questions
Q: Is DeepSeek V4 officially listed?
A: Yes. DeepSeek’s official API pricing page lists deepseek-v4-flash and deepseek-v4-pro, including model versions, context length, maximum output, feature support, and pricing.
Q: What is the context length of DeepSeek V4?
A: DeepSeek’s API docs list a 1 million token context length for both V4 Flash and V4 Pro. The maximum output is listed as 384K tokens.
Q: How much does DeepSeek V4 Flash cost?
A: DeepSeek lists V4 Flash at $0.028 per 1M input tokens on cache hit, $0.14 per 1M input tokens on cache miss, and $0.28 per 1M output tokens.
Q: Should I use DeepSeek V4 Flash or V4 Pro?
A: Start with V4 Flash if cost is the main constraint, and test V4 Pro for tasks where reasoning quality, instruction following, or reliability may justify the higher price. Do not choose solely by model name.
Conclusion
DeepSeek V4 is now visible in official API documentation, and the details are substantial: V4 Flash, V4 Pro, 1M context, 384K max output, thinking mode, tool calls, JSON output, and aggressive pricing. This is exactly the kind of release that can change model selection for developers who care about cost and long-context workflows.
The right move now is not blind migration. It is structured testing. Run DeepSeek V4 against your real prompts, compare Flash and Pro, measure cost per successful task, and verify whether the long context actually improves outcomes. If it does, DeepSeek V4 could become one of the most important AI infrastructure releases of 2026.








