Nvidia Develops New AI Technology That Reduces GPU Memory Usage by Up to 8x Without Sacrificing Accuracy

5 months ago | Artificial Intelligence

Jakarta, INTI - Nvidia has developed a new technique that it says can reduce the memory requirements of artificial intelligence (AI) computing by up to eight times without lowering model accuracy.

The technology, called dynamic memory sparsification (DMS), is designed to optimize memory usage in large language models (LLMs) during reasoning processes.

With this technique, the load on GPUs can be significantly reduced, allowing AI systems to run more efficiently.

Addressing Memory Bottlenecks

In large language models like those used in modern chatbots, the reasoning process generates what is known as a key-value (KV) cache, a temporary memory structure that keeps growing as the model produces tokens one by one while “thinking.”

The longer the reasoning process, the more GPU memory is required. This has become one of the main bottlenecks in developing large-scale AI systems, as it increases computing costs and limits the number of users that can be served simultaneously.
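To make the growth concrete, here is a rough back-of-the-envelope sketch of how KV cache size scales with the number of tokens. The model dimensions used (32 layers, 8 key-value heads, head size 128, 16-bit values) are illustrative assumptions, not figures from Nvidia or from any model named in this article, and the function name is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing one key tensor and one
    value tensor per layer. All dimensions are caller-supplied assumptions.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Illustrative (assumed) model shape, not an official specification.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
long_chain = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=32_768)

print(f"per token:       {per_token / 1024:.0f} KiB")
print(f"32,768 tokens:   {long_chain / 1024**3:.1f} GiB")
print(f"with 8x smaller: {long_chain / 8 / 1024**3:.1f} GiB")
```

Under these assumptions, every generated token adds the same fixed amount to the cache, so memory grows linearly with the length of the reasoning chain and with the number of requests served at once. An eightfold reduction shrinks that linear term by the same factor.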

According to Nvidia, DMS enables models to “manage their own memory” by selecting which tokens need to be retained and which can be discarded, without compromising output quality.

No Accuracy Trade-Off

This approach differs from previous methods that relied on fixed rules (heuristics) to delete older memory entries. Those earlier techniques often sacrificed accuracy by removing important information.

In contrast, DMS trains the model to recognize which tokens are truly relevant for subsequent reasoning steps.

Nvidia also implements a delayed eviction mechanism, meaning token deletion is postponed so the model can fully absorb important context before memory is cleared.
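The idea of delayed eviction can be illustrated with a toy sketch. This is not Nvidia's implementation: in DMS the keep-or-evict decision is learned by the model, whereas here it is simply passed in by the caller, and the class name, the `delay` parameter, and the fixed-delay policy are all assumptions made for illustration.

```python
from collections import deque


class DelayedEvictionCache:
    """Toy cache: entries flagged for eviction survive `delay` more steps."""

    def __init__(self, delay):
        self.delay = delay
        self.entries = {}        # position -> cached payload
        self.pending = deque()   # (last step the entry must survive, position)
        self.step = 0

    def append(self, payload, evict_later):
        """Store a new entry; `evict_later` stands in for a learned decision."""
        pos = self.step
        self.entries[pos] = payload
        if evict_later:
            # Do not delete now: schedule removal after `delay` more appends.
            self.pending.append((pos + self.delay, pos))
        self.step += 1
        # Remove entries whose grace period has elapsed.
        while self.pending and self.pending[0][0] < self.step:
            _, victim = self.pending.popleft()
            del self.entries[victim]

    def positions(self):
        return sorted(self.entries)


cache = DelayedEvictionCache(delay=2)
for payload, evict in [("a", True), ("b", False), ("c", True),
                       ("d", False), ("e", False)]:
    cache.append(payload, evict)
    print(cache.positions())
```

In this sketch an entry flagged for removal stays readable while the next `delay` entries are added, which mirrors the article's point that the model gets a window to absorb the context before the memory is freed.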

In tests conducted on several models, including Qwen and Llama, DMS demonstrated improved efficiency without any decline in performance.

As reported by VentureBeat, in several mathematics and coding benchmarks, models equipped with DMS even achieved higher scores than the standard versions operating under the same computational budget.

This memory efficiency has a direct impact on GPU usage. With a smaller cache, GPUs no longer need to continuously read and write large volumes of data, reducing latency and increasing throughput.

In tests comparing the standard (vanilla) Qwen3-8B model with the version enhanced by Dynamic Memory Sparsification (DMS), both demonstrated nearly identical accuracy levels across various reasoning benchmarks, including MATH 500, HumanEval, and AIME 2024.

In some evaluations, the DMS version even recorded slightly higher scores. The most significant differences were observed in memory efficiency and performance stability.

The standard Qwen3-8B model tends to experience memory usage spikes as context length increases, sometimes resulting in “out of memory” errors.

In contrast, the DMS-enabled version maintains stable generation times and avoids memory exhaustion, allowing the model to process longer contexts without excessively burdening the GPU.

For companies, these savings are considered substantial, as AI infrastructure costs today are heavily dependent on GPU capacity and memory.

Can Be Applied to Existing Models

Nvidia stated that DMS can be applied to pre-trained models without requiring retraining from scratch. The adaptation process is described as relatively lightweight and compatible with standard inference infrastructure.

The technology has been released as part of Nvidia’s Model Optimizer framework and can be integrated into AI pipelines built on Hugging Face, as well as systems supporting FlashAttention.

Conclusion

Nvidia’s Dynamic Memory Sparsification (DMS) represents a significant step forward in AI efficiency. By dramatically reducing memory usage without compromising accuracy, the technology addresses one of the biggest bottlenecks in large-scale AI deployment: GPU and memory constraints. With compatibility for existing pre-trained models and standard inference infrastructure, DMS offers a practical path for companies to lower AI infrastructure costs while maintaining high performance. As AI workloads continue to scale, innovations like DMS could play a crucial role in making advanced models more accessible and sustainable.
