AI Market Shifts to Token Pricing as Groq Challenges NVIDIA

The artificial intelligence industry continues to expand rapidly. An expert from the computing infrastructure company Nebius recently sat down with AlphaSense to discuss these changes. The expert explained that while NVIDIA still builds the fastest graphics processing units, tech companies are actively seeking cheaper alternatives.

The tech industry is changing how it measures computing costs. Demand for raw computing power remains very high. Because so many companies want access, hardware providers run their machines at 100 percent utilization. This constant use helps them drive down unit costs and earn large profits from their expensive equipment.

For the last few years, the industry rented these powerful computers by the hour. The Nebius expert shared the prices companies pay today for on-demand server access. Renting a standard NVIDIA H100 chip costs $2.95 per hour. A company that wants the slightly better H200 model pays $3.50 per hour. Meanwhile, the brand-new Blackwell B200 chips cost anywhere from $4.90 to $6.50 per hour of computing time. At these hourly rates, small startups can spend millions of dollars just to keep their software running.

Big tech companies lower these bills through long-term commitments. If a buyer reserves server capacity in advance, the hourly rates drop significantly. However, securing these steep discounts requires a large commitment. A buyer must sign a contract lasting 1 or 2 years and agree to rent at least 10,000 graphics chips simultaneously. Under these bulk contracts, the older H100 chip drops to $1.50 per hour. The H200 falls to $2.20 per hour, and the premium B200 starts at $3.50 per hour.
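To put the reserved rates in perspective, the figures quoted above can be run through a short calculation. The sketch below is illustrative only: the names `ON_DEMAND`, `RESERVED`, `discount_pct`, and `annual_commitment` are hypothetical, the B200 on-demand rate uses the low end of the quoted $4.90 to $6.50 range, and the commitment total assumes every chip is billed around the clock for a full 365-day year.

```python
# Hourly rates in USD, as quoted by the Nebius expert in this article.
ON_DEMAND = {"H100": 2.95, "H200": 3.50, "B200": 4.90}  # B200: low end of the 4.90-6.50 range (assumption)
RESERVED = {"H100": 1.50, "H200": 2.20, "B200": 3.50}   # B200: "starts at" price

HOURS_PER_YEAR = 24 * 365  # assumes continuous billing for a non-leap year


def discount_pct(chip: str) -> float:
    """Percentage saved by reserving instead of renting on demand."""
    return round((1 - RESERVED[chip] / ON_DEMAND[chip]) * 100, 1)


def annual_commitment(chip: str, gpus: int = 10_000) -> float:
    """Total yearly spend in USD for a reserved fleet of `gpus` chips."""
    return RESERVED[chip] * gpus * HOURS_PER_YEAR


if __name__ == "__main__":
    for chip in ON_DEMAND:
        print(f"{chip}: {discount_pct(chip)}% discount, "
              f"${annual_commitment(chip):,.0f} per year for 10,000 chips")
```

Under these assumptions, the minimum 10,000-chip H100 reservation works out to roughly $131 million per year, which illustrates why only the largest buyers can access the discounted rates.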

A shift in how companies use AI software is behind the change in pricing models. Near the end of 2025, NVIDIA surprised the tech world by signing a large non-exclusive licensing agreement with a hardware startup named Groq. At the time, this contract was the largest deal NVIDIA had ever made. The agreement specifically focused on artificial intelligence inference technology. The Nebius expert noted that inference tasks now account for 90 to 95 percent of all enterprise computing demand. Instead of spending months training brand-new models from scratch, modern businesses simply plug into existing pretrained models through web application programming interfaces.

Because most enterprise demand now comes from inference tasks, companies no longer want to rent hardware by the hour. The industry is shifting toward a new cost structure. Cloud providers have started charging customers based on the total number of tokens their software generates. A token is the basic unit of text that a language model reads and writes. Providers usually bill their clients per 1 million tokens generated. Under this new system, alternative chips suddenly look highly attractive to budget-conscious developers.

Groq is well positioned to compete under this new pricing model. The Nebius expert stated that Groq chips offer very budget-friendly rates. Using Groq hardware costs customers only 5 to 10 cents per 1 million tokens generated. NVIDIA cannot match these prices with its current hardware lineup. Running inference tasks on an NVIDIA B100, B200, or B300 chip costs 25 cents per 1 million tokens. This means the NVIDIA system costs 2.5 to 5 times more to run the same language task.
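The per-token comparison can be checked with a few lines of arithmetic. This is a minimal sketch using only the rates quoted in the article; the function name `cost_usd` and the 1-billion-token workload are hypothetical choices made for illustration.

```python
# Prices in US cents per 1 million generated tokens, as quoted in the article.
GROQ_CENTS_PER_M = (5, 10)   # quoted range: low end, high end
NVIDIA_CENTS_PER_M = 25      # quoted for B100, B200, and B300


def cost_usd(tokens: int, cents_per_million: float) -> float:
    """Bill in USD for generating `tokens` tokens at the given rate."""
    return tokens / 1_000_000 * cents_per_million / 100


# How many times more expensive the NVIDIA rate is than each end of Groq's range.
ratio_vs_groq_high = NVIDIA_CENTS_PER_M / GROQ_CENTS_PER_M[1]
ratio_vs_groq_low = NVIDIA_CENTS_PER_M / GROQ_CENTS_PER_M[0]

if __name__ == "__main__":
    workload = 1_000_000_000  # hypothetical workload of 1 billion tokens
    print("NVIDIA:", cost_usd(workload, NVIDIA_CENTS_PER_M))
    print("Groq:", [cost_usd(workload, c) for c in GROQ_CENTS_PER_M])
    print("Price ratio:", ratio_vs_groq_high, "to", ratio_vs_groq_low)
```

For the hypothetical 1-billion-token workload, the quoted rates imply a bill of $250 on NVIDIA hardware versus $50 to $100 on Groq hardware.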

Groq does not just win on pricing. The startup also outpaces NVIDIA in raw generation speed. The Nebius expert said that Groq chips produce 800 tokens per second.

By comparison, the premium NVIDIA chips produce roughly 450 tokens per second. This means Groq delivers about 1.8 times the output speed for a fraction of the cost. As more software developers realize they can get faster results while spending less, the industry will likely move away from the traditional hourly rental model for good.
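The speed gap can be expressed the same way. The sketch below uses only the two throughput figures quoted above and treats them as steady rates, which is a simplifying assumption; the function name `seconds_to_generate` is hypothetical.

```python
# Generation throughput in tokens per second, as quoted in the article.
GROQ_TPS = 800
NVIDIA_TPS = 450  # quoted as "roughly 450"


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to generate `tokens` at a steady rate (simplifying assumption)."""
    return tokens / tokens_per_second


# Ratio of Groq throughput to NVIDIA throughput.
speedup = GROQ_TPS / NVIDIA_TPS

if __name__ == "__main__":
    print(f"Speedup: {speedup:.2f}x")
    print(f"Groq, 1M tokens: {seconds_to_generate(1_000_000, GROQ_TPS):.0f} s")
    print(f"NVIDIA, 1M tokens: {seconds_to_generate(1_000_000, NVIDIA_TPS):.0f} s")
```

At these rates, generating 1 million tokens would take about 21 minutes on Groq hardware and about 37 minutes on the NVIDIA chips.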