
9to5Neural: DeepSeek explained, deep NVIDIA losses, AI privacy claim debunked


Welcome to 9to5Neural. AI moves fast. We help you keep up. Last week we mentioned that American AI firms are seeing deep competition from DeepSeek R1 out of China. Today DeepSeek’s impact has reached Wall Street as NVIDIA stock drops 17%. Let’s take a closer look at DeepSeek, NVIDIA’s response, and the bigger picture for AI development.

What is DeepSeek?

DeepSeek is a Chinese AI firm born out of a hedge fund called High-Flyer. Liang Wenfeng founded the company in 2023, and it’s based in Hangzhou, Zhejiang, China. Liang co-founded High-Flyer seven years earlier, focusing on AI investments.

DeepSeek began training its models before the U.S. government restricted China’s access to American AI chips. For this reason, the company is believed to have a healthy supply of NVIDIA GPUs acquired before restrictions were imposed.

Still, DeepSeek has needed to operate under the constraints of limited access to additional NVIDIA hardware. This constraint may have forced DeepSeek to focus on the innovation it touts with its V3 model.

What DeepSeek has shown is the ability to compete with OpenAI’s brand new o3 model. ChatGPT o3 is the successor to o1, possibly because O2 is an established UK phone carrier.

Anyway, DeepSeek has created a model that is virtually as competitive while requiring dramatically fewer resources and costing a fraction as much to run as OpenAI’s chatbot.

DeepSeek got here by focusing on distilling existing models rather than building models with the same strategy as American companies. It’s fair to say that DeepSeek benefits heavily from the work already done by the AI firms we know. At the same time, U.S. restrictions on exporting American AI chips to China have forced DeepSeek to focus on optimizing existing models through distillation.
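For readers curious what distillation looks like in practice, here is a minimal sketch of the classic teacher-student idea in Python. It is a generic illustration, not DeepSeek’s actual training code: the scores, the temperature value, and the function names are all made up for the example. The smaller student model is graded on how closely its output probabilities match those of the larger teacher model.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores over three possible answers (assumed values).
teacher = [4.0, 1.0, 0.5]       # large model's scores
good_student = [3.8, 1.1, 0.4]  # closely mimics the teacher
poor_student = [0.5, 1.0, 4.0]  # disagrees with the teacher

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

The student that mimics the teacher gets a loss near zero, while the one that disagrees gets a much larger loss. Training nudges the student to shrink that number.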

DeepSeek training methodology

That’s only the story so far. What happens next is still to be determined, but I think we can bet on OpenAI and other American AI firms prioritizing model distillation to bring operation costs down and stay competitive. In other words, DeepSeek hasn’t achieved anything American AI firms can’t replicate. It’s just a matter of prioritizing model efficiency now that the competition has arrived.

But prioritizing model distillation isn’t the only thing that helped DeepSeek arrive in the AI race. DeepSeek has also relied on AI training AI. American AI firms still use human-in-the-loop training that places importance on human-labeled datasets.

The benefit of the AI-training-AI method is that training is much more scalable as it requires less human input. The challenge, however, is that errors can be amplified. It also makes AI alignment checks more difficult. Alignment is another way of saying that our AI models reflect our values and operate as we intend.

Supervised fine-tuning and reinforcement learning from human feedback are what help our AI models provide unbiased responses. In other words, we make sure the data is good.

While I don’t expect a drastic shift in how American AI firms ensure data quality, I do believe we’ll see sizable movement toward AI training AI. This was always the goal for OpenAI and similar firms; DeepSeek may have just applied pressure to go there sooner.

$6 million tanks $600 billion

If you follow DeepSeek, you’ll likely come across a $6 million figure that comes from the company’s research paper covering its newest model. The claim is that V3 was developed for under $6 million using less capable NVIDIA H800 hardware. However, this claim can be true while also omitting investment costs associated with training earlier models, not to mention the NVIDIA supply acquired prior to U.S. AI chip export restrictions.

Another figure to analyze: $600 billion. That’s the amount of market cap that NVIDIA lost today alone. That’s the result of investors being spooked by DeepSeek models being cheaper to train and cheaper to run, meaning less opportunity than expected for NVIDIA growth.
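A quick back-of-the-envelope check shows how those two figures fit together. This sketch assumes the rounded numbers above (a 17% drop and $600 billion lost), so the results are approximations rather than exact closing values.

```python
# Rounded figures from the article; real closing values differ slightly.
drop_fraction = 0.17    # share price fell about 17%
loss_billions = 600.0   # market cap lost, in billions of dollars

# If losing 17% of the value equals $600B, the starting value is loss / 0.17.
cap_before = loss_billions / drop_fraction
cap_after = cap_before - loss_billions

print(f"Implied market cap before: ${cap_before / 1000:.2f} trillion")
print(f"Implied market cap after:  ${cap_after / 1000:.2f} trillion")
```

That works out to roughly $3.5 trillion before the drop and $2.9 trillion after it, which puts the size of the one-day loss in perspective.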

I think this is an extremely shortsighted overreaction. My thinking is this: DeepSeek has demonstrated great efficiency in how current AI models can be developed. Great! That may shrink the time it takes to develop the next major evolution of AI models.

In other words, throwing more NVIDIA GPUs at the problem is likely still the answer to pushing forward AI technology — we might just get further, faster now. Remember: the AI race is forward, not to where we are now.

AI isn’t a solved problem

Which leads to OpenAI’s massive Stargate Project. Stargate is basically meant to be a building in Texas that’s packed to the gills with compute. Say future AI models can achieve more with less compute. That just means that these AI models will be able to accomplish even more with the existing amount of compute that Stargate targets.

There’s a real gap between where these firms want to go with AI and where we are today. The impact of DeepSeek may just be that it forced other AI firms to prioritize different goals for now. We’ll need to see what comes out of DeepSeek next to have a fair sense of whether or not they’re a more innovative firm.

A few other notes.

NVIDIA found the silver lining in DeepSeek’s work with this statement issued today:

DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.

In other words, we’re building a better airplane mid-flight, but we still need jet fuel to fly.

NVIDIA is still up 93% year-over-year and 1,782% over the last five years.

OpenAI will be much more generous with ChatGPT o3-mini when it arrives due in large part to DeepSeek’s competition.

After this article was published on Monday, OpenAI boss Sam Altman responded on X to the attention DeepSeek is garnering:

deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price. we will obviously deliver much better models and also it’s legit invigorating to have a new competitor! we will pull up some releases.

but mostly we are excited to continue to execute on our research roadmap and believe more compute is more important now than ever before to succeed at our mission. the world is going to want to use a LOT of ai, and really be quite amazed by the next gen models coming.

look forward to bringing you all AGI and beyond.

Fair summation of DeepSeek’s achievement, and “obviously” is doing a lot of work in that first sentence.

President Trump addressed the DeepSeek effect on Monday, per Reuters:

The release of DeepSeek, AI from a Chinese company should be a wakeup call for our industries that we need to be laser-focused on competing to win.

I’ve been reading about China and some of the companies in China, one in particular coming up with a faster method of AI and much less expensive method, and that’s good because you don’t have to spend as much money. I view that as a positive, as an asset.

I view that as a positive because you’ll be doing that too, so you won’t be spending as much, and you’ll get the same result, hopefully.

We always have the ideas. We’re always first. So I would say that’s a positive that could be very much a positive development. So instead of spending billions and billions, you’ll spend less, and you’ll come up with, hopefully, the same solution.

The AI race is on, folks, and the AI industry is the new NASA.

DeepSeek has slowed down new account creation today due to a large-scale cyber attack impacting the service. This message currently appears across the top of chat.deepseek.com:

Due to large-scale malicious attacks on DeepSeek’s services, registration may be busy. Please wait and try again. Registered users can log in normally. Thank you for your understanding and support.

However, we were able to create a new account after a few hours of trying on Monday.

You may also have seen a viral social media post claiming that installing DeepSeek on iOS gives the Chinese AI firm deep access to personal data on your iPhone, including email and messages. Fortunately, that’s not how iOS architecture functions. You can even create an account using Sign in with Apple, which can generate a throwaway email address for additional security. However, DeepSeek does have access to what you input into the chatbot.

Also, DeepSeek still suggests talking about math, coding, and logic problems instead when asked about what happened in 1989 at Tiananmen Square. However, Perplexity seems to have cracked that issue.

More on the latest in AI developments in the next edition of 9to5Neural — only on 9to5Mac! Read the previous issue here.
