DeepSeek is the Moby Dick of generative AI. After shaking up global markets, some are now trying to bring it down. Nvidia lost $589 billion in market value, a 17% drop in a single day. Meanwhile, Donald Trump is urging major tech companies to ramp up their efforts in the AI sector to counter China.
DeepSeek stands out for its efficiency: it requires less computing power and has a lower environmental impact than competing models. Moreover, because it is open-source, any developer can download and modify it as they see fit. Despite these advantages, however, concerns quickly emerged about how the platform manages user data and how transparent it is.
The Privacy Issue: DeepSeek and the Italian Regulator
In Italy, the Data Protection Authority has imposed an immediate restriction on the processing of Italian users’ data by the China-based companies behind DeepSeek. The decision came after the companies’ assurances about how they manage data were deemed insufficient.
DeepSeek, user data and Chinese servers
The privacy policy on the company’s website leaves no doubt: ‘We store the information collected on secure servers located in the People’s Republic of China.’ In other words, all conversations and questions sent to DeepSeek, along with the generated responses, may end up on servers in China. The company sorts the data it collects into three categories: information users provide directly (text or audio inputs), data collected automatically (device details, operating system, and IP address), and data from external sources (Google or Apple, if users sign in through these services). Although users can delete their chat history, how the data is handled and transferred remains an open and controversial question.
‘Thanks to the DeepSeek-R1 model, DeepSeek surpasses industry benchmarks in terms of capability, cost, and efficiency. The incredible thing is that they developed it in just two months, spending less than $6 million, while companies like OpenAI are investing billions. The model is open-source, highly efficient, and can run on consumer hardware, such as a Mac or even a Raspberry Pi. This efficiency challenges the business model of big tech companies, which rely on massive investments in infrastructure. DeepSeek wants to prove that “more money” does not equal “more innovation.” And this comes just as companies like Nvidia, Meta, and Microsoft are making big announcements about AI spending. However, the controversies it has run into leave many questions unanswered,’ explains Federica Urzo, an analyst at the Luiss Data Lab.
Built-in censorship and evasion methods
Another controversial aspect of DeepSeek is its apparent censorship system, which prevents users from receiving information on topics deemed sensitive in China, such as press freedom, human rights, and political protests. The chatbot starts to answer, then abruptly stops or replaces its output with a generic message. This suggests an active monitoring mechanism that can correct the system’s responses in real time.
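The behavior described above, an answer that begins and is then withdrawn, is what you would expect if a separate filter watched the output as it streams. The sketch below illustrates that general idea only; DeepSeek has not published how its moderation works, and the blocklist, the placeholder message, and the `[RETRACT]` marker are all hypothetical.

```python
from typing import Iterable, Iterator

# Hypothetical blocklist and placeholder reply, for illustration only.
BLOCKED_TERMS = {"tiananmen"}
GENERIC_MESSAGE = "Let's talk about something else."


def moderated_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Forward model output chunk by chunk while checking the text produced so far.

    If a blocked term appears, emit a hypothetical retraction marker (telling the
    client to erase what it has already shown), then a generic message, and stop.
    """
    so_far = ""
    for chunk in chunks:
        so_far += chunk
        # Check the accumulated text, so a term split across chunks is still caught.
        if any(term in so_far.lower() for term in BLOCKED_TERMS):
            yield "[RETRACT]"
            yield GENERIC_MESSAGE
            return
        yield chunk


print(list(moderated_stream(["In 1989, ", "protests in ", "Tiananmen Square"])))
```

Because the check runs only after each chunk has been generated, the first part of the answer reaches the user before the filter fires, which matches the "answer, then retraction" pattern users report.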
Why using leet speak bypasses censorship
Some users bypass this censorship by using leet speak, a way of writing that replaces certain letters with numbers or symbols. Instead of asking about ‘Tiananmen,’ one can write ‘T1@n@nm3n,’ so the system does not immediately recognize the term. The trick does not work for every topic, however: questions about Xi Jinping and Taiwan stay blocked even when written in leet speak or hexadecimal code.
This phenomenon is tied to the way artificial intelligence models work. Pierpaolo Balbi, Associate Professor of Computer Science at the University of Bari, explains: ‘Large Language Models (LLMs) are trained in three phases: pre-training, instruction fine-tuning, and learning from human feedback. The first phase teaches the model to generate sentences, the second trains it to solve specific tasks using question-and-answer examples, and the third aims to reduce bias and hallucinations through human feedback. During this last step, the system can be instructed not to answer certain questions, or to answer them in a specific way.’ Leet speak alters the sequence of characters, making it harder for the censorship built into the system to recognize the term.
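Why a change of characters is enough can be seen with a simple keyword filter. The sketch assumes a filter that matches literal strings, which is only one possible layer of censorship (as Balbi notes, restrictions can also be trained into the model itself); the blocklist and the leet mapping are hypothetical.

```python
# Hypothetical blocklist and leet-speak mapping, for illustration only.
BLOCKED_TERMS = {"tiananmen"}
LEET_MAP = str.maketrans({"1": "i", "3": "e", "4": "a", "@": "a", "0": "o", "5": "s", "7": "t"})


def naive_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocked term as a literal substring."""
    text = prompt.lower()
    return any(term in text for term in BLOCKED_TERMS)


def normalized_filter(prompt: str) -> bool:
    """Undo common leet substitutions before matching, then check the blocklist."""
    text = prompt.lower().translate(LEET_MAP)
    return any(term in text for term in BLOCKED_TERMS)


for prompt in ["What happened in Tiananmen?", "What happened in T1@n@nm3n?"]:
    print(prompt, naive_filter(prompt), normalized_filter(prompt))
```

The naive filter misses ‘T1@n@nm3n’ because the exact character sequence no longer matches, while the normalized version catches it. Normalization has limits of its own: ‘1’ could stand for ‘i’ or ‘l’, and encodings such as hexadecimal need separate decoding, which may be why some topics stay blocked by other means.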
‘I downloaded the 32-billion-parameter version of the DeepSeek model to run it locally. This way, the censorship present in the app and web interface, which blocked questions and answers related to Chinese topics, is bypassed. The fact that the local version of the model also answers these questions suggests that it was trained without bias or restrictions, and that the “filter” or censorship was likely applied afterwards, at least in the versions accessible to the general public,’ explains Domenico Cangemi, a mathematician at the Luiss Data Lab.
DeepSeek and the comparison with big tech
Balbi emphasizes that the techniques used by DeepSeek in China and by OpenAI and Meta in the United States are fundamentally similar; the main difference lies in the data used for training. DeepSeek has not yet made any official statement on how that data is managed, and the European Commission has not opened formal proceedings against it. One way to reduce privacy and censorship risks is to run the model locally on one’s own device, which is possible because DeepSeek, unlike ChatGPT, is open-source. However, the lack of transparency about the training sources remains, even for those who use the chatbot offline.
According to a recent analysis by NewsGuard for IDMO, DeepSeek’s chatbot repeated false claims 30% of the time, failed to respond to 53% of prompts, and debunked the misinformation only 17% of the time. This highlights the need to carefully verify the accuracy of the information it generates.