Politically charged searches cause DeepSeek-R1 to produce dangerous code

  • Experts believe DeepSeek-R1 produces dangerously insecure code when including political terms in prompts.
  • Half of politically sensitive requests cause DeepSeek-R1 to refuse to generate code
  • Politically explosive requests often involve encrypted secrets and insecure input processing.

When DeepSeek-R1, a Chinese language model (LLM), was released in January 2025, it caused a stir and has since been widely used as a coding assistant.

Independent testing of massive strike argue that model outputs can vary significantly depending on seemingly irrelevant contextual modifiers.

The team tested 50 coding tasks in multiple security categories with 121 trigger word configurations, running each prompt five times, for a total of 30,250 tests. Responses were scored based on a vulnerability score from 1 (secure) to 5 (critically vulnerable).

Politically sensitive issues spoil the result

The report shows that DeepSeek-R1 produced code with serious security flaws when political or sensitive terms such as Falun Gong, Uyghur or Tibet were included in the prompts.

These include hard-coded secrets, insecure handling of user input, and in some cases completely invalid code.

Researchers say these politically sensitive triggers can increase the likelihood of dangerous spending by 50% compared to basic advice without these words.

By experimenting with more complex instructions, DeepSeek-R1 created functional applications with login forms, databases and management panels.

However, these applications lacked session management and basic authentication, exposed sensitive user data, and after repeated attempts, up to 35% of implementations contained a weak or missing password hash.

Simpler requests, such as website requests for soccer fans, caused less severe problems.

Therefore, CrowdStrike argues that politically sensitive triggers have a disproportionate impact on code security.

The model also showed an inherent kill switch, as DeepSeek-R1 refused to generate code for some politically sensitive requests almost half the time after scheduling a response.

It is clear from the review of the argumentative tracks that the model had developed a technical plan internally, but ultimately did not support it.

The researchers believe this is due to the censorship built into the model to comply with Chinese regulations. They note that the political and ethical biases of the model can have a direct impact on the reliability of the generated code.

When it comes to politically sensitive topics, LLMs generally tend to reflect the ideas of the mainstream media, but this can be in stark contrast to other credible news outlets.

DeepSeek-R1 remains a powerful encryption model, but these experiments show that AI tools, including chatgpt, can introduce hidden risks into enterprise environments.

Organizations relying on LLM-generated code must conduct extensive internal testing prior to implementation.

In addition to security layers such as a firewall and antivirus are still important because the model can produce unpredictable or vulnerable results.

The bias built into the model weights represents a new risk to the supply chain that can affect code quality and overall system security.