Everyone is afraid of AI cyberattacks, but experiments show the opposite

  • The report shows that LLM-generated malware continues to fail basic tests in real-world environments.
  • GPT-3.5-Turbo generated malicious scripts on the first request, exposing major guardrail gaps.
  • Improved guardrails in GPT-5 redirected outputs into safer, non-harmful alternatives.

Despite growing fears of weaponized LLMs, recent experiments show that the malicious code these models produce is far from reliable.

Netskope researchers tested whether modern language models could power the next wave of autonomous cyberattacks, aiming to determine whether these systems can generate functional malicious code without relying on hard-coded logic.

The experiment focused on core capabilities such as defense evasion, exploitation, and operational security, and produced surprising results.

Reliability issues in real-world environments

The first step was to ask GPT-3.5-Turbo and GPT-4 to create Python scripts that attempted process injection and the termination of security tools.

GPT-3.5-Turbo produced the requested code immediately, while GPT-4 refused until a simple persona-based prompt bypassed its guardrails.

The test showed that these protections can still be bypassed, even as models add more restrictions.

After confirming that code generation was technically possible, the team moved on to operational testing, which required both models to create scripts designed to detect virtual machines and respond accordingly.

These scripts were then tested on VMware Workstation, an AWS WorkSpaces VDI, and a regular physical machine, where they frequently failed, misidentified their environment, or behaved inconsistently.

The detection logic worked on physical hosts, but the same scripts failed on cloud-based virtual desktops.
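To make that brittleness concrete, here is a minimal, benign sketch of the kind of naive virtual-machine check the article describes. This is not Netskope's code; the function name, marker list, and probe locations are illustrative assumptions. A check like this may flag a VMware Workstation guest while missing a cloud VDI whose metadata contains no hard-coded marker, which is the sort of inconsistency the researchers observed.

```python
# Hypothetical sketch: a naive VM-detection check based on hypervisor strings.
# Such checks are brittle: cloud VDIs may report vendor strings not on this list.
import platform
import subprocess

KNOWN_HYPERVISOR_MARKERS = ("vmware", "virtualbox", "qemu", "kvm", "xen", "hyper-v")

def looks_like_vm() -> bool:
    """Return True if an obvious hypervisor string appears in system metadata."""
    probes = []
    if platform.system() == "Linux":
        # DMI product name is a common (but incomplete) signal on Linux guests.
        try:
            with open("/sys/class/dmi/id/product_name") as f:
                probes.append(f.read())
        except OSError:
            pass
    elif platform.system() == "Windows":
        # Query the system manufacturer via PowerShell; output varies by vendor.
        try:
            out = subprocess.run(
                ["powershell", "-NoProfile",
                 "(Get-CimInstance Win32_ComputerSystem).Manufacturer"],
                capture_output=True, text=True, timeout=10,
            )
            probes.append(out.stdout)
        except (OSError, subprocess.TimeoutExpired):
            pass
    haystack = " ".join(probes).lower()
    return any(marker in haystack for marker in KNOWN_HYPERVISOR_MARKERS)

if __name__ == "__main__":
    print("VM suspected" if looks_like_vm() else "No VM markers found")
```

Even this simple example shows why generated evasion logic travels poorly: a physical workstation and a cloud desktop expose very different metadata, so a script tuned against one environment can silently misfire in another.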

These results undermine the idea that AI tools can readily power automated malware capable of adapting to different systems without human intervention.

These limitations also reinforce the value of traditional defenses, such as firewalls and antivirus, since unreliable code is less likely to evade them.

With GPT-5, Netskope saw significant improvements in code quality, especially in cloud environments where older models struggled.

However, the improved safety created new problems for anyone trying to misuse the model: it no longer refused requests outright but instead redirected output toward safer functionality, making the resulting code unusable for multi-step attacks.

Even when the team used more elaborate prompts, the output still diverged from the requested malicious behavior.

This shift suggests that greater reliability now comes paired with stronger built-in controls: testing shows that large models can produce malicious logic in controlled environments, but the code remains inconsistent and often ineffective.

For now, fully autonomous attacks do not exist, and real incidents still require human involvement.

It remains possible that future systems will close these reliability gaps faster than safeguards can compensate, especially as attackers continue experimenting with malware generation.