Generative AI increases risks of cyberattacks and data leaks

A computer scientist says generative AI may make machine-learning systems less transparent, less secure, and harder to audit.

Written by: Shy Cohen
Edited by: Joshua Shavit
A new Patterns paper warns that adding generative AI to machine-learning systems can increase bias, opacity, and security risks. (CREDIT: Wikimedia / CC BY-SA 4.0)

Machine-learning systems already shape ordinary parts of life, from spam filters to product recommendations and social media feeds. Now a newer push is underway. It is folding generative AI into those systems to write code, label data, explain decisions, and even help make them.

That may sound efficient. Michael Lones is not convinced it is wise.

In a paper published in the Cell Press journal Patterns, the Heriot-Watt University computer scientist argues that plugging large language models into machine-learning workflows can make those systems harder to understand, harder to audit, and more vulnerable to security failures, legal trouble, bias, and bad decisions. His central point is not that generative AI has no use in machine learning. Instead, it is that the tradeoffs are being underestimated.

“Machine-learning developers need to be aware of the risks of using GenAI in machine learning and find a sensible balance between improvements in capability and the risks that might come with that,” Lones says. “Given the current limitations of generative AI, I’d say this is a clear example of just because you can do something doesn’t mean you should.”

His paper is framed as a tutorial and practical warning for people building machine-learning systems, not as a report of a single new experiment. It walks through the ways generative AI is being used now and where those uses can go wrong.

Machine-learning developers need to be aware of the risks of using GenAI in machine learning. (CREDIT: Shutterstock)

Where generative AI is creeping into the workflow

Lones describes four main roles for generative AI in machine learning. Generative AI can sit inside a machine-learning pipeline as part of the decision-making process. It can help design the pipeline and write code. It can generate synthetic training data or preprocess and label existing data. And it can analyze results or write reports about what a model has done.

Each role brings its own problems. Put several of them together, Lones argues, and the risks start to pile up.

“If you have GenAI working in a number of different ways within your machine-learning workflows or system, then they can interact in unpredictable and hard to understand ways,” he says. “My advice at the moment is to avoid adding too much complexity in terms of how we use GenAI in machine learning, particularly if you're in a sector that has high stakes that impact people’s lives and livelihood.”

That warning matters most in places like medicine and finance, where mistakes are not minor. In the paper, Lones uses two running examples to show how these risks could play out: a hospital triage system and a loan approval system.

The medical example involves an in-house hospital tool that uses a language model to judge how serious a case is and which specialists should handle it. The bank example relies on a commercial generative AI service. It decides whether loans should be approved, pulling in internal policy documents and other tools along the way.

In both cases, the systems are attractive for the same reason many companies are drawn to generative AI right now: speed, automation, and lower staffing costs.

They are also exactly the kinds of systems that could do real harm if they fail.

Running examples of ML systems that contain generative AI. (CREDIT: Cell Press Patterns)

Opaque systems, uncertain decisions

One major problem is that large language models make errors, and those errors are not always easy to spot.

They can hallucinate facts, produce flawed code, generate brittle designs, and return different answers to the same prompt. Lones argues that this is especially dangerous in machine learning. Developers may rely on AI-generated suggestions for steps that affect everything downstream, including training, evaluation, deployment, and monitoring.

The paper also stresses that newer or bigger models are not automatically better. In some cases, Lones notes, older or simpler models can still outperform flashy generative systems on specific tasks. Developers should ask whether they need generative AI at all before adding it.

That question becomes more pressing when explainability enters the picture.

“In areas like medicine or finance, there are laws about being able to show that the machine-learning system is reliable, and that you can explain how it reaches decisions,” says Lones. “As soon as you start using LLMs, that gets really hard, because they're so opaque.”

His concern is not only that these systems are difficult to interpret. It is that people may overestimate how much apparent explanations, including model “reasoning” traces, really tell them. Those traces can sound persuasive while still being unreliable.

Data leaks and technical debt

The paper spends a great deal of time on security and governance risks.

Remotely hosted models often require data to be sent to outside servers. That opens the door to data leakage and cybersecurity problems, especially if sensitive medical, financial, or internal business information is involved. In addition, systems that use retrieval tools, outside databases, or agentic AI features can widen that exposure further.

Lones also warns that generative AI can deepen classic machine-learning problems rather than solve them. Synthetic data may carry hidden bias from the original training data used to build the model. Generated labels or preprocessing steps may distort a dataset. AI-written code may contain mistakes, outdated packages, or even invented dependencies.

That kind of convenience can turn into technical debt later. Teams may have to maintain code they did not fully understand in the first place.

The issue of bias

Bias is another recurring issue. Because many generative models are trained on massive datasets scraped from the internet, they can absorb uneven representation, stereotypes, and unfair patterns. Those biases can then spill into data generation, feature engineering, model decisions, and written explanations.

“It's important for people in the general public to be aware of the limitations of GenAI systems,” says Lones. “Companies will deploy these systems to do things like cut costs, and this may improve the experience that end users get, but it may also have negative consequences, such as bias and unfairness.”

He argues that developers should manually review code and outputs, document exactly where generative AI was used, and think carefully about whether supposed efficiency gains are worth the risks.

The paper also makes clear that the danger is not limited to development. Problems can emerge after deployment too, especially if remote models change over time, prompts stop behaving the same way, or users learn how to game the system.

Practical implications of the research

The paper offers a simple message for developers, companies, and the public: treat generative AI in machine learning as a source of tradeoffs, not magic.

For low-stakes uses, some risks may be acceptable. For systems that affect health, money, or access to services, Lones argues for much more caution. That means using human oversight, limiting unnecessary complexity, checking outputs by hand, watching for bias and security problems, and resisting the urge to hand off too much of the workflow to a system that may sound confident without being reliable.

His broader point is that automation can make machine learning more powerful while also making it more fragile. The more generative AI is woven through a system, the harder it may become to understand what that system is really doing. It also becomes harder to know who is responsible when it goes wrong.

Research findings are available online in the journal Patterns.

The original story "Generative AI increases risks of cyberattacks and data leaks" is published in The Brighter Side of News.



Shy Cohen
Science and Technology Writer

Shy Cohen is a Washington-based science and technology writer covering advances in artificial intelligence, machine learning, and computer science. Having published articles on MSN, AOL News, and Yahoo News, Shy reports news and writes clear, plain-language explainers that examine how emerging technologies shape society. Drawing on decades of experience, including long tenures at Microsoft and work as an independent consultant, he brings an engineering-informed perspective to his reporting. His work focuses on translating complex research and fast-moving developments into accurate, engaging stories, with a methodical, reader-first approach to research, interviews, and verification.