Data Poisoning

Surface adversarially useful rows and trigger patterns before they reach training data.

Use this when data is externally supplied, could have been intentionally shaped, or looks trigger-like enough that a security reviewer should inspect it before it reaches a fine-tuning or evaluation set.

When this helps

Before suspicious rows become model behavior.

The team wants a review queue for trigger-like strings, hidden instructions, unusual repetition, or source paths that could be useful to an attacker.

What Datascreen shows

Security-relevant candidates, not verdicts.

Datascreen can surface rows and clusters that deserve review, preserve the evidence, and record the reviewer's decision without treating a candidate row as proof of an attack.

0.001%: Tiny share, serious failure

A hidden trigger can create harmful behavior that normal evals may not catch.

250: Backdoor from 250 poisoned documents

A 2025 LLM poisoning study found that 250 poisoned documents were enough to backdoor the tested LLMs across model and dataset sizes.

5: Five injected texts can control a RAG answer

PoisonedRAG reached a 90% attack success rate by injecting five malicious texts per target question into a knowledge database with millions of texts.

Sources: Carlini et al., "Poisoning Web-Scale Training Datasets is Practical," 2023; Souly et al., "Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples," 2025; Zou et al., "PoisonedRAG," 2024.

Watch walkthrough

Product demo: review trigger-like rows before fine-tuning.

The walkthrough shows suspicious rows grouped by trigger pattern and source context, then records whether the security reviewer removed, fixed, or escalated them.

Review depth

What a reviewer needs to make the decision obvious.

Trigger review

Group unusual strings, hidden controls, and repeated patterns that deserve security review.
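
To make the grouping idea concrete, here is a minimal Python sketch of how rows might be bucketed by the reason they look suspicious. It is an illustration only and does not describe Datascreen's detection logic: the function names, the instruction-phrase regex, and the repetition threshold are all assumptions chosen for the example.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical heuristics for illustration; not Datascreen's detection logic.
INSTRUCTION_HINT = re.compile(
    r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
)


def hidden_controls(text):
    """Return invisible format characters (Unicode category Cf) found in text."""
    return [ch for ch in text if unicodedata.category(ch) == "Cf"]


def review_reasons(text):
    """Return reasons a row deserves review. A reason is a candidate, not a verdict."""
    reasons = []
    if hidden_controls(text):
        reasons.append("hidden_control_chars")
    if INSTRUCTION_HINT.search(text):
        reasons.append("instruction_like_phrase")
    words = text.lower().split()
    # Assumed threshold: one token makes up more than half of a row with 6+ tokens.
    if len(words) >= 6 and max(words.count(w) for w in set(words)) > len(words) / 2:
        reasons.append("unusual_repetition")
    return reasons


def group_candidates(rows):
    """Group row ids by reason so a reviewer sees each pattern as one cluster."""
    groups = defaultdict(list)
    for row_id, text in rows:
        for reason in review_reasons(text):
            groups[reason].append(row_id)
    return dict(groups)
```

Heuristics this simple will produce false positives, which is the point of routing the groups to a human reviewer instead of deleting rows automatically.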

Source trail

Keep where the row came from visible while the team decides what to do.

Human approval

Require a reviewer decision before acting on high-risk rows.
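
One way to picture the approval requirement is as a gate that blocks any action on a high-risk row until a named reviewer has recorded a decision. The Python sketch below assumes a hypothetical `Candidate` record, a two-level `risk` field, and the four decision labels used elsewhere on this page; none of these names come from Datascreen itself.

```python
from dataclasses import dataclass
from typing import Optional

# Decision labels mirror the ones named on this page; the set itself is an assumption.
ALLOWED_DECISIONS = {"keep", "remove", "fix", "escalate"}


@dataclass
class Candidate:
    """Hypothetical review-queue entry."""
    row_id: str
    risk: str  # assumed to be "high" or "low"
    decision: Optional[str] = None
    reviewer: Optional[str] = None


def record_decision(candidate, reviewer, decision):
    """Attach a reviewer's decision, rejecting unknown labels and anonymous reviews."""
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    if not reviewer:
        raise ValueError("a reviewer name is required")
    candidate.reviewer = reviewer
    candidate.decision = decision


def can_act(candidate):
    """High-risk rows stay blocked until a named reviewer has decided."""
    if candidate.risk != "high":
        return True
    return candidate.decision is not None and candidate.reviewer is not None
```

A pipeline step would call `can_act` before removing or rewriting a row, so an unreviewed high-risk candidate cannot be changed silently.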

Security record

Export what was reviewed and what remains uncertain.

What the team gets

Review queue

A prioritized list of rows and clusters that deserve human review.

Evidence context

The row, source, neighborhood, and reason shown together.

Decision log

A record of what reviewers kept, removed, fixed, or escalated.
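
As a rough illustration of what such a record could look like, the sketch below writes each decision as one JSON Lines entry and then summarizes the log, including queued rows that still have no decision. The field names (`row_id`, `reviewer`, `decision`, `note`) and both helper functions are hypothetical and are not taken from Datascreen's export format.

```python
import json
from collections import Counter


def log_line(row_id, reviewer, decision, note=""):
    """Serialize one reviewer decision as a single JSON Lines entry."""
    entry = {"row_id": row_id, "reviewer": reviewer, "decision": decision, "note": note}
    return json.dumps(entry, sort_keys=True)


def summarize(lines, queued_row_ids):
    """Count decisions by label and list queued rows that have no decision yet."""
    entries = [json.loads(line) for line in lines]
    decided = {entry["row_id"] for entry in entries}
    return {
        "counts": dict(Counter(entry["decision"] for entry in entries)),
        "undecided": sorted(set(queued_row_ids) - decided),
    }
```

Listing the undecided rows alongside the counts keeps the record honest about what was reviewed and what remains uncertain.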

Exportable report

A workflow-ready handoff that states what was reviewed and what remains uncertain.