9 min read · by Panikkos Panayiotou

Why your Puppeteer scraper gets blocked (and how to fix it)

A practical guide to the 7 fingerprint signals modern bot detection looks at, and the cheapest production-grade fix for each.

scraping · anti-bot · puppeteer

The wall

You wrote a Puppeteer script. It works on your laptop. You ship it. It runs ten times and then every page returns a Cloudflare interstitial. Sound familiar?

This is what is happening:

1. TLS / JA3 fingerprinting

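Before any HTTP request is sent, the TLS handshake already identifies your client. A JA3 fingerprint is a hash of the ClientHello fields (TLS version, cipher suites, extensions, curves), and bare Puppeteer's fingerprint matches "headless Chromium without the usual extensions", a tell that no real human is on the other end.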
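To make that concrete, here is a rough sketch of how the pre-hash JA3 string is put together (the final fingerprint is the MD5 of this string, which is left out here). The shape of the `hello` object and the sample values are made up for illustration.

```javascript
// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are random placeholders that
// Chrome inserts into the handshake; JA3 ignores them.
function isGrease(value) {
  const hi = value >> 8;
  const lo = value & 0xff;
  return hi === lo && (lo & 0x0f) === 0x0a;
}

// Build the pre-hash JA3 string:
//   TLSVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats
// Each list is written as decimal values joined by "-".
// The `hello` object shape is an assumption made for this example.
function ja3String(hello) {
  const join = (list) => list.filter((v) => !isGrease(v)).join('-');
  return [
    hello.version,
    join(hello.ciphers),
    join(hello.extensions),
    join(hello.curves),
    join(hello.pointFormats),
  ].join(',');
}

// Illustrative values only, not a capture from a real browser.
const example = ja3String({
  version: 771,
  ciphers: [0x0a0a, 4865, 4866],
  extensions: [0x1a1a, 0, 10, 11],
  curves: [0x2a2a, 29, 23],
  pointFormats: [0],
});
console.log(example);
```

Two clients that offer different cipher or extension lists produce different strings, which is why a headless build can stand out before it has sent a single header.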

Fix. Use a real Chrome build (not Chromium), or proxy through a TLS-faithful relay. BrowserForHire's gateway runs branded Chrome with the same JA3 a real desktop emits.
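If you run the browser yourself, a minimal sketch of the "real Chrome" half of that fix could look like this. It relies on Puppeteer's `channel` and `executablePath` launch options; the helper name and the optional `chromePath` argument are assumptions for this example, and whether it is enough depends on your environment.

```javascript
// Build Puppeteer launch options that prefer an installed, branded Chrome
// over the browser Puppeteer downloads. `chromePath` is optional; when it
// is omitted we ask Puppeteer to locate stable Chrome via `channel`.
function realChromeLaunchOptions(chromePath) {
  const options = {};
  if (chromePath) {
    options.executablePath = chromePath; // explicit path to a Chrome binary
  } else {
    options.channel = 'chrome'; // system-installed stable Chrome
  }
  return options;
}

// Usage (needs the `puppeteer` package and Chrome installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(realChromeLaunchOptions());
```

Only one of the two options is set at a time, so the choice of binary is never ambiguous.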

2. HTTP/2 frame ordering

Real browsers send HTTP/2 frames in a specific order with specific priorities. Headless Chromium differs in subtle ways, and vendors such as DataDome and Akamai can flag the mismatch on the very first request.

Fix. Pin the Chrome major version and drop launch flags such as --disable-features=UseChromeOSDirectVideoDecoder that change the frame profile.
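One way to keep the pin honest is to check the running browser at startup. This is a sketch, assuming you decide on the expected major version yourself; `browser.version()` is Puppeteer's API, and the helper names and version numbers are illustrative.

```javascript
// Extract the major version from the string returned by browser.version(),
// e.g. "Chrome/124.0.6367.60" or "HeadlessChrome/124.0.6367.60".
function chromeMajor(versionString) {
  const match = /Chrome\/(\d+)\./.exec(versionString);
  return match ? Number(match[1]) : null;
}

// Throw early if the running browser is not the major version we pinned.
// `browser` is a Puppeteer Browser instance.
async function assertPinnedMajor(browser, expectedMajor) {
  const actual = chromeMajor(await browser.version());
  if (actual !== expectedMajor) {
    throw new Error(`Expected Chrome ${expectedMajor}, got ${actual}`);
  }
  return actual;
}
```

Failing fast here is cheaper than discovering a silent browser upgrade through a drop in success rate.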

3. Canvas + WebGL fingerprint

Sites render an invisible canvas and a WebGL triangle, hash the result, and compare against known headless signatures. navigator.webdriver === true is the obvious giveaway, but rendering itself is the deeper signal.

Fix. Patch navigator.webdriver, install a real GPU stack in the container (or use a service that does), and randomize the canvas hash within tolerable bounds.
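The `navigator.webdriver` part of that fix can be sketched with Puppeteer's `page.evaluateOnNewDocument`, which runs a script before any of the page's own scripts. The helper name is an assumption, and this covers only the one property; it does nothing for the canvas or WebGL signal.

```javascript
// Register a script that runs before page scripts on every navigation and
// makes navigator.webdriver report false, as in a non-automated Chrome.
// `page` is a Puppeteer Page.
function hideWebdriver(page) {
  return page.evaluateOnNewDocument(() => {
    Object.defineProperty(Navigator.prototype, 'webdriver', {
      get: () => false,
      configurable: true,
    });
  });
}

// Usage: call once per page, before the first page.goto():
//   await hideWebdriver(page);
//   await page.goto('https://example.com');
```

Detection scripts can sometimes spot the patch itself, so treat this as one layer rather than the whole fix.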

4. Behaviour signals

Scrolling at a constant velocity, clicking pixel-exactly on every button, never moving the mouse — these all flag automation.

Fix. Add jitter. Call page.mouse.move() to move to a random nearby point before each click, and scroll with page.evaluate(() => window.scrollBy({ top: 600, behavior: 'smooth' })) instead of teleporting to the bottom.
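Put together, the jitter idea might look like the sketch below. The helper names, the 60% click area, the step counts, and the delay ranges are all arbitrary choices for illustration; the `page.$`, `boundingBox`, `page.mouse`, and `page.evaluate` calls are Puppeteer's.

```javascript
// Pick a point inside the middle 60% of an element's bounding box so that
// clicks do not always land on the exact centre. `rng` returns [0, 1).
function jitteredPoint(box, rng = Math.random) {
  return {
    x: box.x + box.width * (0.2 + 0.6 * rng()),
    y: box.y + box.height * (0.2 + 0.6 * rng()),
  };
}

// Move to a random nearby point first, then to the target, then click.
// `page` is a Puppeteer Page; `selector` must match a visible element.
async function humanClick(page, selector) {
  const element = await page.$(selector);
  if (!element) throw new Error(`No element matches ${selector}`);
  const box = await element.boundingBox();
  if (!box) throw new Error(`${selector} is not visible`);
  const target = jitteredPoint(box);
  // Approach point within about 20px of the target, clamped to the viewport.
  const nearX = Math.max(0, target.x + 40 * (Math.random() - 0.5));
  const nearY = Math.max(0, target.y + 40 * (Math.random() - 0.5));
  await page.mouse.move(nearX, nearY, { steps: 8 });
  await page.mouse.move(target.x, target.y, { steps: 5 });
  await page.mouse.click(target.x, target.y);
}

// Scroll in a few uneven, smooth increments instead of jumping to the bottom.
async function humanScroll(page, times = 3) {
  for (let i = 0; i < times; i++) {
    const distance = 400 + Math.floor(Math.random() * 400);
    await page.evaluate(
      (top) => window.scrollBy({ top, behavior: 'smooth' }),
      distance
    );
    // Pause 300-1000 ms between scrolls.
    await new Promise((resolve) => setTimeout(resolve, 300 + Math.random() * 700));
  }
}
```

Passing `rng` into `jitteredPoint` keeps the randomness swappable, which makes the helper easy to test with a fixed value.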

5. CAPTCHA challenge handling

When you do hit a CAPTCHA, you need a solver. Don't try to OCR Turnstile yourself. Vendors like 2Captcha, CapSolver, and Anti-Captcha give you APIs.

Fix. Detect the challenge, route it to the appropriate vendor, and inject the returned token back into the page. BrowserForHire bundles this into a single ?solve=true flag.
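A do-it-yourself version of detect, route, and inject could be sketched as follows. The marker strings and response field names are the ones these widgets commonly use; `solve` is a hypothetical function you would write around your vendor's API (real vendors usually also need the widget's site key), so no vendor calls are shown.

```javascript
// Markers that commonly appear in the HTML of each challenge widget.
// reCAPTCHA is checked last because other widgets may add a
// "g-recaptcha-response" field for compatibility.
const CHALLENGE_MARKERS = [
  { kind: 'turnstile', patterns: ['challenges.cloudflare.com/turnstile', 'cf-turnstile'] },
  { kind: 'hcaptcha', patterns: ['hcaptcha.com', 'h-captcha'] },
  { kind: 'recaptcha', patterns: ['google.com/recaptcha', 'g-recaptcha'] },
];

// Form field each widget uses to submit its solved token.
const RESPONSE_FIELDS = {
  turnstile: 'cf-turnstile-response',
  hcaptcha: 'h-captcha-response',
  recaptcha: 'g-recaptcha-response',
};

// Return the challenge kind found in the HTML, or null if none is found.
function detectChallenge(html) {
  for (const { kind, patterns } of CHALLENGE_MARKERS) {
    if (patterns.some((pattern) => html.includes(pattern))) return kind;
  }
  return null;
}

// `page` is a Puppeteer Page. `solve(kind, pageUrl)` is a HYPOTHETICAL
// async function wrapping your vendor's API; it resolves to a token string.
async function handleChallenge(page, solve) {
  const kind = detectChallenge(await page.content());
  if (!kind) return null;
  const token = await solve(kind, page.url());
  await page.evaluate((name, value) => {
    const field = document.querySelector(`[name="${name}"]`);
    if (field) field.value = value;
  }, RESPONSE_FIELDS[kind], token);
  return kind;
}
```

Filling the response field is the simplest case; some sites also wait for the widget's callback to fire, which this sketch does not handle.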

6. IP reputation

Datacenter IPs are often flagged within seconds. Residential proxies cost more but are far less likely to be blocked.

Fix. Mix datacenter and residential. BrowserForHire's plans bundle a residential pool with transparent per-MB pricing and no premium-domain surcharges.
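If you manage proxies yourself, one simple way to mix the two is to start cheap and escalate only after a block. In this sketch the proxy hosts are placeholders, the block heuristic (HTTP 403 or 429) is an assumption, and `--proxy-server` is Chrome's launch flag.

```javascript
// HYPOTHETICAL proxy endpoints; replace with your provider's hosts.
const POOLS = {
  datacenter: 'http://dc.proxy.example:8000',
  residential: 'http://resi.proxy.example:8000',
};

// Start on cheap datacenter IPs and escalate to residential only after
// the target has blocked us `maxDatacenterBlocks` times.
function choosePool(blockedCount, maxDatacenterBlocks = 1) {
  return blockedCount >= maxDatacenterBlocks ? 'residential' : 'datacenter';
}

// Chrome's --proxy-server flag sets the proxy for the whole browser.
function proxyLaunchArgs(blockedCount, pools = POOLS) {
  return [`--proxy-server=${pools[choosePool(blockedCount)]}`];
}

// Treat common block responses as a signal to escalate (an assumption;
// tune this to what your target actually returns).
function looksBlocked(status) {
  return status === 403 || status === 429;
}

// Usage (needs the `puppeteer` package):
//   const browser = await puppeteer.launch({ args: proxyLaunchArgs(blocks) });
```

Escalating per target keeps residential bandwidth, the expensive part, for the sites that actually need it.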

7. Persistent state

Sites watch for new sessions hitting the same path and treat them as bots. A real user has cookies from yesterday.

Fix. Persist storage state across runs. Our gateway supports ?session=foo to carry cookies and localStorage between requests.
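For a self-hosted browser, a rough equivalent is to give each named session its own Chrome profile directory through Puppeteer's `userDataDir` launch option. The helper name and the `./sessions` base directory are assumptions; this is a local sketch, not how the gateway's ?session parameter is implemented.

```javascript
// Map a session name to a dedicated Chrome profile directory. Launching
// with the same `userDataDir` reuses that profile's cookies and localStorage.
// `baseDir` is an assumed location; pick any writable path.
function sessionLaunchOptions(sessionName, baseDir = './sessions') {
  // Keep only safe characters so the name cannot escape baseDir.
  const safeName = String(sessionName).replace(/[^a-zA-Z0-9_-]/g, '_');
  if (!safeName) throw new Error('sessionName must not be empty');
  return { userDataDir: `${baseDir}/${safeName}` };
}

// Usage (needs the `puppeteer` package):
//   const browser = await puppeteer.launch(sessionLaunchOptions('foo'));
```

Reusing one profile per logical user means the second run arrives with yesterday's cookies instead of a blank slate.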

The lazy fix

If all of the above sounds like a quarter of work you don't have, that's exactly why we built BrowserForHire. One connect URL handles everything above, and we publish a weekly success rate so you know it still works.

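```javascript
import { chromium } from 'playwright' // connectOverCDP is a Playwright API
const BFH = process.env.BFH_TOKEN // your API token; the env var name is an assumption
const browser = await chromium.connectOverCDP(
  'wss://cloud.browserforhire.com?token=' + BFH + '&stealth=true'
)
```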

That's it. Ship.
