In search engine optimization engineering, automated testing is critical to a stable organic search presence. Despite the utility of a framework like Puppeteer, there are a surprising number of hoops to jump through before the library will run properly.
In an exciting turn of events, Cloudflare recently announced the ability to run Puppeteer within a Cloudflare Worker as part of their Workers Browser Rendering API, which is currently in closed beta, but accepting new clients via a waitlist.
This opens the door to a wide variety of possibilities. Many developers, myself included, have been relying on Google Cloud Functions, AWS Lambda, or even a dedicated virtual machine to handle the bloat of running a full web browser as part of an application.
Today, you can make fetch calls from a Cloudflare Worker to a cloud function (Google Cloud, AWS, DigitalOcean, Heroku, etc.) that runs the browser for you. You have 30 seconds to respond, which is normally enough time, or up to 15 minutes on a scheduled Cloudflare Worker.
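As a rough sketch of that existing workaround, the helper below posts a target URL to an external rendering function and returns the HTML it sends back. The JSON request shape, the helper names, and the idea that the function replies with rendered HTML are all assumptions for illustration; the fetch implementation is passed in so the logic does not depend on any particular runtime.

```typescript
// Minimal structural types so the sketch does not rely on runtime typings.
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
interface HttpRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}
type FetchLike = (url: string, init: HttpRequestInit) => Promise<HttpResponse>;

// Build the POST request for the (hypothetical) cloud function.
// The { url } payload is an assumed contract, not a real API.
function buildRenderRequest(targetUrl: string): HttpRequestInit {
  return {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ url: targetUrl }),
  };
}

// Ask the cloud function to render `targetUrl` and return the response body.
async function renderRemotely(
  fetchImpl: FetchLike,
  endpoint: string,
  targetUrl: string,
): Promise<string> {
  const res = await fetchImpl(endpoint, buildRenderRequest(targetUrl));
  if (!res.ok) {
    throw new Error(`Render function responded with HTTP ${res.status}`);
  }
  return res.text();
}
```

Inside a Worker's fetch handler this could be called with a thin wrapper such as `(url, init) => fetch(url, init)` as the first argument and your own function's address as the endpoint.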
Community Cloudflare Excitement and Innovation
Taking a step back, it seems I was not the only one wanting this feature:
You know what would be a killer @Cloudflare feature, one click to enable pre-rendering for SPAs when a bot/crawler is detected.
— Adam Wathan (@adamwathan) November 26, 2019
Adam Wathan of TailwindCSS called out pre-rendering for SPAs when a bot or crawler is detected as a killer feature that Cloudflare could build. Cloudflare already offers similar integrations, such as IndexNow and Signed Exchanges (SXGs).
After the announcement, there was more discussion on how this could have a major impact on tools that are built on top of Puppeteer.
Drop in replacement for current puppeteer solution. 😍
— Adam Janiš (@adam_janis) November 16, 2022
👆 means we can keep our interface intact, and you can still take screenshots with a single command inside our Cloudflare for Platforms Workers. 🚀 https://t.co/jMMTI275wI pic.twitter.com/EYqr7ohGwV
Repeat.dev offers a webhook to schedule tasks and generate PDFs, which is exactly the use case Cloudflare calls out, and they hope to integrate the solution soon.
Cloudflare is launching Puppeteer support inside of Workers. Should make PDFs and screenshot workflows a lot easier.
— Wes Bos (@wesbos) November 16, 2022
Just last week I had hit the 50mb limit of a lambda, and couldn't use a layer to run puppeteer on Netlify Functions. https://t.co/HfemRBLCtU
The timing of this tweet was perfect, as the feature was announced just a few days later:
@KentonVarda Is it possible to run a chrome/puppeteer instance in cloudflare workers? Or is that crazy talk?
— Dash 🍜 (@x8BitRain) November 14, 2022
Example Cloudflare Puppeteer Integration
I have requested early access; in the meantime, here is what we know about what Cloudflare is offering and how easily it could be integrated.
Puppeteer can be easily imported as a package, specifically built for Cloudflare.
import puppeteer from '@cloudflare/puppeteer'
You can create a browser instance and launch a page, just like you would in any Puppeteer script. It will be interesting to see if there are Cloudflare-specific methods and/or missing methods from the normal Puppeteer package.
// Launch a browser through the Worker's browser binding
const browser = await puppeteer.launch({
  browserBinding: env.MY_BROWSER
})
// Open a page, navigate, and capture a screenshot
const page = await browser.newPage()
await page.goto("https://example.com/")
const img = await page.screenshot() as Buffer
await browser.close()
This code example that Cloudflare provided even showcases how you can upload the screenshot to R2 storage, which is a Cloudflare object storage solution, much like AWS S3 or Google Cloud Storage.
try {
  await env.MY_BUCKET.put("screenshot.jpg", img);
  return new Response(`Success!`);
} catch (e) {
  return new Response('', { status: 400 })
}
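One thing worth noting about the snippet above: it always writes to the same key, so each run would overwrite the previous capture. Here is a small sketch of a helper that derives a distinct key per URL and capture time. The `screenshotKey` name and the `screenshots/` prefix are my own inventions, and the extension is a parameter because Puppeteer's `page.screenshot()` produces PNG by default unless a `type` option is passed.

```typescript
// Hypothetical helper: derive a readable, unique object key for a capture.
function screenshotKey(
  pageUrl: string,
  capturedAt: Date,
  extension: string = "png",
): string {
  const slug = pageUrl
    .replace(/^https?:\/\//i, "") // drop the scheme
    .replace(/[^a-zA-Z0-9]+/g, "-") // collapse everything else into dashes
    .replace(/^-+|-+$/g, "") // trim leading and trailing dashes
    .toLowerCase();
  // ISO timestamp with ":" and "." swapped for "-" to keep the key tidy
  const stamp = capturedAt.toISOString().replace(/[:.]/g, "-");
  return `screenshots/${slug}/${stamp}.${extension}`;
}
```

With this in place, the upload line could become `await env.MY_BUCKET.put(screenshotKey("https://example.com/", new Date()), img)`.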
What kind of limitations will Puppeteer have on Cloudflare?
Since Cloudflare is a CDN and Workers operate on the edge, I’m curious to see what limits or backdoors this opens up in the world of scraping.
Cloudflare Workers already allow you to visit sites that normally would block you if you are making requests from another server. Because Cloudflare Workers are on the Edge, you are less likely to be blocked.
One limit that I came across recently was around subrequests. Rather than have the client make 50 individual requests and render the data on the fly, I had the client make 1 request to a Cloudflare Worker, which would then wrap the 50 requests and respond with the final result.
This allows for heavy lifting, such as filtering unique values and sorting arrays, before anything is sent back to the client. The front end needs none of that business logic; it simply takes a payload and renders a view.
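The fan-out pattern described here can be sketched roughly as follows. The function names are hypothetical, and I am assuming each upstream request resolves to a list of strings; the upstream call is injected so the merge logic stays independent of the runtime.

```typescript
// Assumed shape: each upstream call resolves to a list of string values.
type FetchValues = (url: string) => Promise<string[]>;

// Flatten the per-request results, drop duplicates, and sort them.
function mergeUniqueSorted(batches: string[][]): string[] {
  const unique = new Set<string>();
  batches.forEach((batch) => batch.forEach((value) => unique.add(value)));
  return Array.from(unique).sort();
}

// One client request in, many subrequests out, one tidy payload back.
async function aggregate(
  urls: string[],
  fetchValues: FetchValues,
): Promise<string[]> {
  const batches = await Promise.all(urls.map((url) => fetchValues(url)));
  return mergeUniqueSorted(batches);
}
```

Note that this does nothing to raise the subrequest limit itself; it only moves the requests and the cleanup work off the client and into the Worker.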
What are some possible projects for Puppeteer on the Edge?
The obvious use case is crawling a page and parsing the content for SEO insights:
- Page Title
- Meta Tags
- Headings (H1-H6)
- Extracting Text
- Extracting Links
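To make that list concrete, here is a simplified sketch that pulls those fields out of an HTML string. To be clear, this swaps in plain regular expressions so it runs anywhere; it is not a real HTML parser and will miss plenty of edge cases (for example, it only reads a meta description whose `name` attribute comes directly before `content`). In an actual Puppeteer script you would more likely query the live DOM with something like `page.title()` and `page.$$eval()`. The `summarize` and `textOf` names are my own.

```typescript
interface SeoSummary {
  title: string | null;
  metaDescription: string | null;
  headings: { level: number; text: string }[];
  links: string[];
}

// Strip tags and collapse whitespace to get readable text.
function textOf(fragment: string): string {
  return fragment.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Regex-based extraction of title, meta description, H1-H6, and link hrefs.
function summarize(html: string): SeoSummary {
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const descMatch =
    /<meta\s+name=(["'])description\1\s+content=(["'])([\s\S]*?)\2/i.exec(html);

  const headings: { level: number; text: string }[] = [];
  const headingRe = /<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi;
  let match: RegExpExecArray | null;
  while ((match = headingRe.exec(html)) !== null) {
    headings.push({ level: Number(match[1]), text: textOf(match[2]) });
  }

  const links: string[] = [];
  const linkRe = /<a\s[^>]*href=["']([^"']+)["']/gi;
  while ((match = linkRe.exec(html)) !== null) {
    links.push(match[1]);
  }

  return {
    title: titleMatch ? textOf(titleMatch[1]) : null,
    metaDescription: descMatch ? descMatch[3] : null,
    headings,
    links,
  };
}
```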
There are already a ton of businesses built on top of Puppeteer:
- API & E2E Monitoring
- PDF/Gif/Screenshot Rendering
- Product Monitoring
Specifically for the SEO community, here’s a list of opportunities that I am excited about:
- Edge Crawling – a Screaming Frog-style crawler built on Cloudflare that exports to R2 storage. No server, no local machine, just Edge.
- Crawl + AI – Fetch a SERP, parse out the content, and use the content to generate new content to upload to your site.
- SEO Alerts – Crawl your clients’ sites to monitor for changes and report issues.
- SEO + NLP – Crawl sites, extract text, and process NLP for better insights and opportunities.
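For the SEO Alerts idea in particular, the core of it is comparing what a crawl found today against what it found last time. Below is a minimal sketch, assuming each crawl is boiled down to a flat map of field names to values; the `Snapshot` shape and the `diffSnapshots` name are hypothetical.

```typescript
// Assumed shape: one crawl of one page, reduced to field name -> value.
type Snapshot = Record<string, string>;

interface Change {
  field: string;
  before: string | null; // null when the field is newly added
  after: string | null; // null when the field was removed
}

function valueOf(snapshot: Snapshot, field: string): string | null {
  return Object.prototype.hasOwnProperty.call(snapshot, field)
    ? snapshot[field]
    : null;
}

// Report every field that was added, removed, or changed between two crawls.
function diffSnapshots(previous: Snapshot, current: Snapshot): Change[] {
  const fields = Array.from(
    new Set(Object.keys(previous).concat(Object.keys(current))),
  ).sort();
  const changes: Change[] = [];
  fields.forEach((field) => {
    const before = valueOf(previous, field);
    const after = valueOf(current, field);
    if (before !== after) changes.push({ field, before, after });
  });
  return changes;
}
```

A non-empty result is what would trigger the alert, whether that means an email, a Slack message, or a row in a report.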
There are a ton of low-hanging opportunities as well. Even something as simple as status code monitoring could be configured to find broken links and redirect issues.
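Status code monitoring does not even need a browser. Here is a rough sketch of how the check might look; the category names and function names are my own, and the status lookup is injected so the sketch is not tied to a specific runtime.

```typescript
type LinkState = "ok" | "redirect" | "broken" | "server-error" | "other";

// Bucket an HTTP status code into a coarse category.
function classifyStatus(status: number): LinkState {
  if (status >= 200 && status < 300) return "ok";
  if (status >= 300 && status < 400) return "redirect";
  if (status >= 400 && status < 500) return "broken";
  if (status >= 500 && status < 600) return "server-error";
  return "other";
}

interface LinkReport {
  url: string;
  status: number;
  state: LinkState;
}

// Check every URL and keep only the ones that need attention.
async function auditLinks(
  urls: string[],
  getStatus: (url: string) => Promise<number>,
): Promise<LinkReport[]> {
  const reports = await Promise.all(
    urls.map(async (url) => {
      const status = await getStatus(url);
      return { url, status, state: classifyStatus(status) };
    }),
  );
  return reports.filter((report) => report.state !== "ok");
}
```

In a Worker, `getStatus` could be as small as a `fetch` call with `redirect: "manual"` that returns `response.status`, so redirects are reported rather than silently followed.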
What will you build or want to build with Puppeteer on the Edge? Hit me up on Twitter – @johnmurch with your thoughts.
Want to see how iPullRank can set your organization up with SEO monitoring automation? Check out how we help advanced SEO teams and contact us for any projects you want to launch.
- Puppeteer on The Edge: SEO Use Cases with Cloudflare’s New Rendering API (beta) - November 23, 2022