Earlier this week, Cloudflare announced the introduction of their Browser Crawl Endpoint.
This allows Cloudflare users to crawl an entire website by making a single API call to the Browser Rendering service.
Although the Browser Rendering service honours robots.txt, Cloudflare doesn't define a specific User-Agent that the service will check for. Instead, they apparently expect website operators to disallow all user agents if they want to keep Cloudflare out.
However, they have also documented that the service includes Cloudflare-specific request headers, so requests can be blocked by checking for those.
This post details how to achieve that on BunnyCDN, Nginx and OpenResty.

The Headers
The relevant header names are...
