BrowserCat

BrowserCat is a “serverless” platform for working with headless browsers, including Chromium, Firefox, and WebKit. With just a single line of code, developers can run commands within a browser context, scaling up with demand and paying only for what they use.
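As a rough sketch of what that single line can look like with Playwright: its `browserType.connect(wsEndpoint, options)` call accepts a WebSocket URL and custom headers. The endpoint URL, the `Api-Key` header name, and the `browserCatConnectArgs` helper below are assumptions for illustration, so check them against your account's documentation before relying on them.

```typescript
// Arguments for Playwright's `browserType.connect(wsEndpoint, options)`.
interface ConnectArgs {
  wsEndpoint: string;
  options: { headers: Record<string, string> };
}

// Assumed endpoint; not confirmed by this write-up.
const ASSUMED_ENDPOINT = 'wss://api.browsercat.com/connect';

// Hypothetical helper: packages an API key into connect() arguments.
function browserCatConnectArgs(
  apiKey: string,
  endpoint: string = ASSUMED_ENDPOINT,
): ConnectArgs {
  if (apiKey.trim() === '') {
    throw new Error('An API key is required');
  }
  // 'Api-Key' is an assumed header name.
  return { wsEndpoint: endpoint, options: { headers: { 'Api-Key': apiKey } } };
}

// Usage with Playwright installed (left as a comment so the sketch has no dependencies):
//   import { chromium } from 'playwright';
//   const { wsEndpoint, options } = browserCatConnectArgs(myApiKey);
//   const browser = await chromium.connect(wsEndpoint, options); // the "single line"
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
//   await browser.close();
```

Everything after the `connect` call is ordinary Playwright code, which is the point: the remote browser behaves like a local one.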

With programmatic access to a browser, you can:

  • Automate complex workflows of clicking, scrolling, copy-pasting, and form filling, saving weeks of effort in just a few hours of code.
  • Crawl and scrape interactive websites, even those that require log-in.
  • Generate beautiful, dynamic images and interactive PDFs using HTML, CSS, and JS.
  • Record videos of scripted workflows for bespoke marketing, up-to-date documentation, or quality assurance.
  • Give your AI agents access to the internet, with persistent sessions and no limits whatsoever.
  • Speed up your E2E tests with maximum parallelization, and even run them against production regularly for the ultimate peace of mind.

There are niche services targeting some of the specific use-cases outlined above. However, they frequently sacrifice flexibility to optimize for narrow use-cases. In my experience, you quickly reach the limits of what they offer, and you’re left in the lurch when you find you have to host and scale headless browsers yourself.

BrowserCat is the flexible, fast, and affordable option.

It’s the platform more so than the product.

Background

After announcing that I would retire SiteArcade, I spent about six months researching startup ideas. I wanted to make sure I didn’t repeat any of the mistakes I’d made previously.

The right project needed to be (1) the right size for a solopreneur, (2) with the potential for funding after product-market fit, (3) in a proven market without a clear “best,” (4) with room for many pivots, (5) building on my existing skill set, (6) interesting to work on for a long time, and (7) where I have lots of ideas for how to improve things.

While working on SiteArcade, I’d done a lot of work with headless browsers. First, for crawling Amazon reliably. Second, for generating dynamic user logos and brand assets on demand. And third, for generating beautiful, interactive PDF press kits. So I had a lot of ideas in this space, and I kept coming back to it.

At the same time, I was riding the cutting edge of generative AI, and I came to believe that in the future, not only would developers need easy, programmatic access to the web… but so would AI agents.

Thus, BrowserCat was born.

Goals

BrowserCat aims to target three discrete user populations, in series:

  1. Developers
  2. AI agents
  3. Non-technical users

At minimum, devs require a stable platform with a fantastic developer experience. Building on this baseline, they’re looking for dedicated APIs for specific use-cases, long-running sessions, large file storage, data aggregation, job scheduling, and integrations with other automation tools, like Zapier.

AI agents benefit from the above foundation, but they also require training, embeddings, and integrations to meet the varied use-cases users have brought up so far.

Lastly, non-technical users benefit from both the platform and the AI, but they also need GUIs to accomplish what the other users would do with code. This means in-browser tools for creating crawlers, parameterized workflows, and image/PDF templates.

At every point in this roadmap, there are ample opportunities for monetization, as well as many, many paths to product-market fit.

Challenges

Part of why I chose this project is that there are so many opportunities to grow.

Just to launch, I had to solve for:

  • Global deployment of heavy browser containers with many permutations (browser type * agent version).
  • A data pipeline to accurately track usage, even in cases of failure.
  • Security concerns when different users connect to the same container.
  • Everyday SaaS and development issues.
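To illustrate the usage-tracking point, here is a minimal sketch of one way to meter a session so that a record is written even when the work fails. It is not BrowserCat's actual pipeline: `UsageRecord`, `UsageSink`, and `withUsageTracking` are hypothetical names, and a real sink would write to a database or queue instead of an in-memory callback.

```typescript
// One usage record per session, written whether or not the work succeeded.
interface UsageRecord {
  sessionId: string;
  durationMs: number;
  outcome: 'ok' | 'error';
}

// Hypothetical sink that receives each finished record.
type UsageSink = (record: UsageRecord) => void;

async function withUsageTracking<T>(
  sessionId: string,
  sink: UsageSink,
  work: () => Promise<T>,
  now: () => number = Date.now, // injectable clock, handy for testing
): Promise<T> {
  const startedAt = now();
  let outcome: UsageRecord['outcome'] = 'error';
  try {
    const result = await work();
    outcome = 'ok';
    return result;
  } finally {
    // Runs on success and on failure, so failed sessions are still metered.
    sink({ sessionId, durationMs: now() - startedAt, outcome });
  }
}
```

The `finally` block is the design choice that matters here: the error from `work` still propagates to the caller, but only after the usage record has been handed to the sink.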

But I’m even more excited by what’s to come:

  • Various integrations with AI.
  • Support for multiple connection agents beyond Playwright (e.g. Puppeteer, Cypress, and Selenium).
  • A rewrite of the WebDriver WebSocket server in a compiled language for better speed, lower cost, and support across multiple agents and agent versions, and eventually for WebDriver BiDi.
  • Best-in-class API development for specific use-cases, such as scraping and templated image generation.
  • Best-in-class UI design for specific use cases, such as creating automation flows and custom crawlers.

Tech Stack

For the initial BrowserCat tech stack, I prioritized rapid development over my long-term vision for the service. Bootstrapped startups have no guarantee of success until they’re profitable, so it doesn’t pay to invest limited resources in concerns that will only matter at (literally) 1,000x - 10,000x scale.

That said, in the long run I expect to drop many of these services in favor of centralizing my infrastructure. Working piecemeal has introduced microservice complexity that won’t be justified long-term. Having used CDK + AWS extensively, I see it as a natural choice, though Fly.io has served me very well with its scale-to-zero and instant-on containers.

Applications:

Shared Resources:

  • Payments: Billing and subscriptions via Stripe.
  • Authentication: For user and org management, I use Clerk, but I manage API keys locally.
  • Database: I use PostgreSQL, with pg_cron as a makeshift data pipeline. Hosted on Supabase.
  • File Storage: I host user-generated content in AWS S3, and website assets in Cloudinary.
  • Monitoring: Error handling with Sentry.io. Logs and uptime monitoring with BetterStack.