Ivan Tarin
ScraperAPI is a web scraping platform, built with developers in mind, that lets companies collect the data they need from HTML sites across the web. ScraperAPI was in DigitalOcean’s Hatch startup incubation program from day one and has since grown to serve tens of thousands of companies. As a bandwidth-heavy data service experiencing tremendous growth, their tool was beginning to have problems. Their solution involved migrating from VMs to DigitalOcean Kubernetes (DOKS). Now, their tool can scrape 14,000 websites per second and handle 36 billion requests a month. Their confidence in their infrastructure allows them to offer 99.9% uptime to their clients. Learn how they sped up new feature releases by starting on the DigitalOcean App Platform and migrating to DOKS.
Like most startups using DigitalOcean products, ScraperAPI started with Droplets (VMs) and scaled into the hundreds. Eventually, they found that it wasn’t the right architecture to support their bandwidth-intensive app and rapid growth. They were tired of manually writing code to sort through user agents and rotate IP addresses. They needed more control, performance, and reliability. The company’s monolithic app was also struggling to handle the increasing demand. They opted to migrate to DigitalOcean Kubernetes for scalability and convenience.
“We used to run 100+ Droplets. We converted to DigitalOcean Kubernetes in 2020, going from partially to fully managed DigitalOcean services. High-scale transactional apps run optimally if one instance has a few things [cloud-native patterns]. That’s where Kubernetes comes in. I’m a big fan of smaller pods but many. If you replicate with VMs, you can’t go below one core. Kubernetes allows more granularity.” –Zoltan Bettenbuk, CTO of ScraperAPI
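To make the granularity point concrete, here is a minimal Python sketch comparing how many replicas fit into the same CPU budget when each replica needs a whole core (the VM case) versus a fraction of a core (the pod case). The 8-core budget and the 250-millicore pod request are assumptions chosen for illustration, not ScraperAPI's actual sizing.

```python
# Illustrative comparison of replica granularity: whole-core VMs vs. fractional-CPU pods.
# All numbers here are assumptions for the sketch, not ScraperAPI's real sizing.

MILLICORES_PER_CORE = 1000  # Kubernetes expresses CPU in millicores ("250m" = 0.25 core)


def max_replicas(total_cores: int, cpu_per_replica_m: int) -> int:
    """Return how many replicas fit into the given CPU budget."""
    if cpu_per_replica_m <= 0:
        raise ValueError("cpu_per_replica_m must be positive")
    return (total_cores * MILLICORES_PER_CORE) // cpu_per_replica_m


if __name__ == "__main__":
    budget_cores = 8  # hypothetical CPU budget
    vm_replicas = max_replicas(budget_cores, 1000)  # a VM cannot go below one core
    pod_replicas = max_replicas(budget_cores, 250)  # hypothetical pod requesting 250m
    print(f"VM replicas:  {vm_replicas}")
    print(f"Pod replicas: {pod_replicas}")
```

Under these assumptions the same budget holds four times as many pod replicas as VM replicas, which is the "smaller pods but many" pattern Bettenbuk describes.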
ScraperAPI used the lift and shift method to migrate to DOKS. This involves moving the workloads that ran on existing VMs into containers on Kubernetes without making profound changes to the architecture, configuration, or code.
Today, they still depend on the same monolithic app they migrated to DOKS in 2020. However, they’ve discovered a workflow to break up their monolith and add new features continuously, all while keeping their team small. ScraperAPI operates as a DevOps shop where all the engineers manage the environment and code. Their website’s entry point is a dashboard and API. ScraperAPI has a team of five engineers managing the infrastructure. They use DOKS and other managed services to scale and manage their resources.
First, they create one or more proofs-of-concept (POCs) on the DigitalOcean App Platform. The POCs can be new features or, when it makes sense, services broken off from their monolith. Sometimes they decide to stay on the App Platform, where their website’s UI and console exist today.
When the team agrees on a POC, they migrate it to DOKS to scale it further, using GitHub Actions to automate the migration from the App Platform to DOKS. They appreciate the App Platform’s simplicity: GitHub Actions delivers source code to the App Platform, which then containerizes the app and deploys it. Bettenbuk loves uploading raw code to the App Platform. “I can start on the App Platform without changing my code,” says Bettenbuk. The most tedious work for the team is writing a Dockerfile for each of their microservices, though they can automate more with tools like buildpacks, which don’t require writing Dockerfiles.
Afterward, GitHub Actions creates YAML files for Kubernetes and uploads their image to their DigitalOcean Container Registry. At first, they used the GitHub Container Registry to store their images. After testing, they found the DigitalOcean Container Registry to be faster. The only change to their DOKS cluster is creating a namespace, which is also automated.
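As a rough illustration of what such an automated step might generate, the Python sketch below builds a Namespace and a Deployment manifest pointing at an image in DigitalOcean Container Registry. It is not ScraperAPI's actual pipeline: the registry name, service name, tag, replica count, and the choice of one namespace per service are all hypothetical. It also emits JSON rather than YAML so that it needs only the standard library; kubectl accepts either format.

```python
import json

# DigitalOcean Container Registry images live under this hostname.
REGISTRY_HOST = "registry.digitalocean.com"


def image_ref(registry: str, service: str, tag: str) -> str:
    """Build a full image reference, e.g. registry.digitalocean.com/<registry>/<service>:<tag>."""
    return f"{REGISTRY_HOST}/{registry}/{service}:{tag}"


def namespace_manifest(name: str) -> dict:
    """The namespace is the only cluster-level change the article mentions."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def deployment_manifest(service: str, namespace: str, image: str, replicas: int = 3) -> dict:
    """A bare-bones Deployment; the replica count is an assumption."""
    labels = {"app": service}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": service, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": service, "image": image}]},
            },
        },
    }


def render(service: str, registry: str, tag: str) -> str:
    """Bundle both manifests into a single Kubernetes List document."""
    image = image_ref(registry, service, tag)
    items = [namespace_manifest(service), deployment_manifest(service, service, image)]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    # Hypothetical names; the article does not give the real ones.
    print(render(service="example-poc", registry="example-registry", tag="abc123"))
```

In a CI job, the printed document could be written to a file and applied with `kubectl apply -f`, after the image has been pushed to the registry.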
ScraperAPI’s preference for DigitalOcean is based on managed services’ pricing, scalability, and convenience. Their transition to DOKS has allowed them to scale efficiently and save time.
“At a previous company, we used a bare metal service provider and scaling a database could take upwards of two weeks; the provider had to provision new nodes. One of the best things is I can now scale in less than a minute.” –Zoltan Bettenbuk, CTO of ScraperAPI
They enabled the DigitalOcean cluster autoscaler, which controls costs by automatically adjusting the number of nodes in the cluster. ScraperAPI also activated the High Availability control plane for its reliability (99.95% uptime SLA for DOKS). Managed services allow ScraperAPI to focus on other essential aspects of its business: building new features instead of managing infrastructure.
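The idea behind node autoscaling can be sketched in a few lines of Python. This is a deliberately simplified model, not the cluster autoscaler's real algorithm, which reacts to pods that cannot be scheduled and to underused nodes and weighs memory and other constraints as well as CPU. The pod requests, node capacity, and pool bounds below are assumed values.

```python
import math
from typing import Sequence


def desired_nodes(
    pod_cpu_requests_m: Sequence[int],
    node_allocatable_m: int,
    min_nodes: int,
    max_nodes: int,
) -> int:
    """Estimate the node count needed to fit the CPU requests, clamped to the pool bounds.

    Simplified model: it considers only the sum of CPU requests in millicores
    and ignores memory, bin-packing effects, and scheduling constraints.
    """
    if node_allocatable_m <= 0:
        raise ValueError("node_allocatable_m must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    needed = math.ceil(sum(pod_cpu_requests_m) / node_allocatable_m)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # Hypothetical workload: 40 pods requesting 250m each, on nodes with
    # 3900m allocatable CPU, in a pool allowed to range from 2 to 10 nodes.
    pods = [250] * 40
    print(desired_nodes(pods, node_allocatable_m=3900, min_nodes=2, max_nodes=10))
```

The lower bound keeps a baseline of capacity available, and the upper bound caps spend, which is how an autoscaled pool helps control costs during quiet periods without giving up headroom for spikes.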
DigitalOcean Kubernetes (DOKS) was the right choice for ScraperAPI, providing easy scalability, cost savings, and a reliable managed database service. If you’re considering migrating to DigitalOcean, explore DOKS as an option to manage your infrastructure, enabling you to focus on your business, not on managing servers.
Ask our experts if App Platform or DigitalOcean Kubernetes is right for your business.
Read the full story of How ScraperAPI scaled their data-heavy business with DigitalOcean Managed Databases.