How This Unsexy Strategy Helped These Startups Acquire Millions of Users in Record Time

Forget the 3 comma club. Here’s how to join the trillion-dollar club.

As of today, March 28, 2023, Airbnb, Amazon, and Netflix have a cumulative market cap of $1.2+ trillion thanks to this one unsexy strategy.

To put that in perspective, if their combined worth were a country’s GDP, it would rank 15th in the world (right below 🇪🇸 Spain).

What’s their secret to rapid growth and market dominance?

It’s data extraction at scale (also known as scraping). It’s been used by the most explosive startups to acquire users and grow.

Read on to find out what web scraping is and how your business can benefit from using publicly available data.

Web Scraping: The Secret to Scalable Growth

In today’s digital economy, data is the new differentiator.

Having reliable data at your disposal can give your business a competitive edge.

Amazon (Market Cap: $1.01T)

Amazon leverages big data collected from the internet and from its customers’ behavior to update its product pricing approximately every ten minutes. 🤯

Its pricing is set according to general market trends, users’ shopping patterns, and business goals, among other factors.


By capturing big data, Amazon can smartly offer discounts on best-selling items while earning large profits on less popular products. This data-driven strategy has proven fruitful: Amazon doubled its annual sales from 2018 to 2021.

Netflix (Market Cap: $148.45B)

Netflix experienced similar success. They used web data acquisition to gather data about the preferences of their viewers and potential subscribers.

Unsurprisingly, many Netflix Original shows are hits, helping Netflix maintain a low churn rate of 2.4% from 2019 to 2021.

Airbnb (Market Cap: $74.50B)

In the early days of Airbnb, the company used Craigslist as a source of listings and scraped data from the site to populate its own platform.

This helped Airbnb rapidly acquire many listings and users.

These examples show that data harvesting is useful to all kinds of businesses, regardless of industry, type, or size.

Every organization that strives to scale should leverage publicly available data and use it to its advantage.

But how?
How can organizations collect web data at a large scale, automatically, and within minutes?

The answer is web scraping. 🤖

Three major benefits of data harvesting:

Insight into market conditions
Close observation of competitors
Deep understanding of consumer behavior

What is Web Scraping?

Web scraping is a method for extracting large amounts of data from the internet. This intelligent, automated approach gathers everything from prices and product specifications to property listings and other publicly available data.

The results can be exported in structured file formats such as XML or JSON.

Put simply, web scraping can be compared to “copy-pasting” content from websites, but it differs in the process and the tools needed to perform the action.

As you can imagine, data scraping requires a web scraper and a few lines of code to function. Common choices include Python with the BeautifulSoup library or the Scrapy framework.
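
To make “a few lines of code” concrete, here is a minimal sketch of the extraction step. It uses Python’s built-in `html.parser` module instead of BeautifulSoup so it runs without extra installs; the page markup, the class names, and the `ProductParser` class are hypothetical. With BeautifulSoup, the same lookup would typically be written with a CSS selector such as `soup.select("li.product")`.

```python
from html.parser import HTMLParser

# Hypothetical product page; a real scraper would get this HTML from an HTTP response.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Desk Lamp</span><span class="price">$24.99</span></li>
  <li class="product"><span class="name">Office Chair</span><span class="price">$149.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> inside <li class="product">."""

    def __init__(self):
        super().__init__()
        self.products = []  # one {"name": ..., "price": ...} dict per product
        self._field = None  # field the current <span> holds, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self.products.append({})  # start a new product record
        elif tag == "span" and self.products:
            for field in ("name", "price"):
                if field in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only keep text that sits inside a span we are tracking.
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()


parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.products)
# [{'name': 'Desk Lamp', 'price': '$24.99'}, {'name': 'Office Chair', 'price': '$149.00'}]
```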

Furthermore, unlike manual copy-pasting, a web scraper can harvest information from thousands of URLs by queuing requests in bulk.

This scalable solution eliminates manual intervention during the scraping process itself, saving you time and labor.

But Is Web Scraping Legal?

One general concern around web scraping is whether or not it’s legal.

No government has passed laws explicitly legalizing or outlawing web scraping as of 2023. Therefore, we can only make educated assumptions based on case law about web scraping (e.g., hiQ Labs v. LinkedIn) and other data-related regulations.

We know that web scraping itself is legal — but it can be illegal depending on what type of data you scrape and how you scrape it. In general, you can legally scrape the internet as long as:

The data is publicly available
You don’t scrape private information
You don’t scrape copyrighted data
You don’t need to create an account and log in to access the website, OR you have read and fully understood the Terms and Conditions (T&Cs)

⚠️ Disclosure: I’m no expert, and the information given is provided for informational purposes only. Please seek legal advice if you’re in doubt about your web scraping project to ensure you’re not scraping the web illegally.

The Standard Synchronous Web Scraping Process

There are two primary components of a web scraper: the web crawler and the web scraper itself.

Web crawlers

The web crawler works similarly to a search engine bot. It crawls a list of URLs and catalogs the information. Then, it visits all the links it can find within the current and subsequent pages until it hits a specified limit or there are no more links to follow.
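
A crawler like this is essentially a breadth-first traversal with a “seen” set and a page limit. The sketch below is illustrative only: `FAKE_SITE` and `fetch` are hypothetical stand-ins for real HTTP requests so the example runs offline, and `max_pages` plays the role of the “specified limit.”

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
FAKE_SITE = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/c">C</a> <a href="/">Home</a>',
    "https://example.com/b": '<a href="https://example.com/c#top">C again</a>',
    "https://example.com/c": "<p>No links here</p>",
}


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Hypothetical stand-in for an HTTP GET; returns '' for unknown URLs."""
    return FAKE_SITE.get(url, "")


def crawl(seeds, max_pages=100):
    """Breadth-first crawl until max_pages are visited or there are no more links to follow."""
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # "catalog" the page
        extractor = LinkExtractor()
        extractor.feed(fetch(url))
        for href in extractor.links:
            # Resolve relative links and drop #fragments so duplicates are recognized.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


print(crawl(["https://example.com/"]))
print(crawl(["https://example.com/"], max_pages=2))
```

A production crawler would also stay within the target domain and handle network errors; those details are left out to keep the traversal logic visible.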

Web scrapers

After the web crawler visits the target web pages, the web scraper collects the data. Data locators, an integral element of a web scraper, find, select, and collect the targeted data from a website’s HTML file at scale.

Simply put, this is how web crawling feeds into synchronous scraping: once a page is crawled, its data can be harvested. Only when the first scraping request is complete can you begin the next task.

Of course, the purpose of your scraping will always determine the type of scraper and the methods you use. Depending on your timeline and the volume of data you need to collect, you may face challenges when you use a standard synchronous scraper to complete multiple tasks. Why? Because each request is bound by a limited response window (timeouts), and failed tasks must be re-submitted.
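
The synchronous pattern and its pain points (one request at a time, timeouts, re-submission) can be sketched as follows. `scrape_sync` and `flaky_fetch` are hypothetical names; `flaky_fetch` only simulates timeouts, and a real fetcher would wrap an HTTP client and convert that client’s timeout exceptions into `TimeoutError`.

```python
def scrape_sync(urls, fetch, max_attempts=3, timeout=10.0):
    """Scrape URLs strictly one after another, retrying requests that time out."""
    results, failed = {}, []
    for url in urls:  # each URL waits until the previous one has finished
        for attempt in range(1, max_attempts + 1):
            try:
                results[url] = fetch(url, timeout=timeout)  # blocks until a response or a timeout
                break
            except TimeoutError:
                # A timed-out task must be re-submitted; here it is simply retried in place.
                if attempt == max_attempts:
                    failed.append(url)
    return results, failed


# Hypothetical fetcher: ".../slow" times out on its first call, ".../down" always times out.
calls = {}


def flaky_fetch(url, timeout):
    calls[url] = calls.get(url, 0) + 1
    if url.endswith("/down") or (url.endswith("/slow") and calls[url] == 1):
        raise TimeoutError(f"{url} did not respond within {timeout}s")
    return f"<html>{url}</html>"


urls = ["https://example.com/ok", "https://example.com/slow", "https://example.com/down"]
results, failed = scrape_sync(urls, flaky_fetch)
print(sorted(results))  # ['https://example.com/ok', 'https://example.com/slow']
print(failed)           # ['https://example.com/down']
```

Every retry and every slow page adds directly to the total run time, which is why this approach struggles as the number of URLs grows.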

With an asynchronous scraper service, you can scrape at scale without these problems. It requires less code and less infrastructure to build or maintain on your side. This fast, modern method lets you submit a large batch of requests at once while still aiming for the highest achievable success rate.

Once the job is done, you’ll be notified.
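
For comparison, here is a minimal client-side sketch of the asynchronous idea using Python’s `asyncio`: the whole batch is submitted at once, a semaphore caps how many requests are in flight, and a callback fires when the batch finishes. This is not ScraperAPI’s (or any service’s) actual API; `fake_fetch`, `run_batch`, and `on_done` are hypothetical names, and a hosted service would run the batch on its own infrastructure and might notify you through a webhook instead of a local callback.

```python
import asyncio


async def fake_fetch(url):
    """Hypothetical stand-in for a network request; sleeps instead of doing real I/O."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def run_batch(urls, on_done, concurrency=10):
    """Submit every URL at once, cap in-flight requests, and call on_done when all finish."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(url):
        async with semaphore:  # at most `concurrency` requests run at the same time
            try:
                return url, await fake_fetch(url), None
            except Exception as exc:  # record failures instead of aborting the whole batch
                return url, None, exc

    outcomes = await asyncio.gather(*(worker(u) for u in urls))
    succeeded = {url: html for url, html, err in outcomes if err is None}
    on_done(succeeded)  # the "you'll be notified" step
    return succeeded


notifications = []
urls = [f"https://example.com/page/{i}" for i in range(50)]
results = asyncio.run(run_batch(urls, notifications.append))
print(len(results), len(notifications))  # 50 1
```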

Web Scraping Process

Figure: Web scraping process (source: ScraperAPI white paper).

The web crawlers visit the given URLs.
The web scrapers request the page’s HTML file and parse the response to generate a node tree. Most web scrapers only parse the page’s HTML, but more advanced ones also render the full page, applying its CSS and executing its JavaScript.
The scraper bots extract the data based on pre-set criteria (name, address, price, etc.) by targeting elements with HTML tags or CSS/XPath selectors.
After the information is harvested, the scraper bots export the data into a database, spreadsheet, JSON file, or any other structured format, and it’s ready to be repurposed.
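
The extraction and export steps above can be sketched with the standard library alone. The page snippet and field names are hypothetical; Python’s ElementTree supports only a limited XPath subset and needs well-formed markup, so messy real-world HTML usually calls for an HTML-aware parser such as lxml or BeautifulSoup instead.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet standing in for a downloaded HTML file.
PAGE = """
<html><body>
  <div class="listing"><span class="name">Loft</span><span class="address">1 Main St</span><span class="price">120</span></div>
  <div class="listing"><span class="name">Cabin</span><span class="address">9 Lake Rd</span><span class="price">95</span></div>
</body></html>
"""

# Pre-set criteria: the fields to extract from each listing.
FIELDS = ["name", "address", "price"]

# Parse the markup into a node tree.
root = ET.fromstring(PAGE.strip())

# Locate listings and fields with ElementTree's XPath subset ([@attr='value'] predicates).
rows = [
    {field: listing.findtext(f"span[@class='{field}']") for field in FIELDS}
    for listing in root.iterfind(".//div[@class='listing']")
]

# Export as JSON...
as_json = json.dumps(rows, indent=2)

# ...and as CSV (an in-memory buffer here; open("listings.csv", "w", newline="") would write a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
as_csv = buffer.getvalue()

print(as_json)
print(as_csv)
```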

Learn Web Scraping: The Next Step

If you want to learn more about web scraping, I suggest starting with the basics and familiarizing yourself with the jargon. This will allow you to quickly search Google and find answers to any specific questions for your use case.

If you don’t know what “parallel requests,” “custom headers,” or “honeypots” are, you’ll have a hard time figuring out how to make things work.
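
As a quick illustration of one of those terms: “custom headers” are extra HTTP headers (such as `User-Agent` or `Accept-Language`) that you attach to a request. This sketch builds, but does not send, a request with Python’s `urllib.request`; the header values and the URL are hypothetical.

```python
from urllib.request import Request

# Hypothetical header values; which headers a site expects (or tolerates) varies.
headers = {
    "User-Agent": "ExampleResearchBot/1.0 (+https://example.com/bot-info)",
    "Accept-Language": "en-US,en;q=0.9",
}
req = Request("https://example.com/products", headers=headers)

# Request stores header names via str.capitalize(), so "User-Agent" becomes "User-agent".
print(req.get_header("User-agent"))
print(req.get_header("Accept-language"))

# Nothing has been sent yet; urllib.request.urlopen(req, timeout=10) would perform the request.
```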

If you’re interested, download this web scraping white paper (it’s free) to learn about:

🤖 Web scraping benefits and processes

💽 Types of data collection and web scrapers

😾 Common challenges (and how to overcome them)

✈️ Industries that use scrapers in their day-to-day tasks

🪄 Tips for using a web scraping API more effectively

Disclosure: I’m a growth consultant at ScraperAPI.

Tomas Laurinavicius, posted to Growth on March 28, 2023


Wow, this is an incredibly informative and well-researched post! It's amazing to see how data extraction has played such a crucial role in the success of these massive companies. You've inspired me to dive deeper into web scraping and learn how I could potentially apply it to my own projects. Thanks for sharing this wealth of knowledge! Keep up the great work and keep inspiring others! 🚀

Dfrankle · a year ago
  
  Hi David, I appreciate the comment!
  
  I'm a marketer and not very comfortable with technical stuff, but after seeing the use cases of web scraping, I'm inspired to learn how to utilize it to grow my startup and for fun.
  
  tomaslau · a year ago

That’s an interesting article, but I think the warning about legality should be significantly strengthened, especially in the area of copyright, because what is acceptable in one jurisdiction may not be acceptable in another.

In particular, simply because something is acceptable for a US company to do does not mean it is permissible in the many other countries in the world. The EU and the UK do not have the same laws as the US.

There’s also a difference between “criminalising” and civil law. A person or company can be innocent of any crime but still face civil action for matters such as copyright infringement. It’s possible that scraping data created with some form of creative input, such as news headlines or other created works, may fall foul of the law. This article provides a useful summary of the potential Intellectual Property risks which web scraping can lead to:

https://emlaw.co.uk/web-scraping-legal-issues/
(I have no connection with that firm)

The website owner can also include in their terms and conditions a provision that data scraping is not permitted.

So, my advice would be (this is not legal advice, and I’m not a lawyer, just someone with decades of business leadership experience): think very carefully indeed before you do this. Just because you know how to do something doesn't mean you should. Consult a qualified lawyer before you even plan it.

QSLIndie · a year ago
  Appreciate the comment.
  
  These are fair concerns, and I absolutely agree with the "Just because you know how to do something doesn't mean you should."
  
  I originally included this part regarding ethics and lawfulness:
  
  "We know that web scraping itself is legal — but it can be illegal depending on what type of data you scrape and how you scrape it. In general, you can legally scrape the internet as long as:
  
  The data is publicly available
  
  You don’t scrape private information
  
  You don’t scrape copyrighted data
  
  You don’t need to create an account and log in to access the website, OR you have read and fully understood the Terms and Conditions (T&Cs)
  
  ⚠️ Disclosure: I’m no expert, and the information given is provided for informational purposes only. Please seek legal advice if you’re in doubt about your web scraping project to ensure you’re not scraping the web illegally."
  
  What else would you add to this list of warnings?
  
  tomaslau · a year ago
    
    Hi Tomas,
    
    Thanks for your reply.
    
    I think you have correctly included all the main warning headlines. My concern was more that a reader who is super-excited about tech and the latest new thing, but finds business a bit boring, could rush into a position of exposure very quickly without realising the real-world risks.
    
    I think the only other warning I'd add is more of an ethical one: "Would you like your own carefully created business data to be scraped?"
    
    Best - and good luck!
    
    QSLIndie · a year ago

Super interesting, Tomas. Thank you for sharing. I am building something that involves scraping several sources, including Google News, SERPs, LinkedIn, and Crunchbase. Would love to connect and exchange ideas if you are open to it.
https://www.linkedin.com/in/tthiele/

timbuildingclardo · a year ago