Web Scraping using Cypress..!

Gyan Vardhan
Jan 13, 2021
Web Scrape with Cypress

In my previous blogs, I discussed web scraping with different tools, as well as its legality.

In this session, I’ll show you how to web scrape using Cypress..!

In this era of technology, don’t work hard; take your time & do it in such a way that it looks easy to the world.

INTRODUCTION

Collecting data from the web is known as Web Scraping, Web Data Extraction or Web Harvesting. These days everything & everyone needs fuel to run, and data is the most precious fuel for running any organization. Finding the data is good; extracting it is even better; doing it using automation is perfect.

Get to know the Tool

What is Cypress..?

Cypress is an advanced, next-generation front-end testing tool built for the modern web.

Cypress consists of a free, open-source, locally installed Test Runner and a Dashboard Service for recording your tests. It aims to reduce the hurdles that engineers and developers face while testing web applications, such as those built with React and AngularJS.

It is most often compared to Selenium; however, Cypress is fundamentally and architecturally different.

Experiential-Session

Performed on Versions

Microsoft Visual Studio Code: 1.52.0

Cypress: 6.0.1

Let’s perform web scraping using Cypress. Just check the website for the data you want to scrape and get the list of parent and child HTML tags.

Steps to follow to Web Scrape

> Select the Website and the Data

> Create a JavaScript file

> Set the URL

> Inspect and get the proper HTML Tags

> Include the HTML Tags in the code

> Cross-check the Scraped Data

Step 1- Select the Website and the Data

I selected the website “https://www.bullion-rates.com/gold/INR/2007-1-history.htm” and want to scrape the gold rates along with their dates.

Sample data we want to scrape

Step 2- Create a JavaScript file

Create a JavaScript file & open it in Microsoft Visual Studio Code, where we will write the web scraping code.

Step 3- Set the URL

The JavaScript code to pass the URL looks like this.

Pass the URL
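As a minimal sketch of this step, the spec below passes the URL to `cy.visit()`. The names `GOLD_URL` and `openGoldPage` and the test titles are illustrative and not taken from the original code; the guard around `describe` is only there so that the file can also be loaded outside the Cypress Test Runner.

```javascript
// Page chosen in Step 1: gold rates in INR for January 2007.
const GOLD_URL = 'https://www.bullion-rates.com/gold/INR/2007-1-history.htm';

// Hypothetical helper: opens the page with whichever `cy` object it is given.
function openGoldPage(cy) {
  return cy.visit(GOLD_URL);
}

// `describe`, `it` and `cy` are globals supplied by the Cypress Test Runner.
if (typeof describe === 'function' && typeof cy !== 'undefined') {
  describe('Web scraping with Cypress', () => {
    it('opens the gold rates page', () => {
      openGoldPage(cy);
    });
  });
}
```

With Cypress 6, a spec file like this would typically be saved under the `cypress/integration/` folder of the project and started from the Test Runner.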

Step 4- Inspect and get the proper HTML Tags

Once you know the HTML tags in which your data is present, it’s quite easy to find the data.

To see the HTML tags, right-click on the page and select the Inspect option.

Inspecting the HTML Tags

Proper HTML Tags:-

As you may notice, the table id is “dtDGrid” and the table body is “tbody”; under it, our data resides in the table row tags “tr” marked as “DataRow”.

Now, if you want to frame the selector, it would look like this:

Framing the Selector

In the selector, “#” represents an id & “.” represents a class.

If you look closely, all the data is present in the table data tags “td” under “DataRow”. So now, I have to iterate through the “td” HTML tags to get all the data within them.
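One plausible way to frame the selector from the tags described above is sketched here. It assumes that “DataRow” is a class on each “tr”; the constant names are illustrative only.

```javascript
// "#" targets an id, "." targets a class, and a space means "somewhere inside".
const TABLE = '#dtDGrid';  // <table id="dtDGrid">
const ROW = 'tr.DataRow';  // <tr class="DataRow"> (assumed to be a class)
const CELL = 'td';         // <td> cells inside each data row

// Join the parts from parent to child to get the full CSS selectors.
const ROW_SELECTOR = [TABLE, 'tbody', ROW].join(' ');
const CELL_SELECTOR = [ROW_SELECTOR, CELL].join(' ');

console.log(ROW_SELECTOR);  // #dtDGrid tbody tr.DataRow
console.log(CELL_SELECTOR); // #dtDGrid tbody tr.DataRow td
```

Either string can be passed to `cy.get()`; the longer one goes straight to the cells, while the shorter one lets you handle one row at a time.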

Step 5- Include the HTML Tags in the code

After including the HTML tags, our code will look like this:-
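A minimal sketch of such a spec is given below. It assumes that “DataRow” is a class on each “tr” and that the first two “td” cells of a row hold the date and the gold rate; `toRecord` is a hypothetical helper, and the guard around `describe` only lets the file load outside the Cypress Test Runner.

```javascript
const GOLD_URL = 'https://www.bullion-rates.com/gold/INR/2007-1-history.htm';
const ROW_SELECTOR = '#dtDGrid tbody tr.DataRow'; // assumed: DataRow is a class

// Hypothetical helper: turn the cell texts of one row into a record.
// Assumes the first cell is the date and the second is the gold rate.
function toRecord(cellTexts) {
  const [date, rate] = cellTexts.map((text) => text.trim());
  return { date, rate };
}

// `describe`, `it`, `cy` and `expect` are globals supplied by Cypress.
if (typeof describe === 'function' && typeof cy !== 'undefined') {
  describe('Web scraping with Cypress', () => {
    it('collects the date and rate from every row', () => {
      const records = [];
      cy.visit(GOLD_URL);
      cy.get(ROW_SELECTOR)
        .each(($row) => {
          // $row is a jQuery object; read the text of each <td> inside it.
          const cellTexts = $row
            .find('td')
            .toArray()
            .map((td) => td.textContent);
          records.push(toRecord(cellTexts));
        })
        .then(() => {
          // Runs after every row has been visited.
          expect(records).to.not.be.empty;
        });
    });
  });
}
```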

Step 6- Cross-check the Scraped Data

The code to print the data looks like this:-

Print Data
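One way to print the scraped values is `cy.log()`, which writes each message to the Cypress command log. The sketch below assumes the same table layout as described earlier (rows marked with a “DataRow” class, date in the first cell, rate in the second); `formatRecord` is a hypothetical helper added for illustration.

```javascript
const GOLD_URL = 'https://www.bullion-rates.com/gold/INR/2007-1-history.htm';
const ROW_SELECTOR = '#dtDGrid tbody tr.DataRow'; // assumed: DataRow is a class

// Hypothetical helper: format one scraped record as a single log line.
function formatRecord(record) {
  return `${record.date} -> ${record.rate}`;
}

// `describe`, `it` and `cy` are globals supplied by the Cypress Test Runner.
if (typeof describe === 'function' && typeof cy !== 'undefined') {
  describe('Web scraping with Cypress', () => {
    it('prints the date and rate of every row', () => {
      cy.visit(GOLD_URL);
      cy.get(ROW_SELECTOR).each(($row) => {
        // Read and trim the text of each <td> in the current row.
        const [date, rate] = $row
          .find('td')
          .toArray()
          .map((td) => td.textContent.trim());
        cy.log(formatRecord({ date, rate }));
      });
    });
  });
}
```

If the data should be kept rather than only displayed, the records could instead be collected in an array and saved with `cy.writeFile()`, for example to a JSON file.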

In this way, you can cover more child HTML tags to scrape data.

Scraped Data

Conclusion-

I tried to explain web scraping using Cypress in a very simple way. I hope this helps you.

Find the full code on GitHub.

If you have any questions about the code or web scraping in general, reach out to me on LinkedIn.

We will meet again with something new.

Till then,

Happy Coding..!

Gyan Vardhan

Data Scientist at Officetroops Technologies Pvt Ltd