Web Scraping Using Node JS..!

Gyan Vardhan
Nov 4, 2020

In my previous blogs, I discussed web scraping using Python and R, as well as its legality. I recommend going through those posts first, because they will help you understand this one better.

In this session, I’ll show you how to scrape a website using Node JS.

Finding quality data is like looking for a needle in a haystack.

What is Node JS?

JavaScript is a popular programming language that runs in every modern web browser.

Node JS is a JavaScript runtime built on Chrome’s V8 engine. It lets JavaScript run outside the browser and ships with built-in modules for tasks such as file access and networking.

In short, Node JS extends JavaScript with these modules and with the large collection of third-party libraries available through npm, which makes it much more powerful.

Some Popular Libraries of Node JS

Let’s get started with web scraping using Node JS.

I’m using Visual Studio to run this task.

Node JS Code

Step 1- Creating the “package.json” file

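To create the package.json file, run “npm init” in the project folder and answer the prompts (package name, version, entry point, and so on), as shown in the screenshot below. Running “npm init -y” accepts all the defaults instead.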

create package.json

Step 2- Install & Call the required libraries

Run the command below to install these libraries.

Install required libraries

Once the libraries are installed successfully, npm displays messages like the ones below.

messages displayed after installation

Then load (require) the libraries at the top of your script:

call the libraries
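The installation screenshots are not reproduced here, so the exact package names the author used are not known. As a hedged sketch, the hypothetical helper “loadLibrary” below requires a package and, when it is missing, replaces Node’s generic “Cannot find module” error with a hint to run “npm install”. The package name in the usage comment is a placeholder, not the author’s confirmed choice.

```javascript
// Hypothetical helper: load a package with require() and turn the generic
// "Cannot find module" error into a hint about the install step.
function loadLibrary(name) {
  try {
    return require(name);
  } catch (err) {
    if (err.code === "MODULE_NOT_FOUND") {
      throw new Error(
        `Library "${name}" is not installed. Run "npm install ${name}" in the project folder first.`
      );
    }
    throw err; // any other error comes from inside the library itself
  }
}

// Usage (the package name is an assumption, not taken from the article):
// const cheerio = loadLibrary("cheerio");
// Built-in modules load without installing anything:
// const path = loadLibrary("path");
```

Failing early with a clear message is handy here because a skipped or failed install otherwise only surfaces later as a confusing stack trace.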

Step 3- Select the Website & the Data to Scrape

I picked the page “https://www.bullion-rates.com/gold/INR/2007-1-history.htm” and want to scrape the gold rates along with their dates.

date need to scrape

Step 4- Set the URL & Check the Response Code

The Node JS code to request the URL and check the response code looks like this:

sample code
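The original code screenshot is not reproduced here, so the sketch below is not the author’s code. It assumes Node.js 18 or later, where “fetch” is built in and no extra library is needed. The function name “fetchPage” and its optional “fetchImpl” parameter (useful for swapping in a stub) are hypothetical.

```javascript
// Page chosen in Step 3 of the article.
const PAGE_URL = "https://www.bullion-rates.com/gold/INR/2007-1-history.htm";

// Request a page and check the HTTP response code before using the body.
// Assumes Node.js 18+, where fetch() is available globally. The optional
// `fetchImpl` argument (hypothetical) lets you pass another client or a stub.
async function fetchPage(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  console.log(`Response code: ${response.status}`);
  // response.ok is true only for status codes 200-299.
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with response code ${response.status}`);
  }
  return response.text(); // the raw HTML of the page
}

// Usage (requires network access):
// fetchPage(PAGE_URL)
//   .then((html) => console.log(`Downloaded ${html.length} characters`))
//   .catch((err) => console.error(err.message));
```

Throwing on a non-2xx code keeps later steps from trying to parse an error page as if it were the gold-rate table.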

Step 5- Inspect & Find the Proper HTML tags

It’s quite easy to find the HTML tags that contain your data.

To see them, right-click the data on the page and select the “Inspect” option. The browser’s developer tools open with that element highlighted.

Inspect the HTML Tags

Proper HTML Tags:

Notice that our table has 3 columns. The header row (tr) is marked “HeaderRow”, and each column name sits in a “th” (table header) tag.

HTML Tags
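Here is a minimal sketch of reading the column names. It assumes “HeaderRow” is a class attribute on the header’s “tr” element, since the screenshot that would confirm this is not available. To stay dependency-free, it uses regular expressions instead of an HTML parser such as cheerio, which is only safe for simple markup like this. The name “getColumnNames” is hypothetical.

```javascript
// Collect the column names: find the <tr> whose class list includes
// "HeaderRow", then take the text of every <th> inside it.
// Assumption: "HeaderRow" is a class on the <tr>. Regexes keep the sketch
// dependency-free but do not handle nested tables or unusual markup.
function getColumnNames(html) {
  const rowPattern = /<tr\b([^>]*)>([\s\S]*?)<\/tr>/gi;
  for (const [, attributes, inner] of html.matchAll(rowPattern)) {
    const classMatch = /\bclass\s*=\s*["']([^"']*)["']/i.exec(attributes);
    const classes = classMatch ? classMatch[1].split(/\s+/) : [];
    if (classes.includes("HeaderRow")) {
      // Strip any tags nested inside a <th> (e.g. <b>) and trim spaces.
      return [...inner.matchAll(/<th\b[^>]*>([\s\S]*?)<\/th>/gi)].map((m) =>
        m[1].replace(/<[^>]*>/g, "").trim()
      );
    }
  }
  return []; // no header row found
}
```

If the assumption about the markup holds, calling “getColumnNames(html)” on the page body from Step 4 should return the table’s three column names.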

The data itself sits in the table rows (tr) marked “DataRow”.

HTML Tags

So the plan is: select the “HeaderRow” row, collect all the “th” tags inside it to get the column names, and finally iterate through every “DataRow” row to get the data within it.

Step 6- Include the HTML tags in our Code

After including these HTML tags, our code looks like this:

Code Snippet
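The code snippet itself is not reproduced here, so this is a hedged, dependency-free sketch of the same plan. It selects the “HeaderRow” row, reads its “th” cells as column names, then iterates through every “DataRow” row and pairs each “td” value with its column. It assumes “HeaderRow” and “DataRow” are class names on “tr” elements. All function names are hypothetical, and regular expressions stand in for a proper HTML parser only to avoid installing anything.

```javascript
// Assumptions: "HeaderRow" and "DataRow" are classes on <tr> elements,
// column names are in <th> cells and values in <td> cells.

// Return the inner HTML of each <tr> whose class list contains `className`.
function rowsWithClass(html, className) {
  const rows = [];
  for (const [, attributes, inner] of html.matchAll(/<tr\b([^>]*)>([\s\S]*?)<\/tr>/gi)) {
    const classMatch = /\bclass\s*=\s*["']([^"']*)["']/i.exec(attributes);
    if (classMatch && classMatch[1].split(/\s+/).includes(className)) {
      rows.push(inner);
    }
  }
  return rows;
}

// Return the trimmed text of every cell with the given tag ("th" or "td").
function cellTexts(rowHtml, tag) {
  const cellPattern = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  return [...rowHtml.matchAll(cellPattern)].map((m) =>
    m[1].replace(/<[^>]*>/g, "").replace(/&nbsp;/g, " ").trim()
  );
}

// Build one record per DataRow, keyed by the HeaderRow column names.
function scrapeGoldRates(html) {
  const [headerRow] = rowsWithClass(html, "HeaderRow");
  const columns = headerRow ? cellTexts(headerRow, "th") : [];
  return rowsWithClass(html, "DataRow").map((row) => {
    const values = cellTexts(row, "td");
    return Object.fromEntries(columns.map((name, i) => [name, values[i] ?? ""]));
  });
}

// Usage: const records = scrapeGoldRates(html); // html = page body from Step 4
```

Keying each value by its column name keeps the data readable even if the site later reorders its columns. For live pages, a real HTML parser is the more robust choice.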

Step 7- Cross-check the Data

Print the data to the console as logs to check it. The code for this is:

Print the Data
Scraped Data
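Here is a small sketch for this step. It assumes the scraped rows are held as an array of plain objects that map each column name to its cell text. The names “formatRecord” and “printRecords” are hypothetical, not the author’s.

```javascript
// Turn one record ({ column: value, ... }) into a single readable line.
function formatRecord(record) {
  return Object.entries(record)
    .map(([column, value]) => `${column}: ${value}`)
    .join(" | ");
}

// Log every record with a row number, then the total, so the output can be
// cross-checked against the table on the website.
function printRecords(records) {
  records.forEach((record, index) => {
    console.log(`${index + 1}. ${formatRecord(record)}`);
  });
  console.log(`Total rows scraped: ${records.length}`);
}

// For a quick tabular view, Node's built-in console.table(records) also works.
```

Printing the row count alongside the rows makes it easy to spot a selector that matched too few or too many rows.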

If you go down to a more granular level of HTML tags and iterate over them accordingly, you can extract more precise data.

Conclusion

I tried to explain web scraping using Node JS concisely. I hope this helps you understand it better.

Find the full code on GitHub.

If you have any questions about the code or web scraping in general, reach out to me on

linkedin.com/in/gyan-vardhan-347570163

We will meet again with something new.

Till then,

Happy Coding..!
