How to Scrape Blogger URLs with Google Sheets

If you read the last post, you’re probably aware that Google Sheets can be used as a primitive scraper. Using xpaths you can search websites and pick them apart to find a specific piece of information. Now we’ll be using Google Sheets to scrape blogger URLs. In this guide, we’ll be scraping a list of blogger URLs with Google Sheets from Google Search, for the purposes of helping automate blogger outreach. If you copy the code, setting up this sheet yourself should be pretty easy to do. See the example below:

Scraping Blogger URLs from Google

The first step to scraping blogger URLs will be to get a raw list of URLs from Google search. This can be easily done within Google sheets. We will be using an ImportXML formula on a Google Search query to generate this list. To start, copy the following code into your sheet (A6 if you’re following the sample above):

=ImportXML(CONCATENATE("http://www.google.com/search?q={blog ",B3,"}&num=",D3), "//h3[@class='r']/a/@href")

This code lets you alter the keyword and amount of search results you want to appear in your sheet (see references to cells B3 and D3). The keyword “blog” is already hard coded in the search query since we’ll be using this to get blogger URLs, but removing this will let you scrape any of sort of site. If you’re curious as to how this is working, this formula is first executing a search query, and then it is returning all the results for the xpath query “//h3[@class='r']/a/@href”. Within a Google search results page, this refers to all H3’s that are links (hrefs).

Cleaning Up the Links

Now that you have a list of links on your sheet, they won’t be workable just yet, so you’ll have to clean them up. To do that, use the following formula and paste it in your sheet (B6 if you’re following along):

=iferror(substitute((RIGHT(A6,LEN(A6)-7)),REGEXEXTRACT(RIGHT(A6,LEN(A6)-7),"& a=.*"),""),"")

his formula uses a REGEXEXTRACT to search for the bit of text that follows the pattern “&sa=” and then, using a SUBSTITUTE, removes anything after it. It then substitutes the first seven characters of the text with nothing. What you are left with is a clean URL that the ImportXML formula can parse and search.

Scraping Blogger URLs

While outside the scope of this guide, it is helpful to know that you can use Google Sheets to go further and scrape the list of URLs that you have already got. To do this, find the element that you want to scrape and then right click it and click "Inspect Element". With the element selected on your console, you can right click again and select "Copy xpath". This xpath can be used in conjunction with the URL you've indexed in your Google Sheet and you can directly export that content to your Sheet instantly. 

Sending a Mail Merge with Google Sheets

If you've handpicked some emails from the list of URLs you've generated, here's a quick way to send them out from your gmail account right from Google Sheets.

Setting up Your Sheet

Before writing your script, you'll need to set up your email list and message. To do this, open up a new sheet, paste the email list you've gathered in the first column (A1), then paste the body of your email in the second column (B1). 

If you're curious about writing a good email pitch, this is a good resource on the subject. It essentially boils down to writing an honest (re: no exaggerations or buzzwords) and concise email about your cause. It should be well formatted and easy to read, with appropriate line breaks and bullets. Since you'll be using a spreadsheet to send these emails, use a concatenate formula with char(14) for line breaks and char(149) for bullet points. These codes will work in both excel and Google Sheets. It is also worth noting that including the recipient's name in the email will go a long way to build trust, so include it if you can.

Writing the Mail Merge Script

script editor google sheets

To set up your script, open up the script editor in tools.  This will open up a new tab where you can run and debug your scripts. In the white space, paste the following code:

function sendEmails() {
var sheet = SpreadsheetApp.getActiveSheet();
var startRow = 1;// First row of data to process
var numRows = 40; // Number of rows to process
// Fetch the range of cells A2:B3
var dataRange = sheet.getRange(startRow, 1, numRows, 2)
// Fetch values for each row in the Range.
var data = dataRange.getValues();
for (i in data) {
var row = data[i];
var emailAddress = row[0];// First column
var message = row[1]; // Second column
var subject = "Little Rug Shop Affiliate Program Invite";
MailApp.sendEmail(emailAddress, subject, message);
}
}

If you're looking at this and feeling a little bit intimidated, don't be! We'll go right through the different variables so you can understand how to use this script for future projects.

  • startRow refers to the row you would like the script to start reading from. Since we copied our email addresses in A1, we'll set this variable to "1". If you wanted to start from row 15, you would set it to 15 and so on. 
  • numRows refers to the amount of rows that you want to process after the starting row. 
  • dataRange refers to the variable we'll be using to create a range to pull our information from. This uses the numbers we specified below and columns 1 and 2 to create the range. In Sheets you would see this being written equivalently as the range "A1B40".
  • subject is a static variable for the subject line of your email. This will be the same for all emails sent using the script.
  • MailApp.sendEmail(emailAddress, subject, message); refers to the method that actually send the email. It takes the email address, subject and message as parameters. Read this article to see what else is possible with the sendEmail() method.

Once you are all set up, have the right subject line, message and email addresses, hit the play button that is right above where you entered the script. Google will then ask for you authorization to access your document. After clicking yes, the emails will start sending. If you're wondering, Google will let you send a maximum of 100 emails a day under your account. Send any more and you risk having your account flagged as spam and risk suspension. 

Found this guide useful? Feeling generous?