How to Scrape Blogger Emails with Google Sheets

If you read the last post, you’re probably aware that Google Sheets can be used as a primitive scraper. Using xpaths you can search websites and pick them apart to find a specific piece of information. Now we’ll be using Google Sheets to scrape blogger emails. In this guide, we’ll be scraping a list of blogger URLs with Google Sheets from Google Search, and then scraping those URLs for publicly available emails, for the purposes of blogger outreach. If you copy the code, setting up this sheet yourself should be pretty easy to do. See the example below:

Scraping Blogger URLs from Google

The first step to scraping blogger emails will be to generate a list of blogger urls to scrape the emails from. This can be easily done within Google sheets. We will be using an ImportXML formula on a Google Search query to generate this list. To start, copy the following code into your sheet (A6 if you’re following the sample above):

=ImportXML(CONCATENATE("http://www.google.com/search?q={blog ",B3,"}&num=",D3), "//h3[@class='r']/a/@href")

This code lets you alter the keyword and amount of search results you want to appear in your sheet (see references to cells B3 and D3). The keyword “blog” is already hard coded in the search query since we’ll be using this to get blogger emails, but removing this will let you scrape any of sort of site. If you’re curious as to how this is working, this formula is first executing a search query, and then it is returning all the results for the xpath query “//h3[@class='r']/a/@href”. Within a Google search results page, this refers to all H3’s that are links (hrefs).

Cleaning Up the Links

Now that you have a list of links on your sheet, they won’t be workable just yet, so you’ll have to clean them up. To do that, use the following formula and paste it in your sheet (B6 if you’re following along):

=iferror(substitute((RIGHT(A6,LEN(A6)-7)),REGEXEXTRACT(RIGHT(A6,LEN(A6)-7),"& a=.*"),""),"")

his formula uses a REGEXEXTRACT to search for the bit of text that follows the pattern “&sa=” and then, using a SUBSTITUTE, removes anything after it. It then substitutes the first seven characters of the text with nothing. What you are left with is a clean URL that the ImportXML formula can parse and search.

Scraping Blogger URLs for Contact Emails

Having a clean list of blogger URLs, you can now search and scrape them for email addresses. Keep in mind that this formula will not be searching all of the pages of a website for emails and that some emails are generated by javascript and will not be “scrape-able”.

To start, copy the following code in your sheet (C6 for everyone following along):

=iferror(substitute(IMPORTXML(B6,"//a[contains(@href, '@')]/@href"),"mailto:",""),"No Email Found")

This code is fairly simple. It uses the ImportXML function to search for all links that contain the “@” symbol within them. You’ll get some false positives with this method, but this is done to account for all the types of email domains that are out there. Then, if no email is found, the formula returns “No email found” so that your sheet isn’t full of errors.

We can go a step further and check the “contact” page, which typically hosts email addresses if the home page does not. To do this, we will simply be adding “/contact” to the cleaned up URL that we generated before. Copy the following code (cell E6):

=CONCATENATE(B6,"/contact")

Using this formula you’ll then generate a second URL to scrape for sites with a “Contact Us” page. Next step will be to scrape it. Again, use the same formula (H6):

=iferror(IMPORTXML(E6,"//a[contains(@href, '@')]/@href"),"No Email Found")

Using this formula, you’ll be able to scrape the contact page (if it exists) for a second chance at getting an email. You could go further with this and try to search the “Contact-us” page that typically appears on websites, but after following this guide, you should be able to set that up fairly simple for yourself. Bloggers typically shield their site from scrapers by replacing the “@” sign in their email addresses with the word “at”. If you’re willing to get a bunch of false positives you can try to search for a reference of the keyphrase “ at “ to get around this. But, typically, those bloggers aren’t interested in mass emailed pitches either.

While this won’t get all emails out there, it is a free way to quickly collect a lot of blogger email addresses with a script. And surely there are other places that you can easily check for an email address with the ImportXML formula. Not to mention, removing the keyword “blog” from the initial search query will let you scrape emails with a Google Sheet from any other type of site.

Sending a Mail Merge with Google Sheets

Now that you've got your list of emails to use for link building or whatever it is you use a lot of emails for, you'll want to start sending them your message. While not difficult, this is a time consuming task. Even if you've written an email template, you'll have to load each email individually and send them off one by one. Luckily, with a Google Apps Script, we can write a mail merge to mass email our newly scraped email list. 

Setting up Your Sheet

Before writing your script, you'll need to set up your email list and message. To do this, open up a new sheet, paste the email list you've gathered in the first column (A1), then paste the body of your email in the second column (B1). 

If you're curious about writing a good email pitch, this is a good resource on the subject. It essentially boils down to writing an honest (re: no exaggerations or buzzwords) and concise email about your cause. It should be well formatted and easy to read, with appropriate line breaks and bullets. Since you'll be using a spreadsheet to send these emails, use a concatenate formula with char(14) for line breaks and char(149) for bullet points. These codes will work in both excel and Google Sheets. It is also worth noting that including the recipient's name in the email will go a long way to build trust, so include it if you can.

Writing the Mail Merge Script

script editor google sheets

To set up your script, open up the script editor in tools.  This will open up a new tab where you can run and debug your scripts. In the white space, paste the following code:

function sendEmails() {
var sheet = SpreadsheetApp.getActiveSheet();
var startRow = 1;// First row of data to process
var numRows = 40; // Number of rows to process
// Fetch the range of cells A2:B3
var dataRange = sheet.getRange(startRow, 1, numRows, 2)
// Fetch values for each row in the Range.
var data = dataRange.getValues();
for (i in data) {
var row = data[i];
var emailAddress = row[0];// First column
var message = row[1]; // Second column
var subject = "Little Rug Shop Affiliate Program Invite";
MailApp.sendEmail(emailAddress, subject, message);
}
}

If you're looking at this and feeling a little bit intimidated, don't be! We'll go right through the different variables so you can understand how to use this script for future projects.

  • startRow refers to the row you would like the script to start reading from. Since we copied our email addresses in A1, we'll set this variable to "1". If you wanted to start from row 15, you would set it to 15 and so on. 
  • numRows refers to the amount of rows that you want to process after the starting row. 
  • dataRange refers to the variable we'll be using to create a range to pull our information from. This uses the numbers we specified below and columns 1 and 2 to create the range. In Sheets you would see this being written equivalently as the range "A1B40".
  • subject is a static variable for the subject line of your email. This will be the same for all emails sent using the script.
  • MailApp.sendEmail(emailAddress, subject, message); refers to the method that actually send the email. It takes the email address, subject and message as parameters. Read this article to see what else is possible with the sendEmail() method.

Once you are all set up, have the right subject line, message and email addresses, hit the play button that is right above where you entered the script. Google will then ask for you authorization to access your document. After clicking yes, the emails will start sending. If you're wondering, Google will let you send a maximum of 500 emails a day under your account. Send any more and you risk having your account flagged as spam and risk suspension. 

So now you know how to go start a blogger outreach program from gathering email addresses to sending them all a message automatically. After using this method a couple times and finding an email pitch that works for you, you can scrape hundreds of emails and send to them within minutes. This is a great alternative to programs like Scrapebox, since this is free :)

Found this guide useful? Feeling generous?