Regex Basics for SEO

You may have heard ‘RegEx’ being bandied about now and again. It’s one of those tools that people new to SEO don’t use as often as they should, yet learning it can be a considerable step forward in your SEO efforts. I wish I’d learned about it sooner.

You don’t need to be especially well-versed in code or have a particularly technical mind to know how to use it, but at some point you’re going to come across it. Here’s a simple guide which should cover all the (very) basics, and if you want to delve into it a bit further you can.

What is Regex?

Regex is short for ‘regular expressions’, a compact syntax for describing patterns in text so that you can search for and match strings. Certain characters within an expression have a special meaning (not unlike most programming languages). You will have seen something similar in robots.txt files, which use a limited pattern syntax (only the ‘*’ and ‘$’ wildcards) to block the crawling of directories or certain pages.

You may also have seen Regex used in htaccess for 301 redirects and rewrite rules.

In short, you do need to know some Regex to have a good handle on technical SEO. And don’t worry: it does look scary, but what you need to know isn’t actually that tough, even if you have no experience in any kind of programming syntax.

Basic Regex Commands

The most common Regex commands you will need to use are as follows:

- The dollar sign ($) signals the end of a query. It anchors the pattern to the end of the text, so only items which end with your specified value will be matched.

For example, in my post about robots.txt, I said that you can allow all URLs which end in ‘sloth’ with:

Allow: /*sloth$

So this pattern will only match URLs ending in ‘sloth’. (In a full Regex engine, the equivalent pattern is ‘sloth$’.)
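To see the end anchor at work in a real Regex engine, here is a minimal sketch in Python. The URL paths are made up for illustration, and Python’s built-in re module is only standing in for whichever tool you happen to be using. Note that it uses the plain Regex form ‘sloth$’ rather than the robots.txt wildcard form shown above.

```python
import re

# Hypothetical URL paths, invented for this example.
paths = ["/animals/sloth", "/sloth/diet", "/two-toed-sloth", "/sloths"]

# 'sloth$' only matches when 'sloth' is the very last thing in the string.
ends_with_sloth = [p for p in paths if re.search(r"sloth$", p)]

print(ends_with_sloth)
```

Only the first and third paths are kept; ‘/sloth/diet’ and ‘/sloths’ contain the word but do not end with it.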

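- The ($1) reference lets you replace things, but keep part of what you matched. It stands for whatever the first set of brackets in your pattern captured, so you can swap the start of a URL and keep whatever came after it. This is especially useful in htaccess redirects, and I did touch on it in my post about redirects when redirecting an entire domain.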
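As a rough illustration of the idea behind $1, the sketch below rewrites a URL from one domain to another while keeping the path. It is written in Python rather than htaccess, so the reference is spelled \1 instead of $1, and both domains are placeholders invented for the example. Treat it as a way to test the logic, not as a redirect rule you can paste into a server.

```python
import re

# Hypothetical old URL; both domains are placeholders for this example.
old_url = "http://old-example.com/guides/regex-basics"

# (.*) captures everything after the old domain as group 1.
pattern = r"^http://old-example\.com/(.*)$"

# \1 is Python's spelling of the $1 reference used in htaccess rules:
# it inserts whatever group 1 captured into the replacement.
new_url = re.sub(pattern, r"https://new-example.com/\1", old_url)

print(new_url)
```

The path after the domain is carried over untouched, which is exactly what you want when moving an entire site.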

- The dot and asterisk (.*), which together count as a placeholder for ‘anything’: the dot matches any single character, and the asterisk means ‘zero or more of whatever comes before it’. You can use them to allow for anything before or after a certain query value.
So, if you wanted to find some stuff which includes ‘sloth’ but don’t want that query to be affected by what comes before or after it, you would use:

(.*)sloth

Or

sloth(.*)
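The two patterns above can be combined into one to see what the placeholder actually picks up. This is a small Python sketch with made-up page titles; the brackets around each ‘.*’ simply let us look at what was matched before and after ‘sloth’.

```python
import re

# Made-up page titles for illustration.
titles = ["three-toed sloth facts", "sloth", "slow loris guide"]

# Anything (or nothing), then 'sloth', then anything (or nothing).
pattern = re.compile(r"(.*)sloth(.*)")

matches = []
for title in titles:
    m = pattern.search(title)
    if m:
        # group(1) is the text before 'sloth', group(2) the text after it.
        matches.append((title, m.group(1), m.group(2)))

print(matches)
```

The bare title ‘sloth’ still matches, because ‘.*’ is happy to match nothing at all, while the title without ‘sloth’ is skipped.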

- The caret symbol (^) is essentially the opposite of the dollar symbol. It anchors the pattern to the start of the text, so it will only return items which begin with your specified value.

So if you wanted everything which starts with ‘sloths’, you would use ‘^sloths’.
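Here is the same idea as a short Python sketch, again with invented page names, in case you want to try the start anchor for yourself.

```python
import re

# Invented page names for illustration.
pages = ["sloths-in-the-wild", "baby-sloths", "sloths", "sloth-care"]

# '^sloths' only matches when the string begins with 'sloths'.
starts_with_sloths = [p for p in pages if re.search(r"^sloths", p)]

print(starts_with_sloths)
```

‘baby-sloths’ contains the word but does not start with it, and ‘sloth-care’ is missing the final ‘s’, so neither is returned.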

- The pipe bar (|) is the final expression to take note of. It allows you to specify options; it’s basically an ‘or’ function.

Using Regex in Analytics

Say you know what you want to find in Analytics, but don’t want to muck about finding each and every page. That’s when the pipe bar comes in.

If I wanted to find all the pages which contain the categories of my site ‘guides’, ‘offsite’ or ‘my-blog’, I would need to choose ‘matching Regexp’ within the ‘advanced’ filter and type the following query:

guides|offsite|my-blog

Just choose ‘exclude’ in the drop-down if you don’t want to include them.

In this manner, you can use Regex to include or exclude the pages which you need to view.
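If you want to check a pipe pattern before typing it into Analytics, you can mimic the include and exclude behaviour in a few lines of Python. The page paths below are made up, and this is only an imitation of the filter, not how Analytics itself works under the hood.

```python
import re

# Made-up page paths for illustration.
page_paths = [
    "/guides/regex-basics",
    "/offsite/link-building",
    "/my-blog/hello-world",
    "/contact",
    "/about",
]

# The pipe means 'or': match any path containing one of the three categories.
category_pattern = re.compile(r"guides|offsite|my-blog")

# 'Include' keeps the paths that match; 'exclude' keeps the ones that don't.
included = [p for p in page_paths if category_pattern.search(p)]
excluded = [p for p in page_paths if not category_pattern.search(p)]

print(included)
print(excluded)
```

The three category pages land in the included list, and ‘/contact’ and ‘/about’ land in the excluded one.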

As mentioned above, you can use Regex in htaccess to redirect pages on a domain too. As always, back up your htaccess file (and your site) before mucking about with it.

Written by Sarah Chalk

Sarah is an SEO Account Manager at 360i and has a keen interest in all things SEO. She has also written for a number of sites, including Vue cinema’s film blog and a number of tech websites.
