(There are plenty of reasons to put a reverse proxy in front of your website that have nothing to do with SEO, like making an internal web server accessible to other users. This guide ONLY refers to an SEO issue.)
If your site uses a more complicated CMS such as Umbraco, Sitefinity or an e-commerce platform such as Venda, it’s usually preferable to feature the blog section of a site on WordPress.
WordPress is one of the easier CMS platforms to use, particularly for blog-type sites. You’ll find that clients very often run their blog on a subdomain, hosted on WordPress for ease of use. However, this can create SEO issues, as the separate server makes the link between the two sites more tenuous than it needs to be. The SEO authority passing from the subdomain into the main domain is weakened, and content on the subdomain itself is less visible.
A quick way to address this is to use a reverse proxy.
What is a Reverse Proxy?
A reverse proxy fetches resources from a separate server and presents them as if they came from the server the visitor actually requested. To simplify that, take a look at the URLs.
You may have a blog hosted on a subdomain of your site, the URL of which looks like this:
http://blog.seokitty.net/
However, you want the blog to exist as a folder on the main site. When the reverse proxy is introduced, the content from blog.seokitty.net is served at the URL seokitty.net/blog/.
Using Apache Htaccess to Reverse Proxy
If your subdomain is hosted on WordPress, it’s most likely on an Apache server. If your main site runs on Apache too, that makes things significantly easier (see the bottom of this post if it’s on IIS).
You will need to access the .htaccess file on your main website in order for this to work. The .htaccess is essential in this instance, since it allows you to configure your site without altering server config files. Essentially, you need your main domain to answer any request for /blog/ with content fetched from the WordPress subdomain.
In your .htaccess, you need to use mod_rewrite, Apache’s URL rewriting module. Its rules are written in a similar way to 301 redirects and use regex to match requested URLs. Because the rule below proxies requests rather than redirecting them, mod_proxy and mod_proxy_http also need to be enabled on the server.
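RewriteEngine On
RewriteRule ^blog/(.*)$ http://blog.seokitty.net/$1 [NC,P]
The regex here basically means that any request for a URL path beginning with ‘blog/’ (the ‘^’ symbol anchors the match to the start of the path, and ‘(.*)’ captures anything after it) is answered with the result from the subdomain, with the captured part of the path (‘$1’) appended. Note that inside an .htaccess file Apache strips the leading slash before matching, so the pattern is written as ‘^blog/’ rather than ‘^/blog/’.
The ‘NC’ flag just means the match isn’t case sensitive, and the ‘P’ flag tells Apache to proxy the request rather than redirect it.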
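For reference, a slightly fuller version of the rule might look like the sketch below. It assumes the .htaccess file sits in the document root of the main site (where the pattern is matched without a leading slash) and that mod_rewrite, mod_proxy and mod_proxy_http are all enabled. The extra redirect for /blog without a trailing slash is an addition for illustration, not part of the original rule.

```apache
# Sketch only: assumes mod_rewrite, mod_proxy and mod_proxy_http are enabled
RewriteEngine On

# Assumption (not in the original rule): send /blog to /blog/ so that
# requests without the trailing slash also end up on the proxied blog
RewriteRule ^blog$ /blog/ [NC,R=301,L]

# Proxy everything under /blog/ to the WordPress subdomain;
# $1 is whatever (.*) captured after "blog/"
RewriteRule ^blog/(.*)$ http://blog.seokitty.net/$1 [NC,P]
```

With this in place, a request for seokitty.net/blog/some-post/ would be fetched from http://blog.seokitty.net/some-post/ while the visitor’s address bar keeps showing the /blog/ URL.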
Once that’s done, go into your WordPress admin and, under Settings > General, switch the Site Address (URL) to http://seokitty.net/blog/ (change the Site Address ONLY; don’t touch the WordPress Address or you’ll break it).
Do Reverse Proxies Cause Duplicate Content?
So now you’ve got the same content on blog.seokitty.net as you do on seokitty.net/blog/. One of the key things you need to do after the reverse proxy is eliminate the chance of duplicate content by stopping the original blog.seokitty.net URLs from being indexed. You also need to retain the power of any links which are pointing to the original URLs.
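What about applying a subdomain-wide 301 redirect, then? Nope; you can’t apply a 301 redirect to the blog.seokitty.net domain. Every request for seokitty.net/blog/ is fetched from blog.seokitty.net behind the scenes, so a redirect there would send the visitor back to seokitty.net/blog/, which is proxied to the subdomain again, and any request for either URL gets stuck in a redirect loop.
This is why you need to implement a canonical tag, pointing to the matching /blog/ URL, on each of the blog.seokitty.net URLs. For example, a post at blog.seokitty.net/my-post/ (a made-up path) would carry a canonical tag pointing to http://seokitty.net/blog/my-post/.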
To stop search engine robots from crawling the old URLs, add a robots.txt file to the subdomain simply reading:
User-agent: *
Disallow: /
And you’re done <3
Note: ALWAYS SAVE A COPY OF YOUR HTACCESS FILE BEFORE MUCKING ABOUT WITH IT. SRSLY.