One of the most important things to consider when trying to set up Google Analytics for your site is the integrity of your visitors’ source/medium data. Keeping this data as accurate as possible will go a long way to helping you make useful decisions about your marketing efforts.

That being said, there is a silent enemy threatening to destroy the harmony of your Google Analytics data: the self-referral. You may first see it rear its ugly head in the All Traffic Sources report. Not only is it rather disconcerting to see your own site as a visitor’s referrer, but this entry in your reports represents irrevocably lost data. What’s worse, you may even notice that the conversion rate for this segment of traffic is actually quite good. You may be putting lots of time and money into SEO, paid online advertising, e-mail campaigns and print ads, but when someone asks which of these was responsible for a conversion, you really don’t know. Some of those sources may be getting overwritten by your self-referrals.

If your site has subdomains (domain.com and blog.domain.com, for instance), this may be what is causing the self-referrals to show up in your reports. The standard Google Analytics Tracking Code is only good for sites with a single domain and no other structural complications. Anything beyond this and you’ll need to make some kind of modification to the script. Subdomains are one such complication.

Whenever a visitor comes to your site, the Google Analytics Tracking Code on your pages asks the visitor’s browser a question:

1. Do you have any cookies for this domain?

If the answer is no, then a follow-up question is asked:

2. Where did you come from?

Cookies are then created with values based on the answer to the second question. As long as a visitor stays on a single domain, there’s no confusion. If a visitor moves from domain.com to blog.domain.com, the browser responds to the first question with “No, I don’t have any cookies for blog.domain.com” and to the second question with “I was referred from domain.com.” The cookies from domain.com are not recognized, so the source, medium, campaign name, keyword and other information is not available while the visitor is on the blog subdomain.

To make matters worse, if the visitor goes back to the main domain from the subdomain, the original source/medium for the main domain may be lost as well. The Google Analytics Tracking Code will ask the browser the same question as before:

1. Do you have any cookies for this domain?

Even though the browser can answer “Yes, I have cookies for domain.com,” there’s a different follow-up question that the Google Analytics Tracking Code asks after a “Yes” response to the first question:

2. Is this a new visit?

If the answer is “No,” then everything is fine. But if it’s been over 30 minutes or the browser has been closed since the last domain.com pageview, then the answer will be “Yes,” and again, the question will be asked:

3. Where did you come from?

Since the visitor is coming from blog.domain.com, the original source and medium will be overwritten with a referral from blog.domain.com.
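
The question-and-answer flow above can be sketched as a small JavaScript model. This is only an illustration of the logic as described in this post, not the actual internals of the tracking code: the function name resolveSource, the cookieJar object and the source strings are all made up for the example, and the sketch only models the 30-minute timeout, not the browser being closed.

```javascript
// Hypothetical model of the decision flow described in this post.
// None of these names come from ga.js or urchin.js.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes

function resolveSource(cookieJar, cookieDomain, referrer, now) {
  const cookie = cookieJar[cookieDomain];

  // Question 1: "Do you have any cookies for this domain?"
  if (!cookie) {
    // No cookies, so ask "Where did you come from?" and store the answer.
    cookieJar[cookieDomain] = { source: referrer, lastSeen: now };
    return referrer;
  }

  // Question 2: "Is this a new visit?"
  const isNewVisit = now - cookie.lastSeen > SESSION_TIMEOUT_MS;
  if (isNewVisit) {
    // Question 3: "Where did you come from?" The old source is overwritten.
    cookie.source = referrer;
  }
  cookie.lastSeen = now;
  return cookie.source;
}

// A visitor arrives from a search engine, reads the blog, then returns
// to the main site 45 minutes after the first pageview.
const jar = {};
const minute = 60 * 1000;
console.log(resolveSource(jar, "domain.com", "google / organic", 0));
console.log(resolveSource(jar, "blog.domain.com", "domain.com / referral", 1 * minute));
console.log(resolveSource(jar, "domain.com", "blog.domain.com / referral", 45 * minute));
```

In this model the last call reports blog.domain.com / referral, because the domain.com cookie is more than 30 minutes old and its original google / organic value gets replaced.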

Fortunately, this issue is simple to resolve. For the ga.js version of Google Analytics, just add the line pageTracker._setDomainName("domain.com"); to every page on your site, both the main domain and any subdomains of that domain. Your final code should look something like the following:

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-XXXXXX-X");
pageTracker._setDomainName("domain.com");
pageTracker._initData();
pageTracker._trackPageview();
</script>

If you’re still using urchin.js, this line is _udn="domain.com";. Again, you should add this line to every page on your site, both the main domain and any subdomains of that domain. Your final code in this case should look something like the following:

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-XXXXXX-X";
_udn = "domain.com";
urchinTracker();
</script>

Be sure to replace “domain.com” with your own domain. That’s it! Now the Google Analytics Tracking Code will always look for cookies for domain.com, even if the visitor is on a page for blog.domain.com. So when a visitor moves from domain.com to blog.domain.com, the following conversation ensues:

GATC: Do you have any cookies for domain.com?
Browser: No, I don’t have any cookies for blog.domain…wait a second…did you say domain.com?
GATC: Sure did.
Browser: Yeah, I got those.

And everything is right in the world.
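
To see why one shared cookie domain settles the matter, here is a rough JavaScript sketch of the lookup. It is a simplified model rather than what the tracking code really does under the hood: cookieDomainFor and the cookies map are hypothetical names, and real browser cookie matching has more rules than this.

```javascript
// Hypothetical helper: decide which domain the cookies are filed under.
// With no configured domain, each hostname gets its own set of cookies.
function cookieDomainFor(hostname, configuredDomain) {
  if (!configuredDomain) {
    return hostname;
  }
  const matches =
    hostname === configuredDomain ||
    hostname.endsWith("." + configuredDomain);
  return matches ? configuredDomain : hostname;
}

// The visitor lands on domain.com and the source is stored.
const cookies = new Map();
cookies.set(cookieDomainFor("domain.com", "domain.com"), "google / organic");

// On the blog, the same configured domain finds the same cookies.
const withSetting = cookies.get(cookieDomainFor("blog.domain.com", "domain.com"));

// Without the setting, the blog looks under its own hostname and finds nothing.
const withoutSetting = cookies.get(cookieDomainFor("blog.domain.com", null));

console.log(withSetting);    // "google / organic"
console.log(withoutSetting); // undefined
```

Both hostnames resolve to the same key once the domain is set, so the blog sees the cookies that were written on the main site.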

There are a couple of different filters that are useful to have when your site has subdomains. One will include only traffic from a subdomain and the other lets you differentiate between pages with the same name but from different subdomains.
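
As a rough picture of what those two filters do to your data, here is a JavaScript sketch that applies the same ideas to a list of pageviews. It is not Google Analytics filter configuration: the hits array, includeOnlyHostname and prependHostname are names invented for this example.

```javascript
// Two made-up pageviews that share a page name on different subdomains.
const hits = [
  { hostname: "domain.com", uri: "/index.html" },
  { hostname: "blog.domain.com", uri: "/index.html" },
];

// Effect of an include filter: keep only traffic whose hostname matches.
function includeOnlyHostname(allHits, pattern) {
  return allHits.filter((hit) => pattern.test(hit.hostname));
}

// Effect of prepending the hostname: same-named pages become distinct.
function prependHostname(allHits) {
  return allHits.map((hit) => ({ ...hit, uri: hit.hostname + hit.uri }));
}

const blogOnly = includeOnlyHostname(hits, /^blog\.domain\.com$/);
const distinctPages = prependHostname(hits).map((hit) => hit.uri);

console.log(blogOnly.length); // 1
console.log(distinctPages);   // [ 'domain.com/index.html', 'blog.domain.com/index.html' ]
```

Without the second step, both /index.html pages would be reported as a single row.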

Please feel free to leave any comments or questions.