Today’s tip is a quick and easy one that everybody should be looking to implement. If you offer a print version of your website for your visitors, there is, in fact, a right and a wrong way to handle it.
As websites have evolved and become increasingly reliant on graphics to deliver content, more and more savvy webmasters and designers have incorporated printer friendly versions into their sites. Typically a printer friendly link will be a part of the main site navigation section and it’s pretty much just a link to a copy of the web page that will format nicely for the user’s printer.
What if your site prints fine the way it is? Should you still have a printer friendly version of the page?
I’d say no. If you’ve tried printing your site a few times and it looks fine, then I wouldn’t advise adding a printer version just to have one. Actually that brings us to the point of the tip. The primary danger associated with printer pages is inadvertently making the search engines think you are duplicating your content.
The dreaded duplicate content penalty…
Right. Except that it’s not so much a ‘penalty’. I’ve talked to several knowledgeable Google people, like Adam Lasnik, Vanessa Fox and Matt Cutts, about the duplicate content penalty, and they all pretty much say it would be more accurately called a duplicate content filter.
Google isn’t necessarily penalizing or punishing sites (in most cases) for duplicated content. However, they also aren’t interested in returning duplicated pages in their search results. As a result, the duplicate content filter is in place to keep the Googlebot from indexing the same content over and over.
So I guess you want to make sure your printer friendly pages don’t get you filtered somehow as duplicated content.
Right. In most cases you wouldn’t want your printer page indexed and your regular page filtered as duplicated, for a bunch of reasons: the printer page may not have complete navigation links, it may not include your ads, and so on. Think about it this way: if you wouldn’t care whether your printer friendly page showed up instead of, or way ahead of, your normal page, you probably shouldn’t have two versions of your page to begin with.
So, assuming you do want to have a printer friendly page available, how do you avoid the duplicate content filter?
Well, there are three main ways to keep robots out of your printer pages… and they’re all really easy to implement.
The first one works at the domain or server level: you can just use your robots.txt file. Basically, you keep all of your printer pages in one directory and exclude all robots from that directory.
http://www.robotstxt.org/wc/norobots.html#code
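A minimal sketch of the robots.txt approach, assuming (hypothetically) that every printer page lives under a /print/ directory on www.example.com. Rather than waiting on a crawler, the Python below checks the rule locally with the standard library’s urllib.robotparser; build_parser, ROBOTS_TXT and the /print/ path are illustrative names, not part of the original tip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all printer pages are assumed to sit in /print/,
# and every robot ("*") is asked to stay out of that directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def build_parser(robots_txt):
    """Parse robots.txt text locally, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://www.example.com/articles/tip.html",
                "https://www.example.com/print/tip.html"):
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

Run as-is, it should report the regular article URL as allowed and the /print/ URL as blocked, which is a quick way to confirm the directory rule says what you meant before you upload the file.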
The second method works on the printer page itself. The trick here is to put a robots META tag in the head section of your printer friendly pages. Zapping a noindex instruction into your meta tag set takes all of five seconds, and it works just fine.
http://www.robotstxt.org/wc/meta-user.html
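For the META tag method, the tag itself is one line in the head section of each printer page: <meta name="robots" content="noindex">. The Python sketch below embeds that tag in a made-up printer page and then checks for it with the standard html.parser module; RobotsMetaFinder, is_noindex and PRINT_PAGE are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser

# Robots META tag asking compliant crawlers not to index this page.
NOINDEX_META = '<meta name="robots" content="noindex">'

# A made-up printer friendly page with the tag placed in its <head>.
PRINT_PAGE = f"""<!DOCTYPE html>
<html>
<head>
  <title>Today's tip (printer friendly)</title>
  {NOINDEX_META}
</head>
<body>
  <p>Article text goes here.</p>
</body>
</html>"""


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html):
    """Return True if the page carries a robots META tag containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


if __name__ == "__main__":
    print(is_noindex(PRINT_PAGE))                     # True
    print(is_noindex("<html><head></head></html>"))   # False
```

A check like this is mostly useful if your printer pages come from a template: run it over a few generated pages to make sure the tag actually made it into the output.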
The third option works on your original page, where you link to the printer page. If you don’t want crawlers to follow the link to your printer page, you can simply add a rel="nofollow" attribute to the hyperlink itself.
http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html
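On the original page, the printer link just needs rel="nofollow". As a rough sketch, assuming printer pages share a /print/ URL prefix and using hypothetical helper names (printer_link, LinkAudit, unprotected_print_links), the Python below builds such a link and flags any printer links on a page that are missing nofollow, which is one way to catch a forgotten attribute.

```python
from html import escape
from html.parser import HTMLParser


def printer_link(href, label="Printer friendly version"):
    """Build a link to a printer page that carries rel="nofollow"."""
    return f'<a href="{escape(href)}" rel="nofollow">{escape(label)}</a>'


class LinkAudit(HTMLParser):
    """Records (href, has_nofollow) for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated values, e.g. "nofollow noopener".
        rel_values = attr_map.get("rel", "").lower().split()
        self.links.append((attr_map.get("href", ""), "nofollow" in rel_values))


def unprotected_print_links(html, print_prefix="/print/"):
    """List links under the assumed printer prefix that lack nofollow."""
    audit = LinkAudit()
    audit.feed(html)
    audit.close()
    return [href for href, nofollow in audit.links
            if href.startswith(print_prefix) and not nofollow]


if __name__ == "__main__":
    page = (printer_link("/print/tip.html")
            + '<a href="/print/other-tip.html">Print this</a>'
            + '<a href="/articles/tip.html">Read the article</a>')
    print(unprotected_print_links(page))  # ['/print/other-tip.html']
```

Keep in mind that nofollow only applies to the link it sits on; a crawler can still reach the printer page through any other link that points to it.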
Is it ok to use more than one option?
Sure, that shouldn’t be a problem at all. As a matter of fact, it’s probably advisable. If you make a mistake or forget to add a nofollow to one of your links, for example, but you have the whole printer page directory excluded in your robots.txt file, you’re covered either way.
Something not mentioned is that your printer pages are often faster to load because they are simpler and have fewer graphics and/or ads, and thus they often end up as the preferred content for the SEs.
Never thought about the other options for excluding the pages for duplicate content, had always just used robots.txt. Thanks for the how-to and showing the different options.
You learn something new every day. Really interesting post.
Hello,
I have a question regarding what you said above.
What happens if there is a document on an external site, say “abc.doc”, that has been indexed by Google (as I can see by using the “site:externalsite.com” command),
and I now create a website by copying the contents of the .doc file (the site will be plain HTML)?
Will the new site be regarded as duplicate content? Is there any chance of it being filtered out?
Thanks for your reply.
What Phillip said: why not follow web standards and simply use a print style sheet?
Well, three seemed like a nice number for the purposes of this discussion. We got one at the server level, one at the page level and one for the actual printer page. These just seemed like the easiest solutions; I’m not suggesting there aren’t others.
Certainly CSS would be a viable option, but I feel like the three methods we listed are all easier than creating new CSS.
I’m very surprised that this duplicate content printer friendly discussion did not even mention the web standards method of using a separate CSS print file. This involves no duplicate content, saves disk space and completely avoids all duplication issues.
Tiffany is so cute! Couldn’t she wear sexier clothes in the next video?
Tiffany for president! Keep em comin!
Way too scripted, I’m afraid… it was funny to watch, though. FYI, the videos normally rock and I appreciate you doing them. Perhaps the next one will be better? (Don’t give up yet.)