Imagine that you have 25,000 pages in the Google index but only 1,000 pages of that site actually have unique content that you want visitors to see. By losing control of your duplicate content, you've essentially diluted all of those important pages by a factor of 25; for each important page, you have 24 other pages that the spiders have to sort through and prioritize. One way or another, your 25,000 pages are all competing against each other.
Of course, removing duplicate content, especially for large, dynamic sites, isn't easy, and figuring out where to focus your efforts can be frustrating at best. Having fought this battle more than once (including some penalty situations), I'd like to offer a few suggestions:
1. Rewrite Title Tags
They may not be glamorous, but HTML title tags are still a major cue for search engines, not to mention visitors, who see them on-page, in SERPs, and in bookmarks. Even in 2008, I too often see sites that use a single title across every page, usually something like "Bob's Company" or, even worse, something like "Welcome." SEOs may argue about what makes a good page title, but I think most of us would agree on this: if a page is important enough to exist, it's important enough to have its own title.
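As a quick illustration (the product and company names here are invented), the difference is as simple as giving each page its own descriptive title:

<title>Super Widget 3000 - Widgets - Bob's Company</title>

instead of repeating a site-wide

<title>Welcome</title>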
2. Rewrite META Descriptions
Whether or not they directly impact rankings, META descriptions are a strong cue for search spiders, especially when it comes to duplication. Take the time to write decent descriptions, or find a way to generate them if your site is dynamic (grab a database field and shorten it if you have to). If you absolutely can't create unique descriptions, consider dropping the META description tag altogether. In some cases, it's better to let the search engines auto-generate a description than to duplicate one description across your entire site.
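If your pages are database-driven, a rough sketch of the "grab a database field and shorten it" approach might look something like this in PHP (the $product array and its 'summary' field are placeholders for whatever your own schema uses):

<?php
// Assume $product was already pulled from the database for this page.
$description = trim(strip_tags($product['summary']));

// Keep it to roughly snippet length (around 155 characters).
if (strlen($description) > 155) {
    $description = substr($description, 0, 152) . '...';
}
?>
<meta name="description" content="<?php echo htmlspecialchars($description); ?>" />

The exact cutoff matters far less than the fact that every page ends up with a description of its own.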
3. Rewrite Page Copy
This one may seem obvious, but if the content isn't different, it's not going to look different to the search engines. Copy duplication often occurs when people include the same block of text on many pages or copy and paste to create content. If you're repeating text everywhere, consider whether it's really important enough to be repeated. If you copied and pasted your entire site, it's time to buckle down and write some original content.
4. Lighten Your Code
Although search spiders have gotten a lot better about digesting large amounts of content, many sites still have trouble when unique content ends up pushed deep into the source code. This is especially a problem for older, non-CSS sites or for sites with a large amount of header content. Streamlining your code can be a big help; even if you can't make the jump to "pure" CSS, consider moving core elements into a style sheet. If you're repeating a lot of header content on every page, consider whether it could be reduced or whether some of it could live in just one place (such as the home page). Often, large blocks of repeated content are as bad for visitors as they are for spiders.
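As a simplified before-and-after (the class name is made up), the kind of presentational markup that gets repeated on every page:

<font face="Arial" size="2" color="#333333"><b>Product Details</b></font>

can shrink to

<h2 class="section-title">Product Details</h2>

with the styling defined once in your style sheet:

h2.section-title { font-family: Arial, sans-serif; font-size: 0.9em; color: #333; }

Multiply that kind of saving across navigation, headers, and layout tables, and your unique content moves a lot closer to the top of the source.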
5. Emphasize Unique Content
Consider using heading and emphasis tags (<h1>, <h2>, <b>, etc.) in your unique content, and use them sparingly in your page header. This helps spiders isolate page-specific content and tell pages apart more easily. Using headers and emphasis consistently will also help your visitors and, in my experience, make you a better copywriter.
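For example (the copy here is invented), keep the boilerplate header plain and put the heading and emphasis tags around the content that only exists on this page:

<div id="header">Bob's Company | Free shipping on all widgets</div>
<h1>Super Widget 3000</h1>
<p>The Super Widget 3000 is our <b>lightest widget yet</b>, with the same capacity as the original Super Widget.</p>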
6. Control Duplicate URLs
This subject is a blog post or 10 all by itself, but I'll try to cover the basics. Dynamic sites frequently suffer from content duplicates created by multiple URLs pointing to the same page. For example, you may have 1 page with the following 3 URLs:
• www.mysite.com/product/super-widget
• www.mysite.com/product/12345
• www.mysite.com/product.php?id=12345
Ideally, only one of those URLs would ever be exposed, with the others quietly redirecting to it, but that isn't always feasible. If you can't use consistent URLs, pick the most descriptive format and block the rest (nofollow or robots.txt). If one format is older and being phased out, make sure you 301-redirect the old versions properly until you can remove them.
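As a rough sketch of that 301 in PHP (the URL and the idea of building it from a stored slug are assumptions about your setup, not a prescription), the page itself can bounce any non-canonical request over to the format you've chosen:

<?php
// The one URL format you've picked for this page - in practice you'd
// build this from a slug stored in the database, not hard-code it.
$canonical_url = 'http://www.mysite.com/product/super-widget';

// If the request came in under any other format, answer with a permanent redirect.
if ($_SERVER['REQUEST_URI'] != parse_url($canonical_url, PHP_URL_PATH)) {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $canonical_url);
    exit;
}
?>

The same thing can be done with rewrite rules at the server level; the point is that every variant answers with a 301 pointing at the one URL you want indexed.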
7. Block Functional URLs
This is essentially a subset of #6, but concerns URLs that aren't really duplicates but have a functional purpose. For example, you may have something like:
• www.mysite.com/product.php?id=12345
• www.mysite.com/product.php?id=12345&search=super%20widget
• www.mysite.com/product.php?id=12345&print=yes
These extended URLs are essentially functional directives, telling the page to take an action (like displaying a printable version) or passing along information (like a search string). Removing them completely gets pretty elaborate and isn't always possible, but these directives should definitely be blocked from search spiders. They're essentially hidden instructions that have no value to the spiders or in the SERPs.
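Using the parameters from the example URLs above (your own parameter names will differ), a robots.txt along these lines would keep the functional variants out of the index:

User-agent: *
Disallow: /*&search=
Disallow: /*&print=yes

Wildcard patterns like these are honored by the major engines but aren't part of the original robots.txt standard, so test them in the engines' webmaster tools before relying on them. A robots META noindex tag on the printable version is another option if you can't touch robots.txt.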