Google's John Mueller posted a wonderful response in the Google Webmaster Help forums to a question about duplicate content and how to measure how big an issue it is on your website.
In short, he called duplicate content something SEOs can work on during one of those rainy days. He said Google does a good job of dealing with it, and given everything else that is probably on your to-do list, duplicate content is unlikely to be your primary issue.
Here is what John wrote:
Focusing on artificial metrics like that is not really that critical … Using tools to recognize issues is great, but you need to understand how these tools work, and take their output appropriately. For example, if you’re looking at a new site, it can be useful to get an overview of where potential issues with duplicate content might lie (and for that, crawling the site, using shingles and comparing them – via hash or directly, is a way to get a rough picture). However, when it comes to actually changing things, I’d recommend not blindly focusing on numbers like that and instead reviewing your content manually. “Is the primary content and purpose of these two pages the same? — Can they be combined into a single page?” Sometimes having the same content on multiple pages is desired, it’s certainly not something Google’s algorithms penalize a site for :). For the most part, I’d recommend looking at it as a user, and working your way through the site naturally. You’ll always find things to improve!
… and, as always, try to keep a sense of scale in mind. If you’re spending a week only focusing on filtering out some duplicate content, is that really the best use of your time? How relevant will that de-duplication be in 1 month, in 1 year, in 5 years? Google generally does a good job of dealing with these things, so sometimes it’s worth just jotting the issue down in a “rainy day / when someone new comes on board / for the summer intern” list, and instead focusing on the bigger issues in the meantime.
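For anyone curious what "using shingles and comparing them" can look like in practice, here is a minimal sketch. It is not John's method or any particular tool's implementation. The function names `shingles` and `similarity`, the five-word shingle size, the SHA-256 hashing, and the Jaccard score are all my own assumptions for illustration, and it expects you have already crawled the pages and extracted their text.

```python
import hashlib
import re


def shingles(text, k=5):
    """Return the set of hashed k-word shingles for a block of text.

    Assumption: word-level shingles of size k; texts shorter than k words
    are treated as a single shingle.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        windows = [words]
    else:
        windows = [words[i:i + k] for i in range(len(words) - k + 1)]
    # Hash each shingle so sets stay compact when comparing many pages.
    return {
        hashlib.sha256(" ".join(w).encode("utf-8")).hexdigest()
        for w in windows
    }


def similarity(text_a, text_b, k=5):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Calling `similarity(page_one_text, page_two_text)` gives a rough score where 1.0 means the shingle sets are identical and 0.0 means they share nothing. As John says, treat a number like this as a pointer to pages worth reviewing by hand, not as a target to optimize.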
There is so much here. I enjoyed reading it and wanted to share it with you all.
Forum discussion at Google Webmaster Help.