Knowing the difference between the filter and the penalty, you can now
understand how a search engine determines what duplicate content is.
There are basically four types of duplicate content that are filtered
out:
Websites with Identical Pages - Pages that are identical to one another
are considered duplicate content, and websites that are identical to
another website on the Internet are also considered to be spam.
Affiliate sites with the same look and feel that contain identical
content, for example, are especially vulnerable to a duplicate content
filter. Another example would be a website with doorway pages. Many
times, these doorways are skewed versions of landing pages that are
otherwise identical to other landing pages. Generally, doorway pages
are intended to spam the search engines in order to manipulate search
engine results.
Scraped Content - Scraped content is content taken from another
website and repackaged to make it look different, but in essence it is
nothing more than a duplicate page. With the popularity of blogs on the
Internet and the syndication of those blogs, scraping is becoming more
of a problem for search engines.
E-Commerce Product Descriptions - Many eCommerce sites use the
manufacturer's descriptions for their products, the same descriptions
that hundreds or thousands of other eCommerce stores in the same
competitive markets are using too. This duplicate content, while harder
to spot, is still considered spam.
Distribution of Articles - If you publish an article and it gets
copied and put all over the Internet, this is good, right? Not
necessarily for all the sites that feature the same article. This type
of duplicate content can be tricky: even though Yahoo and MSN determine
the source of the original article and deem it most relevant in search
results, other search engines, like Google, may not, according to some
experts.
So, how does a search engine's duplicate content filter work?
Essentially, when a search engine robot crawls a website, it reads the
pages and stores the information in its database. It then compares its
findings to the other information in its database. Depending upon a few
factors, such as the overall relevancy score of a website, it determines
which pages are duplicate content and then filters out the pages or
websites that qualify as spam. Unfortunately, even if your pages are not
spam, they may still be regarded as spam if they contain enough similar
content.
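To make that comparison step concrete, here is a minimal sketch of one
common way to measure page overlap: word shingles compared with Jaccard
similarity. The page data and the 0.8 cutoff are made-up illustrations,
and the actual signals search engines use are not public; this only
shows the general idea of "compare each new page against what is already
in the index and drop near-copies."

```python
# Hypothetical sketch: comparing crawled pages with word shingles and
# Jaccard similarity. Real engines use far more sophisticated (and
# undisclosed) methods; this only illustrates the general idea.

def shingles(text, size=3):
    """Break a page's text into overlapping word sequences (shingles)."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy "index" of crawled pages (illustrative data only).
pages = {
    "site-a/page1": "blue widget with free shipping and a two year warranty",
    "site-b/page1": "blue widget with free shipping and a two year warranty",
    "site-c/page1": "hand made ceramic mug fired in a small family kiln",
}

DUPLICATE_THRESHOLD = 0.8  # assumed cutoff for this example
seen = []
for url, text in pages.items():
    sig = shingles(text)
    if any(jaccard(sig, other) >= DUPLICATE_THRESHOLD for other in seen):
        print(f"filtered as duplicate: {url}")
    else:
        seen.append(sig)
        print(f"kept: {url}")
```

Running this keeps the first widget page and the mug page but filters
the second widget page, which is exactly the behavior the filter aims
for: one copy stays in the results, the rest drop out.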
There are several things you can do to avoid the duplicate content
filter. First, you must be able to check your pages for duplicate
content. Using our Similar Page Checker, you can determine the
similarity between two pages and make them as unique as possible. Enter
the URLs of two pages and the tool will compare them and point out
where they are similar, so that you can make each page unique.
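If you want a rough, do-it-yourself version of that kind of check, the
short sketch below fetches two pages and reports how similar their
visible text is. It is only an approximation built on Python's standard
library, not the Similar Page Checker's actual method, and the two URLs
are placeholders you would replace with your own pages.

```python
# Rough two-page similarity check (an approximation, not the Similar
# Page Checker's actual algorithm).
import re
import urllib.request
from difflib import SequenceMatcher

def page_text(url):
    """Download a page and strip out tags, leaving only visible text."""
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)  # drop scripts/styles
    text = re.sub(r"(?s)<[^>]+>", " ", html)                   # drop remaining tags
    return " ".join(text.split())

# Placeholder URLs: substitute the two pages you want to compare.
url_a = "https://example.com/page-one"
url_b = "https://example.com/page-two"

ratio = SequenceMatcher(None, page_text(url_a), page_text(url_b)).ratio()
print(f"Pages are roughly {ratio:.0%} similar")
```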
Since you also need to know which sites might have copied your site or
pages, you will need some help. We recommend using a tool that searches
the Internet for copies of your page:
www.copyscape.com.
There, you can enter your web page's URL to find replicas of it
elsewhere on the Internet. This can help you create unique content, or
even address the issue of someone "borrowing" your content without your
permission.
Now let's look at the issue of some search engines possibly not
considering the source of the original content from distributed
articles. Remember, some search engines, like Google, use link
popularity to determine the most relevant results. Continue to build
your link popularity while using tools like www.copyscape.com to find
out how many other sites have the same article; if allowed by the
author, you may be able to alter the article so as to make the content
unique.
If you use distributed articles for your content, consider how relevant
the article is to your overall web page and then to the site as a whole.
Sometimes, simply adding your own commentary to the articles can be
enough to avoid the duplicate content filter; the Similar Page Checker
can help you make your content unique. Furthermore, the more relevant
articles you can add to complement the first article, the better.
Search engines look at the entire web page and its relationship to the
whole site, so as long as you aren't exactly copying someone's pages,
you should be fine.
If you have an eCommerce site, you should write original descriptions
for your products. This can be hard to do if you have many products,
but it really is necessary if you wish to avoid the duplicate content
filter. Here's another example of why using the Similar Page Checker is
a great idea: it can show you how to change your descriptions so that
your site has unique and original content. This works well for scraped
content too. Many scraped content sites offer news. With the Similar
Page Checker, you can easily determine where the news content is
similar, and then change it to make it unique.
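As a rough illustration of that workflow, the sketch below flags catalog
descriptions that track the manufacturer's stock copy too closely, so
you know which ones to rewrite first. The product data and the 70%
cutoff are hypothetical examples, not values any search engine
publishes.

```python
# Hypothetical sketch: flag product descriptions that are too close to
# the manufacturer's stock copy. The data and 0.7 cutoff are made up.
from difflib import SequenceMatcher

manufacturer_copy = {
    "SKU-1001": "Durable stainless steel water bottle keeps drinks cold for 24 hours.",
    "SKU-1002": "Lightweight trail running shoe with breathable mesh upper.",
}

my_descriptions = {
    "SKU-1001": "Durable stainless steel water bottle keeps drinks cold for 24 hours.",
    "SKU-1002": "Our favorite trail shoe: airy mesh upper, feather-light, built for long runs.",
}

REWRITE_CUTOFF = 0.7  # assumed threshold for this example
for sku, mine in my_descriptions.items():
    ratio = SequenceMatcher(None, mine.lower(), manufacturer_copy[sku].lower()).ratio()
    status = "rewrite" if ratio >= REWRITE_CUTOFF else "looks unique"
    print(f"{sku}: {ratio:.0%} similar -> {status}")
```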
Do not rely on an affiliate site that is identical to other sites, and
do not create identical doorway pages. Not only are these types of
pages filtered out immediately as spam, but there is generally no
comparison of the page to the site as a whole once another site or page
is found to be a duplicate, which can get your entire site in trouble.
The duplicate content filter is sometimes hard on sites that don't
intend to spam the search engines. But it is ultimately up to you to
help the search engines determine that your site is as unique as
possible. By using the tools in this article to eliminate as much
duplicate content as you can, you'll help keep your site original and
fresh.