How to use NOINDEX, NOFOLLOW and NOODP

Posted On 09 Oct, 2007

Use Meta Tags

Different meta tags give search engines different instructions. The commonly used meta tags are:

“NOINDEX” Meta Tag

If you include a tag like:

<META NAME="ROBOTS" CONTENT="NOINDEX">


in your HTML document, that document won't be indexed.

Keep in mind that search engines will still spider these pages on a regular basis: they continue to crawl "noindex" pages in order to check the current status of the robots meta tag. There is no need to use an "index" tag; index is the default option, and adding a default tag just bloats your web pages. The only time you might use it is to override a global setting:

<meta name="robots" content="noindex" />
<meta name="googlebot" content="index" />
Advantages
  • Allows page level granularity of robots commands.

Disadvantages

  • The use of a noindex meta tag is only possible with html pages (which include dynamically generated pages such as php, jsp and asp). It is not possible to exclude other file types such as PDF, DOC or ODT, which don't support html meta tags.
  • Pages will still be spidered by search engines to check the current robots meta tag settings. This additional traffic is avoided when using robots.txt file settings, as sketched below.
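For comparison, here is a minimal robots.txt sketch that blocks crawling of a directory entirely (the /private/ path is purely illustrative), so spiders never request those pages at all:

User-agent: *
Disallow: /private/

Note that pages disallowed in robots.txt are never fetched, but they can still appear as URL-only entries in search results if other sites link to them.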

“Nofollow” Meta Tag

This is another method of telling search engine spiders not to follow links on a page, either all of the links (via a meta tag) or individual links (via a link attribute).

If you include:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">


in your HTML document, the links in that document will not be followed by the robot.

To specify nofollow at the link level, add the attribute rel with the value nofollow to the link:

<a href="mypage.html" rel="nofollow">my page</a>

Disadvantages

  • Tests show that some search engines do crawl and index nofollow links. The nofollow tag will probably diminish the ranking value a link provides, but it cannot be reliably used to stop search engines from following a link.

“Noarchive” Meta Tag

Most search engines allow a user to view a copy of the web page as it was indexed by the search engine. This snapshot of the page at a point in time is called the cached copy. Visitors can find this functionality very useful if a page is no longer available or the site is down.

There are several reasons to consider disabling the cache view feature for a page or an entire website.

  • Web site owners may not want visitors viewing data, such as price lists, which are not necessarily up to date.
  • Web pages viewed from a search engine cache may not display properly if embedded images are unavailable and/or browser code such as CSS and JavaScript does not load or execute properly.
  • Cached page views will not show up in web-log based analytics systems, and reporting in tag-based solutions may be incorrect as well, since the cached view is served from a third-party domain, not yours.

If you want a search engine to index your page without allowing users to view a cached copy, use the noarchive attribute, which is officially supported by Google, Yahoo!, Windows Live and Ask:

<meta name="robots" content="noarchive" />

Microsoft documents a nocache attribute, which is equivalent to noarchive; since Microsoft also supports noarchive, there is no reason to use nocache.

“nosnippet” Meta Tag

Google offers an option to suppress the generation of page abstracts, called snippets, in the search results. Use the following meta tag in your pages:

<meta name="googlebot" content="nosnippet" />

Google notes that this also sets the noarchive option. We would suggest setting noarchive explicitly if that is what you want.

“noodp” Meta Tag

Search engines generally use a page's html title when creating a search result title, the link a user clicks on to arrive at a website. In some cases, search engines may use an alternative title taken from a directory such as dmoz, the Open Directory, or the Yahoo! directory. Historically, many sites have had poor titles – just the company name or, worse, "default page title" – and using a human-edited title from a well known directory was often a good solution. As webmasters improve the usability of their sites, page titles have become much more meaningful, and they are often better choices than the Open Directory title. The noodp meta tag, supported by Microsoft, Google and Yahoo!, allows a webmaster to indicate that the page's own title should be used rather than the dmoz title.

<meta name="robots" content="noodp" />

Similarly, Yahoo! offers a "noydir" option to keep Yahoo! from using Yahoo! Directory titles in search results for a site's pages:

<meta name="slurp" content="noydir" />

“unavailable_after” Meta Tag

One problem with search engines is the delay between when content is removed from a website and when that content actually disappears from search engine results. Typical time-dependent content includes event information and marketing campaigns.

Pages removed from a website which still appear in search engine results generally result in a frustrating user experience – the Internet user clicks through to the website only to find themselves landing on a "Page not found" error page.

In July 2007, Google introduced the "unavailable_after" tag, which allows a website to specify in advance when a page should be removed from search engine results, i.e. when it will expire. This tag can be specified as an html meta tag attribute value:

<meta name="robots" content="unavailable_after: dd-Mon-yyyy hh:mm:ss TZ" />

or in an X-robots http header:

X-Robots-Tag: unavailable_after: dd-Mon-yyyy hh:mm:ss GMT
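As a concrete illustration, a page that should drop out of results at the end of a campaign might carry a tag like the following (the date, time and timezone are made-up RFC 850 style values):

<meta name="robots" content="unavailable_after: 25-Aug-2007 15:00:00 EST" />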

Google says the date format should be one of those specified by the ambiguous and obsolete RFC 850. We hope Google clarifies what date formats their parser can read by referring to a current date standard, such as IETF Internet standard RFC 3339. We'd also like to see detailed page crawl information in Google's Webmaster Tools. Not only could Google show when a page was last crawled, they could add expiration information, confirming proper use of the unavailable_after tag. At one point, Google did show an approximation of the number of pages crawled relative to the number specified in a sitemap, but that feature was removed. This is one case where Google should follow Yahoo's example.

Advantages

  • A nice way to ensure search engine results are synchronized with current website content.

Disadvantages

  • The old RFC 850 date specification is ambiguous and thus subject to error.
  • unavailable_after support is currently limited to Google. We do hope the other major search engines embrace this approach as well.

Site Preview

Microsoft's Live Search may offer a thumbnail view of the first six search results in some geographies. (Ask offers a similar feature called Binoculars.) The thumbnail preview can be disabled by blocking the searchpreview robot in the robots.txt file:

User-agent: searchpreview
Disallow: /

or by using a meta tag containing "noimageindex,nomediaindex":

<meta name="robots" content="noimageindex,nomediaindex" />

This meta tag was used by AltaVista at one point; it is not known to be used by any of the other major search engines.

Summary of Meta Tags

Now we can summarize all the meta tags used and their search engine support:

  • noindex – Don't index the page (implies noarchive and nocache). Support: Google, Yahoo!, Windows Live, Ask.
  • nofollow – Don't follow, i.e. crawl, the links on the page. Support: Google, Yahoo!, Windows Live, Ask.
  • noarchive – Don't present a cached copy of the indexed page. Support: Google, Yahoo!, Windows Live, Ask.
  • nocache – Same as noarchive. Support: Windows Live.
  • nosnippet – Don't display an abstract for this page; may also imply noarchive. Support: Google.
  • noodp – Don't use an Open Directory title for this page. Support: Google, Yahoo!, Windows Live.
  • noimageindex, nomediaindex – Don't crawl images / objects specified in this page. Support: Windows Live, which uses this to disable the page preview thumbnail.
  • unavailable_after: <date in one of the RFC 850 formats> – Don't offer the page in search results after this date and time. Support: Google, which notes: "This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. We currently only support unavailable_after for Google web search results."


Some other Methods

Use X-Robots-Tag in your http headers

Neither robots.txt nor meta tags are error free: a robots.txt file can expose parts of your site you would rather not advertise, and meta tags only work for html documents, so there is no way to give indexing instructions for PDF, ODT, DOC and other non-html files. In July 2007, Google officially introduced a solution to this problem: the ability to deliver indexing instructions in the http header information which the web server sends along with an object.

The web server simply needs to add X-Robots-Tag and any of the Google supported meta tag values to the http header for an object:

X-Robots-Tag: noindex
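On Apache, this can be done with the mod_headers module. A minimal sketch, assuming mod_headers is enabled (the file pattern is illustrative), placed in the server or virtual host configuration:

# Keep PDF files out of the index without needing an html meta tag
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>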
Advantages
  • An elegant way to specify search engine crawling instructions for non-html files without having to use robots.txt.
  • Easy to configure using the Apache Header append syntax.

Disadvantages

  • Most webmasters are probably not comfortable setting http headers.
  • Microsoft IIS support for adding http headers has traditionally been very limited.
  • X-Robots-Tag support is currently limited to Google. We do hope the other major search engines embrace this approach as well.

Password protect sensitive content

Sensitive content is usually protected by requiring visitors to enter a username and password. Such protected content won't be crawled by search engines. Passwords can be set at the web server level or at the application level. For server-level logon setup, consult the Apache authentication documentation or the Microsoft IIS documentation.
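As a minimal sketch of server-level protection using Apache basic authentication (the realm name and password file path are illustrative; the password file would be created with the htpasswd utility):

# .htaccess in the directory to protect
AuthType Basic
AuthName "Members Only"
AuthUserFile /home/example/.htpasswd
Require valid-user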

Advantages
  • An effective way to keep search engines, other robots, and the general public away from content destined for a limited audience.

Disadvantages

  • Visitors will only make an effort to access protected website areas if they have a strong motivation to view that content.

Don't link to pages you want to keep out of search engines

Search engines won't index content unless they know about it. Thus, if no one links to a page or submits it to a search engine, the search engine won't find it. At least, that is the theory. In reality, the web is so large that one can assume a search engine will find a page sooner or later – someone will link to it.

Disadvantages

  • Anyone can link to your pages at any time.
  • Some search engines can monitor pages visitors view through installed toolbars. They may use this information in the future as a means to discover and index new content.

Partially Stop Page Content from appearing in Search Engines

There are times where only a section of a page should be kept out of a search engine. Yahoo supports a class="robots-nocontent" html tag attribute for this purpose.
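A minimal sketch of how this might be applied (the surrounding markup and wording are illustrative); only Yahoo!'s Slurp crawler is known to honor the class:

<div class="robots-nocontent">
Boilerplate navigation or legal text that Yahoo! should not use for indexing or abstracts.
</div>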

Removing pages which have already been indexed

The best approach is to use one of the above methods. Over time search engines will update their indexes with regular crawling. If you want to remove content immediately, Google offers a tool specifically for this purpose. Pages will be removed for at least six months. This process is not without risk: improperly specify your URL and you may find your entire site removed.

