The Conversion Chronicles, resources for improving your online conversion rates

Should I Add HTML Pages to a PHP Web Site?



Question

Hi Jill,

Following your guidelines, I have had excellent results optimizing regular HTML pages. Thank you!

I am currently working on an associate's site that is written in PHP and run through a content-management type of program. The pages are in PHP and not HTML.
The program offers a way of adding title and meta information for each page, but it also has something it calls "gateway html," which it designates as being for search engine optimization. It is located in the same area where you add the specific title and meta data for a given page.

I believe that what it does is write out an HTML page of whatever content you input, associated with the PHP page. It shares the same name, title, and meta attributes as the original page, but with an .html extension instead of .php, and you can input different HTML there than what will be viewed on the PHP page.

My question is, does this fall foul of "cloaking," since you are showing different content via the PHP and HTML versions of the same basic page? I have never done anything like this before, and I am worried about getting into trouble with the search engines over cloaking.

Could you please give me your opinion on this?

Thanks very much.

Mark

Jill's Response

Hi Mark,

If I understand you correctly, creating that extra HTML page through your content management system is not something that you need to do.
It may have been worthwhile many years ago (back in the age of the
dinosaur) when the search engines avoided reading dynamic-looking web pages and adding them to their databases. However, this is no longer a problem.

Let's discuss what used to happen in the Stone Age of the Internet for a moment, so that you can have a better understanding of this whole dynamic-website issue that plagues so many people.

Many years ago, when search engine spiders saw a URL that looked like it led to a dynamically generated page (because it was full of question marks, equal signs, and other parameter characters), they wouldn't attempt to crawl it.

One of the reasons for this was that with dynamic sites, you would often find the same content delivered to the user (or browser or
spider) under multiple URLs. So for instance, on an ecommerce site that sold hats, you might be able to get to the black ten-gallon cowboy hat that you had your eye on through a URL that was something like this:

http://www.MyAwesomelyCoolHatShop.com/index.php?category=cowboy&color=black&type=tengallon

This user may have browsed for cowboy hats, then chosen the color black, and then the ten-gallon type.

You might also get to the same exact hat page through another URL like
this:

http://www.MyAwesomelyCoolHatShop.com/index.php?color=black&type=tengallon&category=cowboy

This user may have been looking for a black hat to start with, and then decided on the ten-gallon cowboy type.

These are similar but different URLs that both have the potential for being added to a search engine's database. When that happens, it creates a whole pile of URLs for exactly the same content, which is one of the reasons the search engines would avoid them.

Another reason for their avoidance was that the search engine spiders had the potential to get stuck in a sort of infinite loop while trying to gather up all the pages. With so many different ways to categorize the products, and so many ways for a user to land on the same page, the spider might end up going around in circles. Search engines and website owners don't like that because it can eat up server resources.
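
To picture why both of those hat URLs return the very same page, here's a minimal sketch of a dynamic product page (the file and parameter names are simply made up to match the hat-shop example, not anything from Mark's actual CMS):

    <?php
    // index.php -- hypothetical hat-shop product page.
    // The same content comes back no matter what order the
    // parameters appear in, so each ordering is a separate
    // URL pointing at exactly the same page.
    $category = isset($_GET['category']) ? $_GET['category'] : '';
    $color    = isset($_GET['color'])    ? $_GET['color']    : '';
    $type     = isset($_GET['type'])     ? $_GET['type']     : '';

    // Stand-in for a database lookup keyed on those three values.
    $product = "Black ten-gallon cowboy hat";

    echo "<title>" . htmlspecialchars($product) . "</title>";
    echo "<h1>" . htmlspecialchars($product) . "</h1>";
    ?>

Because the parameters are read by name rather than by position, every ordering of category, color, and type produces an identical page -- and every ordering is another URL the spider could stumble onto.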

Years ago it was easier for the search engines to simply avoid those types of sites, as they were few and far between. Since site owners still wanted to get their sites into the search engines, savvy programmers learned how to create URLs and pages that were friendlier to the search spiders. Some figured out how to make dynamic-looking URLs into static-looking ones by rewriting the URLs so that they didn't use parameters. Others created workarounds whereby the content management system would spit out HTML files that were more crawler-friendly, such as the system Mark was talking about in his question.
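
Just to illustrate that URL-rewriting trick, here is the kind of Apache mod_rewrite rule such a programmer might have dropped into an .htaccess file (a sketch only -- the paths and parameter names are invented to match the hat-shop example):

    # .htaccess -- turn a static-looking URL into the real dynamic one.
    # The spider sees /hats/cowboy/black/tengallon.html, but the server
    # quietly runs index.php with the usual parameters.
    RewriteEngine On
    RewriteRule ^hats/([^/]+)/([^/]+)/([^/]+)\.html$ index.php?category=$1&color=$2&type=$3 [L]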

Fast-forward a few years.

As websites and businesses began to grow, more and more site owners turned to content management systems to dynamically generate the pages of their websites. It was a whole lot easier and faster and just made sense. Dynamically generated pages were definitely not going to go away, so of course it was in the search engines' best interests to figure out how they could index the information contained on them.

And so they did.

Today's search engines generally have no problem with dynamically generated pages. They don't scurry away as fast as they can when they see a .php or an .asp or a .cfm extension in a URL. They don't even flee when they see parameters in the URLs. Question marks and equal signs have no spider-repelling powers anymore. While I don't understand all the programming behind it, I do know for a fact that the search engines definitely index *most* dynamic-looking URLs just fine.

Notice that I said *most* -- not *all*.

Some believe that if you have more than three parameters in the URL, you may have less of a chance of getting those URLs indexed. I've seen such URLs in the search engines' databases, however, so it's not a hard-and-fast rule.

Another problem for the search engines is when your URLs require session IDs. The engines still try to avoid this type of URL because every spider visit to the site might create a completely different ID number and thus a new URL. The engines prefer to keep hundreds of copies of the same page out of their databases, so they have learned to look for the telltale signs of session-ID URLs and avoid indexing them. Because of this, you should avoid putting "SID=whatever" in your URLs if you want your pages indexed. Plus, Google has stated on their FAQ page for webmasters that they don't index URLs that have "&id=" in them, so definitely stay away from those as well.
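
If your own PHP site uses sessions, one way to keep those session IDs out of your URLs is to tell PHP to track sessions with cookies only. A minimal sketch using the standard PHP session settings (your host's configuration may already handle this for you):

    <?php
    // Keep the session ID in a cookie instead of appending
    // something like PHPSESSID=... to every link, so spiders
    // don't see a brand-new URL on each visit.
    ini_set('session.use_only_cookies', '1');
    ini_set('session.use_trans_sid', '0');
    session_start();
    ?>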

In answer to the original question posed by Mark regarding cloaking:
From how he described it, creating those pages wouldn't be considered cloaking, just unnecessary. There's no reason to create duplicate pages of the same content that's most likely already being indexed by the search engines.

If you do choose to use the extra pages, then you'll probably want to exclude the dynamically generated PHP URLs via the robots.txt exclusion file and allow the engines to index only your .html/.htm files. But again, you'll be much better off just ignoring that function of your CMS. If for some reason you start noticing that none of your dynamic pages are getting into the search databases, you may wish to rethink this, but I doubt you will have any indexing problems.
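
For what it's worth, if you did go that route, the robots.txt entries might look something like this (the wildcard matching shown here is an extension the major engines understand, not part of the original robots.txt standard, and the path is only an example):

    # robots.txt -- keep spiders off the dynamic .php URLs so that
    # only the static .html copies of the pages get indexed.
    User-agent: *
    Disallow: /*.php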

Hope this helps!

Jill
Author: Jill Whalen, SEO Consultant

Jill Whalen of High Rankings is an internationally recognized search engine optimization consultant and editor of the free weekly High Rankings Advisor search engine marketing newsletter.