Many people do not fully understand what occurs when they search for images using Google's Image search. This article will attempt to give a general overview of what is actually happening. Once you understand the basics, you can make the Google Image search feature work better for getting more traffic to your site.
How do the images get into Google's Index?
First of all, how does Google get the thumbnail images that they display? The answer is simple. They have a program called a spider. It knows how to crawl around the internet looking for photographs. It essentially starts out with a page that is provided to it (a URL to a webpage, in other words).
What will it do when it reaches that page? It is too early to answer that question. The reason is that it may never actually reach the page in question. Why?
Because Google, like most of the major search engines, obeys a set of rules governing how spiders operate. The spider first checks for the existence of certain files at the domain's location. Typically it looks first for a file called "robots.txt". If it finds that file, it loads a copy and reads it to see whether it contains any instructions for search engine spiders. If it finds instructions, it abides by them. For example, a robots.txt file might tell the spider not to spider the site at all, or it might tell the spider not to look in certain folders on the server.
Let's see what one of these files looks like. Here is one with only 2 lines:
User-agent: *
Disallow: /
In this example the spider is being told that it may not spider anything on the site. This would usually be a very bad choice, because it makes your site invisible to search engines and few people will be able to find it.
Here is another example:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
In this example the spider is being told that it can spider anything on the site except the folders listed in the Disallow statements.
And here is a final example:
User-agent: BadBot
Disallow: /
This one says that a spider from the search engine BadBot is not allowed to spider the site. Spiders from other search engines are not affected by it.
There are many combinations of instructions that can be included in this file. Google's spider always looks for this file and it always obeys its instructions if it is found.
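To make the three examples above concrete, here is a minimal sketch that checks them with Python's standard urllib.robotparser module. It only illustrates how a well-behaved spider could interpret these rules; it is not Google's actual code. The helper name is_allowed, the example.com URLs, and the "Googlebot" agent string are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The three example files from the article, as plain strings.
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_FOLDERS = (
    "User-agent: *\n"
    "Disallow: /cgi-bin/\n"
    "Disallow: /tmp/\n"
    "Disallow: /private/"
)
BLOCK_BADBOT = "User-agent: BadBot\nDisallow: /"

if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rules.
    checks = [
        (BLOCK_ALL, "Googlebot", "http://example.com/index.html"),
        (BLOCK_FOLDERS, "Googlebot", "http://example.com/index.html"),
        (BLOCK_FOLDERS, "Googlebot", "http://example.com/private/photo.jpg"),
        (BLOCK_BADBOT, "BadBot", "http://example.com/index.html"),
        (BLOCK_BADBOT, "Googlebot", "http://example.com/index.html"),
    ]
    for rules, agent, url in checks:
        print(agent, url, "allowed" if is_allowed(rules, agent, url) else "blocked")
```

The first file blocks every page, the second blocks only the listed folders, and the third blocks BadBot while leaving other spiders alone.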
If no "robots.txt" file is found, or if the file does not forbid access, then the spider proceeds to try to read the original URL that was its target.
Several things may happen at this point. It may not be able to reach the target URL because the server has some form of security that locks one or more folders from access to the outside world. In that case, the spider will move on since it cannot reach the URL.
Now, let's suppose it has finally reached the file it was trying to load. Does it index the file and the images it contains at that point? No, not yet! There are still more hurdles to jump. It will read the file into its memory and look for certain tags to see whether it has permission to index the contents of the file and/or to follow any links in it.
The typical "M E T A" tag it looks for would look like the following:
<M E T A NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This example instructs the spider not to index the file and not to follow any of the links within the file.
Note: G1 filters out the word "m e t a" and replaces it with "****", so I've used spaces between each letter so it will show up. But in actual use those spaces would not be there.
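As a rough sketch of the check just described, the following Python uses the standard html.parser module to look for a robots tag in a page and report whether indexing and link following are permitted. The class name RobotsMetaParser and the function robots_permissions are hypothetical names for this illustration, and only the NOINDEX and NOFOLLOW values from the example above are handled.

```python
from html.parser import HTMLParser
from typing import Tuple


class RobotsMetaParser(HTMLParser):
    """Records whether a page's robots tag forbids indexing or following."""

    def __init__(self):
        super().__init__()
        self.may_index = True
        self.may_follow = True

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        directives = {part.strip().lower() for part in content.split(",")}
        if "noindex" in directives:
            self.may_index = False
        if "nofollow" in directives:
            self.may_follow = False


def robots_permissions(html: str) -> Tuple[bool, bool]:
    """Return (may_index, may_follow) for the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.may_index, parser.may_follow
```

A page with no robots tag at all yields (True, True), which matches the default behavior described in this article: the spider goes ahead and indexes the page.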
Suppose this "m e t a" tag is not present. Now the spider will proceed to index the page. Here is what it does:
- Reads the whole file and isolates, as best it can, the actual text that appears in it. It enters portions of this information into an index which is stored on Google's site. The details of that process are outside the scope of this article, but if you were doing SEO work for a website, they would be very important to you.
- Looks for images. If it finds an image, it creates a thumbnail of it and stores it on the Google server. It also looks at the alt text associated with the image, the surrounding text, and the file name of the image, and indexes this text, which will be used later when someone is searching for an image. It also stores the file size, dimensions, and type of the image, among other details.
- Looks for every outbound link on the page that links to another page on the site or a page somewhere else on the internet. It will use these links to continue its crawl around the internet.
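The image and link steps above can be sketched in a few lines of Python with the standard html.parser and urllib.parse modules. This is only an illustration under stated assumptions: PageScanner and scan_page are hypothetical names, and the sketch collects just the image URL, file name, alt text, and outbound links. It does not create thumbnails, measure image dimensions, or gather the surrounding text.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collects the images and outbound links found on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.images = []  # one dict per image: url, file_name, alt
        self.links = []   # absolute URLs to crawl next

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            # Resolve relative image paths against the page's own URL.
            image_url = urljoin(self.page_url, attributes["src"])
            self.images.append({
                "url": image_url,
                "file_name": basename(urlparse(image_url).path),
                "alt": attributes.get("alt") or "",
            })
        elif tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.page_url, attributes["href"]))


def scan_page(page_url, html):
    """Return (images, links) for the HTML text of the page at page_url."""
    scanner = PageScanner(page_url)
    scanner.feed(html)
    scanner.close()
    return scanner.images, scanner.links
```

A spider built along these lines would index the text stored for each image and add each entry in the links list to the queue of pages to visit next.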
The above is a quick overview and is not going into any great detail. But those are the essentials of what happens and how thumbnails of images from a person's website end up in the Google Index. By the way, the thumbnails are fairly low resolution and would be unsuitable for most uses. Blow one of them up to 200%, which is about the minimum size that would be needed for use on a cell phone, and you'll see what I mean.
Now that we have an understanding about how Google indexes your images, let's see how they are used in actual practice.
Searching Google for Images
Now let's see how Google returns image search results. For this test, I'm going to use a site called "mia-farrow.com". I'll search for that exact phrase, because it will give me several images and different pages that will illustrate what is really happening when you search for images.
Here is a screen shot of what you see when you type in this search phrase:
As you can see, it has returned images (there are actually hundreds of images returned, but I'm just showing the first 2). It found them by matching the text you typed in the search engine against the text it gathered about each image, as we discussed earlier.
Now click on the image that has the red arrow pointing to it and this is what you'll see:
This page is being displayed in "frames". The top frame (A) is pointing to the Google website and shows what you see there. The content in that frame is using Google's bandwidth and not yours. The bottom frame (B) shows the actual webpage where the photo appears. If you were to see the whole page (which you can do by just entering the key phrase I started with in the Google Image Search Engine) then you will find the full size version of the image on that page.
However, if you click the thumbnail image in the upper frame you'll see a new page that contains just the full sized image. The page you see at that point is not being shown from Google's site. If you look at the address bar at that point you'll see this:
As you can see, it is being displayed from the actual website that contains the image. Also keep in mind that this display did not cost the website any bandwidth, because the image would already have been in the browser cache at this point: it was read into the cache when the page was originally shown in the lower frame. (Note: There is an exception if the image is no longer contained on the page where it was when Google originally indexed it.) Sometimes no image will appear when you click on the thumbnail in the A frame. Why? The server in question may be using server security to control where its images are served from, so that it blocks access to an image requested in the manner that Google is using (porn sites often do this, and it is an advanced topic). Alternatively, the image may no longer be on the server, or it may have been moved or renamed.
Now let's consider the other image in my earlier screen shot. Here is that shot again:
Click on this image and see what happens. Here is a screen shot at this point:
You will notice that the frames are gone and you are on the actual webpage where the image resides. What happened to the frames? If you actually do the above clicks in your own browser, you'll see the frames for a split second, but the page we are now going to has special code in it that yanks the page out of the frame and displays it full screen. This is very useful if you want to keep people on your site rather forcefully. The problem is that if you click the "back" button at this point, it will return to the framed page, which will immediately reload the webpage. This more or less traps the user on the page unless they know how to use the advanced features of the "back" button.
The above trick of jumping out of the frame is used by porn sites all over the Internet. They try every trick they can to get Google to index their images, because they know that when someone clicks on one of the thumbnails they can capture the visitor on their actual page outside the frames.
So that in a nutshell is what is happening when someone uses the Google Image Search Engine. If you have questions, please post them in response to this article and I'll include answers in the next installment if warranted.
End of the first installment