
Domain Age Advantage: does it spread to Subdomains?

November 23rd, 2007 at 10:49am Under google

Do you remember my test with a site related to the US greencard lottery? This site contains some real (and hopefully useful) information, but it is a two-pager, with very limited links between the two pages. The site is targeted towards Germans, among whom the greencard lottery always draws some attention. Other than mentioning it here on the blog and submitting it once to digg, no promo was done.

As expected, when I look at Google Webmaster Tools, I do not see any traffic or stats for the site. However, when I just looked up my own web stats (awstats), I found out that the site actually received a few hits (surprise, surprise) and some of them were the result of Google queries. Oops - results from Google SERPs? Interesting… I then did a check and used Google myself to do some queries (look at an example). And, indeed, the site appeared.

The first thing I noticed is that there are very few search results for these keywords: Google lists only 10,900, which means “nothing” in web terms. The next thing I noticed is that my poor page sits above some of its big competitors. My understanding is that these have not optimized their sites for the (most probably) unusual query that I did. So, from that perspective, it looks understandable that my page shows up.

What I wonder about, however, is how quickly Google brought the site online in its index. I had expected quite a delay before it would show up at all.

My leading theory is that the age of its parent domain might be a factor in the equation. The ferientips.com domain has been in use for over eleven years now, and it contained good content most of the time. Recently, it was heavily outdated because I hadn’t maintained it for quite a while. But the content was still solid.

It is often argued that domain age is an important factor in assigning a page’s rank (notice the fine print: I did not say “pagerank” ;)). However, most folks say that the age factor applies only to the hostname. So www.ferientips.com would have that plus, but greencard.ferientips.com would not. I personally always tended to agree with that school of thought. Now, I begin to question it. I had a similar experience with the site spacelaunch.gerhards.net. This is a high quality blog site about space launches and space in general. It gained Google attention and pagerank very quickly.

What both sites have in common is that they are subdomains of domains that have been in existence for a long time. So I begin to think that Google probably passes some of the “age benefit” down to subdomains of that very domain. This seems plausible, because there obviously is a strong relationship between two such sites. Of course, one factor is that my sites do not target heavily competitive keywords. So things may be different in that area. In any case, I’ll keep a keen eye on the development of the sites. Maybe they go to the sandbox and my thoughts were totally wrong ;).

By Rainer

Google Trust Rank, an update…

November 16th, 2007 at 09:00pm Under google

I have done a lot of research since I wrote my original article on Google Trust Rank. I was too curious not to track down this beast.

First of all, I have to admit that a good number of my technical assumptions and conclusions were simply wrong. My thoughts about this being rumor, however, survived the test. With my new knowledge, I think TrustRank really exists, and most probably has for a while already - but many folks (including me up to recently) simply misunderstand it. And that creates rumors around trust rank that are not true.

The picture began to clear up for me when I found a scientific paper on trust rank. It is from Stanford University, but it is not Google-specific. In fact, it used Altavista as its test bed. However, I am pretty sure that Google has paid attention to it, if they did not develop something similar themselves in their lab.

What also helped me get the big picture was an interesting report about Google’s search labs in the New York Times. While it has no specific details on trust rank, it has a lot of things that can be read between the lines.

I’ll try to sum up what I think is most important about this concept. It is my personal opinion - read the sources yourself, you may draw different conclusions. Keep in mind that Google’s trustrank is probably different from what was in the paper. But I think it will share the basic ideas, otherwise it would probably be called something different (oh, I forgot that Google doesn’t call it anything after all… ;)).

Most importantly, TrustRank can be algorithmically computed. So my number one invalid assumption in the previous article was that TrustRank solely depends on human review. Quite the opposite is true and it now fits much better into my overall picture of Google.

TrustRank (TR) is in many ways similar to PageRank (PR). Just the way it starts is different. Let’s ignore that for now. As with PageRank, TrustRank can (and will) be passed from one page to another. A link to a page is a vote for that page. Part of the linking page’s TR will be carried over to the linked page. How much depends on many factors, which need not concern us here. What matters is that the TR calculation is pretty similar to the PR calculation.

What is totally different is the way the initial ranks are calculated. With PR, every site’s (link) votes are equal. In (too) simple words, you crawl the web once, count how many links a page receives and the most linked page has the highest PR. All fully automatic - and all subject to spam or SEO (to phrase it a little less upsettingly).
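The "(too) simple" picture above can be sketched in a few lines of Python. This is only an illustration of the link-counting idea, not the real PageRank computation, which weights each vote by the rank of the voting page. The function name `count_inbound_links` and the page names are made up for the example.

```python
from collections import Counter

def count_inbound_links(links):
    """Count how many distinct pages link to each page.

    `links` maps a page to the list of pages it links to (a hypothetical
    crawl result). Self-links and duplicate links are ignored.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

# Tiny made-up crawl: "a" and "b" both link to "c", "c" links back to "a".
crawl = {"a": ["c"], "b": ["c"], "c": ["a"]}
counts = count_inbound_links(crawl)
most_linked = counts.most_common(1)[0][0]
```

Here `most_linked` ends up being the page with the most inbound links, which is exactly the kind of count that is easy to inflate with spam links.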

TrustRank, on the other hand, requires manual labor. Humans need to review sites and check how trustworthy they are. Are they spam? Do they have good information? Are they set up as a trap for the reviewer (e.g. have good information now but are scheduled to change after acquiring trust)? Is the site owner trustworthy? Just think about it: a government is probably more trustworthy than a private body, which in turn is more trustworthy than the average Joe (OK, some may argue about that, but I think you got the idea…). So even real-world, non-virtual trust plays a role in human review.

It is impractical to review all web sites. It is impractical to review a small fraction of the sites. And it is even impractical to review a fraction of this small fraction. Only a very, very small number of sites can actually undergo human review. So the TR needs to be able to deliver good results on a small, select set of sites. Let’s call these sites “seed sites”. As their number is small, the selection of them is very important. It, too, can be done automatically. For example, sites which are high on the search engine result pages (SERPs) could be chosen, or those with many outgoing links.
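As a rough illustration of automatic seed selection, here is a sketch that picks the pages with the most outgoing links as candidates for human review. It only covers the "many outgoing links" example; the function name `pick_seed_candidates` and the sample data are assumptions for illustration, and a real search engine will certainly use more refined criteria.

```python
def pick_seed_candidates(links, count):
    """Return the `count` pages with the most distinct outgoing links.

    `links` maps a page to the list of pages it links to. Ties are broken
    alphabetically so that the result is deterministic.
    """
    ranked = sorted(links, key=lambda page: (-len(set(links[page])), page))
    return ranked[:count]

# Made-up link data: "hub" links out the most, then "c".
crawl = {"hub": ["a", "b", "c"], "a": ["b"], "b": [], "c": ["a", "b"]}
candidates = pick_seed_candidates(crawl, 2)
```

The candidates would then go to the human reviewers; the selection itself says nothing yet about whether a site is trustworthy.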

The actual method to select them need not concern us here. For Google, it will remain a secret anyhow. What matters is that the seed sites get selected by some parameters that qualify them. This is (intentionally) very vague, but the point to note is that there must be a reason to be in that set. It does not happen just by accident.

In the case of a real-world search engine, I’d also say that the seed set is not fixed, but being worked on all the time. So we do not have a static set, but one that evolves over time. Just think about the spam busters that each search engine employs. I guess any site detected to be spammy will also become part of the seed set for trust rank - with a thumbs down vote. And while I am speculating: I’d assume that there also is a time value that comes with the human vote - a more recent review will count higher than a review done months ago. But that is pure speculation. For a software developer like me, it just sounds like the right thing to do…
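To make the "time value" speculation concrete, here is one way a developer might model it: each human vote loses half of its weight after a fixed number of days. Everything in this sketch is an assumption for illustration, including the half-life of 180 days, the +1/-1 vote encoding and the function names. Nothing is known about whether or how Google does anything like this.

```python
from datetime import date

def review_weight(review_date, today, half_life_days=180.0):
    """Weight of a human review: 1.0 when fresh, halved every half-life."""
    age_days = max((today - review_date).days, 0)
    return 0.5 ** (age_days / half_life_days)

def seed_score(reviews, today, half_life_days=180.0):
    """Sum of time-weighted votes.

    `reviews` is a list of (review_date, vote) pairs, where vote is +1.0
    for thumbs up and -1.0 for thumbs down.
    """
    return sum(vote * review_weight(day, today, half_life_days)
               for day, vote in reviews)

today = date(2007, 11, 16)
# An old thumbs-down review and a fresh thumbs-up review of the same site.
score = seed_score([(date(2007, 5, 20), -1.0), (today, 1.0)], today)
```

With these numbers the fresh positive review outweighs the older negative one, so the score stays above zero.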

The seed sites are reviewed and judged to be either trustworthy or not. Note that a site voted untrustworthy takes some trust away from the sites it links to. This is basically known from PR too - the old “do not go into a bad link neighborhood” paradigm.

Based on the (ever changing) seed set and the (ever changing ;)) pagerank-like trustrank algorithm, trust is assigned to each and every page. As with pagerank, the closer you are to a trusted site, the more trust you receive (or the more is taken away from you, if you are linked to from a bad page). The TR calculation itself is purely automatic, no human intervention required. The end result is a nice TR value for each page. That value will be ever-changing too, but for a given moment in time it has a specific value. Let’s freeze time now and think about what that value means…
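A minimal sketch of such a propagation, assuming a PageRank-style iteration in which trust is injected only at the human-approved seed pages and then flows along links, getting weaker with every hop. The damping value of 0.85, the iteration count and all names are assumptions for illustration. Negative votes from untrustworthy seeds are left out to keep the sketch short, and trust arriving at pages without outgoing links is simply not passed on.

```python
def trustrank(links, good_seeds, damping=0.85, iterations=50):
    """Propagate trust from reviewed seed pages along outgoing links.

    `links` maps a page to the list of pages it links to; `good_seeds`
    is the set of pages that human reviewers judged trustworthy.
    """
    if not good_seeds:
        raise ValueError("at least one trusted seed page is required")
    pages = set(links) | set(good_seeds)
    for targets in links.values():
        pages.update(targets)
    # Only seed pages receive "fresh" trust in every round.
    seed_share = {p: (1.0 / len(good_seeds) if p in good_seeds else 0.0)
                  for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page, targets in links.items():
            if not targets:
                continue  # nothing to pass on
            share = damping * trust[page] / len(targets)
            for target in targets:
                new_trust[target] += share
        trust = new_trust
    return trust

# Made-up web: a trusted university links to a blog, which links to a shop.
# Two spam pages only link to each other.
web = {"university": ["blog"], "blog": ["shop"],
       "spam": ["spam2"], "spam2": ["spam"]}
trust = trustrank(web, {"university"})
```

In this toy web the blog gets less trust than the university, the shop less than the blog, and the spam pair gets none at all, no matter how many links the two spam pages exchange.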

… it is absolutely up to the search engine what it means! Of course, TR will be used to order pages in the SERPs. So it will be used to decide if your site will be shown on page 1 or 1,000. But trustrank alone, IMHO, would be far too weak to be used as the sole, or major, source of search result page sort order. I guess that Google will use TR as one parameter it uses to compute the overall value that it assigns to a page for a given search term. I don’t mean page rank here, which I consider to be just another parameter. I am sure there are a myriad of other parameters. The NY Times article has quite a good explanation of what may be considered, so if you would like more ideas, go and read it.

The question is how much weight Google assigns to TR and PR. You’ll probably never find an official Google answer. And, to be honest, I don’t think one is even needed. It is obvious that Google will tweak that part of the algorithm the same way it tweaks other parts of it. So, for example, the weight may be a number x for a given search term and a value of y for another. And the very next day it may even be completely different, because the Google search team has had another bright idea.
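To illustrate the idea of per-query weights, here is a sketch that blends several signals into one score using a weighted average. The signal names, the weight values and the function `combined_score` are invented for this example; the real parameters and the way Google combines them are not public.

```python
def combined_score(signals, weights):
    """Weighted average of ranking signals.

    `signals` maps a signal name to its value for one page, `weights`
    maps a signal name to the weight used for the current search term.
    Signals missing for a page count as 0.0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weight * signals.get(name, 0.0)
                   for name, weight in weights.items())
    return weighted / total_weight

page = {"trustrank": 0.8, "pagerank": 0.4}
# Same page, two hypothetical search terms with different weightings.
score_x = combined_score(page, {"trustrank": 1.0, "pagerank": 1.0})
score_y = combined_score(page, {"trustrank": 3.0, "pagerank": 1.0})
```

The same page scores differently for the two hypothetical search terms just because the weights differ, which is why a weight change alone can reshuffle the SERPs.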

Speculation again: what I think happened with the last ranking update is that Google probably changed the weights as well as some other parameters in its algorithm. I do not think they introduced trustrank for the first time. It has been known for too long for Google to have adopted it only then. But they’ve probably given it a boost to combat what they consider spam.

So, what’s the lesson to learn? Unfortunately, I can not (and will not) offer any black hat SEO here: nothing has really changed. Google likes sites that get links from authority sites. Google is probably making it harder to fake being an authority site. They don’t like it if you get your link from the poor, unmaintained, heavily spammed link directory of university department x. They like it, however, if you get that same link from a hard-to-obtain spot on that same university’s home page. The same applies to other authority sites. I guess the bar has risen in this area.

For us webmasters, it means that it is even more important to try getting links from high-profile sites. Sounds surprising? I hope not… I know it is hard to do that, but I like the idea that there is a reward for high quality content. And, of course, black hats will sneak in and find their ways around the new algorithm. But that, too, will not last too long.

If you intend to build a long-lived site, there is no way around creating high quality, unique content. That will bring the best reward in the long term. And, after all, isn’t that why humans (aka visitors) like and visit web sites?

By Rainer

