Google doubles the size of its index
By Olivier Duffez, Thursday, November 11, 2004, 11:15 :: Google
Google has just updated the figure it displays for the number of pages in its index, which has gone from 4,285,199,774 to 8,058,044,651 pages. This is not very surprising. For some time, people had noticed that certain queries return more than 4 billion results, although that figure is only an estimate. Google's crawlers had also greatly stepped up their activity over the past few weeks. For example, the query "the" now returns about 8 billion results.
As usual, announcements of this kind are carefully timed and form part of Google's communication strategy. As it happens, Google is announcing that its index has doubled on the very day its main rival, MSN, launches its search engine. Coincidence?
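Working the two published counts through explains why Google's own announcement speaks of the index "nearly" doubling. The new total is about 1.88 times the old one, an increase of roughly 88%:

$$\frac{8{,}058{,}044{,}651}{4{,}285{,}199{,}774} \approx 1.880, \qquad 8{,}058{,}044{,}651 - 4{,}285{,}199{,}774 = 3{,}772{,}844{,}877 \text{ additional pages}$$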

Today, google.com even displays a line announcing that Google has doubled the size of its index, with a link to Google's official blog:
Google's index nearly doubles.
You probably never notice the large number that appears in tiny type at the bottom of the Google home page, but I do. It's a measure of how many pages we have in our index and gives an indication of how broadly we search to find the information you're looking for. Today that number nearly doubled to more than 8 billion pages. That made me smile.
Comprehensiveness is not the only important factor in evaluating a search engine, but it's invaluable for queries that only return a few results. For example, now when I search for friends who previously generated only a handful of results, I see double that number. These are not just copies of the same pages, but truly diverse results that give more information. The same is true for obscure topics, where you're now significantly more likely to find relevant and diverse information about the subjects. You may also notice that the result counts for broader queries (with thousands or millions of results) have gone up substantially. However, as with any search engine, these are estimates, and the real benefit lies with the queries that generate fewer results.
The documents in Google's index are in dozens of file types from HTML to PDF, including PowerPoint, Flash, PostScript and JavaScript. Together these pages represent a good chunk of the world's information, but hardly all of it. That's why we keep building more advanced systems for crawling the web and creating more sophisticated indices to sort what we find. So 8 billion pages is a milestone worth noting, but it's not the end of the road. The real test is how well we do in finding what you want from within those pages. We'll keep improving that too.
Bill Coughran V.P., Engineering
