
Could Google’s Cache Be Used To Censor Data?



Professor Pan is a local Baltimore conspiracy blogger who I met and drank with a while back. Great guy. I recently came back in his direction geographically and contacted him to throw back some more brews. He mentioned something about my not having blogged on Pop Occulture in quite a while. This was highly odd: when I spoke to him over email, I was getting back into the habit of blogging a whole hell of a lot, posting 15+ times a day.

I forget exactly how we resolved it. It may have been a cache issue on his end. But part of me wonders: how would I ever possibly know if Google was showing up-to-date info for my site in its search results?

Then, earlier today, I checked back using my site search - which runs through Google - to find out which number I was up to in a sequential feature I’ve been writing, called “Real Life Acting Tips”. I captured this image from the screen:

google-site-search-cache-out-of-date.jpg

What this shows is an actual screen from my Google-ized site search. If you look at acting tip “#7” you see that the subtitle being displayed is “Feelings.” However, if you click on the post page itself (here is the link), you’ll see that the actual title is “Once More With Feeling” - which is a reference to a Buffy the Vampire Slayer episode where everyone sings their lines as though it were a musical. Oddly, I changed to the newer, pop culture reference-enhanced title shortly after I wrote the original piece. It was no longer than an hour after posting it with the shortened title, and probably closer to 15 minutes, because I almost never change a post after that time frame has elapsed, as I have new ideas to focus on constantly.

Clearly, this is no proof of anything, except that Google is serving results from a cached version of that post. Since I update my site so frequently, it only makes sense that some things are going to be slightly out of date like that. One way to test would be to see if newer posts than that are already in the index, something written the day after that, perhaps…
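For anyone who wants to run the same kind of check without eyeballing screenshots, here is a minimal sketch in Python. It compares the title a search result displays against the title on the live page and flags a mismatch. Everything named here is my own illustration: `extract_title` and `is_stale` are hypothetical helpers, and the exact title strings in the usage lines are assumed for the example, not copied from the actual page. It works on HTML you have already saved, so it says nothing about how Google's index behaves internally.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so line breaks in the markup don't matter.
        return " ".join("".join(self._parts).split())


def extract_title(html):
    """Hypothetical helper: return the <title> text of an HTML document."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def is_stale(indexed_title, live_html):
    """Hypothetical helper: True when the title shown in the search result
    differs from the live page's title (ignoring case and extra spaces)."""
    shown = " ".join(indexed_title.split()).casefold()
    return shown != extract_title(live_html).casefold()


# Assumed example strings, loosely modeled on the post described above.
live_page = (
    "<html><head><title>Real Life Acting Tips #7: "
    "Once More With Feeling</title></head><body></body></html>"
)
print(is_stale("Real Life Acting Tips #7: Feelings", live_page))
```

Fetching the live page is left out on purpose to keep the sketch offline; in practice you would read it with something like `urllib.request.urlopen` and pass the decoded HTML in. A single mismatch still proves nothing on its own. The point would be to log many of these over time and see how long stale titles hang around.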

Okay, found one, the slasher-piece I wrote on Daniel Pinchbeck’s latest. I searched for the title of that and it came up okay, even though it was written a day later.

What does that prove? Not a hell of a lot unless we could start collecting a lot more data. But one of the speculative threads it throws out for me is this: does Google ever accidentally or intentionally serve older cached versions of websites when there are much more recent ones available? Is that even possible with their technological set-up? And I’m not talking about just by a few days’ margin either - I’m talking about potentially over weeks or months, the kind of thing which would account for Professor Pan’s not seeing updates on my site for “quite a while.”

Could this also be connected to why my daily traffic has halved over the past few weeks - caching issues perhaps? It strikes me as a weird (not to mention weirdly-timed, subject-wise) thing to suddenly happen, after having fairly consistent traffic numbers for over two years before this. Anyone have any solid ideas of how I can investigate and improve this situation? Is there a type of internet system (peer-to-peer or human-assisted searches perhaps?) that we could devise to overcome problems like this? How can we side-step giant corporations as information gate-keepers and make sure we’re getting up-to-date and accurate information from people in our online shared value communities?







2 Reader Responses

  1. What’s Missing From This Blog Post? - Pop Occulture Says:

    […] {Piggy-backing on the idea I just posted about: using cached versions of pages in place of newer updated versions, either intentionally or accidentally…} […]

  2. Tim Boucher Says:

    See also:

    http://www.timboucher.com/journal/2007...09/14/parallel-baidu-google-searches/
    http://www.timboucher.com/journal/2007/09/19/is-rss-killing-pagerank/


