Parallel Baidu Google Searches
A few months ago I wrote a lot using language like “reference point keyword clusters.” It reflected a distinct style that I know annoyed a lot of people (and I also don’t care; how annoying is that!). Anyway, I recently heard about Baidu, which I think they are trying to seed into the cultural storyline as the CHINESE GOOGLE KILLER.
But Baidu was once very tight with Google. In fact, Google owned a stake in it. Then Google got into trouble for collaborating with the Chinese government to censor China’s internet. I’m guessing this is connected to why they dumped Baidu.
Google doesn’t censor internet results here though, right?
Of course not! Let’s prove it.
Search for “reference point keyword clusters” (without quotes) on both Google and Baidu.com.
Google and I have become friends (for the most part). We understand each other. The pretty little Google algorithm knows I use my website to store information that is important to me, which I often go back to, retrieve, and build on. So it tailors the results it shows me to make my website look more important (listed more prominently) for me than it might for someone with very different usage patterns.
With *my* version of the Google results for our selected search term, the first URL listed is:
findory.com/read?id=d2d3a138&ib=2
This links to my FeedBurner feed (FeedBurner was recently bought by Google as well). FeedBurner uses my RSS feed to slice and dice the core content out of the context of my website and make it publishable anywhere else. In this case, all they are doing is redirecting the URL and “stealing” the potential PageRank reward I would get from a more direct link to my site (I have noticed that my PR fell a whole point, and RSS reuse is almost certainly why).
On Baidu though, the full potential of RSS content re-packaging is displayed. The URL listed as the first result is:
www.zhuaxia.com/item/211395092
What you see there is somebody else’s web domain displaying my content through the context of their interface. It’s not really wrong, illegal, or even that shady. It’s kind of the purpose of RSS and “Web 2.0,” where content can be plucked from context and “mashed up.”
BUT, and it’s a big one:
How does anybody know where this information really came from? Did I really write it? Did somebody else? Did they pull out my RSS content and then filter it or change anything? You simply don’t know unless you have an AUTOMATED TOOL to constantly run parallel searches and do statistical analysis on variations of source files. At some point I envision a piece of software whose job is basically to construct a “FITS BEST” version of texts. That’s sort of what Wikipedia is aiming for, and they are doing a great job, all simply by leveraging human intelligence in beautifully natural (and sometimes messy) ways.
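Nothing like that tool exists here, but the smallest piece of it (checking whether a republished copy was altered) can be sketched. The sketch below is only an illustration under stated assumptions: it assumes you already have the plain text of your original post and of the copy found on another domain (fetching and stripping the pages is left out), and `compare_copies` is a hypothetical helper name. It swaps in Python’s standard `difflib.SequenceMatcher` for the “statistical analysis” mentioned above, comparing the two texts word by word.

```python
import difflib


def compare_copies(original: str, copy: str) -> tuple[float, list[tuple[str, str, str]]]:
    """Hypothetical helper: score how closely `copy` matches `original`
    and list the word-level edits that turn the original into the copy."""
    a = original.split()
    b = copy.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    changes = []
    # get_opcodes() yields (tag, i1, i2, j1, j2); tag is 'equal', 'replace',
    # 'delete', or 'insert'. Keep only the spans that differ.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    # ratio() is 2*M/T: M matched words, T total words in both texts.
    return matcher.ratio(), changes


if __name__ == "__main__":
    original = "Google doesn't censor internet results here though, right?"
    scraped = "Google doesn't filter internet results here, right?"
    ratio, changes = compare_copies(original, scraped)
    print(f"similarity: {ratio:.2f}")
    for tag, before, after in changes:
        print(f"{tag}: {before!r} -> {after!r}")
```

On that made-up pair, the similarity comes out to 0.67 and the two `replace` entries show exactly which words the “republisher” changed. A real version would need to run this against many copies found by many searches, which is the part the post is really asking for.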
So, question: is Google still collaborating with the Chinese government by using Baidu as a proxy public-image hand-off agent? The interfaces are the same. Google owns FeedBurner, the source of this data. Who owns those random spam domain names, after all? There’s no way to trace it.
Food for thought: is your internet being censored right here in the US? How would you know unless you were constantly running parallel searches and statistical analysis against various database sources? Is that how BitTorrent basically works?

![[tmbchr]™](/journal/popocculture-blog-logo.jpg)
September 14th, 2007 at 6:03 pm
Huh, my WordPress went down RIGHT after posting this and has been acting suspicious ever since I started this train of thought…
If I were a superstitious man…
September 14th, 2007 at 6:05 pm
Someone told me recently that he thought I hadn’t updated my blog in several months…
A case of an old cache or something else going on?
September 14th, 2007 at 6:06 pm
Chinese cyber-attacks on New Zealand?
http://www.news.com/China+accused+of+c...+New+Zealand/2100-7348_3-6207678.html
Who is programming for the Chinese government?
Starting to sound like Cryptonomicon around here…
September 14th, 2007 at 7:02 pm
http://www.telegraph.co.uk/news/main.j...l?xml=/news/2007/05/17/westonia17.xml
September 14th, 2007 at 7:09 pm
http://www.maxpower.ca/wordpress-plugi...t-detecting-content-theft/2006/09/25/
September 15th, 2007 at 5:24 am
Should we be trying to protect our feeds from indexing and reuse? Doesn’t that defeat the point of the technology?
Is this FeedBurner NoIndex tool actually good for something? What the hell is Yahoo Pipes?
http://screencast.com/t/hHkcW7AGrl
October 18th, 2007 at 12:16 pm
[…] See also my analysis of parallel searches on Google and Baidu and how they are using RSS to rip content out of our sites and repackage it in a censor-friendly format. […]
January 14th, 2008 at 5:06 am
[…] CORRECTION: Google set me up the bomb somewhat on that one above. Never say never when the statement is based on a belief that Google’s index is accurate. I should know better. It doesn’t index Cryptogon properly anymore. Tim has warned about this as well. I use the built-in search function in WordPress to find things on my own site now because Google is disappearing things. But that’s the beauty of the Intertubes; when a correction is necessary, it doesn’t take long for me to hear about it. […]