[tmbchr]™

Quick Guide To Content Scraping



I get my content scraped all the time, yo.

Content-scraping might be described as the online intellectual property law equivalent of bootlegging. Somebody else copies your product - pretty much exactly - and benefits financially from your work and initial investment.

Getting your content scraped online, I guess, gives you some kind of weird convoluted street-cred. All it really means, though, is that your website is on some list in some database as a producer of original or semi-original content. That may not even be how “they” define you as an appropriate data source, but it seems like a fairly logical reason. So you have a list of sites producing content and blasting it out in all directions, over RSS and on the Google-controlled adwebs. Then you have some spam algorithms which pluck out sections of your content around highlighted keyword clusters (which somebody somewhere is paying money for, through some middle man, to push and pull the public meaning of words). Those sections get pushed out through a vast network of spam-ridden URLs and strange foreign-esque blogs ridden with trashy Google advertisements. It’s the seedy underbelly of the internet.
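
For the curious, here's a rough guess at what that "plucking" step might look like. This is only a sketch of the general idea (grab a window of words around each keyword hit), not anybody's actual spam code. The function name `pluck_snippets`, the `window` parameter, and the sample keyword are all made up for illustration.

```python
import re


def pluck_snippets(text, keywords, window=8):
    """Return a chunk of surrounding words for every keyword hit in text.

    Hypothetical sketch: real scrapers presumably do something fancier,
    but the basic shape is "find keyword, copy its neighborhood".
    """
    words = text.split()
    wanted = {k.lower() for k in keywords}
    snippets = []
    for i, word in enumerate(words):
        # Strip punctuation so "scraped," still matches "scraped".
        bare = re.sub(r"\W+", "", word).lower()
        if bare in wanted:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            snippets.append(" ".join(words[start:end]))
    return snippets


if __name__ == "__main__":
    post = "I get my content scraped all the time, yo."
    for snippet in pluck_snippets(post, ["scraped"], window=2):
        print(snippet)
```

Run against the first line of this post with a two-word window, it pulls out "my content scraped all the", which is about the level of coherence you see on the spam blogs.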

And here’s an example, using content plucked from my post about the United States as a corporate brand, which will only be used as a shell as long as it remains profitable to its main shareholders.

Spam sites work the same way: most tend to be fly-by-night operations whose purpose is to rise up in a seemingly-organic groundswell of keyword-laden meaning clusters and then evaporate and transmogrify as soon as the Google spiders come along and try to repair holes in their linguistic-meaning control modules.

Google Satellites Taking Over The World

Either way, I’m proud to be a part of it. Making all of my content freely available in the Public Domain means, technically, that any end or intermediate user can make any use of it they want. A human can use it, a spammer-human, a spammer-human’s semi-intelligent algorithms - whatever! The sky’s the limit. The more the merrier, I figure. And it’s absolutely nutty to be able to watch firsthand the behavior of information once you have set it free in the wild and it learns to survive on its own. Godspeed, you little word-soldiers! May you worm your way across many internets and write yourselves into the viral codes people download on purpose to fix themselves in the distant future!







2 Reader Responses

  1. Big Elk Says:

    Another ironic example of content scraped from this very post:

    http://www.kwcincy.com/quick-guide-to-content-scraping/

  2. Big Elk Says:

    Two new and quick-spawning scrapelets birthed off my datawake:

    http://hdtv.jfcforum.com/2008/04/02/usa-600-tax-rebates-for-hdtv/

    This one is based off me too but is credited to “Charlie Wood”

    http://www.gearfire.com/?p=33511


