<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Parasite Scraping</title>
	<atom:link href="http://seocracy.com/2007/09/parasite-scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://seocracy.com/2007/09/parasite-scraping/</link>
	<description>A blog about technical SEO, Ruby, Web Apps, and more</description>
	<lastBuildDate>Mon, 05 Jul 2010 17:10:23 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Nis</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-48</link>
		<dc:creator>Nis</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-48</guid>
		<description>&lt;p&gt;Isn&#039;t it kind of risky to have that kind of site? --&gt; http://opensource.votio.com/php/forum/poker-gambling &lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Isn&#39;t it kind of risky to have that kind of site? &#8211;&gt; <a href="http://opensource.votio.com/php/forum/poker-gambling" rel="nofollow">http://opensource.votio.com/php/forum/poker-gambling</a> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Seocracy</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-49</link>
		<dc:creator>Seocracy</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-49</guid>
		<description>&lt;p&gt;&lt;font size=&quot;2&quot;&gt;Yeah, as we&#039;re talking about here, it is definitely easy to exploit scraper sites....not even just ones that have the scraped query in the URL through mod_rewrite.....there are other techniques to exploit other kinds of scrapers....but that&#039;s fodder for a later post.&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&#160;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;There are tons of sites like this if you know how to find them&lt;/font&gt;&#160;&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p><font size="2">Yeah, as we&#39;re talking about here, it is definitely easy to exploit scraper sites&#8230;.not even just ones that have the scraped query in the URL through mod_rewrite&#8230;..there are other techniques to exploit other kinds of scrapers&#8230;.but that&#39;s fodder for a later post.</font></p>
<p>&nbsp;</p>
<p><font size="2">There are tons of sites like this if you know how to find them</font>&nbsp;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Money Maker Blogs</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-50</link>
		<dc:creator>Money Maker Blogs</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-50</guid>
		<description>&lt;p&gt;Thanks for this post. I have been reading about scraping recently and I&#039;m trying to learn how to do it. This is interesting - scraping the scrapers... I like it. :) &lt;/p&gt;&lt;p&gt;This is a newbie question, but how do you implement the code examples above on a webpage to show the scraped content?&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Thanks for this post. I have been reading about scraping recently and I&#39;m trying to learn how to do it. This is interesting &#8211; scraping the scrapers&#8230; I like it. <img src='http://seocracy.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  </p>
<p>This is a newbie question, but how do you implement the code examples above on a webpage to show the scraped content?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Seocracy</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-51</link>
		<dc:creator>Seocracy</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-51</guid>
		<description>That is Ruby. It&#039;s not as simple to run as just copying and pasting.....It requires its own interpreter program to execute the code, and it is not standard on many hosts. Do some research.</description>
		<content:encoded><![CDATA[<p>That is Ruby. It&#39;s not as simple to run as just copying and pasting&#8230;..It requires its own interpreter program to execute the code, and it is not standard on many hosts. Do some research.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Money Maker Blogs</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-52</link>
		<dc:creator>Money Maker Blogs</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-52</guid>
		<description>Thanks for the response, off to my friend google I go :)</description>
		<content:encoded><![CDATA[<p>Thanks for the response, off to my friend google I go <img src='http://seocracy.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: taky</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-53</link>
		<dc:creator>taky</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-53</guid>
		<description>&lt;p&gt;Good post, hopefully all the newbies will be able to benefit off of everyone else&#039;s Wordze accounts.&lt;/p&gt;&lt;p&gt;Hehe. &lt;/p&gt;&lt;p&gt;I just got finished writing my own content generation program that is similar, the first example scraper site is really good. That is a good blackhatter, and the sites are similar to mine (very clean and neat looking template).&lt;/p&gt;&lt;p&gt;I have been pretty sick of the spammiest looking spam sites you can ever imagine coming up in a G search. I&#039;d also be surprised if those sites converted at all, for clicks or for affiliate programs. They just look so shitty.&lt;/p&gt;&lt;p&gt;The secret is to look neat, and have your PPC ads above the fold. Easy. &lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Good post, hopefully all the newbies will be able to benefit off of everyone else&#39;s Wordze accounts.</p>
<p>Hehe. </p>
<p>I just got finished writing my own content generation program that is similar, the first example scraper site is really good. That is a good blackhatter, and the sites are similar to mine (very clean and neat looking template).</p>
<p>I have been pretty sick of the spammiest looking spam sites you can ever imagine coming up in a G search. I&#39;d also be surprised if those sites converted at all, for clicks or for affiliate programs. They just look so shitty.</p>
<p>The secret is to look neat, and have your PPC ads above the fold. Easy. </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Seocracy</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-54</link>
		<dc:creator>Seocracy</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-54</guid>
		<description>&lt;p&gt;&lt;font size=&quot;2&quot;&gt;Whenever I write a post in the wee hours of the morning (like this one), I invariably forget to mention some key things....so here&#039;s a very quick run-down of the key things you should also think about...&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;Be anonymous. Proxy. Rotate IPs if possible. Install Tor on your server if you have to.....&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;You shouldn&#039;t do anything half-assed, especially this....don&#039;t forget to consider all aspects of your code that might leak your site&#039;s IP or other identifying features. &lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;So be slick by remembering to pay attention to how you display the CSS for a page....I usually have my scraper scrape all pages ending in .css and then copy them into &lt;style&gt; tags in the header of the page I am scraping. &lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;Always filter your HTML to avoid opening yourself to XSS&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;Keep in mind that the people you are going to be targeting will generally have no qualms about fucking with you if they catch you. It wouldn&#039;t be hard for them to feed your scraper malicious code.&#160;&lt;/font&gt;&lt;/p&gt;&lt;p&gt;&lt;font size=&quot;2&quot;&gt;Oh...that reminds me....I didn&#039;t proxy those examples.....oh shit..uhhhhhhhhhhhhhh......&lt;/font&gt;&lt;/p&gt;

# bus error</description>
		<content:encoded><![CDATA[<p><font size="2">Whenever I write a post in the wee hours of the morning (like this one), I invariably forget to mention some key things&#8230;.so here&#39;s a very quick run-down of the key things you should also think about&#8230;</font></p>
<p><font size="2">Be anonymous. Proxy. Rotate IPs if possible. Install Tor on your server if you have to&#8230;..</font></p>
<p><font size="2">You shouldn&#39;t do anything half-assed, especially this&#8230;.don&#39;t forget to consider all aspects of your code that might leak your site&#39;s IP or other identifying features. </font></p>
<p><font size="2">So be slick by remembering to pay attention to how you display the CSS for a page&#8230;.I usually have my scraper scrape all pages ending in .css and then copy them into &lt;style&gt; tags in the header of the page I am scraping. </font></p>
<p><font size="2">Always filter your HTML to avoid opening yourself to XSS</font></p>
<p><font size="2">Keep in mind that the people you are going to be targeting will generally have no qualms about fucking with you if they catch you. It wouldn&#39;t be hard for them to feed your scraper malicious code.&nbsp;</font></p>
<p><font size="2">Oh&#8230;that reminds me&#8230;.I didn&#39;t proxy those examples&#8230;..oh shit..uhhhhhhhhhhhhhh&#8230;&#8230;</font></p>
<p># bus error</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bruce</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-55</link>
		<dc:creator>Bruce</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-55</guid>
		<description>&lt;p&gt;You mentioned installing Tor..&lt;/p&gt;&lt;p&gt;Do you know of any sites that give instructions for installing Tor on a Linux server, so that I can run my scripts a &quot;bit&quot; more secretly..&lt;/p&gt;&lt;p&gt;&#160;&lt;/p&gt;&lt;p&gt;Hoping you can help&lt;/p&gt;&lt;p&gt;&#160;&lt;/p&gt;&lt;p&gt;Bruce&#160;&lt;/p&gt;&lt;p&gt;&#160;&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>You mentioned installing Tor..</p>
<p>Do you know of any sites that give instructions for installing Tor on a Linux server, so that I can run my scripts a &quot;bit&quot; more secretly..</p>
<p>&nbsp;</p>
<p>Hoping you can help</p>
<p>&nbsp;</p>
<p>Bruce&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-56</link>
		<dc:creator>Rob</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-56</guid>
		<description>&lt;p&gt;Bruce, my suggestion is that you read all the help files on the tor website.&lt;/p&gt;&lt;p&gt;&lt;br /&gt;There are so many ways to set up an install on linux dependent on the environment, so I suggest you read the doc files.&lt;/p&gt;&lt;p&gt;&#160;&lt;/p&gt;&lt;p&gt;Make sure you have libevent installed on your machine (that was a stumbling block for me at first)&#160;&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Bruce, my suggestion is that you read all the help files on the tor website.</p>
<p>There are so many ways to set up an install on linux dependent on the environment, so I suggest you read the doc files.</p>
<p>&nbsp;</p>
<p>Make sure you have libevent installed on your machine (that was a stumbling block for me at first)&nbsp;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pcallisto</title>
		<link>http://seocracy.com/2007/09/parasite-scraping/comment-page-1/#comment-57</link>
		<dc:creator>pcallisto</dc:creator>
		<pubDate>Wed, 31 Dec 1969 16:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://127.0.0.1/seocracy/?p=12#comment-57</guid>
		<description>Another way to avoid detection would be to do your scraping off-site on a test server, injecting content into your own db, then uploading to your own sites.&#160; It&#039;s also a good workaround if you don&#039;t have VPSes or your own dedicated servers yet.&#160; Can you say WP splogs?</description>
		<content:encoded><![CDATA[<p>Another way to avoid detection would be to do your scraping off-site on a test server, injecting content into your own db, then uploading to your own sites.&nbsp; It&#39;s also a good workaround if you don&#39;t have VPSes or your own dedicated servers yet.&nbsp; Can you say WP splogs?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
