Transcraping – Translation Scraping

I want to introduce a new topic to you guys: translation scraping. Nowadays you see lots of scraper sites that scrape RSS feeds and republish the content on AdSense-laden sites. That's all well and good, but clearly we have other tools in our arsenal to monetize scraper splogs: we have the ability to translate on the fly.

Consider this: a simple script that takes a keyword, runs a Google Blog Search for that keyword, collects all the URLs that come up as a match, passes each URL to an online translator, and then posts the translated content to a blog via XML-RPC.
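The steps in that pipeline can be sketched roughly as below. This is only a skeleton under my own assumptions: `transcrape` and the three stage lambdas are hypothetical names, and the stages are stubs so the sketch runs without touching the network. In a real script they would wrap the blog search, the translator request, and the XML-RPC call.

```ruby
# Hypothetical pipeline skeleton: each stage is passed in as a callable,
# so the flow (search -> translate -> publish) can be run with stubs.
def transcrape(keyword, search:, translate:, publish:)
  search.call(keyword).map do |url|
    post = translate.call(url) # expected shape: { 'title' => ..., 'description' => ... }
    publish.call(post)
    url                        # report which URLs were processed
  end
end

# Stub stages standing in for the blog search, the translator, and XML-RPC.
published = []
search    = ->(kw)   { ["http://example.com/#{kw}/1", "http://example.com/#{kw}/2"] }
translate = ->(url)  { { 'title' => "Translated #{url}", 'description' => 'body text' } }
publish   = ->(post) { published << post }

done = transcrape('seo', search: search, translate: translate, publish: publish)
```

Keeping the stages separate like this means any one of them (say, the translator) can be swapped without touching the rest.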

I mean, why not? If you are going to scrape sites in the same language, you might as well cover your bases and give'er in other languages too! Come on, show a little multiculturalism, for Christ's sake...

Here is a little example I hacked up using a post from my good friend Eli over at Blue Hat SEO. From my experience, I happen to know that splogs scraping his content convert really well in the Russian market... (sorry buddy :P)
This is programmed in Ruby and uses Mechanize and the XML-RPC library.

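require 'cgi'
require 'mechanize'
require 'xmlrpc/client' # a bundled gem rather than stdlib since Ruby 2.4

# Thin wrapper around the metaWeblog XML-RPC API.
module MetaWebLogAPI
  class Client
    def initialize(server, urlPath, blogid, username, password)
      @client = XMLRPC::Client.new(server, urlPath)
      # Store the connection details passed in by the caller.
      @blogid = blogid
      @username = username
      @password = password
    end

    # content is a hash with 'title' and 'description' keys; publish is true or false.
    def newPost(content, publish)
      @client.call('metaWeblog.newPost', @blogid, @username,
                   @password, content, publish)
    end
  end
end

agent = Mechanize.new # named WWW::Mechanize in older releases of the gem
agent.user_agent_alias = 'Mac Safari'
agent.set_proxy('localhost', 8118) # route requests through a local proxy on port 8118

source = 'http://www.bluehatseo.com/followup-seo-empire-part-1/'
# direction=er appears to select English-to-Russian; the source URL is escaped for the query string.
url = "http://www.online-translator.com/url/tran_url.asp?lang=en&url=#{CGI.escape(source)}&direction=er&template=General&cp1=NO&cp2=NO&autotranslate=on&psubmit2.x=40&psubmit2.y=7"

# Fetch the translated page and pull out the post title and body.
doc = agent.get(url)
title = doc.search('p.post-info').inner_text
guts = doc.search('div.post-content').inner_text

# Post the translated content to the blog (blog id 1) and publish it immediately.
client = MetaWebLogAPI::Client.new('bingobango.wordpress.com', '/xmlrpc.php', 1, 'bingobango', 'password')
blogpost = { 'title' => title, 'description' => guts }
client.newPost(blogpost, true)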

And you can stroll on over to http://bingobango.wordpress.com/ to see the results of our handiwork.
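Since the whole point is covering other languages, the long translator URL in the script is easier to reuse if it is built from its parts. Here is a minimal sketch, assuming the query parameters stay exactly as they appear in the script; `translator_url` is a hypothetical helper name of mine, and I only know the `er` direction code from the example, so any other code you pass in is your own assumption to verify.

```ruby
require 'uri'

# Hypothetical helper: rebuilds the translator URL from the script above.
# Parameter names and values are copied from the original query string;
# only `direction` is exposed, defaulting to the 'er' value used there.
def translator_url(source, direction: 'er')
  params = {
    'lang' => 'en', 'url' => source, 'direction' => direction,
    'template' => 'General', 'cp1' => 'NO', 'cp2' => 'NO',
    'autotranslate' => 'on', 'psubmit2.x' => '40', 'psubmit2.y' => '7'
  }
  # encode_www_form percent-encodes the source URL so it survives as a query value.
  "http://www.online-translator.com/url/tran_url.asp?#{URI.encode_www_form(params)}"
end

url = translator_url('http://www.bluehatseo.com/followup-seo-empire-part-1/')
```

The result can be handed straight to `agent.get` in place of the hand-written string.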

The really cool thing about this is that you can create a spider that automates these procedures indefinitely. Create a script that monitors a group of keywords, set up a few blogs (depending on the size of the keyword niche and how much content you are dealing with), and have your spider automatically translate and post new content as it comes in.
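The "as it comes in" part mostly boils down to remembering which URLs you have already handled. A rough sketch of that bookkeeping follows, with everything in it being my own illustration: `SeenTracker` and `poll` are hypothetical names, and `search` is a stub standing in for the real keyword search.

```ruby
require 'set'

# Hypothetical tracker: remembers handled URLs so each polling pass
# only returns the ones that have not been seen before.
class SeenTracker
  def initialize
    @seen = Set.new
  end

  # Set#add? returns nil when the element is already present,
  # so select keeps only URLs being recorded for the first time.
  def fresh(urls)
    urls.select { |u| @seen.add?(u) }
  end
end

# One polling pass: maps each keyword to its not-yet-seen URLs.
def poll(keywords, tracker, search)
  keywords.each_with_object({}) do |kw, out|
    out[kw] = tracker.fresh(search.call(kw))
  end
end

tracker = SeenTracker.new
search  = ->(kw) { ["http://example.com/#{kw}/a", "http://example.com/#{kw}/b"] }
first   = poll(%w[seo ruby], tracker, search) # everything is new on the first pass
second  = poll(%w[seo ruby], tracker, search) # nothing is new on the second pass
```

In a long-running spider the seen set would need to be persisted (a file or a small database) so a restart does not repost everything.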
