80Legs, 50k Computers and a Web Crawler

Dec 22, 2009 | 2 Comments

By Senior Editor – Kris Smith (@croncast)

You need a pile-o-data fast, and you’ve got nowhere to get it other than to surf, bookmark and beg interns to copy and paste for you. Where do you turn? Your IT department? Your hackery skills and your shared GoDaddy hosting account for bandwidth? Nah.

80Legs is ready to run a couple miles with your pile of data on their shoulders. You get to pick it up and work with it as you see fit.

Did I mention that they are now offering this as a free service? Well, it is free up to a certain point, but for many there is plenty of room to get what they’re looking for.

80Legs offers a unique service that will crawl the internet on your behalf and gather data from the links that you provide. They then take this unstructured data and make it available to the customer for further refinement.

Their value proposition lies in the ability to deliver this service efficiently and affordably. Like I said earlier, it would be difficult, if not impossible, for an individual to run a service that crawls 100,000 pages quickly. 80Legs is now offering this as a free service, and it’s all powered by a 50,000-computer network.

Putting the data collection into another company’s hands lets developers think about what to do with the data. With that time freed up, they can do more with the data that is returned to them, such as working out new algorithms to run across the dataset.

An example of this would be simple search. Developers with more time could work on creating new layers to search that make it more valuable to the end user, whether that means integrating advanced search functionality or returning results contextually, depending on the page that a user is currently searching from.

If you’re interested, the free Basic specs are below. Plus and Premium are listed on their blog.

80Legs Basic Plan:

  • Free to use
  • Normal crawling speed (up to 1 request/second/domain)
  • Access to 80legs Web Portal
  • 1 job running at a time
  • Up to 100K crawled pages per job
  • Low priority in 80legs job queue
  • No recurring jobs allowed
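
To get a feel for what "up to 1 request/second/domain" and "up to 100K crawled pages per job" mean in practice, here is a minimal Python sketch. It is not 80Legs code and does not touch their service; the function name `schedule_requests` and its parameters are made up for illustration. It only works out the earliest time offset at which each URL could be fetched if requests to the same host are kept at least one second apart.

```python
from itertools import islice
from urllib.parse import urlsplit


def schedule_requests(urls, min_interval=1.0, max_pages=100_000):
    """Return (offset_seconds, url) pairs for a polite crawl plan.

    Hypothetical helper, not part of any 80Legs API. Requests to the
    same host are spaced at least `min_interval` seconds apart, while
    different hosts do not delay each other. At most `max_pages` URLs
    are planned, mirroring a per-job page cap.
    """
    next_free = {}  # host -> earliest offset allowed for its next request
    plan = []
    for url in islice(urls, max_pages):
        host = urlsplit(url).hostname or ""
        start = next_free.get(host, 0.0)
        plan.append((start, url))
        next_free[host] = start + min_interval
    # sorted() is stable, so URLs with equal offsets keep their input order
    return sorted(plan, key=lambda item: item[0])


if __name__ == "__main__":
    demo = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/1",
    ]
    for offset, url in schedule_requests(demo):
        print(f"t+{offset:.1f}s  {url}")
```

The takeaway: at that rate a single domain yields at most 3,600 pages an hour, so a job only goes fast when its links are spread across many domains.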

[Via VentureBeat]

2 Comments »

  • Marck said:

    And right now they don’t follow robots.txt and flood sites with their crawler, abusing different IPs – mainly from comcast, cox and verizon.

    Webmasters should block this bad bot in htaccess!

  • Web crawler as a service : 80legs – Good or bad? | Debajyoti Banerjee said:

    [...] 80Legs, 50k Computers and a Web Crawler (techstartups.com) [...]
