java - How to implement a URL-seen test / cache across multiple crawlers and maintain good performance?
I am currently developing a small web crawler for private testing purposes.
My basic setup is this:
- 2 computers for crawling (small RAM)
- 1 computer as the main "database" (big RAM)
My primary question is: how do I implement a URL-seen test across multiple crawlers?
If I had only one crawler, I would implement a Bloom filter in Java, but I got stuck on how to do this when I have "infinite" crawlers, since the filter would need to be "synchronized". Does there need to be a central cache or something similar, and if so, how does it remain fast?
Querying MySQL with millions of rows may become slow after some time...
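For reference, the single-crawler Bloom filter mentioned above could look roughly like the sketch below. The class name `UrlSeenFilter`, the method `checkAndAdd`, the FNV-1a based double hashing, and the sizing parameters are all assumptions made for illustration; they are not taken from any particular library. The method is `synchronized` so that one instance could, in principle, sit behind a single process on the big-RAM machine and be shared by several crawler threads, but how the crawlers would reach it over the network is left open here.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter for a "URL seen" test (illustrative sketch, not tuned). */
public class UrlSeenFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** numBits and numHashes are hypothetical tuning knobs chosen by the caller. */
    public UrlSeenFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /**
     * Marks the URL as seen. Returns true if the URL was possibly seen before
     * (false positives can happen), false if it was definitely new.
     */
    public synchronized boolean checkAndAdd(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force an odd step for double hashing
        boolean seen = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                seen = false;   // at least one bit was unset, so the URL is new
                bits.set(idx);
            }
        }
        return seen;
    }

    /** 32-bit FNV-1a style hash over the bytes, starting from the given seed. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}
```

A crawler would call `checkAndAdd(url)` before enqueueing a link and skip the link when the call returns true. Because a Bloom filter only ever answers "definitely new" or "possibly seen", a small fraction of unseen URLs would be skipped; the bit count and number of hashes control how small that fraction is.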
java mysql database caching web-crawler