java - How to implement a URL-seen test / cache across multiple crawlers and maintain good performance?




I am currently developing a small web crawler for private testing purposes.

My basic setup is this:

- 2 computers for crawling (small RAM)
- 1 computer as the main "database" (big RAM)

My primary question is: how do I implement a URL-seen test across multiple crawlers?

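If I had only 1 crawler, I would implement a Bloom filter in Java, but I got stuck on how to do this when I have "infinite" crawlers, since the filter needs to be "synchronized" between them. Does there need to be a central cache or something similar, and if so, how does it remain fast?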
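To make the single-crawler idea concrete, here is a minimal sketch of a Bloom filter used as a URL-seen test. It assumes the filter lives in memory on the central "database" machine and that crawlers send candidate URLs to it; how they reach it over the network is left out. The class name `UrlSeenFilter`, the method names, the FNV-1a hashing, and the sizing parameters are all illustrative assumptions, not something from an existing library.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical Bloom filter for a URL-seen test; false positives are possible, false negatives are not. */
final class UrlSeenFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UrlSeenFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /**
     * Marks the URL as seen. Returns true if at least one bit was newly set,
     * meaning the URL was definitely not seen before. Synchronized so that
     * several crawler connections can share one instance.
     */
    synchronized boolean addIfAbsent(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        long h1 = fnv1a(data, 0xcbf29ce484222325L);
        long h2 = fnv1a(data, 0x84222325cbf29ce4L) | 1L; // second seed is arbitrary; forced odd
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            // Double hashing: derive the i-th bit index from two base hashes.
            int index = (int) Long.remainderUnsigned(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                bits.set(index);
                isNew = true;
            }
        }
        return isNew;
    }

    /** Returns false if the URL was definitely never added, true if it probably was. */
    synchronized boolean mightContain(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        long h1 = fnv1a(data, 0xcbf29ce484222325L);
        long h2 = fnv1a(data, 0x84222325cbf29ce4L) | 1L;
        for (int i = 0; i < numHashes; i++) {
            int index = (int) Long.remainderUnsigned(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                return false;
            }
        }
        return true;
    }

    /** FNV-1a style 64-bit hash with a caller-supplied starting value. */
    private static long fnv1a(byte[] data, long seed) {
        long h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

A crawler would call `addIfAbsent(url)` and only fetch the page when it returns `true`. Sending URLs to the central machine in batches rather than one at a time would probably reduce network round trips, though that part is not shown here.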

Querying a MySQL table with millions of rows may become slow after some time...

java mysql database caching web-crawler
