Pupsniffer Evaluation Website


Algorithm Evaluation

   Evaluation Website for Pupsniffer [Start your evaluation] [Help]

  1. Real-time Evaluation Results for the Original Algorithm
  2. Real-time Evaluation Results for the Incremental Algorithm for Bilingual Webpage Extraction
  3. Real-time Evaluation Results for the Weak Key Rescue Algorithm
  4. Real-time Evaluation Results for the Bilingual Deep Webpage Detection Algorithm

Source Code

   Pupsniffer (Parallel URL Pattern Sniffer, An Efficient Multilingual Web Corpus Tool)


Data Set

  1. Original Algorithm: All Seed Websites List (.txt [12,800]); 10% of the URL Pairs of Bilingual Webpages (.sql [29,025 / 290,247]) Found by the Original Algorithm; List of Bilingual URL Pattern Credibility (.txt [36,558]; List of Patterns with Credibility Larger Than 100: .txt) Based on the URL Pairs of Bilingual Webpages

  2. Incremental Algorithm for Bilingual Webpage Crawling: All Related Websites List (.txt [9,577]); Top-500 Candidate Bilingual Websites List (By Sum of Out-Links: .txt, By Sum of PageRank: .txt, By Sum of Credibility-Weighted PageRank: .txt); 10% of the URL Pairs of Bilingual Webpages (.sql [3,749 / 37,491]) Found on the Top-500 Candidate Bilingual Websites by the Incremental Algorithm for Bilingual Webpage Crawling

  3. Weak Key Rescue Algorithm: 10% of the URL Pairs of Bilingual Webpages (.sql [1,002 / 10,016]) Found by the Weak Key Rescue Algorithm

  4. Bilingual Deep Webpage Detection Algorithm: All Monolingual 'Deep' URL List of the Seed Websites with Domain 'gov.hk' (.txt [103,055]); 10% of the URL Pairs of Bilingual Webpages (.sql [1,583 / 15,825]) Found in the Deep URL List of the Seed Websites with Domain 'gov.hk' by the Bilingual Deep Webpage Detection Algorithm

  5. Evaluation Result via Random Sampling: All Bilingual Webpages Evaluated via Random Sampling (.sql [17,223, Including 16,313 True and 910 False Bilingual Webpages])
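
The counts in item 5 can be turned into a single accuracy figure. The sketch below assumes that "True" and "False" are the evaluators' judgments on each sampled page, and that the share of true pages can be read as the sample precision. The page itself does not name this metric, and the function name is hypothetical. Under that assumption the figure comes out at roughly 94.7%.

```python
# Counts copied from item 5 of the Data Set list.
TRUE_PAGES = 16_313
FALSE_PAGES = 910
TOTAL_EVALUATED = 17_223


def sample_precision(true_count: int, false_count: int) -> float:
    """Return the fraction of evaluated pages judged to be true bilingual pages.

    Hypothetical helper: the page does not define this metric.
    """
    total = true_count + false_count
    if total == 0:
        raise ValueError("no evaluated pages")
    return true_count / total


# Sanity check: the true and false counts add up to the reported total.
assert TRUE_PAGES + FALSE_PAGES == TOTAL_EVALUATED

print(f"Sample precision: {sample_precision(TRUE_PAGES, FALSE_PAGES):.1%}")
```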


Publications


Note

  1. All Pupsniffer source code and data sets are free and released under the GNU GPL License.
  2. Pupsniffer is language-independent. You can modify 'config.txt' in Pupsniffer to crawl bilingual URL pairs according to your requirements.
  3. If you have any questions, do not hesitate to contact Dr. Chengzhi Zhang (Email: zhangchz # istic.ac.cn) or Prof. Chunyu Kit (Email: ctckit # cityu.edu.hk).