Data Set
- Original Algorithm: All Seed Websites List (.txt [12,800]), 10% URL Pairs of Bilingual Webpages (.sql [29,025 / 290,247]) by Using Original Algorithm, List of Bilingual URL Pattern Credibility (.txt [36,558], List of Pattern Credibility Larger Than 100: .txt) Based on URL Pairs of Bilingual Webpages
- Incremental Algorithm of Bilingual Webpages Crawling: All Related Websites List (.txt [9,577]), Top-500 Candidate Bilingual Websites List (By Sum of Link-Out: .txt, By Sum of PageRank: .txt, By Sum of Credibility-Weighted PageRank: .txt), 10% URL Pairs of Bilingual Webpages (.sql [3,749 / 37,491]) Based on Top-500 Candidate Bilingual Websites by Using Incremental Algorithm of Bilingual Webpages Crawling
- Algorithm of Weak Keys Rescuing: 10% URL Pairs of Bilingual Webpages (.sql [1,002 / 10,016]) by Using Algorithm of Weak Keys Rescuing
- Algorithm of Bilingual Deep Webpages Detecting: All Monolingual 'Deep' URL List of the Seed Websites with Domain 'gov.hk' (.txt [103,055]), 10% URL Pairs of Bilingual Webpages (.sql [1,583 / 15,825]) Based on Deep URL List of the Seed Websites with Domain 'gov.hk' by Using Algorithm of Bilingual Deep Webpages Detecting
- Evaluation Result via Random Sampling: All Bilingual Webpages Evaluated via Random Sampling (.sql [17,223, Including 16,313 True and 910 False Bilingual Webpages])
Publications