2014-04-24 22:01
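Problem: You have 1 billion URLs. How would you detect whether any of them are duplicates?
Solution: Hash each URL to compute a signature, then store the signatures in a K-V database and use it to check for duplicates.
Code: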
```cpp
// 10.6 You have 10 billion URLs. How would you detect the duplicates among them?
// Answer:
// 1. Use a signature (hash) algorithm to convert each URL string into a fixed-size numeric checksum.
// 2. Use that signature as the hash key. If memory allows, use an in-memory hash table to detect duplicates.
// 3. If the data won't fit in memory, use a K-V database instead. A data set of about 10 billion keys
//    should still be manageable on one machine, so there is no need to spread the work across several computers.
int main()
{
    return 0;
}
```
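Steps 1 and 2 of the answer can be sketched as follows, for the case where all signatures fit in memory. This is only an illustration under assumptions: the post does not name a hash function, so 64-bit FNV-1a is swapped in as one simple choice, and the helper names `fnv1a64` and `findDuplicates` are hypothetical. A `std::unordered_set` stands in for the in-memory hash table. For step 3, the same signature keys would go into a K-V database instead.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: 64-bit FNV-1a hash used as the URL "signature".
// FNV-1a is an assumption here; any good 64-bit hash would do.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;                  // mix in one byte
        h *= 1099511628211ULL;   // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical helper: returns every URL whose signature was already seen
// earlier in the input, in the order the repeats appear.
// Caveat: two different URLs that collide on the same 64-bit signature
// would be reported as duplicates; this sketch does not re-check the strings.
std::vector<std::string> findDuplicates(const std::vector<std::string>& urls) {
    std::unordered_set<std::uint64_t> seen;  // in-memory table of signatures
    seen.reserve(urls.size());
    std::vector<std::string> dups;
    for (const std::string& url : urls) {
        // insert(...).second is false when the signature was already present
        if (!seen.insert(fnv1a64(url)).second) {
            dups.push_back(url);
        }
    }
    return dups;
}
```

Storing only 8-byte signatures keeps the key data for 1 billion URLs at about 8 GB before hash-table overhead, which is why the answer falls back to a K-V database when that no longer fits in RAM. If false positives from hash collisions matter, a reported duplicate can be confirmed by comparing the full URL strings.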