PAT-1063 Set Similarity (set container)

1063. Set Similarity
Given two sets of integers, the similarity of the sets is defined to be N_c/N_t*100%, where N_c is the number of distinct common numbers shared by the two sets, and N_t is the total number of distinct numbers in the two sets. Your job is to calculate the similarity of any given pair of sets.

Input Specification:

Each input file contains one test case. Each case first gives a positive integer N (<=50) which is the total number of sets. Then N lines follow, each gives a set with a positive M (<=10⁴) and followed by M integers in the range [0, 10⁹]. After the input of sets, a positive integer K (<=2000) is given, followed by K lines of queries. Each query gives a pair of set numbers (the sets are numbered from 1 to N). All the numbers in a line are separated by a space.

Output Specification:

For each query, print in one line the similarity of the sets, in the percentage form accurate up to 1 decimal place.
Sample Input:
```
3
3 99 87 101
4 87 101 5 87
7 99 101 18 5 135 18 99
2
1 2
1 3
```
Sample Output:
```
50.0%
33.3%
```
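Problem summary: the input contains n sets, each holding some numbers. There are then k queries, each naming two sets to compare. For each query, compute the similarity = Nc / Nt * 100%, where Nc is the size of the intersection of the two sets and Nt is the size of their union.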
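As a quick check against the sample: after removing duplicates, the three sets are {99, 87, 101}, {87, 101, 5} and {99, 101, 18, 5, 135}. The two queries then work out as follows:

```latex
\begin{align*}
\text{sets 1 and 2:}\quad & N_c = |\{87, 101\}| = 2,\quad N_t = |\{5, 87, 99, 101\}| = 4,\quad \tfrac{2}{4} \times 100\% = 50.0\% \\
\text{sets 1 and 3:}\quad & N_c = |\{99, 101\}| = 2,\quad N_t = |\{5, 18, 87, 99, 101, 135\}| = 6,\quad \tfrac{2}{6} \times 100\% \approx 33.3\%
\end{align*}
```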

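Main idea: a set may contain repeated numbers, and each query needs many lookups (for every element of set a, check whether it also exists in set b). The STL set container is a natural fit, because it stores no duplicates and its lookups are fast (logarithmic in the set size). For each query, initialize nc to 0 and nt to the size of set b, then walk through the elements of set a: if an element is also in set b, increment nc; otherwise increment nt. (Note: the author reports that computing nt as the sum of the two set sizes minus the intersection size may exceed the time limit.)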
```
#pragma warning(disable: 4786)
#include <cstdio>
#include <vector>
#include <set>
using namespace std;
int main(void) {
    int n, i, j;
    
    scanf("%d", &n);
    vector<set<int> > vec(n);
    set<int>::iterator iter; 
    int m, num;
    for (i = 0; i < n; i++) {
        scanf("%d", &m);
        for (j = 0; j < m; j++) {
            scanf("%d", &num);
            vec[i].insert(num);
        }        
    }
    int k, a, b;
    scanf("%d", &k);
    for (i = 0; i < k; i++) { 
        scanf("%d%d", &a, &b);
        int nc = 0, nt = vec[b-1].size();
        for (iter = vec[a-1].begin(); iter != vec[a-1].end(); iter++) {
            if (vec[b-1].count(*iter))                  //if (vec[b-1].find(*iter) != vec[b-1].end())
                nc++;
            else
                nt++;
        }
        // nt = vec[a-1].size() + vec[b-1].size() - nc;  // computing nt this way may time out
        printf("%.1f%%\n", nc * 1.0 / nt * 100);
    }
    
    return 0;
}
```
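For comparison, here is a minimal sketch of a different way to answer a query. It is not the author's method: each set is stored as a sorted, duplicate-free `vector`, and one merge-style pass counts the intersection and the union together. The helper names `normalize` and `similarity` are made up for this illustration. The sketch assumes both inputs have already been passed through `normalize` and that at least one of them is non-empty, which the problem guarantees because M is positive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sort and deduplicate so the vector behaves like a set.
void normalize(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
}

// Hypothetical helper: similarity (in percent) of two normalized vectors.
// A single two-pointer pass counts common elements (nc) and distinct
// elements overall (nt).
double similarity(const std::vector<int>& a, const std::vector<int>& b) {
    std::size_t i = 0, j = 0;
    int nc = 0, nt = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) {        // element present in both sets
            ++nc; ++i; ++j;
        } else if (a[i] < b[j]) {  // element only in a
            ++i;
        } else {                   // element only in b
            ++j;
        }
        ++nt;                      // every step consumes one distinct value
    }
    // Whatever is left in either vector belongs to the union only.
    nt += static_cast<int>((a.size() - i) + (b.size() - j));
    assert(nt > 0);                // precondition: not both sets empty
    return nc * 100.0 / nt;
}
```

In a full program, `normalize` would be called once per set right after reading it, and each query would then print `similarity(vec[a-1], vec[b-1])` with the same `"%.1f%%\n"` format as above. Each query costs time linear in the two set sizes, with no tree lookups.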
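Python's set type makes this problem even easier: & and | correspond to intersection and union respectively. The only drawback is that one test case timed out.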
```
n = int(input())
L1 = []
for i in range(n):
    st = input()
    L2 = st.split(' ')
    L1.append(set(L2[1:]))
k = int(input())
for i in range(k):
    pair = input().split(' ')
    x, y = int(pair[0]), int(pair[1])
    similarity = len(L1[x-1] & L1[y-1]) / len(L1[x-1] | L1[y-1]) * 100
    print('%.1f%%' % (similarity))
```
Original article: https://www.cnblogs.com/zhayujie/p/7534847.html