[C++] Comparing Methods for Removing Duplicate Elements from a vector

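Background: I need to build a whitelist without duplicates and then run binary searches on it. The list must therefore be sorted and duplicate-free, and an efficient binary search calls for a container with random-access iterators: vector, array, or deque. array has a fixed size, so vector and deque are the realistic candidates. deque's advantage is constant-time insertion at both ends (insertion in the middle is linear for both containers), but this example only appends at the back, where vector is amortized constant time as well. Random access into a deque is also slower than into a vector, so I use vector.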
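To make the goal concrete, here is a minimal sketch of the lookup the whitelist is meant to support. The helper name in_white_list is made up for this illustration, and it assumes the vector has already been sorted in ascending order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the original post): returns true if `key`
// is present in `white_list`.
// Assumption: `white_list` is already sorted in ascending order.
bool in_white_list(const std::vector<int>& white_list, int key) {
    // std::binary_search needs a sorted range; with random-access iterators
    // it runs in O(log n) comparisons and O(log n) iterator steps.
    return std::binary_search(white_list.begin(), white_list.end(), key);
}
```

Duplicates would not break the search, but removing them keeps the list smaller and the output file clean.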

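1. Using the STL unique algorithm

First sort the vector, then combine erase with the unique algorithm. unique only removes adjacent duplicates, which is why the sort has to come first. The test input contains one million integers with relatively few duplicates.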
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
    ifstream fwhite("largeW.txt");
    if (!fwhite.is_open())
    {   // .good(), .fail(), or !fwhite would also detect a failed open
        cout << "can't open file list" << endl;
        return EXIT_FAILURE;
    }

    int number;
    vector<int> white_list;

    clock_t start = clock();
    // Test the extraction itself: looping on !eof() can append one spurious
    // value after the last successful read.
    while (fwhite >> number)
    {
        white_list.push_back(number);
    }
    cout << "Time to load data : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;

    // Restart the timer for each phase so the phases are measured independently.
    start = clock();
    sort(white_list.begin(), white_list.end());
    white_list.erase(unique(white_list.begin(), white_list.end()), white_list.end());
    cout << "Time to remove duplicate data from vector : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;

    start = clock();
    ofstream fout("sort_white.txt", ios::trunc);
    vector<int>::iterator iter = white_list.begin();
    while (iter != white_list.end())
    {
        fout << *iter << endl;
        ++iter;
    }
    cout << "Time to write data into file : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;
    return EXIT_SUCCESS;
}
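The core of this method is the sort / unique / erase combination. A minimal sketch of just that step, wrapped in a helper whose name (sort_and_dedupe) is made up for this illustration:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the original post): sorts `values` and
// removes duplicate elements in place.
void sort_and_dedupe(std::vector<int>& values) {
    // Sorting puts equal values next to each other.
    std::sort(values.begin(), values.end());
    // std::unique keeps the first element of every run of equal values,
    // packs the kept elements at the front, and returns an iterator to the
    // new logical end. The vector's size does not change yet.
    auto new_end = std::unique(values.begin(), values.end());
    // erase drops the leftover tail so the size matches the unique count.
    values.erase(new_end, values.end());
}
```

For example, a vector holding 5, 3, 5, 1, 3, 3 ends up as 1, 3, 5. Without the erase call the vector would keep its old length, and the elements past the returned iterator would have unspecified values.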
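2. Using set together with copy

Read the data straight into a set, then copy it into the vector. The copy has to go through an insert_iterator so that each element is inserted; copying through a plain iterator into the empty vector would write past its end (the overflow problem).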
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

using namespace std;

int main()
{
    ifstream fwhite("largeW.txt");
    if (!fwhite.is_open())
    {   // .good(), .fail(), or !fwhite would also detect a failed open
        cout << "can't open file list" << endl;
        return EXIT_FAILURE;
    }

    int number;
    vector<int> white_list;
    set<int> ori_list;

    clock_t start = clock();
    // Test the extraction itself rather than looping on !eof().
    // The set keeps its elements sorted and silently ignores duplicates.
    while (fwhite >> number)
    {
        ori_list.insert(number);
    }
    cout << "Time to load data : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;

    // Restart the timer for each phase so the phases are measured independently.
    start = clock();
    // The insert_iterator grows the vector as elements are copied in.
    insert_iterator<vector<int> > it(white_list, white_list.begin());
    copy(ori_list.begin(), ori_list.end(), it);
    cout << "Time to copy data from set to vector : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;

    start = clock();
    ofstream fout("sort_white.txt", ios::trunc);
    vector<int>::iterator iter = white_list.begin();
    while (iter != white_list.end())
    {
        fout << *iter << endl;
        ++iter;
    }
    cout << "Time to write data into file : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;
    return EXIT_SUCCESS;
}
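The set-to-vector copy can also be written in other ways than with insert_iterator. The sketch below shows two common alternatives; the function names are made up for this illustration, and neither variant was timed in this post.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: builds the vector directly from the set's range.
// The elements arrive sorted and unique because the set stores them that way.
std::vector<int> to_vector(const std::set<int>& s) {
    return std::vector<int>(s.begin(), s.end());
}

// Hypothetical helper: same result via std::copy and back_inserter.
// reserve() allocates once up front so push_back never has to reallocate.
std::vector<int> to_vector_by_copy(const std::set<int>& s) {
    std::vector<int> out;
    out.reserve(s.size());
    std::copy(s.begin(), s.end(), std::back_inserter(out));
    return out;
}
```

Either way the copy is only one part of the cost of this method: the insertions into the set still have to happen first.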
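3. Time cost, measured with clock starting from container construction

Method 1: 8.477 s

Method 2: 23.246 s

So plain vector combined with unique is the better choice. The reason: inserting the same one million integers into a set takes too long, about 18 seconds in my test, and that is the dominant cost.

Method 1: reading the file into the vector took 5.852 s, sorting and removing duplicates took 3.205 s, and writing the file took 15.624 s, about 24 s in total.

Method 2: reading the file into the set took 18.893 s, copying the data from the set to the vector took 4.884 s, and writing the file took 20 s, about 44 s in total.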
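Per-phase figures like these are easiest to compare when each phase is timed on its own. The sketch below is one way to do that; it swaps clock for std::chrono::steady_clock (so it measures wall-clock time rather than processor time), and the helper name seconds_taken is made up for this illustration.

```cpp
#include <chrono>

// Hypothetical helper (not from the original post): runs `work` once and
// returns how long it took, in seconds of wall-clock time.
template <typename Work>
double seconds_taken(Work work) {
    // steady_clock never jumps backwards, which makes it suitable for intervals.
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A phase is then timed by passing it as a lambda, for example seconds_taken([&] { sort(white_list.begin(), white_list.end()); }), so no phase's figure can leak into the next one.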

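However, writing the file is clearly slow. This example writes the values by stepping an iterator; would indexing with subscripts be faster? Or using the copy function with an ostream_iterator?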

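4. Same as method 1, but writing the file with subscripts instead of an iterator

No noticeable improvement.

5. Copying everything in one call with ostream_iterator, which in this example cut the time nearly in half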
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
    ifstream fwhite("largeW.txt");
    if (!fwhite.is_open())
    {   // .good(), .fail(), or !fwhite would also detect a failed open
        cout << "can't open file list" << endl;
        return EXIT_FAILURE;
    }

    int number;
    vector<int> white_list;

    clock_t start = clock();
    // Test the extraction itself rather than looping on !eof().
    while (fwhite >> number)
    {
        white_list.push_back(number);
    }
    cout << "Time to load data : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;

    // Restart the timer for each phase so the phases are measured independently.
    start = clock();
    sort(white_list.begin(), white_list.end());
    white_list.erase(unique(white_list.begin(), white_list.end()), white_list.end());
    cout << "Time to remove duplicate data from vector : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;

    start = clock();
    ofstream fout("sort_white.txt", ios::trunc);
    // Earlier variants, kept for comparison:
    //   iterator loop:  fout << *iter << endl;
    //   subscript loop: fout << white_list[index] << endl;
    // This version copies the whole range in one call. Values are separated
    // by a space instead of a newline, and there is no endl per element.
    copy(white_list.begin(), white_list.end(), ostream_iterator<int, char>(fout, " "));
    cout << "Time to write data into file : " << double(clock() - start) / CLOCKS_PER_SEC << " s" << endl;
    return EXIT_SUCCESS;
}
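Note: the largeW file comes from the website of the book Algorithms, 4th Edition; alternatively, you can generate one yourself with the rand function. It needs one int per line, one million lines.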
Original article: https://www.cnblogs.com/helloWaston/p/4595231.html
