• Discussion: removing duplicate elements from an array in Java


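      This is an old question, but most of the answers I have seen do not go deep enough. Here is my own attempt to get the discussion started; criticism is welcome.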

      Problem: suppose I have an array that starts out empty, and I want every element added to it to be unique.

      Faced with a problem like this, I would probably dash off the code below, using an ArrayList as the array.

    private static void testListSet() {
        // Anonymous ArrayList subclass whose add() rejects duplicates
        List<String> arrays = new ArrayList<String>() {
            @Override
            public boolean add(String e) {
                // Linear scan: compare the new element with every existing one
                for (String str : this) {
                    if (str.equals(e)) {
                        System.out.println("add failed !!! duplicate element");
                        return false;
                    }
                }
                // Report success once, after the whole list has been checked
                System.out.println("add succeeded !!!");
                return super.add(e);
            }
        };

        arrays.add("a"); arrays.add("b"); arrays.add("c"); arrays.add("b");
        for (String e : arrays)
            System.out.print(e);
    }

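      Here I ignore everything except one thing: checking at insertion time whether an equal element already exists (assuming, of course, that elements are only ever added through add). If the array does not contain the element, it is added; otherwise it is rejected. This is simple to write, but it becomes clumsy for large arrays: does adding one element to an array of 100000 elements really have to call equals 100000 times? This is the baseline.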
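One way to avoid the linear scan is to keep a HashSet next to the list and ask the set, not the list, whether an element is already present. The sketch below is only an illustration: UniqueList is a hypothetical helper that does not appear in the original code, and it assumes the element type implements equals() and hashCode() consistently (String does).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not from the original post: a list that rejects duplicates.
class UniqueList<E> {
    private final List<E> items = new ArrayList<>(); // keeps insertion order
    private final Set<E> seen = new HashSet<>();     // answers "already present?" by hash lookup

    /** Adds e only if no equal element is present; returns true if it was added. */
    boolean add(E e) {
        if (!seen.add(e)) { // Set.add returns false when an equal element already exists
            return false;
        }
        items.add(e);
        return true;
    }

    /** Returns a copy of the elements in insertion order. */
    List<E> asList() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        UniqueList<String> list = new UniqueList<>();
        for (String s : new String[] {"a", "b", "c", "b"}) {
            System.out.println(s + " added: " + list.add(s));
        }
        System.out.println(list.asList()); // prints [a, b, c]
    }
}
```

The price is extra memory for the set; in exchange each add costs one hash lookup on average, not a pass over the whole list.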

          Problem: given an array that already contains some elements, how do we remove the duplicates from it?

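      As you know, Java collections fall broadly into two families: List and Set. A List keeps its elements in order and allows duplicates, whereas a Set does not allow duplicates and, in general, does not guarantee any order. So we can use this property of Set to remove the duplicates; after all, an algorithm already provided by the library is likely to beat one written on the spot.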
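As a quick illustration of that difference, here is a small sketch using String elements, which already define equals() and hashCode(). The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: the same four strings in a List and in a Set.
class ListVsSetDemo {
    /** Copies the list into a HashSet, which silently drops equal elements. */
    static <T> Set<T> toSet(List<T> list) {
        return new HashSet<T>(list);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "b"));
        Set<String> set = toSet(list);
        System.out.println(list.size()); // 4: the List keeps both "b" entries
        System.out.println(set.size());  // 3: the Set keeps only one "b"
    }
}
```

Note that HashSet makes no promise about iteration order, so the elements may come back in a different order from the one in the list.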

    public static void removeDuplicate(List<People> list) {
        HashSet<People> set = new HashSet<People>(list);
        list.clear();
        list.addAll(set);
    }

    private static People[] ObjData = new People[] {
        new People(0, "a"), new People(1, "b"), new People(0, "a"), new People(2, "a"), new People(3, "c"),
    };

    public class People {
        private int id;
        private String name;

        public People(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return ("id = " + id + " , name " + name);
        }
    }

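      The code above uses a custom People class. When I add equal objects (that is, objects holding the same data) and then call removeDuplicate, it does not actually solve the problem: the equal objects are still there. So how does HashSet decide whether two objects are the same? Opening the HashSet source (the listings here appear to come from the Android implementation), we see that every insertion goes through the add method: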

    @Override
    public boolean add(E object) {
        return backingMap.put(object, this) == null;
    }

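      Here backingMap is the data that HashSet maintains. It uses a neat trick: each added object becomes a KEY in a HashMap, with the HashSet object itself as the VALUE. Because HashMap keys are unique, the data in the HashSet cannot repeat either. Whether two entries really count as duplicates, however, depends on how HashMap decides that two KEYs are the same.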

    @Override public V put(K key, V value) {
        if (key == null) {
            return putValueForNullKey(value);
        }

        int hash = secondaryHash(key.hashCode());
        HashMapEntry<K, V>[] tab = table;
        int index = hash & (tab.length - 1);
        for (HashMapEntry<K, V> e = tab[index]; e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) {
                preModify(e);
                V oldValue = e.value;
                e.value = value;
                return oldValue;
            }
        }

        // No entry for (non-null) key is present; create one
        modCount++;
        if (size++ > threshold) {
            tab = doubleCapacity();
            index = hash & (tab.length - 1);
        }
        addNewEntry(key, value, hash, index);
        return null;
    }
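The set-on-top-of-a-map idea can be reproduced in a few lines. The sketch below is not the library source: MiniSet is a hypothetical class written for illustration, and it stores a shared placeholder object as the value where the listing above stores the set itself (any non-null value works, because add only checks whether put returned null).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical MiniSet, only to illustrate the idea; it is not the library source.
class MiniSet<E> {
    private static final Object PRESENT = new Object(); // dummy value shared by every key
    private final Map<E, Object> backingMap = new HashMap<>();

    /** Returns true if e was not in the set before this call. */
    boolean add(E e) {
        // Map.put returns the previous value for the key, or null if the key was absent
        return backingMap.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return backingMap.containsKey(o);
    }

    int size() {
        return backingMap.size();
    }

    public static void main(String[] args) {
        MiniSet<String> set = new MiniSet<>();
        System.out.println(set.add("a")); // true: "a" was absent
        System.out.println(set.add("a")); // false: put returned the old value
        System.out.println(set.size());   // 1
    }
}
```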

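       In short, the approach is this: walk the entries in the bucket that the key hashes to and, for each entry whose hash matches (the hashCode is in fact post-processed once first), call the KEY's equals method. If both conditions hold, the two are the same element. So if the array's elements are of a custom type and we want to rely on Set's mechanism, we have to implement equals and hashCode ourselves (I will not go into the hashing algorithm in detail here; I only understand part of it):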

    public class People {
        private int id;
        private String name;

        public People(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return ("id = " + id + " , name " + name);
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof People))
                return false;
            People o = (People) obj;
            // Two People are equal when both id and name match
            return id == o.getId() && name.equals(o.getName());
        }

        @Override
        public int hashCode() {
            // Equal objects have equal ids, so they also get equal hash codes
            return id;
        }
    }

      Now calling removeDuplicate(list) no longer leaves two equal People objects.
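To see the effect of equals and hashCode in isolation, here is a small self-contained sketch. PlainPerson and ValuePerson are hypothetical stand-ins for People without and with the two overrides; using Objects.equals and Objects.hash is my own choice here, not something the original code does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;

class EqualsHashCodeDemo {
    // No overrides: inherits the identity-based equals/hashCode from Object.
    static class PlainPerson {
        final int id;
        final String name;
        PlainPerson(int id, String name) { this.id = id; this.name = name; }
    }

    // Value-based equality: same id and name means the same person.
    static class ValuePerson {
        final int id;
        final String name;
        ValuePerson(int id, String name) { this.id = id; this.name = name; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof ValuePerson)) return false;
            ValuePerson o = (ValuePerson) obj;
            return id == o.id && Objects.equals(name, o.name); // null-safe name comparison
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name); // built from the same fields that equals compares
        }
    }

    /** Number of elements a HashSet keeps from two PlainPerson objects with identical data. */
    static int distinctPlain() {
        return new HashSet<PlainPerson>(
                Arrays.asList(new PlainPerson(0, "a"), new PlainPerson(0, "a"))).size();
    }

    /** Number of elements a HashSet keeps from two ValuePerson objects with identical data. */
    static int distinctValue() {
        return new HashSet<ValuePerson>(
                Arrays.asList(new ValuePerson(0, "a"), new ValuePerson(0, "a"))).size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPlain()); // 2: different objects are never equal by identity
        System.out.println(distinctValue()); // 1: equal data collapses to one element
    }
}
```

Returning only id from hashCode, as the People class does, is also valid: the contract only requires that equal objects produce equal hash codes.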

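          All right, let's test their performance. (The test refers to People.count, a static counter that does not appear in the People listing above; judging from the results, it counts calls to equals.)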

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Random;
    import java.util.Set;

    public class RemoveDeplicate {

        public static void main(String[] args) {
            //testListSet();
            //removeDuplicateWithOrder(Arrays.asList(data));
            //ArrayList<People> list = new ArrayList<People>(Arrays.asList(ObjData));
            //removeDuplicate(list);

            People[] data = createObjectArray(10000);
            ArrayList<People> list = new ArrayList<People>(Arrays.asList(data));

            // Time the HashSet-based version
            long startTime1 = System.currentTimeMillis();
            System.out.println("set start time --> " + startTime1);
            removeDuplicate(list);
            long endTime1 = System.currentTimeMillis();
            System.out.println("set end time --> " + endTime1);
            System.out.println("set total time --> " + (endTime1 - startTime1));
            System.out.println("count : " + People.count);
            People.count = 0;

            // Time the nested-loop version
            long startTime = System.currentTimeMillis();
            System.out.println("Efficient start time --> " + startTime);
            EfficientRemoveDup(data);
            long endTime = System.currentTimeMillis();
            System.out.println("Efficient end time --> " + endTime);
            System.out.println("Efficient total time --> " + (endTime - startTime));
            System.out.println("count : " + People.count);
        }
        public static void removeDuplicate(List<People> list) {
            HashSet<People> set = new HashSet<People>(list);
            list.clear();
            list.addAll(set);
        }

        public static void removeDuplicateWithOrder(List<String> arlList) {
            Set<String> set = new HashSet<String>();
            List<String> newList = new ArrayList<String>();
            for (Iterator<String> iter = arlList.iterator(); iter.hasNext();) {
                String element = iter.next();
                // Set.add returns true only the first time an element is seen
                if (set.add(element))
                    newList.add(element);
            }
            arlList.clear();
            arlList.addAll(newList);
        }


        @SuppressWarnings("serial")
        private static void testListSet() {
            List<String> arrays = new ArrayList<String>() {
                @Override
                public boolean add(String e) {
                    // Linear scan: compare the new element with every existing one
                    for (String str : this) {
                        if (str.equals(e)) {
                            System.out.println("add failed !!! duplicate element");
                            return false;
                        }
                    }
                    // Report success once, after the whole list has been checked
                    System.out.println("add succeeded !!!");
                    return super.add(e);
                }
            };

            arrays.add("a"); arrays.add("b"); arrays.add("c"); arrays.add("b");
            for (String e : arrays)
                System.out.print(e);
        }

        private static void EfficientRemoveDup(People[] peoples) {
            int count = 0; // number of equals() comparisons performed
            // new temporary array to hold non-duplicate data
            People[] newArray = new People[peoples.length];
            // current index in the new array (also the number of non-dup elements)
            int currentIndex = 0;

            // loop through the original array...
            for (int i = 0; i < peoples.length; ++i) {
                // contains => true iff newArray already holds an element equal to peoples[i]
                boolean contains = false;

                // search through newArray for an element equal to peoples[i];
                // j < currentIndex would suffice: the slot at currentIndex is still
                // null, and People.equals(null) returns false
                for (int j = 0; j <= currentIndex; ++j) {
                    count++;
                    // if the same element is found, don't add it to the new array
                    if (peoples[i].equals(newArray[j])) {
                        contains = true;
                        break;
                    }
                }

                // if we didn't find a duplicate, add the new element to the new array
                if (!contains) {
                    // note: you may want to use a copy constructor, or a .clone()
                    // here if the situation warrants more than a shallow copy
                    newArray[currentIndex] = peoples[i];
                    ++currentIndex;
                }
            }

            System.out.println("efficient medthod inner count : " + count);
        }

        private static People[] createObjectArray(int length) {
            int num = length;
            People[] data = new People[num];
            Random random = new Random();
            for (int i = 0; i < num; i++) {
                // Random ids in [0, 10000) so that some duplicates occur
                int id = random.nextInt(10000);
                System.out.print(id + " ");
                data[i] = new People(id, "i am a man");
            }
            return data;
        }
    }

          Test results:

    set end time -->  1326443326724
    set total time --> 26
    count : 3653
    Efficient start time --> 1326443326729
    efficient medthod inner count : 28463252
    Efficient end time --> 1326443327107
    Efficient total time --> 378
    count : 28463252
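In this run the set-based version needed 3653 equals calls and 26 ms, against 28463252 calls and 378 ms for the nested loops. One drawback of removeDuplicate is that a HashSet does not preserve the original order. A minimal sketch of an order-preserving variant uses LinkedHashSet, which iterates in insertion order; the class and method names below are hypothetical, and the element type is assumed to implement equals() and hashCode() consistently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedDedup {
    /** Returns a new list without duplicates, keeping the first occurrence of each element. */
    static <T> List<T> dedupKeepOrder(List<T> input) {
        // LinkedHashSet rejects equal elements and remembers insertion order
        return new ArrayList<>(new LinkedHashSet<T>(input));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("c", "a", "b", "a", "c");
        System.out.println(dedupKeepOrder(data)); // prints [c, a, b]
    }
}
```

This does the same job as removeDuplicateWithOrder in the listing above, with the bookkeeping left to the collection.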






  • Original article: https://www.cnblogs.com/slider/p/2320313.html