• NSArray enumeration performance examined 不及格的程序员


    NSArray enumeration performance examined

    One day, I was thinking about NSArray enumeration (also called iteration): since Mac OS X 10.6 and iOS 4, there's the wonderful new world of blocks, and with it came enumerateObjectsUsingBlock:. I assumed it must be slower than fast enumeration (for (object in array) { ... }) due to overhead, but I didn't know for sure and thus decided to actually measure the performance.

    Which kinds of enumeration do exist?

    Basically, we have four kinds of enumeration available (see also Mike Ash's Friday Q&A 2010-04-09: Comparison of Objective-C Enumeration Techniques).

    1. objectAtIndex: enumeration

      Using a for loop which increases an integer and querying the object using [myArray objectAtIndex:index] is the most basic form of enumeration.

      NSUInteger count = [myArray count];
      for (NSUInteger index = 0; index < count ; index++) {
          [self doSomethingWith:[myArray objectAtIndex:index]];
      }
    2. NSEnumerator

      This is a form of external iteration[myArray objectEnumerator] returns an object. This object has a method nextObject that we can call in a loop until it returns nil

      NSEnumerator *enumerator = [myArray objectEnumerator];
      id object;
      while (object = [enumerator nextObject]) {
          [self doSomethingWith:object];
      }
    3. NSFastEnumerator

      The idea behind fast enumeration is to use fast C array access to optimize iteration. Not only is it supposed to be faster than traditional NSEnumerator, but Objective-C 2.0 also provides a very concise syntax.

      id object;
      for (object in myArray) {
          [self doSomethingWith:object];
      }
    4. Block enumeration

      Available since the introduction of blocks, this allows to iterate an array with blocks. Its syntax isn't as nice as fast enumeration, but there is one very interesting feature: concurrent enumeration. If enumeration order is not important and the jobs can be done in parallel without locking, this can provide a considerable speedup on a multi-core system. More about that in the concurrent enumeration section.

      [myArray enumerateObjectsUsingBlock:^(id object, NSUInteger index, BOOL *stop) {
          [self doSomethingWith:object];
      }];
      [myArray enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
          [self doSomethingWith:object];
      }];

    Linear enumeration

    First, let's discuss linear enumeration: one item after the other.

    Graphs

    NSArray enumeration performance, Mac OS X, logarithmic scale
    NSArray enumeration performance, Mac OS X, linear scale
    NSArray enumeration performance, iOS, logarithmic scale
    NSArray enumeration performance, iOS, linear scale

    Conclusions

    Somewhat surprisingly, NSEnumerator is even slower than using objectAtIndex:. This is true for both Mac OS X and iOS. I suspect this is due to the enumerator checking whether the array was modified with each iteration. Somewhat unsurprisingly, fast enumeration holds up to its name and is the fastest solution.

    For small arrays, block enumeration is a bit slower than objectAtIndex:, but with more elements in the array the performance becomes almost as fast as fast enumeration.

    The difference between fast enumeration and NSEnumeration is already quite big with a million entries: on the iPhone 4S, the former took about 0.037 seconds while the later took about 0.140 seconds. That's already a factor of 3.7.

    An oddity

    The very first time an NSArray is allocated in a program and the very first time an enumerator is requested with objectEnumerator it takes an unusual long time to complete. For example, to allocate an array with one element the median was 415 nanoseconds on my 2007 MacBook Pro 17". But the very first time an array was allocated it take more than 500,000 nanoseconds, sometimes even up to 1,000,000 nanoseconds! The same with querying the enumerator: instead of the median of 673 nanoseconds it took also took more than 500,000 nanoseconds.

    I can only guess about the reason, but I suspect lazy loading is to blame. You probably won't notice this in a real application because by the time your code is running Cocoa or Cocoa Touch probably already has created an array.

    Concurrent enumeration

    Block enumeration offers the option to enumerate the objects concurrently, if possible. This means the work load is potentially spread over several CPU cores. Not every operation to be done while enumerating can be parallelized, so concurrent enumeration only makes sense if no locking is required within the block: either because each operation really is absolutely independent or because atomic operations can be used (like OSAtomicAdd32 and friends).

    So, how well does it perform compared to the other enumeration methods?

    The graphs

    NSArray enumeration performance with concurrency, Mac OS X, logarithmic scale
    NSArray enumeration performance with concurrency, Mac OS X, linear scale
    NSArray enumeration performance with concurrency, iOS, logarithmic scale
    NSArray enumeration performance with concurrency, iOS, linear scale

    Conclusions

    For small number of elements, the concurrent enumeration is by far the slowest method. This probably has to do with the additional work that needs to be done preparing the array for concurrent access and starting the threads (I don't know whether GCD or "traditional" threading is used and it doesn't matter; it's an implementation detail we shouldn't need to care about).

    However, if the array is large enough all of a sudden concurrent enumeration becomes the fastest method, just as expected. On the iPhone 4S and a million entries, it took about 0.024 seconds to enumerate with concurrent enumeration, but 0.036 seconds with fast enumeration. By contrast, NSEnumeration took 0.139 seconds for the same array! That's already a pretty big difference, a factor of 5.7.

    On my office 2011 iMac 24" with a Core i7 quad-core CPU, the million entries where enumerated concurrently in 0.0016 seconds. Fast enumeration of the same array took 0.0044 seconds and NSEnumeration 0.0093 seconds. That's a factor of 5.8 which is pretty close to the iPhone 4S results. I was expecting a bigger difference here, though. On my 2007 MacBook Pro with Core2 Duo dual-core CPU, the factor was "just" 3.7 here.

    The threshold when concurrent enumeration became useful was somewhere between 10,000 and 50,000 elements in my tests. With less elements, just go with normal block iteration.

    Allocation

    I also wanted to know whether the performance is any different depending on how the array was created. I tested two different methods:

    1. Create a C array which references the object instances and create the array using initWithObjects:count:.
    2. Create a NSMutableArray and subsequently add objects using addObject:.

    There's no difference when iterating, but there is a difference when allocating: the initWithObjects:count: method is faster. With a very large number of objects, this difference can become significant. Here's an example on how to create an array with NSNumbers:

    NSArray *generateArrayMalloc(NSUInteger numEntries) {
            id *entries;
            NSArray *result;
            
            entries = malloc(sizeof(id) * numEntries);
            for (NSUInteger i = 0; i < numEntries; i++) {
                    entries[i] = [NSNumber numberWithUnsignedInt:i];
            }
        
            result = [NSArray arrayWithObjects:entries count:numEntries];
        
            free(entries);
            return result;
    }

    NSArray allocation performance, Mac OS X, logarithmic scale
    NSArray allocation performance, Mac OS X, linear scale
    NSArray allocation performance, iOS, logarithmic scale
    NSArray allocation performance, iOS, linear scale

    How did I measure?

    You can download the test application to see how I've measured. Basically I'm measuring how long it takes to iterate an array without actually doing anything else, and repeat that 1000 times. For the graph, the median for each array size was taken. Compilation was done with optimization turned off (-O0). For iOS, testing was done with an iPhone 4S. For Mac OS X, I used my 2007 MacBook Pro 17" and my office 2011 iMac 24". The graphs for Mac OS X show the results of the iMac, but on the MacBook Pro the graphs looked similar, it just was slower.


     

    What's an effective and great way to compare all the values of NSArray that contains NSNumbersfrom floats to find the biggest one and the smallest one?

    Any ideas how to do this nice and quick in objective-c?

    shareeditflag
     
        
       
    There is no such thing as an NSArray that contains floats. It would have to be an NSArray that contains NSNumbers. – matt Apr 10 '13 at 18:43
        
       
    @matt Thanks! I've corrected the question name – Sergey Grischyov Apr 10 '13 at 22:11 

    5 Answers

    up vote122down voteaccepted

    If execution speed (not programming speed) is important, then an explicit loop is the fastest. I made the following tests with an array of 1000000 random numbers:

    Version 1: sort the array:

    NSArray *sorted1 = [numbers sortedArrayUsingSelector:@selector(compare:)];
    // 1.585 seconds
    

    Version 2: Key-value coding, using "doubleValue":

    NSNumber *max=[numbers valueForKeyPath:@"@max.doubleValue"];
    NSNumber *min=[numbers valueForKeyPath:@"@min.doubleValue"];
    // 0.778 seconds
    

    Version 3: Key-value coding, using "self":

    NSNumber *max=[numbers valueForKeyPath:@"@max.self"];
    NSNumber *min=[numbers valueForKeyPath:@"@min.self"];
    // 0.390 seconds
    

    Version 4: Explicit loop:

    float xmax = -MAXFLOAT;
    float xmin = MAXFLOAT;
    for (NSNumber *num in numbers) {
        float x = num.floatValue;
        if (x < xmin) xmin = x;
        if (x > xmax) xmax = x;
    }
    // 0.019 seconds
    

    Version 5: Block enumeration:

    __block float xmax = -MAXFLOAT;
    __block float xmin = MAXFLOAT;
    [numbers enumerateObjectsUsingBlock:^(NSNumber *num, NSUInteger idx, BOOL *stop) {
        float x = num.floatValue;
        if (x < xmin) xmin = x;
        if (x > xmax) xmax = x;
    }];
    // 0.024 seconds
    

    The test program creates an array of 1000000 random numbers and then applies all sorting techniques to the same array. The timings above are the output of one run, but I make about 20 runs with very similar results in each run. I also changed the order in which the 5 sorting methods are applied to exclude caching effects.

    Update: I have now created a (hopefully) better test program. The full source code is here: https://gist.github.com/anonymous/5356982. The average times for sorting an array of 1000000 random numbers are (in seconds, on an 3.1 GHz Core i5 iMac, release compile):

    Sorting      1.404
    KVO1         1.087
    KVO2         0.367
    Fast enum    0.017
    Block enum   0.021
    

    Update 2: As one can see, fast enumeration is faster than block enumeration (which is also stated here: http://blog.bignerdranch.com/2337-incremental-arrayification/).

    EDIT: The following is completely wrong, because I forgot to initialize the object used as lock, as Hot Licks correctly noticed, so that no synchronization is done at all. And with lock = [[NSObject alloc] init]; the concurrent enumeration is so slow that I dare not to show the result. Perhaps a faster synchronization mechanism might help ...)

    This changes dramatically if you add the NSEnumerationConcurrent option to the block enumeration:

    __block float xmax = -MAXFLOAT;
    __block float xmin = MAXFLOAT;
    id lock;
    [numbers enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(NSNumber *num, NSUInteger idx, BOOL *stop) {
        float x = num.floatValue;
        @synchronized(lock) {
            if (x < xmin) xmin = x;
            if (x > xmax) xmax = x;
        }
    }];
    

    The timing here is

    Concurrent enum  0.009
    

    so it is about twice as fast as fast enumeration. The result is probably not representative because it depends on the number of threads available. But interesting anyway! Note that I have used the "easiest-to-use" synchronization method, which might not be the fastest.

    shareeditflag
     
    1  
       
    often block enumeration is faster than fast enumeration. did u try that? how often did u execute your examples? – vikingosegundo Apr 10 '13 at 16:50
        
       
    @vikingosegundo: I have updated the answer. Block enumeration is slower than fast enumeration here. – Martin R Apr 10 '13 at 17:00 
        
       
    How often did u execute the codes? one or few runs wont give you a good result. – vikingosegundo Apr 10 '13 at 17:01
        
       
    So explicit loop is fastest!!! Quite unbelievable – Anoop Vaidya Apr 10 '13 at 17:05 
        
       
    @vikingosegundo: I will improve the test program to determine the average execution time for each method, and then report the result later. – Martin R Apr 10 '13 at 17:06 
  • 相关阅读:
    【计算几何】多边形交集
    【计算几何】点在多边形内部
    【计算几何】线段相交
    【计算几何】多边形点集排序
    【JavaScript学习】JavaScript对象创建
    【CUDA学习】内核程序调试
    【CUDA学习】共享存储器
    【CUDA学习】全局存储器
    Charles是Mac的Fiddler抓包工具
    Charles是mac的iddler抓包工具
  • 原文地址:https://www.cnblogs.com/ioriwellings/p/2886384.html
Copyright © 2020-2023  润新知