• facebook c++大会,博客直播


    facebook最近动作频频,不光开源了folly c++库,还专门组织了c++专场,有兴趣的可以参考: 

    I managed to snatch tickets for the Facebook C++ Conference. I will be live blogging the event all day. Hope you enjoy!

    20.49

    Summary

    Mainstream hardware is becoming permanently parallel, heterogeneous and distributed.
    - Moore’s Mine is ending, but we know the answer: Open a new mine.
    - As usual, the end of one wave overlaps the beginning of the next.
    The long-term free lunch app requires lots of juicy latent parallelism expressed in a form that can be spread across a machine with a variable number of cores of different kinds – big/small/specialized cores, and local and distributed cores.
    - The filet mignon of throughput gains is still on the menu, but now it costs extra effort – development, complexity, and testing.

    20.12

    Hardware is in motion! This talk is tougher to describe, as the presentation relies on many graphics.

    20.00

    We are observing the end of Moore’s Law. We can no longer rely on single-threaded performance increases, we need to go beyond multi-core to cloud-core and heterogeneous cores.

    19.52

    Trifecta
    - Multicore (2005-) e.g. iPad 2, Kindle Fire, etc..
    – Initial transition now complete in all mainstream form factors
    - Heterogeneous cores (2009-) e.g. compute-capable GPUs.
    – 100x and 1000x parallelism already available on home machines for apps that can harness the GPU.
    - Elastic compute cloud cores (HaaS, 2010-) e.g. AWS, Azure
    – Summer 2011

    19.50

    We are entering a Jungle! 2011-201x Put a heterogeneous supercomputer cluster on every desk, in every home and in every pocket.

    19.47

    Welcome to the Jungle by Herb Sutter

    19.35

    Short break before the next talk by Herb Sutter.

    19.30

    Summary
    Performance and Efficiency

    Critical for Technological Advancement
    enables Fundamentally New Products and Experiences
    Incremental Steps Add Up to Game-Changing Optimizations

    19.28

    Summary
    AtomicHashMap

    Atomic Operations are Slow
    - Remove Ordering Constraints if Possible
    Unique Set of Trade Offs
    - Not Right For All Applications

    19.18

    Reference Implementation
    Micro-Sharding

    - Fragment Data to Reduce Contention
    - Use Spin Locks Since Contention is Low

    19.13

    folly::AtomicHashMap
    Benchmarking

    - Simple
    - Isolate Important Components

    19.10

    Projecting Hash Values

    Hybrid Solution:
    - Make Array Optimal Size
    - Mask by Next Power of 2
    - If Outside the Array, Use Modulus

    19.09

    Projecting Hash Values

    Modulus is Slow
    - Integer Division
    Masking With Power of 2 Is Wasteful
    - nextPowTwo(1000 Entries/ 80% load factor = 1250)

    19.06

    Approach

    Increments Cached Per Thread
    - Periodically Flushed Atomically
    - Traverse All Threads for Exact Value

    New ThreadLocal Class
    - Fast
    - Provides Iterator Across All Threads

    19.01

    folly::ThreadLocal
    Implementation

    Store Arrays of Pointers with __thread
    Unique Index Per ThreadLocal Instance
    User Mutex on ctor/dtor/iterate
    - One per “Tag”
    About 4x Faster than boost::thread_specific_ptr

    18.58

    Tracking Size

    Obvious Solution : Atomic Increment
    - But wait! 50M inc/sec < 100 insert/sec
    - Atomic Increment is a Bottleneck

    Try again: ThreadCachedInt

    Sharded Atomic Increment
    - Requires a Lot of Shards
    Thread Local Storage
    - Better with New ThreadLocal Class

    18.56

    Remaining Issues

    How to Track Size?
    Modulus is Slow

    18.54

    Erase

    1. Do a Find
    2. Miss -> Failed Erase
    3. Found ->compxchg to Empty Sentinel Key

    Empty -> Locked -> Valid -> Erased

    18.50

    Growth

    1. Fill to Max Load Factor
    - Can’t Rehash

    2. Create Another SubMap

    3. Re-Project and Continue Probing

    18.43

    Atomic Probing Algorithm

    Always in a Consistent State
    - find() has No Memory Barriers

    insert() Uses Keys for Locking
    - Very Low Contention
    - No Additional Storage

    18.41

    Probing Algorithm
    - Project Hash Onto Array
    - Advance Until Equal or Empty
    - Wrap Around

    18.39

    New Approach: AtomicHashMap

    folly:AtomicHashMap
    Result Preview
    - Blistering Speed (1.3x-5x)
    - Memory Efficient + Compact References
    - std::unordered_map Interface

    Except…
    - Only Integer Keys
    - Sensitive to Initialization Size
    - Erased Entities Never Reclaimed

    18.36

    Graph Storage and Access
    Possible Approaches

    Micro-Sharding
    Layered Maps
    Intel tbb:concurrent_hash_map
    Buil Something New

    18.36

    Graph Storage and Access
    Considerations

    - CPU Bottleneck
    - Memory Sensitivity
    - Many Cores Accessing the Same Data
    - Updated in Real-Time

    18.35

    What’s it look like?

    Huge trove of data, many users have 1M 2nd degree connections
    Data and Access Highly Dynamic, live updates, new products
    Bottom line: we need performance

    18.33

    Facebook: The Graph?

    What’s it Good For?
    News feed, what’s going on in the action graph.
    Search, find people and things
    Ads, Groups, etc.

    18.32

    What should we do about performance? Challenge assumptions, hardware tradeoffs are changing, talk and share code.

    18.30

    At facebook: incremental changes add up, speed is critical for virality, help the environment -> performance is very important.

    18.29

    Massively Parallel Hashmaps: Spencer Ahrens

    Why do we care about hashmap performance?

    18.19

    Short break between talks, will be back!

    18.15

    folly: https://github.com/facebook/folly

    18.14

    Folly components:

    1. Containers
    2. Parsing and formatting
    3. Multithreading helpers
    4. General utilities
    5. More to come

    18.13

    fbstring will be open-sourced!

    folly: Facebook’s Open-Source Llibrary

    18.13

    Andrei emphasizes that you should learn math, cs people really need to know math.

    18.06

    fbstring speedup – 2.53 times faster with ILP and loop unrolling

    18.02

    modulo subtraction is injective

    17.55

    ILP speedup to atoul gives a 193% speedup over atoul.

    17.53

    associative means parallelizable

    17.53

    Minimizing data dependencies

    “5235″ -> 5235! string to integral (ids are numbers stored as strings)

    string atoui -> loops each char and converts to the appropriate digit… see atoui implementations

    fbstring actually takes advantage of loop unrolling when converting strings to ints.

    17.45

    Better instruction level parallelism = fewer data dependencies

    17.45

    Instruction Level Parallelism

    1. Pipelining
    2. Superscalar execuion
    3. OoO execution
    4. Register renaming
    5. Speculative execution

    17.43

    The Forgotten Parallelism

    People forget about CPU parallelism which could be taken advantage of. Each core in facebook server, nehalem cpu, has 3 ALUs which are forgotten. You can do three simultaneous arithmetic calculations in a single thread! (SandyBridge has more than 3 ALUs). How do we use them?

    17.39

    Some Benchmarks

    push_back speedup vs std::string is huge compared to the standard implementation. 400% speedup for 1 push_back call, 500% for 2 push back calls, less than 200% for 256k push_back calls. Plateau is 1.4 times faster than standard implementation.

    17.34

    The French Connection: jemalloc

    Jason Evans wrote a great allocator for FreeBSD and it’s used everywhere at facebook.

    jemalloc
    1. Highly concurrent
    2. Cache-friendly
    3. Fast

    Extended allocator interface
    1. Allocate in allocator-sized slabs
    2. In-place expansion where possible

    fbstring solves fragmentation by knowing what jemalloc wants, the string will use what jemallocs wants to allocate its internal representation.

    jemalloc allows reallocation in place – e.g. a large string built in a loop will be improved and occasional quadratic algorithms become linear! woot!

    17.27

    23 >> 15 (23 is the max chars in a fbstring, 15 is what other libraries use for string)

    23, 15 is not a big deal… actually it is because at facebook they have many protocols that store ints as strings, many ints are smalls but not all! User ids at facebook are randomly generated, 64 bits. Almost all these ids don’t fit in 15 chars… thus will spill into allocated memory but with fbstring none will spill and thus be highly optimized.

    17.24

    fbstring has control. fbstring lets them take full control over where data goes, optimizes memory allocation. The performance impact is huge.

    Andrei is making fun of Python/Java programmers for not knowing what happens in memory.

    17.22

    fbstring data layout

    Small
    hello, world! slash0 | 9 (9 is the number of bytes unused, allows them to append quickly and the last byte is used as a null terminator when there are no more bytes unused)

    Medium/Large
    char* data | size_t length | size_t capacity | FF
    (if FF is all ones then we will consider the string in this format otherwise in the small format.

    Large
    atomic refCount | hello slash0

    17.17

    fbstring’s 3-tiered layout

    1. In-place store for 0..23 chars
    2. Eagerly copied values 24..255 chars
    3. Copy-on-write 256 chars and above

    17.16

    We are live again! Andrei is speaking, the title of the talk is Sheer Folly.

    C++, why do we care? Why do we still care about system languages.
    1. Data Layout Control – How data sits in memory? C++ allows you to literally tell where ever byte goes. Example: fbstring. Hang on, let’s not invent the wheel here but we did… demotivation for std string : 1 – std::string as a container is fail – 2 – std::allocator is fail – 3 – std::char_traits is fail – 4 – algorithm integration is fail – 5 – copy-on-write became fail… motivation: 1 – because its there which lets us take existing code and link against fbstring

    2. Cooperation with the allocator

    3. Copy on write optimization

    16.06

    Lunch Break! Following lunch, Andrei Alexandrescu will be making an announcement of interest to C++ programmers around the world. Stay tuned.

    16.02

    Slides from the talk will become available available after the event.

    16.00

    Summary

    - Perfect forwarding fails for some kinds of arguments
    - To “specialize” forwarding templates, use helper class templates or std:enable_if
    - templates + pImple => explicitly force instantiations.

    More information on comp.std.c++ discussion initiated 16 January 2010

    15.58

    VC11 compiler rejects the explicit instantiation. MS has confirmed two compiler bugs but a workaround exists.

    The specialization of the size of the const char, we can fix that but it is left as homework to the attendee.

    15.56

    All you need to do is take a look at the g++ error message, copy it into a source file and this will explicitly instantiate the template but you have to double check the type deduced by the compiler because it may not be the same as the type passed into the function. But g++ is so self aware that it will actually compile/link/run. :D

    15.54

    Explicit Instantiation

    The templates again:

    1
    2
    template&lt;typename T&gt; Widget::Widget(T&amp;&amp; param) // forwarding ctor
    template&lt;typename T&gt; void Widget::setName(T&amp;&amp; param) // forwarding setting

    The uses again:

    1
    2
    3
    Widget w(&quot;This is a test&quot;)
    w.setName(std::string(&quot;This is another test&quot;));
    w.setName(s);

    Instantionals we need:
    Widget ctor with T= const char(&)[15)
    setName with T= std::sring and also T = std::string&

    15.52

    Forwarding and the pImpl Idiom. Pimplemization is straight forward and compiles without fuss on client code, but the linker disagrees and can’t link! Ha! The usual template instantiation problem: compilers need template source for instantiation. Pimple precludes that.
    Possible solutions:
    - abandon pImpl, i.e. admit defeat (!)
    - use different build chain -> some use link-time instantiation
    - perform explicit instantiation

    15.48

    This is the first adventure in perfect forwarding, how do you deal with specializing for different types without messing with the universal binding, syntax T&&, forced on you.

    15.46

    Specializing for shared_ptr does not specialize for const shared_ptr. To catch both we can add another partial specializing.

    15.45

    Random comment from the crowd: I love being able to write >> in the new C++11 without the space. Scott Meyer is envious that someone can be so happy about that.

    15.44

    We definitely want to have a specialization for shared_ptrs.

    15.44

    Specializing for pointers unrealistic – rarely useful to distinguish pointer lvalues and rvalues.

    Specializing for UDTS very realistic.
    - Often useful to distinguish lvalues and rvales
    - std::shared_ptr is such a UDT. Moving cheaper than copying.

    15.42

    Dispatching through a Class Template … enables use of class template partial specialization allows us to avoid the enable_if problem.

    15.38

    Too much code on the slide but there are several drawbacks to enable_if, especially when we start using expressions such as const char*

    15.37

    One solution: overload via std::enable_if

    Below code is generalization…

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    template&lt;typename T&gt;
    typename std::enable_if&lt;
        !std::is_pointer&lt;
            typename std::remove_reference&lt;T&gt;::type
        &gt;::value
    &gt;::type
    fwd(T&amp;&amp; param)
    {
      f(std::forward&lt;T&gt;(param))
    }
     
    template&lt;typename T&gt;
    typename std::enable_if&lt;
        std::is_pointer&lt;
            typename std::remove_reference&lt;T&gt;::type
        &gt;::value
    &gt;::type
    fwd(T&amp;&amp; param)
    {
      f(std::forward&lt;T&gt;(param))
    }

    15.28

    !!!
    Function templates can’t be specialized, only overloaded. But the form of forwarding templates is constrained, the only parameter type that works is T&&!. Works = handles all combos of const/non-consts expressions.

    15.27

    “Specializng” forward templates

    Forwarding to f is easy

    1
    2
    3
    template&lt;typename T&gt; void fwd(T&amp;&amp; param){
    f(std::forward&lt;T&gt;(param));
    }

    but what if we want to specialize our template for pointers?

    15.26

    Several kinds of arguments can’t be perfect-forwarded.
    * 0 as null pointer constant
    * braced initializer lists
    * integral consts…

    15.25

    std::make_shared is a function that will return a shared pointer and will typically more efficient than creating your object and than wrapping it in a shared pointer.

    std::make_shared:

    1
    2
    3
    4
    5
    6
    7
    std::string s(&quot;PF example&quot;);
    auto pw = std::make_shared&lt;Widget&gt;(s, std:vector&lt;int&gt;({1,2,3}));
     
    template&lt;typename T, typename.. ParamTs&gt; shared_ptr&lt;T&gt; make_shared...
     
    // in g++
    this will go through several layers, and thus perfect forwarding becomes extremely desirable.

    15.17

    Perfect forwarding is our main interest. Perfect forwarding allows you to pass params straight through templates, we have a common use case, when we build constructors in templated classes. This is when we have problems with the new move construct… perfect forwarding is difficult because we will have to be careful with rvalues and rvalue references.

    15.13

    In order to support moves in C++11, we have a new construct called rvalue references. New syntax type&& will give you rvalue references. Rvalue references identify candidates for moving versus copying.

    Must be careful, in function templates type&& are not really rvalue references. They mean “bind to anything” -> universal references.

    Example:

    1
    2
    3
    4
    template&lt;typename T&gt; void f(T&amp;&amp; param);
    int x;
    f(x); //lvalue: T is int&amp;, instantiates f(int&amp;)
    f(factory()); // rvalue: T is std::shared_ptr&lt;Widget&gt;, instantiates f(std::shared_ptr&lt;Widget&gt;&amp;&amp;)

    15.07

    Scott will be speaking about C++11. Lvalues are generally expressions that take an address and they are important for taking copies, copy requests. In C++11, copy requests can often become move requests, the langauge is now taking more care into the difference between l and r values.

    15.05

    C++ is seeing a new growth because we can’t get faster processors very much anymore. Scott will now give his talk.

    15.04

    Opening remarks starting, Andrei Alexandrescu is our MC, hosting the first #fb #cpp #conf. There is a great demand and knowledge for C++ and facebook is very excited about this new growth in interest.

    14.46

    Slides are coming up: Adventures in Perfect Forwarding by Scott Meyers. Opening remarks are expected to be on time. #fb #cpp #conf

    14.31

    Blinds are going down here at the fb cpp conf. Stage is being prepared. 30 minutes to go before the opening remarks. Scott Meyer’s talk will follow.

    14.13

    48 minutes to go before opening remarks, the conference hall is slowly filling up.

    13.59

    The Facebook conference registration will be starting at 11. The’ve got sweet nametags.

    1
    2
    3
    [&amp;facebook]() {
        return std::move(fast) &amp; break_things();
    }
  • 相关阅读:
    (转载)C++内存分配方式详解——堆、栈、自由存储区、全局/静态存储区和常量存储区
    网络编程前奏(三)
    Linux学习之第三课时--Linux目录
    Linux学习之第二课时--linux命令格式及命令概述
    Linux学习之第一课时--linux前生今世
    python学习之第十四课时--基本数据练习
    python学习之第十三课时--其他数据类型,其他
    python学习之第十二课时--基本数据类型(set)
    python学习之第十一课时--基本数据类型(dict)
    python学习之第十课时--基本数据类型(tuple)
  • 原文地址:https://www.cnblogs.com/inteliot/p/2534536.html
Copyright © 2020-2023  润新知