layout: true
name: blank
styling: styling.css
styling-by: Martin Weitzel
.stylehint[
Styled with [{{styling}}]({{styling}}) by {{styling-by}}
]

---
layout: true
name: plain
copyright: (CC) BY-SA
branding: [Dipl.-Ing. Martin Weitzel](http://tbfe.de)
customer: [für MicroConsult Training & Consulting GmbH](http://microconsult.de)
{{header}}
.pagefooter[
{{copyright}}: {{branding}} {{customer}}
.microconsult-logo[]
]

---
layout: true
name: linkinfo
copyright: (CC) BY-SA
branding: [Dipl.-Ing. Martin Weitzel](http://tbfe.de)
customer: [für MicroConsult Training & Consulting GmbH](http://microconsult.de)
{{header}}
.infographic[
[](InfoGraphics/{{graphic}}.png "Click to open – add [CTRL+] SHIFT for new [tabbed] window")
]
.pagefooter[
{{copyright}}: {{branding}} {{customer}}
.microconsult-logo[]
]

---
layout: true
name: withinfo
copyright: (CC) BY-SA
branding: [Dipl.-Ing. Martin Weitzel](http://tbfe.de)
customer: [für MicroConsult Training & Consulting GmbH](http://microconsult.de)
{{header}}
.infolink.right[
[Click here for Info-Graphic {{graphic}}](InfoGraphics/{{graphic}}.png "add [CTRL+] SHIFT for own [tabbed] window")
{{section}}
]
.pagefooter[
{{copyright}}: {{branding}} {{customer}}
.microconsult-logo[]
]

---
layout: false
template: blank

[C++ STL]: 00_topics.html#agenda

# [C++ STL] Performance and Memory-Footprint

-------------------------------------------------------------------------------

* [Estimating Performance (Basics)](#performance_estimation_basics)
* [Measuring Performance](#measuring_performance)
* [Estimating Memory-Footprint (Basics)](#footprint_estimation_basics)
* [Measuring Memory-Footprint](#measuring_footprint)
* [Frequently Applied Improvements](#frequently_applied_improvements)

-------------------------------------------------------------------------------

.center[**NOTE**]

This is an optional part with topics loosely connected to the main part of this
course, which puts the focus on C++ STL containers and algorithms. The topics in
this part have been moved here as they are still of some interest – if time
allows – but might distract from the general picture if covered in the main
parts.

---
template: plain
name: performance_estimation_basics
header: ### Estimating Performance (Basics)

Performance estimates may come from a theoretical analysis,

* either based on heuristics distinguishing
  * expensive operations from
  * cheap operations,
* or based on an analysis of the assembler code,
  * adding up the clock cycles of the instructions to be executed.

.N[
It is clear that the first approach leaves room for vague assumptions and hence
can give only a first idea, but is not very accurate.
]

But the second approach too has its difficulties for common CPU designs with
several levels of on-chip memory caches, making algorithms with a good cache
hit rate more efficient, often by an order of magnitude or even more.._[]
The next slide shows a small demonstration.

.F[:
E.g. the linear traversal of an `std::array` or an `std::vector` will typically
be much faster than the traversal of an `std::forward_list`, an `std::list`, or
even an `std::unordered_set`, where only the hash buckets are stored in
adjacent memory while the actual data lives in a linked list (per hash bucket)
and is hence spread out over memory. Of course, from the theoretical point of
view both have O(N) performance … only that the constant factor hidden in the
"Big-O" is much smaller for a data structure in contiguous memory.
]
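---
template: plain
header: ### Estimating Performance (Basics) – Example

A minimal sketch of the effect described in the footnote on the previous slide:
summing the elements of a `std::vector` and of a `std::list` is O(N) in both
cases, but only the vector keeps its elements in contiguous memory. The element
count and the use of `std::accumulate` are arbitrary choices for illustration,
and this is not a rigorous benchmark (single run, no warm-up) – the point is
the difference in traversal time, not the exact numbers.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

// Sum the elements and report how long the traversal took.
template<typename Container>
void sum_and_time(const Container& c, const char* name) {
    const auto t0 = std::chrono::steady_clock::now();
    const long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    const auto t1 = std::chrono::steady_clock::now();
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0);
    std::cout << name << ": sum=" << sum << " in " << us.count() << " us\n";
}

int main() {
    const std::size_t n = 1000000;         // arbitrary size, for illustration only
    std::vector<int> v(n, 1);              // all elements in one contiguous block
    std::list<int> l(v.begin(), v.end());  // one separately allocated node per element
    sum_and_time(v, "std::vector");        // typically much faster ...
    sum_and_time(l, "std::list");          // ... than the node-based traversal
}
```

Both loops perform the same number of additions; the difference that shows up
in the measured times is mostly the cost of cache misses when following the
list's node pointers.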
---
template: plain
name: measuring_performance
header: ### Measuring Performance

The alternative approach is to determine performance through a series of
measurements,

* either based on the central data structures considered for solving the
  problem, examined in isolation,
* or based on "the real thing", i.e. the whole application,
  * potentially instrumented to get performance data for the parts of interest, or
  * by comparing several competing solutions.

.N[
In the first case great care must be taken not to draw conclusions from the
parts scrutinized in isolation that do not extrapolate to the whole.
]

To get results applicable to "real-life" usage of an application, care must
also be taken to do the analysis in a real-life context with real-life data.._[]

.F[:
E.g. register assignment for frequently accessed variables may be quite
different for an algorithm tested in isolation compared to the same algorithm
linked into a large program. Also, a sorting step might perform much worse for
"nearly sorted" data than for random data, so evaluating it with random data
makes no sense if such data is not typical for the real use case.
]

---
template: plain
name: footprint_estimation_basics
header: ### Estimating Memory-Footprint (Basics)

[Undefined Behaviour]: http://en.cppreference.com/w/cpp/language/ub

Estimating memory footprint on a theoretical basis can be rather exact. The
only pitfalls are:

* not taking aggressive compiler optimizations into account,
* not considering clever tricks applied by an implementation, possibly based
  on compile-time meta-programming.

.N[
Also the contrary is possible: underestimating the complexity of a general
solution and assuming "too simple" data structures.
]

Note that the STL is generally specified in a way that allows extremely
efficient solutions for the problems at hand (e.g. container iterators).

.W[
This is even true where safety is traded in for speed, though
[Undefined Behaviour] is often slightly misunderstood: it is no hindrance for
the implementor of a C++ translation system to generate extra tests not
**mandated** by the Standard.
]

---
template: plain
name: measuring_footprint
header: ### Measuring Memory-Footprint

Measuring footprint is often possible with very simple means:

* Applying the `sizeof` operator to some class
  * directly tells the size it uses in memory,
  * including any amount of padding necessary for alignment
    (see the example on the next slide).
* For dynamically allocated data there is
  * typically a slight overhead in the additional memory required,
  * but the runtime performance degradation is often the more pressing reason
    to avoid unnecessary heap usage.
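---
template: plain
header: ### Measuring Memory-Footprint – Example

A minimal sketch for the previous slide: `sizeof` reports the complete object
size, including any alignment padding, but not memory the object allocates on
the heap. The concrete numbers in the comments assume a typical 64-bit platform
and may differ elsewhere.

```cpp
#include <iostream>
#include <vector>

struct Padded {   // members ordered badly with respect to alignment
    char   c1;    //  1 byte, followed by 7 bytes of padding
    double d;     //  8 bytes (must be 8-byte aligned here)
    char   c2;    //  1 byte, followed by 7 bytes of tail padding
};                // sizeof(Padded) is typically 24

struct Packed {   // the same members, reordered
    double d;     //  8 bytes
    char   c1;    //  1 byte
    char   c2;    //  1 byte, followed by 6 bytes of tail padding
};                // sizeof(Packed) is typically 16

int main() {
    std::cout << "sizeof(Padded): " << sizeof(Padded) << '\n';
    std::cout << "sizeof(Packed): " << sizeof(Packed) << '\n';

    std::vector<int> v(1000);
    // Only the vector "handle" (typically three pointers) is counted here,
    // not the 1000 ints living in dynamically allocated memory:
    std::cout << "sizeof(v): " << sizeof(v) << '\n';
}
```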
---
template: plain
name: frequently_applied_improvements
header: ### Frequently Applied Improvements

[Stateless Allocators]: http://stackoverflow.com/questions/12545072/what-does-it-mean-for-an-allocator-to-be-stateless
[Small Buffer Optimization]: https://akrzemi1.wordpress.com/2014/04/14/common-optimizations/#sbo

Some frequent improvements, also applied in modern STL implementations, are:

* Selecting a more efficient data structure – i.e. purging unneeded member
  data – when possible.
  * This is e.g. the case for [Stateless Allocators].
* The [Small Buffer Optimization] (or SBO), i.e. for data structures that
  otherwise use heap allocation,
  * storing small amounts of data directly in the holder object
  * instead of via a pointer (as done in the general case);
    see the example on the next slide.
* Lazy evaluation or transparently avoiding copies (in the hope that these will
  finally turn out to be dispensable).
* Transparently splitting the work over a number of threads.._[]

.F[:
In fact, the specification of some STL algorithms effectively grants that much
leeway, but of course for CPU-bound tasks this pays off only on multi-core or
hyper-threaded hardware architectures.
]
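---
template: plain
header: ### Frequently Applied Improvements – Example

A minimal sketch probing the Small Buffer Optimization in `std::string` (there
usually called the Small *String* Optimization): short character data typically
lives inside the string object itself, longer data on the heap. Whether and at
which length the switch happens is implementation-defined, so the output may
vary between standard library implementations.

```cpp
#include <functional>
#include <iostream>
#include <string>

// Returns true if the string's character buffer lies within the string
// object itself (i.e. the small buffer is used) rather than on the heap.
// Note: ordering unrelated pointers portably requires std::less.
bool stored_inside(const std::string& s) {
    std::less<const void*> lt;
    const void* first = &s;
    const void* last  = &s + 1;
    const void* data  = s.data();
    return !lt(data, first) && lt(data, last);
}

int main() {
    const std::string small = "hi";      // short enough for the small buffer
    const std::string large(100, 'x');   // long enough to force a heap allocation
    std::cout << std::boolalpha
              << "sizeof(std::string): " << sizeof(std::string) << '\n'
              << "small stored inside: " << stored_inside(small) << '\n'
              << "large stored inside: " << stored_inside(large) << '\n';
}
```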