Optimizing software in C++
An optimization guide for Windows, Linux, and Mac platforms

By Agner Fog. Technical University of Denmark.
Copyright © 2004 - 2019. Last updated 2019-10-18.

Contents

1 Introduction
    1.1 The costs of optimizing
2 Choosing the optimal platform
    2.1 Choice of hardware platform
    2.2 Choice of microprocessor
    2.3 Choice of operating system
    2.4 Choice of programming language
    2.5 Choice of compiler
    2.6 Choice of function libraries
    2.7 Choice of user interface framework
    2.8 Overcoming the drawbacks of the C++ language
3 Finding the biggest time consumers
    3.1 How much is a clock cycle?
    3.2 Use a profiler to find hot spots
    3.3 Program installation
    3.4 Automatic updates
    3.5 Program loading
    3.6 Dynamic linking and position-independent code
    3.7 File access
    3.8 System database
    3.9 Other databases
    3.10 Graphics
    3.11 Other system resources
    3.12 Network access
    3.13 Memory access
    3.14 Context switches
    3.15 Dependency chains
    3.16 Execution unit throughput
4 Performance and usability
5 Choosing the optimal algorithm
6 Development process
7 The efficiency of different C++ constructs
    7.1 Different kinds of variable storage
    7.2 Integer variables and operators
    7.3 Floating point variables and operators
    7.4 Enums
    7.5 Booleans
    7.6 Pointers and references
    7.7 Function pointers
    7.8 Member pointers
    7.9 Smart pointers
    7.10 Arrays
    7.11 Type conversions
    7.12 Branches and switch statements
    7.13 Loops
    7.14 Functions
    7.15 Function parameters
    7.16 Function return types
    7.17 Function tail calls
    7.18 Recursive functions
    7.19 Structures and classes
    7.20 Class data members (instance variables)
    7.21 Class member functions (methods)
    7.22 Virtual member functions
    7.23 Runtime type identification (RTTI)
    7.24 Inheritance
    7.25 Constructors and destructors
    7.26 Unions
    7.27 Bitfields
    7.28 Overloaded functions
    7.29 Overloaded operators
    7.30 Templates
    7.31 Threads
    7.32 Exceptions and error handling
    7.33 Other cases of stack unwinding
    7.34 Propagation of NAN and INF
    7.35 Preprocessing directives
    7.36 Namespaces
8 Optimizations in the compiler
    8.1 How compilers optimize
    8.2 Comparison of different compilers
    8.3 Obstacles to optimization by compiler
    8.4 Obstacles to optimization by CPU
    8.5 Compiler optimization options
    8.6 Optimization directives
    8.7 Checking what the compiler does
9 Optimizing memory access
    9.1 Caching of code and data
    9.2 Cache organization
    9.3 Functions that are used together should be stored together
    9.4 Variables that are used together should be stored together
    9.5 Alignment of data
    9.6 Dynamic memory allocation
    9.7 Data structures and container classes
    9.8 Strings
    9.9 Access data sequentially
    9.10 Cache contentions in large data structures
    9.11 Explicit cache control
10 Multithreading
    10.1 Simultaneous multithreading
11 Out of order execution
12 Using vector operations
    12.1 AVX instruction set and YMM registers
    12.2 AVX512 instruction set and ZMM registers
    12.3 Automatic vectorization
    12.4 Using intrinsic functions
    12.5 Using vector classes
    12.6 Transforming serial code for vectorization
    12.7 Mathematical functions for vectors
    12.8 Aligning dynamically allocated memory
    12.9 Aligning RGB video or 3-dimensional vectors
    12.10 Conclusion
13 Making critical code in multiple versions for different instruction sets
    13.1 CPU dispatch strategies
    13.2 Model-specific dispatching
    13.3 Difficult cases
    13.4 Test and maintenance
    13.5 Implementation
    13.6 CPU dispatching at load time in Linux
    13.7 CPU dispatching in Intel compiler
14 Specific optimization topics
    14.1 Use lookup tables
    14.2 Bounds checking
    14.3 Use bitwise operators for checking multiple values at once
    14.4 Integer multiplication
    14.5 Integer division
    14.6 Floating point division
    14.7 Do not mix float and double
    14.8 Conversions between floating point numbers and integers
    14.9 Using integer operations for manipulating floating point variables
    14.10 Mathematical functions
    14.11 Static versus dynamic libraries
    14.12 Position-independent code
    14.13 System programming
15 Metaprogramming
    15.1 Template metaprogramming
    15.2 Metaprogramming with constexpr branches
    15.3 Metaprogramming with constexpr functions
16 Testing speed
    16.1 Using performance monitor counters
    16.2 The pitfalls of unit-testing
    16.3 Worst-case testing
17 Optimization in embedded systems
18 Overview of compiler options
19 Literature
20 Copyright notice