深入理解并行编程
深入理解并行编程1. 简介.........................................................................................................141.1. 导致并行编程困难的历史原因......................................................141.2. 并行编程的目标..............................................................................151.2.1. 性能...........................................................................................161.2.2. 生产率.......................................................................................171.2.3. 通用性.......................................................................................181.3. 并行编程的替代方案......................................................................201.3.1. 顺序应用多实例化...................................................................201.3.2. 使用现有的并行软件...............................................................211.3.3. 性能优化...................................................................................211.4. 是什么使并行编程变得复杂?......................................................221.4.1. 工作分割...................................................................................221.4.2. 并行访问控制...........................................................................231.4.3. 资源分割和复制.......................................................................241.4.4. 与硬件交互...............................................................................241.4.5. 组合使用...................................................................................241.4.6. 语言和环境如何对这样的任务进行支持?...........................251.5. 本书导读..........................................................................................251.5.1. 小问题.......................................................................................251.5.2. 随书源码...................................................................................262. 硬件的习性.............................................................................................282.1. 概述..................................................................................................282.1.1. CPU 流水线..............................................................................292.1.2. 内存引用...................................................................................302.1.3. 原子操作...................................................................................312.1.4. 内存屏障...................................................................................322.1.5. Cache Miss................................................................................332.1.6. I/O 操作 ....................................................................................342.2. 开销..................................................................................................352.2.1. 硬件体系结构...........................................................................362.2.2. 操作的开销...............................................................................372.3. 硬件的免费午餐?............................................................................382.3.1. 3D 集成.....................................................................................392.3.2. 新材料和新工艺.......................................................................392.3.3. 专用加速器...............................................................................39深入理解并行编程2.3.4. 现有的并行软件.......................................................................402.4. 软件设计 Implication ......................................................................403. 工具.........................................................................................................433.1. 脚本语言..........................................................................................433.2. POSIX 多进程.................................................................................443.2.1. POSIX 进程创建和撤销..........................................................443.2.2. POSIX 线程的创建和撤销......................................................463.2.3. POSIX 锁..................................................................................483.2.4. POSIX 读写锁..........................................................................523.3. 原子操作..........................................................................................553.4. Linux 内核中类似 POSIX 的操作 .................................................563.5. 趁手的工具——该如何选择?......................................................584. 计数.........................................................................................................594.1. 为什么并发计数不可小看?..........................................................604.2. 统计计数器......................................................................................624.2.1. 设计...........................................................................................624.2.2. 基于数组的实现.......................................................................624.2.3. 结果一致的实现.......................................................................644.2.4. 基于每线程变量的实现...........................................................664.2.5. 讨论...........................................................................................694.3. 近似上限计数器..............................................................................694.3.1. 设计...........................................................................................694.3.2. 简单的上限计数器实现...........................................................704.3.3. 关于简单上限计数器的讨论...................................................764.3.4. 近似上限计数器的实现...........................................................764.3.5. 关于近似上限计数器的讨论...................................................774.4. 精确上限计数器..............................................................................774.4.1. 原子上限计数器的实现...........................................................774.4.2. 关于原子上限计数器的讨论...................................................864.4.3. Signal-Theft 上限计数器的设计 .............................................864.4.4. Signal-Theft 上限计数器的实现 .............................................874.4.5. Signal-Theft 上限计数器讨论 .................................................944.5. 特殊的并行计数器..........................................................................954.6. 并行计数的讨论..............................................................................965. 分割和同步设计...................................................................................100深入理解并行编程5.1. 分割练习........................................................................................1005.1.1. 哲学家就餐问题.....................................................................1005.1.2. 双端队列.................................................................................1025.1.3. 关于分割问题示例的讨论..................................................... 1115.2. 设计准则........................................................................................ 1115.3. 同步粒度........................................................................................1135.3.1. 串行程序.................................................................................1145.3.2. 代码锁.....................................................................................1165.3.3. 数据锁.....................................................................................1175.3.4. 数据所有权.............................................................................1205.3.5. 锁粒度与性能.........................................................................1215.4. 并行快速路径................................................................................1215.4.1. 读写锁.....................................................................................1225.4.2. 层级锁.....................................................................................1235.4.3. 资源分配器缓存.....................................................................1255.5. 性能总结........................................................................................1316. 锁...........................................................................................................1326.1. 生存(staying alive) ...................................................................1336.1.1. 死锁.........................................................................................1336.1.2. 活锁.........................................................................................1366.1.3. 不公平.....................................................................................1376.1.4. 低效率.....................................................................................1376.2. 锁的类型........................................................................................1376.2.1. 互斥锁.....................................................................................1386.2.2. 读写锁.....................................................................................1386.2.3. Beyond Reader-Writer Locks .................................................1386.3. 基于锁的存在担保(existence guarantee)................................1387. 数据所有者...........................................................................................1408. 延迟处理...............................................................................................1428.1. 屏障................................................................................................1428.2. 引用计数........................................................................................1428.2.1. 引用计数类型的实现.............................................................1438.2.2. 支持引用计数的 Linux 原语.................................................1508.2.3. 计数器优化.............................................................................1518.3. Read-Copy Update(RCU).........................................................151深入理解并行编程8.3.1. RCU 基础 ...............................................................................1518.3.2. RCU 用法 ...............................................................................1638.3.3. Linux 内核中的 RCU API......................................................1768.3.4. “玩具式”的 RCU 实现 ......................................................1838.3.5. RCU 练习 ...............................................................................2069. 使用 RCU .............................................................................................2079.1. RCU 和基于每线程变量的统计计数器 ......................................2079.1.1. 设计.........................................................................................2079.1.2. 实现.........................................................................................2079.1.3. 讨论.........................................................................................2119.2. RCU 和可移除 I/O 设备的计数器...............................................21110. 验证:调试及分析...............................................................................21411. 数据结构...............................................................................................21612. 高级同步...............................................................................................21812.1. 避免锁............................................................................................21812.2. 内存屏障........................................................................................21812.2.1. 内存序及内存屏障..........................................................21812.2.2. 如果 B 在 A 后面, 并且 C 在 B 后面, 为什么 C 不在 A后面? 22012.2.3. 变量可以拥有多个值......................................................22112.2.4. 能信任什么东西?............................................................22212.2.5. 锁实现回顾......................................................................22912.2.6. 一些简单的规则..............................................................23012.2.7. 抽象内存访问模型..........................................................23012.2.8. 设备操作..........................................................................23312.2.9. 保证..................................................................................23312.2.10. 什么是内存屏障?............................................................23412.2.11. 锁约束..............................................................................24712.2.12. 内存屏障示例..................................................................24812.2.13. CPU..................................................................................25112.2.14. 哪里需要内存屏障?........................................................25312.3. 非阻塞同步....................................................................................25312.3.1. 简单 NBS........................................................................25312.3.2. 冒险指针..........................................................................25312.3.3. 原子数据结构..................................................................253深入理解并行编程12.3.4. ``Macho'' NBS..................................................................25313. 易于使用...............................................................................................25413.1. Rusty Scale for API Design ...........................................................25413.2. Shaving the Mandelbrot Set...........................................................25514. 时间管理...............................................................................................25815. 未来的冲突...........................................................................................25915.1. 可交易内存....................................................................................25915.1.1. I/O 操作 ..........................................................................26015.1.2. RPC 操作........................................................................26015.1.3. 内存映射操作..................................................................26115.1.4. 多线程事务......................................................................26215.1.5. 外部的事务访问..............................................................26315.1.6. 延时..................................................................................26415.1.7. 锁......................................................................................26415.1.8. 读者-写者锁 ....................................................................26515.1.9. 持续性..............................................................................266TM 如何提供类似的持续性功能?.................................................................26615.1.10. 动态链接装载..................................................................26615.1.11. 调试..................................................................................26715.1.12. exec() 系统调用..............................................................26815.1.13. RCU .................................................................................26815.1.14. 讨论..................................................................................27015.2. 共享内存并行编程........................................................................27015.3. 基于任务的并行编程....................................................................270A. 重要问题...............................................................................................271A.1 “after“的含义是什么?...................................................................271B. 同步原语...............................................................................................277B.1 初始化............................................................................................277B.1.1 smp_init()................................................................................277B.2 线程创建、销毁及控制................................................................278B.2.1 create_thread() ........................................................................278B.2.2 smp_thread_id()......................................................................278B.2.3 for_each_thread()....................................................................278B.2.4 for_each_running_thread() .....................................................279B.2.5 wait_thread()...........................................................................279深入理解并行编程B.2.6 wait_all_threads() ...................................................................279B.2.7 用法示例..........................................................................279B.3 锁....................................................................................................280B.3.1 spin_lock_init().......................................................................280B.3.2 spin_lock() ..............................................................................280B.3.3 spin_trylock()..........................................................................281B.3.4 spin_unlock() ..........................................................................281B.3.5 用法示例..........................................................................281B.4 每线程变量....................................................................................281B.4.1 DEFINE_PER_THREAD()....................................................282B.4.2 DECLARE_PER_THREAD()................................................282B.4.3 per_thread().............................................................................282B.4.4 __get_thread_var()..................................................................282B.4.5 init_per_thread() .....................................................................282B.4.6 用法示例..........................................................................282B.5 性能................................................................................................283C. 为什么使用内存屏障...........................................................................284C.1 Cache 结构....................................................................................284C.2 缓存一致性协议............................................................................286C.2.1 MESI 状态.............................................................................286C.2.2 MESI 协议消息.....................................................................287C.2.3 MESI 状态图..........................................................................288C.2.4 MESI 协议示例.....................................................................289C.3 不必要的存储延迟........................................................................291C.3.1 Store Buffers...........................................................................291C.3.2 Store Forwarding ....................................................................292C.3.3 存储缓冲区及内存屏障..................................................293C.4 不必要的存储延迟........................................................................296C.4.1 无效队列..........................................................................296C.4.2 使无效队列及使无效应答..............................................296C.4.3 无效队列及内存屏障......................................................297C.5 读和写内存屏障............................................................................300C.6 内存屏障示例................................................................................300C.6.1 乱序体系结构..................................................................300C.6.2 示例 1..............................................................................301深入理解并行编程C.6.3 示例 2..............................................................................302C.6.4 示例 3..............................................................................303C.7 特定 CPUs 的内存屏障指令 ........................................................304C.7.1 Alpha.......................................................................................306C.7.2 AMD64 ...................................................................................308C.7.3 ARMv7-A/R ...........................................................................3096 ISB();...............................................................................................................309C.7.4 IA64 ........................................................................................309C.7.5 PA-RISC..................................................................................310C.7.6 POWER / Power PC ...............................................................310C.7.7 SPARC RMO, PSO, and TSO ................................................311C.7.8 x86...........................................................................................312C.7.9 zSeries.....................................................................................313C.8 内存屏障是永恒的?......................................................................313C.9 对硬件设计者的建议....................................................................314D. RCU 实现 .............................................................................................315D.1 可睡眠 RCU 实现........................................................................315D.1.1 SRCU 实现原理....................................................................316D.1.2 SRCU API 及用法.................................................................317D.1.3 实现..................................................................................320D.1.4 SRCU 概述............................................................................326D.2 分级 RCU 概述 ...........................................................................326D.2.1 RCU 基础回顾 ......................................................................326D.2.2 经典 RCU 实现概要......................................................327D.2.3 RCU 迫切要解决的问题.......................................................328D.2.4 可扩展 RCU 实现...........................................................329D.2.5 迈向不成熟的 RCU 实现...............................................332D.2.6 状态机..............................................................................334D.2.7 用例..................................................................................335D.2.8 测试..................................................................................340D.2.9 结论..................................................................................345D.3 分级 RCU 代码走查.....................................................................346D.3.1 数据结构及内核参数......................................................346D.3.2 外部接口..........................................................................354D.3.3 初始化..............................................................................362深入理解并行编程D.3.4 CPU 热插拨...........................................................................367D.3.5 杂项函数..........................................................................372D.3.6 Grace-Period 检测函数 ..........................................................373D.3.7 Dyntick-Idle 函数..................................................................385D.3.8 强制静止状态..................................................................390D.3.9 CPU-延迟检测 .......................................................................397D.3.10 可能的缺陷及变更..........................................................400D.4 可抢占 RCU .................................................................................400D.4.1 RCU 概念 ...............................................................................401D.4.2 可抢占 RCU 算法概述 ...................................................402D.4.3 验证可抢占 RCU............................................................419E. 形式验证...............................................................................................422E.1 什么是 Promela 和 Spin? ...........................................................422E.2 Promela 示例: 非原子性递增.....................................................423E.3 Promela 示例: 原子递增.............................................................426E.3.1 组合.........................................................................................427E.4 如何使用 Promela ........................................................................428E.4.1 Promela 特性 .........................................................................428E.4.2 Promela 编程技巧 ..................................................................429E.5 Promela 示例: 锁.........................................................................430E.6 Promela 示例: QRCU...................................................................433E.6.1 运行 QRCU 示例 .................................................................438E.6.2 到底需要多少读者和写者?...................................................439E.6.3 可选方法: 正确性校验..........................................................439E.6.4 可选方法: 更多工具..............................................................440E.6.5 可选方法: 分而治之..............................................................440E.7 Promela Parable: dynticks 和可抢占 RCU .................................440E.7.1 可抢占 RCU 和 dynticks 介绍............................................441E.7.2 验证可抢占 RCU 和 dynticks................................................445E.7.3 回顾.........................................................................................466E.8 简单的避免形式校验....................................................................467E.8.1 简单 Dynticks 接口的状态变量 ...........................................467E.8.2 进入和退出 Dynticks-Idle 模式............................................468E.8.3 从 Dynticks-Idle 模式进入 NMIs.........................................469E.8.4 Interrupts From Dynticks-Idle Mode ......................................470深入理解并行编程E.8.5 检查 Dynticks 静止状态.......................................................471E.8.6 讨论.........................................................................................473E.9 概要................................................................................................473F. 问题答案...............................................................................................474G. 术语表...................................................................................................475H. 感谢.......................................................................................................476
评论