Writing Fast Programs
Writing Fast Programs: A Practical Guide for Scientists and Engineers
John Riley
Published by Cambridge International Science Publishing
ISBN: 1-904602-40-1

Overview

"Writing Fast Programs" presents the basic elements of code optimization and provides strategies for reducing bottlenecks in practical simulation and numerical modeling code. The target audience is scientists, engineers and students in these fields. One pre-publication reviewer called it a much-needed intermediate text that bridges the gap between existing introductory and more advanced programming books aimed at scientists. "Writing Fast Programs" does not teach basic programming; some programming proficiency is assumed, along with familiarity with basic programming terminology. Code examples are presented in C, but BASIC examples (with BASIC serving as a convenient pseudo-language) are provided for those not familiar with C. In general, the strategies presented are not language specific and should benefit a wide programming audience; similar techniques have been discussed for Java, for example. Two short illustrative sketches of techniques of this kind follow the table of contents below.

Table of Contents

Foreword
Acknowledgements
List of Figures
List of Tables

Part I: The Foundations

1. Introduction to Code Optimization
1.1 Motivation for Writing High Performance Code
1.2 Scientist Programmer vs. Computer Programmer
1.3 Programming Languages
1.3.1 One Extreme: BASIC for Ease of Use
1.3.2 Another Extreme: C for Speed
1.3.3 Program Development Efficiency
1.3.4 Scripting
1.3.5 Procedural vs. Object Oriented Languages
1.4 General Thoughts on Optimization
1.4.1 Why Optimize
1.4.2 How Difficult is Optimization
1.4.3 When and What to Optimize
1.5 Overview of Samples Presented in This Book
1.6 Additional Reading

2. PC Hardware
2.1 Basic System Architecture
2.2 Numerical Representations
2.3 Addressing
2.4 Basic CPU Architecture and Introduction to ASSEMBLY Language
2.5 Code Execution and Timing
2.6 Pipelining, Speculative Execution and Branch Prediction
2.7 Optimization at the CPU Level
2.8 A Brief Historical Summary of Desktop Computer CPUs
2.8.1 Intel Processors
2.8.2 AMD Processors
2.8.3 Cyrix Processors
2.8.4 Motorola/IBM Processors
2.9 Modern Physical Memory Architectures
2.9.1 Basic Memory Architecture
2.9.2 SDRAM and DDR SDRAM Memory
2.9.3 RAMBUS Memory

3. Operating System Considerations
3.1 The Operating System in Perspective
3.2 Operating Systems and Performance
3.3 Operating System Architectures
3.4 Specific PC Operating Systems
3.4.1 MS Windows
3.4.2 Linux
3.5 A Brief Comparison of Windows and Linux Performance
3.5.1 Simple Numerical Procedure
3.5.2 Thread/Process Creation
3.5.3 Context Switching

4. Compiler Considerations
4.1 Interpreters vs. Compilers
4.2 Compiler Optimizations
4.2.1 Aliasing
4.2.2 Array Bounds Checking
4.2.3 Numerical Overflow Checking
4.2.4 Unrounded Floating Point Operations
4.2.5 FDIV Bug Checking
4.2.6 Inline and Intrinsic Functions
4.2.7 Common Sub-Expressions
4.3 Using Programs Compiled with Old Compilers
4.4 C++ and Similar Compilers
4.4.1 MS Visual C++
4.4.2 MS Visual C#
4.4.3 g++
4.4.4 VectorC
4.4.5 KAI C++

Part II: Implementation

5. Data Management
5.1 Implicit Declaration and the Variant Problem
5.2 Eliminating Implicit Type Conversions
5.2.1 Type Matching to Function Calls
5.2.2 Loop Counters
5.3 Immediates, Constants and Variables
5.4 Type Specific Operators
5.5 Variable Scope
5.6 Data Organization
5.6.1 Scalars, Arrays and Hashes
5.6.2 Pointers
5.6.3 Queues and Stacks
5.6.4 Linked Lists and Trees
5.6.5 Structures/User Defined Types
5.6.6 Objects and Classes
5.7 Naming Variables, Functions, Objects and Classes

6. Function and Procedure Calling: Optimizing Program Flow
6.1 Mechanism of Function Calling and Inline Code
6.2 Programming in the Sub-Routine Style
6.3 Calling with Reduced Stack Overhead
6.4 Using Library Functions
6.5 Hand Coding vs. Using Language Functions: A Pseudo-Random Number Generator
6.6 Recursive Functions: The Factorial
6.7 Function Calling Conventions

7. Loops and Vectors
7.1 General Vector Concepts and the Potential for Complex Code Structures
7.2 Loop Unrolling in a Trade-Off with Generality: Matrix Multiplication
7.3 Partial Loop Unrolling: Simpson's Rule Integration
7.4 A Practical Example: Time Dependent Heat Flow

8. Programming in the RISC Style
8.1 The KISS Principle
8.2 Practical Example: The Lennard-Jones Energy
8.3 Practical Example: The Boltzmann Factor
8.4 Other Mathematical Tricks

9. Look-Up Tables
9.1 Memory Access vs. Computation
9.2 The LUT in Action: 1-Dimensional Examples
9.2.1 Lennard-Jones Energy
9.2.2 Other 1-D Examples
9.2.3 Introduced Error in the LUT Technique
9.3 The LUT in Action: 2-Dimensional Examples
9.3.1 Distance
9.3.2 Heterogeneous Lennard-Jones Energy
9.4 Virtual vs. Physical Memory

10. Other Algorithm Optimization Techniques
10.1 The Use of Symmetry
10.2 Elimination of Nested Loops
10.3 Reducing Decision Logic Overhead: Bit Flag Encoding

Part III: Parallel Processing

11. Multi-Tasking Basics
11.1 Multi-Tasking Terms
11.2 Multi-Threaded Programming
11.2.1 Simple Thread Creation and Termination
11.2.2 Communicating between Threads
11.2.3 When to Use Multiple Threads
11.3 Multi-Process Programming
11.3.1 Simple Process Creation and Termination
11.3.2 Communicating Between Processes
11.3.3 When to Use Multiple Processes

12. Parallel Computation Basics
12.1 Parallel Architecture Basics
12.1.1 Multiple Execution Units in the CPU
12.1.2 Symmetric Multi-Processors
12.1.3 Single Instruction, Multiple Data (SIMD)
12.1.4 Multiple Instruction, Multiple Data (MIMD)
12.1.5 Algorithm Considerations
12.1.6 Performance Considerations in Parallel Systems
12.2 SIMD
12.2.1 Integer Array Addition with MMX
12.2.2 Floating Point SIMD and Testing for SIMD Capability
12.2.3 Vector Dot Product using SIMD
12.2.4 4x4 Matrix Multiplication using SIMD
12.2.5 Simpson's Rule Integration using SIMD
12.2.6 A Comparison of 3DNow! to SSE SIMD Performance
12.2.7 Compilers for SIMD and Additional SIMD Extensions
12.3 MIMD
12.3.1 Networks of Workstations
12.3.2 Clusters
12.3.3 Distributed Computing
12.4 Implementing MIMD
12.4.1 Simple Parallel Computing Without Messages
12.4.2 Simple Master-Slave Parallel Computing
12.4.3 Simple Message Passing Parallel Computing
12.5 Message Passing Tools
12.5.1 Parallel Virtual Machine (PVM)
12.5.2 Message Passing Interface (MPI)
12.5.3 Other Tools

Appendix A: A List of Modern Programming Tools
Appendix B: How to Buy a High Performance PC
Appendix C: Contents of the CD-ROM
Appendix D: Complete Listing of MCGas Monte Carlo Program (Demo)
Appendix E: BASIC Listings for the Part I and Part II Demo Programs
Bibliography
Index
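To give a flavor of Part II, here is a minimal sketch of partial loop unrolling applied to Simpson's rule integration, the technique named in Section 7.3. This is our illustration, not the book's own listing: the integrand (sin), the interval and the step count are arbitrary choices, and the unrolling simply pairs each 4-weighted (odd) ordinate with the following 2-weighted (even) one so that one pass through the loop does the work of two.

#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Simpson's rule with partial loop unrolling: each pass accumulates
 * one 4-weighted (odd) and one 2-weighted (even) interior ordinate,
 * so the loop runs half as many times. n must be even. */
double simpson(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    /* end points, plus the last odd ordinate the paired loop misses */
    double sum = f(a) + f(b) + 4.0 * f(a + (n - 1) * h);
    int i;
    for (i = 1; i < n - 1; i += 2)
        sum += 4.0 * f(a + i * h) + 2.0 * f(a + (i + 1) * h);
    return sum * h / 3.0;
}

int main(void)
{
    /* Integrate sin over [0, pi]; the exact value is 2. */
    printf("%.10f\n", simpson(sin, 0.0, M_PI, 1000));
    return 0;
}

Compile with the math library linked (e.g. cc simpson.c -lm). The pairing halves loop-control overhead at the cost of slightly less general code, the trade-off Chapter 7 discusses.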
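In the same spirit, a minimal sketch of the look-up table (LUT) idea behind Chapter 9, applied to the Lennard-Jones energy of Section 9.2.1. Again this is our illustration rather than the book's listing, and the details are assumptions: reduced units (epsilon = sigma = 1), a fixed tabulated range, and nearest-entry lookup, which trades the introduced error discussed in Section 9.2.3 for removing the pow() calls from the inner loop.

#include <stdio.h>
#include <math.h>

#define TABLE_SIZE 4096
#define R_MIN 0.8     /* tabulated range of r, in units of sigma */
#define R_MAX 4.0

static double lj_table[TABLE_SIZE];
static double lj_step;

/* Direct evaluation: V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6),
 * here with eps = sigma = 1 (reduced units). */
static double lj_direct(double r)
{
    double s6 = pow(1.0 / r, 6);
    return 4.0 * (s6 * s6 - s6);
}

/* Fill the table once; thereafter each energy evaluation is an
 * index computation and one memory read instead of pow() calls. */
static void lj_table_init(void)
{
    int i;
    lj_step = (R_MAX - R_MIN) / (TABLE_SIZE - 1);
    for (i = 0; i < TABLE_SIZE; i++)
        lj_table[i] = lj_direct(R_MIN + i * lj_step);
}

/* Nearest-entry lookup; the caller must keep r within [R_MIN, R_MAX].
 * A finer table, or linear interpolation, reduces the error. */
static double lj_lookup(double r)
{
    int i = (int)((r - R_MIN) / lj_step + 0.5);
    return lj_table[i];
}

int main(void)
{
    lj_table_init();
    printf("direct: %.6f  lookup: %.6f\n", lj_direct(1.5), lj_lookup(1.5));
    return 0;
}

Whether the table beats recomputation depends on how the memory-access cost compares with the arithmetic it replaces, the question Section 9.1 takes up.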
DSB Scientific Consulting · New Bern, NC · Tel: 803-413-4768
© DSB Scientific Consulting 2005-2014