Writing Fast Programs
Writing Fast Programs: A Practical Guide for Scientists and Engineers
John Riley
Published by Cambridge International Science Publishing
ISBN: 1-904602-40-1

Overview

"Writing Fast Programs" presents the basic elements of code optimization and provides strategies for reducing bottlenecks in practical simulation and numerical modeling code. The target audience is scientists, engineers and students in these fields. One pre-publication reviewer called it a much-needed intermediate text that bridges the gap between existing introductory and more advanced programming books aimed at scientists. "Writing Fast Programs" does not teach basic programming; some programming proficiency is assumed, along with familiarity with basic programming terminology. Code examples are presented in C, but BASIC examples (with BASIC serving as a convenient pseudo-language) are provided for those not familiar with C. In general, the strategies presented are not language specific and should benefit a wide programming audience; similar techniques have been discussed for Java, for example. Two short illustrative sketches of techniques of this kind follow the table of contents below.

Table of Contents

Foreword
Acknowledgements
List of Figures
List of Tables

Part I: The Foundations

1. Introduction to Code Optimization
1.1 Motivation for Writing High Performance Code
1.2 Scientist Programmer vs. Computer Programmer
1.3 Programming Languages
1.3.1 One Extreme: BASIC for Ease of Use
1.3.2 Another Extreme: C for Speed
1.3.3 Program Development Efficiency
1.3.4 Scripting
1.3.5 Procedural vs. Object Oriented Languages
1.4 General Thoughts on Optimization
1.4.1 Why Optimize
1.4.2 How Difficult is Optimization
1.4.3 When and What to Optimize
1.5 Overview of Samples Presented in This Book
1.6 Additional Reading

2. PC Hardware
2.1 Basic System Architecture
2.2 Numerical Representations
2.3 Addressing
2.4 Basic CPU Architecture and Introduction to ASSEMBLY Language
2.5 Code Execution and Timing
2.6 Pipelining, Speculative Execution and Branch Prediction
2.7 Optimization at the CPU Level
2.8 A Brief Historical Summary of Desktop Computer CPUs
2.8.1 Intel Processors
2.8.2 AMD Processors
2.8.3 Cyrix Processors
2.8.4 Motorola/IBM Processors
2.9 Modern Physical Memory Architectures
2.9.1 Basic Memory Architecture
2.9.2 SDRAM and DDR SDRAM Memory
2.9.3 RAMBUS Memory

3. Operating System Considerations
3.1 The Operating System in Perspective
3.2 Operating Systems and Performance
3.3 Operating System Architectures
3.4 Specific PC Operating Systems
3.4.1 MS Windows
3.4.2 Linux
3.5 A Brief Comparison of Windows and Linux Performance
3.5.1 Simple Numerical Procedure
3.5.2 Thread/Process Creation
3.5.3 Context Switching

4. Compiler Considerations
4.1 Interpreters vs. Compilers
4.2 Compiler Optimizations
4.2.1 Aliasing
4.2.2 Array Bounds Checking
4.2.3 Numerical Overflow Checking
4.2.4 Unrounded Floating Point Operations
4.2.5 FDIV Bug Checking
4.2.6 Inline and Intrinsic Functions
4.2.7 Common Sub-Expressions
4.3 Using Programs Compiled with Old Compilers
4.4 C++ and Similar Compilers
4.4.1 MS Visual C++
4.4.2 MS Visual C#
4.4.3 g++
4.4.4 VectorC
4.4.5 KAI C++

Part II: Implementation

5. Data Management
5.1 Implicit Declaration and the Variant Problem
5.2 Eliminating Implicit Type Conversions
5.2.1 Type Matching to Function Calls
5.2.2 Loop Counters
5.3 Immediates, Constants and Variables
5.4 Type Specific Operators
5.5 Variable Scope
5.6 Data Organization
5.6.1 Scalars, Arrays and Hashes
5.6.2 Pointers
5.6.3 Queues and Stacks
5.6.4 Linked Lists and Trees
5.6.5 Structures/User Defined Types
5.6.6 Objects and Classes
5.7 Naming Variables, Functions, Objects and Classes

6. Function and Procedure Calling: Optimizing Program Flow
6.1 Mechanism of Function Calling and Inline Code
6.2 Programming in the Sub-Routine Style
6.3 Calling with Reduced Stack Overhead
6.4 Using Library Functions
6.5 Hand Coding vs. Using Language Functions: A Pseudo-Random Number Generator
6.6 Recursive Functions: The Factorial
6.7 Function Calling Conventions

7. Loops and Vectors
7.1 General Vector Concepts and the Potential for Complex Code Structures
7.2 Loop Unrolling in a Trade-Off with Generality: Matrix Multiplication
7.3 Partial Loop Unrolling: Simpson's Rule Integration
7.4 A Practical Example: Time Dependent Heat Flow

8. Programming in the RISC Style
8.1 The KISS Principle
8.2 Practical Example: The Lennard-Jones Energy
8.3 Practical Example: The Boltzmann Factor
8.4 Other Mathematical Tricks

9. Look-Up Tables
9.1 Memory Access vs. Computation
9.2 The LUT in Action: 1-Dimensional Examples
9.2.1 Lennard-Jones Energy
9.2.2 Other 1-D Examples
9.2.3 Introduced Error in the LUT Technique
9.3 The LUT in Action: 2-Dimensional Examples
9.3.1 Distance
9.3.2 Heterogeneous Lennard-Jones Energy
9.4 Virtual vs. Physical Memory

10. Other Algorithm Optimization Techniques
10.1 The Use of Symmetry
10.2 Elimination of Nested Loops
10.3 Reducing Decision Logic Overhead: Bit Flag Encoding

Part III: Parallel Processing

11. Multi-Tasking Basics
11.1 Multi-Tasking Terms
11.2 Multi-Threaded Programming
11.2.1 Simple Thread Creation and Termination
11.2.2 Communicating between Threads
11.2.3 When to Use Multiple Threads
11.3 Multi-Process Programming
11.3.1 Simple Process Creation and Termination
11.3.2 Communicating Between Processes
11.3.3 When to Use Multiple Processes

12. Parallel Computation Basics
12.1 Parallel Architecture Basics
12.1.1 Multiple Execution Units in the CPU
12.1.2 Symmetric Multi-Processors
12.1.3 Single Instruction, Multiple Data (SIMD)
12.1.4 Multiple Instruction, Multiple Data (MIMD)
12.1.5 Algorithm Considerations
12.1.6 Performance Considerations in Parallel Systems
12.2 SIMD
12.2.1 Integer Array Addition with MMX
12.2.2 Floating Point SIMD and Testing for SIMD Capability
12.2.3 Vector Dot Product using SIMD
12.2.4 4x4 Matrix Multiplication using SIMD
12.2.5 Simpson's Rule Integration using SIMD
12.2.6 A Comparison of 3DNow! to SSE SIMD Performance
12.2.7 Compilers for SIMD and Additional SIMD Extensions
12.3 MIMD
12.3.1 Networks of Workstations
12.3.2 Clusters
12.3.3 Distributed Computing
12.4 Implementing MIMD
12.4.1 Simple Parallel Computing Without Messages
12.4.2 Simple Master-Slave Parallel Computing
12.4.3 Simple Message Passing Parallel Computing
12.5 Message Passing Tools
12.5.1 Parallel Virtual Machine (PVM)
12.5.2 Message Passing Interface (MPI)
12.5.3 Other Tools

Appendix A: A List of Modern Programming Tools
Appendix B: How to Buy a High Performance PC
Appendix C: Contents of the CD-ROM
Appendix D: Complete Listing of MCGas Monte Carlo Program (Demo)
Appendix E: BASIC Listings for the Part I and Part II Demo Programs
Bibliography
Index
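To give a flavor of Part II, here is a minimal sketch of partial loop unrolling applied to Simpson's rule integration, the technique named in Section 7.3. This is our illustration, not the book's own listing: the integrand (sin), the interval and the step count are arbitrary choices, and the unrolling simply pairs each 4-weighted (odd) ordinate with the following 2-weighted (even) one so that one pass through the loop does the work of two.

#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Simpson's rule with partial loop unrolling: each pass accumulates
 * one 4-weighted (odd) and one 2-weighted (even) interior ordinate,
 * so the loop runs half as many times. n must be even. */
double simpson(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n;
    /* end points, plus the last odd ordinate the paired loop misses */
    double sum = f(a) + f(b) + 4.0 * f(a + (n - 1) * h);
    int i;
    for (i = 1; i < n - 1; i += 2)
        sum += 4.0 * f(a + i * h) + 2.0 * f(a + (i + 1) * h);
    return sum * h / 3.0;
}

int main(void)
{
    /* Integrate sin over [0, pi]; the exact value is 2. */
    printf("%.10f\n", simpson(sin, 0.0, M_PI, 1000));
    return 0;
}

Compile with the math library linked (e.g. cc simpson.c -lm). The pairing halves loop-control overhead at the cost of slightly less general code, the trade-off Chapter 7 discusses.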
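In the same spirit, a minimal sketch of the look-up table (LUT) idea behind Chapter 9, applied to the Lennard-Jones energy of Section 9.2.1. Again this is our illustration rather than the book's listing, and the details are assumptions: reduced units (epsilon = sigma = 1), a fixed tabulated range, and nearest-entry lookup, which trades the introduced error discussed in Section 9.2.3 for removing the pow() calls from the inner loop.

#include <stdio.h>
#include <math.h>

#define TABLE_SIZE 4096
#define R_MIN 0.8     /* tabulated range of r, in units of sigma */
#define R_MAX 4.0

static double lj_table[TABLE_SIZE];
static double lj_step;

/* Direct evaluation: V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6),
 * here with eps = sigma = 1 (reduced units). */
static double lj_direct(double r)
{
    double s6 = pow(1.0 / r, 6);
    return 4.0 * (s6 * s6 - s6);
}

/* Fill the table once; thereafter each energy evaluation is an
 * index computation and one memory read instead of pow() calls. */
static void lj_table_init(void)
{
    int i;
    lj_step = (R_MAX - R_MIN) / (TABLE_SIZE - 1);
    for (i = 0; i < TABLE_SIZE; i++)
        lj_table[i] = lj_direct(R_MIN + i * lj_step);
}

/* Nearest-entry lookup; the caller must keep r within [R_MIN, R_MAX].
 * A finer table, or linear interpolation, reduces the error. */
static double lj_lookup(double r)
{
    int i = (int)((r - R_MIN) / lj_step + 0.5);
    return lj_table[i];
}

int main(void)
{
    lj_table_init();
    printf("direct: %.6f  lookup: %.6f\n", lj_direct(1.5), lj_lookup(1.5));
    return 0;
}

Whether the table beats recomputation depends on how the memory-access cost compares with the arithmetic it replaces, the question Section 9.1 takes up.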
DSB Scientific Consulting · New Bern, NC · Tel: 803-413-4768
© DSB Scientific Consulting 2005-2014