Performance Tips for NAG Fortran Compiler
General Performance Tips
- Use -O3 or -O4 instead of just -O. This will lengthen compile time (sometimes substantially with -O4), but runtime performance is usually improved.
- If you use assumed-shape arrays and you know that the actual arguments are always contiguous (i.e. you do not pass array slices using section notation), use -Oassumed=always_contig. With this option, a runtime error occurs if a non-contiguous actual argument is detected (so it is also useful for discovering whether you use such array sections).
If you are not 100% sure, but you think that this is true all or almost all of the time, use -Oassumed. With this option, non-contiguous actual arguments will be accepted though access to them will be slow.
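To check whether a given call passes a contiguous argument, a small test program can help. The following is a minimal sketch, not taken from the compiler documentation: the module, subroutine and program names are hypothetical, and it assumes a compiler that supports the Fortran 2008 IS_CONTIGUOUS intrinsic. It is meant to be compiled without either -Oassumed option, so that all three calls are accepted.

```fortran
module contig_check
  implicit none
contains
  ! Hypothetical helper: x is an assumed-shape dummy, so the caller may
  ! pass a whole array or any rank-1 section.
  subroutine report(label, x)
    character(len=*), intent(in) :: label
    real, intent(in) :: x(:)
    ! IS_CONTIGUOUS is a Fortran 2008 intrinsic.
    print *, label, ' size =', size(x), ' contiguous =', is_contiguous(x)
  end subroutine report
end module contig_check

program contig_demo
  use contig_check
  implicit none
  real :: a(10)
  integer :: i
  a = [(real(i), i = 1, 10)]
  call report('whole array  ', a)          ! all elements adjacent in memory
  call report('unit stride  ', a(3:7))     ! a section, but still adjacent
  call report('stride of two', a(1:10:2))  ! elements are not adjacent
end program contig_demo
```

Only the strided section is expected to be reported as non-contiguous, and calls of that kind are the ones that matter when choosing between -Oassumed=always_contig and -Oassumed.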
Performance tips for Intel Linux
This compiler option may provide a worthwhile speed-up on this platform, but it has pitfalls.
It may give incorrect results when a common block or derived type has a double precision entity following an odd number of single precision entities, for example:
      COMMON/c/x(3),d
      INTEGER x
      DOUBLE PRECISION d
or
      TYPE t
        DOUBLE PRECISION value1
        LOGICAL flag
        DOUBLE PRECISION value2
      END TYPE
You can often avoid these problems by ensuring that double precision entities are at the beginning of common blocks and structures, e.g.
      TYPE t
        DOUBLE PRECISION value1,value2
        LOGICAL flag
      END TYPE
But, if your code does not use common blocks or derived types with the above pitfalls, a good speed up may be expected on many programs.
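The same reordering applies to common blocks. As a minimal sketch (the program and subroutine names are hypothetical), the common block from the problem case above can be declared with the double precision entity first; every program unit that references the block must use the same order.

```fortran
program common_order
  implicit none
  double precision :: d
  integer :: x(3)
  ! Double precision entity first, so it sits at the start of the block.
  common /c/ d, x
  d = 1.5d0
  x = [1, 2, 3]
  call show
end program common_order

subroutine show
  implicit none
  double precision :: d
  integer :: x(3)
  ! Same order as in the main program.
  common /c/ d, x
  print *, d, x
end subroutine show
```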
With -ieee=nonstd, an application typically runs about three times faster, at the cost of losing IEEE gradual underflow. Speed-ups of more than a factor of 100 (that is not a typo!) have been seen in some cases.
A further option increases speed beyond what -ieee=nonstd achieves. However, it performs some numerically unsafe optimisations, and floating-point exceptions are sometimes reported later than expected.
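To see what losing gradual underflow means in practice, the sketch below divides the smallest normalised double precision number by four. It is an illustration only: the program name is hypothetical, and the VOLATILE attribute is used on the assumption that it keeps the division from being evaluated at compile time.

```fortran
program underflow_demo
  implicit none
  ! VOLATILE is intended to force the division to happen at run time.
  double precision, volatile :: x
  double precision :: y
  x = tiny(1.0d0)   ! smallest positive normalised double precision value
  y = x / 4.0d0     ! the exact result lies below the normalised range
  print *, 'tiny     =', x
  print *, 'tiny / 4 =', y
  if (y > 0.0d0) then
    print *, 'gradual underflow: the result is a denormalised number'
  else
    print *, 'the result was flushed to zero'
  end if
end program underflow_demo
```

With full IEEE arithmetic one would expect the first message; when compiled with -ieee=nonstd, which gives up gradual underflow, one would expect the second. Code that depends on such tiny values for accuracy is a poor candidate for this option.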
Performance tips for IBM RISC System/6000
The floating-point hardware on the RS/6000 is much slower when floating-point traps are enabled. NAGWare f95 enables these traps by default because doing so greatly eases debugging; with -ieee=full, which turns the traps off, floating-point operations run several times faster.
Performance tips for Sun SPARC running Solaris
If your application makes significant use of denormalised numbers, but does not rely on them for accurate results, this option can improve performance substantially. (This does not apply to all SPARC processors; the switch matters only if a significant fraction of execution time is "system" rather than "user" time.)