NAG Library for the Xeon Phi Coprocessor, Mark 23

FSLM623DCL - License Managed

Linux 64 (Intel 64 and MIC), Intel Fortran, Double Precision

Users' Note




1. Introduction

This document is essential reading for every user of the NAG Library for the Xeon Phi Coprocessor implementation specified in the title. It provides implementation-specific detail that augments the information provided in the NAG Mark 23 Library Manual (which we will refer to as the Library Manual). Wherever that manual refers to the "Users' Note for your implementation", you should consult this note.

In addition, NAG recommends that before calling any Library routine you should read the following reference material (see Section 5):

(a) Essential Introduction
(b) Introduction to the NAG Library for the Xeon Phi Coprocessor
(c) Chapter Introduction
(d) Routine Document

The library supplied with this implementation has been compiled in a manner that facilitates its use within a multithreaded application. If you intend to use the NAG library within a multithreaded application please refer to the document on Thread Safety in the Library Manual (see Section 5).

Further information about using the supplied Intel MKL libraries with threaded applications is available at http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications.

2. Post Release Information

Please check the following URL:

http://www.nag.co.uk/doc/inun/fs23/lm6dcl/postrelease.html

for details of any new information related to the applicability or usage of this implementation.

3. General Information

The NAG Library for the Xeon Phi Coprocessor may be used in two different modes of execution: heterogeneous and native. Heterogeneous execution involves launching an executable on the host, and some parts of the computation may be offloaded to the Xeon Phi coprocessor. Native execution involves using the Xeon Phi coprocessor as a stand-alone compute node; native executables must be cross-compiled on the host and will execute on the Xeon Phi coprocessor only. For more information see the Introduction to the NAG Library for the Xeon Phi Coprocessor document.

Note that the term MIC is used throughout this document. MIC stands for Many Integrated Core. This is the name of the architecture of the Xeon Phi coprocessor. Many of Intel's environment variables are prefixed with MIC_.

3.1. Accessing the Library

In this section we assume that the library has been installed in the directory [INSTALL_DIR].

By default [INSTALL_DIR] (see Installer's Note (in.html)) is /opt/NAG/fslm623dcl or /usr/local/NAG/fslm623dcl depending on your system; however it could have been changed by the person who did the installation. To identify [INSTALL_DIR] for this installation:

3.1.1. Building Heterogeneous Executables

To use the NAG Library for Xeon Phi Coprocessor and the Intel MKL libraries to build heterogeneous executables, you may link in the following manner:
  ifort -align array64byte -auto -openmp -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/intel64/libnagsmp.a \
      [INSTALL_DIR]/lib/intel64/libnag_performance_parameters.a -mkl
where driver.f90 is your application program; or
  ifort -align array64byte -auto -openmp -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      -L[INSTALL_DIR]/lib/intel64 -lnagsmp -lnag_performance_parameters -mkl
if the shareable library is required. Please note the shareable library is fully resolved so that, as long as the environment variable LD_LIBRARY_PATH is set correctly at link time (see below), you need not link against other run-time libraries explicitly.

If your application has been linked with the shareable NAG libraries then the environment variables LD_LIBRARY_PATH and MIC_LD_LIBRARY_PATH must be extended, as follows, to allow run-time linkage.

In the C shell, type:

  setenv LD_LIBRARY_PATH [INSTALL_DIR]/lib/intel64:${LD_LIBRARY_PATH}
  setenv MIC_LD_LIBRARY_PATH [INSTALL_DIR]/lib/intel64:${MIC_LD_LIBRARY_PATH}

In the Bourne shell, type:

  LD_LIBRARY_PATH=[INSTALL_DIR]/lib/intel64:${LD_LIBRARY_PATH}
  export LD_LIBRARY_PATH
  MIC_LD_LIBRARY_PATH=[INSTALL_DIR]/lib/intel64:${MIC_LD_LIBRARY_PATH}
  export MIC_LD_LIBRARY_PATH
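
The Bourne shell settings above can also be wrapped in a small helper. The following is only a sketch: the function name prepend_path and the variable NAG_DIR are hypothetical and not part of the NAG distribution, and the path shown is just one of the default locations mentioned earlier. The helper avoids leaving a trailing colon when the variable was previously unset (the dynamic linker treats an empty entry as the current directory) and does not add the same directory twice.

```shell
# prepend_path VAR DIR
# Put DIR at the front of the colon-separated list held in the variable VAR,
# then export VAR. Hypothetical helper, not supplied by NAG.
prepend_path() {
    eval "_current=\${$1:-}"                # read the present value of VAR
    case ":${_current}:" in
        *":$2:"*) ;;                        # DIR already listed: leave VAR alone
        *) eval "$1=\"\$2\${_current:+:\$_current}\"" ;;  # no trailing colon if VAR was empty
    esac
    export "$1"
}

NAG_DIR=/opt/NAG/fslm623dcl    # assumed default; substitute your [INSTALL_DIR]
prepend_path LD_LIBRARY_PATH "$NAG_DIR/lib/intel64"
prepend_path MIC_LD_LIBRARY_PATH "$NAG_DIR/lib/intel64"
```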

This implementation has been compiled and tested on the computer system detailed in Section 2.1 of the Installer's Note.

3.1.2. Building Native Executables for the MIC architecture

To use the NAG Library for Xeon Phi Coprocessor and the Intel MKL libraries to build native executables for the Xeon Phi coprocessor, you may link in the following manner:

  ifort -mmic -align array64byte -auto -openmp -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/mic/libnagsmp.a -mkl
where driver.f90 is your application program; or
  ifort -mmic -align array64byte -auto -openmp -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      -L[INSTALL_DIR]/lib/mic -lnagsmp -mkl -lifport -lifcoremt -limf -lsvml -lintlc -lifcore
if the shareable library is required. Please note the shareable library is fully resolved so that, as long as the environment variable LD_LIBRARY_PATH is set correctly at link time (see below), you need not link against other run-time libraries explicitly.

To run a native executable, first log in to the coprocessor you want to use. For example, ssh mic0 will log you into the first device in a multicard system, provided that you have been given an account on that device.

Note that the settings below assume that [INSTALL_DIR] is in a location that is mounted on the device being used. If this is not the case, the libraries that the executable links to can be transferred to the device using the scp command, e.g. scp [INSTALL_DIR]/lib/mic/libnagsmp.so mic0:naglibs would transfer the shared library into a directory called naglibs in the user's home directory on the device. In this case LD_LIBRARY_PATH would need to point to ~/naglibs instead. Similarly, scp may be used to transfer your executable to the device if it is in a directory that is not mounted on the device. See http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-developers-quick-start-guide for more details about using your Xeon Phi coprocessor with native executables.

The Bourne shell is used on the coprocessor. After logging in, type:

  LD_LIBRARY_PATH=[INSTALL_DIR]/lib/mic:[INSTALL_DIR]/rtl/mic:[INSTALL_DIR]/mkl/lib/mic
  export LD_LIBRARY_PATH
to set the LD_LIBRARY_PATH when using an executable linked to the shareable NAG library, or:
  LD_LIBRARY_PATH=[INSTALL_DIR]/rtl/mic:[INSTALL_DIR]/mkl/lib/mic
  export LD_LIBRARY_PATH
to set the LD_LIBRARY_PATH when using an executable linked to the static NAG library.

Note that the above LD_LIBRARY_PATH settings will load the native Intel compiler run-time libraries and MKL libraries that are supplied by NAG. It should also be possible to set LD_LIBRARY_PATH to point to the locations of the corresponding libraries in your Intel compiler and MKL installation, assuming that its location is also mounted on the coprocessor. Once this environment variable has been set it is possible to set the number of OpenMP threads to use (see below) and run the native executable.

The example scripts demonstrate all of the different ways to link to the NAG library discussed in this section.

Note that the example scripts use the compiler's default optimization level (i.e. no -On flag is supplied), with one exception: when built for native execution, the examples c05ndfe, c05pdae, c05pdfe, c05qdfe and c05rdfe are compiled at optimization level 1 (i.e. -O1) in order to avoid a compiler bug.

3.1.3. Setting the Number of Threads to use for Heterogeneous Executables

For heterogeneous executables, environment variables set on the host are by default copied across to the Xeon Phi coprocessor during an offload. This is typically not what is wanted if, for example, OMP_NUM_THREADS is being used to control the number of OpenMP threads, since the number required on the Xeon Phi coprocessor will likely be much greater than that needed on the host system. Users should therefore also set MIC_ENV_PREFIX=MIC to enable a separate environment variable, MIC_OMP_NUM_THREADS.

For heterogeneous executables use the following settings to control the number of threads:

In the C shell type:

  setenv MIC_ENV_PREFIX MIC
  setenv OMP_NUM_THREADS N1
  setenv MIC_OMP_NUM_THREADS N2
In the Bourne shell, type:
  MIC_ENV_PREFIX=MIC
  export MIC_ENV_PREFIX
  OMP_NUM_THREADS=N1
  export OMP_NUM_THREADS
  MIC_OMP_NUM_THREADS=N2
  export MIC_OMP_NUM_THREADS
where N1 is the number of threads required on the host and N2 is the number of threads required on the Xeon Phi coprocessor. OMP_NUM_THREADS and MIC_OMP_NUM_THREADS may be re-set between each execution of the program, as desired.
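
As a sketch, the three Bourne shell settings can be grouped in one function so that the host and coprocessor thread counts are always changed together. The function name set_offload_threads is hypothetical and is not supplied by NAG; the values 4 and 180 are simply those used in the nagsmp_example illustration in Section 3.4.

```shell
# set_offload_threads N1 N2
# N1 = OpenMP threads on the host, N2 = OpenMP threads on the coprocessor.
# Hypothetical helper, not part of the NAG distribution.
set_offload_threads() {
    for _n in "$1" "$2"; do
        case "$_n" in
            ''|*[!0-9]*)
                echo "usage: set_offload_threads N1 N2 (positive integers)" >&2
                return 1 ;;
        esac
    done
    MIC_ENV_PREFIX=MIC          # makes MIC_-prefixed variables apply on the coprocessor
    OMP_NUM_THREADS=$1
    MIC_OMP_NUM_THREADS=$2
    export MIC_ENV_PREFIX OMP_NUM_THREADS MIC_OMP_NUM_THREADS
}

set_offload_threads 4 180
```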

When setting the number of threads to use, users should be aware that newer Intel processors (Nehalem or later) support a facility known as Hyperthreading, which allows each physical core to support up to two threads at the same time and thus appear to the operating system as two logical cores. It may be beneficial to make use of this functionality, but this choice will depend on the particular algorithms and problem size(s) used. You are advised to benchmark performance-critical applications with and without Hyperthreading enabled to determine the best choice for you. It is possible to avoid Hyperthreading through affinity settings. If OMP_NUM_THREADS is set to the number of physical cores, then the following setting is advised for the host:

  KMP_AFFINITY="granularity=fine,scatter"
  export KMP_AFFINITY

The MIC architecture uses a similar approach where each physical core appears to the operating system as four logical cores. However, there are key differences compared to Hyperthreading which mean that making use of multiple threads in the same physical core is encouraged in most cases for the Xeon Phi coprocessor. Hyperthreading allows two threads to access different resources (functional units) within the same physical core, and thus if threads are trying to make use of the same resource (e.g. the floating point unit) there is no benefit. In contrast, in the MIC architecture threads share a physical core using context switching in order to hide the latency associated with in-order instructions. In general it is recommended that at least two threads are used per physical core. In order to assign threads to cores fairly and to fix their affinity, the following settings are recommended when fewer than the recommended maximum number of threads are being used:

  MIC_KMP_AFFINITY="granularity=fine,scatter"
  export MIC_KMP_AFFINITY
or:
  MIC_KMP_AFFINITY="granularity=fine,balanced"
  export MIC_KMP_AFFINITY
The difference between scatter and balanced is what happens when more threads are being used than there are physical cores. In the case of scatter neighbouring threads do not share the same core; threads are assigned to cores in a round-robin manner. For balanced, the threads allocated to the same core are neighbours of one another, which may help with cache utilization in some cases. Note that the balanced option is available on the MIC architecture only.

For both the host and MIC architectures, if the number of threads being used is equal to the recommended maximum then the following settings are advised:

  KMP_AFFINITY="granularity=fine,compact"
  export KMP_AFFINITY
  MIC_KMP_AFFINITY="granularity=fine,compact"
  export MIC_KMP_AFFINITY

The supplied Intel MKL libraries include additional environment variables to allow greater control of the threading within MKL. These are discussed at http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-intel-mkl-100-threading. Many NAG routines make calls to routines within MKL, thus the MKL environment variables may indirectly affect the operation of the NAG library as well. The default settings of the MKL environment variables should be suitable for most purposes, thus it is recommended that you do not explicitly set these variables. Please contact NAG for further advice if required.

3.1.4. Setting the Number of Threads to use for Native Executables

For native executables, after logging into the Xeon Phi coprocessor, set the number of threads required as follows:

  OMP_NUM_THREADS=N
  export OMP_NUM_THREADS

where N is the number of threads required.

Affinity should be set to the following when using the recommended maximum number of threads for native execution:

  KMP_AFFINITY="granularity=fine,compact"
  export KMP_AFFINITY

When using fewer than the recommended maximum number of threads, affinity should be set as follows:

  KMP_AFFINITY="granularity=fine,balanced"
  export KMP_AFFINITY
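
The choice between the two native affinity settings can be sketched as a small function. The name choose_native_affinity is hypothetical. The thread count 120 is taken from the native example in Section 3.4, and 244 assumes a 61-core coprocessor (see Section 3.1.5 for the recommended maximum). This note only covers thread counts at or below the recommended maximum; treating larger counts like the maximum is an assumption of this sketch.

```shell
# choose_native_affinity THREADS MAX_THREADS
# Print the KMP_AFFINITY value suggested in this section for native execution.
# Hypothetical helper, not part of the NAG distribution.
choose_native_affinity() {
    if [ "$1" -lt "$2" ]; then
        echo "granularity=fine,balanced"    # fewer than the recommended maximum
    else
        echo "granularity=fine,compact"     # at the recommended maximum
    fi
}

OMP_NUM_THREADS=120
KMP_AFFINITY=$(choose_native_affinity "$OMP_NUM_THREADS" 244)
export OMP_NUM_THREADS KMP_AFFINITY
```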

3.1.5. Recommended Maximum Number of Threads

The recommended maximum number of threads for the host processor is equal to the number of logical cores, which is usually twice the number of physical cores. For the Xeon Phi coprocessor the recommended maximum number of threads for heterogeneous executables is 4*(number of physical cores - 1). This is because the MIC architecture is designed with hardware to support up to 4 threads per core. The -1 is because one core is used by the Operating System on the Xeon Phi coprocessor to handle offloads. For native executables this restriction does not apply, so the number of threads should not be greater than 4*(number of physical cores). Note that, on both the host and Xeon Phi coprocessor, the best performance may be achieved using fewer than the recommended maximum number of threads. Users are encouraged to experiment with different numbers of threads.
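
The arithmetic above can be sketched in the Bourne shell as follows. The function name recommended_mic_threads and the mode keywords offload and native are hypothetical; the core count 61 is used only because it matches the coprocessor described in Section 3.3, and you should substitute the figure for your own hardware.

```shell
# recommended_mic_threads CORES MODE
# Print the recommended maximum number of threads on the coprocessor.
#   MODE=offload : heterogeneous executables, 4*(CORES - 1)
#   MODE=native  : native executables,        4*CORES
# Hypothetical helper, not part of the NAG distribution.
recommended_mic_threads() {
    case "$2" in
        offload) echo $(( 4 * ($1 - 1) )) ;;
        native)  echo $(( 4 * $1 )) ;;
        *) echo "mode must be 'offload' or 'native'" >&2; return 1 ;;
    esac
}

recommended_mic_threads 61 offload    # prints 240
recommended_mic_threads 61 native     # prints 244
```

These values are upper limits only; as noted above, fewer threads may give better performance.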

The number of physical cores in both the host and Xeon Phi coprocessor varies depending on the particular hardware being used. Please contact NAG if you would like help determining the specific details of your system.

3.1.6. Other System Settings

The performance of offload regions can sometimes be improved when arrays are allocated using 2MB pages rather than the default 4KB pages on the Xeon Phi coprocessor. The compiler offload run-time uses 2MB pages for heap variables whose size is greater than the value of the environment variable MIC_USE_2MB_BUFFERS. For example, if MIC_USE_2MB_BUFFERS=64K, arrays greater than 64KB in size will use 2MB pages. Users are advised to experiment with this setting.
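
One way to experiment with this setting is to run the same program with several threshold values. The following is a sketch only: the function name run_with_2mb_threshold is hypothetical, ./driver.exe stands for your own heterogeneous executable, and the trial values other than 64K are arbitrary.

```shell
# run_with_2mb_threshold SIZE COMMAND [ARGS...]
# Run COMMAND with MIC_USE_2MB_BUFFERS set to SIZE for that command only,
# leaving the calling shell's environment unchanged.
# Hypothetical helper, not part of the NAG distribution.
run_with_2mb_threshold() {
    _size=$1
    shift
    env MIC_USE_2MB_BUFFERS="$_size" "$@"
}

# Example use (commented out because ./driver.exe is a placeholder):
#   for size in 64K 256K 1024K; do
#       echo "threshold $size"
#       run_with_2mb_threshold "$size" ./driver.exe
#   done
```

Timing each run, for example with the time command, then shows whether a particular threshold helps for your problem sizes.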

Performance can also be improved by aligning arrays on 64 byte address boundaries. In Fortran this can be achieved using the -align array64byte compiler flag. It is recommended to always use this flag.

3.2. Interface Blocks

The NAG Library for the Xeon Phi Coprocessor interface blocks define the type and arguments of each user callable NAG Library for the Xeon Phi Coprocessor routine. These are not essential to calling the NAG Library for the Xeon Phi Coprocessor from Fortran programs. However, they are required if the supplied examples are used. Their purpose is to allow the Fortran compiler to check that NAG Library for the Xeon Phi Coprocessor routines are called correctly. The interface blocks enable the compiler to check that:

(a) subroutines are called as such;
(b) functions are declared with the right type;
(c) the correct number of arguments are passed; and
(d) all arguments match in type and structure.

The NAG Library for the Xeon Phi Coprocessor interface block files are organised by Library chapter. They are aggregated into one module named

  nag_library
The modules are supplied in pre-compiled form (.mod files) and they can be accessed by specifying the -Ipathname option on each compiler invocation, where pathname ([INSTALL_DIR]/nag_interface_blocks) is the path of the directory containing the compiled interface blocks.

The .mod module files were compiled with the compiler shown in Section 2.1 of the Installer's Note. Such module files are compiler-dependent, so if you wish to use the NAG example programs, or use the interface blocks in your own programs, when using a compiler that is incompatible with these modules, you will first need to create your own module files. See the Post Release Information page

http://www.nag.co.uk/doc/inun/fs23/lm6dcl/postrelease.html

where more information may be available, or contact NAG for further help.

3.3. Performance Parameters

The NAG Library for the Xeon Phi Coprocessor uses a module called nag_performance_parameters. The module is used to hold three types of variables specific to heterogeneous use of the NAG Library, as discussed in the document "Introduction to the NAG Library for the Xeon Phi Coprocessor". The three types of variables are:

Users are encouraged to USE nag_performance_parameters and experiment with routine-specific switches to find out whether or not offloading is profitable.

In this implementation the decision whether or not to offload is based on problem size thresholds determined by NAG using a 61-core Xeon Phi coprocessor code-named "Knights Corner" with 8GB memory and a clock speed of 1.1GHz, attached to a host system with a single socket 8-core Xeon processor code-named "Sandy Bridge" with 20MB cache and a clock speed of 3.3GHz.

An alternative set of problem size thresholds based on a similar machine with a dual socket containing the same 8-core Xeon processors is also provided. In order to enable the NAG Library for the Xeon Phi Coprocessor to make decisions based on these thresholds instead of the default single socket values users should link to the alternative performance_parameters module as follows:

  ifort -align array64byte -auto -openmp -I[INSTALL_DIR]/nag_interface_blocks driver.f90 \
      [INSTALL_DIR]/lib/intel64/libnagsmp.a \
      [INSTALL_DIR]/lib/intel64/libnag_performance_parameters_sb2.a -mkl

More information about nag_performance_parameters and the conditions for offloads to occur internally within NAG routine calls is given in "Introduction to the NAG Library for the Xeon Phi Coprocessor".

3.4. Example Programs

The example results distributed were generated at Mark 23, using the software described in Section 2.2 of the Installer's Note. These example results may not be exactly reproducible if the example programs are run in a slightly different environment (for example, a different Fortran compiler, a different compiler library, or a different set of Basic Linear Algebra Subprograms (BLAS) or Linear Algebra PACKage (LAPACK) routines). The results which are most sensitive to such differences are: eigenvectors (which may differ by a scalar multiple, often -1, but sometimes complex); numbers of iterations and function evaluations; and residuals and other "small" quantities of the same order as the machine precision.

Note that the example material has been adapted, if necessary, from that published in the Library Manual, so that programs are suitable for execution with this implementation with no further changes. The distributed example programs should be used in preference to the versions in the Library Manual wherever possible. The directory [INSTALL_DIR]/scripts contains a number of scripts which illustrate how to link the examples:

The scripts will provide you with a copy of an example program (and its data and options file, if any), compile the program and link it with the appropriate libraries (showing you the compile command so that you can recompile your own version of the program). Finally, the executable program will be run with appropriate arguments specifying data, options and results files as needed.

The example program concerned, the number of OpenMP threads to use on the host, and the number of OpenMP threads to use on the Xeon Phi coprocessor are specified by the arguments to the command, e.g.

nagsmp_example e04fcfe 4 180
will copy the example program and its data and options files (e04fcfe.f90, e04fcfe.d and e04fcfe.opt) into the current directory, compile the program and run it using 4 OpenMP threads on the host and 180 threads within any regions offloaded to the Xeon Phi coprocessor to produce the example program results in the file e04fcfe.r. The scripts for native execution additionally require the username and device name to use when running the example, e.g.
nagsmp_example_native e04fcfe 120 micusr mic0
will copy the example program and its data and options files (e04fcfe.f90, e04fcfe.d and e04fcfe.opt) into the current directory, compile the program and run it using 120 OpenMP threads on the device mic0 as user micusr to produce the example program results in the file e04fcfe.r.

3.5. Fortran Types and Interpretation of Bold Italicised Terms

The NAG Library and documentation use parameterized types for floating-point variables. Thus, the type

      REAL(KIND=nag_wp)
appears in documentation of all NAG Library for the Xeon Phi Coprocessor routines, where nag_wp is a Fortran KIND parameter. The value of nag_wp can be obtained by use of the nag_library module. We refer to the type nag_wp as the NAG Library "working precision" type, because most floating-point arguments and internal variables used in the library are of this type.

In addition, a small number of routines use the type

      REAL(KIND=nag_rp)
where nag_rp stands for "reduced precision type". Another type, not currently used in the library, is
      REAL(KIND=nag_hp)
for "higher precision type" or "additional precision type".

For correct use of these types, see almost any of the example programs distributed with the Library.

For this implementation, these types have the following meanings:

      REAL (kind=nag_rp)      means REAL (i.e. single precision)
      REAL (kind=nag_wp)      means DOUBLE PRECISION
      COMPLEX (kind=nag_rp)   means COMPLEX (i.e. single precision complex)
      COMPLEX (kind=nag_wp)   means double precision complex (e.g. COMPLEX*16)

In addition, the Manual has adopted a convention of using bold italics to distinguish some terms.

One important bold italicised term is machine precision, which denotes the relative precision to which DOUBLE PRECISION floating-point numbers are stored in the computer, e.g. in an implementation with approximately 16 decimal digits of precision, machine precision has a value of approximately 1.0D-16.

The precise value of machine precision is given by the routine X02AJF. Other routines in Chapter X02 return the values of other implementation-dependent constants, such as the overflow threshold, or the largest representable integer. Refer to the X02 Chapter Introduction for more details.

The bold italicised term block size is used only in Chapters F07 and F08. It denotes the block size used by block algorithms in these chapters. You only need to be aware of its value when it affects the amount of workspace to be supplied – see the parameters WORK and LWORK of the relevant routine documents and the Chapter Introduction.

3.6. Explicit Output from NAG Routines

Certain routines produce explicit error messages and advisory messages via output units which have default values that can be reset by using X04AAF for error messages and X04ABF for advisory messages. (The default values are given in Section 4.) These routines are potentially not thread safe and in general output is not recommended in a multithreaded environment.

4. Routine-specific Information

Any further information which applies to one or more routines in this implementation is listed below, chapter by chapter.
  1. C06

    In this implementation calls to the Intel Discrete Fourier Transforms Interface (DFTI) routines, from the supplied MKL library, are made whenever possible in the following NAG routines:
     C06PAF  C06PCF  C06PFF  C06PJF  C06PKF  C06PPF  C06PQF  C06PRF
     C06PSF  C06PUF  C06PXF  C06RAF  C06RBF  C06RCF  C06RDF
    
    The Intel DFTI routines allocate their own workspace internally, so no changes are needed to the size of workspace array WORK passed to the NAG C06 routines listed above from that specified in their respective library documents.
  2. F06, F07, F08 and F16

    In Chapters F06, F07, F08 and F16, alternate routine names are available for BLAS and LAPACK derived routines. For details of the alternate routine names please refer to the relevant Chapter Introduction. Note that applications should reference routines by their BLAS/LAPACK names, rather than their NAG-style names, for optimum performance.

    Many LAPACK routines have a "workspace query" mechanism which allows a caller to interrogate the routine to determine how much workspace to supply. Note that LAPACK routines from the MKL library may require a different amount of workspace from the equivalent NAG versions of these routines. Care should be taken when using the workspace query mechanism.

    In this implementation calls to BLAS and LAPACK routines are implemented by calls to MKL, except for the following routines:

    BLAS_DMAX_VAL    BLAS_DMIN_VAL
    DBDSDC    DCOPY     DGEES     DGEESX    DGEEV     DGEEVX    DGEQP3    DGERFS
    DGGES     DGGESX    DGGEV     DGGEVX    DHGEQZ    DHSEQR    DPOSVX    DSBTRD
    DSGESV    DSPOSV    DSTEVR    DSYSVX    ZCGESV    ZCPOSV    ZGEES     ZGEESX
    ZGEEV     ZGEEVX    ZGGES     ZGGESX    ZGGEVX    ZHESVX    ZHSEQR    ZPOSVX
    ZSYSVX
    

    The following NAG named routines are wrappers to call LAPACK routines from the vendor library:
    F01VAF/DTRTTP    F01VBF/ZTRTTP    F01VCF/DTPTTR    F01VDF/ZTPTTR
    F01VEF/DTRTTF    F01VFF/ZTRTTF    F01VGF/DTFTTR    F01VHF/ZTFTTR
    F01VJF/DTPTTF    F01VKF/ZTPTTF    F01VLF/DTFTTP    F01VMF/ZTFTTP
    F06AAF/DROTG     F06ABF/DROTMG    F06EAF/DDOT      F06ECF/DAXPY
    F06EDF/DSCAL     F06EGF/DSWAP     F06EJF/DNRM2     F06EKF/DASUM
    F06EPF/DROT      F06EQF/DROTM     F06ERF/DDOTI     F06ETF/DAXPYI
    F06EUF/DGTHR     F06EVF/DGTHRZ    F06EWF/DSCTR     F06EXF/DROTI
    F06GAF/ZDOTU     F06GBF/ZDOTC     F06GCF/ZAXPY     F06GDF/ZSCAL
    F06GFF/ZCOPY     F06GGF/ZSWAP     F06GRF/ZDOTUI    F06GSF/ZDOTCI
    F06GTF/ZAXPYI    F06GUF/ZGTHR     F06GVF/ZGTHRZ    F06GWF/ZSCTR
    F06HMF/ZROT      F06JDF/ZDSCAL    F06JJF/DZNRM2    F06JKF/DZASUM
    F06JLF/IDAMAX    F06JMF/IZAMAX    F06PAF/DGEMV     F06PBF/DGBMV
    F06PCF/DSYMV     F06PDF/DSBMV     F06PEF/DSPMV     F06PFF/DTRMV
    F06PGF/DTBMV     F06PHF/DTPMV     F06PJF/DTRSV     F06PKF/DTBSV
    F06PLF/DTPSV     F06PMF/DGER      F06PPF/DSYR      F06PQF/DSPR
    F06PRF/DSYR2     F06PSF/DSPR2     F06SAF/ZGEMV     F06SBF/ZGBMV
    F06SCF/ZHEMV     F06SDF/ZHBMV     F06SEF/ZHPMV     F06SFF/ZTRMV
    F06SGF/ZTBMV     F06SHF/ZTPMV     F06SJF/ZTRSV     F06SKF/ZTBSV
    F06SLF/ZTPSV     F06SMF/ZGERU     F06SNF/ZGERC     F06SPF/ZHER
    F06SQF/ZHPR      F06SRF/ZHER2     F06SSF/ZHPR2     F06WAF/DLANSF
    F06WBF/DTFSM     F06WCF/DSFRK     F06WNF/ZLANHF    F06WPF/ZTFSM
    F06WQF/ZHFRK     F06YAF/DGEMM     F06YCF/DSYMM     F06YFF/DTRMM
    F06YJF/DTRSM     F06YPF/DSYRK     F06YRF/DSYR2K    F06ZAF/ZGEMM
    F06ZCF/ZHEMM     F06ZFF/ZTRMM     F06ZJF/ZTRSM     F06ZPF/ZHERK
    F06ZRF/ZHER2K    F06ZTF/ZSYMM     F06ZUF/ZSYRK     F06ZWF/ZSYR2K
    F07AAF/DGESV     F07ABF/DGESVX    F07ADF/DGETRF    F07AEF/DGETRS
    F07AFF/DGEEQU    F07AGF/DGECON    F07AJF/DGETRI    F07ANF/ZGESV
    F07APF/ZGESVX    F07ARF/ZGETRF    F07ASF/ZGETRS    F07ATF/ZGEEQU
    F07AUF/ZGECON    F07AVF/ZGERFS    F07AWF/ZGETRI    F07BAF/DGBSV
    F07BBF/DGBSVX    F07BDF/DGBTRF    F07BEF/DGBTRS    F07BFF/DGBEQU
    F07BGF/DGBCON    F07BHF/DGBRFS    F07BNF/ZGBSV     F07BPF/ZGBSVX
    F07BRF/ZGBTRF    F07BSF/ZGBTRS    F07BTF/ZGBEQU    F07BUF/ZGBCON
    F07BVF/ZGBRFS    F07CAF/DGTSV     F07CBF/DGTSVX    F07CDF/DGTTRF
    F07CEF/DGTTRS    F07CGF/DGTCON    F07CHF/DGTRFS    F07CNF/ZGTSV
    F07CPF/ZGTSVX    F07CRF/ZGTTRF    F07CSF/ZGTTRS    F07CUF/ZGTCON
    F07CVF/ZGTRFS    F07FAF/DPOSV     F07FDF/DPOTRF    F07FEF/DPOTRS
    F07FFF/DPOEQU    F07FGF/DPOCON    F07FHF/DPORFS    F07FJF/DPOTRI
    F07FNF/ZPOSV     F07FRF/ZPOTRF    F07FSF/ZPOTRS    F07FTF/ZPOEQU
    F07FUF/ZPOCON    F07FVF/ZPORFS    F07FWF/ZPOTRI    F07GAF/DPPSV
    F07GBF/DPPSVX    F07GDF/DPPTRF    F07GEF/DPPTRS    F07GFF/DPPEQU
    F07GGF/DPPCON    F07GHF/DPPRFS    F07GJF/DPPTRI    F07GNF/ZPPSV
    F07GPF/ZPPSVX    F07GRF/ZPPTRF    F07GSF/ZPPTRS    F07GTF/ZPPEQU
    F07GUF/ZPPCON    F07GVF/ZPPRFS    F07GWF/ZPPTRI    F07HAF/DPBSV
    F07HBF/DPBSVX    F07HDF/DPBTRF    F07HEF/DPBTRS    F07HFF/DPBEQU
    F07HGF/DPBCON    F07HHF/DPBRFS    F07HNF/ZPBSV     F07HPF/ZPBSVX
    F07HRF/ZPBTRF    F07HSF/ZPBTRS    F07HTF/ZPBEQU    F07HUF/ZPBCON
    F07HVF/ZPBRFS    F07JAF/DPTSV     F07JBF/DPTSVX    F07JDF/DPTTRF
    F07JEF/DPTTRS    F07JGF/DPTCON    F07JHF/DPTRFS    F07JNF/ZPTSV
    F07JPF/ZPTSVX    F07JRF/ZPTTRF    F07JSF/ZPTTRS    F07JUF/ZPTCON
    F07JVF/ZPTRFS    F07KDF/DPSTRF    F07KRF/ZPSTRF    F07MAF/DSYSV
    F07MDF/DSYTRF    F07MEF/DSYTRS    F07MGF/DSYCON    F07MHF/DSYRFS
    F07MJF/DSYTRI    F07MNF/ZHESV     F07MRF/ZHETRF    F07MSF/ZHETRS
    F07MUF/ZHECON    F07MVF/ZHERFS    F07MWF/ZHETRI    F07NNF/ZSYSV
    F07NRF/ZSYTRF    F07NSF/ZSYTRS    F07NUF/ZSYCON    F07NVF/ZSYRFS
    F07NWF/ZSYTRI    F07PAF/DSPSV     F07PBF/DSPSVX    F07PDF/DSPTRF
    F07PEF/DSPTRS    F07PGF/DSPCON    F07PHF/DSPRFS    F07PJF/DSPTRI
    F07PNF/ZHPSV     F07PPF/ZHPSVX    F07PRF/ZHPTRF    F07PSF/ZHPTRS
    F07PUF/ZHPCON    F07PVF/ZHPRFS    F07PWF/ZHPTRI    F07QNF/ZSPSV
    F07QPF/ZSPSVX    F07QRF/ZSPTRF    F07QSF/ZSPTRS    F07QUF/ZSPCON
    F07QVF/ZSPRFS    F07QWF/ZSPTRI    F07TEF/DTRTRS    F07TGF/DTRCON
    F07THF/DTRRFS    F07TJF/DTRTRI    F07TSF/ZTRTRS    F07TUF/ZTRCON
    F07TVF/ZTRRFS    F07TWF/ZTRTRI    F07UEF/DTPTRS    F07UGF/DTPCON
    F07UHF/DTPRFS    F07UJF/DTPTRI    F07USF/ZTPTRS    F07UUF/ZTPCON
    F07UVF/ZTPRFS    F07UWF/ZTPTRI    F07VEF/DTBTRS    F07VGF/DTBCON
    F07VHF/DTBRFS    F07VSF/ZTBTRS    F07VUF/ZTBCON    F07VVF/ZTBRFS
    F07WDF/DPFTRF    F07WEF/DPFTRS    F07WJF/DPFTRI    F07WKF/DTFTRI
    F07WRF/ZPFTRF    F07WSF/ZPFTRS    F07WWF/ZPFTRI    F07WXF/ZTFTRI
    F08AAF/DGELS     F08AEF/DGEQRF    F08AFF/DORGQR    F08AGF/DORMQR
    F08AHF/DGELQF    F08AJF/DORGLQ    F08AKF/DORMLQ    F08ANF/ZGELS
    F08ASF/ZGEQRF    F08ATF/ZUNGQR    F08AUF/ZUNMQR    F08AVF/ZGELQF
    F08AWF/ZUNGLQ    F08AXF/ZUNMLQ    F08BAF/DGELSY    F08BEF/DGEQPF
    F08BHF/DTZRZF    F08BKF/DORMRZ    F08BNF/ZGELSY    F08BSF/ZGEQPF
    F08BTF/ZGEQP3    F08BVF/ZTZRZF    F08BXF/ZUNMRZ    F08CEF/DGEQLF
    F08CFF/DORGQL    F08CGF/DORMQL    F08CHF/DGERQF    F08CJF/DORGRQ
    F08CKF/DORMRQ    F08CSF/ZGEQLF    F08CTF/ZUNGQL    F08CUF/ZUNMQL
    F08CVF/ZGERQF    F08CWF/ZUNGRQ    F08CXF/ZUNMRQ    F08FAF/DSYEV
    F08FBF/DSYEVX    F08FCF/DSYEVD    F08FDF/DSYEVR    F08FEF/DSYTRD
    F08FFF/DORGTR    F08FGF/DORMTR    F08FLF/DDISNA    F08FNF/ZHEEV
    F08FPF/ZHEEVX    F08FQF/ZHEEVD    F08FRF/ZHEEVR    F08FSF/ZHETRD
    F08FTF/ZUNGTR    F08FUF/ZUNMTR    F08GAF/DSPEV     F08GBF/DSPEVX
    F08GCF/DSPEVD    F08GEF/DSPTRD    F08GFF/DOPGTR    F08GGF/DOPMTR
    F08GNF/ZHPEV     F08GPF/ZHPEVX    F08GQF/ZHPEVD    F08GSF/ZHPTRD
    F08GTF/ZUPGTR    F08GUF/ZUPMTR    F08HAF/DSBEV     F08HBF/DSBEVX
    F08HCF/DSBEVD    F08HNF/ZHBEV     F08HPF/ZHBEVX    F08HQF/ZHBEVD
    F08HSF/ZHBTRD    F08JAF/DSTEV     F08JBF/DSTEVX    F08JCF/DSTEVD
    F08JEF/DSTEQR    F08JFF/DSTERF    F08JGF/DPTEQR    F08JHF/DSTEDC
    F08JJF/DSTEBZ    F08JKF/DSTEIN    F08JLF/DSTEGR    F08JSF/ZSTEQR
    F08JUF/ZPTEQR    F08JVF/ZSTEDC    F08JXF/ZSTEIN    F08JYF/ZSTEGR
    F08KAF/DGELSS    F08KBF/DGESVD    F08KCF/DGELSD    F08KDF/DGESDD
    F08KEF/DGEBRD    F08KFF/DORGBR    F08KGF/DORMBR    F08KHF/DGEJSV
    F08KJF/DGESVJ    F08KNF/ZGELSS    F08KPF/ZGESVD    F08KQF/ZGELSD
    F08KRF/ZGESDD    F08KSF/ZGEBRD    F08KTF/ZUNGBR    F08KUF/ZUNMBR
    F08LEF/DGBBRD    F08LSF/ZGBBRD    F08MEF/DBDSQR    F08MSF/ZBDSQR
    F08NEF/DGEHRD    F08NFF/DORGHR    F08NGF/DORMHR    F08NHF/DGEBAL
    F08NJF/DGEBAK    F08NSF/ZGEHRD    F08NTF/ZUNGHR    F08NUF/ZUNMHR
    F08NVF/ZGEBAL    F08NWF/ZGEBAK    F08PKF/DHSEIN    F08PXF/ZHSEIN
    F08QFF/DTREXC    F08QGF/DTRSEN    F08QHF/DTRSYL    F08QKF/DTREVC
    F08QLF/DTRSNA    F08QTF/ZTREXC    F08QUF/ZTRSEN    F08QVF/ZTRSYL
    F08QXF/ZTREVC    F08QYF/ZTRSNA    F08SAF/DSYGV     F08SBF/DSYGVX
    F08SCF/DSYGVD    F08SEF/DSYGST    F08SNF/ZHEGV     F08SPF/ZHEGVX
    F08SQF/ZHEGVD    F08SSF/ZHEGST    F08TAF/DSPGV     F08TBF/DSPGVX
    F08TCF/DSPGVD    F08TEF/DSPGST    F08TNF/ZHPGV     F08TPF/ZHPGVX
    F08TQF/ZHPGVD    F08TSF/ZHPGST    F08UAF/DSBGV     F08UBF/DSBGVX
    F08UCF/DSBGVD    F08UEF/DSBGST    F08UFF/DPBSTF    F08UNF/ZHBGV
    F08UPF/ZHBGVX    F08UQF/ZHBGVD    F08USF/ZHBGST    F08UTF/ZPBSTF
    F08VAF/DGGSVD    F08VEF/DGGSVP    F08VNF/ZGGSVD    F08VSF/ZGGSVP
    F08WEF/DGGHRD    F08WHF/DGGBAL    F08WJF/DGGBAK    F08WNF/ZGGEV
    F08WSF/ZGGHRD    F08WVF/ZGGBAL    F08WWF/ZGGBAK    F08XSF/ZHGEQZ
    F08YEF/DTGSJA    F08YFF/DTGEXC    F08YGF/DTGSEN    F08YHF/DTGSYL
    F08YKF/DTGEVC    F08YLF/DTGSNA    F08YSF/ZTGSJA    F08YTF/ZTGEXC
    F08YUF/ZTGSEN    F08YVF/ZTGSYL    F08YXF/ZTGEVC    F08YYF/ZTGSNA
    F08ZAF/DGGLSE    F08ZBF/DGGGLM    F08ZEF/DGGQRF    F08ZFF/DGGRQF
    F08ZNF/ZGGLSE    F08ZPF/ZGGGLM    F08ZSF/ZGGQRF    F08ZTF/ZGGRQF
    
  3. G02

    The value of ACC, the machine-dependent constant mentioned in several documents in the chapter, is 1.0D-13.

  4. P01

    On hard failure, P01ABF writes the error message to the error message unit specified by X04AAF and then stops.

  5. S07 - S21

    Functions in these Chapters will give error messages if called with illegal or unsafe arguments.

    The constants referred to in the Library Manual have the following values in this implementation, on both the Intel64 and MIC architectures:

    S07AAF  F_1 = 1.0E+13
            F_2 = 1.0E-14
    
    S10AAF  E_1 = 1.8715E+1
    S10ABF  E_1 = 7.080E+2
    S10ACF  E_1 = 7.080E+2
    
    S13AAF  x_hi = 7.083E+2
    S13ACF  x_hi = 1.0E+16
    S13ADF  x_hi = 1.0E+17
    
    S14AAF  IFAIL = 1 if X > 1.70E+2
            IFAIL = 2 if X < -1.70E+2
            IFAIL = 3 if abs(X) < 2.23E-308
    S14ABF  IFAIL = 2 if X > x_big = 2.55E+305
    
    S15ADF  x_hi = 2.65E+1
    S15AEF  x_hi = 2.65E+1
    S15AFF  underflow trap was necessary
    S15AGF  IFAIL = 1 if X >= 2.53E+307
            IFAIL = 2 if 4.74E+7 <= X < 2.53E+307
            IFAIL = 3 if X < -2.66E+1
    
    S17ACF  IFAIL = 1 if X > 1.0E+16
    S17ADF  IFAIL = 1 if X > 1.0E+16
            IFAIL = 3 if 0 < X <= 2.23E-308
    S17AEF  IFAIL = 1 if abs(X) > 1.0E+16
    S17AFF  IFAIL = 1 if abs(X) > 1.0E+16
    S17AGF  IFAIL = 1 if X > 1.038E+2
            IFAIL = 2 if X < -5.7E+10
    S17AHF  IFAIL = 1 if X > 1.041E+2
            IFAIL = 2 if X < -5.7E+10
    S17AJF  IFAIL = 1 if X > 1.041E+2
            IFAIL = 2 if X < -1.9E+9
    S17AKF  IFAIL = 1 if X > 1.041E+2
            IFAIL = 2 if X < -1.9E+9
    S17DCF  IFAIL = 2 if abs(Z) < 3.92223E-305
            IFAIL = 4 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 5 if abs(Z) or FNU+N-1 > 1.07374E+9
    S17DEF  IFAIL = 2 if Im(Z) > 7.00921E+2
            IFAIL = 3 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 4 if abs(Z) or FNU+N-1 > 1.07374E+9
    S17DGF  IFAIL = 3 if abs(Z) > 1.02399E+3
            IFAIL = 4 if abs(Z) > 1.04857E+6
    S17DHF  IFAIL = 3 if abs(Z) > 1.02399E+3
            IFAIL = 4 if abs(Z) > 1.04857E+6
    S17DLF  IFAIL = 2 if abs(Z) < 3.92223E-305
            IFAIL = 4 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 5 if abs(Z) or FNU+N-1 > 1.07374E+9
    
    S18ADF  IFAIL = 2 if 0 < X <= 2.23E-308
    S18AEF  IFAIL = 1 if abs(X) > 7.116E+2
    S18AFF  IFAIL = 1 if abs(X) > 7.116E+2
    S18DCF  IFAIL = 2 if abs(Z) < 3.92223E-305
            IFAIL = 4 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 5 if abs(Z) or FNU+N-1 > 1.07374E+9
    S18DEF  IFAIL = 2 if Re(Z) > 7.00921E+2
            IFAIL = 3 if abs(Z) or FNU+N-1 > 3.27679E+4
            IFAIL = 4 if abs(Z) or FNU+N-1 > 1.07374E+9
    
    S19AAF  IFAIL = 1 if abs(X) >= 5.04818E+1
    S19ABF  IFAIL = 1 if abs(X) >= 5.04818E+1
    S19ACF  IFAIL = 1 if X > 9.9726E+2
    S19ADF  IFAIL = 1 if X > 9.9726E+2
    
    S21BCF  IFAIL = 3 if an argument < 1.583E-205
            IFAIL = 4 if an argument >= 3.765E+202
    S21BDF  IFAIL = 3 if an argument < 2.813E-103
            IFAIL = 4 if an argument >= 1.407E+102
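    As an illustration of how the table above can be read, the following minimal sketch encodes only the three S14AAF rows as a pre-call check. It is written in Python, does not call the NAG Library, and the helper name s14aaf_expected_ifail is hypothetical. The thresholds are the rounded values printed in this note, and the order in which the rows are tested is an assumption; other IFAIL exits described in the Routine Document are not covered.

```python
def s14aaf_expected_ifail(x):
    """Hypothetical helper: IFAIL predicted by the S14AAF rows of this note.

    Returns 0 when none of the three listed rows applies. The thresholds
    are the rounded values printed in the table, not the exact internal
    limits used by the routine.
    """
    if x > 1.70e2:
        return 1  # row: IFAIL = 1 if X > 1.70E+2
    if x < -1.70e2:
        return 2  # row: IFAIL = 2 if X < -1.70E+2
    if abs(x) < 2.23e-308:
        return 3  # row: IFAIL = 3 if abs(X) < 2.23E-308
    return 0


for x in (5.0, 171.0, -171.0, 0.0):
    print(x, s14aaf_expected_ifail(x))
```

    A check of this kind can be used to screen arguments before the call; the IFAIL value returned by the routine itself remains the authoritative result.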
    
  6. X01

    The values of the mathematical constants, on both the Intel64 and MIC architectures, are:

    X01AAF (pi) = 3.1415926535897932
    X01ABF (gamma) = 0.5772156649015328
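    The two values above can be sanity-checked without the Library. The sketch below is standalone Python (an assumption of this example, not part of the implementation): it compares the X01AAF value with math.pi and estimates Euler's constant from the harmonic sum with the first terms of its asymptotic expansion. The function name euler_gamma_estimate and the choice n = 1000 are illustrative.

```python
import math

X01AAF = 3.1415926535897932   # pi, as printed in this note
X01ABF = 0.5772156649015328   # Euler's constant gamma, as printed in this note

# The printed value of pi rounds to the same double as math.pi.
pi_matches = (X01AAF == math.pi)


def euler_gamma_estimate(n=1000):
    """Estimate gamma from H_n - ln(n), corrected with the expansion
    H_n = ln(n) + gamma + 1/(2n) - 1/(12n^2) + 1/(120n^4) - ..."""
    h_n = math.fsum(1.0 / k for k in range(1, n + 1))
    return (h_n - math.log(n)
            - 1.0 / (2 * n)
            + 1.0 / (12 * n**2)
            - 1.0 / (120 * n**4))


gamma_error = abs(euler_gamma_estimate() - X01ABF)
print("pi matches math.pi:", pi_matches)
print("gamma estimate within 1e-13:", gamma_error < 1e-13)
```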
    
  7. X02

    The values of the machine constants, on both the Intel64 and MIC architectures, are:

    The basic parameters of the model

    X02BHF   = 2
    X02BJF   = 53
    X02BKF   = -1021
    X02BLF   = 1024
    X02DJF   = .TRUE.
    

    Derived parameters of the floating-point arithmetic

    X02AJF   = 1.11022302462516E-16
    X02AKF   = 2.22507385850721E-308
    X02ALF   = 1.79769313486231E+308
    X02AMF   = 2.22507385850721E-308
    X02ANF   = 4.45014771701441E-308

    Parameters of other aspects of the computing environment

    X02AHF   = 1.42724769270596E+45
    X02BBF   = 2147483647
    X02BEF   = 15
    X02DAF   = .TRUE.
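    The X02 values listed above are consistent with IEEE 754 double precision arithmetic. The following sketch, in standalone Python and without calling the NAG Library, compares each printed value with a quantity computed locally from sys.float_info. The pairing of each X02 routine with a particular IEEE quantity (for example X02AJF with 2**-53, or X02AHF with 2**150) is an assumption of this sketch based on the numerical values alone; the definitions in the X02 Chapter Introduction take precedence.

```python
import math
import sys

fi = sys.float_info  # IEEE 754 binary64 parameters of the Python float type

# name: (value printed in this note, locally computed counterpart).
# The second entry of each pair is an assumption of this sketch.
checks = {
    "X02BHF": (2, fi.radix),
    "X02BJF": (53, fi.mant_dig),
    "X02BKF": (-1021, fi.min_exp),
    "X02BLF": (1024, fi.max_exp),
    "X02AJF": (1.11022302462516e-16, fi.epsilon / 2),  # 2**-53
    "X02AKF": (2.22507385850721e-308, fi.min),         # smallest normal
    "X02ALF": (1.79769313486231e+308, fi.max),         # largest finite
    "X02AMF": (2.22507385850721e-308, fi.min),
    "X02ANF": (4.45014771701441e-308, 2 * fi.min),
    "X02AHF": (1.42724769270596e+45, 2.0 ** 150),
    "X02BBF": (2147483647, 2**31 - 1),                 # largest 32-bit integer
    "X02BEF": (15, fi.dig),
}


def agrees(documented, computed):
    """Exact match for integers; agreement to about 13 digits for reals,
    since the note prints reals to 15 significant figures."""
    if isinstance(documented, int):
        return documented == computed
    return math.isclose(documented, computed, rel_tol=1e-13)


mismatches = [name for name, (doc, calc) in checks.items()
              if not agrees(doc, calc)]
print("mismatches:", mismatches)
```

    On a system whose double precision type follows IEEE 754 binary64, the list of mismatches is expected to be empty.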
  8. X04

    The default output units for error and advisory messages for those routines which can produce explicit output are both Fortran Unit 6.

5. Documentation

The Library Manual is available as part of the installation or via download from the NAG website. The most up-to-date version of the documentation is accessible via the NAG website at http://www.nag.co.uk/numeric/FL/xeon_phi_documentation.

The Library Manual is supplied in the following format:

The following main index files have been provided for this format:

	???/xhtml/FRONTMATTER/manconts.xml

In addition the following are provided:

Please see the Intel web site for further information about MKL (http://www.intel.com/software/products/mkl).

6. Support from NAG

(a) Contact with NAG

Queries concerning this document or the implementation generally should be directed to NAG at one of the addresses given in the Appendix. Users subscribing to the support service are encouraged to contact one of the NAG Response Centres (see below).

(b) NAG Response Centres

The NAG Response Centres are available for general enquiries from all users and also for technical queries from sites with an annually licensed product or support service.

The Response Centres are open during office hours, but contact is possible by fax, email and phone (answering machine) at all times.

When contacting a Response Centre, it helps us deal with your enquiry quickly if you can quote your NAG site reference or account number and NAG product code (in this case FSLM623DCL).

(c) NAG Websites

The NAG websites provide information about implementation availability, descriptions of products, downloadable software, product documentation and technical reports. The NAG websites can be accessed at the following URLs:

http://www.nag.co.uk/, http://www.nag.com/, http://www.nag-j.co.jp/ or http://www.nag-gc.com/

(d) NAG Electronic Newsletter

If you would like to be kept up to date with news from NAG then please register to receive our free electronic newsletter, which will alert you to announcements about new products or product/service enhancements, technical tips, customer stories and NAG's event diary. You can register via one of our websites, or by contacting us at nagnews@nag.co.uk.

(e) Product Registration

To ensure that you receive information on updates and other relevant announcements, please register this product with us. For NAG Library products this may be accomplished by filling in the online registration form at http://www.nag.co.uk/numeric/Library_Registration.asp.

7. User Feedback

Many factors influence the way that NAG's products and services evolve, and your ideas are invaluable in helping us to ensure that we meet your needs. If you would like to contribute to this process, we would be delighted to receive your comments. Please contact any of the NAG Response Centres (shown below).

Appendix - Contact Addresses

NAG Ltd
Wilkinson House
Jordan Hill Road
OXFORD  OX2 8DR                         NAG Ltd Response Centre
United Kingdom                          email: support@nag.co.uk

Tel: +44 (0)1865 511245                 Tel: +44 (0)1865 311744
Fax: +44 (0)1865 310139                 Fax: +44 (0)1865 310139

NAG Inc
801 Warrenville Road
Suite 185
Lisle, IL  60532-4332                   NAG Inc Response Center
USA                                     email: support@nag.com

Tel: +1 630 971 2337                    Tel: +1 630 971 2337
Fax: +1 630 971 2706                    Fax: +1 630 971 2706

Nihon NAG KK
Hatchobori Frontier Building 2F
4-9-9
Hatchobori
Chuo-ku
Tokyo 104-0032                          Nihon NAG Response Centre
Japan                                   email: support@nag-j.co.jp

Tel: +81 3 5542 6311                    Tel: +81 3 5542 6311
Fax: +81 3 5542 6312                    Fax: +81 3 5542 6312

NAG Taiwan Branch Office
5F.-5, No.36, Sec.3
Minsheng E. Rd.
Taipei City 10480                       NAG Taiwan Response Centre
Taiwan                                  email: support@nag-gc.com

Tel: +886 2 25093288                    Tel: +886 2 25093288
Fax: +886 2 25091798                    Fax: +886 2 25091798