Parallelization by means of K200 HLO
HERA-B Bologna
Computing System
Parallelization by means of K200 High-Level Optimizer (HLO)
The new generation of HP9000 compilers (C, C++ and FORTRAN), introduced
together with the HP-UX 10.0 release, is characterized by a common back end
that consists of the High-Level Optimizer (HLO), code generator, optimizer
and linker.
HLO targets large-scale, high-semantic-level analyses and transformations
that are more efficiently done before the Back End converts the program
representation to a lower-level, more machine-specific form.
The HLO is the strategic foundation for FORTRAN/9000, C/ANSI C and
C++ runtime performance. HP continues to place significant emphasis on
improving runtime performance through the HLO, and with this release has
reached a performance level sufficient to replace the old FORTRAN
optimizing preprocessor (FTNOPP).
Furthermore, the HLO supports automatic loop parallelization on HP
Symmetric MultiProcessor (SMP) machines, such as the K-series servers.
Optimizations performed by HLO
- Inlining, i.e. substitution of a call site by the called routine's
code.
- Cloning. A call site is replaced by a call to a specialized
version of the called routine.
- Interprocedural Reference Analysis. HLO can determine that a
variable may never have its address exposed, or is never read or written.
This information is passed to the Back End.
- Loop Parallelization. HLO will automatically transform a loop to
execute in parallel on a multiprocessor machine, if it can determine that
the transformation is legal and profitable.
- Cache-Enhancing Transformations. Interchange loops in a nest to
bring the stride-one reference innermost, thus exploiting the spatial
locality of long cache lines and minimizing cache conflicts in
direct-mapped caches.
- Vectorization, i.e. the replacement of eligible loops by calls to
the vector library. HLO can recognize certain linear algebra kernel loops and
convert them to calls to the BLAS (Basic Linear Algebra Subprograms) library.
- Unconsumed Expression Elimination. Instructions that do not
contribute to the computation, because their results are not consumed, can be
deleted. These instructions may arise from other optimizations, such as
inlining.
- Constant Propagation and Unreachable Code Elimination. An
instruction that computes a compile-time known result can be replaced by that
constant, and code that can consequently never be executed can be deleted.
- Code Sinking. HLO detects instructions that are only consumed by one
branch of a following IF-statement, and moves those instructions into
that branch, if it is safe to do so.
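The cache-enhancing transformation listed above can be pictured with a small hand-written C sketch. It is not HLO output: the function names and the constant DIM are illustrative assumptions. C stores arrays row by row, so the stride-one reference is the last subscript (in FORTRAN it is the first). Both routines compute the same result; only the order in which memory is touched differs.

```c
#include <stddef.h>

#define DIM 64   /* illustrative matrix size, not taken from the text */

/* Before interchange: the inner loop runs over i, so successive
   accesses to src and dst are DIM doubles apart in memory. */
void scale_column_order(double dst[DIM][DIM], double src[DIM][DIM], double k)
{
    for (size_t j = 0; j < DIM; j++)
        for (size_t i = 0; i < DIM; i++)
            dst[i][j] = k * src[i][j];
}

/* After interchange: the inner loop runs over j, the stride-one
   subscript, so successive accesses fall in the same cache line. */
void scale_row_order(double dst[DIM][DIM], double src[DIM][DIM], double k)
{
    for (size_t i = 0; i < DIM; i++)
        for (size_t j = 0; j < DIM; j++)
            dst[i][j] = k * src[i][j];
}
```

The interchange is legal here because every iteration writes a different element and reads nothing written by another iteration, so the order of the iterations does not affect the result.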
Compiler Options and Directives
Parallelization may be controlled either by compiler and linker options:
+O[no]parallel
+O[no]parallel_env
+O[no]vectorize
or by compiler directives inserted in the source code. In FORTRAN, for example:
- Convex style:
C$DIR FORCE_PARALLEL
C$DIR NO_PARALLEL
C$DIR NO_VECTOR
C$DIR PREFER_PARALLEL
C$DIR PREFER_VECTOR
- Cray style:
CDIR$ IVDEP
CDIR$ NO SIDE EFFECTS
CDIR$ [NO]CONCUR
CDIR$ [NO]VECTOR
CFPP$ NODEPCHK
- KAP style:
C*$* [NO]CONCURRENTIZE
C*$* [NO]VECTORIZE
- VAST style:
CVD$ [NO]CONCUR
CVD$ NODEPCHK
CVD$ [NO]VECTOR
Compiling for Parallel Execution
The following command lines compile (without linking) three source files: x.f,
y.f, and z.f. The files x.f and y.f are compiled for parallel execution. The
file z.f is compiled for serial execution, even though its object file will be
linked with x.o and y.o.
f77 +O4 +Oparallel -c x.f y.f
f77 +O4 +Oparallel_env -c z.f
The following command line links the three object files, producing the
executable file para_prog:
f77 +O4 +Oparallel -o para_prog x.o y.o z.o
As this command line implies, if you compile and link separately, you must
link with f77, not ld. The link command line must also include the +Oparallel
option in order to link in the right startup files and runtime support.
N.B.: Parallelization does not require message-passing
instructions in the source code, unlike on MPP (Massively Parallel Processing)
computers or in the PVM (Parallel Virtual Machine) scheme.
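As an illustration of the note above, the following C sketch shows ordinary serial loops with no message-passing calls. The function names are hypothetical, and the sketch does not claim that HLO parallelizes either loop; it only contrasts a loop whose iterations are independent (assuming x and y do not overlap) with one that carries a dependence from each iteration to the next, which is the kind of property a legality analysis has to establish.

```c
#include <stddef.h>

/* Independent iterations: element i of y depends only on element i
   of x and y (assuming the two arrays do not overlap), so the
   iterations could be executed in any order. */
void axpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1, so the iterations cannot simply run side by side. */
void prefix_sum(size_t n, double *v)
{
    for (size_t i = 1; i < n; i++)
        v[i] = v[i] + v[i - 1];
}
```

Neither routine contains any explicit communication or thread management; the source stays plain serial code.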
For more information:
HERA-B Bologna Home Page
February 23, 1995 Domenico Galli