Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 9:

Optimizing compiler. Static and dynamic profiler. Memory manager. Code generator

Abstract: This lecture describes the roles of the profiler, the memory manager and the code generator: principles of dynamic profiler usage, the difference between static and dynamic profiling, memory distribution and allocation issues, the effect of the memory manager on application performance, register allocation and instruction scheduling.



Fig. 9.1.

Determining the optimization profitability

The profitability of intraprocedural optimizations depends on the probability that a statement is executed. It is closely related to the behavior of the control flow graph.

Example: common subexpression elimination.

   z=x*y;
   if(hardly_ever) { 
      t=x*y;
   } 

This optimization has a disadvantage: it enlarges the routine's stack frame, because it creates a temporary variable to store the result of the repeated calculation. If the result is reused only inside an infrequently executed basic block, the optimization does not pay off.

A similar argument applies to loop invariant hoisting.
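To make the trade-off concrete, the sketch below writes out by hand roughly what common subexpression elimination does to the snippet above. The function names `before_cse` and `after_cse` and the variable `tmp` are illustrative assumptions, not taken from the lecture, and a real compiler may reuse `z` directly when it can prove that `z` is not modified.

```c
/* Hand-written illustration of common subexpression elimination.
   All names here are hypothetical. */

/* Original form: x*y is recomputed only on the rare path. */
int before_cse(int x, int y, int hardly_ever)
{
    int z = x * y;
    int t = 0;
    if (hardly_ever) {
        t = x * y;      /* second multiplication, rarely executed */
    }
    return z + t;
}

/* After CSE: the product is kept in a temporary for the rare use. */
int after_cse(int x, int y, int hardly_ever)
{
    int tmp = x * y;    /* stays live until the rare use, even if never needed */
    int z = tmp;
    int t = 0;
    if (hardly_ever) {
        t = tmp;        /* reuses the stored product */
    }
    return z + t;
}
```

Both functions return the same value; the second saves one multiplication on the rare path at the cost of keeping `tmp` alive on every path.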

   for(i=0;i<n;i++) {
      …
      if(hardly_ever) {
         … = x*y;
      }
   }
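A minimal sketch of the same loop before and after hoisting, written by hand under assumptions: the array `rare` stands in for the `hardly_ever` condition, and the function names are hypothetical.

```c
#include <stddef.h>

/* Original loop: x*y is evaluated only when the rare condition holds. */
long sum_before(const int *rare, size_t n, int x, int y)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (rare[i]) {
            acc += (long)x * y;
        }
    }
    return acc;
}

/* After hoisting: the invariant product is computed once before the loop,
   even when the rare condition never holds. */
long sum_after(const int *rare, size_t n, int x, int y)
{
    long inv = (long)x * y;   /* occupies a register or stack slot for the whole loop */
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (rare[i]) {
            acc += inv;
        }
    }
    return acc;
}
```

If the condition is almost never true, the hoisted version pays for a multiplication and a long-lived temporary that the original loop would mostly have avoided.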

Many optimizations need information on the probability of different events for a more precise estimate of their profitability:

  • For the interprocedural optimization "field reordering", it is important to detect which fields are "frequently" used together.
  • For inlining, it is unprofitable to substitute a routine body at a call site that is "rarely" executed.
  • For partial inlining, the compiler needs to detect the "hot" parts of the code inside the inline candidate routine.
  • For vectorization, it is unprofitable to vectorize loops with a "small" iteration count.
  • For efficient auto-parallelization, the compiler needs to estimate the amount of work performed in each loop iteration.
  • And so on …

Thus an optimizing compiler needs methods for estimating application events.

Small hints can be used to give the compiler additional information. For example, __builtin_expect is designed to tell the compiler the expected outcome of a branch:

if(x)  =>  if(__builtin_expect(x,1))
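A slightly fuller sketch of the hint in use. `__builtin_expect` is a compiler extension available in GCC-compatible compilers; the `likely`/`unlikely` wrapper macros and the function `checked_sum` are conventional or hypothetical names assumed here for illustration.

```c
/* Wrapper macros around the GCC-compatible builtin (names are a convention,
   not part of the lecture). !!(x) normalizes any nonzero value to 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums the array; a negative element is treated as a rare error and
   makes the function return -1. */
long checked_sum(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++) {
        if (unlikely(a[i] < 0)) {   /* hint: this branch is rarely taken */
            return -1;
        }
        s += a[i];
    }
    return s;
}
```

The hint does not change the result of the program; it only informs the compiler which path to favor when laying out and optimizing the code.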

Static profiler

A static profiler performs static program analysis, that is, analysis of the application source code without executing the application. The profiler calculates the probabilities of conditional jumps and the execution frequencies of basic blocks. Routine execution frequencies are calculated during call graph analysis.

Source code analysis cannot provide an accurate calculation of the weight (execution frequency) characteristics: in general, the input of the executable program is not known, and the compilation time is limited. Nevertheless, the data obtained from the static profiler is used to perform various interprocedural optimizations.
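One way to picture the step from branch probabilities to basic block frequencies is the sketch below. It is a simplification under stated assumptions: the control flow graph is acyclic, blocks are listed in topological order, and the edge probabilities (0.1 and 0.9) are invented rather than produced by any real heuristic. The type `struct block` and both functions are hypothetical; production static profilers also have to handle loops and calls.

```c
#define MAX_SUCC 2

/* A basic block with up to two outgoing edges (hypothetical layout). */
struct block {
    int    nsucc;            /* number of successors */
    int    succ[MAX_SUCC];   /* indices of successor blocks */
    double prob[MAX_SUCC];   /* estimated probability of each edge */
};

/* Propagates frequencies through an acyclic CFG.
   Assumes block 0 is the entry and blocks are in topological order. */
void estimate_frequencies(const struct block *cfg, int n, double *freq)
{
    for (int i = 0; i < n; i++)
        freq[i] = 0.0;
    freq[0] = 1.0;                       /* the entry executes once per call */
    for (int b = 0; b < n; b++) {
        for (int s = 0; s < cfg[b].nsucc; s++) {
            freq[cfg[b].succ[s]] += freq[b] * cfg[b].prob[s];
        }
    }
}

/* Example: an if/else diamond whose "then" side is guessed to be rare.
   Block 0 branches to 1 (p = 0.1) or 2 (p = 0.9); both join in block 3. */
double diamond_block_frequency(int block)
{
    const struct block cfg[4] = {
        { 2, {1, 2}, {0.1, 0.9} },
        { 1, {3, 0}, {1.0, 0.0} },
        { 1, {3, 0}, {1.0, 0.0} },
        { 0, {0, 0}, {0.0, 0.0} },
    };
    double freq[4];
    estimate_frequencies(cfg, 4, freq);
    return freq[block];
}
```

With these made-up probabilities the rare block gets a relative frequency of about 0.1 and the join block about 1.0, which is the kind of weight an optimizer could consult when judging profitability.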

Dynamic profiler

A dynamic profiler calculates weights by analyzing statistics collected by an instrumented application during execution. To benefit from the dynamic profiler, the application must first be built with instrumentation. The instrumented application should then be run on a representative set of input data. The final build uses the statistics collected during execution to optimize more effectively.
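The idea of instrumentation can be imitated by hand. The sketch below is not what any particular compiler emits; the counters `branch_total` and `branch_taken` and all function names are hypothetical, chosen only to show how running on training data turns into a measured branch probability.

```c
/* Hypothetical counters of the kind an instrumented build might maintain. */
static unsigned long branch_total;
static unsigned long branch_taken;

/* The routine being profiled, with hand-inserted counting code. */
int classify(int x)
{
    branch_total++;              /* the branch was reached */
    if (x < 0) {
        branch_taken++;          /* the branch was taken */
        return -1;
    }
    return 1;
}

/* Measured probability that the branch in classify() is taken. */
double taken_probability(void)
{
    if (branch_total == 0)
        return 0.0;
    return (double)branch_taken / (double)branch_total;
}

/* Runs the instrumented routine on a training data set and reports
   the collected statistic. */
double profile_training_run(const int *data, int n)
{
    branch_total = 0;
    branch_taken = 0;
    for (int i = 0; i < n; i++) {
        (void)classify(data[i]);
    }
    return taken_probability();
}
```

The measured value is only as good as the training data: if the inputs used for the instrumented run differ from real workloads, the final build is optimized for the wrong paths.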

  • /Qprof-gen[:keyword]
    • instrument program for profiling.
    • Optional keyword may be srcpos or globdata
  • /Qprof-use[:<arg>]
    • enable use of profiling information during optimization
    • weighted - invokes profmerge with -weighted option to scale data based on run durations
    • [no]merge - enable(default)/disable the invocation of the profmerge tool