Published: 12.07.2012 | Access: free | Students: 355 / 24 | Rating: 4.00 / 4.20 | Duration: 11:07:00
Specialties: Programmer
Lecture 1:

Introduction to application optimization using Intel® performance tools

Abstract: The first lecture describes the general architecture of Intel microprocessors and the main factors affecting their performance. A simplified microprocessor model is used to show the role of each subsystem and to describe the main features: the multi-level memory model, general-purpose and vector registers, the data prefetching mechanism, branch prediction, pipelining and superscalar execution, vector instructions, and multi-core and multi-processor systems. The role of the compiler in performance optimization is also described.

The objectives of this course

Get a basic understanding of:

  • the main factors affecting processor performance,
  • basic performance improvement techniques,
  • Intel® tools for performance analysis,
  • main options and components of the Intel compiler,
  • theoretical foundations of some performance optimizations.

You will be able to:

  • describe the main processor performance problems;
  • investigate an application using the VTune™ Performance Analyzer and find problem areas;
  • identify the main problems of the analyzed application;
  • develop a strategy to improve application performance;
  • describe the main components of the compiler and its functions;
  • control the level of optimization with command line options.

Course plan

  • Intel microprocessor architecture and main factors affecting processor performance;
  • VTune Performance Analyzer usage;
  • The role of the compiler in improving application performance;
  • Some theoretical concepts: control flow graph, data-flow analysis;
  • Permutation optimizations and their applicability. Dependencies;
  • Vectorization;
  • Parallelization using OpenMP directives and auto-parallelization;
  • The main components of the compiler, their tasks and interconnection.

Intel microprocessor architecture and the main factors affecting the processor performance.

Simplified processor model

Fig. 1.1. Simplified processor model

The simplified processor model consists of the following components:

  • Control Unit, CU
  • Arithmetic and Logic Unit, ALU
  • System registers
  • Front Side Bus, FSB
  • Memory
  • Peripheral devices

Control Unit (CU):

  • decodes instructions received from the memory;
  • controls ALU;
  • transfers data between the CPU registers, memory, and peripheral devices.

The ALU consists of several functional parts that perform arithmetic and logical operations on the values held in the system registers.

System registers are a small amount of memory inside the CPU used for temporary storage of the data the processor is working on.

A system bus is used for data transfer between the CPU and memory, as well as between the CPU and peripherals.
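The division of labor between these components can be illustrated with a toy simulation. This is only a minimal sketch: the three-field instruction format, the opcode names, and the register names `r0` and `r1` are invented for illustration and do not correspond to any real Intel instruction encoding.

```python
# Hypothetical instruction format: (opcode, register, operand).
PROGRAM = [
    ("LOAD", "r0", 0),     # r0 <- memory[0]   (data arrives over the bus)
    ("LOAD", "r1", 1),     # r1 <- memory[1]
    ("ADD", "r0", "r1"),   # r0 <- r0 + r1     (arithmetic, done by the ALU)
    ("AND", "r0", "r1"),   # r0 <- r0 & r1     (logic, done by the ALU)
    ("STORE", "r0", 2),    # memory[2] <- r0
]


def alu(op, a, b):
    """Arithmetic and logic unit: operates only on register values."""
    if op == "ADD":
        return a + b
    if op == "AND":
        return a & b
    raise ValueError(f"unsupported ALU operation: {op}")


def run(program, memory):
    """Control unit: take each instruction, decode it, and dispatch it."""
    registers = {"r0": 0, "r1": 0}
    for opcode, reg, operand in program:
        if opcode == "LOAD":        # memory -> register
            registers[reg] = memory[operand]
        elif opcode == "STORE":     # register -> memory
            memory[operand] = registers[reg]
        else:                       # arithmetic or logic: drive the ALU
            registers[reg] = alu(opcode, registers[reg], registers[operand])
    return registers, memory


if __name__ == "__main__":
    print(run(PROGRAM, [6, 3, 0]))
```

Note that the ALU never touches memory directly: every operand is first moved into a register by the control unit, which mirrors the roles described above.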

High performance is one of the key factors in the competition among computer system manufacturers.

Processor performance is directly related to the amount of computational work that can be done per unit of time.

Roughly speaking:

Performance = Number of instructions / Time
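As a rough numeric illustration of this formula, the sketch below compares two runs of the same workload. The instruction count and the timings are made-up numbers, not measurements, and the function name `performance` is chosen here only for the example.

```python
def performance(instructions, seconds):
    """Instructions completed per second, following the rough formula above."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return instructions / seconds


# Hypothetical numbers: the same 8 billion instructions before and after tuning.
before = performance(instructions=8_000_000_000, seconds=4.0)
after = performance(instructions=8_000_000_000, seconds=2.5)
speedup = after / before

if __name__ == "__main__":
    print(f"before: {before:.3g} instr/s, after: {after:.3g} instr/s")
    print(f"speedup: {speedup:.2f}x")
```

The comparison is only meaningful when both runs do the same work; an optimization that changes the number of instructions executed should be judged by elapsed time instead.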

We will discuss performance using the IA-32 and IA-32e architectures (IA-32 with EM64T) as examples.

Factors affecting the processor performance:

  • CPU clock frequency;
  • Amount and speed of accessible memory;
  • The performance of the instructions and completeness of the instruction set;
  • Usage of the internal register memory;
  • The quality of pipelining;
  • The quality of branch prediction;
  • The quality of prefetching;
  • Superscalarity;
  • The quality of vectorization;
  • Parallelization and multicore.

Clock rate

Because the processor is built from components that work at different speeds, a processor timer synchronizes them by sending periodic synchronization pulses. The frequency of these pulses is called the clock speed of the processor.
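The relationship between clock speed and time can be made concrete with a little arithmetic. The following is a minimal sketch; the 2.5 GHz frequency is an arbitrary example value, and the helper names are invented here.

```python
def cycle_time_ns(frequency_hz):
    """Duration of one clock cycle in nanoseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e9 / frequency_hz


def elapsed_seconds(cycles, frequency_hz):
    """Time taken by a given number of clock cycles."""
    return cycles / frequency_hz


# Hypothetical 2.5 GHz processor.
FREQUENCY_HZ = 2.5e9

if __name__ == "__main__":
    print(cycle_time_ns(FREQUENCY_HZ))         # length of one cycle, in ns
    print(elapsed_seconds(5e9, FREQUENCY_HZ))  # time for five billion cycles, in s
```

A higher clock speed shortens each cycle, but it says nothing about how much work is done per cycle, which is why the other factors in the list matter as well.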

Memory speed and amount

  • 8086 - up to 1 MB of memory.
  • 80286 - new system registers and a new memory addressing mode; up to 16 MB of memory.
  • 80386 - the first 32-bit processor in the family; up to 4 GB of memory.
  • EM64T (Extended Memory 64 Technology) - up to about 2^64 bytes of memory.
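These limits follow from the number of address bits: with n address bits a processor can distinguish 2^n byte locations. The sketch below reproduces the figures in the list. The address widths (20, 24, 32, and 64 bits) are not stated in the lecture; they are the widths that match the listed sizes, and the 64-bit value is the architectural upper bound rather than what any particular chip implements.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses available with the given width."""
    return 2 ** address_bits


def human_readable(n_bytes):
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = n_bytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


# Assumed address widths, chosen to match the sizes listed above.
ADDRESS_BITS = {"8086": 20, "80286": 24, "80386": 32, "EM64T": 64}

if __name__ == "__main__":
    for cpu, bits in ADDRESS_BITS.items():
        print(cpu, bits, human_readable(addressable_bytes(bits)))
```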

The performance of the instructions and completeness of the instruction set

Performance depends on how efficiently the instructions are implemented and on how well the basic instruction set covers the tasks that programs need to perform.

CISC and RISC stand for complex and reduced instruction set computing.

Modern Intel processors are a hybrid of CISC and RISC: before execution, the processor converts each CISC instruction into a sequence of simpler RISC-like instructions.
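The idea of this conversion can be sketched with a toy decoder that splits an instruction with a memory operand into separate load, compute, and store steps. This is purely illustrative: the tuple format, the temporary names `tmp_src` and `tmp_dst`, and the step names are hypothetical, and the actual internal operations of Intel processors are not described in this lecture.

```python
def is_memory(operand):
    """Treat operands written in square brackets as memory references."""
    return operand.startswith("[") and operand.endswith("]")


def decode(instruction):
    """Split a CISC-style instruction into simpler load/compute/store steps."""
    opcode, dst, src = instruction
    steps = []
    if is_memory(src):
        # A memory source becomes an explicit load into a temporary.
        steps.append(("load", "tmp_src", src))
        src = "tmp_src"
    if is_memory(dst):
        # A memory destination is read, modified, and written back.
        steps.append(("load", "tmp_dst", dst))
        steps.append((opcode, "tmp_dst", src))
        steps.append(("store", dst, "tmp_dst"))
    else:
        # Register-to-register instructions are already simple.
        steps.append((opcode, dst, src))
    return steps


if __name__ == "__main__":
    for step in decode(("add", "[rbx]", "rax")):
        print(step)
```

In this toy model a register-only instruction passes through unchanged, while a single read-modify-write instruction expands into three simple steps.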
