Advanced Processor Architecture

Course Number	554
Price NIS before VAT / Tcs
Duration (Days)	4
Language	English
Level	Advanced

Scheduled dates	Jun 23-26, Dec 14-17

		+972 3 9247780 ext. 207

This course explains how high-end processor cores operate, using implementation examples. It thereby provides guidelines for writing efficient low-level code, and may also be useful to processor designers.

PIPELINE

Single-issue pipeline, example of ARM7 and ARM9
Example: studying how instructions propagate through the ARM7 and ARM9 pipelines, clock by clock
Superscalar operation, decode and dispatch unit, example of ARM11
Example: studying how instructions propagate through the ARM11 pipeline, clock by clock
Out-of-order execution
Register renaming, scoreboarding, implementation of the Tomasulo algorithm
Reservation stations used when a register interlock is detected
Completion queue, in-order completion
Different types of synchronization
Example: studying how instructions propagate through the pipeline of PowerPC 750 cores
Issue queues used to decouple the execution units from the dispatcher
Example: studying how instructions propagate through the pipeline of the e600 PowerPC
Hyper-Threading, sharing some resources between logical cores
Example: studying how instructions propagate through the pipeline of the Pentium 4
Branch penalties
Branch folding
Mechanisms used to improve the performance of unconditional branches: Branch Target Buffer
Mechanisms used to improve the performance of conditional branches: static prediction vs. dynamic prediction
Locking entries in the BTB
Taking into account instruction cache line alignment
Example: studying how the ARM11 core processes branches
Example: studying how the e600 core processes branches
Example: studying how the G5 PPC core processes branches
Link stack: accelerating the function return sequence

MMU

Purposes: assigning attributes to memory ranges, assigning access rights to memory ranges, managing a virtual space and protecting pages
Clarifying how a virtual address is translated into a physical address
The Fast Context Switching Extension mechanism
Example: studying the ARM926 MMU
Example: studying the new features of the ARM11 MMU
Example: studying the PowerPC MMU
TLB organization
Unified TLB and separate micro-TLBs
Software TLB reload vs. hardware TLB reload
TLB synchronization instructions when several cores share the same page descriptor table
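
As an illustration of the translation and TLB topics above, here is a minimal Python sketch of a toy MMU. It is not a model of the ARM926, ARM11 or PowerPC MMUs: the 4 KB page size, the single-level page table held in a dictionary, and the names `ToyMMU`, `PageFault`, `translate` and `invalidate_tlb` are all assumptions made for this example.

```python
PAGE_SHIFT = 12                # assumption: 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class PageFault(Exception):
    """Raised when a page is unmapped or an access right is violated."""


class ToyMMU:
    def __init__(self, page_table):
        # Hypothetical format: virtual page number -> (physical frame number, writable)
        self.page_table = page_table
        self.tlb = {}          # cached copies of page table entries
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr, write=False):
        vpn = vaddr >> PAGE_SHIFT           # virtual page number
        offset = vaddr & (PAGE_SIZE - 1)    # offset is not translated
        entry = self.tlb.get(vpn)
        if entry is None:
            # TLB miss: walk the page table, then reload the TLB
            self.tlb_misses += 1
            entry = self.page_table.get(vpn)
            if entry is None:
                raise PageFault("unmapped page 0x%x" % vpn)
            self.tlb[vpn] = entry
        else:
            self.tlb_hits += 1
        pfn, writable = entry
        if write and not writable:
            raise PageFault("write to read-only page 0x%x" % vpn)
        return (pfn << PAGE_SHIFT) | offset

    def invalidate_tlb(self):
        # Needed after the page table changes; otherwise stale entries stay in use
        self.tlb.clear()


mmu = ToyMMU({0x10: (0x80, True), 0x11: (0x81, False)})
print(hex(mmu.translate(0x10ABC)))   # first access misses the TLB
print(hex(mmu.translate(0x10ABC)))   # second access hits the TLB
```

In this sketch a change to the page table is not seen until `invalidate_tlb()` is called, which is the situation that TLB synchronization addresses when several cores share one page descriptor table.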

CACHES

Cache organizations: direct-mapped, fully associative, N-way set associative
Replacement algorithms, MEI vs. MESI state machines
Write policies: write-through vs. copy-back
Cache locking
Basic cache operations: clean, flush, invalidate
Software coherency to share data between the core and external masters
Example: studying the level 1 caches of ARM11 and the related CP15 instructions
Example: studying the level 1 caches of e600
Level 2 cache: understanding the data and instruction flow between external memory, the level 2 cache and the level 1 caches
Example: studying the L220 IP from ARM
Hardware coherency: the snooper
Basic snoop requests: CLEAN, FLUSH, KILL
Cache-to-cache data transfers
Example: studying how two e600s share cache-enabled data
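
The cache organizations listed in this section can be illustrated with a small Python sketch of an N-way set-associative lookup. It tracks tags only (no data, no coherency states), and it assumes LRU replacement, which is only one of several possible replacement algorithms. The class name `SetAssociativeCache` and its default geometry (4 sets, 2 ways, 32-byte lines) are hypothetical and do not describe the ARM11, e600 or L220 caches.

```python
from collections import OrderedDict


class SetAssociativeCache:
    def __init__(self, num_sets=4, ways=2, line_size=32):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set: keys are tags, least recently used first
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def split(self, addr):
        """Return (tag, set index) for a byte address."""
        line = addr // self.line_size
        return line // self.num_sets, line % self.num_sets

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then allocated)."""
        tag, index = self.split(addr)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)    # evict the least recently used line
        lines[tag] = True
        return False


cache = SetAssociativeCache(num_sets=4, ways=2, line_size=32)
for addr in (0, 128, 0, 256, 0, 128):    # all six addresses map to set 0
    print(addr, "hit" if cache.access(addr) else "miss")
```

With `ways=1` the same sketch behaves as a direct-mapped cache, and with `num_sets=1` as a fully associative one.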