PIPELINE
- Single-issue pipeline: ARM7 and ARM9 examples
- Example: studying how instructions propagate through the ARM7 and ARM9 pipelines clock by clock
- Superscalar operation, decode and dispatch unit: ARM11 example
- Example: studying how instructions propagate through the ARM11 pipeline clock by clock
- Out-of-order execution
- Register renaming, scoreboarding, implementation of Tomasulo algorithm
- Reservation stations used when register interlock is detected
- Completion queue, in-order completion
- Different types of synchronization
- Example: studying how instructions propagate through the pipeline of the PowerPC 750
- Issue queues used to decouple the execute units from the dispatcher
- Example: studying how instructions propagate through the pipeline of the e600 PowerPC core
- Hyper-Threading: sharing some resources between logical cores
- Example: studying how instructions propagate through the Pentium 4 pipeline
- Branch penalties
- Branch folding
- Mechanisms used to improve the performance of unconditional branches: Branch Target Buffer (BTB)
- Mechanisms used to improve the performance of conditional branches: static vs. dynamic prediction
- Locking entries in the BTB
- Taking into account instruction cache line alignment
- Example: studying how the ARM11 core processes branches
- Example: studying how the e600 core processes branches
- Example: studying how the G5 PPC core processes branches
- Link stack: accelerating the function return sequence
MMU
- Purposes: assigning attributes to memory ranges, assigning access rights to memory ranges, managing a virtual address space and protecting pages
- Clarifying how the virtual address is translated into a physical address
- The Fast Context Switch Extension (FCSE) mechanism
- Example: studying the ARM926 MMU
- Example: studying the new features of the ARM11 MMU
- Example: studying the PowerPC MMU
- TLB organization
- Unified TLB and separate micro-TLBs
- Software TLB reload vs. hardware TLB reload
- TLB synchronization instructions when several cores share the same page descriptor table
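As an illustration of address translation and TLB reload, here is a minimal C sketch, assuming a hypothetical flat page table, 4 KB pages and a 4-entry fully associative TLB refilled in round-robin order. Real ARM and PowerPC MMUs use multi-level descriptors carrying attribute and access-permission bits, which are omitted here; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters: 32-bit virtual addresses, 4 KB pages,
   a 16-entry flat page table and a 4-entry fully associative TLB. */
#define PAGE_SHIFT  12u
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1u)
#define NUM_PAGES   16u
#define TLB_ENTRIES 4u

typedef struct {
    bool     valid;
    uint32_t pfn;            /* physical frame number */
} pte;                       /* no attributes or access rights, for brevity */

typedef struct {
    bool     valid;
    uint32_t vpn;            /* virtual page number (the TLB tag) */
    uint32_t pfn;
} tlb_entry;

static pte       page_table[NUM_PAGES];
static tlb_entry tlb[TLB_ENTRIES];
static unsigned  tlb_victim;   /* round-robin replacement pointer */
static unsigned  tlb_misses;

/* Translates va into *pa; returns false to model a translation fault. */
bool translate(uint32_t va, uint32_t *pa)
{
    uint32_t vpn    = va >> PAGE_SHIFT;
    uint32_t offset = va & PAGE_MASK;   /* the page offset is not translated */

    assert(pa != NULL);

    /* 1. TLB lookup: every entry is compared with the VPN. */
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].pfn << PAGE_SHIFT) | offset;
            return true;
        }
    }

    /* 2. TLB miss: look up the page table (the "TLB reload"). */
    tlb_misses++;
    if (vpn >= NUM_PAGES || !page_table[vpn].valid)
        return false;

    /* 3. Refill one TLB entry, then form the physical address. */
    tlb[tlb_victim] = (tlb_entry){ true, vpn, page_table[vpn].pfn };
    tlb_victim = (tlb_victim + 1u) % TLB_ENTRIES;
    *pa = (page_table[vpn].pfn << PAGE_SHIFT) | offset;
    return true;
}
```

For example, after `page_table[2] = (pte){ true, 7 };`, translating 0x2ABC yields 0x7ABC: page number 2 is replaced by frame 7 and the offset 0xABC passes through unchanged. A second access to the same page hits in the TLB and skips the table lookup.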
CACHES
- Cache organizations: direct-mapped, fully associative, N-way set associative
- Replacement algorithms, MEI vs. MESI state machines
- Write policies: write-through vs. copy-back
- Cache locking
- Basic cache operations : clean, flush, invalidate
- Software coherency to share data between the core and external bus masters
- Example: studying the level 1 caches of ARM11 and the related CP15 instructions
- Example: studying the level 1 caches of e600
- Level 2 cache: understanding the data and instruction flow between external memory, the level 2 cache and the level 1 caches
- Example: studying the ARM L220 cache controller IP
- Hardware coherency : the snooper
- Basic snoop requests : CLEAN, FLUSH, KILL
- Cache-to-cache data transfers
- Example: studying how two e600 cores share cache-enabled data
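To make the N-way set-associative organization and LRU replacement concrete, here is a minimal C sketch of a cache lookup. The geometry (32-byte lines, 4 sets, 2 ways) and the timestamp-based LRU are illustrative assumptions; hardware usually approximates LRU with a few bits per set (for example pseudo-LRU) and also keeps a coherency state (MEI/MESI) per line, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry: 32-byte lines, 4 sets, 2 ways (256 bytes). */
#define LINE_SHIFT 5u
#define NUM_SETS   4u
#define NUM_WAYS   2u

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t last_used;   /* timestamp used for LRU replacement */
} cache_line;

static cache_line cache[NUM_SETS][NUM_WAYS];
static uint32_t   access_clock;

/* Returns true on a hit; on a miss, allocates the line. */
bool cache_access(uint32_t addr)
{
    uint32_t block = addr >> LINE_SHIFT;   /* drop the byte offset */
    uint32_t index = block % NUM_SETS;     /* selects one set */
    uint32_t tag   = block / NUM_SETS;     /* identifies the line within it */
    cache_line *set = cache[index];

    access_clock++;

    /* Only the NUM_WAYS lines of the selected set are compared. */
    for (unsigned w = 0; w < NUM_WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            set[w].last_used = access_clock;   /* refresh LRU information */
            return true;
        }
    }

    /* Miss: fill an invalid way if any, otherwise evict the LRU way. */
    unsigned victim = 0;
    for (unsigned w = 0; w < NUM_WAYS; w++) {
        if (!set[w].valid) { victim = w; break; }
        if (set[w].last_used < set[victim].last_used) victim = w;
    }
    set[victim] = (cache_line){ true, tag, access_clock };
    return false;
}
```

With this geometry, addresses 0x000, 0x080 and 0x100 all map to set 0, so accessing the third one evicts whichever of the first two was used least recently, even though the other three sets are still empty. A direct-mapped cache is the same code with `NUM_WAYS` set to 1.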
MEMORY SUBSYSTEM
- Understanding Hit-Under-Miss and Miss-Under-Miss
- Decoupling the core internal operation from the bus interface
- Write queues
- Load Miss queue
- Store Miss merging
- Store Miss gathering
- Memory barrier instructions
- Example: ARM DSB and DMB instructions
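The classic case for a memory barrier is a producer publishing data behind a flag. The sketch below expresses it with portable C11 atomics rather than inline assembly; the mailbox names are hypothetical. On ARMv7 targets, GCC and Clang typically emit a DMB before the release store and after the acquire load, which orders the accesses; DSB is stronger, since it waits for outstanding memory accesses (and cache or TLB maintenance operations) to complete before later instructions execute.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox shared by a producer core and a consumer core. */
static int         payload;          /* ordinary, non-atomic data */
static atomic_bool ready = false;    /* publication flag */

/* Producer: the release store keeps the payload write from becoming
   visible after the flag (typically a DMB before the store on ARMv7). */
void produce(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read from being performed
   before the flag read (typically a DMB after the load on ARMv7). */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Without the release/acquire pair, a weakly ordered core with write queues and store gathering could make the flag visible to the other core before the payload.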
EXCEPTION MECHANISMS
- Basic ARM exception handling
- Relationship between modes and register banks
- Interrupt nesting and why System mode is required
- PrimeCell VICs
- Reducing interrupt latency through automatic vector generation
- VIC basic signal timing
- Connectivity: daisy-chaining VICs
- VIC Interrupt priority and masking
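The priority and masking logic of a vectored interrupt controller can be modeled in a few lines of C. This is a simplified software model with hypothetical names and fields, not the PrimeCell register interface: pending sources are ANDed with the enable mask, vectored slots are scanned in priority order (slot 0 highest, as on the PL190), and any remaining active source falls back to a default vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical VIC model (not the PrimeCell register map):
   32 sources, 16 vectored slots scanned from slot 0 (highest priority),
   and a default vector for active sources not assigned to any slot. */
#define NUM_SLOTS 16

typedef struct {
    uint32_t raw_status;              /* bit n set: source n is requesting */
    uint32_t enable;                  /* bit n set: source n is unmasked */
    int      slot_source[NUM_SLOTS];  /* source per slot, -1 if unused */
    uint32_t slot_vector[NUM_SLOTS];  /* ISR address per slot */
    uint32_t default_vector;          /* ISR address for other sources */
} vic_model;

/* Returns the ISR address for the highest-priority active source,
   or 0 when no enabled source is pending. */
uint32_t vic_vector_address(const vic_model *vic)
{
    assert(vic != NULL);
    uint32_t active = vic->raw_status & vic->enable;   /* masking */
    if (active == 0)
        return 0;

    for (int slot = 0; slot < NUM_SLOTS; slot++) {      /* priority order */
        int src = vic->slot_source[slot];
        if (src >= 0 && src < 32 && (active & (1u << src)))
            return vic->slot_vector[slot];
    }
    return vic->default_vector;                          /* non-vectored */
}
```

Because the controller hands the core the handler address directly, the software does not have to scan a status register to find the source, which is where the latency gain of automatic vector generation comes from.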
MULTI-CORE SYSTEMS
- Multiprocessing types: SMP vs. AMP
- Hardware requirements to support SMP: a multi-core interrupt controller
- Inter-Processor Interrupts (IPIs)
- Example: IA-32 Local APIC and I/O APIC
- Defining which resources are shared and which are private
- System start sequence, assigning an ID to each core
- Multi-processor synchronization, managing Boolean semaphores
- ARM SWP instruction
- The new ARM11 mechanism: exclusive load and store instructions (LDREX/STREX)
- Debugging a multi-core application
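A Boolean semaphore (spinlock) can be sketched with C11 atomics instead of hand-written SWP or LDREX/STREX assembly. On ARMv6 and later, compilers typically implement the compare-and-exchange below as an LDREX/STREX retry loop; the `spinlock` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical Boolean semaphore: 0 = free, 1 = taken. */
typedef struct {
    atomic_int locked;
} spinlock;

/* Atomically changes 0 into 1; fails if another core holds the lock.
   Acquire ordering keeps critical-section accesses after the lock. */
bool spin_trylock(spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

void spin_lock(spinlock *l)
{
    while (!spin_trylock(l))
        ;   /* busy-wait; a real implementation might add a back-off here */
}

/* Release ordering makes critical-section writes visible before the
   lock is seen as free by other cores. */
void spin_unlock(spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The read-modify-write must be atomic across cores: a plain load followed by a plain store would let two cores both read 0 and both take the lock.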