Advanced Processor Architecture

Course Number: 554
Price NIS before VAT / Tcs 
Duration (Days): 4
Language: English
Level: Advanced
Scheduled dates: Jun 23-26, Dec 14-17
+972 3 9247780 ext. 207

This course explains how high-end processor cores operate, using implementation examples. It therefore provides guidelines for writing efficient low-level code and may also be useful to processor designers.

 

PIPELINE

  • Single-issue pipeline, example of ARM7 and ARM9
  • Example: studying how instructions propagate through the ARM7 and ARM9 pipelines, clock by clock
  • Superscalar operation, decode and dispatch unit, example of ARM11
  • Example: studying how instructions propagate through the ARM11 pipeline, clock by clock
  • Out-of-order execution
  • Register renaming, scoreboarding, implementation of Tomasulo algorithm
  • Reservation stations used when register interlock is detected
  • Completion queue, in-order completion
  • Different types of synchronization
  • Example: studying how instructions propagate through the pipeline of the PowerPC 750
  • Issue queues used to decouple the execute units from the dispatcher
  • Example: studying how instructions propagate through the pipeline of the e600 PowerPC
  • Hyper-Threading, sharing some resources between logical cores
  • Example: studying how instructions propagate through the pipeline of the Pentium 4
  • Branch penalties
  • Branch folding
  • Mechanisms used to improve the performance of unconditional branches: Branch Target Buffer
  • Mechanisms used to improve the performance of conditional branches: static prediction vs. dynamic prediction
  • Locking entries in BTB
  • Taking into account instruction cache line alignment
  • Example: studying how the ARM11 core processes branches
  • Example: studying how the e600 core processes branches
  • Example: studying how the G5 PPC core processes branches
  • Link stack: accelerating the function return sequence

MMU

  • Purposes: assigning attributes to memory ranges, assigning access rights to memory ranges, managing a virtual space and protecting pages
  • Clarifying how the virtual address is translated into a physical address
  • The Fast Context Switching Extension mechanism
  • Example: studying the ARM926 MMU
  • Example: studying the new features of the ARM11 MMU
  • Example: studying the PowerPC MMU
  • TLB organization
  • Unified TLB and separate micro-TLBs
  • Software TLB reload vs. hardware TLB reload
  • TLB synchronization instruction when several cores are sharing the same page descriptor table
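
To illustrate the virtual-to-physical translation topic listed above, here is a minimal sketch in C of a two-level page table walk. It assumes a 32-bit virtual address split into a 12-bit first-level index, an 8-bit second-level index and a 12-bit page offset (4 KB pages). The descriptor layout, the `PTE_VALID` bit and all type and function names are hypothetical and do not reproduce the descriptor format of any particular core.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u                          /* 4 KB pages */
#define L1_SHIFT    20u                          /* each first-level entry covers 1 MB */
#define L2_ENTRIES  (1u << (L1_SHIFT - PAGE_SHIFT))   /* 256 pages per 1 MB */
#define L1_ENTRIES  (1u << (32u - L1_SHIFT))          /* 4096 first-level entries */
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1u)
#define PTE_VALID   0x1u                         /* hypothetical "valid" bit */

/* Hypothetical in-memory tables: a first-level table of pointers
 * to second-level tables holding one descriptor per page. */
typedef struct { uint32_t entry[L2_ENTRIES]; } l2_table_t;
typedef struct { const l2_table_t *l2[L1_ENTRIES]; } l1_table_t;

/* Walk the tables for virtual address va.
 * Returns 1 and writes *pa on success, 0 on a translation fault. */
static int translate(const l1_table_t *l1, uint32_t va, uint32_t *pa)
{
    uint32_t l1_index = va >> L1_SHIFT;
    uint32_t l2_index = (va >> PAGE_SHIFT) & (L2_ENTRIES - 1u);
    uint32_t offset   = va & PAGE_MASK;

    const l2_table_t *l2 = l1->l2[l1_index];
    if (l2 == NULL)
        return 0;                        /* no second-level table mapped */

    uint32_t pte = l2->entry[l2_index];
    if ((pte & PTE_VALID) == 0u)
        return 0;                        /* page not present */

    *pa = (pte & ~PAGE_MASK) | offset;   /* physical page base + offset */
    return 1;
}
```

A TLB caches the result of such a walk so that most accesses skip it; hardware TLB reload performs the walk in logic, while software TLB reload runs comparable code in an exception handler.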
 

CACHES

  • Cache organizations: direct-mapped, fully associative, N-way set associative
  • Replacement algorithms
  • MEI vs. MESI state machines
  • Write policies: write-through vs. copy-back
  • Cache locking
  • Basic cache operations: clean, flush, invalidate
  • Software coherency to share data between core and external masters
  • Example: studying the level 1 caches of ARM11 and the related CP15 instructions
  • Example: studying the level 1 caches of e600
  • Level 2 cache, understanding the data and instruction flow between external memory, level 2 cache and level 1 caches
  • Example: studying the L220 IP from ARM
  • Hardware coherency: the snooper
  • Basic snoop requests: CLEAN, FLUSH, KILL
  • Cache-to-cache data transfers
  • Example: studying how two e600s share cache-enabled data
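
As a small illustration of the cache organizations listed above, the following C sketch shows how an address could be split into tag, set index and line offset for an N-way set-associative cache. The type and function names are hypothetical, and real cores may index with virtual or physical addresses and use different sizes.

```c
#include <stdint.h>

/* Hypothetical description of a cache: line size and number of sets. */
typedef struct {
    uint32_t line_size;   /* bytes per cache line */
    uint32_t num_sets;    /* sets = size / (ways * line_size) */
} cache_geometry_t;

/* The three fields an address is split into for a lookup. */
typedef struct {
    uint32_t tag;
    uint32_t set;
    uint32_t offset;
} cache_addr_t;

/* Derive the geometry from total size, associativity and line size.
 * ways == 1 gives a direct-mapped cache; ways == size / line_size
 * gives a fully associative cache with a single set. */
static cache_geometry_t cache_geometry(uint32_t size_bytes, uint32_t ways,
                                       uint32_t line_size)
{
    cache_geometry_t g = { line_size, size_bytes / (ways * line_size) };
    return g;
}

/* Split an address into offset within the line, set index and tag. */
static cache_addr_t cache_split(cache_geometry_t g, uint32_t addr)
{
    cache_addr_t r;
    r.offset = addr % g.line_size;
    r.set    = (addr / g.line_size) % g.num_sets;
    r.tag    = addr / (g.line_size * g.num_sets);
    return r;
}
```

For example, a 16 KB, 4-way cache with 32-byte lines has 128 sets, so address 0x12345678 maps to offset 0x18, set 0x33 and tag 0x12345. The lookup then compares that tag against the four ways of set 0x33.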

MEMORY SUBSYSTEM

  • Understanding Hit-Under-Miss and Miss-Under-Miss
  • Decoupling the core internal operation from the bus interface
  • Write queues
  • Load Miss queue
  • Store Miss merging
  • Store Miss gathering
  • Memory barrier instructions
  • Example: ARM DSB and DMB instructions
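
The need for memory barriers can be sketched in portable C11 rather than ARM assembly. The example below is an assumption-laden illustration, not course material: it uses `atomic_thread_fence` from `<stdatomic.h>` as a stand-in for a barrier instruction (compilers targeting ARM typically implement such fences with DMB), and the `payload`, `ready`, `producer_publish` and `consumer_try_read` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint32_t   payload;   /* ordinary data written by the producer */
static atomic_int ready;     /* flag telling the consumer data is valid */

/* Producer: write the data, then the flag. The release fence keeps the
 * payload write from being reordered after the flag write, which write
 * queues and store merging could otherwise allow. */
static void producer_publish(uint32_t value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer: check the flag, then read the data. The acquire fence keeps
 * the payload read from being performed before the flag read.
 * Returns 1 and writes *out if data was available, 0 otherwise. */
static int consumer_try_read(uint32_t *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Without the two fences, a core with store queues or out-of-order loads could let the consumer observe `ready == 1` while still reading a stale `payload`.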

EXCEPTIONS MECHANISM

  • Basic ARM exception handling
  • Relationship between modes and register banks
  • Interrupt nesting, requirement of System mode
  • PrimeCell VICs
  • Reducing interrupt latency through automatic vector generation
  • VIC basic signal timing
  • Connectivity: daisy-chained
  • VIC Interrupt priority and masking

MULTI-CORE SYSTEMS

  • Multiprocessing types: SMP vs. AMP
  • Hardware requirements to support SMP: multi-core interrupt controller
  • Inter-Processor Interrupts
  • Example: IA32 Local APIC and IO APIC
  • Defining which resources are shared and which resources are private
  • System start sequence, assigning an ID to each core
  • Multi-processor synchronization, management of Boolean semaphores
  • ARM SWP instructions
  • ARM11 new mechanism: exclusive load and store instructions
  • Debugging a multi-core application
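
A Boolean semaphore of the kind mentioned above can be sketched in portable C11. This is a minimal illustration under stated assumptions: it relies on `atomic_flag` from `<stdatomic.h>`, which a compiler may implement on ARM with exclusive load and store instructions or, on older cores, with SWP. The `bool_sem_t` type and the `bool_sem_*` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical Boolean semaphore: flag set means "taken". */
typedef struct { atomic_flag taken; } bool_sem_t;
#define BOOL_SEM_INIT { ATOMIC_FLAG_INIT }

/* Atomically set the flag and report whether it was previously clear.
 * Returns true if the caller now owns the semaphore. */
static bool bool_sem_try_take(bool_sem_t *s)
{
    return !atomic_flag_test_and_set_explicit(&s->taken,
                                              memory_order_acquire);
}

/* Spin until the semaphore is obtained. */
static void bool_sem_take(bool_sem_t *s)
{
    while (!bool_sem_try_take(s)) {
        /* busy-wait; a real system might sleep or yield here */
    }
}

/* Release the semaphore so another core can take it. */
static void bool_sem_give(bool_sem_t *s)
{
    atomic_flag_clear_explicit(&s->taken, memory_order_release);
}
```

The key property is that the read of the old flag value and the write of the new one happen as one indivisible operation, so two cores can never both see the flag clear and both take the semaphore.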

TARGET AUDIENCE

  • Engineers in charge of processor core development
  • Engineers in charge of low-level code development (especially boot programs)
  • Project managers in charge of choosing cores for future designs

     

     
     
If you think we have missed something in the syllabus, call us at 972-3-9247780 ext. 207 or e-mail us, and we will answer your questions.