AUB ScholarWorks

Tuning the continual flow pipeline architecture


dc.contributor.author Jothi, Komal Madaiah
dc.date 2014
dc.date.accessioned 2015-02-03T10:23:56Z
dc.date.available 2015-02-03T10:23:56Z
dc.date.issued 2014
dc.date.submitted 2014
dc.identifier.other b18068339
dc.identifier.uri http://hdl.handle.net/10938/10042
dc.description Dissertation (Ph.D.), American University of Beirut, Department of Electrical and Computer Engineering, 2014. ED:47
dc.description Advisor: Dr. Haitham Akkary, Associate Professor, Electrical and Computer Engineering ; Committee Members: Dr. Ayman Kayssi, Professor, Electrical and Computer Engineering ; Dr. Ali Chehab, Associate Professor, Electrical and Computer Engineering ; Dr. Mazen Saghir, Associate Professor, Electrical and Computer Engineering, Texas A&M University at Qatar ; Dr. Alaa Alameldeen, Adjunct Faculty, Electrical and Computer Engineering, Portland State University.
dc.description Includes bibliographical references (leaves 127-132)
dc.description.abstract One of the main factors that limits the performance of general-purpose processors is misses to the data cache. The conventional techniques used in modern processors, namely building wide pipelines and large instruction buffers to hide the latency of these misses and keep the processor units busy, are not suitable for present and next-generation processors that must meet high energy-efficiency demands. The Continual Flow Pipeline (CFP) allows a processor core to handle hundreds of in-flight instructions without increasing cycle-critical pipeline resources. When a load misses the data cache, CFP checkpoints the processor register state and then moves all miss-dependent instructions into a low-complexity waiting buffer to unblock the pipeline. Meanwhile, miss-independent instructions execute normally and update the processor state. When the miss data returns, CFP replays the miss-dependent instructions from the waiting buffer and then merges the miss-dependent and miss-independent execution results. CFP was initially proposed for cache misses to DRAM. In that work, the miss-independent and miss-dependent instructions execute at different times, separated by a pipeline flush, according to the timing of the load-miss and data-arrival events. In this thesis, we focus on reducing the execution overhead of CFP by avoiding the pipeline flush and executing dependent and independent instructions concurrently. The goal of these improvements is to gain performance by applying CFP to L1 data cache misses that hit the last-level on-chip cache. However, when CFP is applied to L1 data cache misses, many applications, or execution phases of applications, incur an excessive number of replays and/or rollbacks to the checkpoint, which frequently cancels the benefits of CFP and reduces performance. We mitigate this issue by using a novel virtual register renaming substrate and by tuning the replay policies to eliminate excessive replays and rollbacks to the checkpoint. We describe these new design optimizations.
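
The abstract describes the baseline CFP sequence: checkpoint on a load miss, defer the miss-dependent slice to a waiting buffer, let miss-independent instructions update state, then replay and merge when the data returns. The following Python sketch models that flow at a purely behavioral level, as a rough illustration only; the class and member names (CFPCore, waiting_buffer, the "poisoned" register set, etc.) are invented for this sketch and are not taken from the dissertation or any simulator it uses, and the sketch shows the original flush-then-replay style rather than the concurrent execution the thesis proposes.

# Behavioral sketch of CFP load-miss handling (illustrative names and model).
from collections import deque

class CFPCore:
    def __init__(self):
        self.regs = {}                 # architectural register state
        self.checkpoint = None         # register checkpoint taken at the miss
        self.waiting_buffer = deque()  # low-complexity buffer for the miss-dependent slice
        self.poisoned = set()          # registers whose values depend on the pending miss

    def on_load_miss(self, miss_dest_reg):
        # Checkpoint register state so execution can roll back if needed.
        self.checkpoint = dict(self.regs)
        self.poisoned = {miss_dest_reg}

    def issue(self, instr):
        # instr = (dest_reg, src_regs, compute_fn)
        dest, srcs, fn = instr
        if self.poisoned and any(s in self.poisoned for s in srcs):
            # Miss-dependent: defer to the waiting buffer and propagate "poison".
            self.waiting_buffer.append(instr)
            self.poisoned.add(dest)
        else:
            # Miss-independent: execute normally and update processor state.
            self.regs[dest] = fn(*(self.regs.get(s, 0) for s in srcs))

    def on_miss_data_return(self, miss_dest_reg, value):
        # Deliver the miss data, replay the dependent slice, and merge its
        # results with the independent results already in self.regs.
        self.regs[miss_dest_reg] = value
        while self.waiting_buffer:
            dest, srcs, fn = self.waiting_buffer.popleft()
            self.regs[dest] = fn(*(self.regs.get(s, 0) for s in srcs))
        self.poisoned.clear()
        self.checkpoint = None

# Hypothetical usage: r1 misses in the cache; r2 depends on r1, r3 does not.
core = CFPCore()
core.on_load_miss("r1")
core.issue(("r2", ["r1"], lambda a: a + 1))   # deferred: depends on the miss
core.issue(("r3", [], lambda: 7))             # executes and updates state immediately
core.on_miss_data_return("r1", 10)            # replays the slice and merges: r2 == 11
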
dc.format.extent 1 online resource (xv, 132 leaves) : illustrations ; 30 cm
dc.language.iso eng
dc.relation.ispartof Theses, Dissertations, and Projects
dc.subject.classification ED:000047 AUBNO
dc.subject.lcsh High performance processors.
dc.subject.lcsh Computer architecture.
dc.subject.lcsh Microprocessors.
dc.subject.lcsh Energy consumption.
dc.subject.lcsh Computers, Pipeline.
dc.subject.lcsh Computer engineering.
dc.subject.lcsh Pentium (Microprocessor)
dc.title Tuning the continual flow pipeline architecture
dc.type Dissertation
dc.contributor.department American University of Beirut. Faculty of Engineering and Architecture. Department of Electrical and Computer Engineering, degree granting institution.

