AUB ScholarWorks

Tuning the continual flow pipeline architecture

dc.contributor.author Jothi K.
dc.contributor.author Akkary H.
dc.contributor.editor
dc.date 2013
dc.date.accessioned 2017-09-07T07:08:19Z
dc.date.available 2017-09-07T07:08:19Z
dc.date.issued 2013
dc.identifier 10.1145/2464996.2465011
dc.identifier.isbn 9781450321303
dc.identifier.issn
dc.identifier.uri http://hdl.handle.net/10938/11865
dc.description.abstract Continual Flow Pipelines (CFP) allow a processor core to process instruction windows of hundreds of instructions without increasing cycle-critical pipeline resources. When a load misses the data cache, CFP checkpoints the processor register state and then moves all miss-dependent instructions into a low-complexity, non-critical waiting buffer to unblock the pipeline. Meanwhile, miss-independent instructions execute normally and update the processor state. When the miss data returns, CFP replays the miss-dependent instructions from the waiting buffer and then merges the miss-dependent and miss-independent execution results. CFP was initially proposed for cache misses to DRAM. Later work focused on reducing the execution overhead of CFP by avoiding flushing the pipeline before replaying miss-dependent instructions, and on allowing these instructions to execute concurrently with miss-independent instructions. The goal of these improvements was to gain performance by applying CFP to L1 data cache misses that hit the last-level on-chip cache. However, many applications, or execution phases of applications, incur an excessive number of replays and/or rollbacks to the checkpoint. This frequently cancels any benefit from CFP or even degrades performance. In this paper, we improve the CFP architecture by using a novel virtual register renaming substrate and by tuning the replay policies to mitigate excessive replays and rollbacks to the checkpoint. We describe these new design optimizations and show, using SPEC 2006 benchmarks and microarchitecture performance and power models of our design, that our Tuned CFP architecture improves performance and power consumption over previous CFP architectures by approximately 15 percent and 9 percent, respectively. © 2013 ACM.
dc.format.extent Pages: (243-252)
dc.language English
dc.relation.ispartof Publication Name: Proceedings of the International Conference on Supercomputing; Conference Title: 27th ACM International Conference on Supercomputing, ICS 2013; Conference Date: 10 June 2013 through 14 June 2013; Conference Location: Eugene, OR; Publication Year: 2013; Pages: (243-252);
dc.relation.ispartofseries
dc.relation.uri
dc.source Scopus
dc.subject.other
dc.title Tuning the continual flow pipeline architecture
dc.type Conference Paper
dc.contributor.affiliation Jothi, K., Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
dc.contributor.affiliation Akkary, H., Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
dc.contributor.authorAddress Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon
dc.contributor.authorCorporate University: American University of Beirut; Faculty: Faculty of Engineering and Architecture; Department: Electrical and Computer Engineering;
dc.contributor.authorDepartment Electrical and Computer Engineering
dc.contributor.authorDivision
dc.contributor.authorEmail
dc.contributor.faculty Faculty of Engineering and Architecture
dc.contributor.authorInitials
dc.contributor.authorOrcidID
dc.contributor.authorReprintAddress
dc.contributor.authorResearcherID
dc.contributor.authorUniversity American University of Beirut
dc.description.cited
dc.description.citedCount 1
dc.description.citedTotWOSCount
dc.description.citedWOSCount
dc.format.extentCount 10
dc.identifier.articleNo
dc.identifier.coden
dc.identifier.pubmedID
dc.identifier.scopusID 84879818352
dc.identifier.url
dc.publisher.address
dc.relation.ispartofConference Conference Title: 27th ACM International Conference on Supercomputing, ICS 2013; Conference Date: 10 June 2013 through 14 June 2013; Conference Location: Eugene, OR.
dc.relation.ispartofConferenceCode 97663
dc.relation.ispartofConferenceDate 10 June 2013 through 14 June 2013
dc.relation.ispartofConferenceHosting
dc.relation.ispartofConferenceLoc Eugene, OR
dc.relation.ispartofConferenceSponsor ACM SIGARCH
dc.relation.ispartofConferenceTitle 27th ACM International Conference on Supercomputing, ICS 2013
dc.relation.ispartofFundingAgency
dc.relation.ispartOfISOAbbr
dc.relation.ispartOfIssue
dc.relation.ispartOfPart
dc.relation.ispartofPubTitle Proceedings of the International Conference on Supercomputing
dc.relation.ispartofPubTitleAbbr Proc Int Conf Supercomputing
dc.relation.ispartOfSpecialIssue
dc.relation.ispartOfSuppl
dc.relation.ispartOfVolume
dc.source.ID
dc.type.publication Series
dc.subject.otherAuthKeyword continual flow pipelines
dc.subject.otherAuthKeyword instruction level parallelism
dc.subject.otherAuthKeyword latency tolerant processors
dc.subject.otherAuthKeyword superscalar processors
dc.subject.otherAuthKeyword virtual register renaming
dc.subject.otherChemCAS
dc.subject.otherIndex Continual flow pipelines
dc.subject.otherIndex Design optimization
dc.subject.otherIndex Instruction level parallelism
dc.subject.otherIndex Instruction windows
dc.subject.otherIndex Micro architectures
dc.subject.otherIndex Performance degradation
dc.subject.otherIndex Register renaming
dc.subject.otherIndex Superscalar Processor
dc.subject.otherIndex Benchmarking
dc.subject.otherIndex Cache memory
dc.subject.otherIndex Computer architecture
dc.subject.otherIndex Pipelines
dc.subject.otherIndex Program processors
dc.subject.otherIndex Pipeline processing systems
dc.subject.otherKeywordPlus
dc.subject.otherWOS
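
The mechanism summarized in the abstract (checkpoint at the miss, defer miss-dependent instructions to a waiting buffer, keep executing miss-independent ones, then replay and merge) can be illustrated with a small functional sketch. This is a toy model written for this record, not the authors' design: the function name `run_cfp`, the tuple format for instructions, and the "poisoned register" bookkeeping are all assumptions made for illustration. It does not model timing, rollback to the checkpoint, memory ordering, or the virtual register renaming substrate proposed in the paper.

```python
def run_cfp(program, regs, miss_value):
    """Toy functional model of continual-flow execution (illustrative only).

    program: list of (dest, srcs, fn). An entry with fn=None stands for
             the load that misses the cache; its value arrives later.
    regs:    initial register values, e.g. {"r0": 1}.
    """
    checkpoint = dict(regs)   # state saved at the miss (rollback is not modeled)
    regs = dict(regs)
    poisoned = {}             # register -> index of its deferred producer
    waiting = []              # waiting buffer of deferred instructions

    for i, (dest, srcs, fn) in enumerate(program):
        if fn is None or any(s in poisoned for s in srcs):
            # Miss-dependent: capture ready operands by value and refer to
            # pending operands by the index of the instruction producing them.
            operands = [("wait", poisoned[s]) if s in poisoned else ("val", regs[s])
                        for s in srcs]
            waiting.append((i, dest, operands, fn))
            poisoned[dest] = i
        else:
            # Miss-independent: execute normally and update register state.
            regs[dest] = fn(*(regs[s] for s in srcs))
            poisoned.pop(dest, None)  # a younger independent write wins

    # Miss data returns: replay the waiting buffer in program order.
    results = {}
    for i, dest, operands, fn in waiting:
        if fn is None:
            results[i] = miss_value
        else:
            results[i] = fn(*(results[x] if kind == "wait" else x
                              for kind, x in operands))

    # Merge: only registers whose latest writer was deferred take replayed values.
    for reg, producer in poisoned.items():
        regs[reg] = results[producer]
    return regs, checkpoint


def add(a, b):
    return a + b


program = [
    ("r1", (), None),          # load that misses; data (10) returns later
    ("r2", ("r1", "r0"), add), # miss-dependent
    ("r3", ("r0", "r0"), add), # miss-independent
    ("r0", ("r3", "r3"), add), # miss-independent, overwrites r0
    ("r4", ("r2", "r0"), add), # miss-dependent, reads the new r0
    ("r2", ("r3", "r3"), add), # miss-independent, supersedes the pending r2
]
final, saved = run_cfp(program, {"r0": 1}, miss_value=10)
print(final)  # r0=4, r1=10, r2=4, r3=2, r4=15, same as in-order execution
```

The sketch captures ready operands by value when an instruction is deferred, so that later miss-independent writes to the same registers cannot corrupt the replay. The final merge step is what keeps a younger independent write from being overwritten by an older deferred result.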

