dc.contributor.author |
Jothi K. |
dc.contributor.author |
Akkary H. |
dc.contributor.editor |
|
dc.date |
2013 |
dc.date.accessioned |
2017-09-07T07:08:19Z |
dc.date.available |
2017-09-07T07:08:19Z |
dc.date.issued |
2013 |
dc.identifier |
10.1145/2464996.2465011 |
dc.identifier.isbn |
9.7814503213e+012 |
dc.identifier.issn |
|
dc.identifier.uri |
http://hdl.handle.net/10938/11865 |
dc.description.abstract |
Continual Flow Pipelines (CFP) allows a processor core to process instruction windows of hundreds of instructions without increasing cycle-critical pipeline resources. When a load misses the data cache, CFP checkpoints the processor register state and then moves all miss-dependent instructions into a low-complexity, non-critical waiting buffer to unblock the pipeline. Meanwhile, miss-independent instructions execute normally and update the processor state. When the miss data returns, CFP replays the miss-dependent instructions from the waiting buffer and then merges the miss-dependent and miss-independent execution results. CFP was initially proposed for cache misses to DRAM. Later work focused on reducing the execution overhead of CFP by avoiding flushing the pipeline before replaying miss-dependent instructions, and on allowing these instructions to execute concurrently with miss-independent instructions. The goal of these improvements was to gain performance by applying CFP to L1 data cache misses that hit the last-level on-chip cache. However, many applications or execution phases of applications incur an excessive amount of replay and/or rollbacks to the checkpoint. This frequently cancels any benefits from CFP or even causes performance degradation. In this paper, we improve the CFP architecture by using a novel virtual register renaming substrate, and by tuning the replay policies to mitigate excessive replays and rollbacks to the checkpoint. We describe these new design optimizations and show, using SPEC 2006 benchmarks and microarchitecture performance and power models of our design, that our Tuned CFP architecture improves performance and power consumption over previous CFP architectures by ∼15% and 9%, respectively. © 2013 ACM. |
dc.format.extent |
|
dc.format.extent |
Pages: (243-252) |
dc.language |
English |
dc.relation.ispartof |
Publication Name: Proceedings of the International Conference on Supercomputing; Conference Title: 27th ACM International Conference on Supercomputing, ICS 2013; Conference Date: 10 June 2013 through 14 June 2013; Conference Location: Eugene, OR; Publication Year: 2013; Pages: (243-252); |
dc.relation.ispartofseries |
|
dc.relation.uri |
|
dc.source |
Scopus |
dc.subject.other |
|
dc.title |
Tuning the continual flow pipeline architecture |
dc.type |
Conference Paper |
dc.contributor.affiliation |
Jothi, K., Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon |
dc.contributor.affiliation |
Akkary, H., Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon |
dc.contributor.authorAddress |
Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon |
dc.contributor.authorCorporate |
University: American University of Beirut; Faculty: Faculty of Engineering and Architecture; Department: Electrical and Computer Engineering; |
dc.contributor.authorDepartment |
Electrical and Computer Engineering |
dc.contributor.authorDivision |
|
dc.contributor.authorEmail |
|
dc.contributor.faculty |
Faculty of Engineering and Architecture |
dc.contributor.authorInitials |
|
dc.contributor.authorOrcidID |
|
dc.contributor.authorReprintAddress |
|
dc.contributor.authorResearcherID |
|
dc.contributor.authorUniversity |
American University of Beirut |
dc.description.cited |
|
dc.description.citedCount |
1 |
dc.description.citedTotWOSCount |
|
dc.description.citedWOSCount |
|
dc.format.extentCount |
10 |
dc.identifier.articleNo |
|
dc.identifier.coden |
|
dc.identifier.pubmedID |
|
dc.identifier.scopusID |
84879818352 |
dc.identifier.url |
|
dc.publisher.address |
|
dc.relation.ispartofConference |
Conference Title: 27th ACM International Conference on Supercomputing, ICS 2013; Conference Date: 10 June 2013 through 14 June 2013; Conference Location: Eugene, OR. |
dc.relation.ispartofConferenceCode |
97663 |
dc.relation.ispartofConferenceDate |
10 June 2013 through 14 June 2013 |
dc.relation.ispartofConferenceHosting |
|
dc.relation.ispartofConferenceLoc |
Eugene, OR |
dc.relation.ispartofConferenceSponsor |
ACM SIGARCH |
dc.relation.ispartofConferenceTitle |
27th ACM International Conference on Supercomputing, ICS 2013 |
dc.relation.ispartofFundingAgency |
|
dc.relation.ispartOfISOAbbr |
|
dc.relation.ispartOfIssue |
|
dc.relation.ispartOfPart |
|
dc.relation.ispartofPubTitle |
Proceedings of the International Conference on Supercomputing |
dc.relation.ispartofPubTitleAbbr |
Proc Int Conf Supercomputing |
dc.relation.ispartOfSpecialIssue |
|
dc.relation.ispartOfSuppl |
|
dc.relation.ispartOfVolume |
|
dc.source.ID |
|
dc.type.publication |
Series |
dc.subject.otherAuthKeyword |
continual flow pipelines |
dc.subject.otherAuthKeyword |
instruction level parallelism |
dc.subject.otherAuthKeyword |
latency tolerant processors |
dc.subject.otherAuthKeyword |
superscalar processors |
dc.subject.otherAuthKeyword |
virtual register renaming |
dc.subject.otherChemCAS |
|
dc.subject.otherIndex |
Continual flow pipelines |
dc.subject.otherIndex |
Design optimization |
dc.subject.otherIndex |
Instruction level parallelism |
dc.subject.otherIndex |
Instruction windows |
dc.subject.otherIndex |
Micro architectures |
dc.subject.otherIndex |
Performance degradation |
dc.subject.otherIndex |
Register renaming |
dc.subject.otherIndex |
Superscalar Processor |
dc.subject.otherIndex |
Benchmarking |
dc.subject.otherIndex |
Cache memory |
dc.subject.otherIndex |
Computer architecture |
dc.subject.otherIndex |
Pipelines |
dc.subject.otherIndex |
Program processors |
dc.subject.otherIndex |
Pipeline processing systems |
dc.subject.otherKeywordPlus |
|
dc.subject.otherWOS |
|