HADP - Materials

"How to present research in a seminar" - Prof. Torsten Hoefler

----


Reading material

[1] Liang et al.: Floating point unit generation and evaluation for FPGAs, FCCM'03 (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1227254&tag=1)

[2] Kobori et al.: A Cellular Automata System with FPGA, FCCM'01 (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1420908)

[3] Sano et al.: Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth, TPDS'14 (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6470606)

[4] Zuo et al.: Improving Polyhedral Code Generation for High-Level Synthesis, CODES'13 (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6659002)

[5] Trimberger: Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology, Proc. of IEEE (http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7086413)

[6] Parashar et al.: Triggered Instructions: A Control Paradigm for Spatially-Programmed Architectures, ISCA'13 (http://dl.acm.org/citation.cfm?id=2485935)

[7] da Silva et al.: Performance modeling for FPGAs: extending the roofline model with high-level synthesis tools, IJRC'13 (http://dl.acm.org/citation.cfm?id=2610940)

[8] Canis et al.: LegUp: high-level synthesis for FPGA-based processor/accelerator systems (http://dl.acm.org/citation.cfm?id=1950423)

[9] DeHon: Fundamental Underpinnings of Reconfigurable Computing Architectures, Proc. IEEE (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7086421)

[10] Jun et al., "BlueDBM: An Appliance for Big Data Analytics," ISCA 2015. (http://livinglab.mit.edu/wp-content/uploads/2016/01/ISCA15_Sang-Woo_Jun.pdf)

[11] Zhu and Janapa Reddi, "WebCore: Architectural Support for Mobile Web Browsing," ISCA 2014. (http://3nity.io/~vj/downloads/publications/zhu14webcore.pdf)

[12] Ahn et al., "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," ISCA 2015. (https://users.ece.cmu.edu/~omutlu/pub/tesseract-pim-architecture-for-graph-processing_isca15.pdf)

[13] Yoon et al. (Samsung), "Biscuit: A Framework for Near-Data Processing of Big Data Workloads," ISCA 2016. (ISCA'16: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7551390; VLDB'16: http://www.vldb.org/pvldb/vol9/p924-jo.pdf)

[14] Putnam et al., "A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services," ISCA 2014. (http://dx.doi.org/10.1145/2678373.2665678)

[15] "BLAS Comparison on FPGA, CPU and GPU," IEEE Computer Society Symposium on VLSI, 2010. (https://www.microsoft.com/en-us/research/publication/blas-comparison-on-fpga-cpu-and-gpu/)

[16] "Deep Learning with Limited Numerical Precision," ICML 2015. (https://arxiv.org/pdf/1502.02551.pdf)

[17] "DaDianNao: A Machine-Learning Supercomputer," MICRO 2014. (http://ieeexplore.ieee.org/document/7011421/)

[18] "Fast Support Vector Machine Training and Classification on Graphics Processors," ICML 2008. (https://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-11.pdf)

[19] "cuDNN: Efficient Primitives for Deep Learning," arXiv. (http://arxiv.org/abs/1410.0759)

[20] "High-Throughput Transaction Executions on Graphics Processors," VLDB 2011. (http://www.vldb.org/pvldb/vol4/p314-he.pdf)

[21] Bojnordi and Ipek, "Memristive Boltzmann Machine: A Hardware Accelerator for Combinatorial Optimization and Deep Learning," HPCA 2016. (http://www.cs.rochester.edu/~ipek/hpca16.pdf)

[22] Shafiee et al., "ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars," ISCA 2016. (https://www.cs.utah.edu/~rajeev/pubs/isca16.pdf) Note: best read in conjunction with the DaDianNao paper [17].

[23] Shaw et al., "Anton, a special-purpose machine for molecular dynamics simulation," ISCA 2007. (http://dl.acm.org/citation.cfm?doid=1250662.1250664)

[24] Suleman et al., "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures," ASPLOS 2009. (https://users.ece.cmu.edu/~omutlu/pub/acs_asplos09.pdf)