SoC FPGA Main Memory Performance - Altera

Architecture Brief

Introduction

A cursory look at memory specifications can conceal the whole story of how the memory will perform in an SoC FPGA-based system. To realize maximum efficiency, with its benefits for performance, operation, and power consumption, it is important to check the measured memory performance, not just the bus specifications. This Architecture Brief looks at memory performance considerations when selecting an SoC FPGA for a design project. Key aspects of this Architecture Brief are highlighted in an online video, “System Performance: How smart is your memory controller?”, available at www.altera.com/socarchitecture.

Top Level Specs

When selecting an SoC FPGA, one would typically assume that the memory bus speed dominates the realized system memory performance (see Table 1).

Table 1: External Memory Controller Support Comparison

Function/Feature | Altera SoC FPGA | Vendor B | Vendor C
Hardened External Memory Controller for Processor System | Yes | Yes | Yes
Maximum Supported Address Space | 4 GB | 1 GB | 4 GB
Memory Types Supported | LPDDR2, DDR2, DDR3L, DDR3 | LPDDR2, DDR2, DDR3L, DDR3 | LPDDR, DDR2, DDR3
Data Width Configuration Modes | x8, x16, x16+ECC, x32, x32+ECC | x16, x16+ECC, x32 | x8, x8+ECC, x16, x16+ECC, x32, x32+ECC
Integrated ECC Support | 16-bit, 32-bit | 16-bit | 8-bit, 16-bit, 32-bit
External Memory Bus Maximum Frequency | 400 MHz (Cyclone V SoC), 533 MHz (Arria V SoC) | 533 MHz | 333 MHz

Memory Controller Intelligence

However, other factors, namely how intelligently memory data transfers are prioritized, scheduled, and processed, can significantly affect overall memory performance. Altera SoC FPGAs use Altera’s third-generation memory controller technology, which includes advanced features for scheduling, bank management, command and data reordering, and more.
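To make "command reordering" and "bank management" concrete, here is a minimal sketch of one well-known reordering policy, first-ready first-come-first-served (FR-FCFS): among pending requests, serve the oldest one that hits a row already open in its bank, and otherwise serve the oldest request. This is a generic textbook technique, not a description of Altera's controller. The `Request` type and all function names are hypothetical, and a real controller also enforces DRAM timing, refresh, read/write turnaround, and starvation limits, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # arrival order; lower values are older
    bank: int     # DRAM bank targeted by the request
    row: int      # row within that bank


def pick_next(pending: list[Request], open_rows: dict[int, int]) -> Request:
    """First-ready FCFS: prefer the oldest row hit, else the oldest request."""
    row_hits = [r for r in pending if open_rows.get(r.bank) == r.row]
    candidates = row_hits if row_hits else pending
    return min(candidates, key=lambda r: r.arrival)


def reorder(queue: list[Request]) -> list[Request]:
    """Return the service order chosen by pick_next, tracking each bank's open row."""
    pending = list(queue)
    open_rows: dict[int, int] = {}
    order = []
    while pending:
        req = pick_next(pending, open_rows)
        pending.remove(req)
        open_rows[req.bank] = req.row  # serving a request leaves its row open
        order.append(req)
    return order


def row_activations(order: list[Request]) -> int:
    """Count the row activations (ACTIVATE commands) a service order needs."""
    open_rows: dict[int, int] = {}
    activations = 0
    for req in order:
        if open_rows.get(req.bank) != req.row:
            activations += 1
            open_rows[req.bank] = req.row
    return activations


queue = [Request(0, 0, 5), Request(1, 0, 7), Request(2, 0, 5), Request(3, 1, 2)]
print("in order: ", row_activations(queue))           # 4 activations
print("reordered:", row_activations(reorder(queue)))  # 3 activations
```

Serving request 2 right after request 0 reuses the already-open row 5, saving one activation (and the precharge that a row conflict would require) compared with strict arrival order.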

Figure 1: Altera Memory Controller Intelligence (memory controller features in Altera SoCs, built up over three generations, GEN 1 through GEN 3: deficit weighted round robin scheduling, bank management (+hint), user-supplied profiles, TrustZone security, round robin scheduler, command and data reordering, priority management, power management, simple scheduler, bank management, active refresh)

Memory Performance Case Study: LMbench

To illustrate the impact of memory controller intelligence on system memory performance, consider two SoC FPGA devices with different memory bus speeds (shown in Figure 2). The one on the left is the Altera Cyclone V SoC FPGA; the one on the right is an SoC FPGA from “Vendor B”. Both have a dual-core ARM Cortex-A9 processor running at the same frequency of 667 MHz. However, one uses external memory operating at 400 MHz, while the other uses external memory running at 533 MHz. Which one would you expect to deliver better system memory performance? Based on bus frequency alone (533/400 ≈ 1.33), one would expect the system with 533 MHz memory to perform about 33% better. However, factors in the memory controller architecture produce noticeably different results.
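The 33% expectation is simply the ratio of the two memory clocks. The sketch below makes that arithmetic explicit by computing the theoretical peak bandwidth of each DDR interface (two transfers per clock). The 32-bit bus width is an assumption, since the case study does not state it; the ratio between the two devices is the same for any width, as long as both use the same one.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak of a DDR interface: two transfers per clock cycle."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 2 * bytes_per_transfer  # MHz x bytes = MB/s (10**6 bytes)


BUS_WIDTH_BITS = 32  # assumption: the brief does not state the width used

altera = peak_bandwidth_mb_s(400, BUS_WIDTH_BITS)
vendor_b = peak_bandwidth_mb_s(533, BUS_WIDTH_BITS)
print(f"Altera SoC FPGA peak: {altera:.0f} MB/s")    # 3200 MB/s
print(f"Vendor B peak:        {vendor_b:.0f} MB/s")  # 4264 MB/s
print(f"Expected Vendor B advantage: {vendor_b / altera - 1:.0%}")  # 33%
```

Peak figures like these are exactly the "bus specification" view; the measured results that follow show how far delivered bandwidth can depart from them.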

Figure 2: SoC FPGA Memory Performance Comparison (A: Altera SoC FPGA with a 667 MHz CPU, hard memory controller, FPGA logic, and 400 MHz DDR memory; B: Vendor B SoC FPGA with a 667 MHz CPU, hard memory controller, FPGA logic, and 533 MHz DDR memory)

LMbench, an industry-standard system performance benchmark (www.bitmover.com/lmbench) well known for exercising memory system performance, helps quantify and compare the results. LMbench (version 3) consists of several different read/write test cases. Figure 3 shows the results for the partial read/write case, because it is the most indicative of transfers in a typical embedded system.

Figure 3: LMbench Partial Read/Write Memory Bandwidth Test Demonstrates Benefits of Advanced Controller (memory bandwidth in MB/s, 0 to 5,000, higher is better, vs. transfer size in bytes, 512 to 64 M; curves for the Altera SoC FPGA and Vendor B, both with a 667 MHz CPU. Annotation: “With its more sophisticated memory controller, a 400-MHz DDR3 memory interface on an Altera SoC FPGA outperforms a 533-MHz DDR3 memory interface on a competing device.”)

In Figure 3, the vertical axis shows memory bandwidth (higher is better) and the horizontal axis shows data transfer size. The curve can be grouped into three stages as the data size moves from the L1 cache (32 KB data + 32 KB instruction) to the L2 cache (512 KB shared) to external memory. Note that the Altera SoC FPGA significantly outperforms the Vendor B SoC FPGA in the L1 and L2 cache regions.

As discussed earlier, one would expect that once transfers reach external memory (>512 KB on the curve), the Vendor B solution would outperform the Altera SoC FPGA because of its 533 MHz external memory bus, compared with the 400 MHz memory bus of the Altera Cyclone V SoC FPGA. This is not the case: the Altera SoC FPGA exhibits comparable or better performance even when accessing main memory at transfer sizes above 1 MB. These results are due to the L1/L2 cache structure and the external memory controller intelligence of the Altera SoC FPGA. Grouping the data into small (512 bytes to 16 KB), medium (16 KB to 1 MB), and large (2 MB and above) transfer sizes, as shown in Figure 4, provides a numerical analysis of the three regions of the curve.
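The sketch below maps each transfer size on the Figure 3 axis to the cache level expected to serve it and to the Figure 4 size group. It uses the cache sizes stated above and assumes an idealized hierarchy in which a buffer stays in a cache level if it fits; real boundaries are softer because code, page tables, and the other core also occupy the caches. Assigning exactly 16 KB to the "small" group is an assumption, since the grouping lists 16 KB in two ranges. The largest swept size, 64 MB (67,108,864 bytes), is presumably where Figure 4's "67 MB" upper bound comes from.

```python
KB = 1024
MB = 1024 * KB

L1_DATA_BYTES = 32 * KB  # L1 data cache size stated in the brief
L2_BYTES = 512 * KB      # shared L2 cache size stated in the brief


def cache_region(transfer_bytes: int) -> str:
    """Level of the hierarchy expected to hold a buffer of this size (idealized)."""
    if transfer_bytes <= L1_DATA_BYTES:
        return "L1"
    if transfer_bytes <= L2_BYTES:
        return "L2"
    return "external memory"


def size_group(transfer_bytes: int) -> str:
    """Figure 4 grouping; putting exactly 16 KB in 'small' is an assumption."""
    if transfer_bytes <= 16 * KB:
        return "small"
    if transfer_bytes <= 1 * MB:
        return "medium"
    return "large"


# Transfer sizes on the Figure 3 axis: 512 bytes, doubling up to 64 MB.
sizes = [512 * 2**i for i in range(18)]
for size in sizes:
    print(f"{size:>10} bytes  {cache_region(size):<16} {size_group(size)}")
```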

Figure 4: LMbench Memory Bandwidth Difference Grouped by Data Transfer Size (benchmark: LMbench, partial memory read/write access; Altera SoC FPGA with a 667 MHz CPU and 400 MHz memory device vs. SoC FPGA B with a 667 MHz CPU and 533 MHz memory device. Memory bandwidth increase of the Altera SoC FPGA over SoC FPGA B: 17.03% for 512 bytes to 16 KB, 6.60% for 16 KB to 1 MB, and 6.28% for 2 MB to 67 MB)
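One way to reduce two per-size bandwidth curves, such as those in Figure 3, to per-group percentages like those in Figure 4 is sketched below. The brief does not say how its percentages were aggregated; averaging the per-size ratios within each group is an assumption, and the function names are hypothetical. The inputs would be the MB/s values from two LMbench runs over the same transfer sizes.

```python
from statistics import mean

KB = 1024
MB = 1024 * KB


def size_group(transfer_bytes: int) -> str:
    """Figure 4 grouping (assumption: exactly 16 KB counts as 'small')."""
    if transfer_bytes <= 16 * KB:
        return "small"
    if transfer_bytes <= 1 * MB:
        return "medium"
    return "large"


def bandwidth_increase_by_group(
    bw_a: dict[int, float], bw_b: dict[int, float]
) -> dict[str, float]:
    """Percent bandwidth advantage of device A over device B, per size group.

    bw_a and bw_b map transfer size in bytes to measured MB/s for the same
    sizes. Averaging per-size ratios is an assumption; the brief does not say
    how its percentages were aggregated.
    """
    ratios: dict[str, list[float]] = {}
    for size, a in bw_a.items():
        ratios.setdefault(size_group(size), []).append(a / bw_b[size])
    return {group: (mean(r) - 1) * 100 for group, r in ratios.items()}
```

Because the values being averaged are ratios, `statistics.geometric_mean` is a reasonable alternative to `mean`; the two give nearly identical results when the per-size ratios are close to each other.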

Across small, medium, and large memory accesses, the Altera SoC FPGA, with its more effective cache structure and more advanced memory controller, extracts up to 17% more memory bandwidth despite a slower external memory bus frequency. These results demonstrate that when comparing SoC FPGAs, it is important to check the measured memory system performance, not just the memory bus specifications. Memory controller algorithms extract maximum bandwidth by managing transaction priority, reordering commands and data, and scheduling pending transactions using, for example, deficit weighted round robin algorithms. Software can extract additional performance by customizing the memory controller for the system’s data profile: setting priorities, assigning ports or transaction channels, and sharing bandwidth between them.
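Deficit weighted round robin, mentioned above, can be sketched as follows. Each port earns a per-round credit (its quantum, which acts as its weight) and is served while its head-of-line request fits within its accumulated credit. This is the generic algorithm, not Altera's implementation. The port names, quanta, and request sizes are hypothetical and chosen only to show how weights translate into bandwidth shares.

```python
from collections import deque


def dwrr(
    queues: dict[str, deque], quanta: dict[str, int], max_rounds: int
) -> list[tuple[str, int]]:
    """Deficit weighted round robin over per-port queues of request sizes (bytes).

    quanta[port] is the credit in bytes a port earns per round; its relative
    size sets the port's share of bandwidth. Queues are consumed in place.
    Returns the service order as (port, size) pairs.
    """
    deficit = {port: 0 for port in queues}
    served: list[tuple[str, int]] = []
    for _ in range(max_rounds):
        if not any(queues.values()):
            break  # all ports idle
        for port, queue in queues.items():
            if not queue:
                continue
            deficit[port] += quanta[port]
            # Serve head-of-line requests while the port has enough credit.
            while queue and queue[0] <= deficit[port]:
                size = queue.popleft()
                deficit[port] -= size
                served.append((port, size))
            if not queue:
                deficit[port] = 0  # an emptied port does not keep unused credit


    return served


# Hypothetical ports: the CPU gets twice the FPGA fabric's weight.
ports = {"cpu": deque([64] * 6), "fpga": deque([64] * 6)}
order = dwrr(ports, quanta={"cpu": 128, "fpga": 64}, max_rounds=3)
print(order[:3])  # [('cpu', 64), ('cpu', 64), ('fpga', 64)]
```

After three rounds the CPU port has been served six 64-byte requests and the FPGA port three, a 2:1 split that matches the quanta. The deficit counter is what keeps the split fair when request sizes differ: a port that cannot afford its next request carries its credit into the next round.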

Conclusion

Main memory selection is another example of where architecture matters. Today’s memory controllers can use sophisticated algorithms to maximize system memory efficiency. A superior memory controller extracts more bandwidth from system memory, which lets the memory run at a lower frequency for the same throughput, saving system power and benefiting the whole system design.
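The trade-off between controller efficiency and memory clock can be expressed as a short calculation. In the sketch below, efficiency is the fraction of theoretical DDR peak bandwidth that the controller actually delivers. The 2,000 MB/s target, 32-bit width, and efficiency values are illustrative assumptions, not measurements from this brief.

```python
def required_clock_mhz(target_mb_s: float, bus_width_bits: int, efficiency: float) -> float:
    """DDR clock needed to sustain a target throughput.

    efficiency is the fraction of theoretical peak the controller actually
    delivers (0 < efficiency <= 1); peak is two transfers per clock.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    peak_mb_s_per_mhz = 2 * bus_width_bits / 8
    return target_mb_s / (peak_mb_s_per_mhz * efficiency)


# Illustrative values only, not measurements from this brief.
for eff in (0.5, 0.625):
    print(f"efficiency {eff:.1%}: {required_clock_mhz(2000, 32, eff):.0f} MHz")
# prints 500 MHz for 50.0%, then 400 MHz for 62.5%
```

Under these assumptions, raising delivered efficiency from 50% to 62.5% lets the same throughput come from a 400 MHz interface instead of a 500 MHz one.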

Want to Learn More?

For a more in-depth explanation of the Altera SoC FPGA architecture and the LMbench performance results, watch the EE Journal Chalk Talk entitled “Architecture Matters: Three Architectural Insights for SoC FPGAs.” For more details on the Altera Cyclone V SoC FPGA memory controller architecture and settings, consult the SDRAM Controller section of the Cyclone V Device Handbook, Volume 3: Hard Processor System Technical Reference Manual.

Altera Corporation
101 Innovation Drive, San Jose, CA 95134 USA
www.altera.com

Altera European Headquarters
Holmers Farm Way, High Wycombe, Buckinghamshire HP12 4XF, United Kingdom
Telephone: (44) 1494 602000

Altera Japan Ltd.
Shinjuku i-Land Tower 32F, 6-5-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo 163-1332, Japan
Telephone: (81) 3 3340 9480
www.altera.co.jp

Altera International Ltd.
Unit 11-18, 9/F, Millennium City 1, Tower 1, 388 Kwun Tong Road, Kwun Tong, Kowloon, Hong Kong
Telephone: (852) 2 945 7000
www.altera.com.cn

Copyright © 2014 Altera Corporation. All rights reserved. Altera, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, mask work rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. November 2014 SS-01243