ZYNQ Architecture#
The overall architecture of Zynq consists of two parts: PS (Processing System) and PL (Programmable Logic). The power circuits for these two parts are independent of each other, allowing PS and PL to be used separately, with unused parts powered down to reduce power consumption. However, the most valuable mode of Zynq is when the two components are combined for use.
PS (Processing System)#
As the foundation of the processing system, every Zynq-7000 device contains an ARM Cortex-A9 processor (dual-core on most parts, single-core on the Zynq-7000S devices). This is a "hard" processor: a dedicated, optimized piece of silicon on the chip.
The alternative to a "hard" processor is a "soft" processor such as MicroBlaze, which is built from PL resources and is in effect just another IP block on the PL side. A hard processor achieves relatively high performance, whereas the number and exact implementation of soft processors are flexible.
It is worth mentioning that one or more MicroBlaze soft processors can be allocated on the PL side of Zynq to work in coordination with the hard core. For example, the soft core can be responsible for coordinating specific low-level functions and the interaction between systems, offloading less demanding tasks from the hard core to improve overall performance.
The PS side of Zynq contains not only the ARM processor but also a set of related processing resources that together form the Application Processing Unit (APU), along with extended peripheral interfaces, cache memory, memory interfaces, interconnect interfaces, and clock generation circuits.
PL (Programmable Logic)#
The logic part of Zynq-7000 is based on Artix-7 or Kintex-7 FPGA fabric, depending on the device.
Logic Part#
- Configurable Logic Block (CLB): A CLB is a small, regular grouping of logic elements. CLBs are arranged in a two-dimensional array in the PL and connected to other similar resources through programmable interconnect. Each CLB contains two slices and sits next to a switch matrix.
- Slice: A sub-unit of a CLB that contains the resources for implementing combinational and sequential logic circuits.
- Lookup Table (LUT): A flexible resource that can implement
  - Logic functions with up to 6 inputs
  - A small Read-Only Memory (ROM)
  - A small Random Access Memory (RAM)
  - A shift register

  LUTs can be combined as needed to form larger logic functions, memories, or shift registers.
- Flip-flop (FF): A sequential element that implements a one-bit register with a reset function. It can also be configured as a latch.
- Switch Matrix: Each CLB sits next to a switch matrix, which provides flexible routing to connect elements within the CLB or to connect the CLB to other resources in the PL.
- Carry Logic: Arithmetic circuits need to pass intermediate signals (carries) between adjacent slices, which is done through the carry logic.
- Input/Output Block (IOB): An IOB interfaces the PL logic resources to the physical device "pads" that connect to external circuits. Each IOB handles one bit of input or output signal, and IOBs are generally located around the perimeter of the chip.
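To make the LUT entry above more concrete, here is a minimal Python sketch of a 6-input LUT treated as a 64-entry truth table. The function names `make_lut6` and `lut6_eval` are made up for illustration, and the convention that input 0 is the least significant address bit is an assumption of this sketch, not something stated in the text.

```python
def make_lut6(func):
    """Build the 64-bit truth table of a 6-input LUT from a Python function."""
    init = 0
    for address in range(64):
        # Unpack the address into six input bits, input 0 being the LSB.
        inputs = [(address >> bit) & 1 for bit in range(6)]
        if func(*inputs):
            init |= 1 << address
    return init


def lut6_eval(init, inputs):
    """Return the stored bit selected by the six input bits."""
    address = sum(bit << position for position, bit in enumerate(inputs))
    return (init >> address) & 1


# Example: a 6-input XOR (parity) function stored as a truth table.
parity_init = make_lut6(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
print(hex(parity_init))
print(lut6_eval(parity_init, [1, 0, 0, 0, 0, 0]))
```

The same 64 stored bits can be read as a 64 x 1 ROM, which is why one resource can act as either logic or a small memory.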
Special Resources: DSP48E1 and BRAM#
These two resources are integrated in columns within the logic array, embedded in the logic part, and are close to each other, as intensive computation and storing data in memory are often closely related operations.
BRAM#
The BRAM in Zynq-7000 is the same as the BRAM in other Xilinx 7 series FPGAs, capable of implementing RAM, ROM, and FIFO, while also supporting error correction coding.
Each BRAM can store up to 36 Kb (kilobits) of information and can be configured as one 36 Kb RAM or two independent 18 Kb RAMs. It can also be "reshaped" to hold more, narrower words, or combined with other BRAMs to form a larger RAM.
Using BRAM means that a large amount of data can be stored in a dedicated, optimized storage block inside the chip that occupies very little physical space. The alternative is Distributed RAM (abbreviated DRAM here, not to be confused with dynamic RAM), which is built from LUTs in the logic part. Creating a memory comparable in size to a BRAM takes a large number of LUTs, and the resulting implementation suffers from increased logic and routing delays, which can limit timing performance. On the other hand, distributed RAM is a good fit for small memories, since it uses resources more efficiently and can be placed more flexibly. BRAM can often operate at the highest clock frequency supported by the chip.
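As a rough illustration of this "reshaping", the sketch below lists the depth of a single 36 Kb block at several port widths. It assumes the usual 7 series layout of 32 Kb of data plus 4 Kb of parity bits, with the parity bits reachable only at widths of 9 and above; the function name `bram36_depth` is hypothetical, and the 72-bit width is, to my knowledge, only available in simple dual-port mode.

```python
DATA_BITS = 32 * 1024    # data bits in one 36 Kb block
PARITY_BITS = 4 * 1024   # extra bits, usable only at widths of 9 and above


def bram36_depth(width):
    """Return the depth of one 36 Kb block RAM at the given port width."""
    if width in (1, 2, 4):
        # Narrow ports cannot reach the parity bits.
        return DATA_BITS // width
    if width in (9, 18, 36, 72):
        # Each 9-bit group is 8 data bits plus 1 parity bit.
        return (DATA_BITS + PARITY_BITS) // width
    raise ValueError(f"unsupported port width: {width}")


shapes = {w: bram36_depth(w) for w in (1, 2, 4, 9, 18, 36, 72)}
for width, depth in shapes.items():
    print(f"{depth:>6} x {width}")
```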
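The trade-off between the two memory types can be estimated with a few lines of arithmetic. The sketch below assumes that one 6-input LUT used as memory holds 64 x 1 bits and ignores the extra LUTs needed for output multiplexing, so the result is only a lower bound; `luts_for_memory` is an illustrative name.

```python
import math

LUT_RAM_BITS = 64  # assumed capacity of one 6-input LUT used as memory


def luts_for_memory(depth, width):
    """Lower bound on LUTs needed for a depth x width single-port memory."""
    luts_per_column = math.ceil(depth / LUT_RAM_BITS)
    return luts_per_column * width


small = luts_for_memory(32, 8)      # a small, register-file-sized memory
large = luts_for_memory(1024, 36)   # same shape as one 36 Kb block RAM
print(small, large)
```

Under these assumptions a small 32 x 8 memory costs only a handful of LUTs, while matching a single block RAM costs several hundred before any multiplexing is counted.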
DSP48E1#
The LUTs in the logic part can be used to implement arithmetic operations of arbitrary length, but since long-word arithmetic circuits occupy a large space within the logic slices, such layouts and routing can lead to suboptimal clock frequencies. Therefore, it is better to use LUTs for short-word operations.
DSP48E1 is a dedicated silicon resource designed specifically for high-speed arithmetic on long-word signals. Each DSP48E1 contains a pre-adder/subtractor, a multiplier, and a post-adder/subtractor.
The post-adder can also be used as a logic unit, allowing it to perform logical operations and supporting all basic Boolean operations.
If larger word-length operations are needed, multiple DSPs can be combined for expansion.
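The pre-adder, multiplier, and post-adder chain can be sketched as a small behavioral model. This is a simplification under stated assumptions: it uses the commonly documented DSP48E1 word lengths (25-bit pre-adder result, 25 x 18 signed multiplier, 48-bit post-adder), ignores pipelining and the many operating modes, and the names `wrap_signed` and `dsp48e1_model` are made up for this example.

```python
def wrap_signed(value, bits):
    """Wrap an integer into a signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp48e1_model(a, b, c, d):
    """Simplified P = (A + D) * B + C with DSP48E1-like word lengths."""
    pre = wrap_signed(a + d, 25)          # 25-bit pre-adder result
    product = pre * wrap_signed(b, 18)    # 25 x 18 signed multiply
    return wrap_signed(product + c, 48)   # 48-bit post-adder / accumulator


print(dsp48e1_model(3, 4, 5, 2))
```

Feeding the result back in as `c` on each call models a multiply-accumulate loop, which is the typical use in filters.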
General Input/Output#
The general input/output functionality on Zynq is collectively referred to as SelectIO resources. These are organized into banks of 50 IOBs, each IOB having a pad for connecting to the outside world.
The I/O groups are divided into High Performance (HP) or High Range (HR). The HP interface has a maximum voltage of 1.8V and is typically used for high-speed interfaces connecting to memory and other chips; the HR interface allows for a voltage of 3.3V, suitable for connecting various IO standards. Both interfaces support single-ended and differential signals.
Each IOB also contains an IOSERDES resource, which can perform programmable conversions between parallel and serial formats, with a data bit width of 2 to 8 bits.
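The parallel/serial conversion can be illustrated with a small software model. The bit order (least significant bit first) is an assumption of this sketch, and `serialize` and `deserialize` are illustrative names rather than anything from the Xilinx tools.

```python
def serialize(words, width):
    """Flatten parallel words into a bit stream, least significant bit first."""
    if not 2 <= width <= 8:
        raise ValueError("width must be between 2 and 8 bits")
    return [(word >> i) & 1 for word in words for i in range(width)]


def deserialize(bits, width):
    """Regroup a bit stream into parallel words of `width` bits."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        chunk = bits[start:start + width]
        words.append(sum(bit << i for i, bit in enumerate(chunk)))
    return words


stream = serialize([0b1011, 0b0100], 4)
print(stream)
print(deserialize(stream, 4))
```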
Communication Interfaces#
Some Zynq-7000 devices contain GTX transceivers and high-speed communication interface blocks embedded within the logic part; the smaller devices have no such transceivers.
Other Programmable Logic Expansion Interfaces#
- ADC: the XADC block, with two independent 12-bit ADCs, each with a sampling rate of 1 MSPS.
- Clock: the PL receives four independent clock inputs from the PS and can also generate and distribute its own clocks independently of the PS.
- JTAG Debug Interface
Interface Between PS and PL#
As mentioned earlier, the performance of Zynq relies not only on the characteristics of its two components, PS and PL, but also on the ability to integrate both into a complete, cohesive system. A set of highly customized AXI interconnects and interfaces plays a key role in bridging the two parts. Additionally, there are other types of connections between the PS and PL, particularly EMIO.
AXI#
AXI stands for Advanced eXtensible Interface. Zynq uses the fourth generation of the protocol, AXI4.
AXI4#
Used for memory-mapped connections. It offers the highest performance because a single address can be followed by a burst of up to 256 data words.
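As a sketch of what such a burst looks like, the function below computes the address of every data beat from the single start address of an incrementing burst. It assumes an aligned start address and applies the AXI rule that a burst must not cross a 4 KB boundary; `incr_burst_addresses` is a hypothetical helper, not part of any library.

```python
def incr_burst_addresses(start, beats, bytes_per_beat):
    """Addresses of each beat in an incrementing burst from one start address."""
    if not 1 <= beats <= 256:
        raise ValueError("AXI4 INCR bursts carry 1 to 256 beats")
    end = start + beats * bytes_per_beat - 1
    if start // 4096 != end // 4096:
        raise ValueError("a burst must not cross a 4 KB boundary")
    return [start + n * bytes_per_beat for n in range(beats)]


addresses = incr_burst_addresses(0x1000, 4, 4)
print([hex(a) for a in addresses])
```

Only the first address travels on the address channel; the remaining ones are implied, which is where the efficiency comes from.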
AXI4-Lite#
A simplified connection that supports only one data transfer per transaction. AXI4-Lite is also memory-mapped: each transaction carries one address and a single data word.
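AXI4-Lite is typically used for control and status registers. The toy model below shows the "one address, one data word" idea from the software side. It assumes 32-bit registers, and both the class name `AxiLiteRegisters` and the base address `0x43C00000` are illustrative choices, not values taken from this article.

```python
class AxiLiteRegisters:
    """Toy model of a memory-mapped register block with 32-bit registers."""

    def __init__(self, base, count):
        self.base = base
        self.regs = [0] * count

    def _index(self, address):
        offset = address - self.base
        if offset % 4 or not 0 <= offset < 4 * len(self.regs):
            raise ValueError(f"bad register address: {address:#x}")
        return offset // 4

    def write(self, address, value):
        # One address and one data word per transaction.
        self.regs[self._index(address)] = value & 0xFFFFFFFF

    def read(self, address):
        return self.regs[self._index(address)]


BASE = 0x43C00000  # hypothetical base address of a PL peripheral
block = AxiLiteRegisters(BASE, count=4)
block.write(BASE + 4, 0x12345678)
print(hex(block.read(BASE + 4)))
```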
AXI4-Stream#
Used for high-speed streaming data, supporting bulk transfers of data of arbitrary size. There is no addressing mechanism, suitable for direct data flow between source and destination.
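Because there are no addresses, flow control on a stream relies on a valid/ready handshake: a data word moves only in a cycle where the source asserts TVALID and the sink asserts TREADY. The sketch below simulates this cycle by cycle, assuming the source keeps TVALID high whenever it has data left; `stream_transfer` is an illustrative name.

```python
def stream_transfer(data, ready_pattern):
    """Simulate a source driving TVALID/TDATA into a sink that toggles TREADY."""
    received = []
    pending = list(data)
    cycles = 0
    for ready in ready_pattern:
        if not pending:
            break
        cycles += 1
        # A beat moves only in a cycle where valid and ready are both high.
        if ready:
            received.append(pending.pop(0))
    return received, cycles


received, cycles = stream_transfer([1, 2, 3], [1, 0, 1, 1, 1])
print(received, cycles)
```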
EMIO Interface#
EMIO (Extended Multiplexed I/O) routes signals between the two domains, for example bringing PS peripheral signals into the PL. It is implemented as a set of simple wires.
EBAZ4205 Mining Board Information Summary#
Expansion Version#
ebaz4205 Expansion Board - Lichuang Open Source Hardware Platform (oshwhub.com)
Development Board Completion#
Learning ZYNQ from Scratch (Based on Mining Card EBAZ4205) (Part 1) - CSDN Blog
Schematic Related#
Mining Board Schematic
Elrori/EBAZ4205: EBAZ4205 BOARD (github.com)
PCB