Description

XPS_NPI_DMA is a high-performance direct memory access (DMA) engine that integrates into the Xilinx EDK environment (figure below). It is highly flexible because the MicroBlaze soft-core processor has full access to the XPS_NPI_DMA core functionality through nine 32-bit registers attached to the PLBv4.6 bus.

It enables high speed data streaming to (input port) and from (output port) an external memory attached to a Xilinx Multiport Memory Controller (MPMC).

The XPS_NPI_DMA and XPS_FX2 custom IP blocks are both necessary to connect the host computer's software to the TE USB FX2 module's DRAM through a USB connection. For MB Commands tests, see Example 4.

Features

  • The input and output port can run simultaneously.
  • The transfer on each port can have its own allocated memory space and can be looped. A looped transfer runs indefinitely over the allocated memory space (like a frame buffer) until stopped.
  • The data written to the input port can be stored linearly in memory.
  • The data coming out of the output port can be read in an X-Y pattern: the port can read data linearly or jump between memory locations to transpose the input data, which is used for rotating an image by 90 degrees. This is possible when using single beat transactions (word or double-word) for reading.

 

System integration block scheme

The XPS_NPI_DMA has 4 interfaces:

  • Xilinx PLBv4.6 created with IPIF wizard for access to 9 x 32-bit registers. These registers control the whole peripheral operation.
  • MPMC Native Port Interface (NPI) bus supporting 32 or 64-bit width. This bus is used for high-speed access to external memory.
  • Proprietary synchronous 32 or 64-bit wide DMA_IN bus used for data streaming to external memory. This port can stream data in blocks of up to a 64-word burst (64-bit only). The maximum sustainable bandwidth at 100 MHz NPI_Clk is 200 MB/s.
  • Proprietary synchronous 32 or 64-bit wide DMA_OUT bus used for data streaming from external memory. The bus width depends on the NPI bus width. This port can stream data in single beat transfers of one word (32-bit) or one double word (64-bit), or in 16, 32 and 64-word (64-bit only) bursts per transaction. The maximum sustainable bandwidth at 100 MHz NPI_Clk is approx. 300 MB/s at 32 bits and 600 MB/s at 64 bits.

 

Peripheral internal structure block scheme

XPS_NPI_DMA Core Design Parameters

Feature/Description | Parameter Name | Allowable Values | Default Value | VHDL Type
System Parameters
Target FPGA family | C_FAMILY | spartan3, spartan3e, spartan3a, spartan3adsp, spartan3an, virtex2p, virtex4, qvirtex4, qrvirtex4, virtex5 | virtex5 | string
PLB Parameters
PLB base address | C_BASEADDR | Valid Address | None | std_logic_vector
PLB high address | C_HIGHADDR | Valid Address | None | std_logic_vector
PLB least significant address bus width | C_SPLB_AWIDTH | 32 | 32 | integer
PLB data width | C_SPLB_DWIDTH | 32, 64, 128 | 32 | integer
Shared bus topology | C_SPLB_P2P | 0 = Shared bus topology | 0 | integer
PLB master ID bus width | C_SPLB_MID_WIDTH | log2(C_SPLB_NUM_MASTERS) with a minimum value of 1 | 1 | integer
Number of PLB masters | C_SPLB_NUM_MASTERS | 1 - 16 | 1 | integer
Width of the slave data bus | C_SPLB_NATIVE_DWIDTH | 32 | 32 | integer
Burst support | C_SPLB_SUPPORT_BURSTS | 0 = No burst support | 0 | integer
XPS_NPI_DMA Parameters
NPI bus data width | C_NPI_DATA_WIDTH | 32, 64 | 32 | integer
Byte swap input data | C_SWAP_INPUT | 0, 1 | 0 | integer
Byte swap output data | C_SWAP_OUTPUT | 0, 1 | 0 | integer
Padding value written if the number of bytes is not a multiple of the packet size | C_PADDING_BE | 0, 1 (zeros, ones) | 0 | integer
XPS_NPI_DMA Core Design Parameters

XPS_NPI_DMA I/O Signal Descriptions

Name | Interface | I/O | Initial State | Description
NPI_Clk | - | I | - | Memory clock
ChipScope[0:63] | - | O | - | Debug port
IP2INTC_Irpt |  |  |  | Interrupt request, LEVEL_HIGH
Capture_data[(C_NPI_DATA_WIDTH-1):0] | DMA_IN | I | - | Sync DMA Input data
Capture_valid | DMA_IN |  |  | Sync DMA Input valid strobe
Capture_ready | DMA_IN |  |  | DMA Input is ready flag
Output_data[(C_NPI_DATA_WIDTH-1):0] | DMA_OUT |  |  | DMA Output data, sync to NPI_Clk
Output_valid | DMA_OUT |  |  | DMA Output valid strobe, sync to NPI_Clk
Output_ready | DMA_OUT |  |  | External Output ready
NPI_Addr[31:0] | MPMC_PIM | O | zeros | NPI address data
NPI_AddrReq | MPMC_PIM | O | 0 | NPI address request
NPI_AddrAck | MPMC_PIM |  |  | NPI address acknowledge
NPI_RNW | MPMC_PIM | O | 0 | NPI read not write
NPI_Size[3:0] | MPMC_PIM |  |  | NPI packet size, see below for info
NPI_RdModWr | MPMC_PIM | O |  | NPI read mod write (not used)
NPI_WrFIFO_Data[(C_NPI_DATA_WIDTH-1):0] | MPMC_PIM |  | zeros | NPI write FIFO data vector
NPI_WrFIFO_BE[(C_NPI_DATA_WIDTH/8-1):0] | MPMC_PIM |  | ones | NPI write FIFO byte enable mask (always ones)
NPI_WrFIFO_Push | MPMC_PIM |  |  | NPI write FIFO data valid strobe
NPI_RdFIFO_Data[(C_NPI_DATA_WIDTH-1):0] | MPMC_PIM |  |  | NPI read FIFO data vector
NPI_RdFIFO_Pop | MPMC_PIM |  |  | NPI read FIFO data read strobe
NPI_RdFIFO_RdWdAddr[3:0] | MPMC_PIM |  |  | NPI read FIFO read write addr (not used)
NPI_WrFIFO_Empty | MPMC_PIM |  |  | NPI write FIFO empty flag
NPI_WrFIFO_AlmostFull | MPMC_PIM | I | - | NPI write FIFO almost full flag
NPI_WrFIFO_Flush | MPMC_PIM | O | 0 | NPI write FIFO reset
NPI_RdFIFO_Empty | MPMC_PIM |  |  | NPI read FIFO empty flag
NPI_RdFIFO_Flush | MPMC_PIM | O | 0 | NPI read FIFO reset
NPI_RdFIFO_Latency[1:0] | MPMC_PIM |  | "01" | NPI read FIFO latency
NPI_InitDone | MPMC_PIM | I | - | MPMC init done flag
Others are PLBv4.6 signals | PLBv4.6 | - | - | -
XPS_NPI_DMA I/O Signal Descriptions

Writing and reading to/from DMA_IN and DMA_OUT ports

The point-to-point unidirectional buses use a simple handshaking protocol.

  • When the (slave) “ready” signal is high, the port is open for writing.
  • A write is performed when the “valid” signal goes high.
  • The “data” must be valid while the “valid” signal is high.
  • If the “valid” signal goes high while “ready” is low, the data are discarded (FIFO_IN only).
  • The signals are updated on the rising edge of the clock.
The internal clock for DMA_IN is NPI_Clk.
In this version, DMA_OUT can properly throttle transmission using the Ready signal only for single beat transfers (Read block size = 0).
Bus | DMA_IN | DMA_OUT
Bus width | 32 or 64 bit | 32 or 64 bit
Clock synchronous to | NPI_Clk | NPI_Clk
“valid” width | Multiple cycles possible | Multiple cycles possible

DMA high speed communication ports principle of operation
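
The handshake rules listed above can be modelled in software as a function that is called once per rising clock edge. The following is a minimal behavioural sketch in C, not the core's HDL; the names `dma_in_model`, `dma_in_ready` and `dma_in_clock` and the FIFO depth of 4 are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical behavioural model of the DMA_IN port; the depth is arbitrary. */
#define MODEL_FIFO_DEPTH 4u

typedef struct {
    uint32_t fifo[MODEL_FIFO_DEPTH];
    size_t   count;      /* words currently stored */
    size_t   discarded;  /* words lost because valid was high while ready was low */
} dma_in_model;

/* "ready" is high while the model can still accept a word. */
static inline bool dma_in_ready(const dma_in_model *m)
{
    return m->count < MODEL_FIFO_DEPTH;
}

/* One rising clock edge: sample valid/data and apply the handshake rules. */
static inline void dma_in_clock(dma_in_model *m, bool valid, uint32_t data)
{
    assert(m != NULL);
    if (!valid)
        return;                      /* no write requested in this cycle */
    if (dma_in_ready(m))
        m->fifo[m->count++] = data;  /* valid and ready: the word is written */
    else
        m->discarded++;              /* valid without ready: the word is discarded */
}
```

Driving the model with "valid" held high for more cycles than the FIFO depth shows why a data source should watch "ready": every extra word only increments the `discarded` counter.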

XPS_NPI_DMA Core Registers

A microprocessor has full access to the XPS_NPI_DMA core functionality through 9 user 32-bit registers and 7 IPIF interrupt registers attached to the PLBv4.6 bus.

Base Address + Offset (hex) | Register Name | Access Type | Default Value (hex) | Description
NPI_DMA_CORE IP Core Grouping
C_BASEADDR + 00 | CR | R/W | 0x00000000 | Control Register
C_BASEADDR + 04 | WSA | R/W | 0x00000000 | Write Start Address Register
C_BASEADDR + 08 | WBR | R/W | 0x00000000 | Write Bytes Register
C_BASEADDR + 0C | RSA | R/W | 0x00000000 | Read Start Address Register
C_BASEADDR + 10 | RBR | R/W | 0x00000000 | Read Bytes Register
C_BASEADDR + 14 | RJR | R/W | 0x00000000 | Read Jumps Register
C_BASEADDR + 18 | SR | Read | 0x00000000 | Status Register
C_BASEADDR + 1C | WCR | Read | WSA | Write Address Counter Register
C_BASEADDR + 20 | RCR | Read | WBR | Read Address Counter Register
IPIF Interrupt Controller Core Grouping
C_BASEADDR + 200 | INTR_DISR | Read | 0x00000000 | interrupt status register
C_BASEADDR + 204 | INTR_DIPR | Read | 0x00000000 | interrupt pending register
C_BASEADDR + 208 | INTR_DIER | Write | 0x00000000 | interrupt enable register
C_BASEADDR + 218 | INTR_DIIR | Write | 0x00000000 | interrupt id (priority encoder) register
C_BASEADDR + 21C | INTR_DGIER | Write | 0x00000000 | global interrupt enable register
C_BASEADDR + 220 | INTR_IPISR | Read | 0x00000000 | ip (user logic) interrupt status register
C_BASEADDR + 228 | INTR_IPIER | Write | 0x00000000 | ip (user logic) interrupt enable register
XPS_NPI_DMA Core Registers
The First (LSB) interrupt from user_logic is masked on the left!!
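
For software running on MicroBlaze, the register map above can be written down as a set of byte offsets plus two small accessors. This is only a sketch: the offsets are copied from the table, while the `NPI_DMA_*` names, the accessor functions and the idea of passing C_BASEADDR as a word pointer are assumptions made for illustration and are not taken from the shipped driver.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets from C_BASEADDR, copied from the register table. */
enum {
    NPI_DMA_CR   = 0x00,  /* Control Register */
    NPI_DMA_WSA  = 0x04,  /* Write Start Address Register */
    NPI_DMA_WBR  = 0x08,  /* Write Bytes Register */
    NPI_DMA_RSA  = 0x0C,  /* Read Start Address Register */
    NPI_DMA_RBR  = 0x10,  /* Read Bytes Register */
    NPI_DMA_RJR  = 0x14,  /* Read Jumps Register */
    NPI_DMA_SR   = 0x18,  /* Status Register */
    NPI_DMA_WCR  = 0x1C,  /* Write Address Counter Register */
    NPI_DMA_RCR  = 0x20,  /* Read Address Counter Register */
    NPI_DMA_INTR_DISR  = 0x200,
    NPI_DMA_INTR_DIPR  = 0x204,
    NPI_DMA_INTR_DIER  = 0x208,
    NPI_DMA_INTR_DIIR  = 0x218,
    NPI_DMA_INTR_DGIER = 0x21C,
    NPI_DMA_INTR_IPISR = 0x220,
    NPI_DMA_INTR_IPIER = 0x228
};

/* Hypothetical accessors; `base` stands in for C_BASEADDR as a 32-bit word pointer. */
static inline uint32_t npi_dma_read(const volatile uint32_t *base, uint32_t offset)
{
    assert((offset & 3u) == 0);   /* registers are 32-bit aligned */
    return base[offset / 4u];
}

static inline void npi_dma_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    assert((offset & 3u) == 0);
    base[offset / 4u] = value;
}
```

Because `base` is an ordinary pointer, the same accessors can be exercised on a plain array in a host-side test before they are pointed at the real peripheral.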

Details of XPS_NPI_DMA Core Registers

The parts of the registers (or the whole registers) with a non-capital designation (e.g. wr_fifo_rst) are usually the names of the HDL signals connected to the described register.

Control Register (CR)

The Control Register is used to control basic peripheral functions. All the bit flags are assembled here.

Bits | Name | Description | Reset Value
31 | rst | Peripheral soft reset (not self resettable) | 0
30 | wr_fifo_rst | Write FIFO reset (not self resettable) | 0
29 | rd_fifo_rst | Read FIFO reset (not self resettable) | 0
28 | wr_loop | Write loop, continuous transfer | 0
27 | rd_loop | Read loop, continuous transfer | 0
26 | wr_test | Write test, writes 32-bit counter to memory | 0
25 | xfer_write | Write data flag (starts/stops xfer) | 0
24 | xfer_read | Read data flag (starts/stops xfer) | 0
20-23 | wr_block_size | Write block size | 0x0
16-19 | rd_block_size | Read block size | 0x0
15 | use_rd_jump | Enables transpose | 0
Control Register bits
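
The bit positions in the table appear to follow the PLB convention in which bit 0 is the most significant bit and bit 31 the least significant one. This reading is an inference: it is the one under which the CR values used in Examples 1 to 3 (0x00000440, 0x00004080, 0x00010080) decode to the expected flags. Under that assumption, a sketch of C masks could look as follows; the `PLB_BIT` macro and all `CR_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a PLB-style bit index (0 = MSB, 31 = LSB) into a 32-bit mask. */
#define PLB_BIT(n)      (UINT32_C(1) << (31 - (n)))

#define CR_RST          PLB_BIT(31)  /* peripheral soft reset */
#define CR_WR_FIFO_RST  PLB_BIT(30)  /* write FIFO reset */
#define CR_RD_FIFO_RST  PLB_BIT(29)  /* read FIFO reset */
#define CR_WR_LOOP      PLB_BIT(28)  /* continuous write transfer */
#define CR_RD_LOOP      PLB_BIT(27)  /* continuous read transfer */
#define CR_WR_TEST      PLB_BIT(26)  /* write a 32-bit counter to memory */
#define CR_XFER_WRITE   PLB_BIT(25)  /* start/stop write transfer */
#define CR_XFER_READ    PLB_BIT(24)  /* start/stop read transfer */
#define CR_USE_RD_JUMP  PLB_BIT(15)  /* enable transpose */

/* wr_block_size sits in PLB bits 20-23, i.e. a left shift of 8. */
static inline uint32_t cr_wr_block_size(uint32_t code)
{
    assert(code <= 0xFu);
    return code << 8;
}

/* rd_block_size sits in PLB bits 16-19, i.e. a left shift of 12. */
static inline uint32_t cr_rd_block_size(uint32_t code)
{
    assert(code <= 0xFu);
    return code << 12;
}
```

With these definitions, `CR_XFER_WRITE | cr_wr_block_size(4)` evaluates to 0x00000440 and `CR_XFER_READ | CR_USE_RD_JUMP` to 0x00010080, which matches the values written to CR in the examples.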
Write Start Address Register (WSA)

Here the user sets the start address of the write transfer. It is the external memory address of the first byte to be written.

It should be aligned to the Write block size boundary.

wr_start_addr
Write Bytes Register (WBR)

Here the user sets the number of bytes to be written to memory. The number of bytes need not be aligned to the block size, since the remaining bytes will be padded. If the user sets wr_loop to 1, WSA+WBR is the maximum address reached before the address counter jumps back to WSA and starts counting again.

wr_xfer_bytes
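
The alignment requirement on WSA and the padding of the last partial block can be checked with a little arithmetic before a transfer is programmed. The helpers below are a sketch under the assumption that block sizes are powers of two (as all listed block sizes are); the function names are hypothetical, and the padded length is what the description above implies rather than a documented formula.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of block_bytes (block_bytes must be a power of two). */
static inline bool is_block_aligned(uint32_t addr, uint32_t block_bytes)
{
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1u)) == 0);
    return (addr & (block_bytes - 1u)) == 0;
}

/* Bytes occupied in memory once the last partial block has been padded. */
static inline uint32_t padded_length(uint32_t xfer_bytes, uint32_t block_bytes)
{
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1u)) == 0);
    return (xfer_bytes + block_bytes - 1u) & ~(block_bytes - 1u);
}
```

For instance, a 1000-byte write with 128-byte blocks occupies `padded_length(1000, 128)` = 1024 bytes, so the 24 bytes after the payload hold padding.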
Read Start Address Register (RSA)

Here the user sets the start address of the read transfer. It is the external memory address of the first byte to be read.

It should be aligned to the Read block size boundary.

rd_start_addr
Read Bytes Register (RBR)

Here the user sets the number of bytes to be read from memory. The number of bytes need not be aligned to the block size, since the remaining bytes will remain in the RdFIFO. If the user sets rd_loop to 1, the byte counter jumps back to 0 (the RSA address) when it reaches the RBR value and starts counting again.

rd_xfer_bytes
Read Jumps Register (RJR)

This register holds two 16-bit values that define the read jumping strategy. The read_jump value is the address increment between two consecutive reads. For a linear read this is the number of bytes per read block (4 or 8 for single beat xfer). When rotating (transposing) an image it should equal the number of bytes in a row. The rows parameter defines how many reads are done before returning to the starting position + block size.

For linear transfers this register is NOT USED.

Read Jumps Register (RJR)
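
The jump strategy described above can be illustrated with a small address generator. The sketch below makes two assumptions that are not stated in this section: the field layout (read_jump in the upper 16 bits, rows - 1 in the lower 16 bits) is inferred from the value 0x02EE01DF used in Example 3, and the address sequence is a plain software model of the description, not the core's logic. The names `rjr_pack` and `transpose_addresses` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack RJR as in Example 3: read_jump in the upper half, rows - 1 in the lower half. */
static inline uint32_t rjr_pack(uint32_t read_jump, uint32_t rows)
{
    assert(read_jump <= 0xFFFFu && rows >= 1u && rows <= 0x10000u);
    return (read_jump << 16) | (rows - 1u);
}

/* Fill out[] with the first n read addresses of a transposed transfer (model only). */
static inline void transpose_addresses(uint32_t rsa, uint32_t read_jump, uint32_t rows,
                                       uint32_t block_bytes, uint32_t *out, size_t n)
{
    uint32_t column = rsa;  /* address of the first read in the current column */
    uint32_t row = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = column + row * read_jump;  /* step down one row per read */
        if (++row == rows) {                /* column done: back to start + block size */
            row = 0;
            column += block_bytes;
        }
    }
}
```

For a toy image of 3 rows with 16 bytes per row and 4-byte single beat reads starting at 0x1000, the model yields 0x1000, 0x1010, 0x1020 and then continues with the next column at 0x1004.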
Status Register (SR)

The peripheral reports its current status in the status register.

Bits | Name | Description | Reset Value
31 | wr_xfer_done | Write xfer done flag (always 0 if wr_loop = 1) | 1
30 | rd_xfer_done | Read xfer done flag (always 0 if rd_loop = 1) | 1
24-27 | xfer_status | Write xfer status (bit 27 = wr_fifo_full) | 0
Status Register (SR)
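
A value read from SR can be decoded with a few masks. The sketch below assumes that SR uses the same bit numbering as CR, with bit 31 as the least significant bit; that numbering is inferred from the CR values in the examples and is not confirmed for SR. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks under the assumption that bit 31 in the table is the LSB. */
#define SR_WR_XFER_DONE  (UINT32_C(1) << (31 - 31))
#define SR_RD_XFER_DONE  (UINT32_C(1) << (31 - 30))
#define SR_WR_FIFO_FULL  (UINT32_C(1) << (31 - 27))

static inline bool sr_wr_xfer_done(uint32_t sr) { return (sr & SR_WR_XFER_DONE) != 0; }
static inline bool sr_rd_xfer_done(uint32_t sr) { return (sr & SR_RD_XFER_DONE) != 0; }
static inline bool sr_wr_fifo_full(uint32_t sr) { return (sr & SR_WR_FIFO_FULL) != 0; }

/* xfer_status occupies table bits 24-27, i.e. bits 7..4 counted from the LSB. */
static inline uint32_t sr_xfer_status(uint32_t sr)
{
    return (sr >> 4) & 0xFu;
}
```

Under this reading, the reset state in which both done flags are 1 corresponds to an SR value of 0x00000003.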
Write Address Counter Register (WCR)

Reading this register returns current WRITE address counter value. It can be used to monitor write transfer progress.

wr_xfer_counter
Read Address Counter Register (RCR)

Reading this register returns current READ address counter value. It can be used to monitor read transfer progress.

rd_xfer_counter
Interrupt registers

With the INTR_IPIER register the user can enable or disable peripheral interrupt sources. With INTR_IPISR the user can identify the interrupt source. Writing a value to INTR_IPISR also clears the interrupt.

"Ghost" interrupts

The user must make sure that triggered interrupts are cleared in a consistent way (single owner): the user (host computer's software) must clear only interrupts that were actually triggered. Otherwise the user will trigger "ghost" interrupts that originate not from the peripheral but from the interrupt controller itself.

Writing 0x7 to INTR_DIER will enable IP interrupt sources and writing 0x80000000 to INTR_DGIER will enable global interrupt.

The image below presents the connection of the user logic interrupt to INTR_IPIER and INTR_IPISR.

Connection of user logic interrupt to INTR_IPIER and INTR_IPISR.
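
The enable sequence and the "single owner" clearing rule can be sketched in C as shown here. The register offsets and the values 0x7 and 0x80000000 come from this page; the function names, the order of the three writes and the choice of which user-logic sources to enable (`ip_sources`) are assumptions. The acknowledge helper writes back only the bits that were read as set, which is one way to avoid clearing interrupts that were never triggered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the interrupt registers from C_BASEADDR. */
enum { INTR_DIER = 0x208, INTR_DGIER = 0x21C, INTR_IPISR = 0x220, INTR_IPIER = 0x228 };

/* Enable interrupts; `base` stands in for C_BASEADDR as a 32-bit word pointer. */
static inline void npi_dma_enable_interrupts(volatile uint32_t *base, uint32_t ip_sources)
{
    assert(base != NULL);
    base[INTR_IPIER / 4] = ip_sources;   /* user-logic sources chosen by the caller */
    base[INTR_DIER / 4]  = 0x7u;         /* enable IP interrupt sources */
    base[INTR_DGIER / 4] = 0x80000000u;  /* global interrupt enable */
}

/* Read the pending user-logic interrupts and clear exactly those bits. */
static inline uint32_t npi_dma_ack_interrupts(volatile uint32_t *base)
{
    uint32_t pending = base[INTR_IPISR / 4];
    if (pending != 0)
        base[INTR_IPISR / 4] = pending;  /* never write bits that were not set */
    return pending;
}
```

Calling `npi_dma_ack_interrupts()` from a single interrupt handler, and nowhere else, keeps the clearing in one place.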

Programming model

In the instruction sequence it is only important that xfer_write or xfer_read are written at the end as they start the transmission.
Write block size | wr_block_size | C_NPI_DATA_WIDTH | Type of transfer | Implemented
4 bytes | X"0" | 32 | 1 word xfer | no
8 bytes | X"0" | 64 | 2 words xfer | no
16 bytes | X"1" | 32 or 64 | 4-word cache-line burst | yes
32 bytes | X"2" | 32 or 64 | 8-word cache-line burst | yes
64 bytes | X"3" | 32 or 64 | 16-word burst | yes
128 bytes | X"4" | 32 or 64 | 32-word burst | yes
256 bytes | X"5" | 64 | 64-word burst | yes
Write block size available
Read block size | rd_block_size | C_NPI_DATA_WIDTH | Type of transfer | Implemented
4 bytes | X"0" | 32 | 1 word xfer | yes
8 bytes | X"0" | 64 | 2 words xfer | yes
16 bytes | X"1" | 32 or 64 | 4-word cache-line burst | not tested
32 bytes | X"2" | 32 or 64 | 8-word cache-line burst | not tested
64 bytes | X"3" | 32 or 64 | 16-word burst | yes
128 bytes | X"4" | 32 or 64 | 32-word burst | yes
256 bytes | X"5" | 64 | 64-word burst | yes
Read block size available
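
The two block size tables can be turned into lookup helpers that return the 4-bit code for a requested block size, or reject combinations that are not listed. This is a sketch; the function names and the use of -1 as the "not available" result are hypothetical, and the 16-byte and 32-byte read bursts are accepted here although the table marks them as not tested.

```c
#include <assert.h>
#include <stdint.h>

/* Code for wr_block_size, or -1 if the size is not implemented for writes. */
static inline int wr_block_size_code(uint32_t block_bytes, unsigned npi_width)
{
    assert(npi_width == 32 || npi_width == 64);
    switch (block_bytes) {
    case 16:  return 1;
    case 32:  return 2;
    case 64:  return 3;
    case 128: return 4;
    case 256: return npi_width == 64 ? 5 : -1;  /* 64-word burst is 64-bit only */
    default:  return -1;                        /* single beat writes are not implemented */
    }
}

/* Code for rd_block_size, or -1 if the combination is not listed. */
static inline int rd_block_size_code(uint32_t block_bytes, unsigned npi_width)
{
    assert(npi_width == 32 || npi_width == 64);
    switch (block_bytes) {
    case 4:   return npi_width == 32 ? 0 : -1;  /* single beat, 32-bit bus */
    case 8:   return npi_width == 64 ? 0 : -1;  /* single beat, 64-bit bus */
    case 16:  return 1;                         /* listed as not tested */
    case 32:  return 2;                         /* listed as not tested */
    case 64:  return 3;
    case 128: return 4;
    case 256: return npi_width == 64 ? 5 : -1;
    default:  return -1;
    }
}
```

The 32-word burst used in Examples 1 and 2 corresponds to a block size of 128 bytes, for which both helpers return 4.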

Example 1

Example of single write transfer from address 0x1C000000 to 0x1C00FFFF using 32-word burst

1. Write 0x1C000000 to WSA

2. Write 0x00010000 to WBR

3. Write 0x00000440 to CR

4. Poll SR until wr_xfer_done = 1
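
The four steps of Example 1 could be expressed in C roughly as follows. This is a sketch and not the shipped driver: the function names are hypothetical, `base` stands in for C_BASEADDR as a 32-bit word pointer, and the masks assume that bit 31 in the register tables is the least significant bit, which is the reading under which the CR value 0x00000440 decodes correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { REG_CR = 0x00, REG_WSA = 0x04, REG_WBR = 0x08, REG_SR = 0x18 };

#define CR_XFER_WRITE      UINT32_C(0x00000040)       /* table bit 25 */
#define CR_WR_BLOCK(code)  ((uint32_t)(code) << 8)    /* table bits 20-23 */
#define SR_WR_XFER_DONE    UINT32_C(0x00000001)       /* table bit 31 */

/* Steps 1-3: program address and length, then start the transfer via CR. */
static inline void npi_dma_start_write(volatile uint32_t *base, uint32_t start_addr,
                                       uint32_t bytes, uint32_t block_code)
{
    assert(block_code >= 1u && block_code <= 5u);
    base[REG_WSA / 4] = start_addr;
    base[REG_WBR / 4] = bytes;
    /* CR is written last because setting xfer_write starts the transmission. */
    base[REG_CR / 4]  = CR_WR_BLOCK(block_code) | CR_XFER_WRITE;
}

/* Step 4: one poll of the status register. */
static inline bool npi_dma_write_done(const volatile uint32_t *base)
{
    return (base[REG_SR / 4] & SR_WR_XFER_DONE) != 0;
}
```

Example 1 then becomes `npi_dma_start_write(base, 0x1C000000, 0x00010000, 4);` followed by a loop that calls `npi_dma_write_done(base)` until it returns true.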

Example 2

Example of single linear read transfer from address 0x1C000000 to 0x1C00FFFF using 32-word burst transaction

1. Write 0x1C000000 to RSA

2. Write 0x00010000 to RBR

3. Write 0x00004080 to CR

4. Poll SR until rd_xfer_done = 1

Example 3

Example of single transpose read transfer from address 0x1C000000 at image size 750 bytes/row x 480 rows.

1. Write 0x1C000000 to RSA

2. Write 0x00057E40 to RBR

3. Write 0x02EE01DF to RJR (Note 1DF=rows-1)

4. Write 0x00010080 to CR

5. Poll SR until rd_xfer_done = 1

In this case the user gets 4 bytes (at C_NPI_DATA_WIDTH = 32) or 8 bytes (at C_NPI_DATA_WIDTH = 64) on the output port for every data valid. Further demultiplexing (down to single pixel size if needed) can be done using a FIFO array (for example OUTPUT_DMA_FIFOS).
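
The register values of Example 3 follow directly from the image geometry, which the sketch below makes explicit. The function name is hypothetical, `base` stands in for C_BASEADDR as a 32-bit word pointer, and both the RJR layout and the CR masks are inferred from the values in the example rather than taken from the driver source.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_CR = 0x00, REG_RSA = 0x0C, REG_RBR = 0x10, REG_RJR = 0x14 };

#define CR_XFER_READ    UINT32_C(0x00000080)  /* table bit 24 */
#define CR_USE_RD_JUMP  UINT32_C(0x00010000)  /* table bit 15 */

/* Program a single beat transposed read of an image stored row by row. */
static inline void npi_dma_start_transposed_read(volatile uint32_t *base, uint32_t start_addr,
                                                 uint32_t bytes_per_row, uint32_t rows)
{
    assert(bytes_per_row <= 0xFFFFu && rows >= 1u && rows <= 0x10000u);
    base[REG_RSA / 4] = start_addr;
    base[REG_RBR / 4] = bytes_per_row * rows;                 /* total bytes in the image */
    base[REG_RJR / 4] = (bytes_per_row << 16) | (rows - 1u);  /* read_jump, rows - 1 */
    /* rd_block_size stays 0 (single beat); CR is written last to start the read. */
    base[REG_CR / 4]  = CR_USE_RD_JUMP | CR_XFER_READ;
}
```

For the 750 bytes/row x 480 rows image this writes 0x00057E40 to RBR, 0x02EE01DF to RJR and 0x00010080 to CR, the same values as in the steps above.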

To use the software driver, read the function comments in:

#project#(or IP repository)\drivers\xps_npi_dma_v1_00_a\src\xps_npi_dma.c

Example 4 (if Reference Design is used): test XPS_NPI_DMA and XPS_FX2 using MB Commands 

The XPS_NPI_DMA and XPS_FX2 custom IP blocks are both necessary to connect the host computer's software to the TE USB FX2 module's DRAM through a USB connection.

The MB Commands FX22MB_REG0_START_RX, FX22MB_REG0_START_TX, FX22MB_REG0_STOP are used for data throughput and integrity test.

MB Commands require the XPS_I2C_SLAVE custom IP block and a proper FX2 interrupt handler (the i2c_slave_int_handler() function in interrupt.c running on MicroBlaze); the handler is called to service the interrupt signal xps_i2c_slave_0_IP2INTC_Irpt. The i2c_slave_int_handler() function actually executes the MB Command delivered over I2C.

The write test should be executed before the read test; otherwise the read test will fail.