# Radioastronomic signal processing cores for the SKA telescope

G. Comoretto, S. Chiarucci, C. Belli

- Digital signal processing in FPGAs
- SKA overview
- IP Core library
- Quantization effects

#### Data processing in radioastronomy

- Wideband signals
  - Typ. 100 MHz  $\rightarrow$  30 GHz
  - Analog chain vs. digital chain
- Large number of signals
  - Interferometers: tens to 1000 elements
  - Phased array feeds: hundreds of elements
- Complex, but fixed, process
  - Channelization
  - Correlation
  - Filtering
  - Resampling
  - Beamforming





#### Radio signal processing



#### **RF Interference mitigation**

- Artificial RF signals are 10<sup>6</sup> times stronger than natural ones
- Automatic filtering of samples in the time or frequency domain
- Analysis based on higher-order statistics (kurtosis)
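
A kurtosis test of the kind listed above can be sketched in a few lines. This is only an illustration, not the firmware algorithm: the function names and the tolerance of 0.3 are assumptions. It relies on the fact that Gaussian noise has a kurtosis of 3, while a sinusoidal interferer pulls the value towards 1.5.

```python
import math
import random


def kurtosis(samples):
    """Ratio of the fourth central moment to the squared variance."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2)


def flag_rfi(samples, tolerance=0.3):
    """Flag a block whose kurtosis departs from the Gaussian value of 3.

    The tolerance is a hypothetical threshold chosen for this example.
    """
    return abs(kurtosis(samples) - 3.0) > tolerance


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
tone = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(20000)]
contaminated = [a + 3.0 * b for a, b in zip(noise, tone)]

print("clean block flagged:", flag_rfi(noise))
print("contaminated block flagged:", flag_rfi(contaminated))
```

In a real system the threshold would depend on the block length, since the scatter of the kurtosis estimate shrinks as more samples are averaged.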



#### Firmware development tools: taming the complexity

- Proprietary graphical generation tools
  - Based on Matlab-Simulink
  - Difficult to port to other FPGA families
- OpenCL (C-like)
  - Strongly family dependent
  - Requires a direct link to the target board
  - Requires a board support package
- CASPER
  - Extensive library based on Xilinx DSP blocks
  - Bound to specific tools and FPGA families
- VHDL libraries
  - Difficult to use
  - Portable with minor effort
  - Very efficient
  - Use software design tools (SysML)



CASPER 8-antenna correlator

Applicazioni FPGA in ambito astrofisico Torino, 18-20 Maggio 2016

#### The hardware

- CASPER boards
  - Very well supported, easy to acquire, open source
  - Virtex-6 (ROACH-2)
  - Typ. 3-5 years old
- Uniboard
  - 8x Stratix5 mesh 4x4 @20Gb
  - 8x DDR3
  - 2x16x10GbE link
- Uniboard 2
  - Custom, not in production
  - 4x Arria10 ax115 ring @100 Gb
  - 8x DDR4 + HMC on back board
  - 24x 40GbE link
- ITPM
  - 2x Kintex US
  - 4xDDR3(4)
  - 32x 1GB ADC chans
  - 2x 40 GbE link




### The SKA telescope

3 telescopes (in phase 1)

- SKA-low
  - 131072 dual-polarization log-periodic antennas
  - 300 MHz bandwidth
- SKA-mid
  - 192 dish antennas
  - 5 GHz bandwidth
- SKA-survey
  - 32 dish antennas, with PAF
  - 36 beams/antenna
  - 500 MHz bandwidth

SKA phase 2 (~2020): ~20x more antennas





#### SKA computing requirements

- Very high data rates
  - Heavy parallelism
  - Large interconnect BW
  - Distributed memory
- Low power
  - 5 W/antenna ≈ 600 kW
  - Power in desert is expensive

| SKA requirements | LOW  | MID | Unit |
|------------------|------|-----|------|
| **Data rate**    |      |     |      |
| Antennas         | 1700 | 35  | Tbps |
| Corr. input      | 2.5  | 10  | Tbps |
| Corr. output     | 76   | 190 | Gbps |
| **Processing**   |      |     |      |
| Antennas         | 6000 | 300 | Tops |
| Correlation      | 160  | 740 | Tops |
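
As a rough cross-check of the LOW antenna data rate and of the power budget, the arithmetic can be sketched as below. The antenna count and the 5 W/antenna figure come from the slides; the 800 MS/s sampling rate and the 8-bit sample width are assumptions introduced here, not values stated in the table.

```python
N_ANTENNAS = 131072          # SKA-low antennas (from the slides)
N_POLARIZATIONS = 2          # dual polarization (from the slides)
SAMPLE_RATE_HZ = 800e6       # assumed ADC sampling rate
BITS_PER_SAMPLE = 8          # assumed ADC sample width
POWER_PER_ANTENNA_W = 5.0    # from the slides


def antenna_data_rate_tbps():
    """Aggregate raw ADC output of all antennas, in Tbit/s."""
    bits_per_second = (N_ANTENNAS * N_POLARIZATIONS
                       * SAMPLE_RATE_HZ * BITS_PER_SAMPLE)
    return bits_per_second / 1e12


def total_power_kw():
    """Total power if every antenna consumes the quoted budget, in kW."""
    return N_ANTENNAS * POWER_PER_ANTENNA_W / 1e3


print(f"antenna data rate: {antenna_data_rate_tbps():.0f} Tbps")
print(f"total power: {total_power_kw():.0f} kW")
```

Under these assumptions the raw rate comes out near the 1700 Tbps quoted for LOW, and the power near the 600 kW figure, so both table entries appear to be order-of-magnitude budgets of this kind.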

#### The SKA LFAA library

- Modules developed for the SKA-LOW beamformer-channelizer
  - Functions for signal generation, channelization, filtering, beamforming, spectroscopy
- Extensive reuse of modules developed for Uniboard
- Seamless conversion from Altera to Xilinx
- All modules with consistent interface:
  - Control using AXI4-Lite
  - Streaming data using AXI4-Stream
- All modules heavily parametrized
  - Time multiplexing factor
  - Input, output, coefficient data widths




### **Polyphase filterbank**

- Problem:
  - Channelize a signal with high out-of-band rejection
  - Overlap between adjacent channels ~18%
  - 512 output channels (pos. freq. only)
- Configurable oversampled PFB module
  - Configurable number of channels, inputs, taps, time multiplexing and oversampling
  - Optimized for real input samples
  - Processes two signals, time multiplexed x4
  - Output data rate = input data rate x oversampling: 200 MHz in, ~237 MHz out
  - Output clock may be higher, in which case not all clock cycles carry valid samples
  - Automatic detection and flagging of overflow in FFT
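
The oversampled channelizer described above can be illustrated with a small weighted overlap-add sketch. It is not the VHDL module: the Hann-windowed sinc prototype, the sizes (32-point transform, 4 taps, hop of 27 samples) and all function names are assumptions chosen to keep the example short. The 32/27 ratio is used only because it reproduces the quoted 200 MHz in, ~237 MHz out figure.

```python
import cmath
import math


def prototype_filter(n_fft, n_taps):
    """Hann-windowed sinc low-pass (illustrative, not the equiripple design)."""
    length = n_fft * n_taps
    centre = (length - 1) / 2.0
    coeffs = []
    for n in range(length):
        t = (n - centre) / n_fft
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / length)
        coeffs.append(sinc * window / n_fft)
    return coeffs


def oversampled_pfb(x, n_fft, n_taps, hop):
    """Channelize real samples; a hop smaller than n_fft gives oversampling.

    Returns one list of n_fft // 2 positive-frequency channels per frame.
    """
    h = prototype_filter(n_fft, n_taps)
    frames = []
    start = 0
    while start + len(h) <= len(x):
        # Weight the segment and fold it onto n_fft bins. Indexing the bins
        # by absolute sample time keeps the channel phase continuous when
        # the hop is not a multiple of n_fft.
        folded = [0.0] * n_fft
        for n, coeff in enumerate(h):
            folded[(start + n) % n_fft] += coeff * x[start + n]
        # Plain DFT of the folded frame (an FFT in a real implementation).
        frames.append([
            sum(folded[m] * cmath.exp(-2j * math.pi * k * m / n_fft)
                for m in range(n_fft))
            for k in range(n_fft // 2)
        ])
        start += hop
    return frames


N_FFT, N_TAPS, HOP = 32, 4, 27
signal = [math.cos(2.0 * math.pi * 5 * n / N_FFT) for n in range(371)]
frames = oversampled_pfb(signal, N_FFT, N_TAPS, HOP)
powers = [abs(v) ** 2 for v in frames[0]]
print("strongest channel:", powers.index(max(powers)))
print("output rate for 200 MHz input: %.1f MHz" % (200.0 * N_FFT / HOP))
```

With a hop of 27 samples on a 32-point transform, each channel is sampled 32/27 times faster than in the critically sampled case, which is where the extra output data rate comes from.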



### **Polyphase filterbank – real FFT**



- Real FFT of M-time multiplexed signal = complex FFT of half size
- Radix-4 FFT module efficiently transforms 4 parallel complex signals
  - E.g. two parallel 4x time multiplexed signals
  - Or one 8x time multiplexed signal, ~2.5-4 Gsps band
- Final complex-to-real transform and bit-reverse reordering
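
The half-size trick can be sketched as follows, using a plain DFT in place of the radix-4 FFT. This is the standard textbook construction and only an assumption about how the module's final complex-to-real step works; the function names are hypothetical.

```python
import cmath
import math


def dft(z):
    """Direct DFT, used here only as a stand-in for the FFT core."""
    n = len(z)
    return [sum(z[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def real_dft_via_half_size(x):
    """Positive-frequency spectrum of real x from one half-size complex DFT."""
    n = len(x)
    half = n // 2
    # Pack even samples in the real part and odd samples in the imaginary part.
    z = [complex(x[2 * m], x[2 * m + 1]) for m in range(half)]
    zf = dft(z)
    spectrum = []
    for k in range(half + 1):
        a = zf[k % half]
        b = zf[(half - k) % half].conjugate()
        even = (a + b) / 2      # DFT of the even-indexed samples
        odd = (a - b) / 2j      # DFT of the odd-indexed samples
        # Final complex-to-real step: recombine with a twiddle factor.
        spectrum.append(even + cmath.exp(-2j * math.pi * k / n) * odd)
    return spectrum


samples = [math.sin(0.7 * n) + 0.3 * math.cos(2.1 * n) for n in range(16)]
fast = real_dft_via_half_size(samples)
reference = dft(samples)[:9]
print(max(abs(a - b) for a, b in zip(fast, reference)) < 1e-9)
```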

#### Resource usage of the complete module (filter + FFT), 16 signals, x4 time multiplexed

- 367 block RAMs (~23/signal: 17 in filter, 6 in FFT)
- 1343 multiplier blocks (~84/signal: 56 in filter, 28 in FFT)
- Comparison with Altera DSP Builder: 1168 block RAMs (x3), 1856 multipliers (+38%)

Reference: http://www.arcetri.astro.it/images/data/Reports/14/14\_04.pdf

### Polyphase filterbank

- Design procedure
  - Matlab based, but with custom C++ filter calculation
  - Equiripple (Remez / Parks-McClellan) algorithm
    - Minimizes number of taps (FPGA resources)
  - Algorithm fails for N > 200
    - Worked around by interpolation of coefficients
- Filter performance:
  - 14 taps per FFT channel
  - 60 dB min. stopband attenuation
  - 90 dB @ 10 MHz offset
  - +/- 0.2 dB in-band ripple
  - 18.5% overlap



#### **Tile beamformer**

- Selects up to 16 spectral regions, arbitrarily placed in input bandwidth
- Computes geometric delay for each region (beam)
  - LMC specifies delay and delay rate
- Phases up and co-adds 16 antennas
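
The delay-to-phase step can be sketched for one frequency channel as follows. The function names, the sign convention and the numeric values are assumptions for illustration. The delay of each antenna is modelled as a delay plus a delay rate times elapsed time, as specified by the LMC.

```python
import cmath
import math


def steering_weight(freq_hz, delay_s, delay_rate, t_s):
    """Phase factor that undoes the geometric delay at one frequency.

    The delay is modelled as delay_s + delay_rate * t_s. The sign convention
    (a delayed signal carries phase -2*pi*f*tau) is an assumption.
    """
    tau = delay_s + delay_rate * t_s
    return cmath.exp(2j * math.pi * freq_hz * tau)


def beamform(samples, freq_hz, delays_s, delay_rates, t_s):
    """Phase up and co-add one channelized sample from each antenna."""
    return sum(
        sample * steering_weight(freq_hz, delay, rate, t_s)
        for sample, delay, rate in zip(samples, delays_s, delay_rates)
    )


# Hypothetical test case: 16 antennas observing the same 150 MHz tone.
FREQ_HZ, T_S = 150e6, 2.0
delays = [a * 1e-9 for a in range(16)]
rates = [a * 1e-12 for a in range(16)]
antenna_samples = [
    cmath.exp(-2j * math.pi * FREQ_HZ * (d + r * T_S))
    for d, r in zip(delays, rates)
]
beam = beamform(antenna_samples, FREQ_HZ, delays, rates, T_S)
print(abs(beam))
```

When the applied delays match the true ones, the 16 contributions add in phase, so the beam amplitude approaches the number of antennas.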







Campera Electronic Systems Srl is a privately held high-technology startup based in Livorno, Italy. Established in March 2014, it has grown rapidly with a focus on FPGAs and their applications.

The staff has more than 10 years of experience in FPGA design, digital signal processing, and board-level design for the telecommunication, railway, aerospace and defense, test and measurement, and high-performance computing markets.



**HDL Design** Know-how from DSP to HDL, alongside a vast, fully verified collection of HDL IP cores



#### **CES HDL Library**

High performance, vendor independent "off the shelf" VHDL Digital Signal Processing Libraries

**V & V** Simulation, documentation, and independent verification and validation services.

**ASIPs** Application-specific IP cores, tailored to a specific application



# **CES PFB CHANNELIZER**

*Campera Electronic Systems developed the FX-engine channelizer for the three correlators of the SKA Telescope.*

- The channelizer is based on a highly configurable polyphase filter bank architecture capable of analyzing over 4 GHz of bandwidth in real time with up to 512,000 channels, using our super fast CES FFT core
- The polyphase filter uses a Weighted Overlap-Add (WOLA) operation on a serial data stream, multiplying segments of the stream by a filter function and adding together individual frames of the result.
- Oversampling and overlapping between segments are implemented to extend the alias-free region.

#### Key features

- Real-time bandwidth of up to 4.6 GHz, out-of-band rejection of up to 63 dB, and in-band ripple of 0.2 dB
- Fully configurable: filter length, coefficient bit width, oversampling factor, overlap length, number of channels, input/output data width, time multiplexing factor
- The VHDL library is fully vendor independent.
- The CES FFT ASIP core has been developed to sustain the inherent parallelism needed to process the massive > 10 GS/s data rate



Campera Electronic Systems is currently designing a new custom frequency-domain tile beamformer for the SKA project

### **Test signal generator**

- Configurable sample size and time multiplexing factor
- Used when no actual signal is present
  - Relies on input SOP and DAV signals, which must be generated
  - Test signal can be added to the input stream, e.g. for RFI tests
  - Input signal can be blanked
- One generator for each input stream
- Test signals:
  - Two sinewaves, with independently programmable frequency and phase
  - Pseudorandom white noise
  - Periodic pulses (frequency comb)
- All test signals have programmable amplitude



#### **Digital receiver**



### Wideband digital receiver (SKA-MID)

- Tunable receiver for SKA-MID telescope
- 14 GHz input real (30 GS/s)
- 2.5 GHz output complex (2.75 GS/s)
  - Noninteger decimation 11/120
- Two stage decimation: 1/6x11/20

#### Processing

- 1st filter: passband
- Decimation 1:8
- Frequency rotation to align band
- 2nd filter, sharp
- Interpolating resampling 11:15
- Optional complex-to-real
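
The rate arithmetic above can be checked with exact fractions. The sketch uses only numbers quoted in the slides and shows that both the 1/6 x 11/20 split and the 1:8 decimation followed by 11:15 resampling give the same overall factor of 11/120. The variable names are illustrative.

```python
from fractions import Fraction

INPUT_RATE_GSPS = Fraction(30)
OVERALL = Fraction(11, 120)

# Two factorizations that appear in the slides.
splits = {
    "1/6 x 11/20": (Fraction(1, 6), Fraction(11, 20)),
    "1/8 x 11/15": (Fraction(1, 8), Fraction(11, 15)),
}

for name, (first, second) in splits.items():
    combined = first * second
    output_rate = INPUT_RATE_GSPS * combined
    print(name, "->", combined, "->", float(output_rate), "GS/s")
```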




#### Detailed architecture – first filter

Figure: transfer function of the prototype filter (stopband) and channel shape, with direct and aliased responses.

- Input as 64 phases
- 64 tap complex filter
  - Wide transition band
- 8 outputs (1:8 decimation)
- Total of 16 (8R+8I) symmetric filters
- 64x8 multipliers required



#### Second filter: noninteger decimation

- Interpolating filter: one LP filter for each possible interpolation value
- Only one filter is used at each time
- Only 11 of every 15 clock cycles produce a result
  - Use the other 4 cycles to reuse the multipliers
  - At each time, sum two different samples
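
The 11-out-of-15 pattern can be made explicit with a small scheduling sketch. It assumes that output sample m sits at input position m x 15/11: the integer part gives the input clock cycle on which the output is produced, and the fractional part (a multiple of 1/11) selects which interpolation filter is used. The function name and this mapping are assumptions, not taken from the firmware.

```python
from fractions import Fraction


def resampler_schedule(n_out=11, n_in=15):
    """List (input cycle, fractional phase) for each output in one period."""
    schedule = []
    for m in range(n_out):
        position = Fraction(m * n_in, n_out)
        cycle = position.numerator // position.denominator
        phase = position - cycle        # selects the interpolation filter
        schedule.append((cycle, phase))
    return schedule


schedule = resampler_schedule()
busy = [cycle for cycle, _ in schedule]
idle = [c for c in range(15) if c not in busy]
print("cycles producing an output:", busy)
print("idle cycles:", idle)
print("filter index per output:", [int(phase * 11) for _, phase in schedule])
```

In this model each of the 11 interpolation phases is used exactly once per period, and the 4 idle cycles are the ones left free for multiplier reuse.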







#### Quantization and correlation

- Astronomically interesting quantity: correlation of two Gaussian noiselike signals  $\langle x_1 x_2 \rangle$
- Actual computed quantity: correlation of the two digitized representations  $\langle X_1 X_2 \rangle$

$$X_{1,2} = \mathrm{round}(x_{1,2}) \ (\text{odd } N) \quad \text{or} \quad X_{1,2} = \mathrm{round}(x_{1,2} + 0.5) - 0.5 \ (\text{even } N),$$

clipped to $\pm N/2$.

Examples: N=3 $\rightarrow$ X={-1, 0, +1}; N=4 $\rightarrow$ X={-1.5, -0.5, +0.5, +1.5}

- $x_1$, $x_2$ follow a bivariate Gaussian joint probability distribution:

$$P(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left[\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2}\right]\right]$$

Correlation

$$r = \langle x_1 x_2 \rangle = \rho \sigma_1 \sigma_2$$

- Quantized correlation  $R = \langle X_1 X_2 \rangle = g \rho \sigma_1 \sigma_2 + E$
- Errors:
  - Added noise N
  - Gain g
  - Nonlinear terms E
- All error terms are functions of $\rho, \sigma_1, \sigma_2, N$
  - Assume  $\sigma_1 = \sigma_2$  as all effects are factorizable
  - R expressed as fraction of maximum representable value N/2
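
The quantizer and the quantized correlation can be explored numerically with a small Monte Carlo sketch. The function names, the sample count and the seed are assumptions. The clipping limit is set to the outermost level (N-1)/2 so that the N=3 and N=4 examples above are reproduced.

```python
import math
import random


def quantize(x, n_levels):
    """Round to n_levels uniformly spaced levels with unit step."""
    offset = 0.0 if n_levels % 2 else 0.5
    value = math.floor(x + offset + 0.5) - offset   # round half up
    limit = (n_levels - 1) / 2.0                    # outermost level
    return max(-limit, min(limit, value))


def quantized_correlation(rho, sigma, n_levels, n_samples=20000, seed=0):
    """Monte Carlo estimate of <X1 X2> for correlated Gaussian inputs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        x1 = sigma * a
        x2 = sigma * (rho * a + math.sqrt(1.0 - rho * rho) * b)
        total += quantize(x1, n_levels) * quantize(x2, n_levels)
    return total / n_samples


# Fine quantization: 256 levels, RMS of 20 steps.
estimate = quantized_correlation(0.5, 20.0, 256)
print(estimate / 20.0 ** 2)
```

For a well-scaled fine quantizer the normalized result stays close to the input correlation. Lowering the number of levels or raising the RMS towards the clipping limit is a simple way to see the gain and nonlinear terms appear.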


## **ADC linearity**

Response of an interferometer to a Gaussian correlated signal

- Gain variations (mainly clipping) for input RMS > 14% full range
- Distortion of the cross correlation for input RMS > 16% full range
- For 8 bit ADC: RMS must be < 37 ADU
- Added noise: 1/12 ADU<sup>2</sup>, larger for a lower effective number of bits



Scale (gain) error and cross correlation nonlinear terms as a function of input level

#### Results

| Quantity | Added noise | 4 bit  | 5 bit | 6 bit | 7 bit | 8 bit | 9 bit |
|----------|-------------|--------|-------|-------|-------|-------|-------|
| Min      | 0.5%        | -      | 0.255 | 0.127 | 0.065 | 0.032 | 0.017 |
| Min      | 1%          | 0.252* | 0.180 | 0.090 | 0.045 | 0.022 | 0.012 |
| Max      |             | 0.252  | 0.260 | 0.263 | 0.265 | 0.267 | 0.268 |
| DR [dB]  | 0.5%        | -      | 0.2   | 6.3   | 12.2  | 18.4  | 23.9  |
| DR [dB]  | 1%          | 0*     | 3.2   | 9.3   | 15.4  | 21.7  | 27    |
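
Most entries of the table can be approximately reproduced from two simple relations, sketched below. This reading is an inference, not something stated in the slides: the minimum level is taken as the RMS at which the 1/12 ADU<sup>2</sup> quantization noise equals the allowed fraction of the signal power, expressed relative to half the ADC range, and the dynamic range is taken as 20 log10(max/min). The maximum levels are copied from the table rather than derived.

```python
import math

QUANTIZATION_NOISE_ADU2 = 1.0 / 12.0


def min_rms_fraction(noise_fraction, n_bits):
    """Smallest RMS, relative to half range, keeping added noise within budget."""
    min_rms_adu = math.sqrt(QUANTIZATION_NOISE_ADU2 / noise_fraction)
    half_range_adu = 2 ** (n_bits - 1)
    return min_rms_adu / half_range_adu


def dynamic_range_db(max_fraction, min_fraction):
    """Ratio of the largest to the smallest usable RMS level, in dB."""
    return 20.0 * math.log10(max_fraction / min_fraction)


MAX_LEVEL = {5: 0.260, 6: 0.263, 7: 0.265, 8: 0.267, 9: 0.268}  # from the table

for bits, max_level in MAX_LEVEL.items():
    for budget in (0.005, 0.01):
        low = min_rms_fraction(budget, bits)
        print(bits, budget, round(low, 3),
              round(dynamic_range_db(max_level, low), 1))
```

Each extra bit halves the minimum level while the maximum barely moves, which is the origin of the roughly 6 dB per bit noted in the results.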

- 4 bit quantization possible but basically no dynamic range
- 5 bit has some dynamic range if 1% added noise is acceptable
- More bits give ~6 dB of extended dynamic range per bit

Internal quantization

- Same methodology and results apply to intermediate quantizations and ADC choice
- Internal quantizations cannot be practically corrected in the SDP
- Dynamic range requirements higher due to RFI immunity requirements
- 4-6 bit transport from LFAA to CSP looks problematic