# **Research Article**

# Design and Implementation of Minimal adaptive West first algorithm for NoC Router Architecture

# Rohini<sup>a\*</sup>

<sup>a</sup>Industrial Electronics, KLS's VDRIT, Haliyal Affiliated to VTU, Belgaum, Karnataka, India

#### Abstract

As the feature size is continuously decreasing and chip integration is increasing, bus connections have become a dominating factor in determining the overall quality of a System on Chip. Long global wires also cause many design problems, such as routing traffic, scalability, latency and throughput. Network-on-Chip NoCs are an evoluting architecture to be used in future systems, due to its increased performance, reusability and scalability. A NoC is a set of interconnected switches, with IP cores[1] connected to these switches. Routing plays an important role in determining latency and delay of router in NoC. In this paper mesh based router architecture using minimal West First Routing Algorithm is presented. The delay and latency performance of routers have been analyzed through simulation.

Keywords: ASIC, NoC, PE, RTL, SoC

# 1. Introduction

NoCs are an attempt to miniaturize concepts of large networks, and apply them to the system-on-chip SoC domain. NoCs use packets to send data from the source to the destination component, via an architecture that consists of routers and interconnection links. An example of a NoC system is illustrated in fig.1. The figure shows a NoC interconnection architecture with a mesh topology, consisting of several processing elements PEs connected together via routers and regular sized wires

| PE | PE |      | PE |
|----|----|------|----|
| PE | PE |      | PE |
| PE | PE |      | PE |
| PE | PE | PE · | PE |

Figure1.NoC architecture

A PE in this case may be any component such as a microcontroller, DSPs, or a block of memory. A network interface NI at the boundary of each PE is used to packetize any data generated by the PE. This NI is connected to a router, which has buffers at its input to accept data packets from a PE or from other routers. A crossbar switch inside the router is used to route the data packets from the input buffers to the appropriate output link, based on the address in the packet header.

This network can be modeled as a graph wherein nodes, processing elements and edges are the connective links of the processing elements. In this paper designing and implementing NoC router are presented. In our design dead-lock free west first routing algorithm is applied to a 2D mesh topology and wormhole switching is used along with data link layer flow control.

#### 2. Block Diagram of the Router



Figure 2 block diagram of the router

As shown in figure 2, Router is having five ports east, west, north, south and local port. Each port is having its input and output channel, and each input and output channel is having its control and decoding logic, which supports five parallel connections at the same time simultaneously.

The input channel consists of three parts i.e. FIFO, FSM, and West First routing logic. The FIFO here is used as input buffer to store the data temporarily. It is of 8 bits size and is of depth 16 bits. The first 8 bits of FIFO is header containing the coordinates of destination port, thus

<sup>\*</sup>Corresponding author: Rohini

the size of packet varies from 8 to 120 bits. FSM is used to control the read and write operation of FIFO, according to its status. If FIFO is empty, FSM generates the signal to perform the write operation and if FIFO is full, FSM generates signal to perform the read operation. West First Routing logic is the adaptive minimal logic which analyzes the header of packet to find out its destination address. West First Routing logic compares the coordinates values stored in header with the locally stored coordinates values and send the data to its destination port. Similarly the output channel consists of three parts i.e. FIFO, FSM and arbiter. The FIFO here is used as output buffer to store the data temporarily. It is of 8 bits size and is of depth 16 bits the 8 bits are the header, thus the packet size varies from 8 bits to 120 bits. FSM is used to control the read and write operation of FIFO according to its status. If FIFO is empty, FSM generates signal to perform write operation and if FIFO is full FSM generates signal to perform read operation. Arbiter is used in output channel to overcome the problem of multiple input requests coming at single output port. Arbiter is based on rotating priority scheme in which each port get reduced its priority once it has been served.

# 3. Utilized routing algorithm

West First Routing Logic is used for comparing the coordinates stored in header with the locally store coordinates and thus find out the destination address of packet. The input signals will be first four bit of header which is coming from FIFO and the output signal will be requests going to output port. The minimal west-first routing algorithm for 2-D meshes is as given below.

#### Algorithm: Minimal West-First Algorithm for 2-D Meshes

Inputs: Coordinates of current node Xcurrent, Ycurrent and destination node Xdest, Ydest Output: Selected output Channel Procedure: *Xoffset* := *Xdest* - *Xcurrent*: *Yoffset* :=*Ydest* -*Ycurrent*; if Xoffset < 0 then Channel := X - ;endif if Xoffset > 0 and Yoffset < 0 then Channel := SelectX+, Y-; endif if Xoffset > 0 and Yoffset > 0 then Channel := SelectX+, Y+; endif if Xoffset > 0 and Yoffset = 0 then Channel := X+: endif if Xoffset = 0 and Yoffset < 0 then Channel := Y-: endif if Xoffset = 0 and Yoffset > 0 then Channel := Y+: endif if Xoffset = 0 and Yoffset = 0 then Channel := Internal; endif



Figure 3 shows the minimal west-first routing algorithm for 2-D meshes, where *Select* is the selection function. This function returns a free channel if any from the set of channels passed as parameters. The channels marked as unavailable are either faulty or being used by other packets. One of the paths shown is minimal, while the other two paths are nonminimal, resulting from routing around unavailable channels. Because cycles are avoided, west-first routing is deadlock-free. For minimal routing, the algorithm is fully adaptive if the destination is on the right-hand side east of the source otherwise, it is deterministic.

Routing module performs the deadlock free adaptive routing function. It detects the header from the FIFO outgoing flits and runs the routing algorithm to select an output channel. After detecting the header, it will send a request to any of the output channels. The selected output channel will send a grant signal to that request when it is free. This grant signal is coming to the IRS block of the Input Channel module. The Input Channel module also sends a 'read- ok' signal to the selected output channel. The read-ok signal is high if the outgoing it is a valid one. The request and read-ok signals remain high until tailer is coming out from FIFO. Flits are written into the buffer with the write clock supplied by the previous router and are coming out from the buffer synchronously with the router's own clock.

# 4. Utilized Switching Technique

The NoC switching strategy determines how data flows through the routers in the network. PEs generate messages that are partitioned into possibly several data packets. A packet is further divided into multiple flits flow control unit. Each flit is made up of one or more phits physical units. A phit is a unit of data that is transferred on a link in one clock cycle as shown in figure 4.



Figure4. The structure of phits, packets, and messages.

In this design we have used packet switching to transfer data, particularly wormhole switching where a flit from a packet is forwarded to the receiving router. If there is insufficient space in the next router to store the entire packet, parts of the packet are distributed among two or more routers.

#### 5. Flow control

In the ACK/NACK scheme, when flits are sent on a link, a local copy is kept in a buffer by the sender. When it

270 | Proceedings of National Conference on 'Women in Science & Engineering' (NCWSE 2013), SDMCET Dharwad

arrives at the receiver, either an acknowledge ACK or negative acknowledge NACK is sent back. When an ACK is received by the sender, it deletes its copy of the flit from its local buffer. When a NACK is received, the sender rewinds its output queue and starts resending its, starting from the corrupted one. ACK/NACK can be implemented either end-to-end between the sender and receiver, or at the switch-to-switch level as shown in figure 5 a and b.



Figure 5: ACK/NACK scheme

# 6. Clocking schemes

In NoCs, several different clocking schemes are possible, such as synchronous, mesochronous, pleisochronous, and asynchronous. In the fully synchronous case, a single global clock is distributed to synchronize the entire chip. The clock signal arrives simultaneously at the local flipflops of routers, nodes, and buffered links all over the chip.To overcome this problem, multiple clock domains are used. In the mesochronous case, local clocks are derived from a global clock that has been distributed all across the chip. All synchronous modules in a mesochronous system use the same clock source, but the phase between clock signals in different modules may differ due to an unbalanced global clock network. In this design we have used Globally Asynchronous and Locally Synchronous clocking scheme as shown in figure 6.



Figure 6 GALS module

### 7. Router design

The mesh based router consists of multi ports such as east, west, north, south and local port. It also has a central cross point matrix. Inside each port there are two channels input and output. Data packet is sent from one port moves in to the input channel of router by which it is forwarded to the output channel of the other port. Each input channel and output channel has its own decoding logic which increases the performance of the router. Buffers are used at all ports to store the data for a short time span. The store and forward method is used here for data transmission. Control logic is present to make decisions to grant access to a port request. In this way communication is established between input and output ports. The transfer of data from source to destination is called packet switching mechanism where the flit size is 8 bits.

In router structural modeling is used in which input channel of all 5 ports, output channel of all 5 ports and cross point matrix is used as component to form complete router. The input signals here are five data input signals; five requests signals, and five acknowledgement signals. The output signals are five data output signals, five acknowledgement signals and five request signals. The port mapping of each component is done to connect all.

# 8. Result and discussion

Two router architectures are implemented in structural Register-Tranfer-Level RTL Verilog and then synthesized in a Xilinx 13.1. Router design is simulated in ISIM 13.1 by creating a dummy environment around its surroundings.

#### Synthesized result

Synthesis process converts user's hardware description into structural logic description. It provides a means to covert schematics of HDL into real world hardware. Synthesis tools convert the described hardware into a net list that a vendor may use to create a chip or board. Figure 7 represents the synthesis of input channel using west first algorithm, Figure 8 represents the synthesis of output channel.



Figure 7 synthesized output of input channel



Figure 8 synthesized output of output channel

#### Simulation results

Simulation refers to the verification of a design, its function and performance. It is process of applying stimuli to a model over time and producing corresponding responses from a model. Figure 9 represents the simulation of input channel using west first algorithm, figure 5.2 represents the simulation of output channel

|                       |          |                 |          |          |          |          |          |          |          | 2,500.083 ms |          |
|-----------------------|----------|-----------------|----------|----------|----------|----------|----------|----------|----------|--------------|----------|
| Name                  | Value    |                 | 2,150 ns | 2,200 ns | 2,250 ns | 2,300 ms | 2,350 ns | 2,400 ns | 2,450 ns | 2,500 ms     | 2,550 ns |
| 🕨 🍟 dətə_in(7:0)      | 11111110 |                 |          |          | 111111   |          |          | 000      | 1013     | 111          | 1111     |
| 🔓 wr_dk               | ٥        |                 |          |          |          |          |          |          |          |              |          |
| 🔓 rđ_dk               | ٥        |                 |          |          |          |          |          |          |          |              |          |
| 🔓 rst,n               | 1        |                 |          |          |          |          |          |          |          |              |          |
| 🍹 injial              | 1        |                 |          |          |          |          |          |          |          |              |          |
| 🕨 🚺 op_req            |          |                 |          |          |          |          |          |          |          |              |          |
| Cing 🚽                | 1        |                 |          |          |          |          |          |          |          |              |          |
| 🔓 get1                | 1        |                 |          |          |          |          |          |          |          |              |          |
| lig get2              | ٥        |                 |          |          |          |          |          |          |          |              |          |
| lig gnt3              | 0        |                 |          |          |          |          |          |          |          |              |          |
| in_req                | 1        |                 |          |          |          |          |          |          |          |              |          |
| in ack                | 1        |                 |          |          |          |          |          |          |          |              |          |
| 🕨 📜 req               |          |                 |          |          |          |          |          |          |          |              |          |
| ▶ 📲 data_out(7:0)     | 00001010 |                 |          |          | 111111   |          |          | 000      | 1010     | 111          | 1110     |
| lig rd_ok             | 1        |                 |          |          |          |          |          |          |          |              |          |
| R rd_en_wire          | 1        |                 |          |          |          |          |          |          |          |              |          |
| wr_en_wire            | 1        |                 |          |          |          |          |          |          |          |              |          |
| in ffo full wire      | ٥        |                 |          |          |          |          |          |          |          |              |          |
| in ffo_empty_wire     | ٥        |                 |          |          |          |          |          |          |          |              |          |
| 🕨 👹 xd_wire[10]       | 00       |                 |          |          | 11       |          |          | <u> </u> |          | 0            |          |
| vd_wire[1:0]          | 00       |                 |          |          | 11       |          |          | <u> </u> |          | 0            |          |
| ha ha fit, at a, wire | ٥        |                 |          |          |          |          |          |          |          |              |          |
| in tal fit atd wire   | 1        |                 |          |          |          |          |          |          |          |              |          |
|                       |          |                 |          |          |          |          |          |          |          |              |          |
|                       |          | 31: 2,500.083ms |          |          |          |          |          |          |          |              |          |

Figure 9 simulation output of input channel



Figure 10 simulation output of output channel

Using these figures we can understand the working of the input and output module. Now we send a data packet from

west port of router to east port and south port to router as shown in figure 11 and 12 respectively.



Figure 11: data packet is sent from west port to east port

|                                |       | 5002.016 ms<br>5,250.000 ms 5,728.310 ms 5,728.310 ms   |
|--------------------------------|-------|---------------------------------------------------------|
| Name                           | Value | 5,000 ns 15,200 ns 15,400 ns 5,600 ns 5,600 ns 5,600 ns |
| l <mark>a</mark> w <u>r</u> ck | 1     |                                                         |
| 🔓 rd_ck                        | 1     |                                                         |
| lla rst_n                      | 1     |                                                         |
| 🕨 📑 data_in_w(7:0)             | 00    | (F) (50 ) 70 (Fe ) (C)                                  |
| 🕨 📑 dəta_in_s(7x)              | 00    | (F) (8) (8) (8)                                         |
| 🔓 ingvalger                    | D     |                                                         |
| l <sub>a</sub> injualis        | 0     |                                                         |
| la out_req_e                   | 1     |                                                         |
| 🔓 outjackje                    | 1     |                                                         |
| ▶ 📲 data_out_e[7:0]            | 00    | 00 (ff x50)70 (fe x 00 (ff x50)80 (fe 00                |
| le out_val_e                   | 1     |                                                         |
| 🔓 injreqis                     | 1     |                                                         |
| 🔓 ingreque                     | 1     |                                                         |
| 🔓 injackjiw                    | 0     |                                                         |
| lin_ack_s                      | 9     |                                                         |
| 🕨 📑 input                      |       |                                                         |
|                                |       | x1: 5,002.016 ns                                        |

Figure 12: data packet is sent from west port to east port and south to north port

| Table    | 1: | Comparison | of | performance | of | routing |
|----------|----|------------|----|-------------|----|---------|
| algorith | ms |            |    |             |    |         |

| Parameters                             | XY algorithm       | West first<br>algorithm |  |  |
|----------------------------------------|--------------------|-------------------------|--|--|
| Number of slices                       | 51 out of<br>14572 | 48 out of 14572         |  |  |
| Number of slice Flip<br>Flops          | 67 out of 29504    | 62 out of 29504         |  |  |
| Number of 4 input<br>LUTs              | 74 out of 29504    | 61 out of 29504         |  |  |
| Number of IOs                          | 38                 | 36                      |  |  |
| Number of bonded<br>IOBs               | 38 out of 376      | 36 out of 376           |  |  |
| Maximum<br>combinational path<br>delay | 6.671ns            | 5.58ns                  |  |  |

The above table tells us the improvement in the performance of the router using the proposed west first routing algorithm over the existing XY algorithm. Here the hardware utilization is less compared to XY algorithm. So it takes lesser space in the hardware. It has 5.58ns of combinational path delay between the transfer of packets whereas the XY algorithm leads to a 6.671ns delay between the transfer of packets.

#### Conclusion

In this paper, router architecture using west first algorithm is implemented using HDL called Verilog at RTL level. Architecture is synthesized using Xilinx and simulated using ISIM 13.3 for evaluating latency and delay of two routers. The simulation result show that there is same latency but zero clock delay by using west first algorithm. Network on chip presents a most adapted technology to perform communication in complex SoCs. In this work, we present a west first routing algorithm related router which offers a low latency 1 clock cycles and high speed 216 Mhz communication for on chip modules. This project presents all the details of our NoC such as topology, routing algorithm, dynamic arbiter and router module. This architecture offers a variety of SoC communication services owing to its flexibility and adaptability.The physical parameters of the designed router consist on the width and the depth of the FIFO, the number of the input/output ports, the valence and the maximal diameter of the NoC. Compared with other NoC, the advantage of this architecture, resides in its capacity to handle a suitable cost/performance compromise in the field of NoC. This is due to its wide constant and low diameter, latency of the router, frequency and power consumption. The simulation and implementation of this architecture show its performances and effectiveness.

#### References

- Kumar, S.; et al.2002, A Network on Chip Architecture and Design Methodology. In: *IEEE Computer Society Annual Symposium on VLSI*.ISVLSI02, Apr. 2002, pp. 105-112.
- J. W. Dall and B. Towles 2001, Route packets, not wires: On-Chip inter-connection networks, *Proc. IEEE International Conference on Design and Automation*, pp. 684-689
- Seyyed Amir Asghari, Hossein Pedram, Mohammad Khademi, and Pooria Yaghini (2009), World Applied Sciences Journal 6 1: 88-93, 2009 ISSN 1818-4952 © IDOSI, 2009
- Jayashree S (2011), Study of Fuzzy based routing algorithm in Manet at CETIT-2011By,Under the guidance of Dr.G.R.Udupi.
- A Security attack resilient and Power Efficient Routing Protocol for Manets at ICETIT 2012" By, Jayashree S,Under the guidance of Dr.G.R.Udupi.
- UNLV, Theses/Dissertations 1-1-2009, Performance evaluation of network-on-chip interconnect architectures, by Xinan Zhou University of Nevada Las Vegas.
- Inter connection Networks" By Jose DUATO, Sudhakar Yalamanckilli