Welcome to Ram Meenakshisundaram's Transputer Home Page

"…sequential computers are approaching a fundamental physical limit on their potential power. Such a limit is the speed of light…"

Sundance SMT219 FPGA HARP2 Transputer Module

Features
General Description

The SMT219 utilises the benefits of a 32-bit RISC processor and standard Field Programmable Gate Array (FPGA) technology to offer a re-configurable hardware/software processing element. Designed around the industry standard Transputer Module (TRAM) architecture, it enables simple, rapid prototyping and application development of high performance systems.

The FPGA is mapped into the microprocessor memory bus and can be reconfigured by the processor at any time. Once it is loaded and a suitable clock frequency has been established, the FPGA acts as a true co-processor, with full access to the bus for high-speed communication with the microprocessor.

The SMT219 may be used as a platform for hardware/software co-design experiments. Application programs are typically split into two parts: a computationally demanding kernel or inner loop, which is mapped into FPGA hardware, and the rest of the application, which is compiled into machine code for the microprocessor.

The SMT219 uses a T425 32-bit integer transputer with 4 MBytes of external DRAM, closely coupled to the FPGA. A bank of fast SRAM and a high-performance frequency synthesizer are also included. Communication with the TRAM is via the standard TRAM signals or, alternatively, via the FPGA's external interface headers, which are fully programmable.

All the resources are controlled by the transputer. The FPGA may be used as a co-processor that is dynamically re-programmed from the transputer, or a single configuration may be loaded at boot time. The boards may be used alone or as part of parallel arrays.

The SMT219 can be used as a development platform for FPGA applications, and so is also useful for education and for gaining familiarity with the technology. Because the boards are almost completely programmable, they can be used to explore hardware options. Where volume does not justify the manufacture of a custom design, they can be used directly.

Other applications include "custom" interfaces to unusual hardware such as image or audio acquisition interfaces, or where a single piece of equipment needs to be re-programmed to suit different hardware standards. By using the on-board frequency synthesizer, and programming the FPGA to pass the signals through to the external connector, they can even be used as a signal generator.

The T425 processor has 4 KBytes of on-chip memory, a 32-bit multiplexed data and address bus, four serial communications links and various control services. External to the processor is an additional 4 MBytes of DRAM, which occupies a contiguous address space following the internal RAM. The DRAM is organized as 1M words of 32-bit-wide memory.
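The following is a minimal C sketch of the resulting memory map. It assumes the usual transputer convention that on-chip RAM starts at the most negative address, MostNeg (0x80000000). It also assumes the DRAM begins immediately after the 4 KBytes of on-chip RAM. The names `classify` and `dram_word_index` are hypothetical helpers, not part of any Sundance software.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: on-chip RAM starts at MostNeg (0x80000000), as on other
   transputers, and the 4 MByte DRAM follows it directly. */
#define MOST_NEG     0x80000000u
#define ONCHIP_BYTES (4u * 1024u)          /* 4 KBytes on-chip RAM   */
#define DRAM_BYTES   (4u * 1024u * 1024u)  /* 4 MBytes external DRAM */
#define DRAM_BASE    (MOST_NEG + ONCHIP_BYTES)

typedef enum { REGION_ONCHIP, REGION_DRAM, REGION_OTHER } region;

/* Which memory an address falls in under the assumed map. */
static region classify(uint32_t addr) {
    if (addr >= MOST_NEG && addr < DRAM_BASE)
        return REGION_ONCHIP;
    if (addr >= DRAM_BASE && addr - DRAM_BASE < DRAM_BYTES)
        return REGION_DRAM;
    return REGION_OTHER;  /* not RAM under this assumed map */
}

/* Index of the 32-bit DRAM word containing addr (1M words of 32 bits). */
static uint32_t dram_word_index(uint32_t addr) {
    assert(classify(addr) == REGION_DRAM);
    return (addr - DRAM_BASE) / 4u;
}
```

Under these assumptions, the last DRAM word sits at 0x80400FFC, which is word index 1048575 (2^20 - 1).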

The extra facilities, primarily the FPGA and control registers, appear as memory-mapped ports at locations not used by standard TRAMs. The EventReq line (similar to an interrupt) is not generally available: it is connected both to the FPGA and to the frequency synthesizer. This allows the frequency synthesizer to report when its Phase Locked Loop (PLL) has stabilised. It also allows a configuration in the FPGA to request attention from the transputer in the same way. This is intended to permit efficient system parallelism.

Technical Description

The board uses a Xilinx XC3195A FPGA in a 160-pin surface mount package. As already noted, the FPGA is connected directly to the transputer bus to deliver maximum bandwidth between the two processors. The circuit allows the FPGA to take over as bus master, so it has direct access to the transputer DRAM. However, to improve system performance, the FPGA also has four private fast (20 ns) SRAM chips. Each chip is byte-wide and holds 32K bytes, and the chips are organised into two completely independent banks. Within a bank the address lines are shared, but the read and write control signals for the chips are distinct. Consequently, two different operations can take place simultaneously at the same address within a bank: for example, a write to one byte and a read of the other.

The circuit allows many different conceptual organisations for the SRAM: it may be used as a single bank of 32K 32-bit-wide words, or as two separate 16-bit-wide memories. The organisation of this SRAM may even be changed from cycle to cycle. Data may be streamed across the FPGA from one bank to the other through a pipeline or other systolic array in the FPGA. Alternatively, the transputer may move data from one bank while part of the FPGA processes data in the other.
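The two organisations can be illustrated with a small software model of the four chips. In this minimal C sketch, chips 0 and 1 form bank 0 and chips 2 and 3 form bank 1. A 32-bit access drives both banks with the same address, and bank 1 supplies the upper half-word. That byte-lane order is an assumption: a real configuration may map bits and bytes however it likes. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_DEPTH 32768u  /* 32K locations per byte-wide chip */

/* Chips 0-1 form bank 0, chips 2-3 form bank 1 (assumed lane order). */
typedef struct {
    uint8_t chip[4][SRAM_DEPTH];
} sram_model;

/* 16-bit view: one bank, both of its chips share the address lines. */
static uint16_t bank_read16(const sram_model *m, int bank, uint16_t addr) {
    assert(bank == 0 || bank == 1);
    addr &= SRAM_DEPTH - 1;
    return (uint16_t)(m->chip[2 * bank][addr] |
                      (m->chip[2 * bank + 1][addr] << 8));
}

static void bank_write16(sram_model *m, int bank, uint16_t addr, uint16_t v) {
    assert(bank == 0 || bank == 1);
    addr &= SRAM_DEPTH - 1;
    m->chip[2 * bank][addr]     = (uint8_t)(v & 0xFF);
    m->chip[2 * bank + 1][addr] = (uint8_t)(v >> 8);
}

/* 32-bit view: both banks driven with the same address at once. */
static uint32_t sram_read32(const sram_model *m, uint16_t addr) {
    return (uint32_t)bank_read16(m, 0, addr) |
           ((uint32_t)bank_read16(m, 1, addr) << 16);
}

static void sram_write32(sram_model *m, uint16_t addr, uint32_t v) {
    bank_write16(m, 0, addr, (uint16_t)(v & 0xFFFF));
    bank_write16(m, 1, addr, (uint16_t)(v >> 16));
}
```

A word written through the 32-bit view can be read back as two independent 16-bit halves, which is what allows the organisation to change from cycle to cycle.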

The physical connections of the static RAM chips are described here, but it is sometimes more important to understand the arrangement as viewed by an FPGA configuration. The way in which a configuration maps data and addresses onto the hardware is essentially arbitrary, and will not necessarily match the chip manufacturer's assignment. For example, an SRAM datasheet may label a particular data line Q4, but there is no reason why it should carry bit 3 of a word (the first line being Q1). Although this is obvious, it can be a source of confusion when the system is not performing as expected and you need to investigate the signals on the pins with an oscilloscope or logic analyser.

There is a more common reason for understanding the mapping: if two successive configurations access common data in the SRAMs, they must share the same view. The Handel-C programming environment provides this automatically.

Often the SRAMs are accessed indirectly from the transputer bus, either by a dedicated configuration, or concurrently with some other actions. It turns out that the Xilinx internal architecture is best utilized by mapping the data and addresses on to the SRAMs in particular ways.

In other circumstances, the pin assignments might be the result of a place-and-route algorithm, or of the need to drive signals onto the external connections, some of which are shared with the SRAMs. So it may sometimes be necessary to relate several views: from the transputer, seen "through" the FPGA; from the FPGA configuration, as described by the FPGA software; from the chip manufacturer's datasheets; and of the physical pins.
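Relating the logical view to the datasheet view amounts to applying a bit permutation. The following is a minimal C sketch. The table `pin_for_bit` is invented purely for illustration and does not describe any real SMT219 configuration. It records which datasheet line (Q1..Q8) carries each logical bit of a byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping: pin_for_bit[b] is the datasheet line (Q1..Q8)
   that carries logical bit b in one particular FPGA configuration. */
static const int pin_for_bit[8] = { 3, 1, 8, 2, 7, 4, 6, 5 };

/* Logical byte -> levels on Q1..Q8 (bit 0 of result = Q1), e.g. to
   predict what a logic analyser on the SRAM pins should show. */
static uint8_t logical_to_pins(uint8_t logical) {
    uint8_t pins = 0;
    for (int b = 0; b < 8; b++) {
        assert(pin_for_bit[b] >= 1 && pin_for_bit[b] <= 8);
        if (logical & (1u << b))
            pins |= (uint8_t)(1u << (pin_for_bit[b] - 1));
    }
    return pins;
}

/* Inverse mapping: levels on Q1..Q8 -> logical byte. */
static uint8_t pins_to_logical(uint8_t pins) {
    uint8_t logical = 0;
    for (int b = 0; b < 8; b++)
        if (pins & (1u << (pin_for_bit[b] - 1)))
            logical |= (uint8_t)(1u << b);
    return logical;
}
```

With this example table, logical bit 0 appears on Q3, so `logical_to_pins(0x01)` gives 0x04. Two configurations that share SRAM data must agree on the table.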

On the SMT219, external interfaces may be developed using the spare user pins and the SRAM bus interface for additional I/O. This is intended to allow higher-bandwidth communication between boards, particularly in parallel arrays.

Warning:
It is impractical to include protection circuitry on these pins, not even current-limiting resistors. The pins connect directly to the FPGA, and also to the SRAMs on the shared lines. External buffers and termination may be necessary when driving long lines.

Obviously care must be taken when a board is connected to other systems: if it is reconfigured for some other unrelated application, this must not cause conflicting signals on the cables.

On the SMT219 the FPGA will usually be used as a co-processor, in some cases dynamically: it may be reloaded with different functions as required. If this happens frequently, it is important that the reconfiguration overhead does not outweigh the advantage of higher processing speed. The Xilinx chips may be configured at 10 Mbit/s. A Hardware Assisted Configuration (HAC) circuit has been implemented to operate at more than 8 MHz with the 25 MHz transputer. The hardware is loaded a byte at a time. The transputer therefore still has to work hard to feed the configuration data quickly enough to avoid stalling the Xilinx interface. Once a byte is loaded, its eight bits are shifted into the FPGA at full speed, but there may be a pause if the transputer does not reload the register as soon as it is empty. This may therefore be a limiting factor if fine-grained parallel applications are to be implemented.
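Whether dynamic reconfiguration pays off can be checked with simple arithmetic. The following is a minimal C sketch. The bitstream length and the kernel run times are caller-supplied placeholders: they are not figures for the XC3195A or for any particular application. The rate would be about 8 Mbit/s if the transputer keeps the HAC register full at 8 MHz.

```c
#include <assert.h>

/* Time to load 'bits' configuration bits at 'bits_per_s'. */
static double config_time_s(double bits, double bits_per_s) {
    assert(bits_per_s > 0.0);
    return bits / bits_per_s;
}

/* Rough break-even test: reloading the FPGA is worthwhile only if the
   time saved by running the kernel in hardware (sw_time_s - hw_time_s)
   exceeds the reconfiguration time. */
static int reconfig_pays_off(double bits, double bits_per_s,
                             double sw_time_s, double hw_time_s) {
    return sw_time_s - hw_time_s > config_time_s(bits, bits_per_s);
}
```

For instance, a hypothetical 80,000-bit stream at 8 Mbit/s takes 10 ms. A kernel that saves only 5 ms per reload would therefore be better left in software.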

Handel-C provides standard procedures to configure the FPGA using the HAC, so most users need not be concerned with these details. However, the operation of the HAC port is simple. Provided the HAC bit in XC_fig has been set, 8 Cclk edges are emitted immediately after a byte is written, together with the eight bits of the byte on Din, least significant bit first. When the last bit has been delivered, HAC.rdy (the sign bit) goes high. The transputer may therefore write the next byte as soon as it reads a negative value.
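The handshake can be sketched as a software model in C, since the HAC register address is not given here. The names `hac_model`, `hac_write`, `hac_clock`, `hac_status` and `hac_load` are hypothetical. A negative status word stands for HAC.rdy being set. In the real hardware Cclk runs by itself; in this model the polling loop clocks it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIN_LOG_MAX 64

typedef struct {
    uint8_t shift;                 /* byte being shifted out on Din      */
    int bits_left;                 /* Cclk edges still to be emitted     */
    uint8_t din_log[DIN_LOG_MAX];  /* Din level on each Cclk edge        */
    int din_count;
} hac_model;

/* Transputer writes a byte: the port starts emitting 8 Cclk edges. */
static void hac_write(hac_model *h, uint8_t byte) {
    assert(h->bits_left == 0);     /* writing early would corrupt the byte in flight */
    h->shift = byte;
    h->bits_left = 8;
}

/* One Cclk edge: the least significant remaining bit goes out on Din. */
static void hac_clock(hac_model *h) {
    if (h->bits_left == 0)
        return;
    if (h->din_count < DIN_LOG_MAX)
        h->din_log[h->din_count] = h->shift & 1u;
    h->din_count++;
    h->shift >>= 1;
    h->bits_left--;
}

/* Status word: HAC.rdy is the sign bit, so "ready" reads as negative. */
static int32_t hac_status(const hac_model *h) {
    return h->bits_left == 0 ? INT32_MIN : 0;
}

/* Feed a configuration stream byte by byte, polling the sign bit. */
static void hac_load(hac_model *h, const uint8_t *cfg, size_t n) {
    for (size_t i = 0; i < n; i++) {
        hac_write(h, cfg[i]);
        while (hac_status(h) >= 0)
            hac_clock(h);
    }
}
```

Loading the byte 0xA5 produces Din = 1, 0, 1, 0, 0, 1, 0, 1 on successive Cclk edges (least significant bit first). Any time the transputer spends between seeing the negative status and writing the next byte is idle time on the configuration interface.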

The maximum frequency at which a particular FPGA configuration can operate depends mainly on the depth of combinatorial logic and the quality of routing and placement in the design. The SMT219 board provides various clocking options. The transputer ProcClk signal is permanently available at an FPGA pin as are various strobes and external memory bus signals. These signals may be used to clock all or part of the FPGA internal circuit. The FPGA configuration can include a divider to produce sub-multiples of these clocks.

However, the FPGA has two dedicated clock inputs: these are driven in anti-phase from a multiplexor. This multiplexor can connect various clocks to the FPGA:

  1. 5MHz: The TRAM 5MHz master clock.
  2. ProcClockOut: The transputer output clock.
  3. PLL: A programmable frequency synthesizer.
  4. Single Step: A transputer port. This gives explicit software control.
  5. External Clock: This is intended to allow an array of boards to be driven by a common clock.
Note that the Single Step option allows the clock to be manipulated in any way, and at any speed, that the software can manage. It also provides a way to turn the clock off. Although the transputer ProcClockOut signal is already available at another FPGA pin, the dedicated clock pins have special circuitry that can be used to achieve low clock skew, especially when driving a substantial portion of the FPGA.
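The following is a minimal C model of single stepping. It assumes the FPGA design is triggered on rising clock edges. The transputer port that drives the clock line is represented by a hypothetical `set_step_level` function, because its address and bit assignment are not given here.

```c
#include <assert.h>

/* Model of the Single Step clock source and the FPGA logic it drives. */
typedef struct {
    int level;            /* current level of the single-step clock line */
    unsigned long ticks;  /* rising edges seen by the modelled FPGA design */
} step_clock;

/* Hypothetical port write: set the clock line to 0 or 1. */
static void set_step_level(step_clock *c, int level) {
    assert(level == 0 || level == 1);
    if (!c->level && level)
        c->ticks++;       /* rising edge clocks the design (assumed) */
    c->level = level;
}

/* Issue n complete clock cycles under software control. */
static void single_step(step_clock *c, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        set_step_level(c, 1);
        set_step_level(c, 0);
    }
}
```

Simply not writing to the port leaves the line static, which corresponds to the "clock off" case described above.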

The frequency synthesizer covers a range of approximately 1.4 MHz to 98.5 MHz.
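Frequencies below the synthesizer's range can still be reached by combining it with a divider in the FPGA configuration, as mentioned for the other clock sources. The following is a minimal C sketch using the approximate limits quoted above. The helper `choose_divider` is hypothetical: it looks for the smallest integer divider n that puts target × n within the synthesizer's range.

```c
#include <assert.h>

#define PLL_MIN_MHZ 1.4   /* approximate limits from the text */
#define PLL_MAX_MHZ 98.5

/* Smallest divider n (1..max_div) such that the synthesizer can run at
   target_mhz * n; the FPGA then divides by n. Returns 0 if none works. */
static int choose_divider(double target_mhz, int max_div) {
    assert(target_mhz > 0.0 && max_div >= 1);
    for (int n = 1; n <= max_div; n++) {
        double pll = target_mhz * n;
        if (pll > PLL_MAX_MHZ)
            break;            /* larger n only raises the frequency */
        if (pll >= PLL_MIN_MHZ)
            return n;
    }
    return 0;
}
```

For example, a 0.5 MHz target needs a divider of 3 (synthesizer at 1.5 MHz). A 25 MHz target uses the synthesizer directly.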

PCB Layout

Top and bottom views of the PCB

Handel-C Language

Handel-C is a simple programming language designed for compiling programs into hardware implementations. However, Handel is definitely not a hardware description language. It has been developed by the Oxford University Computing Laboratory (OUCL) Hardware Compilation Group, with the long-term aim of investigating how a single programming language can be used effectively to create systems with both hardware and software components. OUCL firmly believe that only the paradigm of computer programming, securely based on sound mathematical principles, offers hope of addressing tomorrow's large-scale design problems. This argument has even more force when considering the design and implementation of parallel, safety-critical, or redundant systems.

The Handel-C language is based on the CSP algebra and has been greatly influenced by its programming language counterpart, the occam language.

Handel provides programmers with (variable-width) integers, arrays of integers, and channels. Handel expressions support the usual arithmetic and logical operations, as well as conditionals, shared sub-expressions, and bit-field extraction and concatenation operators. The language has control constructs for (parallel) assignment, sequential and parallel composition, communication and alternation, and the usual while, if and case control structures, as well as certain forms of procedure call.

These facilities are intended to support the efficient construction of higher-level programs, which might, for instance, include more complex procedure mechanisms or a wider range of data types.

Handel-C compiler

The syntax of Handel-C is reminiscent of the C language, whereas the semantics are based on CSP and occam. For a more in-depth description of Handel-C, please refer to the Handel-C programming manual.

Here is an example of a simple Handel-C program. It implements a Bresenham line-drawing algorithm, calculating and outputting a new pixel co-ordinate on every clock cycle:


   void main (chan(in) params : DataWidth,
              chan(out) output : DataWidth)
   {
      int dx, c1, c2, xi, yi, ri : DataWidth;

      params ? dx;
      params ? c1;
      params ? c2;

      while (xi < dx)
         par {
            yi, ri, xi = (ri >= 0 ? yi+1 : yi),
                         (ri >= 0 ? ri+c1 : ri+c2),
                         xi+1;
            output ! yi;        /* Assert: Time(output)=1 cycle */
         }
   }
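For testing a hardware implementation, it can help to have a software reference model of the same loop. The following is a minimal C sketch. Each iteration stands for one clock cycle, and all right-hand sides use the old values, as in the parallel assignment. The output therefore sees the old yi. Three points are assumptions, because the fragment above does not show how the parameters are prepared. The model assumes a first-octant line (0 <= dy <= dx) with c1 = 2(dy - dx) and c2 = 2dy. It also assumes that the caller supplies the initial error term r0. The name `bresenham_y` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of the Handel-C loop: one iteration per clock cycle.
   Assumes c1 = 2*(dy - dx), c2 = 2*dy and a caller-supplied r0. */
static size_t bresenham_y(int dx, int c1, int c2, int r0,
                          int *out, size_t cap) {
    assert(dx >= 0);
    int xi = 0, yi = 0, ri = r0;
    size_t n = 0;
    while (xi < dx && n < cap) {
        out[n++] = yi;                          /* output ! yi (old value) */
        int ny = ri >= 0 ? yi + 1 : yi;         /* computed from old ri    */
        int nr = ri >= 0 ? ri + c1 : ri + c2;
        yi = ny;
        ri = nr;
        xi = xi + 1;
    }
    return n;
}
```

For a line from (0,0) to (4,2), calling `bresenham_y(4, -4, 4, 0, out, 8)` gives the y values 0, 1, 1, 2 for x = 0..3. The loop stops before emitting the end point at x = dx.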

As an alternative, programs can also be expressed in Handel-AS. These are Handel programs expressed as Abstract Syntax trees embedded in the SML language.

In this way, SML can be used as a very powerful meta-language for parametrising Handel programs in an elegant and efficient way. We use these facilities for complex compilation processes such as Automatic Design and Implementation of Microprocessors. Contact Sundance for details of programming in Handel-AS.

Here are some further examples of programs in Handel-C:


   const dw = 8;           /* Width of incoming data */
   const nw = 4;           /* 2 ^ nw = Number of inputs */
   const rw = dw + nw;     /* Width of the result */

   void main(chan (in) STDIN : dw,
             chan (out) STDOUT : rw)
   {
      int data        : dw;
      int accumulator : rw;
      int counter     : nw;

      do {
         STDIN ? data;
         accumulator = accumulator + (data @ 0);
         counter = counter + 1;
      } while (counter != 0);

      STDOUT ! accumulator;
   }

This program defines two channels of different widths to communicate with the environment (the HARP transputer by default). It then reads a sequence of numbers and uses these channels to write their sum. The widths of the variables and channels are chosen so that overflow cannot occur while calculating the sum.
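The width rule can be checked numerically. Because the nw-bit counter wraps to zero after 2^nw increments, the loop reads exactly 2^nw values. The worst case is 2^nw × (2^dw - 1) = 2^(dw+nw) - 2^nw, which is at most 2^rw - 1 when rw = dw + nw. The following minimal C sketch assumes the inputs are treated as unsigned. The name `sum_fits` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if the sum of 2^nw unsigned dw-bit inputs always fits in rw bits. */
static int sum_fits(unsigned dw, unsigned nw, unsigned rw) {
    assert(dw + nw < 64 && rw < 64);
    uint64_t worst = (UINT64_C(1) << nw) * ((UINT64_C(1) << dw) - 1);
    return worst <= (UINT64_C(1) << rw) - 1;
}
```

With the constants above, the worst case is 16 × 255 = 4080, which fits in rw = 12 bits (maximum 4095) but not in 11.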

   /* Simple microprocessor and fibonacci number generator */

   const dw     = 8;          /* Data width */
   const opcw   = 4;          /* Op-code width */
   const oprw   = 4;          /* Operand width */

   const rom_aw = 4;          /* Width of ROM address bus */
   const ram_aw = 4;          /* Width of RAM address bus */

   const HALT   = 0 : opcw;
   const LOAD   = 1 : opcw;
   const LOADI  = 2 : opcw;
   const STORE  = 3 : opcw;
   const ADD    = 4 : opcw;
   const SUB    = 5 : opcw;
   const JUMP   = 6 : opcw;
   const JUMPNZ = 7 : opcw;
   const INPUT  = 8 : opcw;
   const OUTPUT = 9 : opcw;

   #define _asm_(opc,opr) ((opc)+((opr)*(1<<opcw)))

   ...

      int data[1 << ram_aw] : dw;
      int pc : rom_aw;
      int ir : opcw + oprw;
      int x  : dw;
      int opcode()  = ir <- opcw;
      int operand() = ir \\ opcw;

      do {
         ir, pc = program[pc], pc + 1;
         case (opcode()) {
            HALT    : skip;
            LOAD    : x = data[operand() <- ram_aw];
            LOADI   : x = operand() @ 0;
            STORE   : data[operand() <- ram_aw] = x;
            ADD     : x = x + data[operand() <- ram_aw];
            SUB     : x = x - data[operand() <- ram_aw];
            JUMP    : pc = operand() <- rom_aw;
            JUMPNZ  : if (x != 0) pc = operand() <- rom_aw;
            INPUT   : STDIN ? x;
            OUTPUT  : STDOUT ! x;
            default : stop;
         }
      } while (opcode() != HALT);
   }

This is a simple microprocessor, defined as an Instruction Set Processor program in Handel-C. It also contains a one-line assembler and a small machine code program for calculating Fibonacci numbers.
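A software emulator of the same instruction set is a convenient way to check machine code before it is compiled into hardware. The following is a minimal C sketch. It follows the encoding implied by the listing: the op-code is in the low 4 bits and the operand is in the upper 4 bits. There are 16 ROM words, 16 data words and an 8-bit accumulator. INPUT is not modelled and simply stops the emulator (a simplification). The Fibonacci program `fib_rom` is our own illustration, not the one from the Handel-C listing. The names `ASM`, `run` and `fib_rom` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HALT, LOAD, LOADI, STORE, ADD, SUB, JUMP, JUMPNZ, INPUT, OUTPUT };

/* One-line assembler: op-code in the low 4 bits, operand above it. */
#define ASM(opc, opr) ((uint8_t)((opc) + ((opr) << 4)))

/* Runs until HALT, cap outputs, or max_steps instructions. */
static size_t run(const uint8_t rom[16], uint8_t *out, size_t cap,
                  int max_steps) {
    assert(cap > 0);
    uint8_t ram[16] = {0};
    uint8_t x = 0, pc = 0;
    size_t n = 0;
    for (int step = 0; step < max_steps; step++) {
        uint8_t ir = rom[pc];
        pc = (pc + 1) & 0xF;                 /* 4-bit program counter */
        uint8_t opc = ir & 0xF, opr = ir >> 4;
        switch (opc) {
        case HALT:   return n;
        case LOAD:   x = ram[opr]; break;
        case LOADI:  x = opr; break;
        case STORE:  ram[opr] = x; break;
        case ADD:    x = (uint8_t)(x + ram[opr]); break;
        case SUB:    x = (uint8_t)(x - ram[opr]); break;
        case JUMP:   pc = opr; break;
        case JUMPNZ: if (x != 0) pc = opr; break;
        case OUTPUT: out[n++] = x; if (n == cap) return n; break;
        default:     return n;               /* INPUT / unknown: stop */
        }
    }
    return n;
}

/* Illustrative program: ram[0] = a, ram[1] = b, ram[2] = temporary. */
static const uint8_t fib_rom[16] = {
    ASM(LOADI, 1), ASM(STORE, 1),                 /* b = 1             */
    ASM(LOAD, 0),  ASM(OUTPUT, 0),                /* loop: output a    */
    ASM(ADD, 1),   ASM(STORE, 2),                 /* t = a + b         */
    ASM(LOAD, 1),  ASM(STORE, 0),                 /* a = b             */
    ASM(LOAD, 2),  ASM(STORE, 1),                 /* b = t             */
    ASM(JUMP, 2)
};
```

Running `run(fib_rom, out, 10, 1000)` yields 0, 1, 1, 2, 3, 5, 8, 13, 21, 34. Later terms wrap modulo 256, as they would with the 8-bit data width of the hardware version.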
Channel Protocol Converters

The Handel-C compiler produces XNF netlists which contain a great deal of information for building the Channel Protocol Converters. This information consists of:

  1. the detailed placement and routing of the CPC hardware (necessary since it contains metastability resolvers etc. for the asynchronous processor bus), and
  2. the occam device driver code which can be run on the host processor in order to maintain the channel communication semantics between hardware and software.

This CPC information has been deleted entirely from the first example above, and mostly deleted from the second example.

The Handel-C language has been in use for some time as the basis for Sundance's hardware compilation and hardware/software co-design research. The original Handel language is embedded as an abstract syntax in the SML programming environment. Handel-C is essentially a concrete-syntax front end for this system. Users who don't want the power (or the learning curve!) of abstract syntax and meta-programming can therefore compile their programs into hardware in a more familiar programming environment.

The necessary platform-specific support software consists of parameterised Channel Protocol Converters (CPCs) which generate occam software to run on the host transputer, and circuitry which resides on the FPGA. Together these implement channel communications between the software and hardware parts of the system. There are other CPCs to support access to the SRAMs etc.

Specification


SMT219 Specification

Feature                  Value        Units    Notes
Processor type           IMS T425
Speed                    25           MHz
On-chip memory           4            KBytes   Dual access per single cycle
SRAM                     64           KBytes   0 wait states
DRAM                     4            MBytes   0 wait states
Links                    4                     20 Mbit/s transputer links
JTAG interface           No

Board specification      Size 4 TRAM
Length                   93.04        mm
Width                    110.58       mm
Height above PCB         9.0          mm
Height below PCB         5.0          mm       [3]
Storage temperature      0-70         °C
Operating temperature    0-50         °C
Weight                   TBA          g
Maximum power            3.95         W        [1]
Typical power            1.75         W        [2]
Notes:
  1. Worst-case figures using a 25 MHz processor accessing 4 MBytes of DRAM at the maximum rate. Does not include external I/O drive currents for the FPGA.
  2. Measured figures under normal operating conditions. Does not include external I/O drive currents for the FPGA.
  3. Includes PCB thickness
User Manual & Support Software

This page is copyright 1995, Sundance Multiprocessor Technology Ltd.