Digital Core Design

The Power of Intellectual Property

DFPAU

Floating Point Arithmetic Coprocessor

    The DFPAU is a floating point arithmetic coprocessor designed to assist the CPU with floating point computations. It directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. The coprocessor does not require any programming, and no modifications to the main software are needed. Everything is done automatically during software compilation by the DFPAU C driver.
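
    As a minimal illustration of what "no modifications" means in practice, the fragment below is ordinary C that uses only float operators and comparisons. Assuming the DFPAU C driver is installed for the compiler in use, source like this would be compiled unchanged. The function name and its purpose are hypothetical, and nothing in it refers to the coprocessor.

```c
/* Hypothetical application routine: apply a gain to a reading and
   clamp the result to the range [0, limit].
   Only float multiply and float comparisons are used, which are the
   kinds of operations the datasheet lists as replaced by hardware. */
float scale_and_clamp(float raw, float gain, float limit)
{
    float v = raw * gain;      /* float multiplication */

    if (v < 0.0f)              /* float comparison */
        v = 0.0f;
    if (v > limit)             /* float comparison */
        v = limit;
    return v;
}
```

    For example, scale_and_clamp(2.0f, 1.5f, 10.0f) yields 3.0f with or without a coprocessor; only the execution time differs.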

    The DFPAU was designed to operate with DCD's DP8051, but it can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered with the package.
    Several specialized algorithms have been implemented in the coprocessor to compute arithmetic functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value and change of sign. Input numbers use the IEEE-754 single precision real format. Trigonometric functions are supported indirectly: software subroutines compute them as a sequence of add, multiply and divide operations.
    The DFPAU is a technology-independent design that can be implemented in a variety of process technologies.
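
    To make the idea of computing a trigonometric function from add, multiply and divide operations concrete, here is a minimal sketch. It is not DCD's subroutine; it is a plain Taylor polynomial for sine up to the x^9 term, written in nested form, and it assumes the argument has already been reduced to roughly [-pi/2, pi/2]. The function name sin_poly is hypothetical.

```c
/* Sketch: sine approximated with only subtract, multiply and divide.
   sin(x) ~= x * (1 - x^2/6 * (1 - x^2/20 * (1 - x^2/42 * (1 - x^2/72))))
   which expands to x - x^3/3! + x^5/5! - x^7/7! + x^9/9!.
   Assumes |x| <= pi/2; larger arguments need range reduction first. */
float sin_poly(float x)
{
    float x2 = x * x;
    float p;

    p = 1.0f - x2 / 72.0f;        /* innermost term: 9! / 7! = 72 */
    p = 1.0f - x2 / 42.0f * p;    /* 7! / 5! = 42 */
    p = 1.0f - x2 / 20.0f * p;    /* 5! / 3! = 20 */
    p = 1.0f - x2 / 6.0f * p;     /* 3! / 1! = 6  */
    return x * p;
}
```

    Every step maps to one of the operations the coprocessor executes in hardware, which is why trigonometric functions can still benefit even though they are not implemented as dedicated instructions.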


    Family summary

    Design    Standard    Arithmetic  Trigonometric  Processor   Single     Double     8/16/32-bit  52-bit
              compliance  operations  operations     interfaces  precision  precision  integers     integers
    DFPAU     IEEE-754    +           -              +           +          -          -            -
    DFPMU     IEEE-754    +           +              +           +          -          +            -
    DFPAU-DP  IEEE-754    +           -              +           +          +          +            +
    DFPMU-DP  IEEE-754    +           +              +           +          +          +            +

    Arithmetic operations: ADD, SUB, MUL, DIV, SQRT, COMP
    Trigonometric operations: SIN, COS, TAN, ARCTAN
    Processor interfaces: 8, 16, 32 bit

    The main features of each member of the arithmetic coprocessor family are summarized in the table above. It gives a brief characterization of each member, helping you select the most suitable IP core for your application.

    Performance

    Each core has been tested in a variety of FPGA and ASIC technologies. The implementation results are summarized below.

    Implementation  Speed grade  LUTs/PFUs  Frequency [MHz]
    ispXPGA         -5           2881/747   43

    DFPAU implementation results for LATTICE devices.
    All features have been included. 

    Implementation  Speed grade  Slices  Frequency [MHz]
    SPARTAN-IIE     -7           1310    42
    SPARTAN-3       -5           1290    49
    VIRTEX-E        -8           1300    48
    VIRTEX-II       -5           1300    75
    VIRTEX-II pro   -7           1300    84
    VIRTEX-4        -11          1300    94

    DFPAU implementation results for XILINX devices. 
    All features have been included. 

    Implementation  Speed grade  Logic Cells  Frequency [MHz]
    STRATIX         -5           2210         115
    CYCLONE         -6           2410         91
    CYCLONE-II      -6           2280         96
    STRATIX-II      -3           1680         169
    STRATIX-IV      -2           1985         220

    DFPAU implementation results for ALTERA devices.
    All features have been included. 


    Info

    The tables and figures below illustrate the performance improvement of a system with the DFPAU for two typical CPUs.
    The performance of DFPAU floating point instructions has been compared with the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured from loading the input data into the work registers to storing the result after the operation. The results are listed in the tables below.
    Improvement has been computed as the number of clock cycles required by the CPU alone to compute an FP operation, divided by the number of clock cycles required by the CPU with the DFPAU to compute the same operation:

    Improvement = clock cycles (CPU) / clock cycles (CPU + DFPAU)

    DP8051 BASED SYSTEM

    The following table compares the performance of the DP8051+DFPAU with a standard 8051 microcontroller.

    Device        Improvement
    80C51         1.0
    DP8051        15.5
    DP8051+DFPAU  91.0

    Improvements for particular operations are presented below.

    IEEE-754 FP Instruction  Improvement
    Addition                 73
    Subtraction              60
    Multiplication           65
    Division                 182
    Square Root              392
    Sine                     10
    Cosine                   10
    Tangent                  12
    Arc Tangent              17
    Average speed improvement: 91

    32-BIT RISC BASED SYSTEM

    The table below shows the performance improvement of a sample 32-bit RISC CPU with the DFPAU, compared with the same system without the coprocessor.

    Device                     Improvement
    CPU                        1.0
    CPU+DFPAU (arithmetic)     7.5
    CPU+DFPAU (trigonometric)  5.9
    CPU+DFPAU (overall)        6.8

    Improvements for particular operations are presented below.

    IEEE-754 FP Instruction  Improvement
    Addition                 6.4
    Subtraction              6.5
    Multiplication           5.1
    Division                 6.5
    Square Root              12.9
    Sine                     5.2
    Cosine                   5.4
    Tangent                  5.8
    Arc Tangent              7.2
    Average speed improvement: 6.8
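
    The averages quoted in both systems are consistent with plain arithmetic means of the per-operation factors. That is an inference from the numbers, not something the datasheet states. The sketch below copies the factors from the two tables and averages them; the array and function names are hypothetical.

```c
#include <stddef.h>

/* Per-operation improvement factors copied from the tables above, in
   table order: add, sub, mul, div, sqrt, sin, cos, tan, arc tangent. */
static const float dp8051_gain[9] = {
    73.0f, 60.0f, 65.0f, 182.0f, 392.0f, 10.0f, 10.0f, 12.0f, 17.0f
};
static const float risc32_gain[9] = {
    6.4f, 6.5f, 5.1f, 6.5f, 12.9f, 5.2f, 5.4f, 5.8f, 7.2f
};

/* Arithmetic mean of count entries starting at index first.
   Returns 0 for an empty range. */
float mean_gain(const float *gain, size_t first, size_t count)
{
    float sum = 0.0f;

    for (size_t i = 0; i < count; ++i)
        sum += gain[first + i];
    return count ? sum / (float)count : 0.0f;
}
```

    With these inputs, mean_gain(risc32_gain, 0, 5) is about 7.48, mean_gain(risc32_gain, 5, 4) is about 5.9 and mean_gain(risc32_gain, 0, 9) is about 6.78, matching the 7.5, 5.9 and 6.8 figures after rounding. mean_gain(dp8051_gain, 0, 9) is about 91.2, matching the DP8051 average of 91.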

     

    Key Features

    • Direct replacement for C float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
    • C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
    • No programming required
    • IEEE-754 Single precision real format support – float type
    • Flexible location of argument and result registers
    • Performs the following functions:
      • FADD, FSUB – addition, subtraction
      • FMUL, FDIV – multiplication, division
      • FSQRT – square root
      • FCHS, FABS – change of sign, absolute value
      • FXAM – examine input data
      • FUCOM – comparison
    • Built-in exception routines
    • Each exception indicator can be masked:
      • Loss of precision (PE)
      • Result underflow (UE)
      • Result overflow (OE)
      • Invalid operand (IE)
      • Division by zero (ZE)
      • Denormal operand (DE)
    • Fully synthesizable
    • Static synchronous design
    • Positive edge clocking and no internal tri-states
    • Scan test ready
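
    The exception masking listed above can be pictured with a small sketch. The datasheet does not give the register layout, so the bit positions, the constant names and the convention that a set mask bit suppresses the interrupt are all assumptions made for illustration only. The real register map should be taken from the DFPAU documentation.

```c
#include <stdint.h>

/* HYPOTHETICAL bit assignment for the six exception indicators.
   Only the indicator names (PE, UE, OE, IE, ZE, DE) come from the
   datasheet; the positions are invented for this example. */
enum {
    DFPAU_EXC_PE = 1u << 0,  /* loss of precision */
    DFPAU_EXC_UE = 1u << 1,  /* result underflow  */
    DFPAU_EXC_OE = 1u << 2,  /* result overflow   */
    DFPAU_EXC_IE = 1u << 3,  /* invalid operand   */
    DFPAU_EXC_ZE = 1u << 4,  /* division by zero  */
    DFPAU_EXC_DE = 1u << 5   /* denormal operand  */
};

/* Returns 1 when at least one raised indicator is not masked.
   Assumes (hypothetically) that a 1 in mask suppresses that indicator. */
int exception_pending(uint8_t status, uint8_t mask)
{
    return (status & (uint8_t)~mask) != 0;
}
```

    Under these assumptions, a division by zero with ZE masked would not be reported, while the same status with an unmasked overflow still would be.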

    Applications

    • Math coprocessors
    • DSP algorithms
    • Embedded arithmetic coprocessor
    • Fast data processing & control

    Symbol

    Inputs:  datai (31:0) [1], addr (4:0) [2], cs, we
    Outputs: datao (31:0) [1], irq

    Pins description

    Pin               Type    Description
    datai (31:0) [1]  input   Data bus input
    addr (4:0) [2]    input   Register address to read/write
    cs                input   Chip select for read/write
    we                input   Data write enable
    datao (31:0) [1]  output  Data bus output
    irq               output  Interrupt request indicator

    Block Diagram

    Exponent bus: a 17-bit wide data bus used to transfer exponents between modules.
    Mantissa bus: a 70-bit wide internal data bus used to transfer mantissas between modules.
    Control bus: carries the control signals connected to each module. Main control is performed by the Control Unit.

    Units

    Align
    Analyzes input numbers for compliance with the IEEE-754 standard. The resulting data-class information is passed to the appropriate internal module.
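
    The kind of data-class analysis described here can be sketched in C from the IEEE-754 single precision layout (1 sign bit, 8 exponent bits, 23 fraction bits). This is an illustration of the standard encoding, not a description of the Align module's internal logic; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    CLASS_ZERO,
    CLASS_DENORMAL,
    CLASS_NORMAL,
    CLASS_INFINITY,
    CLASS_NAN
} single_class;

/* Classify a raw IEEE-754 single precision bit pattern. */
single_class classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x7FFFFFu;       /* 23-bit fraction field */

    if (exponent == 0)
        return fraction == 0 ? CLASS_ZERO : CLASS_DENORMAL;
    if (exponent == 0xFFu)
        return fraction == 0 ? CLASS_INFINITY : CLASS_NAN;
    return CLASS_NORMAL;
}

/* Classify a float by copying out its bit pattern.
   Assumes float is a 32-bit IEEE-754 type on the target. */
single_class classify_single(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);
    return classify_bits(bits);
}
```

    The sign bit is ignored, so both +0 and -0 are reported as CLASS_ZERO.
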
    Exponent
    Performs operations on the exponent part of a number. Addition, subtraction, shifting, comparison and conversion are executed in this module. It contains the exponent and work registers.
    Interface
    Interfaces an external device to the DFPAU's internal 32-bit modules. It contains data, control and status registers, and can be configured to work with 8-, 16- and 32-bit processors.

    1 - the data bus can be configured as 8, 16 or 32 bits wide, depending on the processor's bus size
    2 - the address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors
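
    One reading of note 2 is that the core decodes a different slice of the address bus for each bus width. The sketch below extracts that slice from an address. It is an interpretation of the note, not a documented register map, and the function name is hypothetical.

```c
/* Returns the address bits the note lists as significant for the
   given bus width, shifted down to bit 0:
     8-bit bus  -> addr(3:0)
     16-bit bus -> addr(3:1)
     32-bit bus -> addr(4:2)
   Returns -1 for any other bus width. */
int decoded_address(unsigned addr, unsigned bus_bits)
{
    switch (bus_bits) {
    case 8:
        return (int)(addr & 0xFu);
    case 16:
        return (int)((addr >> 1) & 0x7u);
    case 32:
        return (int)((addr >> 2) & 0x7u);
    default:
        return -1;
    }
}
```

    Under this reading, the low address bits that select a byte within a 16-bit or 32-bit word are not decoded by the core.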

    Mantissa
    Performs operations on the mantissa part of a number. Addition, subtraction, multiplication, division, square root, comparison and conversion are executed in this module. It contains the mantissa and work registers.
    Shifter
    Shifts the mantissa during normalization and denormalization. Information about the bits shifted out is stored for the rounding process.
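
    Keeping information about shifted-out bits is commonly done with a "sticky" flag that records whether any discarded bit was 1. The sketch below shows that general technique on a 64-bit value. The datasheet does not say how the Shifter stores this information, and the 64-bit width is chosen only because it is a native C type. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t mantissa;  /* value after the right shift */
    int sticky;         /* 1 if any shifted-out bit was 1 */
} shift_result;

/* Right-shift a mantissa and remember whether nonzero bits were lost. */
shift_result shift_right_sticky(uint64_t mantissa, unsigned count)
{
    shift_result r;

    if (count == 0) {
        r.mantissa = mantissa;
        r.sticky = 0;
    } else if (count >= 64) {
        /* Everything is shifted out; avoid an undefined 64-bit shift. */
        r.mantissa = 0;
        r.sticky = mantissa != 0;
    } else {
        uint64_t lost = mantissa & ((UINT64_C(1) << count) - 1);
        r.mantissa = mantissa >> count;
        r.sticky = lost != 0;
    }
    return r;
}
```

    A rounding step can then combine the sticky flag with the most significant discarded bit to decide whether to round up.
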
    Control Unit
    Manages the execution of all instructions and the internal operations required to carry out a particular function.