An important function of any modern digital communications system is error control coding. Such coding is the field of communications that deals with techniques for detecting and correcting errors in a signal. Though used in a variety of systems, error control coding is especially useful in wireless communications systems. Such systems typically operate with a low signaltonoise ratio (SNR) and suffer from distortion because of a multipath channel. The harsh wireless environment means that the received signal is prone to errors.
Typical communications systems use several codes that are suited to correcting different types of errors. ReedSolomon (RS) codes are the most powerful in the family of linear block codes and are arguably the most widely used type of error control codes.
Error control coding
The fundamental concept of error control coding is the addition of redundancy to a signal at the transmitter, and the exploitation of that redundancy at the receiver to detect and/or correct errors. The inclusion of redundancy in the transmitted signal results in a coded signal consisting of more bits than the original uncoded signal. The tradeoff for this overhead is the ability to detect, and possibly correct, errors at the receiver. The performance improvement that occurs when using error control coding is often measured in terms of coding gain. Suppose an uncoded communications system achieves a given bit error rate (BER) at an SNR of 30 dB. Imagine that an error control coding scheme with a coding gain of 3 dB was added to the system. This coded system would be able to achieve the same BER at the even lower SNR of 27 dB. Alternatively, if the system was still operated at an SNR of 30 dB, the BER achieved by the coded system would be the same BER that the uncoded system achieved at an SNR of 33 dB. The power of the coding gain is that it allows a communications system to either maintain a desired BER at a lower SNR than was possible without coding, or achieve a higher BER than an uncoded system could attain at a given SNR.
ReedSolomon codes
RS codes first appeared in technical literature in 1960. Since their introduction, they have seen widespread use in a variety of applications. These applications include interplanetary communications (e.g., the Voyager spacecraft), CD audio players, and countless wired and wireless communications systems.
RS codes belong to the family known as block codes. This family is so named because the encoder processes a block of message symbols and then outputs a block of codeword symbols. This method is in contrast to the other major coding family known as convolutional codes. Instead of processing message symbols in discrete blocks, a convolutional encoder works on a continuous stream of message symbols and simultaneously generates a continuous encoded output stream. These codes get their name because the encoding process can be viewed as the convolution of the message symbols and the impulse response of the encoder.
To be specific, RS codes are nonbinary systematic cyclic linear block codes. Nonbinary codes work with symbols that consist of several bits. A common symbol size for nonbinary codes is 8 bits, or a byte. Nonbinary codes such as RS are good at correcting burst errors because the correction of these codes is done on the symbol level. By working with symbols in the decoding process, these codes can correct a symbol with a burst of eight errors just as easily as they can correct a symbol with a single bit error. A systematic code generates codewords that contain the message symbols in unaltered form. The encoder applies a reversible mathematical function to the message symbols in order to generate the redundancy, or parity, symbols. The codeword is then formed by appending the parity symbols to the message symbols. The implementation of a code is simplified if it is systematic. A code is considered to be cyclic if a circular shift of any valid codeword also produces another valid codeword. Cyclic codes are popular because of the existence of efficient decoding techniques for them. Finally, a code is linear if the addition of any two valid codewords also results in another valid codeword.
Galois Fields
The theory of error control codes uses a mathematical construct known as finite fields or Galois Fields (GFs). A GF is a set that contains a finite number of elements. The operations of addition and multiplication on this set are defined and the operations behave as would be expected from normal arithmetic. For example, the additive identity element is 0 and the multiplicative identity element is 1. A more rigorous mathematical definition of GFs is beyond the scope of this article but the references contained at the end of this article will provide the interested reader with a good starting point. For the sake of brevity, our discussion of GFs will be limited to what the reader needs to know in order to actually implement an encoder/decoder.
RS codes operate on GFs of order q = p^{m} where p is a prime positive integer and m is a positive integer. A GF of order q is denoted by GF(q) and it contains q distinct elements. A typical value of q in practical RS systems is q = 256. The elements of a GF are typically denoted using the variable α. The elements of GF(8) are shown in Table 1 using different notations. Each line in the table corresponds to a single element in the field. Elements are typically represented using either power or polynomial notation when performing calculations by hand, but binary notation is used when the codes are actually implemented in hardware. All three notations are simply three different ways to represent a given GF element. Multiplication is easier in power notation because the exponents are added. Similarly, addition is easier in polynomial notation.
Numerous books and computer programs exist that list or generate the elements for GFs of various sizes. To implement an RS encoder and decoder, two special hardware blocks will be needed: a GF adder and a GF multiplier.
Galois Field adder
The adder computes the sum of two GF elements by XORing the corresponding bits of each symbol together. For example, the sum of α^{3} and α^{5} can be found, by hand, using polynomial notation. α^{3} is equivalent to X + 1 and α^{5} to X^{2} + αX + 1. Because of the modulo properties of GFs, addition is the same as subtraction. Thus, X + X = X  X = 0. Keeping this property in mind, the addition of α^{3} and α^{5} can be written as follows:
Thus α^{3} + α^{5} = α^{2}. This operation would be computed in hardware by XORing the bits of the two symbols together as follows:
Note from the table that the binary notation 100 corresponds to the power notation α^{2} found by hand calculations.
Galois Field multiplier
A simple, but inefficient, way to implement a GF multiplier is to take the inputs in binary notation and use a lookup table to find their corresponding power notations. The powers can be added (e.g., α^{2} × α^{5} = α^{7}) and the result can be sent to an inverse lookup table to find the corresponding binary notation for the output. These lookup tables can be stored in readonly memory (ROM), but that is not practical because several multiplier outputs will typically need to be computed during a single clock cycle.
A more efficient solution is to compute the equations by hand and simplify the terms. An RS encoder needs fixed multipliers that multiply an arbitrary input by a constant value such as α^{2}. The RS decoder needs both fixed multipliers as well as generic multipliers that are capable of multiplying any two arbitrary numbers together. The following example will show how to derive the equations for a fixed multiplier that multiplies its input by α^{2}.
Denote an arbitrary 3bit multiplier input by b_{2}α^{2} + bα + b_{0}. The multiplier will multiply this input by the constant α^{2}:
Recall from Table 1 that α^{4} = X^{2} + X and α^{3} = X + 1. By replacing X with and substituting the result back into the above equation, it can be rewritten as:
The above equation represents the simplified equation for multiplying an arbitrary input by α^{2}. For example, the most significant bit (MSB) of the multiplier output is computed by XORing the input bits b_{2} and b_{0} together. The derivation for the generic multiplier can be found by simplifying the equations after using two arbitrary values for the inputs.
Code parameters
A given ReedSolomon code is indicated by referring to it as an (n, k) code. The parameter n indicates the codeword length in terms of the number of symbols in the codeword. The parameter k indicates the number of message symbols in the codeword. The number of parity symbols added is thus n  k. Figure 1 illustrates the structure of an RS codeword. The errorcorrecting capability of the code is t = (n  k)/2. The code can detect and correct T errors where 0 T t.
ReedSolomon encoder
The purpose of the encoder is to generate the codeword based on the message symbols. Because RS is a systematic code, the n message symbols are transmitted as is and the n  k parity symbols are appended to the message symbols to form the code word. The values of the parity symbols depend on the message symbols and they add redundancy to the transmitted codeword. This redundancy is exploited at the receiver to detect and correct errors.
The parity symbols are computed by performing a polynomial division using GF algebra. The steps involved in this computation are as follows:

Multiply the message symbols by X^{nk} (This shifts the message symbols to the left to make room for the nk parity symbols)

Divide the message polynomial by the code generator polynomial using GF algebra.

The parity symbols are the remainder of this division. These steps are accomplished in hardware using a shift register with feedback. The architecture for the encoder is shown in Figure 2.
The encoder first clocks the k message symbols from the message input M into the shift register and simultaneously clocks them out as the first k symbols of the codeword. Once the last message symbol is clocked into the shift register, the n  k parity bytes are located in the shift registers SR1 through SR(2t). The rest of the codeword is formed by clocking each of the n  k parity bytes out of the shift register to the codeword output C.
Three muxes are used in the encoder (Msg Mux, Feedback Mux, and Code Mux) and all three are controlled by the same Select signal. Select is 0 for the first k cycles starting with the cycle that the first message symbol is clocked in, and it is 1 for the remaining n  k clock cycles.
ReedSolomon decoder
The purpose of the decoder is to process the received code word to compute an estimate of the original message symbols. There are three main blocks to the decoder: the Syndrome Generator, Euclid's Algorithm, and the Chien/Forney block. The output of the Chien/Forney block is an estimate of the error vector. This error vector is then added to the received codeword to form the final codeword estimate. A toplevel block diagram of the decoder is shown in Figure 3. Note that the error value vector Y comes out of the Chien/Forney block in reverse order, and it must pass through a lastin, firstout (LIFO) block before it is added to the received codeword R(x).
Syndrome generator
The first step in the decoder is to compute the syndrome. The syndrome consists of n  k symbols and the values are computed from the received code word. The syndrome depends only on the error vector, and is independent of the transmitted code word. That is, each error vector has a unique syndrome vector. However, many different received code words will have the same syndrome if their error pattern is the same.
The purpose of computing the syndrome first is that it narrows the search for the actual error vector. Originally, a total of 2^{n} possible error vectors would have to be searched. However, by finding the syndrome first, this search is narrowed down to looking at just 2^{nk} possibilities.
The syndrome can be computed mathematically by dividing the received code word by the generator polynomial using GF algebra. The remainder of this division is called the syndrome polynomial s(x). The actual syndrome vector S(x) is computed by evaluating s(x) at a through α^{nk}. However, this method is not efficient from a hardware standpoint. The alternative method typically used in hardware is to directly evaluate the received code word R(x) at a through α^{nk}.
The Syndrome Generator module computes the syndrome S by evaluating the received code word R(x) at through α^{nk}. That is, R(α) through R(α^{nk}). In the RS code n  k = 2t, and thus there are 2t syndrome values to compute: [S1 S2 S3…S(2t)]. These values are computed in parallel as shown in Figure 4. The first syndrome generator evaluates the received code word at to form S1, the next generator evaluates the received code word at a^{2} to form S2, and so on.
The Syndrome Generator module will also contain hardware that checks for an allzero syndrome. If all of the syndrome values are zero, then there are no errors and the Euclid's algorithm block and the Chien/Forney block are disabled. The received code word then becomes the codeword estimate.
Euclid's algorithm
Euclid's algorithm processes the syndrome S(x) to generate the error locator polynomial Λ(x) and the error magnitude polynomial Ω(x). That is, it solves the following equation that is referred to as the Key Equation:
Λ(x) [1 + S(x)] = Ω(x) mod x^{2t+1}
The algorithm used in RS decoding is based on Euclid's algorithm for finding the greatest common devisor (GCD) of two polynomials. Euclid's algorithm is an iterative polynomial division algorithm (for the actual steps, contact the author).
Hardware implementation
The toplevel block diagram of the Euclid's algorithm block is shown in Figure 5. This architecture uses two types of registers. Each of the registers contains two sets of internal registers. These register sets are referred to as A and B (upper and lower registers, respectively). The number of registers is len = 2t + 10.
After the syndrome values are computed, the Euclid block begins its processing. The control state machine is responsible for controlling the operation of the block. Each register set (A and B) will contain two polynomials. These polynomials are stored such that their most significant coefficients are on the right, and their least significant coefficients are on the left (for the precise steps performed by the blocks, contact the author).
Chien Search
Once the error locator polynomial Λ(x) has been computed, it needs to be evaluated to find its roots. The Chien Search (CS) algorithm is used to find these roots (see Figure 6). The CS is a brute force algorithm that evaluates the polynomial for all possible input values, and then checks to see which outputs are equal to zero. If an error occurs in position i, then the following equation equals zero:
where i = 0 .. (n  1)
The CS evaluates the above equation for all the values of i and j and counts the number of times that the equation is equal to zero. The locations of the zeros are the error locations, and the number of zeros is the number of symbols in error.
There are (t + 1) stages of the CS that are implemented in hardware. Each of these stages (where a stage consists of a multiplier, mux and register) represents a different value for j in the above CS equation. The search is run for n clock cycles (each clock cycle represents a different value of i in the above equation) and the output of the adder is examined to see if it is equal to zero. If it is equal to zero, the Zero Detect block will output a 1, otherwise, it will output a zero. The output of the Chien Search block is thus a string of n bits that have values of either 0 or 1. Each 1 represents the location of a symbol in error.
For the first clock cycle, the mux will route the error locator polynomial coefficient into the register. For the remaining (n  1) clock cycles, the output of the multiplier will be routed via the mux into the register. The exponents of the multipliers have negative values. However, these values can be precomputed using the modulo operator. The exponent of i is equal to ( i modulo n) = (i modulo 255). For example, α^{1} equals α^{254}, α^{2} equals α^{253}, and so on.
Forney algorithm
The Forney algorithm is used to compute the error values Y_{i}. To compute these values, the Forney algorithm needs the error locator polynomial Λ(x) and the error magnitude polynomial Ω(x). The equation for the error values is:
for x=α^{1} where α^{1} is a root of Λ(x)
The computation of the formal derivative Λ'(x) is actually quite simple, as will be demonstrated in the following example. Assume Λ(x) = α^{4}X^{3} + α^{3}X^{2} + X + α^{2}. Λ'(x) thus equals:
The derivative is formed by taking the coefficients of the odd powers of X, and assigning them to the next lower power of X (which will be even). However, in hardware, it is actually easier to find xΛ'(x). This x has the effect of shifting the derivative coefficients to the next higher power of X (i.e., the power of X that the coefficient originally belonged to). Thus, the denominator of the Forney algorithm is found as follows using the same Λ(x) as the above example.
In hardware, this xΛ'(x) term is found by zeroing out every other term of the original Λ(x) polynomial. If the denominator of the Forney equation is modified by multiplying by the x term, then the numerator must also be multiplied by x in order for the equation to still work. Thus, the actual Forney equation implemented in hardware is:
The xΩ(x) polynomial is then evaluated along with the xΛ'(x) polynomial using the same type of hardware as used for the CS. However, in order to form xΩ(x), the coefficients of Ω(x) are shifted to the left by one location. To evaluate Ω(x), the Ω_{0} coefficient would be added with the Ω_{1} coefficient times α^{1}, the Ω_{2} coefficient times α^{2} all the way up to the Ω_{t} coefficient times α^{t}. To evaluate xΩ(x), the Ω_{0} coefficient is multiplied by α^{1}, the Ω_{1} coefficient by α^{2} all the way up to multiplying the Ω_{t} coefficient times α^{(t+1)}. The output of these multipliers is then summed.
The numerator is then multiplied by the denominator using an inverse multiply. The inverse multiply contains a lookup table that finds the inverse of the denominator. For example, if the denominator was α^{3}, the inverse is α^{3}. This can then be expressed as:
α^{i} = α^{(i mod n)} = α^{(3 mod 255)} = α^{252}.
Since the same type of hardware is needed for both the Chien Search and the Forney algorithm, the two functions can be combined in the same block. The Chien Search is shown in the top of Figure 7. The output of the adder for the odd stages is also used in the Forney algorithm, shown in the lower part of the Figure 7. The sum of the odd stages represents the denominator of the Forney equation. This value is inversed in the Inverse Multiply block and then multiplied by the numerator value that is formed from evaluating the error magnitude polynomial. The output is ANDed with the zero detect output since the error values are only valid for the actual error locations (and they should be set to zero otherwise).
A decoder failure is detected by comparing the degree of the error locator polynomial Λ(x) to the number of errors found by the Chien Search module. If the two values are not equal, then a decoder failure has occurred, the decoder failure flag is asserted and the codeword estimate is the received codeword. The degree of Λ(x) is the power of the highest nonzero coefficient of the polynomial. Note that the decoder failure detection is not guaranteed to work if there are more than t errors. Most of the time, it will detect when more than t errors have occurred, but there will still be some cases that are not detected.
Error correction
The output of the Chien/Forney block is the error vector. This vector is the same size as the codeword. The vector contains nonzero values in locations that correspond to errors. All other locations contain zeros. The errors in the received codeword are corrected by adding the received codeword to the error vector using a Galois Field adder. Because the error vector is generated in the reverse order of the received codeword, a LIFO must be applied to either the received codeword or the error vector to match the order of the bytes in both vectors. The output of the adder is the decoder's estimate of the original codeword.
Conclusions
This article discussed the algorithms and architectures needed to implement a ReedSolomon encoder and decoder in hardware. The theory of Galois Fields and ReedSolomon codes is a vast area, and the literature listed in the References section provides a good starting point.
References

Blahut, R. E., “Theory and Practice of Error Control Codes,” AddisonWesley, MA, 1984.

Lin, S. and Costello, D. J., “Error Control Coding: Fundamentals and Applications,” PrenticeHall, NJ, 1983.

Wicker, S. B., “Error Control Coding for Digital Communication and Storage,” PrenticeHall, NJ, 1995.

Wicker, S. B. and Bhargava, V. K., “ReedSolomon Codes and Their Applications,” IEEE Press, NY, 1994.
About the author
Louis Litwin is a m ember of the technical staff with Thomson Multimedia Corporate Research where he is working on 3G CDMA technology for mobile applications. Litwin received his M.S. degree in Electrical Engineering from Purdue University in 1999, and his B.S. degree in Electrical Engineering (summa cum laude) from Drexel University in 1997. He has published over 20 papers on the topics of digital communications and digital signal processing, and he has several patents pending that are related to digital communications. His professional interests include wireless digital communications with a particular focus on adaptive equalization, error control coding and synchronization.