Decoding LDPC: k-Bit Brute Forcing

Paul Tagliamonte 2022-11-01
Before you go on: I've been warned off implementing this in practice on a few counts; namely, the space tradeoff isn't worth it, and it's unlikely to correct meaningful errors. I'm going to leave this post up, but please do take the content with a very large grain of salt!

My initial efforts to build a PHY and Data Link layer – from scratch using my own code – have been progressing nicely since the initial BPSK based protocol I’ve documented under the PACKRAT series. As part of that, I’ve been diving deep into FEC, and in particular, LDPC.

I won’t be able to do an overview of LDPC justice in this post – with any luck, that’ll come in a later post as part of the RATPACK series – so some knowledge is assumed. As such, this post is less useful for those looking to learn about LDPC, and a bit more targeted at those who enjoy talking and thinking about FEC.

Hey, heads up! - This post contains extremely unvalidated and back of the napkin quality work without any effort to prove this out generally. Hopefully this work can be of help to others, but please double check anything below if you need it for your own work!

While implementing LDPC, I’ve gotten an encoder and checker working, enough to use LDPC like a checksum. The next big step is to write a Decoder, which can do error correction. The two popular approaches for the actual correction that I’ve seen while reading about LDPC are Belief Propagation, and some class of linear programming that I haven’t dug into yet. I’m not thrilled at how expensive this all is in software, so while implementing the stack I’ve been exploring every shady side alley to try and learn more about how encoders and decoders work, both in theory and in practice.

Processing an LDPC Message

Checking if a message is correct is fairly straightforward with LDPC (as with encoding, I’ll note). As a quick refresher – given the LDPC H (check) matrix of width N, you can check your message vector (msg) of length N by multiplying H and msg, and checking if the output vector is all zero.

    // scheme contains our G (generator) and
    // H (check) matrices.
    scheme := {G: Matrix{...}, H: Matrix{...}}

    // msg contains our LDPC message (data and
    // check bits).
    msg := Vector{...}

    // N is also the length of the encoded
    // msg vector after check bits have been
    // added.
    N := scheme.G.Width

    // Now, let's generate our 'check' vector.
    ch := Multiply(scheme.H, msg)

We can now see if the message is correct or not:

    // if the ch vector is all zeros, we know
    // that the message is valid, and we don't
    // need to do anything.
    if ch.IsZero() {
        // handle the case where the message
        // is fine as-is.
        return ...
    }

    // Expensive decode here

This is great for getting a thumbs up / thumbs down on the message being correct, but correcting errors still requires pulling the values out of the H (check) matrix, building a bipartite (Tanner) graph from them, and iteratively reprocessing the bit values until the constraints are satisfied and the message has been corrected.
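To give a sense of what that iterative loop involves, here’s a rough sketch of the simplest hard-decision flavor of it – bit-flipping, rather than full Belief Propagation – written against plain Go slices instead of the Matrix and Vector types above; the representation and names here are purely illustrative and not from my implementation:

    // bitFlipDecode is a minimal hard-decision bit-flipping
    // decoder sketch: count how many parity checks each bit
    // currently fails, flip the worst offender, and repeat.
    // H is a dense GF(2) check matrix as rows of 0/1 bytes,
    // and msg is the received codeword of length N.
    func bitFlipDecode(H [][]uint8, msg []uint8, maxIter int) bool {
        for iter := 0; iter < maxIter; iter++ {
            unsat := make([]int, len(msg))
            anyUnsat := false

            for _, row := range H {
                // Compute this check's parity over the
                // current message bits.
                parity := uint8(0)
                for j, h := range row {
                    parity ^= h & msg[j]
                }
                if parity == 1 {
                    // The check failed; blame every bit
                    // that participates in it.
                    anyUnsat = true
                    for j, h := range row {
                        if h == 1 {
                            unsat[j]++
                        }
                    }
                }
            }

            if !anyUnsat {
                return true // all constraints satisfied
            }

            // Flip the bit involved in the most failed
            // checks, and go around again.
            worst := 0
            for j := range unsat {
                if unsat[j] > unsat[worst] {
                    worst = j
                }
            }
            msg[worst] ^= 1
        }
        return false // gave up; hand off to something smarter
    }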

This got me thinking – what does the output vector look like when it’s not all zeros? Since 1 values in the output vector indicate consistency problems between the message bits and the check bits, I wondered if this could be used to speed up my LDPC decoder. It appears to work, so this post is half an attempt to document this technique before I put it in my hot path, and half a plea for those who do like to talk about FEC to tell me what this technique is actually called.

k-Bit Brute Forcing

Given that the output Vector’s non-zero bit pattern is determined by the position of errors in the message vector – multiplying H by any valid codeword gives all zeros, so the output depends only on the error pattern, not on the data – let’s use that fact to build up a table of k-Bit errors that we can index into.

    // for clarity's sake, the Vector
    // type is being used as the lookup
    // key here, even though it may
    // need to be a hash or string in
    // some cases.
    idx := map[Vector]int{}

    for i := 0; i < N; i++ {
        // Create a vector of length N
        v := Vector{}
        v.FlipBit(i)

        // Now, let's use the check matrix to figure out
        // what bit pattern (syndrome) a single-bit error
        // at position i produces. Since H times any valid
        // codeword is all zeros, the syndrome depends only
        // on the error pattern itself.
        ev := Multiply(scheme.H, v)

        idx[ev] = i
    }

This can be extended to multiple bits (hence: k-Bits), but I’ve only done one bit here for illustration – there’s a rough sketch of the k=2 case further down. Now that we have our idx mapping, we can go back to the hot path of checking the incoming message data:

    // if the ch vector is all zeros, we know
    // that the message is valid, and we don't
    // need to do anything.
    if ch.IsZero() {
        // handle the case where the message
        // is fine as-is.
        return ...
    }

    errIdx, ok := idx[ch]
    if ok {
        msg.FlipBit(errIdx)
        // Verify the LDPC message using
        // H again here.
        return ...
    }

    // Expensive decode here

Since map lookups wind up a heck of a lot faster than iteratively message-passing bit state around a graph, the hope here is that this will short-circuit easy-to-solve k-Bit errors, for some value of k that the system memory can tolerate.
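For a rough sense of what a larger k costs, here’s a sketch of the same table built for k=2, using the same hypothetical Vector and Matrix types and Multiply helper as above – the table now needs an entry per pair of bit positions, so it grows roughly quadratically in N:

    // Hypothetical 2-bit extension of the idx table: one
    // entry per unordered pair of bit positions.
    type pair struct{ a, b int }
    idx2 := map[Vector]pair{}

    for i := 0; i < N; i++ {
        for j := i + 1; j < N; j++ {
            v := Vector{}
            v.FlipBit(i)
            v.FlipBit(j)

            // Same trick as before: the syndrome of this
            // two-bit error pattern is the lookup key.
            idx2[Multiply(scheme.H, v)] = pair{i, j}
        }
    }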

Does this work?

Frankly – I have no idea. I’ve written a small program and brute forced single-bit errors in all bit positions using random data to start with, and I’ve not been able to find any collisions in the 1-bit error set, using the LDPC matrix from 802.3an-2006. Even if I were to find a collision for a higher-order k-Bit value, I’m tempted to continue with this approach, and treat each syndrome’s entry as a bin of candidate bit sets (like a hash table with chaining), checking the LDPC validity after flipping each candidate set in the bin. As long as the collision rate is small enough, it should be possible to correct k-Bits of error faster than the more expensive Belief Propagation approach. That being said, I’m not entirely convinced collisions will be very common, but it’ll take a bit more time working through the math to say that with any confidence.
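To make that bin idea a bit more concrete, here’s a rough sketch – again using the hypothetical Vector and Matrix types from above, and showing only the 1-bit case – of chaining candidate positions per syndrome and verifying each one against H before accepting it:

    // Instead of a single position per syndrome, keep a
    // bin of candidate positions, hash-table style.
    bins := map[Vector][]int{}
    for i := 0; i < N; i++ {
        v := Vector{}
        v.FlipBit(i)

        ev := Multiply(scheme.H, v)
        bins[ev] = append(bins[ev], i)
    }

    // In the hot path, walk the bin for the observed
    // syndrome, keeping the first flip that makes the
    // message check out.
    for _, i := range bins[ch] {
        msg.FlipBit(i)
        if Multiply(scheme.H, msg).IsZero() {
            return ...
        }
        // Not this one; undo and try the next candidate.
        msg.FlipBit(i)
    }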

Have you seen this approach called something official in publications? See an obvious flaw in the system? Send me a tip, please!