A Better WS2812 Control Library for TI Launchpad

     I'm writing a library that lets me drive 16 strings of WS2812 smart LEDs from a single TI TM4C123 Launchpad. If a 16MHz Arduino can handle it, an 80MHz ARM Cortex-M4 should have no problem. It doesn't have a problem, but the first demo code I wrote was atrocious, so I wanted to put some thought into how to improve it. On my original Arduino controller, I just used a modified version of Adafruit's NeoPixel library and wrote to 16 strings of 16 lights each in series. However, if I want visualizations more complex than falling raindrops or twinkling stars, I need CPU time left over for them. Writing 24 bits × 16 LEDs × 16 strings × 60 fps = 368,640 bits per second at 800kHz means roughly 46%, call it half, of my CPU time goes to just writing to the LEDs. Unacceptable. I need a way to multithread the control functions or a function that writes to multiple strings in parallel.
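As a sanity check on that figure, here is the arithmetic as a small C function (the function and parameter names are mine, for illustration only). It divides the bits per second the LEDs need by the 800kHz bit rate to get the fraction of each second spent clocking data out, assuming the strings are sent one after another.

```c
/* Fraction of each second spent clocking bits out to the LEDs.
 * `sequential_writes` is how many strings (or groups of strings that share
 * one bit period) must be sent one after another each frame. */
double write_duty_cycle(unsigned bits_per_led, unsigned leds_per_string,
                        unsigned sequential_writes, unsigned fps,
                        double bit_rate_hz)
{
    double bits_per_second =
        (double)bits_per_led * leds_per_string * sequential_writes * fps;
    return bits_per_second / bit_rate_hz;
}
```

`write_duty_cycle(24, 16, 16, 60, 800000.0)` gives 0.4608, the "roughly half" quoted above. If the strings could instead be sent as two parallel groups of eight, `sequential_writes` becomes 2 and the result drops to 0.0576, about 6% of each second.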

     My first idea was to build a full RTOS, with each string running as a separate task. That seemed like extreme overkill. Logically, all 16 strings would have the same priority, and since I have to modulate the IO pins with a resolution of 0.35±0.15μs, there's very little time to switch threads between pin transitions. It makes a lot more sense to just write one big, ugly function with preplanned, overlapping, deterministic timing.
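To see how tight that window is, the sketch below converts the WS2812 timings to cycles of an 80MHz clock (12.5ns per cycle). It assumes the WS2812 datasheet nominal values of 0.35μs high for a 0 bit, 0.7μs high for a 1 bit, a 1.25μs bit period and ±150ns tolerance; `ns_to_cycles` is a hypothetical helper, not part of any TI library.

```c
#include <stdint.h>

#define CPU_HZ 80000000u /* TM4C123 system clock, 80 MHz (12.5 ns per cycle) */

/* Assumed WS2812 timings in nanoseconds (datasheet nominal values). */
#define WS_T0H_NS    350u  /* high time for a 0 bit */
#define WS_T1H_NS    700u  /* high time for a 1 bit */
#define WS_PERIOD_NS 1250u /* one bit at 800 kHz */
#define WS_TOL_NS    150u  /* allowed deviation on each timing */

/* Convert nanoseconds to CPU cycles, rounding to the nearest cycle. */
static inline uint32_t ns_to_cycles(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * CPU_HZ + 500000000u) / 1000000000u);
}
```

That works out to 28 cycles for a 0, 56 for a 1, 100 per bit and only ±12 cycles of slack, which leaves very little room for a scheduler to do anything between edges.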

     I figured the easiest way to manage sub-μs timing would be to create a timer that interrupts at the desired 800kHz and writes the next bit for several strings at once. We can store the codes to be written in a global array, have the timer handler read and transmit the next bit for as many strings/pins as it can handle, and then return. Doing some basic math: at 80MHz one cycle is 12.5ns, and it takes two CPU cycles to flip a pin, so flipping all sixteen pins in series takes 32 cycles, or 0.4μs. That's longer than the 0.35μs high time of a 0 bit. We could flip them all at the same time, but that would require some thought; we'd need a mask telling us which pins to flip on each cycle. Too much work. We can't flip sixteen pins in series, but we can do eight in 0.2μs, and that lets us write to all sixteen strings in the time it takes to write to two in sequence. That'll work.
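The same arithmetic, written out: this sketch takes the two-cycles-per-flip estimate above and an 80MHz clock, and checks whether a group of pins toggled one after another finishes inside the 0.35μs high time of a 0 bit. The names and constants are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PS_PER_CYCLE        12500u  /* one cycle at 80 MHz = 12.5 ns */
#define CYCLES_PER_PIN_FLIP 2u      /* estimate from the text */
#define T0H_PS              350000u /* 0.35 us high time for a 0 bit */

/* Picoseconds needed to toggle `pins` GPIO lines one after another. */
uint32_t serial_flip_ps(uint32_t pins)
{
    return pins * CYCLES_PER_PIN_FLIP * PS_PER_CYCLE;
}

/* True if a group of `pins` lines can all be toggled within T0H. */
bool group_fits_t0h(uint32_t pins)
{
    return serial_flip_ps(pins) < T0H_PS;
}
```

Sixteen pins need 400ns and miss the window; eight need 200ns and fit comfortably. Eight is also the width of one GPIO port on the TM4C123, so a group of eight strings maps naturally onto a single port.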

     Additionally, since we want the frame rate to be consistent, we can automate that with a timer as well. A second timer interrupts every 60th of a second and sets the write flag (in practice, it reloads the bit counters that the first timer works through). Then all the main function has to do is update the image stored in shared memory. Writing to that shared memory should be uninterruptible, so we never display half of one image and half of the next.
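One way to keep the LEDs from showing a half-drawn image is to make the update a critical section, for example by masking interrupts with the CMSIS `__disable_irq()`/`__enable_irq()` intrinsics. A different technique, double buffering, avoids blocking the timer interrupts at all: the main loop draws into a back buffer and requests a swap, and the 60Hz handler swaps only between frames. The sketch below shows that alternative; every name is hypothetical and the layout (one 24-bit GRB word per LED) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define STRINGS         16
#define LEDS_PER_STRING 16

typedef struct {
    uint32_t grb[STRINGS][LEDS_PER_STRING]; /* 24-bit GRB color per LED */
} frame_t;

static frame_t frames[2];
static volatile uint8_t front = 0;         /* buffer the timer ISR transmits */
static volatile bool swap_pending = false; /* set by main when drawing is done */

/* Main loop: the buffer the ISR is not reading. */
frame_t *back_buffer(void) { return &frames[front ^ 1u]; }

/* Main loop: publish the finished back buffer. */
void request_swap(void) { swap_pending = true; }

/* Main loop: wait for this before drawing the next image. */
bool swap_done(void) { return !swap_pending; }

/* 60 Hz handler: swap only at a frame boundary, before reloading counters. */
void frame_tick(void)
{
    if (swap_pending) {
        front ^= 1u;
        swap_pending = false;
    }
}

/* Timer ISR: the image currently being transmitted. */
const frame_t *front_buffer(void) { return &frames[front]; }
```

The main loop must wait for `swap_done()` before drawing again; otherwise it would write into the buffer that is about to go on display.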

     So here's the plan, in pseudocode:
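// timer1 runs at 800kHz, the WS2812 bit rate (one tick every 1.25μs).
// Every tick sends one bit to a group of 8 strings: write all 8 pins high,
// check which pins should send a 0 for this bit, decrement the counter, pull
// the 0 pins low after T0H and the rest low after T1H. Group 1 sends its
// whole frame first, then group 2 does the same.
#define BITS_PER_STRING (24 * 16)  // 24 bits per LED * 16 LEDs per string

timer1_handler() {
    if (counter1 > 0) {
        write all group 1 GPIOs high;
        counter1--;
        zero_mask = group 1 pins whose current bit is 0;
        wait until T0H (0.35μs) after the rising edge;
        write zero_mask pins low;
        wait until T1H (0.7μs) after the rising edge;
        write all group 1 GPIOs low;
    }
    else if (counter2 > 0) {
        same as for group 1, using group 2's pins and data;
    }
}

// timer2 runs at 60Hz and only reloads the timer1 counters, starting a new
// frame. Both groups finish in 2 * 384 * 1.25μs = 0.96ms, leaving far more
// than the 50μs low time the WS2812s need to latch before the next frame.
timer2_handler() {
    counter1 = BITS_PER_STRING;
    counter2 = BITS_PER_STRING;
}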
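The "create mask" step can be done ahead of time so the interrupt handler only does an array lookup. This host-side sketch assumes each LED's color is stored as a 24-bit GRB word sent most-significant bit first (the WS2812 order), and builds, for every bit of every LED in a group of eight strings, a byte whose bit n is set when string n must be pulled low early to send a 0. The type and function names are illustrative, not from any library.

```c
#include <stdint.h>

#define GROUP_PINS      8   /* strings driven together from one port */
#define LEDS_PER_STRING 16
#define BITS_PER_LED    24

/* One group's image: a 24-bit GRB word per LED, per string (assumed layout). */
typedef struct {
    uint32_t grb[GROUP_PINS][LEDS_PER_STRING];
} group_frame_t;

/* Bit n of the result is set when string n sends a 0 for this bit of this
 * LED, i.e. its pin must be pulled low after T0H. bit 0 is the MSB. */
uint8_t zero_mask(const group_frame_t *f, unsigned led, unsigned bit)
{
    uint8_t mask = 0;
    for (unsigned pin = 0; pin < GROUP_PINS; pin++) {
        uint32_t value = (f->grb[pin][led] >> (BITS_PER_LED - 1u - bit)) & 1u;
        if (value == 0u)
            mask |= (uint8_t)(1u << pin);
    }
    return mask;
}

/* Precompute every mask for one frame in transmission order, so the ISR
 * only needs masks[index] on each tick. */
void build_zero_masks(const group_frame_t *f,
                      uint8_t masks[LEDS_PER_STRING * BITS_PER_LED])
{
    for (unsigned led = 0; led < LEDS_PER_STRING; led++)
        for (unsigned bit = 0; bit < BITS_PER_LED; bit++)
            masks[led * BITS_PER_LED + bit] = zero_mask(f, led, bit);
}
```

With 16 LEDs per string that is 384 mask bytes per group, 768 bytes for both. If the handler counts down from the number of bits per string (24 × 16 = 384), the mask for the current tick is `masks[384 - counter1]`, read before decrementing.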

     This is good. I like this. Let's see how it works.