Thursday, September 5, 2019

Two Bit Swap Protocol


Describing the protocol

Brief description of the protocol: Two lines are used. Both are open collector with pull-ups. There is a sender and a receiver. The protocol doesn't care which side is which, as long as the upper level executive knows when it's supposed to send or receive.

At the most basic level, the two wires are used to send individual bits. Both D0 and D1 are normally idle at "1". To send a "0" the sender pulls D0 low. Or to send a "1" the sender pulls D1 low. In either case, the receiver sees one of those lines go low and then uses the opposite line as an ack. So if the receiver sees D0 go low, it pulls D1 low to ack the "0". When the sender sees D1 go low, it knows the receiver has the bit and lets D0 go back high. The receiver sees D0 is idle again, so it lets D1 go high as well. Now both D0 and D1 are in the idle state and the next bit can be sent.
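The four phases of that handshake can be modeled on a host machine. This is only a sketch: the names tbs_bus_t, tbs_read and tbs_transfer_bit are made up for illustration, and both ends are stepped in one function instead of running on separate chips.

```c
#include <stdbool.h>

/* Hypothetical model of the two open-collector lines. Each side can only
   pull a line low; a line reads high when nobody is pulling it. */
typedef struct {
    bool tx_pull[2];   /* sender pulling D0 / D1 low   */
    bool rx_pull[2];   /* receiver pulling D0 / D1 low */
} tbs_bus_t;

/* Wired-AND read: 1 = idle (pulled up), 0 = somebody is pulling low. */
static int tbs_read(const tbs_bus_t *bus, int line)
{
    return !(bus->tx_pull[line] || bus->rx_pull[line]);
}

/* Walk one bit through the handshake. Returns the bit the receiver
   decoded, or -1 if the bus was not idle at the start or at the end. */
static int tbs_transfer_bit(tbs_bus_t *bus, int bit)
{
    int data_line = bit ? 1 : 0;     /* "0" uses D0, "1" uses D1 */
    int ack_line  = 1 - data_line;   /* ack goes on the opposite line */
    int received;

    if (!tbs_read(bus, 0) || !tbs_read(bus, 1)) return -1;

    /* 1. sender pulls the data line low */
    bus->tx_pull[data_line] = true;

    /* 2. receiver sees which line dropped and pulls the other one low */
    received = tbs_read(bus, 0) ? 1 : 0;
    bus->rx_pull[1 - received] = true;

    /* 3. sender sees the ack and releases the data line */
    if (tbs_read(bus, ack_line) == 0) bus->tx_pull[data_line] = false;

    /* 4. receiver sees the data line idle again and releases the ack */
    if (tbs_read(bus, received) == 1) bus->rx_pull[1 - received] = false;

    if (!tbs_read(bus, 0) || !tbs_read(bus, 1)) return -1;
    return received;
}
```

On real hardware each numbered step would be a wait on a pin change rather than a straight-line sequence.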

But we would like to sync the bytes. This is done as follows:

Nine bits are sent for every byte. The most significant bit -- the one sent first -- is the sync bit. This bit is always "1" for a normal byte. So every byte is sent as a "1" followed by the serialization of the byte's bits, MSB to LSB. Alternatively, a context frame starts with the sync bit of "0" along with 8 more "0"s. This means that in a normal bit stream there will never be a series of 9 "0" bits.

So to specify a context for the data, send 9 "0"s followed by the context byte, which is the first normal byte after the context switch.
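Here is one way the framing could be sketched in C. The names (tbs_frame_byte, tbs_decoder_t and so on) are hypothetical, and the decoder is just one possible reading of the rules above: it counts consecutive "0" bits, treats nine in a row as a context switch, and reports the next normal byte as the context byte.

```c
#include <stdint.h>

#define TBS_BITS_PER_FRAME 9

/* Normal byte: sync bit "1", then the data bits MSB to LSB. */
static void tbs_frame_byte(uint8_t value, uint8_t bits[TBS_BITS_PER_FRAME])
{
    bits[0] = 1;
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (uint8_t)((value >> (7 - i)) & 1u);
}

/* Context switch: nine "0" bits. */
static void tbs_frame_context(uint8_t bits[TBS_BITS_PER_FRAME])
{
    for (int i = 0; i < TBS_BITS_PER_FRAME; i++) bits[i] = 0;
}

typedef enum { TBS_NONE, TBS_DATA, TBS_CONTEXT } tbs_event_t;

typedef struct {
    int     zero_run;         /* consecutive "0" bits seen so far */
    int     nbits;            /* bits collected in this frame, 0 = waiting for sync */
    uint8_t value;            /* byte being assembled */
    int     next_is_context;  /* set after nine "0"s */
} tbs_decoder_t;

/* Feed one received bit. When a byte completes, *out holds it and the
   return value says whether it is a data byte or a context byte. */
static tbs_event_t tbs_decode_bit(tbs_decoder_t *d, int bit, uint8_t *out)
{
    if (bit == 0) {
        if (++d->zero_run == TBS_BITS_PER_FRAME) {
            /* nine "0"s in a row: context switch, realign on the next sync */
            d->zero_run = 0;
            d->nbits = 0;
            d->next_is_context = 1;
            return TBS_NONE;
        }
    } else {
        d->zero_run = 0;
    }

    if (d->nbits == 0) {              /* waiting for a sync bit */
        if (bit == 1) { d->nbits = 1; d->value = 0; }
        return TBS_NONE;
    }

    d->value = (uint8_t)((d->value << 1) | (bit & 1));
    if (++d->nbits < TBS_BITS_PER_FRAME) return TBS_NONE;

    d->nbits = 0;
    *out = d->value;
    if (d->next_is_context) { d->next_is_context = 0; return TBS_CONTEXT; }
    return TBS_DATA;
}
```

Because a normal frame always starts with a "1", a byte of 0x00 produces at most eight "0"s in a row, so the nine-zero test can't fire inside ordinary data.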

Using this context switch it's possible for the upper level executive to manage an api to toggle between sender and receiver. The low-level protocol itself doesn't care how that is handled.

The animation below demonstrates this protocol. (The Javascript doesn't actually implement it.) To see how bytes and contexts are sent, hit STOP. You can enter a comma separated list of bytes. These bytes will be displayed as "sent" either once or repeating. To send a context switch, specify "c" instead of a number.

Example: c,0x10,0x02,3

This will set the context to 0x10 because 0x10 is the first byte following the context switch. Then it will send 2 and 3. The bytes can be specified in hex or decimal.

Code will be included soon.


Friday, July 26, 2019

A General Purpose SPI Engine

SPI Engine

When I start work on a new microcontroller I usually hook up a small graphics display and bitbang some feedback through the SPI interface. I prefer a graphic display to serial output. With a simple, generic graphics library it's easy to get a 128x64 OLED display working in less than an hour. The only thing I have to figure out is how to do GPIO output for the chip.

But eventually I have other SPI peripherals I want to hook up and I'd rather not bitbang everything. I tend to dislike bloated manufacturers' libraries. It's usually harder to deal with them than it is to just take the time to study the CPU manuals.

I'll describe a very efficient, general purpose SPI engine. This implementation is on the STM32F103. I tested this on the "bluepill" board with a 128x64 OLED (SSD1306 controller) and an LSM6DS3 IMU. With this technique I was able to read the accelerometer and update the full display at 400 frames per second. That's a ridiculously high refresh rate but that was only a test.

The SPI engine is table-driven. So once the code works, there's not much involved with adding different peripherals. Just add more tables.

Full source code is on github.

The Engine Logic

The core of the engine is a command-driven state machine. Think of this as a byte-code interpreter. The interpreter advances on events or triggers. Events would be SPI interrupts. Triggers would be functions named something like START(doing_this).

The SPI engine knows nothing about the peripherals it services. It only knows about SPI registers, GPIO and possibly DMA.

Although this interpreter is easily extended, it's currently sufficient to handle most hardware I throw at it. But I will extend it in a follow-up article soon.

These are the byte-code commands I've implemented so far:

enum{
    CMD_SETIO=1,
    CMD_CLEARIO,
    CMD_NEXTBYTE,
    CMD_NEXTWBLOCK,
    CMD_NEXTWBLOCK2,
    CMD_NEXTINBYTE,
    CMD_NEXTINBYTE2,
    CMD_SET_BUFFER,
    CMD_WAITRX,
    CMD_TEST,
    CMD_END
};

Here is a typical program to read an accelerometer.

static const uint8_t ACCEL_Cmds_ReadAccel[]={
    CMD_CLEARIO, ACCEL_CS,                     // select the accelerometer
    CMD_NEXTBYTE, LSM6DS3_OUTX_L_XL +0x80,     // register for data
    CMD_SET_BUFFER, ACCEL_BUF,
    CMD_NEXTINBYTE,
    CMD_SETIO, ACCEL_CS,                       // deselect
    CMD_END
};

That's a ten byte "program" to read x/y/z.

The interpreter executes commands by cycling through a simple switch statement. Some commands might have multiple steps. An example is CMD_NEXTINBYTE which has substep CMD_NEXTINBYTE2. Steps execute either in a loop or as the next call from an outside event, such as an interrupt.

The interpreter's main function is basically that switch statement along with state data. Here is the function:

static void _spi_nextstate( void )
{
    uint8_t executing=1;               // we will loop through commands until this is 0
    uint8_t nxt;                       // next command to execute
    static uint8_t state_continue=0;   // pending substep of a multi-step command, 0 if none

    if( _spi_pc==0 ) return;           // There must be a program to execute.

    while( executing )
    {
        nxt=state_continue;            // could be a continuation of previous command
                                       // (that is, some commands have two or more steps)
        if( nxt==0 ) nxt=*_spi_pc++;   // no, it's a new command

        switch( nxt )
        {
            // This is used to control things like chip select
            case CMD_SETIO:
            {
                uint16_t ionum=*_spi_pc++;   // pick up the pin#, advance the command
                GPIO_SetBits( _iopin[ionum].port, _iopin[ionum].pin );     // pin=1
            }
            break;

            // This is used to control things like chip select
            case CMD_CLEARIO:
            {
                uint16_t ionum=*_spi_pc++;   // pick up the pin#, advance the command
                GPIO_ResetBits( _iopin[ionum].port, _iopin[ionum].pin );   // pin=0
            }
            break;

            // Send an immediate byte over spi
            case CMD_NEXTBYTE:
                rxirq=0;                   // clear rx status (so we can detect the next receive for deselect cs)
                _spi_send( *_spi_pc++ );   // data byte to display hardware, advance the command pointer
                state_continue=0;
                executing=0;               // will stop cmd loop but return on irq once data is sent
                break;

            // This is used for DMA buffers
            case CMD_SET_BUFFER:
                break;

            // This handles SPI input, interrupt on each byte (not DMA)
            case CMD_NEXTINBYTE:
                rxirq=0;          // clear rx status (so we can detect the next receive for deselect cs)
                                  // (that is, we should guarantee we receive this byte prior to cs=1)
                _spi_send( 0 );   // normal spi input: send anything, receive next byte
                state_continue=CMD_NEXTINBYTE2;
                break;

            case CMD_NEXTINBYTE2:
                if( rxirq )       // (may be a tx interrupt)
                {
                    rxirq=0;
                    if( _memorycnt )   // any more bytes to receive?
                    {
                        state_continue=CMD_NEXTINBYTE;    // start another byte
                    }
                    else
                    {   // nothing left to receive
                        state_continue=CMD_NEXTINBYTE3;
                    }
                }
                else executing=0;   // exit and wait on rxirq (this was a tx irq)
                break;

            // write from memory to peripheral, no input
            case CMD_NEXTWBLOCK:
                executing=0;   // exit and wait on irq
                break;

            // This handles the rx complete interrupt.
            // We need this to know when it's safe to raise chip select.
            // If CS is raised too early it could cut off the last transmitted byte.
            case CMD_WAITRX:
                executing=0;   // wait on next irq or trigger
                break;

            // This handles the end of a 'program'.
            case CMD_END:
                executing=0;
                break;

            default:
                break;
        } // end switch
    } // end while
}

Let's step through what happens with the byte-code program above.

Typically when the accelerometer has data there will be a gpio interrupt. To set the ball rolling, the ISR makes a call to a simple scheduler. The scheduler's job is to start execution via a START(doing_this) function, ACCEL_Cmds_ReadAccel in this case. But this can happen only if the SPI engine is not in the middle of servicing other SPI hardware -- that is, it can't start a program if it's busy with another.

Let's assume the engine is in an idle state. The scheduler initializes the program counter to the first byte of the SPI program and calls _spi_nextstate(). We're now executing the first byte of the program.

That first byte is CMD_CLEARIO which selects "case CMD_CLEARIO" in the switch statement. This case fetches the next byte in the byte-code stream, this being the pin to clear. We'll arbitrarily call the pin ACCEL_CS.

GPIO_ResetBits() does the deed. The chip select for the accelerometer goes low. Once CS is low, the program advances to the next byte-code command.

Since "executing" still equals 1, the while-loop continues with the next command, which is CMD_NEXTBYTE. This command sends a byte to the peripheral over SPI. The LSM6DS3 datasheet tells us that in order to read bytes from the chip we must tell where to start reading. This is a register number and that's what we need to send. The immediate byte-code fetched is "LSM6DS3_OUTX_L_XL" which specifies the register. Again, this name is arbitrary and is defined elsewhere. The bit "0x80" is added to that register value to specify that a read operation is requested. The function of this bit is documented in the LSM6DS3 datasheet. So _spi_send( *_spi_pc++) both fetches the register number and sends it over SPI.

Unlike the previous command, this one will take a while to finish. The data has to shift out the SPI port. Since the switch statement advances through the byte code in a while loop, we need to exit the while loop and wait for the SPI interrupt. CMD_NEXTBYTE does this by setting both "executing" and "state_continue" to zero. This will force an exit from the byte-code interpreter and also force a new command to be read from the byte stream when execution resumes.
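The run-until-blocked, resume-on-interrupt pattern can be tried on a PC with a stripped-down model. Everything in this sketch is hypothetical (the OP_* opcodes, mini_engine_t, the fake interrupt); it is not the engine's real command set, and it only records what would have been sent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a host-side model; not the CMD_* set above. */
enum { OP_PIN_LOW = 1, OP_PIN_HIGH, OP_SEND, OP_END };

typedef struct {
    const uint8_t *pc;        /* program counter, NULL when idle          */
    uint8_t        pin_level; /* last level written by OP_PIN_LOW/HIGH    */
    uint8_t        last_sent; /* last byte handed to the pretend SPI port */
    unsigned       sends;     /* number of bytes started so far           */
} mini_engine_t;

/* Run commands until one of them has to wait for the hardware. */
static void mini_step(mini_engine_t *e)
{
    int executing = 1;
    if (e->pc == NULL) return;

    while (executing) {
        switch (*e->pc++) {
        case OP_PIN_LOW:
            e->pc++;                 /* skip the pin number operand */
            e->pin_level = 0;
            break;                   /* instant: keep looping */
        case OP_PIN_HIGH:
            e->pc++;
            e->pin_level = 1;
            break;
        case OP_SEND:
            e->last_sent = *e->pc++; /* immediate data byte */
            e->sends++;
            executing = 0;           /* leave; the "interrupt" resumes us */
            break;
        case OP_END:
        default:
            e->pc = NULL;            /* engine is idle again */
            executing = 0;
            break;
        }
    }
}

/* Trigger: load a program and run up to its first blocking command. */
static void mini_start(mini_engine_t *e, const uint8_t *program)
{
    e->pc = program;
    mini_step(e);
}

/* Event: stands in for the SPI interrupt that reports "byte done". */
static void mini_fake_irq(mini_engine_t *e)
{
    mini_step(e);
}
```

Starting a program that selects a pin, sends two bytes and deselects takes one call to mini_start() and two calls to mini_fake_irq(); the pin commands never cause an exit from the loop.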

Execution of the SPI program resumes with the next SPI interrupt. This interrupt happens when the byte we just sent has actually gone out. (You might note that the interrupt is serviced on "receive buffer full" rather than "transmit buffer empty." The transmitter interrupt is never enabled. It could be done either way, but you have to be careful with transmit buffer empty since it's possible to terminate transmission before a byte is fully sent.)

Here is the ISR:

void SPI_IRQ(void)
{
    if( SPIPORT->SR & 2 )   // TX buffer empty
    {
        _spi_txint_off();
    }
    if( SPIPORT->SR & 1 )   // RX has data
    {
        // must read the data!
        if( _memorycnt )
        {
            *_memoryp++=SPIPORT->DR;
            _memorycnt--;
        }
        else
            rdata=SPIPORT->DR;
        rxirq=1;
    }
    _spi_nextstate();   // will send next byte, if any
}

On either an rx or (optionally) tx interrupt the ISR calls _spi_nextstate() which will resume the SPI engine bytecode interpreter at the next command. In this case the command is CMD_SET_BUFFER, ACCEL_BUF. This sets up a buffer pointer and a byte count for input. ACCEL_BUF is an arbitrary name for the buffer structure to be used. More on this later. This command will not zero "executing" so the next command will execute. That's CMD_NEXTINBYTE. This command doesn't require an immediate value since it's going to use the buffer we just set up. The command's job is to transfer the stream of bytes into the buffer until the buffer count is exhausted. It does this in several steps. Once the bytes are transferred, the next command pulls CS high and we end the program.

For any peripheral there are only two basic actions we need to be concerned with. We'll need to toggle control signals and we'll need to transfer data. Each SPI peripheral will have its own chip select. There may be other control signals. The OLED display has lines to specify a byte is either command or data, and a module reset. In all cases, these commands require only a pin number. The CMD_SETIO and CMD_CLEARIO will handle any combination of that.

Data transfer is more difficult since we have to specify a buffer and we have to wait for the transfer to occur. Transfers can be interrupt driven or DMA driven. There is significant performance improvement with DMA on blocks of data.

I'll describe a program to send pixel data to the SSD1306 controller. This is the controller used in many cheap 128x64 displays. DMA will be used, so it's very fast. This is the program:

static const uint8_t DISPLAY_Cmds_Refresh[]={
    CMD_CLEARIO, DISPLAY_CS,      // select the display

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB0,           // set page 0 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE0,   // 128*0

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB1,           // set page 1 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE1,   // 128*1

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB2,           // set page 2 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE2,   // 128*2

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB3,           // set page 3 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE3,   // 128*3

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB4,           // set page 4 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE4,   // 128*4

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB5,           // set page 5 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE5,   // 128*5

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB6,           // set page 6 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE6,   // 128*6

    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_NEXTBYTE, 0x00,           // set lower column address
    CMD_NEXTBYTE, 0x10,           // set higher column address
    CMD_NEXTBYTE, 0x00,           // set display start line
    CMD_NEXTBYTE, 0xB7,           // set page 7 address
    CMD_SETIO, DISPLAY_DC,        // data mode
    CMD_NEXTWBLOCK, DISP_PAGE7,   // 128*7

    CMD_WAITRX,
    CMD_SETIO, DISPLAY_CS,        // deselect the display
    CMD_CLEARIO, DISPLAY_DC,      // command mode
    CMD_END,
};

Here is what's going on:

The full 128x64 frame is sent in 8 pages of 128 bytes per page. The frame data is held in a buffer defined outside the SPI engine. All the engine needs is a set of buffer data structures. I've used buffer numbers DISP_PAGE0..DISP_PAGE7 in the snippet above. These are logical buffer numbers. I'll describe the buffer structure soon.

The overall structure is simple. There are 8 similar blocks, one for each display page as defined by the SSD1306 datasheet. Chip select is pulled low for the whole frame, then released at the end of the frame. Each page consists of a four-byte sequence to set up the row, column and display page number. These four bytes are sent via the slower interrupt method because DMA would not improve speed significantly. (The interrupt method uses CMD_NEXTBYTE, not CMD_NEXTWBLOCK.) But before sending the command bytes, DISPLAY_DC is pulled low indicating these are command bytes, not data bytes. Once the commands are sent, DISPLAY_DC is pulled high indicating that the following bytes are data. The 128 bytes of pixel data are then sent via DMA. That's the CMD_NEXTWBLOCK command. It starts DMA using the buffer number, then exits. When the DMA interrupt occurs, its ISR calls _spi_nextstate() which resumes the SPI engine with the DMA cleanup at CMD_NEXTWBLOCK2.

These steps are repeated for each display page. CMD_WAITRX simply waits for the last byte to be transmitted. Normally on SPI we are assured a byte is transmitted when we receive a byte after the transmit. In this case it's DMA so we are assured when the DMA actually completes.

The Buffer Structures

The SPI engine sends and receives all data through an array of buffer structures. A member in the array is defined like this:

typedef struct {
    uint8_t * source;
    uint8_t * destination;
    uint16_t count;
    uint8_t * flag;
} membuffer_t;

The same structure is used for reads and writes. It's the same no matter the peripheral.
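As a rough illustration of how a read could flow through such a structure, here is a host-side sketch. The listing above leaves the body of CMD_SET_BUFFER out, so demo_set_buffer() is only a guess at what it might do, and demo_buffers, accel_raw, rx_ptr and rx_count are invented names that loosely mirror _memoryp and _memorycnt in the ISR.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *source;        /* where outgoing bytes come from  */
    uint8_t *destination;   /* where incoming bytes are stored */
    uint16_t count;         /* number of bytes to transfer     */
    uint8_t *flag;          /* optional completion flag        */
} membuffer_t;

/* Hypothetical input buffer: six raw bytes for x/y/z. */
static uint8_t accel_raw[6];

static const membuffer_t demo_buffers[] = {
    { NULL, accel_raw, sizeof accel_raw, NULL },
};

static uint8_t *rx_ptr;     /* plays the role of _memoryp   */
static uint16_t rx_count;   /* plays the role of _memorycnt */

/* Assumed behavior of a set-buffer command: load pointer and count. */
static void demo_set_buffer(uint8_t id)
{
    rx_ptr   = demo_buffers[id].destination;
    rx_count = demo_buffers[id].count;
}

/* What the receive interrupt does with each byte while count is nonzero. */
static void demo_rx_byte(uint8_t value)
{
    if (rx_count) {
        *rx_ptr++ = value;
        rx_count--;
    }
}
```

Once the count reaches zero, further received bytes are ignored by this sketch, which matches the idea that the program moves on when the buffer is full.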

But the display driver's pixel buffer is defined elsewhere and differently:

uint8_t DisplayMap[16*DISPLAYROWS]; //1024 bytes

Graphics manipulation takes place in DisplayMap which is in the display driver and known only to the display driver. The display driver interfaces directly to an intermediate interface, spi_engine_display. This module contains all code related to the physical display such as chip control lines and transfer commands. Spi_engine_display.c is logically separate from spi_engine.c, but not fully separate since it's an included file. It's designed to help with portability. Its main job is to insert its peripheral's structures into spi_engine.c.

So the display driver moves pixels. The "pseudo" SPI display driver handles the OLED manipulation. And the SPI engine handles SPI communication.

With this scheme, display.c doesn't need to know the display hardware is organized into eight pages of 128 bytes. Spi_engine_display doesn't need to know how to manipulate the SPI interface. And spi_engine doesn't need to know how to interface to the OLED module. Each module knows only what it needs to know. Portability is maximized through these three layers.

The tricky part is constructing the SPI engine source code in a way that allows upper level drivers to inject information into it at compile time. I used macros to do this. Usually I stay away from macros but I couldn't figure out a better way unless structures were built at run time. I didn't want runtime overhead.

Here is the macro that defines the display pages:

#define DISPLAYMEM {&DisplayMap[0*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[1*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[2*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[3*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[4*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[5*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[6*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[7*DISPLAYCOLS],0,128,0},

So DISPLAYMEM defines an array for the display's buffers. Now we need to get that array into _buffer[] inside spi_engine.c. I did this in a slightly unusual manner:

const membuffer_t _buffer[]={
    // PERIPHERALS:
    #define LOAD_BUFFERS
    #include "spi_engine_display.h"
    #define LOAD_BUFFERS
    #include "spi_engine_accelerometer.h"
};

The header files are read multiple times. That's right, multiple times. Each time they are read, only a specific portion of the header is enabled. So in spi_engine_display.h we have:

#ifdef LOAD_BUFFERS
#undef LOAD_BUFFERS
DISPLAYMEM
#endif // LOAD_BUFFERS

Every driver that uses the SPI engine must have a header file with several selectively enabled sections such as this.

It's not enough to include the buffer structure. The buffers also have a logical reference that the SPI programs can use. These are defined similarly:

#define DISPLAYMEM_ENUM DISP_PAGE0, \
                        DISP_PAGE1, \
                        DISP_PAGE2, \
                        DISP_PAGE3, \
                        DISP_PAGE4, \
                        DISP_PAGE5, \
                        DISP_PAGE6, \
                        DISP_PAGE7,

That macro defines logical reference numbers. It will be included like this:

#ifdef LOAD_BUFFER_IDS
#undef LOAD_BUFFER_IDS
DISPLAYMEM_ENUM
#endif // LOAD_BUFFER_IDS

And this is inserted into an enum in spi_engine.c like this:

enum{
    // PERIPHERALS:
    #define LOAD_BUFFER_IDS
    #include "spi_engine_display.h"
    #define LOAD_BUFFER_IDS
    #include "spi_engine_accelerometer.h"
};

Note that care must be taken to order the includes the same way in every case.
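The ordering hazard comes from keeping two lists that must match. A related single-file technique, the X-macro, generates both the enum and the table from one list so they cannot drift apart. This is a different approach from the multi-include headers described above, shown only for comparison; DEMO_BUFFER_LIST and the other names are invented.

```c
#include <stdint.h>

/* Hypothetical peripheral list: X(logical_name, byte_count).
   Adding a line here extends the enum and the table together. */
#define DEMO_BUFFER_LIST(X) \
    X(DEMO_PAGE0, 128)      \
    X(DEMO_PAGE1, 128)      \
    X(DEMO_ACCEL, 6)

#define DEMO_AS_ENUM(name, count)  name,
#define DEMO_AS_COUNT(name, count) count,

/* Expands to: DEMO_PAGE0, DEMO_PAGE1, DEMO_ACCEL, DEMO_BUFFER_COUNT */
enum { DEMO_BUFFER_LIST(DEMO_AS_ENUM) DEMO_BUFFER_COUNT };

/* Expands to: 128, 128, 6, in the same order as the enum. */
static const uint16_t demo_counts[] = { DEMO_BUFFER_LIST(DEMO_AS_COUNT) };
```

The trade-off is that every peripheral has to appear in one central list, whereas the multi-include scheme lets each driver keep its entries in its own header.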

The final step is synchronization. It's assumed that the peripherals are running asynchronously. As already noted, one SPI program cannot interrupt another. So synchronization is a must. This is a simple way of doing it:

static uint16_t _spiprog=0;

static void _spi_prg_start( void )
{
    uint8_t * prg=0;
    if( _spiprog ==0 ) return;   // nothing is queued
    if( _spi_pc!=0 ) return;     // engine is busy with another program
    uint16_t bit=1;
    uint8_t i=0;
    while( bit )
    {
        if( bit & _spiprog )
        {
            _spiprog=_spiprog ^ bit;   // clear the request
            bit=0;                     // and stop scanning
            prg=(uint8_t *) _spi_program[i];
        }
        bit=bit<<1;
        i++;
    }
    _spi_pc=prg;
    _spi_nextstate();
}

static void _spi_queue( uint8_t pnum )
{
    _spiprog=_spiprog | (1<<pnum);
    _spi_prg_start();
}

The variable _spiprog has 16 bits corresponding to 16 possible SPI programs. When a bit is set via the _spi_queue() function, _spi_prg_start() will start the corresponding program when the engine becomes ready. This simple scheduler scans from bit 0 upward, so lower-numbered programs have higher priority. That means the low priority programs could be starved if high priority programs keep the engine busy. I didn't try to resolve this issue in this basic example.
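The priority order is easy to check on a PC. This sketch keeps only the bitmask logic; demo_pending, demo_queue and demo_next_program are hypothetical names, and the step that actually starts a program is left out.

```c
#include <stdint.h>

static uint16_t demo_pending;   /* one request bit per program, like _spiprog */

/* Mark program pnum (0..15) as waiting to run. */
static void demo_queue(uint8_t pnum)
{
    demo_pending |= (uint16_t)(1u << pnum);
}

/* Return the lowest-numbered waiting program and clear its bit,
   or -1 when nothing is queued. */
static int demo_next_program(void)
{
    for (int i = 0; i < 16; i++) {
        uint16_t bit = (uint16_t)(1u << i);
        if (demo_pending & bit) {
            demo_pending ^= bit;
            return i;
        }
    }
    return -1;
}
```

Queueing programs 5, 1 and 9 in that order and then draining the queue yields 1, 5, 9: the order of the requests doesn't matter, only the bit positions do.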

Performance

I captured some images from a logic analyser to show the timing. This first image shows one full frame being sent to the OLED. You can see the 8 display pages. Signals for channels 1..8 are shown. Channel 4 is SPI clock. Channel 7 is the chip select. It takes 1.052 milliseconds to send the whole frame. I'm sending approximately 50 frames per second.

The next image shows how fast the DMA sends bytes. The clock is 12 MHz. You can see how little time there is between bytes -- no more than 2 to 3 clocks.

The third image shows the command bytes sent to the OLED via the interrupt method. As you can see, it takes considerably longer to service the interrupt than it does to send through DMA.
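A back-of-the-envelope check puts these numbers in perspective. Assuming every byte of the frame is clocked at the 12 MHz shown above, a frame is 8 pages of 4 command bytes plus 128 data bytes, or 1056 bytes, which needs 704 microseconds of pure clocking. The rest of the measured 1.052 ms, roughly 348 microseconds, would then be inter-byte gaps and interrupt servicing. The helper names below are invented for this calculation.

```c
#include <stdint.h>

/* Bytes on the wire for one frame: each page has command bytes plus data. */
static uint32_t frame_bytes(uint32_t pages, uint32_t cmd_bytes, uint32_t data_bytes)
{
    return pages * (cmd_bytes + data_bytes);
}

/* Time in nanoseconds to clock out the given number of bytes with no gaps. */
static uint32_t wire_time_ns(uint32_t bytes, uint32_t clock_hz)
{
    return (uint32_t)((uint64_t)bytes * 8u * 1000000000u / clock_hz);
}
```

With frame_bytes(8, 4, 128) and wire_time_ns(1056, 12000000) this gives 1056 bytes and 704000 ns.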

The last image is the accelerometer read. Chip select is low for 49.08 microseconds.

Demo

I created a demonstration which shows the accelerometer and OLED working together with this SPI engine. I adapted a neat little Tiny 3d Engine by Themistokle Benetatos found here. I converted the raw accelerometer g values to an angle using a table lookup. I then used the angles to orient the wire frame model. Here is the video:

-- Don Jindra