Thursday, September 5, 2019

Two Bit Swap Protocol

Describing the protocol

Brief description of the protocol: Two lines are used. Both are open collector, pulled up. There is a sender and a receiver. The distinction is not a concern of the protocol as long as the upper-level executive knows when it's supposed to send or receive.

At the most basic level, the two wires are used to send individual bits. Both D0 and D1 are normally idle at "1". To send a "0" the sender pulls D0 low. Or to send a "1" the sender pulls D1 low. In either case, the receiver sees one of those lines go low and then uses the opposite line as an ack. So if the receiver sees D0 go low, it pulls D1 low to ack the "0". When the sender sees D1 go low, it knows the receiver has the bit and lets D0 go back high. The receiver sees D0 is idle again, so it lets D1 go high as well. Now both D0 and D1 are in the idle state and the next bit can be sent.
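The four handshake phases described above can be sketched as a small host-side simulation. This is only an illustration: the names `wire_t`, `bus_t`, `wire_level()` and `swap_one_bit()` are hypothetical, and the two parties are stepped sequentially in one function rather than running on separate devices.

```c
#include <assert.h>
#include <stdbool.h>

/* One open-collector wire: either party may pull it low. It reads high
   only when both have released it (the pull-up wins). */
typedef struct {
    bool sender_pulls_low;
    bool receiver_pulls_low;
} wire_t;

typedef struct {
    wire_t d0;
    wire_t d1;
} bus_t;

static bool wire_level(const wire_t *w)
{
    return !(w->sender_pulls_low || w->receiver_pulls_low);
}

/* Transfer one bit through the four handshake phases.
   Returns the bit the receiver decoded. */
static int swap_one_bit(bus_t *bus, int bit)
{
    wire_t *data = bit ? &bus->d1 : &bus->d0;  /* line the sender pulls    */
    wire_t *ack  = bit ? &bus->d0 : &bus->d1;  /* the opposite line acks   */
    int received;

    /* Both lines must be idle ("1") before a bit can start. */
    assert(wire_level(&bus->d0) && wire_level(&bus->d1));

    data->sender_pulls_low = true;             /* 1: sender asserts the bit        */
    received = wire_level(&bus->d0) ? 1 : 0;   /*    receiver: D0 low means "0"    */
    ack->receiver_pulls_low = true;            /* 2: receiver acks on other line   */
    data->sender_pulls_low = false;            /* 3: sender saw the ack, releases  */
    ack->receiver_pulls_low = false;           /* 4: receiver saw idle, releases   */

    return received;
}
```

After each call both wires are back in the idle state, so the next bit can follow immediately.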

But we would like to sync the bytes. This is done as follows:

Nine bits are sent for every byte. The most significant bit -- the one sent first -- is the sync bit. This bit is always "1" for a normal byte. So every byte is sent as a "1" followed by the serialization of the byte's bits, MSB to LSB. Alternatively, a context frame starts with the sync bit of "0" along with 8 more "0"s. This means that in a normal bit stream there will never be a series of 9 "0" bits.

So to specify a context for the data, send 9 "0"s followed by the context byte, which is the first normal byte after the context switch.
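Here is a minimal sketch of the nine-bit framing, assuming frames are already aligned (it does not try to recover sync from an arbitrary bit stream). The function names and the one-bit-per-array-element representation are my own choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 9

/* Expand a data byte into 9 bits: sync bit "1", then the byte MSB first. */
static void frame_data_byte(uint8_t value, uint8_t out[FRAME_BITS])
{
    assert(out != NULL);
    out[0] = 1;
    for (int i = 0; i < 8; i++)
        out[1 + i] = (uint8_t)((value >> (7 - i)) & 1u);
}

/* A context frame is a sync bit of "0" followed by eight more "0"s. */
static void frame_context(uint8_t out[FRAME_BITS])
{
    assert(out != NULL);
    for (int i = 0; i < FRAME_BITS; i++)
        out[i] = 0;
}

/* Decode one aligned frame.
   Returns 1 and stores the byte for a normal frame,
   0 for a context frame (nine zeros),
   -1 if the sync bit is "0" but the rest is not all zeros. */
static int decode_frame(const uint8_t in[FRAME_BITS], uint8_t *value)
{
    uint8_t v = 0;
    assert(in != NULL && value != NULL);
    for (int i = 1; i < FRAME_BITS; i++)
        v = (uint8_t)((v << 1) | (in[i] & 1u));
    if (in[0]) {
        *value = v;
        return 1;
    }
    return (v == 0) ? 0 : -1;
}
```

A sender would emit `frame_context()` and then `frame_data_byte(context)` to switch context; the receiver treats the first normal byte after a context frame as the context byte.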

Using this context switch, the upper-level executive can manage an API to toggle between sender and receiver. The low-level protocol itself doesn't care how that is handled.

The animation below demonstrates this protocol. (The Javascript doesn't actually implement it.) To see how bytes and contexts are sent, hit STOP. You can enter a comma separated list of bytes. These bytes will be displayed as "sent" either once or repeating. To send a context switch, specify "c" instead of a number.

Example: c,0x10,0x02,3

This will set the context to 0x10 because 0x10 is the first byte following the context switch. Then it will send 2 and 3. The bytes can be specified in hex or decimal.

Code will be included soon.

Friday, July 26, 2019

A General Purpose SPI Engine

SPI Engine

When I start work on a new microcontroller I usually hook up a small graphics display and bitbang some feedback through the SPI interface. I prefer a graphic display to serial output. With a simple, generic graphics library it's easy to get a 128x64 oled display working in less than an hour. The only thing I have to figure out is how to do GPIO output for the chip.

But eventually I have other SPI peripherals I want to hook up and I'd rather not bitbang everything. I tend to dislike bloated manufacturers' libraries. It's usually harder to deal with them than it is to just take the time to study the CPU manuals.

I'll describe a very efficient, general purpose SPI engine. This implementation is on the STM32F103. I tested this on the "bluepill" board with a 128x64 OLED (SSD1306 controller) and an LSM6DS3 IMU. With this technique I was able to read the accelerometer and update the full display at 400 frames per second. That's a ridiculously high refresh rate but that was only a test.

The SPI engine is table-driven. So once the code works, there's not much involved with adding different peripherals. Just add more tables.

Full source code is on github.

The Engine Logic

The core of the engine is a command-driven state machine. Think of this as a byte-code interpreter. The interpreter advances on events or triggers. Events would be SPI interrupts. Triggers would be functions named something like START(doing_this).

The SPI engine knows nothing about the peripherals it services. It only knows about SPI registers, GPIO and possibly DMA.

Although this interpreter is easily extended, currently it's sufficient to handle most hardware I throw at it. But I will extend it in a follow-up article soon.

These are the byte-code commands I've implemented so far:

enum{
  CMD_SETIO=1,
  CMD_CLEARIO,
  CMD_NEXTBYTE,
  CMD_NEXTWBLOCK,
  CMD_NEXTWBLOCK2,
  CMD_NEXTINBYTE,
  CMD_NEXTINBYTE2,
  CMD_SET_BUFFER,
  CMD_WAITRX,
  CMD_TEST,
  CMD_END
};

Here is a typical program to read an accelerometer.

static const uint8_t ACCEL_Cmds_ReadAccel[]={
  CMD_CLEARIO, ACCEL_CS,                   // select the accelerometer
  CMD_NEXTBYTE, LSM6DS3_OUTX_L_XL +0x80,   // register for data
  CMD_SET_BUFFER, ACCEL_BUF,
  CMD_NEXTINBYTE,
  CMD_SETIO, ACCEL_CS,                     // deselect
  CMD_END
};

That's a ten byte "program" to read x/y/z.

The interpreter executes commands by cycling through a simple switch statement. Some commands might have multiple steps. An example is CMD_NEXTINBYTE which has substep CMD_NEXTINBYTE2. Steps execute either in a loop or as the next call from an outside event, such as an interrupt.

The interpreter's main function is basically that switch statement along with state data. Here is the function:

static void _spi_nextstate( void )
{
  uint8_t executing=1;  // we will loop through commands until this is 0
  uint8_t nxt;          // next command to execute
  static uint8_t state_continue=0;

  if( _spi_pc==0 ) return;  // There must be a program to execute.

  while( executing )
  {
    nxt=state_continue;  // could be a continuation of previous command
                         // (that is, some commands have two or more steps)
    if( nxt==0 ) nxt=*_spi_pc++;  // no, it's a new command

    switch( nxt )
    {
      // This is used to control things like chip select
      case CMD_SETIO:
        {
          uint16_t ionum=*_spi_pc++;  // pick up the pin#, advance the command
          GPIO_SetBits( _iopin[ionum].port, _iopin[ionum].pin );  // pin=1
        }
        break;

      // This is used to control things like chip select
      case CMD_CLEARIO:
        {
          uint16_t ionum=*_spi_pc++;  // pick up the pin#, advance the command
          GPIO_ResetBits( _iopin[ionum].port, _iopin[ionum].pin );  // pin=0
        }
        break;

      // Send an immediate byte over spi
      case CMD_NEXTBYTE:
        rxirq=0;  // clear rx status (so we can detect the next receive for deselect cs)
        _spi_send( *_spi_pc++ );  // data byte to display hardware, advance the command pointer
        state_continue=0;
        executing=0;  // will stop cmd loop but return on irq once data is sent
        break;

      // This is used for DMA buffers
      case CMD_SET_BUFFER:
        break;

      // This handles SPI input, interrupt on each byte (not DMA)
      case CMD_NEXTINBYTE:
        rxirq=0;  // clear rx status (so we can detect the next receive for deselect cs)
                  // (that is, we should guarantee we receive this byte prior to cs=1)
        _spi_send( 0 );  // normal spi input: send anything, receive next byte
        state_continue=CMD_NEXTINBYTE2;
        break;

      case CMD_NEXTINBYTE2:
        if( rxirq )  // (may be a tx interrupt)
        {
          rxirq=0;
          if( _memorycnt )  // any more bytes to receive?
          {
            state_continue=CMD_NEXTINBYTE;  // start another byte
          }
          else
          {
            // nothing left to receive
            state_continue=CMD_NEXTINBYTE3;
          }
        }
        else executing=0;  // exit and wait on rxirq (this was a tx irq)
        break;

      // write from memory to peripheral, no input
      case CMD_NEXTWBLOCK:
        executing=0;  // exit and wait on irq
        break;

      // This handles the rx complete interrupt.
      // We need this to know when it's safe to raise chip select.
      // If CS is raised too early it could cut off the last transmitted byte.
      case CMD_WAITRX:
        executing=0;  // wait on next irq or trigger
        break;

      // This handles the end of a 'program'.
      case CMD_END:
        executing=0;
        break;

      default:
        break;
    } // end switch
  } // end while
}

Let's step through what happens with the byte-code program above.

Typically when the accelerometer has data there will be a gpio interrupt. To set the ball rolling, the ISR makes a call to a simple scheduler. The scheduler's job is to start execution via a START(doing_this) function, ACCEL_Cmds_ReadAccel in this case. But this can happen only if the SPI engine is not in the middle of servicing other SPI hardware -- that is, it can't start a program if it's busy with another.

Let's assume the engine is in an idle state. The scheduler initializes the program counter to the first byte of the SPI program and calls _spi_nextstate(). We're now executing the first byte of the program.

That first byte is CMD_CLEARIO which selects "case CMD_CLEARIO" in the switch statement. This case fetches the next byte in the byte-code stream, this being the pin to clear. We'll arbitrarily call the pin ACCEL_CS.

GPIO_ResetBits() does the deed. The chip select for the accelerometer goes low. Once CS is low, the program advances to the next byte-code command.

Since "executing" still equals 1, the while-loop continues with the next command, which is CMD_NEXTBYTE. This command sends a byte to the peripheral over SPI. The LSM6DS3 datasheet tells us that in order to read bytes from the chip we must tell where to start reading. This is a register number and that's what we need to send. The immediate byte-code fetched is "LSM6DS3_OUTX_L_XL" which specifies the register. Again, this name is arbitrary and is defined elsewhere. The bit "0x80" is added to that register value to specify that a read operation is requested. The function of this bit is documented in the LSM6DS3 datasheet. So _spi_send( *_spi_pc++) both fetches the register number and sends it over SPI.

Unlike the previous command, this one will take a while to finish. The data has to shift out the SPI port. Since the switch statement advances through the byte code in a while loop, we need to exit the while loop and wait for the SPI interrupt. CMD_NEXTBYTE does this by setting both "executing" and "state_continue" to zero. This will force an exit from the byte-code interpreter and also force a new command to be read from the byte stream when execution resumes.

Execution of the SPI program resumes with the next SPI interrupt. This interrupt happens when the byte we just sent is actually sent. (You might note that the interrupt is serviced on the "receive buffer full" rather than the "transmitter buffer empty." The transmitter interrupt is never enabled. It could be done either way, but you have to be careful with the transmitter buffer empty since it's possible to terminate transmission before a byte is fully sent.)

Here is the ISR:

void SPI_IRQ(void)
{
  if( SPIPORT->SR & 2 )  // TX buffer empty
  {
    _spi_txint_off();
  }
  if( SPIPORT->SR & 1 )  // RX has data
  {
    // must read the data!
    if( _memorycnt )
    {
      *_memoryp++=SPIPORT->DR;
      _memorycnt--;
    }
    else rdata=SPIPORT->DR;
    rxirq=1;
  }
  _spi_nextstate();  // will send next byte, if any
}

On either an rx or (optionally) tx interrupt the SPI engine calls _spi_nextstate() which will resume the SPI engine bytecode interpreter at the next command. In this case the command is CMD_SET_BUFFER, ACCEL_BUF. This sets up a buffer pointer and a byte count for input. ACCEL_BUF is an arbitrary name for the buffer structure to be used. More on this later. This command will not zero "executing" so the next command will execute. That's CMD_NEXTINBYTE. This command doesn't require an immediate value since it's going to use the buffer we just set up. The command's job is to transfer the stream of bytes into the buffer until the buffer count is exhausted. It does this with several steps. Once the bytes are transferred, the next command pulls CS high and we end the program.
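To make the flow above concrete, here is a stripped-down host-side simulation of the same idea. It is a sketch under several assumptions, not the engine from the repository: the SPI port and GPIO are faked, the "interrupt" is an ordinary function call, the rx/tx distinction is dropped, and the names (`spi_nextstate`, `fake_spi_irq`, `run`, `REG_ADDR`, and so on) are made up for the example. It does show how CMD_SET_BUFFER could load a pointer and count, and how CMD_NEXTINBYTE loops until the count runs out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CMD_SETIO = 1, CMD_CLEARIO, CMD_NEXTBYTE, CMD_SET_BUFFER,
       CMD_NEXTINBYTE, CMD_NEXTINBYTE2, CMD_END };
enum { ACCEL_CS };   /* logical pin number (hypothetical)    */
enum { ACCEL_BUF };  /* logical buffer number (hypothetical) */
#define REG_ADDR 0x28  /* placeholder register number */

typedef struct { uint8_t *destination; uint16_t count; } membuffer_t;

static uint8_t accel_data[6];  /* x/y/z, two bytes each */
static const membuffer_t buffers[] = { { accel_data, sizeof accel_data } };

static const uint8_t read_accel_program[] = {
    CMD_CLEARIO, ACCEL_CS,
    CMD_NEXTBYTE, REG_ADDR + 0x80,
    CMD_SET_BUFFER, ACCEL_BUF,
    CMD_NEXTINBYTE,
    CMD_SETIO, ACCEL_CS,
    CMD_END
};

static uint8_t pin_level[1] = { 1 };  /* fake GPIO, CS idles high       */
static const uint8_t *pc;             /* program counter, NULL if idle  */
static uint8_t *memp;                 /* active input buffer pointer    */
static uint16_t memcnt;               /* bytes still expected           */
static uint8_t cont;                  /* pending sub-step, 0 if none    */
static int tx_pending;                /* fake SPI: a byte is in flight  */
static uint8_t fake_next = 0xA0;      /* fake data the "device" returns */

static void spi_send(uint8_t b) { (void)b; tx_pending = 1; }

static void spi_nextstate(void)
{
    int executing = 1;
    if (pc == NULL) return;
    while (executing) {
        uint8_t nxt = cont ? cont : *pc++;
        switch (nxt) {
        case CMD_SETIO:   pin_level[*pc++] = 1; break;
        case CMD_CLEARIO: pin_level[*pc++] = 0; break;
        case CMD_NEXTBYTE:
            spi_send(*pc++); cont = 0; executing = 0;  /* wait for irq */
            break;
        case CMD_SET_BUFFER: {
            const membuffer_t *b = &buffers[*pc++];
            memp = b->destination; memcnt = b->count;
            break;
        }
        case CMD_NEXTINBYTE:
            spi_send(0); cont = CMD_NEXTINBYTE2; executing = 0;
            break;
        case CMD_NEXTINBYTE2:  /* resumed by irq: more bytes to fetch? */
            cont = memcnt ? CMD_NEXTINBYTE : 0;
            break;
        case CMD_END:
        default:
            pc = NULL; cont = 0; executing = 0;  /* engine is idle again */
            break;
        }
    }
}

/* Stand-in for the "receive buffer full" interrupt. */
static void fake_spi_irq(void)
{
    uint8_t rx = fake_next++;
    tx_pending = 0;
    if (memcnt) { *memp++ = rx; memcnt--; }  /* else: byte is discarded */
    spi_nextstate();
}

static void run(const uint8_t *program)
{
    assert(program != NULL);
    pc = program; cont = 0; memcnt = 0;
    spi_nextstate();
    while (tx_pending) fake_spi_irq();
}
```

Calling `run(read_accel_program)` pulls the fake CS low, sends the register byte (whose reply is discarded), fills `accel_data` with the next six fake bytes, and raises CS again.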

For any peripheral there are only two basic actions we need to be concerned with. We'll need to toggle control signals and we'll need to transfer data. Each SPI peripheral will have its own chip select. There may be other control signals. The OLED display has lines to specify a byte is either command or data, and a module reset. In all cases, these commands require only a pin number. The CMD_SETIO and CMD_CLEARIO will handle any combination of that.

Data transfer is more difficult since we have to specify a buffer and we have to wait for the transfer to occur. Transfers can be interrupt driven or DMA driven. There is significant performance improvement with DMA on blocks of data.

I'll describe a program to send pixel data to the SSD1306 controller. This is the controller used in many cheap 128x64 displays. DMA will be used, so it's very fast. This is the program:

static const uint8_t DISPLAY_Cmds_Refresh[]={
  CMD_CLEARIO, DISPLAY_CS,     // select the display

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB0,          // set page 0 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE0,  // 128*0

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB1,          // set page 1 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE1,  // 128*1

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB2,          // set page 2 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE2,  // 128*2

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB3,          // set page 3 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE3,  // 128*3

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB4,          // set page 4 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE4,  // 128*4

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB5,          // set page 5 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE5,  // 128*5

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB6,          // set page 6 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE6,  // 128*6

  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_NEXTBYTE, 0x00,          // set lower column address
  CMD_NEXTBYTE, 0x10,          // set higher column address
  CMD_NEXTBYTE, 0x00,          // set display start line
  CMD_NEXTBYTE, 0xB7,          // set page 7 address
  CMD_SETIO, DISPLAY_DC,       // data mode
  CMD_NEXTWBLOCK, DISP_PAGE7,  // 128*7

  CMD_WAITRX,
  CMD_SETIO, DISPLAY_CS,       // deselect the display
  CMD_CLEARIO, DISPLAY_DC,     // command mode
  CMD_END,
};

Here is what's going on:

To send the full 128x64 frame, it's sent in 8 pages of 128 bytes per page. The frame data is held in a buffer defined outside the SPI engine. All the engine needs is a set of buffer data structures. I've used buffer numbers DISP_PAGE0..DISP_PAGE7 in the snippet above. These are logical buffer numbers. I'll describe the buffer structure soon.

The overall structure is simple. There are 8 similar blocks, one for each display page as defined by the SSD1306 datasheet. Chip select is pulled low for the whole frame, then released at the end of the frame. Each page consists of a four byte sequence to set up the row, column and display page number. These four bytes are sent via the slower interrupt method because DMA would not improve speed significantly. (The interrupt method uses CMD_NEXTBYTE, not CMD_NEXTWBLOCK.) But before sending the command bytes, DISPLAY_DC is pulled low indicating these are command bytes, not data bytes. Once the commands are sent, DISPLAY_DC is pulled high indicating that the following bytes are data. The 128 bytes of pixel data are then sent via DMA. That's the CMD_NEXTWBLOCK command. It starts DMA using the buffer number, then exits. When the DMA interrupt occurs, its ISR calls _spi_nextstate() which resumes the SPI engine with the DMA cleanup at CMD_NEXTWBLOCK2.

These steps are repeated for each display page. CMD_WAITRX simply waits for the last byte to be transmitted. Normally on SPI we are assured a byte is transmitted when we receive a byte after the transmit. In this case it's DMA so we are assured when the DMA actually completes.
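Since the eight page blocks differ only in the page number, the same table could also be generated at compile time with a helper macro. This is just a sketch of that alternative, not how the listing above is written; `DISPLAY_PAGE(n)` is a hypothetical macro, and the pin and buffer enums are stand-ins so the fragment compiles on its own (C11 is assumed for `static_assert`).

```c
#include <assert.h>
#include <stdint.h>

enum { CMD_SETIO = 1, CMD_CLEARIO, CMD_NEXTBYTE, CMD_NEXTWBLOCK,
       CMD_NEXTWBLOCK2, CMD_NEXTINBYTE, CMD_NEXTINBYTE2, CMD_SET_BUFFER,
       CMD_WAITRX, CMD_TEST, CMD_END };

enum { DISPLAY_CS, DISPLAY_DC };              /* stand-in pin numbers    */
enum { DISP_PAGE0, DISP_PAGE1, DISP_PAGE2, DISP_PAGE3,
       DISP_PAGE4, DISP_PAGE5, DISP_PAGE6, DISP_PAGE7 };  /* buffer ids */

/* One display page: four command bytes, then a DMA block of pixel data. */
#define DISPLAY_PAGE(n)                                          \
    CMD_CLEARIO, DISPLAY_DC,           /* command mode        */ \
    CMD_NEXTBYTE, 0x00,                                          \
    CMD_NEXTBYTE, 0x10,                                          \
    CMD_NEXTBYTE, 0x00,                                          \
    CMD_NEXTBYTE, 0xB0 + (n),          /* page n address      */ \
    CMD_SETIO, DISPLAY_DC,             /* data mode           */ \
    CMD_NEXTWBLOCK, DISP_PAGE0 + (n)   /* 128 bytes via DMA   */

static const uint8_t DISPLAY_Cmds_Refresh[] = {
    CMD_CLEARIO, DISPLAY_CS,           /* select the display   */
    DISPLAY_PAGE(0), DISPLAY_PAGE(1), DISPLAY_PAGE(2), DISPLAY_PAGE(3),
    DISPLAY_PAGE(4), DISPLAY_PAGE(5), DISPLAY_PAGE(6), DISPLAY_PAGE(7),
    CMD_WAITRX,
    CMD_SETIO, DISPLAY_CS,             /* deselect the display */
    CMD_CLEARIO, DISPLAY_DC,
    CMD_END,
};

/* 2 bytes to select, 8 pages of 14 bytes, 6 bytes of tail. */
static_assert(sizeof DISPLAY_Cmds_Refresh == 2 + 8 * 14 + 6,
              "unexpected program length");
```

The generated bytes are the same as in the hand-written table, so there is no runtime cost; the trade-off is that the macro hides the per-page comments.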

The Buffer Structures

The spi engine sends and receives all data through an array of buffer structures. A member in the array is defined like this:

typedef struct {
  uint8_t * source;
  uint8_t * destination;
  uint16_t  count;
  uint8_t * flag;
} membuffer_t;

The same structure is used for reads and writes. It's the same no matter the peripheral.

But the display driver's pixel buffer is defined elsewhere and differently:

uint8_t DisplayMap[16*DISPLAYROWS]; //1024 bytes

Graphics manipulation takes place in DisplayMap which is in the display driver and known only to the display driver. The display driver interfaces directly to an intermediate interface, spi_engine_display. This module contains all code related to the physical display such as chip control lines and transfer commands. Spi_engine_display.c is logically separate from spi_engine.c, but not fully separate since it's an included file. It's designed to help with portability. Its main job is to insert its peripheral's structures into spi_engine.c.

So the display driver moves pixels. The "pseudo" SPI display driver handles the OLED manipulation. And the SPI engine handles SPI communication.

With this scheme, display.c doesn't need to know the display hardware is organized into eight pages of 128 bytes. Spi_engine_display doesn't need to know how to manipulate the SPI interface. And spi_engine doesn't need to know how to interface to the OLED module. Each module knows only what it needs to know. Portability is maximized through these three layers.

The tricky part is constructing the SPI engine source code in a way that allows upper level drivers to inject information into it at compile time. I used macros to do this. Usually I stay away from macros but I couldn't figure out a better way unless structures were built at run time. I didn't want runtime overhead.

Here is the macro that defines the display pages:

#define DISPLAYMEM {&DisplayMap[0*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[1*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[2*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[3*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[4*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[5*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[6*DISPLAYCOLS],0,128,0}, \
                   {&DisplayMap[7*DISPLAYCOLS],0,128,0},

So DISPLAYMEM defines an array for the display's buffers. Now we need to get that array into _buffer[] inside spi_engine.c. I did this in a slightly unusual manner:

const membuffer_t _buffer[]={
// PERIPHERALS:
#define LOAD_BUFFERS
#include "spi_engine_display.h"
#define LOAD_BUFFERS
#include "spi_engine_accelerometer.h"
};

The header files are read multiple times. That's right, multiple times. Each time they are read, only a specific portion of the header is enabled. So in spi_engine_display.h we have:

#ifdef LOAD_BUFFERS
#undef LOAD_BUFFERS
DISPLAYMEM
#endif // LOAD_BUFFERS

Every driver that uses the SPI engine must have a header file with several selectively enabled sections such as this.

It's not enough to include the buffer structure. The buffers also have a logical reference that the SPI programs can use. These are defined similarly:

#define DISPLAYMEM_ENUM DISP_PAGE0, \
                        DISP_PAGE1, \
                        DISP_PAGE2, \
                        DISP_PAGE3, \
                        DISP_PAGE4, \
                        DISP_PAGE5, \
                        DISP_PAGE6, \
                        DISP_PAGE7,

That macro defines logical reference numbers. It will be included like this:

#ifdef LOAD_BUFFER_IDS
#undef LOAD_BUFFER_IDS
DISPLAYMEM_ENUM
#endif // LOAD_BUFFER_IDS

And this is inserted into an enum in spi_engine.c like this:

enum{
// PERIPHERALS:
#define LOAD_BUFFER_IDS
#include "spi_engine_display.h"
#define LOAD_BUFFER_IDS
#include "spi_engine_accelerometer.h"
};

Note that care must be taken to order the includes the same way in every case.
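The ordering hazard comes from keeping two parallel lists. For comparison, here is a single-file sketch of a related but different technique, the X-macro, where one list is expanded twice so the enum and the table cannot drift apart. It is not the switched-include scheme used in spi_engine.c; `BUFFER_LIST`, `AS_ENUM`, `AS_ENTRY`, `buffer_table` and `AccelData` are names invented for the example, and only two display pages are listed to keep it short.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t *source;
    uint8_t *destination;
    uint16_t count;
    uint8_t *flag;
} membuffer_t;

static uint8_t DisplayMap[1024];  /* pixel buffer owned by the display driver */
static uint8_t AccelData[6];      /* hypothetical accelerometer input buffer  */

/* One list of buffers: X(logical id, source, destination, byte count). */
#define BUFFER_LIST(X)                          \
    X(DISP_PAGE0, &DisplayMap[0 * 128], 0, 128) \
    X(DISP_PAGE1, &DisplayMap[1 * 128], 0, 128) \
    X(ACCEL_BUF,  0, AccelData, 6)

#define AS_ENUM(id, src, dst, n)  id,
#define AS_ENTRY(id, src, dst, n) { src, dst, n, 0 },

/* First expansion: the logical buffer numbers. */
enum { BUFFER_LIST(AS_ENUM) BUFFER_COUNT };

/* Second expansion: the table itself, in the same order by construction. */
static const membuffer_t buffer_table[] = { BUFFER_LIST(AS_ENTRY) };

static_assert(sizeof buffer_table / sizeof buffer_table[0] == BUFFER_COUNT,
              "enum and table must have the same number of entries");
```

The cost is that every peripheral's buffers have to appear in one central list, which is exactly the coupling the per-driver headers avoid.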

The final step is synchronization. It's assumed that the peripherals are running asynchronously. As already noted, one SPI program cannot interrupt another. So synchronization is a must. This is a simple way of doing it:

static uint16_t _spiprog=0;  // one bit per queued SPI program

static void _spi_prg_start( void )
{
  uint8_t * prg=0;
  if( _spiprog ==0 ) return;  // nothing queued
  if( _spi_pc!=0 ) return;    // engine is busy with another program
  uint16_t bit=1;
  uint8_t i=0;
  while( bit )
  {
    if( bit & _spiprog )
    {
      _spiprog=_spiprog ^ bit;  // clear the request
      bit=0;                    // stop searching
      prg=(uint8_t *) _spi_program[i];
    }
    bit=bit<<1;
    i++;
  }
  _spi_pc=prg;
  _spi_nextstate();
}

static void _spi_queue( uint8_t pnum )
{
  _spiprog=_spiprog | (1<<pnum);
  _spi_prg_start();
}

The variable _spiprog has 16 bits corresponding to 16 possible SPI programs. When a bit is set via the _spi_queue() function, _spi_prg_start() will start the corresponding program when the engine becomes ready. This simple scheduler scans from bit 0 upward, so lower-numbered programs have higher priority. That means the low priority programs could be starved if high priority programs keep the engine busy. I didn't try to resolve this issue in this basic example.
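One possible way to soften the starvation problem (this is not what the code above does) is to pick the next program round-robin, starting the search just after the program that ran last. The sketch below shows only the selection step; `pick_round_robin()` and its `last` argument are hypothetical, and wiring it into the engine is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the next pending program after `last`, wrapping
   around the 16 slots, or -1 if nothing is pending. Pass last = -1 when
   no program has run yet. */
static int pick_round_robin(uint16_t pending, int last)
{
    assert(last >= -1 && last < 16);
    for (int step = 1; step <= 16; step++) {
        int i = (last + step) % 16;
        if (pending & (1u << i))
            return i;
    }
    return -1;
}
```

With programs 0 and 2 pending, repeated calls alternate between them instead of always choosing program 0.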

Performance

I captured some images from a logic analyser to show the timing. This first image shows one full frame being sent to the OLED. You can see the 8 display pages. Signals for channels 1..8 are shown. Channel 4 is SPI clock. Channel 7 is the chip select. It takes 1.052 milliseconds to send the whole frame. I'm sending approximately 50 frames per second.

The next image shows how fast the DMA sends bytes. The clock is 12 MHz. You can see how little time there is between bytes -- no more than 2 to 3 clocks.

The third image shows the command bytes sent to the oled via the interrupt method. As you can see, it takes considerably longer to service the interrupt than it does to send through DMA.

The last image is the accelerometer read. Chip select is low for 49.08 microseconds.
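As a rough sanity check on these numbers, the arithmetic can be written out. The helper names below are made up for the example; the inputs are the figures quoted above (1024 pixel bytes per frame, a 12 MHz clock, 1.052 milliseconds per frame).

```c
#include <assert.h>

/* Time spent purely clocking `bytes` bytes out at `clock_hz`, in microseconds. */
static double clocking_time_us(unsigned bytes, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)bytes * 8.0 / clock_hz * 1e6;
}

/* Upper bound on frames per second if frames were sent back to back. */
static double max_frames_per_second(double frame_time_us)
{
    assert(frame_time_us > 0.0);
    return 1e6 / frame_time_us;
}
```

`clocking_time_us(1024, 12e6)` comes to about 683 microseconds, so roughly two thirds of the measured 1052 microseconds is raw clocking of pixel data; the remainder is presumably command bytes, interrupt latency and DMA setup. `max_frames_per_second(1052.0)` gives a ceiling of about 950 frames per second for the display alone.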

Demo

I created a demonstration which shows the accelerometer and OLED working together with this SPI engine. I adapted a neat little Tiny 3d Engine by Themistokle Benetatos found here. I converted the raw accelerometer g values to an angle using a table lookup. I then used the angles to orient the wire frame model. Here is the video:

-- Don Jindra

Saturday, August 25, 2012

State Machine Function Calls

I had a project which controlled up to six motors. The motors ran asynchronously and were controlled via a CAN bus. To make the system more deterministic and simpler, no RTOS was used. Everything was controlled by state machines. Logically we could consider each motor having its own "thread" and its own collection of state machines. But since all of the motors performed identical operations they could share code even though each could be executing different parts of that code. Only the data for each motor had to be unique -- like homing states or motor positions.

State machines can result in some ugly code if you're not careful. They're really nothing but glorified GOTO statements. To make the code more maintainable and more efficient I wanted to simulate nested function calls. And I wanted each function call to behave like a blocking call. But each call was actually going to be to yet another state machine.

You can't block a state machine. It kind of defeats the purpose. If one motor "thread" were blocked then all "threads" would stop. None of the other motors could proceed with their states. State machines have to keep running.

Let's say StateMachine1 calls StateMachine2 which calls StateMachine3 and this machine is in a waiting mode. Then on each cycle we have to traverse from SM1 down to SM3, test a condition, then back out. Normally there's no good reason to do this. SM1 is simply waiting for SM3 to finish. Why bother calling SM1 and SM2 when we know we're in SM3? It would be better if an outer loop simply proceeds directly to SM3.

I came up with a simple way to do this. Each motor "thread" is really a data structure with some global information and a reference to some local information. At a minimum, the global part needs a function address, which is the currently executing state machine, and the current state of that machine. At a minimum, the local information, which could almost be considered stack information, holds the parent's state machine and its state.

The outermost loop simply executes an indirect function call to the appropriate machine, passing a pointer to that thread's global data structure. The state machine uses the permanent thread number to reference that thread's local data for that machine. Each state machine contains its data and only knows about its data. When one state machine "calls" another, the calling code copies the parent's address and state into the child's data structure and puts the child's address into the common data structure.

In other words, each "thread" has a global data structure plus a changeable portion which is local only to the currently executing state machine. Each state machine has its own data structure for each possible thread. In my case that was six data structures, one for each motor. So each motor is assigned a thread number and this thread number is used as an index into its private data area in each state machine.

Note that the state machines must have two system-known states (or cases). SM_INITIALIZE is used to load the global area's local data pointer. SM_ENTRY is always the first state of the machine on a call, that is, it's the logical entry point. Internal states begin at START_INTERNAL_STATES.
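Before the real code, here is a deliberately reduced sketch of the call and return mechanism. It makes several simplifying assumptions: the child's data area is passed to the call explicitly instead of being fetched through SM_INITIALIZE, states are plain integers starting at 0, and all names (`thread_t`, `frame_t`, `call_machine`, `return_to_caller`, `run_cycle`) are invented for the example. The child "blocks" for three cycles while the outer loop keeps jumping straight into it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct thread thread_t;
typedef void (*machine_fn)(thread_t *t);

typedef struct frame {           /* per-machine, per-thread "stack frame" */
    machine_fn    caller;        /* parent machine to return to           */
    int           caller_state;  /* state the parent resumes in           */
    struct frame *caller_frame;  /* parent's own frame                    */
} frame_t;

struct thread {
    machine_fn current;  /* machine the outer loop calls directly */
    int        state;
    frame_t   *frame;    /* frame of the current machine          */
    int        id;       /* fixed thread number                   */
    int        ticks;    /* demo data                             */
    int        finished; /* demo data                             */
};

enum { NTHREADS = 2 };
static frame_t parent_frames[NTHREADS], child_frames[NTHREADS];

static void call_machine(thread_t *t, machine_fn target, frame_t *f,
                         int return_state)
{
    assert(f != NULL);
    f->caller = t->current;          /* remember who called          */
    f->caller_state = return_state;  /* and where the parent resumes */
    f->caller_frame = t->frame;
    t->current = target;             /* the child now runs directly  */
    t->frame = f;
    t->state = 0;
}

static void return_to_caller(thread_t *t)
{
    frame_t *f = t->frame;
    t->current = f->caller;
    t->state = f->caller_state;
    t->frame = f->caller_frame;
}

static void child(thread_t *t)   /* waits three cycles, then returns */
{
    if (++t->ticks >= 3) return_to_caller(t);
}

static void parent(thread_t *t)
{
    switch (t->state) {
    case 0: call_machine(t, child, &child_frames[t->id], 1); break;
    case 1: t->finished = 1; break;  /* resumes here after the child */
    }
}

static void start_thread(thread_t *t, int id)
{
    t->current = parent; t->state = 0; t->frame = &parent_frames[id];
    t->id = id; t->ticks = 0; t->finished = 0;
}

/* One pass of the outer loop: each thread runs its current machine once. */
static void run_cycle(thread_t *threads, int n)
{
    for (int i = 0; i < n; i++)
        threads[i].current(&threads[i]);
}
```

While the child is waiting, `run_cycle()` never enters `parent()` for that thread, which is the point of the scheme: the parent behaves as if it made a blocking call, yet no thread is ever blocked.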

Here is sm.h which defines the global data structures:
// sm.h

typedef void (*tStateMachine)( void* thread );

typedef struct{
  tStateMachine  caller;
  int            callerstate;
  void *         callerslocal;
} tLocalData;


typedef struct{
  tStateMachine  currentmachine;  // executable function
  int            state;     // execute this state in the function
  tLocalData *   local;     // points to the current local data to use while executing
  tLocalData *   retlocal;    // saved pointer to child's data... this may be used to return data to parent from child
  int            id;     // id is set once and never changes
  int            status;    // can return status
} dThread, *tThread;


enum {
       SM_INITIALIZE=1,
       SM_ENTRY
     };

#define EXECUTE(thread) (*thread->currentmachine)(thread)

#define CALL_THEN_RETURNTO(sm, ret)  gp->state=ret;   \
        CallStateMachine( gp, (tStateMachine ) &sm )

#define START_INTERNAL_STATES 100

Here is sm.c which contains the core setup, call, and return functions:
// ----- fail-safe -------------------------------------------------------------------

static void nocaller( void * thread )
{
  // should never get here!
}

//============================================================================
//=======================[ StartThread ]======================================
//============================================================================

// This is used to setup the top-most state machine. It will "call" other
// state-machines in its thread.  

static void StartThread( tThread thread, tStateMachine targetmachine, int id )
{
   thread->state=SM_INITIALIZE;          // each s-m must support this
   thread->id=id;                        // an instance identification for the target state-machine
   (*targetmachine)( thread );           // get the pointer to local data
   thread->local->caller=&nocaller;      // a dummy caller
   thread->local->callerstate=0;         // state is irrelevant
   thread->currentmachine=targetmachine; // this is the top-most state-machine
   thread->state=SM_ENTRY;               // this is where we always start
}

//============================================================================
//=======================[ CallStateMachine ]=================================
//============================================================================

// This saves the state of the parent state-machine into the
// child's data area. When the child eventually returns this
// saved status will allow the parent to resume execution at
// its next state. When a child is called, no code in the
// parent is executed until the child returns. So this scheme
// behaves like a regular function call.
//
// The local pointer is changed by this call. This pointer always
// points to the child's local data. And since the parent will 
// regain control immediately after CallStateMachine() returns, 
// but prior to the child's actual execution, the parent may pass
// parameters to the child through this local data structure. 
// When this is required, both parent and child know the structure
// of the child's data area. This is handled by selective 
// "switched" includes of a common ".h" file. 

static void CallStateMachine( tThread thread, tStateMachine targetmachine )
{
   int savestate;
   tLocalData * savelocal;
      
   savestate=thread->state;               // temporarily save the state
   savelocal=thread->local;               // temporarily save local data ptr
   
   thread->state=SM_INITIALIZE;
   (*targetmachine)( thread );            // get the pointer to the child's local data
                                          // thread->local is now changed
   thread->local->caller=thread->currentmachine; // save the parent's address so we can return
   thread->local->callerstate=savestate;  // also remember the parent's state (where we will return)
   thread->local->callerslocal=savelocal; // also remember the parent's local data
   thread->currentmachine=targetmachine;  // set the new (child) state-machine
   thread->state=SM_ENTRY;                // this is where we always start a child
}

//============================================================================
//=======================[ ReturnToCaller ]===================================
//============================================================================

static void ReturnToCaller( tThread p )
{
  p->currentmachine = p->local->caller; // return to the previous (parent) machine
  p->state=p->local->callerstate;       // restore the parent's state
  p->retlocal=p->local;
  p->local=p->local->callerslocal;      // restore pointer to parent's local data
  // Parent's status can be set in every state-machine.
  // Data is still saved in p->local, now moved to p->retlocal, so
  // if need be, we can return data since the parent
  // now has a pointer to the child's local (updated) data.
}

And here is a simple test program which is mostly stubbed out.
enum {
       MOTOR1,
       MOTOR2,
       MOTOR3,
       MOTOR4,
       MOTOR5,
       MOTOR6,
       MAX_ID  // # of instances needed
     };

// prototypes
static void InitializeMotor( tThread gp );
static void SendBatch( tThread gp );
static void TransferData( tThread gp );
static void MotorThread( tThread gp );  // top state machine

static int done=1; // dummy value
static int error=0; // dummy value
static char motorparams[]={ 1,2,3,4,0 }; // dummy values
static char motorposition[]={ 5,6,7,8,0 }; // dummy values

#define TRANSF_ERROR 1

//------ MotorThread --------------------------------------------------------------------------
//
// This handles position updates.
// Plus, this starts EPOS and starts homing.

static void MotorThread( tThread gp )
{
   // internal states
   enum {
          STARTUP=START_INTERNAL_STATES,
          MOTORIDLE,
          MOTORMOVE,
        };

   typedef struct {
      tStateMachine caller;
      int callerstate;
      void * callerslocal;
   } tMotorLocals;
   
   #define LOCAL_BATCH
     #include "smlocals.h"

   static tMotorLocals local[MAX_ID]; // local instance variables
   tMotorLocals * lp; // pointer to the local data
   int gotdata=0;

   lp=&local[gp->id]; // first, load pointer to the local data

   switch( gp->state ) // then just jump to the current state
   {
      case SM_INITIALIZE:
        gp->local=(tLocalData*) lp;  // this is always required
        break;

      case SM_ENTRY:  // always required as first state; falls through to STARTUP
     
      case STARTUP:
        CALL_THEN_RETURNTO( InitializeMotor, MOTORIDLE );
        break;
               
      case MOTORIDLE:
        // call something to check for position input
        if( gotdata ) gp->state=MOTORMOVE;
        break;
        
      case MOTORMOVE:
        // let another state machine take over ...
        // ... then resume at the idle state
        CALL_THEN_RETURNTO( SendBatch, MOTORIDLE ); // call and advance
        // gp->local now points at SendBatch's locals, so we can pass variables
        ( (tBatchLocals*) gp->local )->packet=motorposition;
        ( (tBatchLocals*) gp->local )->currentitem=0;
        break;
         
   } // end switch (state)
}

//------ InitializeMotor --------------------------------------------------------------------------
//
// This sends motor parameters to the motor controller.
// For example, Maxon EPOS controllers need to know what
// motors they are controlling, speeds, currents, and PID loops, etc.

static void InitializeMotor( tThread gp )
{
   // internal states
   enum {
          STARTUP_1=START_INTERNAL_STATES, // internal states begin after ...
          STARTUP_2,        // ... common states so there's no overlap
          STARTUP_DONE
        };

  // setup the data structures this state machine will use   
  #define LOCAL_BATCH
  #define LOCAL_STARTUP
    #include "smlocals.h"

   static tStartupLocals local[MAX_ID];  // allocate the local instances
   tStartupLocals * lp; // use lp as a pointer to local data

   lp=&local[gp->id];   // always setup the local pointer first

   switch( gp->state )  // execute the current state
   {
      case SM_INITIALIZE:
        gp->local=(tLocalData*) lp; // sets the pointer to instance data
        break;

      case SM_ENTRY:
        gp->state=STARTUP_1;        // this is how to advance to the next state
        break;
        
      case STARTUP_1:
        if( done ) gp->state=STARTUP_2;  // or advance on a flag
        break;

      case STARTUP_2:
        CALL_THEN_RETURNTO( SendBatch, STARTUP_DONE ); // call and advance
        ( (tBatchLocals*) gp->local )->packet=motorparams;  // can pass variables
        ( (tBatchLocals*) gp->local )->currentitem=0;
        break;
        
      case STARTUP_DONE:
        ReturnToCaller( gp ); // transfers back to the parent state machine
        break;

   } // end switch (state)
}

// ----- SendBatch ----------------------------------------------------------------------------------
//
// This will send a batch of data to the motor controller.
// Typically this will be on CAN or serial and will consist
// of a batch of 1 or more packets of some protocol.

static void SendBatch( tThread gp )
{
   // internal states
   enum { B_DONE=START_INTERNAL_STATES, // all packets were sent
          B_START_TRANSACTION,  // sends data to a motor controller
          B_START_ITEM,
          B_SENT
        };

  #define LOCAL_BATCH
  #define LOCAL_TRANSF
   #include "smlocals.h"
   
   static tBatchLocals local[MAX_ID];
   tBatchLocals * lp;
   int more=0;

   lp=&local[gp->id];

   switch( gp->state )
   {
      case SM_INITIALIZE:
        gp->local=(tLocalData*) lp;
        break;
      
      case SM_ENTRY:  // falls through to B_START_ITEM

      case B_START_ITEM:
         // setup stuff here like buffers, packet pointers
         gp->state=B_START_TRANSACTION; // next state
         break;

      case B_START_TRANSACTION:         // starts transmitting 1 packet
         CALL_THEN_RETURNTO( TransferData, B_SENT );
         ( (tTransfLocals*) gp->local )->retries=3; // pass variables
         break;

      case B_SENT:
         // add code to check if more packets need sending
         if( more ) gp->state=B_START_ITEM;
         if( done ) gp->state=B_DONE;              // next state
         break;
         
      case B_DONE:
         ReturnToCaller( gp );
         break;

   } // end switch (machine state)
}

// ----- TransferData -------------------------------------------------------------------------------
//
// This will send a single packet to the motor controller and
// wait for the ack, retrying on error until the retry count
// passed in by the caller (SendBatch) runs out.

static void TransferData( tThread gp )
{
   // internal states
   enum { T_DONE=START_INTERNAL_STATES, // packet was sent, or retries ran out
          T_START_ITEM,         // starts new transaction based on options
          T_TX,
          T_ACK,
          T_RETRY,              // previous operation failed, retrying
        };

  #define LOCAL_TRANSF
    #include "smlocals.h"
   
   static tTransfLocals local[MAX_ID];
   tTransfLocals * lp;

   lp=&local[gp->id];

   switch( gp->state )
   {
      case SM_INITIALIZE:
        gp->local=(tLocalData*) lp;
        break;
      
      case SM_ENTRY:
        gp->status=0; // clear status, then fall through to T_TX

      case T_TX:
         // call a function or yet another state machine to start
         // sending the data
         gp->state=T_ACK; // next state
         break;

      case T_ACK:
         // maybe wait on an ack, then:
         if( done ) gp->state=T_DONE;
         if( error ) gp->state=T_RETRY;
         break;

      case T_RETRY:
        if( lp->retries ) // see if any retries remain
         {
           lp->retries--;  // yes, do another try
           gp->state=T_TX;
         }
         else
         {
           // no retries remain, set some error condition and exit
           gp->status=TRANSF_ERROR;  // global status is easy to set
           gp->state=T_DONE;
         }
         break;
         
      case T_DONE:
         ReturnToCaller( gp );
         break;

   } // end switch (machine state)
}

static dThread Motor1Thread;
static tThread pMotor1Thread=&Motor1Thread;
static dThread Motor2Thread;
static tThread pMotor2Thread=&Motor2Thread;
static dThread Motor3Thread;
static tThread pMotor3Thread=&Motor3Thread;
static dThread Motor4Thread;
static tThread pMotor4Thread=&Motor4Thread;
static dThread Motor5Thread;
static tThread pMotor5Thread=&Motor5Thread;
static dThread Motor6Thread;
static tThread pMotor6Thread=&Motor6Thread;

// This is the main (outermost) loop. All we need to do is call 
// this periodically to advance all state machines.  Probably at 
// least one communication state machine must be added to handle
// the serial port or other communication method.

static void ExecuteAllStates( void )
{
  EXECUTE( pMotor1Thread );
  EXECUTE( pMotor2Thread );
  EXECUTE( pMotor3Thread );
  EXECUTE( pMotor4Thread );
  EXECUTE( pMotor5Thread );
  EXECUTE( pMotor6Thread );
}

static void InitializeTest( void )
{
  StartThread( pMotor1Thread, (tStateMachine) &MotorThread, MOTOR1 );
  StartThread( pMotor2Thread, (tStateMachine) &MotorThread, MOTOR2 );
  StartThread( pMotor3Thread, (tStateMachine) &MotorThread, MOTOR3 );
  StartThread( pMotor4Thread, (tStateMachine) &MotorThread, MOTOR4 );
  StartThread( pMotor5Thread, (tStateMachine) &MotorThread, MOTOR5 );
  StartThread( pMotor6Thread, (tStateMachine) &MotorThread, MOTOR6 );
}


InitializeTest should be called once, then ExecuteAllStates should be called in an infinite loop after that.

As I hope you can see, no matter how deeply we nest the state machine calls, we never pay a time penalty for it. Execution always proceeds directly to the currently executing state: we merely load a couple of pointers and execute one case in a switch statement. The biggest cost is the call itself, but that occurs only once, on the way into the child state machine. While we're there, that cost disappears.
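
The point about nesting can be checked with a small, self-contained sketch. It is not the framework from this article: MiniThread, mini_call, mini_return, mini_execute and run_ticks are hypothetical names, and a small stack inside the thread record stands in for the caller fields kept in each machine's locals. It nests three machines and counts machine invocations per tick.

```c
#include <assert.h>

/* Hypothetical, pared-down stand-ins; not the article's types. */
typedef struct MiniThread MiniThread;
typedef void (*MiniMachine)(MiniThread *);

enum { MAX_DEPTH = 8 };

struct MiniThread {
    MiniMachine current;                 /* machine that runs on the next tick */
    int         state;                   /* state within that machine          */
    MiniMachine caller[MAX_DEPTH];       /* saved parents (stack stands in     */
    int         callerstate[MAX_DEPTH];  /* for the per-machine locals)        */
    int         depth;                   /* current nesting depth              */
    int         maxdepth;                /* deepest nesting seen (demo only)   */
    long        dispatches;              /* machine invocations (demo only)    */
};

/* Transfer control to a child; remember where the parent resumes. */
static void mini_call(MiniThread *t, MiniMachine child, int return_state)
{
    assert(t->depth < MAX_DEPTH);
    t->caller[t->depth] = t->current;
    t->callerstate[t->depth] = return_state;
    t->depth++;
    if (t->depth > t->maxdepth) t->maxdepth = t->depth;
    t->current = child;
    t->state = 0;
}

/* Transfer control back to the parent at its saved state. */
static void mini_return(MiniThread *t)
{
    assert(t->depth > 0);
    t->depth--;
    t->current = t->caller[t->depth];
    t->state = t->callerstate[t->depth];
}

/* Innermost machine: three working states, then return. */
static void leaf(MiniThread *t)
{
    if (t->state < 3) t->state++;
    else mini_return(t);
}

static void middle(MiniThread *t)
{
    if (t->state == 0) mini_call(t, leaf, 1);
    else mini_return(t);
}

static void top(MiniThread *t)
{
    if (t->state == 0) mini_call(t, middle, 1);
    /* state 1: idle */
}

/* One tick: a single indirect call, whatever the nesting depth. */
static void mini_execute(MiniThread *t)
{
    t->dispatches++;
    t->current(t);
}

/* Runs the given number of ticks and returns the thread record. */
static MiniThread run_ticks(int ticks)
{
    MiniThread t = { 0 };
    t.current = top;
    for (int i = 0; i < ticks; i++) mini_execute(&t);
    return t;
}
```

After run_ticks(8) the record shows eight dispatches for eight ticks even though the nesting reached two levels below the top machine. The ticks spent inside leaf go straight to leaf without passing through top or middle first.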


Finally, here is smlocals.h which is where I defined the local data structures:
#ifdef LOCAL_STARTUP

   typedef struct {
      tStateMachine caller;
      int callerstate;
      void * callerslocal;
   } tStartupLocals;

#endif 
#undef LOCAL_STARTUP


#ifdef LOCAL_BATCH

   typedef struct {
      tStateMachine caller;
      int callerstate;
      void * callerslocal;
      char * packet;
      int currentitem;
   } tBatchLocals;

#endif 
#undef LOCAL_BATCH

#ifdef LOCAL_TRANSF

   typedef struct {
      tStateMachine caller;
      int callerstate;
      void * callerslocal;
      int retries;
      int status;
      char * buffer;
      int count;
   } tTransfLocals;

#endif 
#undef LOCAL_TRANSF

This method of defining local data structures is a bit uncommon and may be unnecessary. As a policy I try to hide data from functions which don't need to know about it, though that may be overkill here. Basically, within each function (state machine) I switch in the appropriate structures with one or more #define statements before including smlocals.h. These flags enable only the structures used by that function -- that is, its own and any others it needs in order to pass variables to something it calls.
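
The reason this works is that a typedef placed inside a function body has block scope, so each function sees only the types switched in for it, and the same type name can be declared again in another function. The sketch below shows that effect in a single file. It swaps in a different mechanism to stay self-contained: the SWITCH_IN_* macros are hypothetical stand-ins for the #define flag plus #include "smlocals.h" pair, and the structures are trimmed to a couple of fields.

```c
#include <assert.h>

/* Hypothetical stand-ins for the header's #ifdef blocks: each macro
   expands to one typedef wherever it is used. */
#define SWITCH_IN_BATCH \
    typedef struct { const char *packet; int currentitem; } tBatchLocals
#define SWITCH_IN_TRANSF \
    typedef struct { int retries; int status; } tTransfLocals

/* Sees two types: its own and the one used by the machine it calls. */
static int batch_machine(void)
{
    SWITCH_IN_BATCH;
    SWITCH_IN_TRANSF;
    tBatchLocals mine = { "abc", 0 };
    tTransfLocals child = { 3, 0 };
    assert(child.retries > 0);             /* pass at least one retry */
    return mine.currentitem + child.retries;
}

/* Sees only its own type; naming tBatchLocals here would not compile. */
static int transfer_machine(void)
{
    SWITCH_IN_TRANSF;                      /* same name, separate block scope */
    tTransfLocals mine = { 2, 0 };
    return mine.retries;
}
```

One caveat with either mechanism: two block-scoped typedefs with identical layout are still formally distinct types in C, so passing variables through a cast pointer, as the test program does, relies on the layouts matching rather than on the type system.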

-- Don Jindra



Created August 2012