This article is about HALs (Hardware Abstraction Layers), what they do, and how C++ can help implement them. I want to showcase how C++ templates can be used, and compare this to the “classic” approach.

But first, let’s think about the problems of programming for a microcontroller.

Low Level Programming Issues

When developing code for a microcontroller, one of the jobs is to access the registers in order to control the state of output pins, read the state of input pins, and set the direction of I/O pins. Microcontroller vendors can design the hardware in several ways, but the programming model is usually memory based. This means there are memory locations mapped to the chip’s I/O lines: writing to these locations changes the state of the lines, reading from them retrieves it. There are also memory locations for configuring the port lines (as input, as output, as input with pull-up), and each chip has its own way of setting or clearing a line.
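
To make the memory-based model concrete, here is a minimal sketch that compiles on a PC. The `FakePort` struct and the three helper functions are hypothetical stand-ins: on a real chip the direction, output, and input registers sit at fixed addresses defined by the vendor's header files, not in an ordinary struct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three registers of one I/O port. On a real
// chip these live at fixed addresses defined by the vendor's header files.
struct FakePort {
    volatile std::uint8_t ddr  = 0;  // direction bits: 1 = output, 0 = input
    volatile std::uint8_t port = 0;  // output latch
    volatile std::uint8_t pin  = 0;  // line state as read by the CPU
};

// Configure one line as an output by setting its direction bit.
inline void make_output(FakePort& p, unsigned bit) {
    assert(bit < 8);
    p.ddr = static_cast<std::uint8_t>(p.ddr | (1u << bit));
}

// Drive one line high or low: read the latch, change one bit, write it back.
inline void write_line(FakePort& p, unsigned bit, bool high) {
    assert(bit < 8);
    const unsigned mask = 1u << bit;
    p.port = static_cast<std::uint8_t>(high ? (p.port | mask) : (p.port & ~mask));
}

// Read the state of one line from the input register.
inline bool read_line(const FakePort& p, unsigned bit) {
    assert(bit < 8);
    return (p.pin & (1u << bit)) != 0;
}
```

Every access is a plain load or store on a memory location. What differs from chip to chip is where those locations are and what the individual bits mean.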

Some chips even have special instructions for port access (older ones, e.g. the Z80), or special instructions useful for single-bit manipulation, like the AVR chips used in most Arduinos.

When it comes to programming, one has to learn the peculiarities of the specific chip. This takes quite some time.

If one wants to program a variety of chips, this time is multiplied.

The solution to this problem is for an expert to create a library that encapsulates the gory details of port manipulation.

Better call HAL

This library is sometimes called a HAL (Hardware Abstraction Layer). A HAL presents a uniform programming interface that always works the same way, regardless of how the hardware is accessed.

The people who developed the Arduino platform have done this, and it is one of the reasons for the great success of the Arduinos. They are quite easy to program: one can create a program that blinks an LED within 10 minutes, including the installation of the development environment.

The code for an AVR-based Arduino and for an Espressif ESP32 is exactly the same; the HAL hides the different hardware architectures.

HAL costs

Unfortunately there is a price, and it may be too high. How high depends on the implementation of the HAL:

  • the total size of the code may increase, because the HAL contains functions or implementations which are not used in the application
  • the latency of I/O manipulation may be worse than with “bare metal” code (code written directly against the platform’s registers)
  • there may be errors in the HAL which are hard to find
  • the API design of the HAL may not match the application’s needs
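
The latency point can be illustrated with a sketch of a HAL that resolves the pin number at run time. This is a simplified, hypothetical model and not the actual code of any existing library: `PinInfo`, `pin_table`, `fake_ports`, and `write_pin` are made-up names, and the registers are plain variables so the code can run on a PC. The pin mapping follows the Arduino Mega comments used in the template example later in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simulated output registers: index 0 stands for port B, index 1 for port E.
volatile std::uint8_t fake_ports[2] = {0, 0};

// One table entry per supported board pin.
struct PinInfo {
    std::uint8_t pin;         // board pin number
    std::uint8_t port_index;  // index into fake_ports
    std::uint8_t bit;         // bit position within that port
};

const PinInfo pin_table[] = {
    { 2, 1, 4},  // pin  2 -> port E, bit 4
    {12, 0, 6},  // pin 12 -> port B, bit 6
    {13, 0, 7},  // pin 13 -> port B, bit 7
};

// Runtime HAL call: search the table, build the mask, then do the
// read-modify-write. Returns false if the pin is not in the table.
bool write_pin(std::uint8_t pin, bool high) {
    for (std::size_t i = 0; i < sizeof(pin_table) / sizeof(pin_table[0]); ++i) {
        const PinInfo& info = pin_table[i];
        if (info.pin != pin) continue;
        assert(info.port_index < 2 && info.bit < 8);  // guard against a bad table entry
        const unsigned mask = 1u << info.bit;
        volatile std::uint8_t& reg = fake_ports[info.port_index];
        reg = static_cast<std::uint8_t>(high ? (reg | mask) : (reg & ~mask));
        return true;
    }
    return false;
}
```

Each call pays for the table search and the computed mask, even when the pin number is already known while the program is being written. The table also ends up in the binary whether or not all of its entries are used.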

Me and the Arduino HAL

In one of my hobby development projects I thought: Hey, why not use an Arduino for it?

  • they’re cheap
  • I wanted to use an AVR processor anyway
  • they look great for prototyping - no need to make a PCB and so on
  • there is a library for nearly anything that might be connected to an Arduino

I was a bit sceptical about the development environment, but I gave it a try.

After some time I hit the limits of the built-in Arduino libraries. The timer I wanted to use was already taken, the library had high latency when setting pin states, and so on. I also had problems adding my own header files.

I went back to my trusty Atmel Studio, but used avrdude with the “wiring” programmer type to upload my code to the Arduino board. Because I like readable and manageable code, I also wrote a HAL using C++ template programming.

Templates to the rescue

Many programmers think: Nooo way, C++, templates on a microcontroller - this is a “no go”.

I believe the opposite is true. Modern C++ makes it possible to solve many of the problems in HAL programming by moving decisions from run time to compile time.

This is a tiny program with trivial functionality. It waits until bit 6 of port B goes high (pin 12 on an Arduino Mega). Then it enters a loop, toggling bit 7 of port B (pin 13 on an Arduino Mega, which is connected to the onboard LED). The program accesses the ports directly. No HAL at all.

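#include <avr/io.h>
#include <util/delay.h> // _delay_ms() expects F_CPU to be defined, typically by the Makefile


int main(void)
{
        // PB7 (pin 13, onboard LED) becomes an output
        DDRB |= (1<<7);

        // busy-wait until PB6 (pin 12) reads high
        while(  (  ( PINB & (1<<6) ) !=(1<<6) ) )
                ;

        // blink the LED: 100 ms low, 100 ms high
        while(1)
        {
                _delay_ms(100);
                PORTB |= (1<<7);
                _delay_ms(100);
                PORTB &= ~(1<<7);
        }
}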

The second program uses a class template that is parameterized by a uint8_t, the pin number. Note that SetDir() always sets the direction to output, regardless of its argument. This keeps the program comparable to the classic version.

#include <avr/io.h>
#include <util/delay.h>

typedef enum {in,out} PinDir_T;

template<uint8_t PinNo>
class IOPin
{
        public:
        IOPin(){}
        // 'dir' is ignored in this minimal example: the pin is always made an output
        void SetDir(PinDir_T dir)
        {
                switch(PinNo)
                {
                        case  2: DDRE |= (1<<4); break; // pin  2 is PE 4 on arduino mega
                        case 12: DDRB |= (1<<6); break; // pin 12 is PB 6 on arduino mega
                        case 13: DDRB |= (1<<7); break; // pin 13 is PB 7 on arduino mega
                }
        }
        void SetHigh(void)
        {
                switch(PinNo)
                {
                        case  2: PORTE |= (1<<4); break;
                        case 12: PORTB |= (1<<6); break;
                        case 13: PORTB |= (1<<7); break;
                }
        }
        void SetLow(void)
        {
                switch(PinNo)
                {
                        case  2: PORTE &= ~(1<<4); break;
                        case 12: PORTB &= ~(1<<6); break;
                        case 13: PORTB &= ~(1<<7); break;
                }
        }
        bool State(void)
        {
                switch(PinNo)
                {
                        case  2: return (PINE & (1<<4) ) == (1<<4);
                        case 12: return (PINB & (1<<6) ) == (1<<6);
                        case 13: return (PINB & (1<<7) ) == (1<<7);
                }
                return false; // unsupported pin number: never leave the function without a return value
        }
};



int main(void)
{
        IOPin<13> ledpin;
        IOPin<12> testpin;
        ledpin.SetDir(out);

        while(0 == testpin.State())
                ;

        while(1)
        {
                _delay_ms(100);
                ledpin.SetHigh();
                _delay_ms(100);
                ledpin.SetLow();
        }
}

The main program is much more readable, and changing the pins only requires changing the pin number (provided that the IOPin class has definitions for that pin number). It is also immediately clear that the code in main() has no AVR-specific aspects. They are all encapsulated in the IOPin class.
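
The condition that the class must have definitions for the chosen pin can itself be enforced at compile time. The following sketch is not taken from the example project. It assumes C++11 for `static_assert`, and the names `PinTraits`, `CheckedPin`, `fake_port_b`, and `fake_port_e` are hypothetical. The registers are simulated with plain variables so the idea can be tried on a PC, and only the output functions are shown.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output registers; on an AVR these would be PORTB and PORTE.
volatile std::uint8_t fake_port_b = 0;
volatile std::uint8_t fake_port_e = 0;

// Primary template: a pin without a specialization is unsupported.
template <std::uint8_t PinNo>
struct PinTraits {
    static constexpr bool supported = false;
};

// One specialization per supported pin (same mapping as in the IOPin class).
template <>
struct PinTraits<2> {
    static constexpr bool supported = true;
    static constexpr std::uint8_t mask = 1u << 4;
    static volatile std::uint8_t& port() { return fake_port_e; }
};

template <>
struct PinTraits<12> {
    static constexpr bool supported = true;
    static constexpr std::uint8_t mask = 1u << 6;
    static volatile std::uint8_t& port() { return fake_port_b; }
};

template <>
struct PinTraits<13> {
    static constexpr bool supported = true;
    static constexpr std::uint8_t mask = 1u << 7;
    static volatile std::uint8_t& port() { return fake_port_b; }
};

template <std::uint8_t PinNo>
class CheckedPin {
    static_assert(PinTraits<PinNo>::supported,
                  "no register mapping for this pin number");
public:
    void SetHigh() {
        volatile std::uint8_t& reg = PinTraits<PinNo>::port();
        reg = static_cast<std::uint8_t>(reg | PinTraits<PinNo>::mask);
    }
    void SetLow() {
        volatile std::uint8_t& reg = PinTraits<PinNo>::port();
        reg = static_cast<std::uint8_t>(reg & ~PinTraits<PinNo>::mask);
    }
};

// Usage on the simulated registers.
inline void checked_pin_demo() {
    CheckedPin<13> led;
    led.SetHigh();
    assert((fake_port_b & 0x80) != 0);
    led.SetLow();
    assert((fake_port_b & 0x80) == 0);
    // CheckedPin<5> bad;  // would not compile: the static_assert fires
}
```

With this variant a mistyped pin number becomes a compiler error instead of a function that silently does nothing.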

But.. uuuuuhhhhh - the code bloat.. and oooooohhhh - it will be slow.

No, it’s not bloated, and no, it’s not slow. Let’s compare the code sizes!

Classic version

$ make size
avr-size main.o  avr-test.elf
   text    data     bss     dec     hex filename
     48       0       0      48      30 main.o
    308       0       0     308     134 avr-test.elf

avr-size -Ax avr-test.elf
avr-test.elf  :
section                      size       addr
.data                         0x0   0x800200
.text                       0x134        0x0
.stab                       0x5dc        0x0
.stabstr                    0xead        0x0
.comment                     0x11        0x0
.note.gnu.avr.deviceinfo     0x40        0x0
.debug_info                 0xbbc        0x0
.debug_abbrev               0xb1a        0x0
.debug_line                  0x1d        0x0
.debug_str                  0x3e6        0x0
Total                      0x30e7

Template version

$ make size
avr-size main.o  avr-test.elf
   text    data     bss     dec     hex filename
     48       0       0      48      30 main.o
    308       0       0     308     134 avr-test.elf

avr-size -Ax avr-test.elf
avr-test.elf  :
section                      size       addr
.data                         0x0   0x800200
.text                       0x134        0x0
.stab                       0x60c        0x0
.stabstr                   0x11ff        0x0
.comment                     0x11        0x0
.note.gnu.avr.deviceinfo     0x40        0x0
.debug_info                 0xbbc        0x0
.debug_abbrev               0xb1a        0x0
.debug_line                  0x1d        0x0
.debug_str                  0x3e6        0x0
Total                      0x3469

What’s this? The sizes are the same? Is this a copy and paste error? :)

No. The total size of the ELF files differs (0x30e7 vs. 0x3469 bytes) because the .stab and .stabstr debug sections are larger in the template version, but diffing the hex files reveals that the generated code is exactly the same in the classic and in the template version.
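
That only the debug information grew can be checked with a little arithmetic on the section sizes from the two `avr-size -Ax` listings above. The constant names in this small sketch are made up for illustration; the numbers are copied from the listings.

```cpp
// Section sizes in bytes, copied from the two `avr-size -Ax` listings.
constexpr unsigned stab_classic     = 0x5dc;
constexpr unsigned stab_template    = 0x60c;
constexpr unsigned stabstr_classic  = 0xead;
constexpr unsigned stabstr_template = 0x11ff;
constexpr unsigned total_classic    = 0x30e7;
constexpr unsigned total_template   = 0x3469;

// Growth of the two debug sections: 48 + 850 bytes.
constexpr unsigned debug_growth =
    (stab_template - stab_classic) + (stabstr_template - stabstr_classic);

static_assert(debug_growth == 898, "debug sections grew by 898 bytes");
static_assert(total_template - total_classic == debug_growth,
              "the whole size difference comes from .stab and .stabstr");
```

All other sections, including .text with 0x134 bytes, have the same size in both builds.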

I’ll try to explain why:

  • Templates are instantiated at compile time. That means the constant pin number passed to the template turns the switch expression into a compile-time constant.
  • The compiler simplifies a switch on a compile-time constant to the body of the selected case.
  • The compiler inlines the code of the called IOPin member functions.
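
The first two points can be checked on a PC with any C++14 compiler, without AVR hardware. In this minimal sketch, `bit_for_pin` and `led_mask` are hypothetical names; the function mirrors the `switch(PinNo)` of the IOPin class. Because the function is `constexpr` and the pin number is a template argument, the `static_assert` lines only compile if the switch is resolved during compilation. The sketch does not demonstrate inlining, which depends on the optimization level.

```cpp
#include <cstdint>

// Mirrors the pin-to-bit mapping of the IOPin class. The switch operand is a
// template parameter, so it is a constant once the template is instantiated.
template <std::uint8_t PinNo>
constexpr std::uint8_t bit_for_pin() {
    switch (PinNo) {
        case 2:  return 4;     // PE4
        case 12: return 6;     // PB6
        case 13: return 7;     // PB7
        default: return 0xFF;  // marker for an unsupported pin
    }
}

// Evaluated by the compiler: these checks never run on the target.
static_assert(bit_for_pin<13>() == 7, "pin 13 maps to bit 7");
static_assert(bit_for_pin<12>() == 6, "pin 12 maps to bit 6");
static_assert(bit_for_pin<2>() == 4, "pin 2 maps to bit 4");

// The bit mask derived from the pin number is a compile-time constant too.
constexpr std::uint8_t led_mask =
    static_cast<std::uint8_t>(1u << bit_for_pin<13>());
static_assert(led_mask == 0x80, "mask for bit 7");
```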

Basically, the compiler throws away nearly everything from the HAL class that isn’t needed, and leaves only the code that one would have written by hand.

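A look into the lss file of the template version shows how effective this optimization is. The while(0 == testpin.State()) loop compiles to just two assembler instructions: sbis skips the next instruction if bit 6 of I/O register 0x03 (PINB) is set, and rjmp jumps back to the sbis until that happens.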

Template version

00000100 <main>:
        {
                switch(PinNo)
                {
                        case  2: DDRE |= (1<<4); break; // pin  2 is PE 4 on arduino mega
                        case 12: DDRB |= (1<<6); break; // pin 12 is PB 6 on arduino mega
                        case 13: DDRB |= (1<<7); break; // pin 13 is PB 7 on arduino mega
 100:   27 9a           sbi     0x04, 7 ; 4
#ifdef TEMPLATEBASED
        IOPin<13> ledpin;
        IOPin<12> testpin;
        ledpin.SetDir(out);

        while(0 == testpin.State())
 102:   1e 9b           sbis    0x03, 6 ; 3
 104:   fe cf           rjmp    .-4             ; 0x102 <main+0x2>

The classic implementation isn’t one bit better:

Classic version

00000100 <main>:


int main(void)
{
#ifdef CLASSIC
        DDRB |= (1<<7);
 100:   27 9a           sbi     0x04, 7 ; 4

        while(  (  ( PINB & (1<<6) ) !=(1<<6) ) )
 102:   1e 9b           sbis    0x03, 6 ; 3
 104:   fe cf           rjmp    .-4             ; 0x102 <main+0x2>

The code and a Makefile are in the examples/avr-cpp-example subdirectory of my toolchain project.