The original approach is *really* bad at drawing vertical lines (it ends
up working a pixel at a time and toggles the chip select for each one).
Optimize both the pixel fill and the use of the line buffer. The result
is 20% faster for quarter-screen fills, 3x faster for horizontal lines
and 6x faster for vertical lines.
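
As a rough illustration of why batching helps, here is a minimal sketch
that compares one bus transaction per pixel with filling a line buffer
once and sending it in chunks. FakeBus, fill_per_pixel and fill_buffered
are hypothetical names made up for this example, not the driver's API,
and the big-endian byte order is an assumption.

```python
class FakeBus:
    """Hypothetical stand-in for the SPI bus; counts chip select cycles."""

    def __init__(self):
        self.cs_cycles = 0
        self.sent = bytearray()

    def write(self, data):
        # Model one chip select assert/release per call.
        self.cs_cycles += 1
        self.sent += data


def fill_per_pixel(bus, colour, npixels):
    """Slow path: one bus transaction for every pixel."""
    pixel = bytes([colour >> 8, colour & 0xFF])  # assumed big-endian
    for _ in range(npixels):
        bus.write(pixel)


def fill_buffered(bus, colour, npixels, linebuffer):
    """Faster path: fill the line buffer once, then send it in chunks."""
    buf_pixels = len(linebuffer) // 2  # buffer length is in bytes
    hi, lo = colour >> 8, colour & 0xFF
    # Only fill as much of the buffer as will actually be sent.
    for i in range(min(npixels, buf_pixels)):
        linebuffer[2 * i] = hi
        linebuffer[2 * i + 1] = lo
    view = memoryview(linebuffer)
    while npixels > 0:
        n = min(npixels, buf_pixels)
        bus.write(view[: 2 * n])
        npixels -= n


slow, fast = FakeBus(), FakeBus()
fill_per_pixel(slow, 0xF800, 240)
fill_buffered(fast, 0xF800, 240, bytearray(2 * 60))
```

Both paths send the same bytes; the buffered one needs one transaction
per buffer-sized chunk instead of one per pixel.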
Signed-off-by: Daniel Thompson <daniel@redfelineninja.org.uk>
sx is measured in pixels (2 bytes each) and len(display.linebuffer)
gives a value in bytes, so the divisor is wrong by a factor of two.
Whilst we are here let's make sure we use integer division too.
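
The unit mismatch can be sketched as follows. This is an illustration
only: buffers_per_row and the buffer size are hypothetical, and the
surrounding driver code is not shown in this message.

```python
BYTES_PER_PIXEL = 2  # 16-bit pixels, as stated above


def buffers_per_row(sx, linebuffer):
    """Number of whole line buffers needed to cover sx pixels."""
    # Convert the buffer length from bytes to pixels before dividing,
    # and use // so the result stays an integer.
    buf_pixels = len(linebuffer) // BYTES_PER_PIXEL
    return sx // buf_pixels


linebuffer = bytearray(2 * 120)  # hypothetical: room for 120 pixels
sx = 240

wrong = sx / len(linebuffer)  # pixels / bytes: a float, half the real count
right = buffers_per_row(sx, linebuffer)
```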
Fixes: #18
If an application crashes let's report it on the device so it can be
distinguished from a hang (if nothing else it should mean we get better
bug reports).
There are a bunch of different changes here but there are only really
three big wins. The biggest win comes from restructuring the 2-bit RLE
decode loop to avoid the inner function (~20%) but the switch to 16-bit
writes in _fill() and adoption of quick_write (i.e. no CS toggling) are
also noteworthy (and about 5% each).
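
To show the shape of the inner function restructuring, here is a sketch
using a simplified run-length format of (palette index, count) pairs.
This is not the real 2-bit encoding, and both function names are
hypothetical; the point is only that the second version does the same
work without a function call per run.

```python
from array import array


def decode_with_helper(runs, palette, out):
    """Baseline: a nested function is called once per run."""
    pos = 0

    def emit(colour, count):
        nonlocal pos
        for _ in range(count):
            out[pos] = colour
            pos += 1

    for index, count in runs:
        emit(palette[index], count)
    return pos


def decode_inline(runs, palette, out):
    """Restructured: the helper body is folded into the main loop."""
    pos = 0
    for index, count in runs:
        colour = palette[index]
        end = pos + count
        while pos < end:
            out[pos] = colour
            pos += 1
    return pos


palette = (0x0000, 0x4A69, 0x7BEF, 0xFFFF)  # made-up 16-bit colours
runs = [(0, 3), (3, 2), (1, 4)]

a = array("H", [0] * 9)  # 16-bit output pixels
b = array("H", [0] * 9)
na = decode_with_helper(runs, palette, a)
nb = decode_inline(runs, palette, b)
```

Both decoders produce identical output; any speed difference depends on
the interpreter and should be measured on the target.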