The original approach is *really* bad at drawing vertical lines: it ends
up working a pixel at a time and toggles the chip select for each one.
Optimize both the pixel fill and the use of the line buffer. The result
is 20% faster for quarter screen fills, 3x faster for horizontal lines
and 6x faster for vertical lines.
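
For illustration only, the difference between the two strategies can be
sketched as below. This is not the driver code from this change: FakeBus,
fill_per_pixel and fill_buffered are hypothetical names, and the sketch
assumes 16-bit pixels and that each bus write costs one chip select
assert/release cycle.

```python
class FakeBus:
    """Hypothetical stand-in for the display bus; counts chip select cycles."""

    def __init__(self):
        self.cs_cycles = 0
        self.bytes_written = 0

    def write(self, data):
        # Assumption: every write is one chip select assert/release pair.
        self.cs_cycles += 1
        self.bytes_written += len(data)


def fill_per_pixel(bus, color, w, h):
    """Naive fill: one bus transaction per pixel."""
    pixel = color.to_bytes(2, "big")
    for _ in range(w * h):
        bus.write(pixel)


def fill_buffered(bus, color, w, h, linebuf):
    """Batched fill: pre-load the line buffer once, then send it in chunks."""
    pixel = color.to_bytes(2, "big")
    remaining = w * h
    capacity = len(linebuf) // 2  # pixels that fit in the line buffer

    # Fill only as much of the buffer as this request can actually use.
    for i in range(min(remaining, capacity)):
        linebuf[2 * i:2 * i + 2] = pixel

    view = memoryview(linebuf)  # slicing a memoryview avoids copies
    while remaining:
        n = min(remaining, capacity)
        bus.write(view[:2 * n])
        remaining -= n
```

With a 480 byte line buffer (240 pixels), a 1x240 vertical line takes 240
transactions with the per-pixel fill and a single transaction with the
buffered fill.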

Signed-off-by: Daniel Thompson <daniel@redfelineninja.org.uk>