Monday, 14 August 2017

The World's Fastest Endian

Introduction

The modern computing world is dominated by a particular binary data format in which the bytes of a value are arranged in a particular order. This affects the reading and writing of any data quantity larger than a byte. Intel dominates with this particular byte order. It is called Little Endian and it is the world's fastest endian, not only on the desktop but also in the mobile space, where it has grown rapidly in recent years.

So what is an Endian? The term is said to have been coined by engineer Danny Cohen in his paper "On Holy Wars and a Plea for Peace", which borrowed it from the story Gulliver's Travels, where war broke out over whether the big or the little end of a boiled egg was the right one to crack. Although only a story, it still sounds silly. But this is where the computing terms Big Endian and Little Endian get their names from.

So Endian in this context refers to the byte order used when storing larger data types. The smallest unit is a bit, four bits make a nibble and eight bits form a byte. Data larger than a byte is stored in a particular endian, that is, a particular byte order, in quantities such as 16-bit, 32-bit, 64-bit and beyond. Translated into bytes that would be two bytes, four bytes, eight bytes and so on. Or a short word, a long word and a long long word. As you can tell, whoever came up with these terms was obsessed by food or was a big talker! :-D
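
For the C coders out there, these quantities map onto the fixed width types from the standard <stdint.h> header. A tiny sketch you can compile to check the sizes for yourself:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      /* One byte, a short word, a long word and a long long word. */
      printf("uint8_t:  %u bytes\n", (unsigned)sizeof(uint8_t));
      printf("uint16_t: %u bytes\n", (unsigned)sizeof(uint16_t));
      printf("uint32_t: %u bytes\n", (unsigned)sizeof(uint32_t));
      printf("uint64_t: %u bytes\n", (unsigned)sizeof(uint64_t));
      return 0;
  }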


Big Beginnings and Little Endians

The two most commonly known are Big Endian and Little Endian, also written as Motorola and Intel. Motorola uses big endian; Intel uses little endian. Within data, a number as a whole runs from what's considered the most significant bit down to the least significant bit, and complementing that, from the most significant byte to the least significant byte. That is, the most significant digit is the highest digit of that number and the least significant digit the lowest, as represented on the computer. Although a computer needs the complete number as a whole to work with at all, the quantity is divided into the most significant and least significant portions of that whole amount.
An endian sets the order in which these bytes are arranged, whatever the quantity. For big endian the most significant byte (or MSB) is stored first, following down to the least. For little endian the least significant byte (or LSB) is stored first, following up to the most. This is all very well, but how about an example? Starting simple, suppose we have the number 1, in 16-bit form, in memory. This is how it would look:


Big:    00,01

Little: 01,00

As you can see, with big endian it is stored, in relative terms, in human or forward order, and with little endian it is stored backwards. But what about 32-bit? For big endian this is straightforward enough. But for little endian, in effect, not only are the bytes reversed but the 16-bit words must be reversed as well. Confusing? It gets worse for 64-bit! But let's just see how $A1B2C3D4 (hexadecimal notation) would be written in 32-bit:


Big:    A1,B2,C3,D4

Little: D4,C3,B2,A1

As you can see I've been joking with you and the result is totally reversed. Haha. :-D  So the order it is stored in is simple enough. Big endian is forward. Little endian is backward. However, there are also hybrid endians, as on the PDP-11 computer, where the two 16-bit words in a 32-bit value are kept in forward order but the bytes within each word are stored in little endian. Like so:


PDP-11: B2,A1,D4,C3
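
Setting the hybrids aside, you can check your own machine's byte order with a minimal C sketch like this one, which peeks at the bytes of $A1B2C3D4 in memory:

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int main(void)
  {
      uint32_t value = 0xA1B2C3D4;
      uint8_t bytes[4];

      /* Copy the raw in-memory representation into a byte array. */
      memcpy(bytes, &value, sizeof(value));

      /* Prints A1,B2,C3,D4 on big endian; D4,C3,B2,A1 on little endian. */
      printf("%02X,%02X,%02X,%02X\n", bytes[0], bytes[1], bytes[2], bytes[3]);
      return 0;
  }

On a PowerPC Amiga it comes out in forward order; on a PC it comes out reversed.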

You may have experience with reading or writing code on your Commodore 64 or similar computer and have come across the terms low byte and high byte. This is exactly the same deal. 16-bit values in memory on the Commodore 64, with its MOS 6510 (and the 6502 for that matter), were stored as little endian. So the 8-bit Commodore computers were little endian. As a comparison, when Commodore dropped their 8-bit line in favour of the Amiga, the new 16-bit generation, they also changed endian, since Motorola meant big endian.
The order of bits in a byte can also be big endian or little endian, that is, forward or reversed. But since bits are usually isolated and combined individually, by shifting and rotating data left or right and masking bits out, bit order hasn't been as big a deal as byte ordering.
At one stage I thought all the bytes being reversed on an Intel CPU would mess up this bit shifting and the result would be a little endian byte mess! But no, as it happens, the byte order really only affects reading from and writing to memory. As far as the programmer is concerned, the data inside an Intel CPU, once it is read in, can be considered big endian, since the CPU works with it as a whole: one whole binary number in whatever quantity.
An interesting observation here is that even on little endian Intel systems, the bits in a byte are ordered in big endian fashion: looking from left to right, the most significant bit comes first and the least significant bit last. This is fairly standard across the board, and if bit order had differed it would have caused deeper issues than simple byte ordering. Data between computers would have been incompatible at the atomic level inside bytes, and even simple data structures would have needed mass conversion.
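
To put that in C terms, shifts and masks operate on the value as one whole number, so the same expression isolates the same logical byte on any endian. A small sketch:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint32_t value = 0xA1B2C3D4;

      /* These act on the number itself, not on its bytes in memory,
         so the results are identical on big and little endian CPUs. */
      printf("%02X\n", (unsigned)((value >> 24) & 0xFF)); /* A1, the MSB */
      printf("%02X\n", (unsigned)(value & 0xFF));         /* D4, the LSB */
      return 0;
  }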



The Little Endian That Could

But where did this choice of endian come from? What is the history? How did Motorola and Intel choose the endians that would affect computers all over the earth?


It may surprise you that the root of the x86 architecture, and of the PC as we know it, did not start at Intel alone. In 1969 Computer Terminal Corporation (CTC), who later became Datapoint, were working on a computer terminal that would also happen to be a standalone computer system. CTC had already designed the basis of the CPU intended for their computer, with the instruction set and associated machine code. Similar to the early Amiga chip set prototypes, they could build it out of separate electronic components by combining TTL (transistor-transistor logic) circuits. But they wanted it manufactured as a standalone chip: an MPU, a micro processing unit. Computer CPUs (central processing units) already existed in the form of TTL chips combined across circuit boards, but an MPU reduced this to one chip containing a simplified CPU core.

So they commissioned two companies to do the job under contract: Intel and Texas Instruments. Intel were mostly known at the time for their memory chips, and TI for their TTL and IC (integrated circuit) chips. As it happened, it took Intel too long to produce the CPU design on silicon, and the chips TI did produce were too unreliable. So CTC ended up putting the "CPU" together themselves using the TTL components (in IC chips) they had sought to avoid. They let Intel keep the design and they dropped TI. Intel eventually made this design into a CPU reality, and it became the Intel 8008. This spawned the 8080 and later the 8086 (then also the 8088): the basis of x86.

The significance here, and the big point I'm making, is that the original CTC design for the CPU was little endian. And since Intel became a major player in the CPU market, they kept making little endian processors. The ISA (instruction set architecture) for what became the 8008 was developed at CTC by Victor Poor and Harry Pyle, with the TTL design handled by Gary Asbell. The original design was based on a serial configuration where, internally, 8-bit data was processed one bit at a time. The Intel implementation used parallel processing, so 8-bit data was processed all at once. Despite this, the serial CTC processor was still faster than the parallel Intel processor.
Given these were primarily 8-bit CPUs, the little endian choice made sense. Since they worked byte-wise and mainly processed with 8-bit precision, the LSB (least significant byte) of the data was where they started from. So by itself a byte could be thought of as an LSB, even if a solo byte is both MSB and LSB at once. By treating it as an LSB they had a starting point, and if more precision was required they simply read in another byte for the next most significant position. Because of this, when doing math in quantities greater than 8-bit precision, the LSB is processed first; the result is carried over to the next byte up and calculated until it reaches the top MSB (most significant byte). This is similar to how we add numbers by hand, adding the rightmost digits first and carrying the one over to the next digits on the left whenever a column adds up to two digits. It also explains how the Commodore 64, with a simple 8-bit CPU, can do math with numbers of greater precision than 8-bit, including floating point math. It affects memory locations too. The first 256 bytes of memory, otherwise known as page zero, fit in an index byte that specifies the LSB with the MSB being zero. For a 16-bit location the CPU reads in the low order byte then the high order byte. Thus low byte then high byte.
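
Here is that LSB-first carry chain as a little C sketch, a toy version of what the 6502 effectively does with its ADC instruction (the names here are mine, not from any real code):

  #include <stdio.h>
  #include <stdint.h>

  /* Add two little endian numbers of 'len' bytes, starting from the
     LSB and carrying into each higher byte, just like pencil and paper. */
  static void add_le(const uint8_t *a, const uint8_t *b, uint8_t *sum, int len)
  {
      unsigned carry = 0;
      for (int i = 0; i < len; i++) {
          unsigned t = a[i] + b[i] + carry;
          sum[i] = (uint8_t)t;    /* keep the low 8 bits */
          carry = t >> 8;         /* carry the overflow to the next byte up */
      }
  }

  int main(void)
  {
      /* 0x01FF + 0x0001 = 0x0200, stored low byte first. */
      uint8_t a[2] = { 0xFF, 0x01 };
      uint8_t b[2] = { 0x01, 0x00 };
      uint8_t s[2];
      add_le(a, b, s, 2);
      printf("%02X%02X\n", s[1], s[0]); /* prints 0200 */
      return 0;
  }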
The history for Motorola seems to be less well known. Although the first Motorola CPU, the 6800, is 8-bit like its rivals and similar in operation, with a 64KB memory space and accumulators for math operations, it is big endian. This looks likely to have been influenced by mainframes such as those made by IBM, which in effect worked in big endian. The choice was carried over to the 68000 series, which was internally 32-bit; to the 68000's successor, the ColdFire; and of course, through the combination of Apple, IBM and Motorola (the AIM alliance), to the PowerPC.
There has also been a big vs. little endian debate waging throughout the years with an almost religious fervour. Historically the choice would usually be made for technical or other reasons, but each side has its supporters. Like the Holden vs. Ford car battle in my native Australia, it was Motorola against Intel, and like any Amiga fan I classified myself as a Motorola person. Each endian has advantages and disadvantages, or rather, particular features above the other. Big endian gives you the sign of a number in the first byte, and you can tell how large it is from that first point. Little endian tells you whether a number is odd or even in the first byte, and you can read any smaller sized quantity starting from that same first byte.
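
That little endian property is easy to demonstrate in C. A minimal sketch, using memcpy to stay portable: a small number stored in a 32-bit container reads back the same at narrower widths from the same starting address:

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int main(void)
  {
      uint32_t value = 42;   /* a small number in a 32-bit container */
      uint16_t first2;
      uint8_t  first1;

      /* On little endian the value starts at its LSB, so reading a
         narrower quantity from the same address still yields 42. */
      memcpy(&first2, &value, sizeof(first2));
      memcpy(&first1, &value, sizeof(first1));
      printf("%u %u %u\n", (unsigned)value, (unsigned)first2, (unsigned)first1);
      return 0;
  }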



The Big Endian That Would

A lot of the early big computers, such as the minicomputers and mainframes made by IBM, are big endian. Like the IBM Series/1 minicomputer; the IBM System/360, System/370 and ESA/390 mainframes; and the more recent IBM z/Architecture mainframes. The DEC PDP-10 also supported big endian.
It was also common for RISC based CPUs to be big endian. The HP PA-RISC was big endian, as was the Sun SPARC. There are also the Microsoft Xbox 360, using the Xenon processor, the Nintendo GameCube, Wii and Wii U, and the Sony PlayStation 3, using the Cell CPU: all consoles with CPUs based on the Power architecture, and all big endian.
Known current CPUs with big endian architecture are the (Motorola 68000 based) Freescale ColdFire, the Xilinx MicroBlaze (FPGA), the Hitachi SuperH, the IBM z/Architecture and the Atmel AVR32.
Other CPUs now supporting big endian by being bi-endian would be SPARC and the Power Architecture with its derivative PowerPC, which were originally big endian only. The ARM architecture was exclusively little endian but became bi-endian with ARMv3. MIPS, HP PA-RISC, the Hitachi SuperH SH-4 and the Intel Itanium IA-64 are also bi-endian. Intel x86 has had implicit big endian support since the 80486 series, with the BSWAP instruction to swap bytes, and later models gained specific big endian support with the MOVBE instruction to read and write memory data in big endian format.
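
A byte swap of the kind BSWAP does is simple enough to express in portable C, and compilers like GCC and Clang also offer a __builtin_bswap32() that boils down to the native instruction where one exists. A sketch:

  #include <stdio.h>
  #include <stdint.h>

  /* Reverse the four bytes of a 32-bit value, converting between endians. */
  static uint32_t swap32(uint32_t x)
  {
      return (x >> 24) | ((x >> 8) & 0x0000FF00) |
             ((x << 8) & 0x00FF0000) | (x << 24);
  }

  int main(void)
  {
      printf("%08X\n", (unsigned)swap32(0xA1B2C3D4)); /* prints D4C3B2A1 */
      return 0;
  }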
As a side note, the IA-64 is a derivative of the PA-RISC, since Intel collaborated with HP to produce a modern 64-bit RISC CPU architecture without the old constraints of x86. Since Commodore were looking at the PA-RISC, had an Amiga reboot happened, it might have had an Intel Inside. But at least it still wouldn't have been a PC! LOL :-D
As well as hardware supporting big endian, we also find software and file formats supporting big endian. There are data encoders working in big endian, such as the LZMA compression algorithm, which uses it in some data types; the XDR External Data Representation Standard; and Java byte code, which has data encoded in big endian. For internet data structures the data also has to appear in a certain endian, even where IP addresses are referenced as a dotted quad of four separate bytes. This is called network order, and it's big endian. :-) Text strings are by nature non-endian, or endian agnostic, since they are byte based. But they are strung together relatively in big endian order, in the sense that they are stored in forward order. This has the effect that a big endian system can read a string from memory in blocks of sizes like 32 bits and the end result is exactly as it looks in memory, so it can optimise code checking for an exact four letter sequence of characters. There is also BCD, binary coded decimal, as used in the 6502 series. Here decimal digits are stored in each nibble, so a bit different to byte based storage, yet the nibbles are stored in big endian order.
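
Network order is also why C socket code is littered with htons() and htonl() (host to network short/long) calls, as in this hedged little fragment (arpa/inet.h is where POSIX systems keep these):

  #include <stdio.h>
  #include <stdint.h>
  #include <arpa/inet.h>   /* htons, htonl on POSIX systems */

  int main(void)
  {
      /* Convert from host order to network (big endian) order.
         On a big endian host this is a no-op; on little endian it swaps. */
      uint16_t port = htons(80);
      uint32_t addr = htonl(0xC0A80001); /* 192.168.0.1 as one quantity */
      printf("%04X %08lX\n", (unsigned)port, (unsigned long)addr);
      return 0;
  }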
Encoding binary data in big endian can also make sense regardless of the CPU used for decoding. For example, say there is a program compressing a data stream, complemented by another program to decompress it. Because of the nature of the encoding, binary codes of varying bit lengths would be packed together, and for properly optimised code would likely be cached inside a CPU register before being written out as a small block to memory. The bit stream can be packed in left to right order, so it reads and decodes as one large stream overall, with binary codes on the left affecting the interpretation of codes read later on the right. With a big endian CPU the data is stored exactly as it is encoded. On little endian this encoding and decoding is not so easy. Say we optimised the data for a little endian CPU, and consider the stream being encoded on a 32-bit CPU but decoded on a 64-bit CPU. Here we have a problem: the encoding 32-bit code wants to write the data in reversed blocks of four bytes (32 bits wide), but the decoding 64-bit code wants to read in blocks of eight bytes (64 bits wide). So it is forced to inefficiently read in blocks of four bytes, or else every pair of 32-bit data words will come out reversed. The same would happen with 32-bit code decoding a stream written by 64-bit code, unless it compensated by reading ahead four bytes before stepping back, in 32-bit blocks. As you can see, with differing word sizes, little endian data is not interoperable, while big endian data is interoperable across multiple word sizes. So encoding and decoding a stream as big endian makes sense here.
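
The interoperable way around all this is to read the stream byte-wise, where the host word size never enters into it. A minimal sketch for big endian data of any width:

  #include <stdio.h>
  #include <stdint.h>

  /* Read an n-byte big endian quantity from a stream, one byte at a time.
     Works identically whether the host is 32-bit, 64-bit, big or little. */
  static uint64_t read_be(const uint8_t *p, int n)
  {
      uint64_t v = 0;
      for (int i = 0; i < n; i++)
          v = (v << 8) | p[i];
      return v;
  }

  int main(void)
  {
      const uint8_t stream[] = { 0xA1, 0xB2, 0xC3, 0xD4 };
      printf("%llX\n", (unsigned long long)read_be(stream, 4)); /* A1B2C3D4 */
      return 0;
  }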



The following is a list of file formats using big endian coding and some are surprising.

  • Adobe Photoshop
  • IMG (GEM Raster) 
  • JPEG
  • MacPaint 
  • SGI (Silicon Graphics) 
  • Sun Raster
  • WPG (WordPerfect Graphics Metafile) (From the PC!)



The following is a list of file formats with bi-endian coding.
  • DXF (AutoCAD)
  • Microsoft RIFF (.WAV & .AVI)
  • TIFF
  • XWD (X Window Dump) 


This one added for interest. A little endian format from a big endian Mac! :-)
  • QTM (QuickTime Movies)


Big Trouble in Little Endian

With the dominance of x86 in the desktop and server markets, little endian rules the roost. Not even Intel themselves, with the more modern IA-64 architecture, managed to break this stride. This signals that people just don't want change, even if the alternative is modern and/or better. Of course, AMD didn't help by extending x86 to 64-bit and getting a hit on the pop charts, where Intel were forced to follow the trend. And as ARM picked up in the mobile space, it too followed the Intel lead and promoted little endian as the endian of choice.
Because of this, even with high level languages common in the mainstream, code has become very dependent on little endian hardware. In theory high level code should be portable, but in the real world this is simply not the case. It's doubtful much code is checked for real portability, and the unwritten rule seems to be that if it works on Intel (or anything little endian) then it works. A side effect of this shows up in web browsers, which are also becoming rooted in little endian, with the effect of rooting browsers running on big endian! Things such as tables generated by web page scripts have become dependent on little endian, even though by its very nature script data should be endian agnostic. And this is causing problems for those porting an updated browser engine to OS4. Even though we still don't have Flash or full video support, those problems pale next to the ones we have now, where sites stop working correctly or crash the browser because the whole rendering engine is dependent on little endian.
I've seen code myself that assumes the hardware works in a certain way when reading memory. A simple case would be looking for an ID (such as a text ID) and comparing it with the reverse of what it really wants. At this point the code has gone below the high level it should be at and entered the low level. It is also unnecessary, since simple macros that are both readable and portable can be created, and do exist, to deal with these kinds of values. Yet they don't do that, and prefer, it seems, to write more confusing and cryptic code that is tied to one particular hardware architecture. In many cases it would be a four character string ID, so treating it as one quantity is technically incorrect and just a cheat. On the opposite side, if done on big endian, it works as intended and is readable in a WYSIWYG way. Still, whatever method is used, I think the simple portable way would be best, and a whole subset of endian problems wouldn't appear.
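
For illustration, such a portable macro might look like the sketch below. AmigaOS itself ships a MAKE_ID macro of this shape in its IFF headers; this generic version is my own:

  #include <stdio.h>
  #include <stdint.h>

  /* Build a four character ID as one 32-bit quantity in big endian style.
     Because it is built from shifts, it is readable and works on any endian. */
  #define MAKE_ID(a,b,c,d) \
      ((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | (uint32_t)(c) << 8 | (uint32_t)(d))

  /* Read a stored ID byte-wise so the host endian never matters. */
  static uint32_t read_id(const uint8_t *p)
  {
      return MAKE_ID(p[0], p[1], p[2], p[3]);
  }

  int main(void)
  {
      const uint8_t chunk[] = { 'F', 'O', 'R', 'M' };
      if (read_id(chunk) == MAKE_ID('F', 'O', 'R', 'M'))
          printf("Found an IFF FORM chunk\n");
      return 0;
  }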
Windows has helped to push this little endian dependent coding style, since it is dominant on computers and little endian, just as Linux has become now, furthering the push. And when Apple dropped PPC in favour of Intel, just after they had gone to PPC64, they didn't help here either, since that caused an immediate decline of big endian on the desktop and pushed little endian further into the already existing market place. So big endian architectures have been struggling for support. We see this in the AmigaOne machines. The first model was relatively reasonable in price, due in part to Apple still using and promoting PowerPC on the desktop. But since then PowerPC hasn't been available as a standard desktop CPU, which means that later AmigaOne machines must find an alternative source for PowerPC, and that is lacking in both hardware and software support.
It is even affecting the big and powerful IBM POWER machines and servers, where IBM made the decision to run the main OS in little endian mode to ease the porting and compiling of software written for x86 architectures. This just looks like selling out and bowing down to commercial pressure, rather than coders fixing the problem where it should be fixed: in the source code, so it is portable. These days I would have expected high level compilers to support specific endians for data types, or to support WYSIWYG data, so portability can be controlled at the compiler level instead of forcing the coder to kludge it in or take the lazy approach. This is also affecting PowerPC downstream in the main Linux distros. Although official PowerPC desktop release builds were dropped, leaving x86 only builds, there were still community supported PowerPC ports. Now the 32-bit PowerPC builds have been dropped as well, yet 32-bit i386 is still officially supported. I wonder why they keep supporting i386 when nowadays nobody should be running anything below x86-64. 64-bit PowerPC little endian is the only build now with a supported port. They called it ppc64el. Yes, a bit of a geeky joke there. But what could be more to the point: if you rearrange it another way, is it really ppc64intel? Aside from a slight pun it also rhymes. ;-)
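
As it turns out, some compiler support along these lines does exist. GCC (since version 6) can fix the storage order of a whole structure regardless of the host CPU, so the same source reads big endian data portably. A hedged, GCC specific sketch:

  #include <stdio.h>
  #include <stdint.h>

  /* GCC specific: force this structure to be stored big endian in memory,
     whatever the host CPU is. Accesses are byte swapped as needed. */
  struct __attribute__((scalar_storage_order("big-endian"))) be_header {
      uint32_t magic;
      uint16_t version;
  };

  int main(void)
  {
      struct be_header h = { 0xA1B2C3D4, 1 };
      const uint8_t *raw = (const uint8_t *)&h;
      /* Prints A1,B2,C3,D4 even on a little endian host. */
      printf("%02X,%02X,%02X,%02X\n", raw[0], raw[1], raw[2], raw[3]);
      return 0;
  }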
We also see this in hardware, such as the Radeon GPUs, which have dropped support for big endian packed pixel formats since the Southern Islands series. This has affected some OS4 applications. For true-colour ARGB values this is understandable, as working with BGRA is workable in byte units, even if the RGB array is reversed. But with hi-colour it doesn't make sense, as a 15-bit hi-colour value splits the 5-bit sized colour components unevenly across two bytes, and the isolated 5-bit values are also in big endian bit format. So this is one example where what was practical and logical has been replaced by something impractical and illogical with respect to the data it contains, because of market pressure, for what is popular. Optimising for the host CPU here is irrelevant, as those CPUs already support big endian operations natively, and in most cases the code should not deal with the pixels directly but pass them to the GPU for processing.
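
To see why a byte swap can't help hi-colour, look at how the components are packed; the green bits straddle the byte boundary. A sketch assuming the usual 0RRRRRGG GGGBBBBB layout:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint16_t pixel = 0x7FFF; /* white: 0RRRRRGG GGGBBBBB */

      /* The three 5-bit components; green straddles the two bytes,
         which is why a simple byte swap cannot sensibly reorder this format. */
      unsigned r = (pixel >> 10) & 0x1F;
      unsigned g = (pixel >> 5)  & 0x1F;
      unsigned b =  pixel        & 0x1F;
      printf("R=%u G=%u B=%u\n", r, g, b); /* prints R=31 G=31 B=31 */
      return 0;
  }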
Other hardware, such as USB, PCI and sound cards, expects and uses data structures in little endian. This hasn't been much of a problem, and working with such hardware has been possible with relative ease. Even on the Amiga the CIA chip addresses were split into low, mid and high bytes for the 24-bit and 16-bit counters, a derivative of the C64's little endian 6526 CIA. But in that case any implied endian didn't matter, as each counter had to be accessed byte-wise due to the memory layout. In some ways it was like the methods used in classic 8-bit CBM BASIC, where memory pointers and other 16-bit values were POKEd into memory one byte at a time, by isolating the low byte and high byte separately.
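
In C the same low byte/high byte split looks like this toy equivalent of those two POKEs (the register names here are hypothetical stand-ins, not real hardware):

  #include <stdint.h>

  /* Toy stand-ins for two byte-wide hardware registers (hypothetical). */
  static volatile uint8_t timer_lo, timer_hi;

  /* Write a 16-bit counter value one byte at a time, low byte first,
     just like POKEing the low and high bytes separately in BASIC. */
  static void set_timer(uint16_t value)
  {
      timer_lo = value & 0xFF;        /* low byte  */
      timer_hi = (value >> 8) & 0xFF; /* high byte */
  }

  int main(void)
  {
      set_timer(0xC000);
      return 0;
  }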


Big Losers and Little Winners?

So, is that it for us? Big endian and little endian battled it out in the market place and big endian lost? We could look at it the way we look at the Amiga and the PC, CD and MP3, or Beta and VHS. One is obviously superior to the other, but it isn't always the best man that wins; it can simply be the most popular that wins out. We could say little endian won the battle, given it is the most common endian in the hands of consumers, and the most popular in that respect. But big endian is still out there in the world, as demonstrated above. Of course, we are interested in the Amiga world, so what matters to us is what is relevant to us. And when what matters to us is irrelevant to the rest of computing society, then we have a problem, because it filters down to us. And right now we have a problem.

The Endian

What's the future of big endian? It's hard to say. We could easily prophesy that it is looking bleak, and it might be an accurate prediction. After all, if you are not part of the common populace, you tend to be left behind. But big endian is not a dead endian. It's still in use in various markets around the world; it just isn't used in common consumer computers, and this has mounted the pressure against big endian. There has been talk of porting AmigaOS to x86 for decades now. AROS was there at the start, and there are rumours that's where MorphOS will transition to next. Will OS4 be forced to kneel down and also worship in the church of the little endian? It's possible! Though a full blown port to x86/64 is still unlikely now, it could gradually transition to be endian agnostic and little endian compatible. PowerPC is bi-endian, so a little endian port of OS4 could still run on PowerPC, and that could be a start. It's been said that OS4 running in little endian would alleviate many of the problems we are having now, such as software ports, hardware structures and interoperable data like audio and graphics formats.
But it would also mean the entire software library we have now, as well as the OS4 system software in its current form, would be instantly incompatible, unless there was some compatibility layer built in, or a special loader that could patch code at runtime, next to the obvious solution of recompiling every piece of software we have, including the OS, mass testing aside. Another option is porting to another CPU, such as ARM, but retaining big endian, where we would have access to more affordable CPU hardware of evenly matched power. Or even a hybrid solution of sorts, such as using x86 [ :-O ] but coding it to specifically use big endian instructions only, which would likely require a custom compiler. The easiest in the meantime might be to mark data as little endian, since on certain PowerPC models pages can be marked as a specific endian. This would also require support from the compiler, with variables and data blocks able to be marked as little endian, respecting data type size.
Well, that's it for my article. I think I've said enough, and I hope you have learned as much reading this as I did researching it. The discussion certainly won't end here; feel free to continue it in the comments. Big words or small words. All endians, great and small. For now. The Endian. :-P

2 comments:

  1. That is just sad... so it is.

  2. Wow! Thanks for this new perspective. I didn't realize that endianness was such a big barrier to x86 transition (or ARM for that matter). This would explain a lot. Thanks for doing the research. Time for me to read more about bi-endians I guess
