
Venix and Aurora/75 Videographics System
Part II - Aurora image file format

More to discover

Apart from compiling aclock and finding a way to transfer files to and from the system, I looked around to see what other things of interest can be found. Besides a nice Venix demo
and some text mode games

there was also a lot of stuff in /usr/jr.

Most of the programs there would not run under PCEm. Text snippets in some files revealed references to computer-to-35mm-film printers (called cameras), which an internet search revealed to have been "a thing" in the 80s.1 There was obviously something more on the secondary hard drive that was not included in the backup.

The most promising were the .im files found in /usr/jr/art (quite a suggestive name).
These were quite obviously images and fonts; however, a cursory web search did not reveal much about their format.

A simple attempt

At first glance, the files don't seem compressed (there are repeating patterns and large chunks of 0s), so the simplest approach seemed to be writing a small program that tries various pixel formats and lets me easily vary the image size to look for patterns. However, despite some vague patterns, just enough to convince me of an unseen order, nothing more came out of this.

Filename is bars.im, we can see bars, there's got to be something here! :)

Some clues lead to an important piece of computer history

Time for another look through the rest of the files. grmsg.dta proves very helpful - as the name implies it contains a lot of text messages related to the functioning of the programs, and between those, finally its name:
Aurora/75 Videographics System - Edition 3.1
One search result reveals an important piece of history:

Yet, despite the program's revolutionary and manifold contributions, SuperPaint would get scrapped by its development company, Xerox PARC, just a year later, forcing Shoup to leave and found his own graphics company, Aurora Systems.

Another search for Richard Shoup SuperPaint leads to www.rgshoup.com/prof/SuperPaint/. The page and the links are worth reading, especially for one interested in the history of frame buffers, paint programs and palette animations.

A call for help that gets a sad answer

Having found that, I sent an e-mail to Dr. Shoup in the hope that he would be able to give me at least some hints regarding the file format. After that I took a look at the sources he kindly provided on his site and noticed the Run-Length Encoding used. Quickly adding this to my experimental viewer revealed that, although the result looks larger and more "patterned", it still doesn't resemble a good image.

RLE decoded data from heart.im

Then the reply to my message arrived: Sadly, Dr. Shoup is no longer alive. Not only is this way closed, but it's also sad to find that a great programmer and inventor is gone.

SuperPaint file format

Since I still had the impression that SuperPaint's .pa file format could be related to Aurora's .im, I tried to understand the .pa format. Using SuperPaint's sources I was eventually able to write my own simple decoder. Leaving aside the header and palette, the image data is in a strange format. There is one RLE record (value,run) that fills part of each consecutive line. So in order to get a good decode the program must know the height of the image (usually with raw formats the width tends to influence the outcome) as well as keep an array of lengths for each decoded line.

It's easier to describe it with commented C.

/* SuperPaint RLE */
	xoff=malloc(ysize*sizeof(int)); /* array of x offsets */
	for(y=0;y<ysize;y++)
		xoff[y]=0;	/* initialize the array */
	while(cbp<bsize)	/* current buffer pointer < buffer size - here both are */
	{			/* referring to sizeof (RUNCODE) that is 2 bytes */
		for(y=0;y<ysize;y++) /* for each line */
		{
			if(cbp>=bsize) break;	/* avoid overflow */
			if(xoff[y]>=xsize) { cbp+=2; continue; } /* ignore the record if the current line is full */
			c=rc[cbp].VALUE;	/* get the value but */
			mp.r=(c&0x30);		/* don't care about palette */
			mp.g=(c&0x0c)<<2;	/* just create some */
			mp.b=(c&0x03)<<4;	/* different colors */
			c=c&0xc0;		/* for each value */
			mp.r|=c;		/* the upper bits */
			mp.g|=c;		/* should brighten */
			mp.b|=c;		/* our pixels */
			c=rc[cbp].RUN;
			for(x=xoff[y];(x<xsize)&&(x<xoff[y]+c+1);x++) /* fill RUN+1 pixels with VALUE but */
				 		/* don't go over the line */
				plot(screen,x+xpos,ypos+ysize-y,&mp,0);

			xoff[y]+=c;	/* keep track of our position (offset) for each line */
			cbp++;		/* next record */
		}
	}
Decoded result

Renounce the brute force approach

It's obvious that the SuperPaint format is closely related to the hardware of the SuperPaint machine, i.e. its video memory was basically just a long shift register and RLE decoding was apparently done in hardware. So instead of fooling around with various formats I just looked at hex dumps of the .im files and tried to put together a header structure. Eventually, after a lot of looking at hex dumps and calculating various values, the following structures (all values are hex) became apparent.

The .im header starts with: cb b2 00 00 00 00 00 04. The last 2 bytes seem to indicate its length (little-endian 0x400). It then has 8-byte records as follows:
typedef struct {
        uint16_t        t1;   /* section type */
/* 0x28 = ???animation? in heart.im before palette ; 0x20 = palette (seen e.g. in aur.im), 0x10 = data? */

        uint16_t        t2;
/* this is 0x01 for palette and regular data but 0x80 for epilogue;
 * fonts have various values here, including ASCII codes */

        uint16_t        offh; /* offset high word */
        uint16_t        offl; /* offset low word */
        } imheada;

Regarding the offset high word and offset low word: the format originated on a 16-bit machine, and 32-bit values were stored as B2 B3 B0 B1, where B0 is the least significant byte and B3 the most significant one. I don't know if that was the choice of the compiler or of the programmer.
The offset is the absolute position in the file for that record; e.g. 01 00 00 38 would mean position 0x13800 into the file. We will encounter more values like that.
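
To make the byte order concrete, here is a minimal sketch of how such an offset could be read from a raw buffer. The helper names rd16le and rd32mixed are my own and do not come from the decoder. The sketch assumes each 16-bit word is little-endian, which is what the 01 00 00 38 example implies. This word order is sometimes called middle-endian or PDP-endian.

```c
#include <stdint.h>

/* Read one 16-bit little-endian word from a byte buffer. */
static uint16_t rd16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Combine the on-disk layout B2 B3 B0 B1 (high word first, each word
 * little-endian) into a single 32-bit file offset.
 */
static uint32_t rd32mixed(const uint8_t *p)
{
    uint32_t high = rd16le(p);      /* offh: bytes B2 B3 */
    uint32_t low  = rd16le(p + 2);  /* offl: bytes B0 B1 */
    return (high << 16) | low;
}
```

Feeding it the four bytes 01 00 00 38 gives 0x13800, the position mentioned above.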

The word "data" in the comment above is somewhat misleading: a type 0x10 record points to another structure that begins with this header:

typedef struct
{
        uint16_t a1;
        uint16_t a2;
        int16_t a3;
        int16_t a4;     /* unknown (maybe signed) value */
        uint16_t s1;    /* size1? later learned it's width */
        uint16_t s2;    /* size2? height */
        uint8_t unkn[32]; /* this is identical for all available picture files */
	/* there are some differences in various sections of fonts */

        uint16_t rtabegh; /* HIGH word of start of rta */
        uint16_t rtabegl; /* LOW word of start of rta */
} rtah;
An array of records follows, each having the following structure:
typedef struct {
        uint16_t        t1;     /* 0x0007 = index into RLE, no other record type found */
        uint16_t        offh;   /* pointer to RLE high word */
        uint16_t        offl;   /* pointer to RLE low word */
        } rta;

And indeed, the first record in this array points to the beginning of the RLE data: e.g. in heart.im the first record of this list, found at 0xa30, points to the RLE data at 0x2600.

The RLE data is the only similarity with SuperPaint, being encoded as a pair of BYTES containing value and run respectively. However, this doesn't help very much with how to display the actual data.
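
A minimal sketch of expanding such (value, run) byte pairs follows. The function name rle_expand is hypothetical. I have not stated above whether a run byte of N means N or N+1 pixels in .im files, so the sketch takes that as a parameter (run_bias) rather than deciding it; the SuperPaint loop above fills RUN+1 pixels. How many pairs belong to one block is also left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand npairs (value, run) byte pairs from src into out.
 * run_bias is an assumption knob: 1 means a run byte of N stands for N+1
 * bytes (as in the SuperPaint loop), 0 means exactly N bytes.
 * Output is clipped to out_cap, but the return value is the number of
 * bytes the data wanted to produce, so a result larger than out_cap
 * signals an overrun.
 */
static size_t rle_expand(const uint8_t *src, size_t npairs,
                         uint8_t *out, size_t out_cap, unsigned run_bias)
{
    size_t produced = 0;

    for (size_t i = 0; i < npairs; i++) {
        uint8_t value = src[2 * i];                         /* first byte  */
        size_t  count = (size_t)src[2 * i + 1] + run_bias;  /* second byte */

        for (size_t k = 0; k < count; k++) {
            if (produced < out_cap)
                out[produced] = value;
            produced++;
        }
    }
    return produced;
}
```

Returning the unclipped length makes it easy to check whether a block decodes to more bytes than expected.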

A question and an idea

There is a question bothering me: why would this array of offsets be required? It seems that each rta record is pointing at a block that would decode to 256 bytes. However, why would one need this array? Following the RLE records produces the same result. Or does it?
Having a programmatic way to find the data (the start offset), I wrote a program to decode all these structures and write the decoded (or decompressed) result to another "raw" file. I also added a verification step to see if it sometimes happens that the decoded data ends up longer than 256 bytes. And indeed it happened in a few cases, especially near the end of files, but not only there. E.g. while decoding bars.im, I noticed that every 32 blocks my program would encounter an overrun. One thing to notice is that the header for this image suggests that one of the sizes is 511 bytes. This would later prove to be a clue.

More analysis reveals the pattern

Analysing the output of my decoder shows that the raw image data from the first section of bars.im is made of only 8 different values, which suggests a standard TV pattern of 8 colored bars. Looking at the hex dump of that raw image data, I noticed that after 1024 bytes of the same "color" (0x21) came 16 blocks of 16 bytes, each starting with one 0x21 followed by fifteen 0x22:
000003f0  21 21 21 21 21 21 21 21  21 21 21 21 21 21 21 21  |!!!!!!!!!!!!!!!!|
00000400  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000410  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000420  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000430  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000440  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000450  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000460  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000470  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000480  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000490  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
000004a0  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
000004b0  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
000004c0  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
000004d0  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
000004e0  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
000004f0  21 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |!"""""""""""""""|
00000500  22 22 22 22 22 22 22 22  22 22 22 22 22 22 22 22  |""""""""""""""""|

This along with the fact that on some huge widths (4096) there were some visible patterns in the (decoded) sections of some images:

and the periodicity of overruns, finally revealed the right conclusion: the picture data is organized as rectangles of 16 by 16 pixels, smaller at the right and bottom. The image is also vertically mirrored, i.e. the first line is at the bottom. Now the reason for the array of offsets becomes apparent: each "record" points to a small 16*16 (or smaller) area in the image. The smaller areas at the right and bottom are the reason for the periodic overruns I found with bars.im.
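
The layout can be sketched as below. This is an illustration under assumptions, not the actual decoder: place_tile is a hypothetical name, and I assume that tiles are stored left to right and then row of tiles by row of tiles, that pixels inside a tile are row-major, and that a partial tile is packed using its own smaller width. If the 511 in the bars.im header is the width, this gives 32 tiles per row with the last one 15 pixels wide, which would match an irregularity every 32 blocks.

```c
#include <stddef.h>
#include <stdint.h>

#define TILE 16

/*
 * Copy decoded tile number t into a width*height, one-byte-per-pixel image.
 * Assumed (not confirmed): tile order is left to right, then next tile row;
 * pixels in a tile are row-major; edge tiles are packed at their own width.
 * The first stored line ends up at the bottom of the image (vertical flip).
 */
static void place_tile(uint8_t *img, unsigned width, unsigned height,
                       unsigned t, const uint8_t *tile)
{
    if (width == 0 || height == 0)
        return;

    unsigned tiles_per_row = (width + TILE - 1) / TILE;
    unsigned x0 = (t % tiles_per_row) * TILE;
    unsigned y0 = (t / tiles_per_row) * TILE;
    if (y0 >= height)
        return;                      /* tile index past the image */

    /* Edge tiles are clipped to whatever is left of the image. */
    unsigned tw = (width  - x0 < TILE) ? width  - x0 : TILE;
    unsigned th = (height - y0 < TILE) ? height - y0 : TILE;

    for (unsigned py = 0; py < th; py++) {
        unsigned dst_y = height - 1 - (y0 + py);   /* flip vertically */
        for (unsigned px = 0; px < tw; px++)
            img[(size_t)dst_y * width + x0 + px] = tile[py * tw + px];
    }
}
```

Calling it once per offset record, with the block that record points to already expanded, fills the whole picture.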

The "Eureka" moment!

Actually I had decoded bars.im first, but it shows only simple bars, so it was only with this logo that I verified my hypothesis.
The jagged lines on right and the extra lines at the bottom are the effect of the overruns.

After refining the algorithm to take into account the proper image size and use smaller rectangles at the right and top (end)...

Proper alignment and decoding, palette is ignored for now

The end is near but palette poses another challenge

At first I thought that the palette had to be simple, just triplets of RGB values. The beginning of the palette section looks quite clear. There are two 16-bit words: one is always 0, the other is always 0x80. I assume the second is the table length, in records, and this seems to be confirmed by all the available files. What's more, no file would contain more than 128 (0x80) different values in its data section.

As with my other assumptions, this proved rather wrong. No triplet interpretation produced the right colors. Fortunately, there are some good images to experiment with: bars.im, grays.im, bars100.im. However, despite the multitude of options, nothing would produce the proper colors (or lack thereof). I was somewhat afraid that it might use HSV or HSL values instead of RGB, but that seemed unlikely. Besides, grays should have a Saturation value at or near 0, and this wasn't the case no matter how I interpreted the data. Back to looking at hexdumps, then.

<SNIP>
10 - 00 10 20
11 - 30 40 50
12 - 60 70 80
13 - 90 A0 B0
14 - C0 D0 E0
15 - F0 DF DE
16 - 1C 25 D7
17 - DD 1C 00
18 - FF EF FF
<SNIP>

Dumping the triplets like this makes it easier to try and make some sense out of them. But what if it doesn't make sense? In this example we see something that's quite unlikely to be encountered in any palette, so the triplets idea is clearly wrong. However, this nice sequence of rising values (seen in grays.im) pointed me in the right direction. The palette data is not kept as (R,G,B) records but as one long array: 128 bytes of red, then 128 bytes of green and 128 bytes of blue.
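
A minimal sketch of a lookup in such a planar palette could look like this. The names rgb8 and palette_lookup are hypothetical. The planes pointer is assumed to point at the first red byte, i.e. past the two 16-bit words that open the palette section. Masking the pixel value to 0..127 is my own assumption to keep the index in bounds.

```c
#include <stdint.h>

#define PAL_ENTRIES 128

typedef struct { uint8_t r, g, b; } rgb8;

/*
 * Look up one color in a planar palette laid out as 128 red bytes,
 * then 128 green bytes, then 128 blue bytes.
 * The index is masked to 0..127 (an assumption, made to stay in bounds).
 */
static rgb8 palette_lookup(const uint8_t *planes, uint8_t index)
{
    unsigned i = index & (PAL_ENTRIES - 1);
    rgb8 c;

    c.r = planes[i];                    /* red plane   */
    c.g = planes[PAL_ENTRIES + i];      /* green plane */
    c.b = planes[2 * PAL_ENTRIES + i];  /* blue plane  */
    return c;
}
```

Compared with the triplet reading, the same 384 bytes are used; only the stride between the components of one color changes from 1 to 128.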

Nice rainbow, isn't it?

This is not over yet

All in all a good exercise - my first try at reversing a completely unknown image format. I was lucky to have several files available, and with suggestive names at that. Having a few files around helped me understand the headers, the regular pattern of bars.im provided the final hints for pixel arrangement and along with grays.im gave me the clues for palette section format.

Update: The decoder is here.
Also, an archive with the files I worked with and their contents as png files.

Can you hear the 1000Hz tone in your head?

1 Later I found out that the "camera" was an optional accessory and the main part of the system was two video boards driving a monitor each, so in this respect the Aurora System was similar to SuperPaint. Even later, I learned from the current owner of the (at this time non-working) system that the boards were/are Number Nine Computer Revolution 512 series graphics cards.
Published 2015-09-17 by Mihai Gaitos - contacthawk.ro