'------------------------------------------------------------------ ' ' This is an implementation of a self-designed compression algorithm. ' ' The code tries to be as simple and small as possible. ' ' (c) Peter van Eerten 2018, MIT license ' '------------------------------------------------------------------ ' ' The idea is that the compression will use bit patterns based on occurrences of data: ' ' xx11 -> 4 bytes ' xxx10 -> 8 bytes ' xxxx01 -> 16 bytes ' xxxxxxxx00 -> remaining 228 bytes ' ' The compressed data is identified by the 2 bits on the right. Based on the pattern, the ' decompression will determine the amount of bits which describes the byte. So, if a bit ' pattern is '11', then the next 2 bits describe an index to the actual byte (index being ' 00, 01, 10, 11). If a bit pattern is '10' then there are 3 bits describing the byte, and ' with bit pattern '01' there are 4 bits describing the byte. This way, it is possible to ' store 28 bytes in a compressed manner. ' ' The remaining 228 bytes will be stored as-they-are plus 2 identifying bits '00' for each ' byte. This indeed enlarges the result somewhat, though effectively, the total compression ' ratio can still be pretty good depending on the type of data. If the data contains a lot ' of similar bytes, like BMP or ASCII files, the algorithm can reach a ratio up to 40%. ' ' Of course, compressed files need a conversion table to restore the actual bytes. The ' conversion table is put into the compressed file by the compression routine. This also ' means, that a resulting file always will larger than 28 bytes. So compressing very small ' files of 10 bytes will actually result into a larger file (negative compression). ' '---------------------------------------------------------------------- ' Optimize for speed PRAGMA OPTIONS -Ofast DECLARE idx ASSOC long ' Get filename IF AMOUNT(ARGUMENT$) < 2 THEN PRINT "Usage: ", ME$, " " END 1 ENDIF file$ = TOKEN$(ARGUMENT$, 2) ' Using array references is faster than PEEK DECLARE data TYPE uint8_t* data = BLOAD(file$) length = FILELEN(file$) ' Analyze file - count occurences of bytes FOR x = 0 TO length-1 INCR idx(STR$(data[x])) NEXT ' Sort array based on value SORT idx DOWN ' Get the order of indexes dlm$ = OBTAIN$(idx) ' Array with the values for the 28 special chars. ' We declare 256 and leave the other elements 0. DECLARE new[256] = { 0 } ' Determine position for each byte FOR x = 1 TO AMOUNT(dlm$) IF x BETWEEN 1 AND 28 THEN new[VAL(TOKEN$(dlm$, x))] = x NEXT result = MEMORY(31+length*2) ' Identifier POKE result, ASC("B") POKE result+1, ASC("A") POKE result+2, ASC("Z") byte = 3 ' Store table FOR x = 1 TO 28 POKE result+byte+x-1, VAL(TOKEN$(dlm$, x)) NEXT DECLARE buf TYPE uint32_t INCR byte, 28 bits = 0 ' Determine value to be poked based on position FOR x = 0 TO length-1 pk = new[data[x]] IF pk BETWEEN 1 AND 4 THEN buf = buf | ((((pk-1) << 2)| 3) << bits) INCR bits, 4 ELIF pk BETWEEN 5 AND 12 THEN buf = buf | ((((pk-5) << 2)| 2) << bits) INCR bits, 5 ELIF pk BETWEEN 13 AND 28 THEN buf = buf | ((((pk-13) << 2)| 1) << bits) INCR bits, 6 ELSE buf = buf | (( data[x] << 2) << bits) INCR bits, 10 ENDIF ' Lower byte full? Poke to memory WHILE bits > 7 POKE result + byte, (buf & 255) buf = (buf >> 8) DECR bits, 8 INCR byte WEND NEXT POKE result + byte, (buf & 255) BSAVE result TO file$ & ".baz" SIZE byte+1 FREE result, data PRINT "Done in ", TIMER, " msecs. New size is ", FILELEN(file$ & ".baz")*100/length, "% from original."