Binary savegames insight

Because I’ve been dealing a lot lately with the new ‘binary’ savegame format of EU4 and CK2, here are a couple of paragraphs that may prove helpful to anyone who also wants to get their hands dirty with these files, to the point of editing them in a hex editor as needed.

Savegame files have long followed the tradition of most files in Paradox games: they were text-based and easy to understand, so much so that they could be edited in any text editor. This worked fine, at the cost of savegame size, which for the newest games falls within the 30-40 MB range, though I remember CK1 saves being at least twice as big.

For many years this was an acceptable trade-off, until the arrival of Ironman mode, which requires very frequent savegame writes (usually monthly in game time) and so makes the speed at which saves are created crucial. To improve matters, compression was introduced and is enforced for all such saves; it trades lighter disk load for heavier processor usage and saves some time overall. But this wasn’t enough, and that’s why ‘binary savegames’ were born, to address three inherent issues of the text files:

  1. Attribute identifiers were repeated over and over, ranging from short ones (like ‘name’, ‘type’, ‘id’) to more space-consuming ones (like ‘technology_group’, ‘government_rank’, ‘revolutionary_flag_texture’), and all those letters took actual space in the file (and read/write time). Because there is only a handful of unique identifiers (several hundred at most), their full text representation (x bytes long, where x = number of letters) was replaced with a short identifier (2 bytes), which I will refer to as a ‘token’ from here on.
  2. A lot of the saved data consists of numbers, but in text savegames it was stored as text. Not only is this usually more space-consuming (1 digit in text = 1 byte), it also requires converting numbers to text when saving and doing the opposite conversion when loading.
  3. Finally, there’s a lot of markup that is inconsequential for the computer, such as line breaks and indentation, which was only there for human readability.
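
To put rough numbers on the first two points, here is a small Python sketch. It only measures sizes; the identifier and the number are examples picked for illustration, and the 4-byte width is the ‘Integer’ size described later in this post.

```python
import struct

# Point 1: in a text save an identifier costs one byte per character,
# every time it appears; in a binary save it is always a 2-byte token.
identifier = "revolutionary_flag_texture"
text_size = len(identifier.encode("ascii"))
TOKEN_SIZE = 2

# Point 2: a number written as text costs one byte per digit,
# while a 32-bit integer always takes 4 bytes.
number = 56456976
number_text_size = len(str(number))
number_binary_size = len(struct.pack("<i", number))

print(text_size, "bytes as text vs", TOKEN_SIZE, "bytes as a token")
print(number_text_size, "bytes as text vs", number_binary_size, "bytes as an integer")
```

Small numbers such as ‘1’ are actually shorter as text, which is why the text form is only “usually” the bigger one.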

This way, addressing those issues, rather than having this file structure:

We arrive at this:

So how to read through this new representation of data?

First, theory. All those hex codes in the binary version of the file represent one of a few possible structures:

1. Tokens
Tokens, which I already mentioned, are shortened representations of text identifiers and always take 2 bytes. In this example they are ‘date’, ‘save_game’, ‘player’, etc. Besides its hardcoded textual representation, a token sometimes also has attached properties: the type of data that follows it (more on types below) and whether any text snippet that follows is enclosed in quotes. Each game has its own set of tokens, supplied with the Ironmelt program (the ‘eu4bin.csv’ and ‘ck2bin.csv’ files).

2. Special codes
Usually the type of data is not implied by the token but set explicitly by one of the special codes, which are tokens of a sort themselves and also always take 2 bytes. Special codes also denote the equals sign and the braces that build the file structure.

The special codes are as follows:

Code     Type / character
01 00    =
03 00    {
04 00    }
0C 00    Integer (or Date)
0D 00    Float
0E 00    Boolean
0F 00    String
14 00    Integer (or Date)
17 00    String
67 01    Float5
90 01    Float5
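
As a rough illustration, the table can be written as a Python lookup keyed by the two raw bytes exactly as they appear in the file. The names `SPECIAL_CODES` and `classify` are made up for this sketch and do not come from Ironmelt.

```python
# Special codes from the table above, keyed by their raw bytes in file order.
SPECIAL_CODES = {
    bytes.fromhex("0100"): "=",
    bytes.fromhex("0300"): "{",
    bytes.fromhex("0400"): "}",
    bytes.fromhex("0C00"): "Integer",  # or Date, depending on the token
    bytes.fromhex("0D00"): "Float",
    bytes.fromhex("0E00"): "Boolean",
    bytes.fromhex("0F00"): "String",
    bytes.fromhex("1400"): "Integer",  # or Date, depending on the token
    bytes.fromhex("1700"): "String",
    bytes.fromhex("6701"): "Float5",
    bytes.fromhex("9001"): "Float5",
}


def classify(two_bytes):
    """Return the special-code meaning, or 'token' for any other 2-byte code."""
    return SPECIAL_CODES.get(bytes(two_bytes), "token")


print(classify(bytes.fromhex("0100")))  # the equals sign
print(classify(bytes.fromhex("4D28")))  # not in the table, so a token
```

Anything that is not in the table is simply assumed to be a token here; a real tool would then look it up in the game's CSV file.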

3. Data
Data itself, with its type either implied by a preceding token or set by a special code. Multi-byte values are stored little-endian (least significant byte first); for example, the length bytes ‘0F 00’ in the walkthrough below mean 15.

a) ‘Boolean’ is the simplest type, 1 byte long, which is either ‘01’ for true or ‘00’ for false;
b) ‘Integer’ is a normal 4-byte little-endian integer number;
c) ‘Float’ is stored as an ‘Integer’ (also 4 bytes long); the game interprets it as a float by dividing it by 1000, so it has 3 digits of precision in its text representation;
d) ‘Float5’ is a higher-precision version of ‘Float’, also stored as an ‘Integer’; it occupies 8 bytes (but only the first 4 are actually filled in and interpreted); the game interprets it as a float by dividing it by 32768, and it has 5 digits of precision in its text representation;
e) ‘Date’ is also stored as an ‘Integer’ and also takes 4 bytes; its proper interpretation is described below;
f) ‘String’ represents text entries: the first 2 bytes give the string length (in little-endian notation), followed by the text encoded as an ANSI string (1 character = 1 byte); no string termination character is needed.
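
The types above could be read along these lines in Python. This is a minimal sketch under a few assumptions of mine: values are taken least significant byte first (which is how the example bytes in this post decode, e.g. ‘0F 00’ as a length of 15), integers are treated as signed, and ‘ANSI’ is taken to mean Windows-1252. The function names are made up; each returns the value and the position of the next unread byte.

```python
import struct


def read_boolean(buf, pos):
    """1 byte: 01 is true, 00 is false."""
    return buf[pos] != 0, pos + 1


def read_integer(buf, pos):
    """4 bytes, least significant byte first (signedness is an assumption)."""
    return struct.unpack_from("<i", buf, pos)[0], pos + 4


def read_float(buf, pos):
    """Stored like an Integer, divided by 1000 when interpreted."""
    raw, pos = read_integer(buf, pos)
    return raw / 1000, pos


def read_float5(buf, pos):
    """8 bytes on disk; only the first 4 are interpreted, divided by 32768."""
    raw, _ = read_integer(buf, pos)
    return raw / 32768, pos + 8


def read_string(buf, pos):
    """2-byte length, then that many single-byte characters, no terminator."""
    length = struct.unpack_from("<H", buf, pos)[0]
    start = pos + 2
    text = buf[start:start + length].decode("cp1252")  # assumed ANSI code page
    return text, start + length


print(read_integer(bytes.fromhex("10775D03"), 0))
print(read_string(bytes.fromhex("030057414C"), 0))
```

‘Date’ is read with `read_integer` and then converted as described next.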

The date, as stored in the binary savegame file, is the number of hours that have passed since 1 January 5000 B.C. (as an ‘Integer’). To arrive at a specific date you have to take some regularities into account (such as the different lengths of months) but leave out others (there are no leap years). Here is the code that does the math:

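string DecodeDate(int input)
{
   // Month lengths of a non-leap year; the games never use leap years.
   int[] monthLength = new int[] { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

   int hour = input % 24; // hour of the day (not part of the returned text)
   int year = -5000 + input / 24 / 365; // whole 365-day years since 5000 B.C.
   int day = 1 + input / 24 % 365; // 1-based day within the year
   int month = 1;

   // Subtract whole months until the remaining days fit into the current one.
   for (int i = 0; i < monthLength.Length; i++)
   {
      if (day > monthLength[i])
      {
         day -= monthLength[i];
         month++;
      }
      else
      {
         break;
      }
   }
   return year + "." + month + "." + day;
}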
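
For anyone who prefers to experiment in Python, here is a direct port of the routine above, together with a worked example on the four date bytes 10 77 5D 03 that appear in the walkthrough further down. Reading those bytes least significant byte first is my reading of the format; it is the order that yields the start date of the game.

```python
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def decode_date(hours):
    """Convert hours since 1 Jan 5000 B.C. (no leap years) to 'year.month.day'."""
    days = hours // 24                # whole days since the epoch
    year = -5000 + days // 365        # every year has exactly 365 days
    day = 1 + days % 365              # 1-based day within the year
    month = 1
    for length in MONTH_LENGTHS:
        if day <= length:
            break
        day -= length
        month += 1
    return f"{year}.{month}.{day}"


raw = bytes.fromhex("10775D03")
hours = int.from_bytes(raw, "little")  # 0x035D7710 = 56456976
print(hours // 24)                     # 2352374 days
print(decode_date(hours))              # 1444.11.11
```

The arithmetic by hand: 2352374 days is 6444 full years (2352060 days) plus 314 days, so the year is -5000 + 6444 = 1444 and it is the 315th day of the year; January to October take 304 days, leaving the 11th of November.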

So, going back to our example, here’s how we can decode the beginning of a binary file. I left only actual encoded text information in the text preview on the right for clarity, everything else is dots. I gave successive blocks in the file alternating colors so that they are better visible.

45 55 34 62 69 6E

First of all, there is a file type identifier, which is encoded as an ANSI string. ‘EU4bin’ and ‘CK2bin’ denote binary files for the respective game, just as text files start with ‘EU4txt’ and ‘CK2txt’. These identifiers are 6 bytes long.
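
Checking this identifier is a cheap first step for any tool. Below is a small Python sketch; the function name `detect_format` is made up, and it assumes you are looking at the uncompressed data (compressed Ironman saves would have to be unpacked first).

```python
def detect_format(header):
    """Return (game, is_binary) based on the 6-byte file type identifier."""
    magic = header[:6].decode("ascii", errors="replace")
    game, kind = magic[:3], magic[3:]
    if game not in ("EU4", "CK2") or kind not in ("bin", "txt"):
        raise ValueError(f"unrecognised file identifier: {magic!r}")
    return game, kind == "bin"


print(detect_format(bytes.fromhex("45 55 34 62 69 6E")))  # ('EU4', True)
print(detect_format(b"CK2txt"))                           # ('CK2', False)
```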

4D 28

We don’t expect any data yet and this is not a special code, so it must be a token: ‘date’ (you can look it up in eu4bin.csv). The default data type for this token is, as you might have guessed, ‘Date’.

01 00

Next comes the special code for assignment (the equals sign).

0C 00

And another special code, meaning that we should ordinarily expect an ‘Integer’. Here we come to an exception: when the token prescribes ‘Date’, we should ignore the special code that switches the type to ‘Integer’ and follow the token-prescribed type, otherwise the file will not be processed correctly.

10 77 5D 03

After the assignment sign and the (ignored) special code, we can finally read the date. It is stored internally as an ‘Integer’: read least significant byte first, these bytes give 0x035D7710 = 56,456,976 hours, which decodes to 11 November 1444.

69 2C

Another token, this time ‘save_game’, which expects a quoted text string to follow.

01 00

And naturally, a special code for assignment.

0F 00

This time we also find the data type clarified with a special code, which is ‘String’.

0F 00

This one may look misleading, but after the assignment code and the data type code no further special code is needed. These two bytes are the start of the String data and give its length: 0F 00 is 15 (0x0F), so the String will have 15 characters.

69 72 6F 6E 6D 61 6E 5F 62 69 6E 2E 65 75 34

So we grab the next 15 bytes to convert them to ‘ironman_bin.eu4’ text.

38 2A

Time for another token, ‘player’ (we also expect quoted string here).

01 00

Assignment operator

0F 00

Once again we are told that a String follows.

03 00

The length: three bytes (characters).

57 41 4C

Which is “WAL” (yes, I once fired up a hands-off Wallachia game).

C9 2E

Now to something a bit more interesting – we begin with a token, ‘savegame_version’ (no specified type to follow, because as we will see, it won’t be followed directly by any data).

01 00

Assignment operator

03 00

And another special code, this time for left brace, to complete the opening sequence of “savegame_version={“

And in further bytes (colored black), next tokens are placed, followed by assignment operators, integer special codes and integer numbers themselves.

04 00

Everything is closed with a special code for a closing brace.
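
The whole walkthrough can be condensed into a short Python sketch that turns the example bytes back into text. It is a simplification and not how Ironmelt is actually written: the token table holds only the four tokens named in this post, the set `DATE_TOKENS` and the function names are made up, strings are always quoted, ‘ANSI’ is assumed to be Windows-1252, and Float, Float5 and Boolean data are left out.

```python
import struct

# Tokens named in this post, keyed by their raw bytes in file order;
# the complete lists are in eu4bin.csv / ck2bin.csv.
TOKENS = {
    bytes.fromhex("4D28"): "date",
    bytes.fromhex("692C"): "save_game",
    bytes.fromhex("382A"): "player",
    bytes.fromhex("C92E"): "savegame_version",
}
DATE_TOKENS = {"date"}  # assumption: tokens whose Integer payload is a Date
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def decode_date(hours):
    """Hours since 1 Jan 5000 B.C., no leap years -> 'year.month.day'."""
    days = hours // 24
    year, day, month = -5000 + days // 365, 1 + days % 365, 1
    for length in MONTH_LENGTHS:
        if day <= length:
            break
        day -= length
        month += 1
    return f"{year}.{month}.{day}"


def melt(data):
    """Turn a binary savegame fragment into its text form (simplified)."""
    pos, out, last_token = 6, [], None  # skip the 6-byte file identifier
    while pos < len(data):
        code = data[pos:pos + 2]
        pos += 2
        if code == b"\x01\x00":
            out.append("=")
        elif code == b"\x03\x00":
            out.append("{\n")
        elif code == b"\x04\x00":
            out.append("}\n")
        elif code in (b"\x0c\x00", b"\x14\x00"):  # Integer (or Date)
            value = struct.unpack_from("<i", data, pos)[0]
            pos += 4
            is_date = last_token in DATE_TOKENS  # the token's type wins
            out.append((decode_date(value) if is_date else str(value)) + "\n")
        elif code in (b"\x0f\x00", b"\x17\x00"):  # String
            length = struct.unpack_from("<H", data, pos)[0]
            text = data[pos + 2:pos + 2 + length].decode("cp1252")
            pos += 2 + length
            out.append(f'"{text}"\n')  # simplification: always quoted
        else:  # anything else is treated as a token
            last_token = TOKENS.get(code, "unknown_" + code.hex())
            out.append(last_token)
    return "".join(out)


# The bytes decoded step by step above, up to and including the player tag.
SAMPLE = bytes.fromhex(
    "45 55 34 62 69 6E"
    "4D 28 01 00 0C 00 10 77 5D 03"
    "69 2C 01 00 0F 00 0F 00 69 72 6F 6E 6D 61 6E 5F 62 69 6E 2E 65 75 34"
    "38 2A 01 00 0F 00 03 00 57 41 4C"
)
print(melt(SAMPLE))
```

The sample stops before ‘savegame_version’ because the bytes inside its braces are not listed in this post; the brace codes are handled all the same.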

 

For your hex edits I can fully recommend my favorite program: http://www.hexedit.com/ (the site and screenshot look ancient but the program has modern-looking themes and a lot of functionality).

For more detailed insight, you can follow the code at: https://bitbucket.org/CodeOfWar/ceironmelt.

And you can ask questions about decoding in Paradox forums, e.g. the Ironmelt ones.

2 thoughts on “Binary savegames insight”

  • May 20, 2017 at 07:09
    Thank you so much for this. Was trying to figure out a EU4bin save meta file. Could see the string values themselves, but wasn’t sure how they were being delimited. The explanation here made everything so much clearer.
