GFF
The Binary Generic File Format (GFF) is intended to be used for almost every file type in a game so it must support as many data types as are needed.
A GFF editor is included with the toolset.
General overview
In a GFF, data is grouped into structs, with the file level being the top level struct. Each struct can have any number of fields of any type including structs and variable size lists of any type.
Differences between GFF V4.0 and GFF V3.2
For those familiar with GFF V3.2, here is a short list of differences between V3.2 and the new version:
- 4-byte numerical labels instead of 16 byte strings for faster access
- Supports lists of any type for convenience (instead of only structs)
- All struct data is stored in the data block (rather than fields) to allow for faster loads. Fields provide a mapping of where the data is stored.
- Supports references to mimic pointers to allow complex data structures
- A few fields were added to the Header:
  - File Version records the version of the file type using the GFF
  - Target Platform records the intended platform for the resource
- In general, the format has been simplified and unnecessary sections were removed (label array, field indices array and list indices array)
Overall, the new format should result in smaller files with faster access times.
File Format Conceptual Overview
Field Data Types
There will be a starting list of supported types and new types can be added as needed. The file format can support as many as 65,535 different basic types.
Type | Type ID | Discussion |
---|---|---|
UINT8 | 0 | |
INT8 | 1 | |
UINT16 | 2 | |
INT16 | 3 | |
UINT32 | 4 | |
INT32 | 5 | |
UINT64 | 6 | |
INT64 | 7 | |
FLOAT32 | 8 | |
FLOAT64 | 9 | |
Vector3f | 10 | |
Vector4f | 12 | |
Quaternionf | 13 | |
ECString | 14 | An ECString is always a reference to elsewhere in the raw data (regardless of flags). The string is essentially a list of WCHARs. |
Color4f | 15 | |
Matrix4x4f | 16 | |
TlkString | 17 | A TlkString is not actually a string, but a pair of UInt32 values. One is the index of a string in the TLK string table. |
Strings are stored as a list of WCHARs.
There's also a "Generic" type that's only usable in lists (and references?), with type ID 0xFFFF
Field Labels
The Binary GFF uses 4-byte IDs to label each field. Within each struct, each field must have a unique ID. These IDs could be string hashes or numerical IDs. The only requirement is that the reader and writer of these files agree on the IDs. A large list of common IDs can be found by opening Dragon Age\tools\plugins\EditorGff40.dll with any text editor and searching for the string BinaryGFFIDList.h
File Format Physical Layout
Platform dependence
The endianness of the data is that of the target platform. For example, data files for Intel processors should be little endian, whereas data files for PowerPC processors should be big endian.
There may be other differences in the files generated for different platforms in order to achieve proper alignment or the desired in memory layout of the data.
Overall File Layout
Header |
Struct Array |
Field Array |
Raw Data Block |
Header
The header is located at the start of the file and contains the following values.
Value | Description |
---|---|
GFFMagicNumber | All GFF files will start with the hexadecimal value 0x47464620, which is the ASCII encoding of “GFF ” (with a trailing space). |
GFFVersion | 4 bytes representing the version of the underlying GFF format. This should be “V4.0” or 0x56342E30 for all files using this format. |
TargetPlatform | 4-byte field indicating the intended target platform for this file. There will most likely be more specialized targets for the PC in the future. |
FileType | 4-byte field used to identify this file type. By convention it should be the three letter file extension followed by a space. |
FileVersion | 4-byte version of the FileType. By convention it should be “Vx.x” or “xx.x” where x is a digit. |
StructCount | 4-byte unsigned number of elements in the Struct Array. |
DataOffset | 4-byte unsigned offset from the beginning of the file to the Raw Data Block. |
The first five fields are always in big endian and never byteswapped. This keeps those fields human readable on any machine.
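As an illustration of the header layout, here is a minimal Python sketch that reads the seven 4-byte header values. The function name `parse_gff_header`, the returned dictionary keys, and the `byte_order` parameter are assumptions made for this example; the sample tags `PC  ` and `UTI ` used in the usage lines are placeholders, not values taken from this page.

```python
import struct

HEADER_SIZE = 28  # seven 4-byte values


def parse_gff_header(buf, byte_order="<"):
    """Parse the GFF V4.0 header at the start of buf.

    byte_order is "<" (little endian) or ">" (big endian) and only
    applies to the last two fields; the first five are read as
    4-character tags in file order.
    """
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too small for a GFF header")
    tags = struct.unpack_from("4s4s4s4s4s", buf, 0)
    magic, gff_version, platform, file_type, file_version = (
        t.decode("ascii") for t in tags
    )
    if magic != "GFF ":
        raise ValueError("missing GFF magic number")
    if gff_version != "V4.0":
        raise ValueError("unsupported GFF version: " + gff_version)
    # StructCount and the data offset follow the target platform's endianness.
    struct_count, data_offset = struct.unpack_from(byte_order + "II", buf, 20)
    return {
        "target_platform": platform,
        "file_type": file_type,
        "file_version": file_version,
        "struct_count": struct_count,
        "data_offset": data_offset,
    }


# Usage with a hand-built header (placeholder tags, little endian counts).
sample = b"GFF V4.0PC  UTI V1.0" + struct.pack("<II", 3, 124)
header = parse_gff_header(sample)
print(header["file_type"], header["struct_count"], header["data_offset"])
```

A reader that does not know the target platform in advance would need some way to choose `byte_order`; this sketch leaves that decision to the caller.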
Struct Array
A struct is a grouping of data. A struct definition describes which data is in a struct. Many instances of a struct type may occur in a single file but there will only be one definition for each struct type.
The Struct Array starts immediately after the header. The first element in the Struct Array is the Top-Level Struct for the GFF file and it describes what the file looks like at the top level. Since the Top-Level Struct is always present, every GFF file contains at least one element in the struct array.
The Struct Array looks like this:
Struct 0 (Top-Level Struct)* |
Struct 1 |
Struct 2 |
... |
Struct N-1** |
*Struct 0 is always present
**N = Header.StructCount
The GFF Struct contains the values listed in the table below.
Value | Description |
---|---|
StructType | 4-byte programmer defined ID |
FieldCount | 4-byte number of fields in the struct |
FieldOffset | 4-byte unsigned offset from the beginning of the file to the first field in the struct |
StructSize | 4-byte unsigned size of the chunk of data representing the struct |
All the fields for a struct are contiguous so knowing the address of the first one and the number of fields is enough information to access any element in the struct.
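The following Python sketch shows one way to read the Struct Array and to compute where a given field entry lives. It assumes a 28-byte header (seven 4-byte values), 16-byte struct entries, 12-byte field entries (three 4-byte values), and an unsigned FieldCount; the names `StructDef`, `parse_struct_array` and `field_entry_offset` are invented for this example.

```python
import struct
from typing import NamedTuple

STRUCT_ENTRY_SIZE = 16  # StructType, FieldCount, FieldOffset, StructSize
FIELD_ENTRY_SIZE = 12   # Label, FieldType, Index


class StructDef(NamedTuple):
    struct_type: bytes   # raw 4-byte programmer defined ID
    field_count: int
    field_offset: int    # from the beginning of the file
    struct_size: int


def parse_struct_array(buf, struct_count, byte_order="<", header_size=28):
    """Read struct_count entries that start right after the header."""
    fmt = byte_order + "4sIII"
    return [
        StructDef(*struct.unpack_from(fmt, buf, header_size + i * STRUCT_ENTRY_SIZE))
        for i in range(struct_count)
    ]


def field_entry_offset(struct_def, i):
    """File offset of the i-th field entry of a struct (fields are contiguous)."""
    if not 0 <= i < struct_def.field_count:
        raise IndexError("field index out of range")
    return struct_def.field_offset + i * FIELD_ENTRY_SIZE


# Usage: two hand-built struct entries after a zeroed 28-byte header.
buf = (bytes(28)
       + struct.pack("<4sIII", b"TOP ", 2, 60, 8)
       + struct.pack("<4sIII", b"ITEM", 1, 84, 4))
defs = parse_struct_array(buf, 2)
print(defs[0].field_count, field_entry_offset(defs[0], 1))
```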
Field Array
The Field Array starts immediately after the Struct Array. Each field entry describes a piece of data contained in a struct. Each struct’s fields must be contiguous in the array and appear in increasing order of their labels. The fields for the Top Level Struct appear first in the array.
The Field Array looks like this:
Struct 0 field 0 |
Struct 0 field 1 |
Struct 0 … |
Struct 0 field Struct.FieldCount - 1 |
Struct x field 0 |
Struct x field 1 |
Struct x … |
Struct x field Struct.FieldCount - 1 |
... |
Each field looks like this:
Value | Description |
---|---|
Label | 4-byte label used to look up the field |
FieldType | 4-byte value describing the type of the field (see below for explanation) |
Index | 4-byte unsigned offset to the location of the data |
The label is just a 4-byte value used to find the correct field. They could be string hashes or some other numerical ID.
The index field stores the location of the data as an offset from the beginning of the struct in the data block. This can result in padding within the structs which can be garbage data for all we care (although by convention we usually start with 0xFF). In particular this happens when trying to maintain alignment for data types such as 16 byte alignment for vectors, etc.
The type is broken up into two 2-byte values that describe the type of the field.
The type looks like this:
Value | Description |
---|---|
TypeID | A 2-byte unsigned number indicating the type |
Flags | 2 bytes of bit flags |
The following flags are currently defined starting from the most significant bits.
Bit | Flag Description |
---|---|
1 (MSB) | List Flag. If this flag is set then this type is a list of the described type. |
2 | Struct flag. If this flag is set then this type is a struct. If the struct flag is not set then the TypeID is the integer ID of the field’s basic type. If the struct flag is set then the TypeID is the index of this struct’s description in the Struct Array. |
3 | Reference flag. If the reference flag is set then the data stored in the data block is actually an offset from the beginning of the data block to the location of the actual data. References can be used to mimic pointers in a GFF. |
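To make the field entry and its flags concrete, here is a hedged Python sketch that decodes one 12-byte field entry. It assumes the 2-byte type ID precedes the 2-byte flags in the file (the order of the table above) and that "bit 1 (MSB)" corresponds to the mask 0x8000 of the 16-bit flags value; the names `Field`, `parse_field` and the flag constants are made up for this example.

```python
import struct
from typing import NamedTuple

# Flag masks, counted from the most significant bit of the 16-bit flags value.
LIST_FLAG = 0x8000
STRUCT_FLAG = 0x4000
REFERENCE_FLAG = 0x2000


class Field(NamedTuple):
    label: int
    type_id: int   # basic type ID, or Struct Array index if is_struct
    flags: int
    index: int     # offset of the data from the beginning of the struct

    @property
    def is_list(self):
        return bool(self.flags & LIST_FLAG)

    @property
    def is_struct(self):
        return bool(self.flags & STRUCT_FLAG)

    @property
    def is_reference(self):
        return bool(self.flags & REFERENCE_FLAG)


def parse_field(buf, offset, byte_order="<"):
    """Decode the field entry stored at offset in buf."""
    return Field(*struct.unpack_from(byte_order + "IHHI", buf, offset))


# Usage: a list of structs whose definition is entry 1 of the Struct Array.
raw = struct.pack("<IHHI", 200, 1, LIST_FLAG | STRUCT_FLAG, 16)
field = parse_field(raw, 0)
print(field.type_id, field.is_list, field.is_struct, field.is_reference)
```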
Raw Data Block
The data block is where the actual data is stored. The data for the top level struct is stored at the beginning of the block. All other data will be fields in the struct or accessed by reference.
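Combining the header's data offset with a field's Index gives the file position of a value. The sketch below illustrates this for a UINT32 field; `struct_start` is a hypothetical parameter holding the offset of the struct instance within the data block (0 for the top level struct), and both function names are invented for this example.

```python
import struct


def field_data_position(data_offset, struct_start, field_index):
    """File offset of a field's data.

    data_offset  -- offset of the Raw Data Block, from the header
    struct_start -- offset of the struct instance inside the data block
    field_index  -- the Index value of the field entry
    """
    return data_offset + struct_start + field_index


def read_uint32_field(buf, data_offset, struct_start, field_index, byte_order="<"):
    pos = field_data_position(data_offset, struct_start, field_index)
    return struct.unpack_from(byte_order + "I", buf, pos)[0]


# Usage: a fake file whose data block starts at byte 40 and whose
# top level struct holds two UINT32 values at Index 0 and Index 4.
buf = bytes(40) + struct.pack("<II", 7, 99)
print(read_uint32_field(buf, data_offset=40, struct_start=0, field_index=4))
```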
Lists
The address pointed to by a list is actually a reference to another location in the file which stores the list. The first thing in the list is a 4-byte unsigned length of the list followed by the elements.
This is what the list looks like:
Length |
Element 0 |
Element 1 |
... |
Element Length - 1 |
Empty lists can just store a null reference to prevent creating another block.
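A list of fixed-size elements could be read along the following lines. This is only a sketch: it assumes elements are packed back to back after the length, that the reference is an offset from the start of the data block, and that a null reference is the value 0xFFFFFFFF. The function `read_list` and its `elem_format` parameter (a single-value `struct` format character such as `"H"` or `"I"`) are inventions of this example.

```python
import struct

NULL_REFERENCE = 0xFFFFFFFF


def read_list(data_block, field_pos, elem_format, byte_order="<"):
    """Read the list whose reference is stored at field_pos in data_block."""
    (ref,) = struct.unpack_from(byte_order + "I", data_block, field_pos)
    if ref == NULL_REFERENCE:
        return []  # empty list stored as a null reference
    (length,) = struct.unpack_from(byte_order + "I", data_block, ref)
    elem = struct.Struct(byte_order + elem_format)
    first = ref + 4  # elements follow the 4-byte length
    return [
        elem.unpack_from(data_block, first + i * elem.size)[0]
        for i in range(length)
    ]


# Usage: offset 0 holds a reference to a UINT16 list, offset 4 a null reference.
data_block = (struct.pack("<I", 8)
              + struct.pack("<I", NULL_REFERENCE)
              + struct.pack("<I", 3)
              + struct.pack("<HHH", 10, 20, 30))
print(read_list(data_block, 0, "H"), read_list(data_block, 4, "H"))
```

Under the additional assumption that a WCHAR is 2 bytes, the same function with `elem_format="H"` would return the code units of a string.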
Generic lists (lists whose TypeID is 0xFFFF) store (type, reference) pairs, where type is the FieldType (including flags) of the individual element and reference is a standard reference pointing to the element's data. Each entry in a generic list is therefore 8 bytes long.
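The 8-byte entries of a generic list can be walked as in the sketch below. It assumes a generic list starts with the same 4-byte length as any other list, and that each entry's type is laid out as a 2-byte type ID followed by 2 bytes of flags; `read_generic_list` is a name invented for this example.

```python
import struct

NULL_REFERENCE = 0xFFFFFFFF
GENERIC_ENTRY_SIZE = 8  # 4-byte type plus 4-byte reference


def read_generic_list(data_block, list_ref, byte_order="<"):
    """Return (type_id, flags, reference) for each entry of a generic list.

    list_ref is the offset of the list within the data block.
    """
    if list_ref == NULL_REFERENCE:
        return []
    (length,) = struct.unpack_from(byte_order + "I", data_block, list_ref)
    first = list_ref + 4
    return [
        struct.unpack_from(byte_order + "HHI", data_block,
                           first + i * GENERIC_ENTRY_SIZE)
        for i in range(length)
    ]


# Usage: a two-entry list; the second entry carries a null reference.
data = (struct.pack("<I", 2)
        + struct.pack("<HHI", 4, 0, 20)
        + struct.pack("<HHI", 14, 0, NULL_REFERENCE))
for type_id, flags, ref in read_generic_list(data, 0):
    print(type_id, flags, ref)
```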
References
A reference is a 4-byte unsigned offset from the beginning of the data block to the location of the data.
Null references are stored as 0xFFFFFFFF. Null references can be used in lists, as well as individual reference items.
Improvements and optimizations
Field look-up
Field look-up is faster in this version of the GFF because it requires an integer comparison instead of a string comparison. Additionally, storing fields in order allows efficient search for the specific field. Finally, knowing what the data structure should look like allows the program to make a good guess as to where the data should be. If the guess is wrong (because the file is an older version) then the program only has to start searching at the initial guess, which is still a faster search than without a good guess.
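The look-up strategy can be sketched in Python as follows. The function `find_field` and its `guess` parameter are hypothetical; note that when the guess misses, this sketch falls back to a plain binary search over the whole sorted label list rather than searching outward from the guess.

```python
from bisect import bisect_left


def find_field(labels, wanted, guess=None):
    """Return the position of wanted in the sorted list labels, or -1.

    guess is the position where the reader expects the field to be,
    for example its position in the newest version of the file type.
    """
    if guess is not None and 0 <= guess < len(labels) and labels[guess] == wanted:
        return guess  # the guess was right: one integer comparison
    i = bisect_left(labels, wanted)  # labels are stored in increasing order
    if i < len(labels) and labels[i] == wanted:
        return i
    return -1


# Usage: labels of one struct, in increasing order.
labels = [10, 20, 30, 40]
print(find_field(labels, 30, guess=2), find_field(labels, 25))
```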
Binding tables
It is possible to build a binding table for a type of file that will be read in often. The binding table will allow data to be loaded from the GFF without parsing through the file each time.
Direct loads to memory
Using this format it is possible to optimize specific file types by writing the GFF data in exactly the way it will be laid out in memory. After verifying that the file type is up to date, the game can read the data block into memory and cast the pointer directly to the corresponding C data type. This optimization would not result in any loss of generality since the GFF will still be accessible in the usual way.
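As a rough Python analogue of this cast, `ctypes` can overlay a structure definition on the bytes of the data block. The `Waypoint` layout, the `load_direct` function and the expected version string below are all hypothetical; a real file type would define its own layout and version check.

```python
import ctypes
import struct


class Waypoint(ctypes.LittleEndianStructure):
    # Hypothetical in-memory layout; all members are 4 bytes wide,
    # so no padding is inserted between them.
    _fields_ = [
        ("id", ctypes.c_uint32),
        ("x", ctypes.c_float),
        ("y", ctypes.c_float),
        ("z", ctypes.c_float),
    ]


def load_direct(data_block, file_version, expected_version="V1.0", offset=0):
    """Interpret bytes of the data block as a Waypoint without per-field parsing."""
    if file_version != expected_version:
        # Layout may differ: the caller should fall back to the usual field look-up.
        raise ValueError("file version mismatch, direct load not possible")
    return Waypoint.from_buffer_copy(data_block, offset)


# Usage with a hand-built little endian data block.
raw = struct.pack("<Ifff", 7, 1.0, 2.0, 3.0)
wp = load_direct(raw, "V1.0")
print(wp.id, wp.x, wp.y, wp.z)
```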