Structured Binary Data
This page will document my thoughts and design ideas for the structured binary data project. The project aims to address #317; a description of my overall approach can be found on the GSoC project page.
Requirements
- View on different levels; for instance, view the integer and sequence of bytes comprising a string if necessary.
- Check whether files are consistent.
- Handle broken files.
- Don’t try to read the whole file at once.
- Allow full modifications. Ideally, allow creation of a whole filesystem from scratch.
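As a rough illustration of the first and fourth requirements, the sketch below shows one string at several levels while reading only the bytes it needs. The layout (a one-byte length followed by that many bytes) and the name `view_string` are assumptions made for the example, not part of the project.

```python
import io
import struct


def view_string(stream, offset):
    """Return three views of a length-prefixed string stored at `offset`.

    Hypothetical layout: one unsigned byte holding the length, then the bytes.
    """
    stream.seek(offset)  # jump to the field instead of reading the whole file
    (length,) = struct.unpack("<B", stream.read(1))
    payload = stream.read(length)
    return {
        "value": payload.decode("utf-8"),  # high-level view: the string
        "length": length,                  # lower-level view: the integer
        "bytes": payload,                  # lowest-level view: the raw bytes
    }


# Two bytes of unrelated data, the string, then more unrelated data.
data = io.BytesIO(b"\x00\x00\x05hello\xff\xff")
views = view_string(data, 2)
```

Only the six bytes belonging to the string are read here; the surrounding data is skipped with `seek`.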
Existing Tools
I am researching existing tools related to my project, so they can be used for inspiration.
Construct
A Python library for creating declarative structure definitions. Each instance
of the Construct class has a name, and knows how to read from a stream, write
to a stream, and determine its length. Some predefined Construct subclasses
use an arbitrary Python function evaluated at runtime, or behave differently
depending on whether sub‐Constructs throw exceptions. Const uses a
sub‐Construct and makes sure the value is correct. Also has lazy Constructs.
Unfortunately, if you change the size of a structure, you still have to change everything else manually.
TODO: look at issues and forks.
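The interface described above (a name, plus read, write, and length operations, with Const checking a wrapped sub-object) can be sketched in plain Python. This is a minimal sketch and not the Construct library's actual API: the classes `Field`, `Const`, and `Struct` below are hypothetical stand-ins built on the standard `struct` module.

```python
import io
import struct


class Field:
    """Hypothetical fixed-size field: has a name, reads, writes, knows its length."""

    def __init__(self, name, fmt):
        self.name = name
        self.fmt = fmt  # format string for Python's struct module

    def length(self):
        return struct.calcsize(self.fmt)

    def read(self, stream):
        (value,) = struct.unpack(self.fmt, stream.read(self.length()))
        return value

    def write(self, stream, value):
        stream.write(struct.pack(self.fmt, value))


class Const:
    """Hypothetical wrapper: delegates to a sub-field and checks the value."""

    def __init__(self, sub, expected):
        self.name = sub.name
        self.sub = sub
        self.expected = expected

    def length(self):
        return self.sub.length()

    def read(self, stream):
        value = self.sub.read(stream)
        if value != self.expected:
            raise ValueError(f"{self.name}: expected {self.expected!r}, got {value!r}")
        return value

    def write(self, stream, value=None):
        # The caller's value is ignored; the constant is always written.
        self.sub.write(stream, self.expected)


class Struct:
    """Hypothetical sequence of named members, read and written in order."""

    def __init__(self, name, *members):
        self.name = name
        self.members = members

    def length(self):
        return sum(m.length() for m in self.members)

    def read(self, stream):
        return {m.name: m.read(stream) for m in self.members}

    def write(self, stream, values):
        for m in self.members:
            m.write(stream, values.get(m.name))


header = Struct(
    "header",
    Const(Field("magic", "<H"), 0xCAFE),
    Field("version", "<B"),
    Field("size", "<I"),
)

buf = io.BytesIO()
header.write(buf, {"version": 2, "size": 16})
buf.seek(0)
decoded = header.read(buf)
```

The `size` member here is ordinary data: nothing recomputes it when the structure changes, which is the kind of manual bookkeeping noted above.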
BinData
Makes good use of Ruby syntax; mostly has the same features as Construct.
Imperative DSLs
DSLs in this category are used in an obvious, deterministic manner, and complex
structures can’t be edited. They are simple imperative languages in which
fields, structures, bitstructures, and arrays can be defined. The length,
decoded value, and presence of fields can be determined by expressions using
any previously decoded field, and structures can use if/while/continue/break
and similar statements. Structures can inherit from other structures, meaning
that the parent’s fields are present at the beginning of the child. Statements
can move to a different offset in the input data. There may be a real
programming language that can be used along with the DSL.
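The features listed above can be imitated in ordinary Python to show what such a description amounts to. This is only a sketch under assumed conventions: the record layout and the names `parse_base` and `parse_named` are invented for illustration and do not come from any of the tools below.

```python
import io
import struct


def read_u8(stream):
    return struct.unpack("<B", stream.read(1))[0]


def read_u16(stream):
    return struct.unpack("<H", stream.read(2))[0]


def parse_base(stream):
    """Parent structure: its fields come first in every child."""
    record = {}
    record["kind"] = read_u8(stream)
    record["flags"] = read_u8(stream)
    return record


def parse_named(stream):
    """Child structure: the parent's fields, then fields that depend on them."""
    record = parse_base(stream)  # inherited fields are decoded first
    record["name_len"] = read_u8(stream)
    # Length determined by a previously decoded field.
    record["name"] = stream.read(record["name_len"]).decode("ascii")
    # Presence determined by a previously decoded field.
    if record["flags"] & 0x01:
        record["data_offset"] = read_u16(stream)
        # Move to a different offset in the input, then return.
        here = stream.tell()
        stream.seek(record["data_offset"])
        record["data"] = stream.read(2)
        stream.seek(here)
    return record


# kind=7, flags=1, name_len=3, "abc", data_offset=8, then two data bytes.
raw = bytes([7, 0x01, 3]) + b"abc" + struct.pack("<H", 8) + b"\xde\xad"
record = parse_named(io.BytesIO(raw))
```

Because the parser is a sequence of statements that only runs forward through the input, there is no obvious way to run it in reverse, which is one way to see why complex structures are hard to edit with this approach.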
- PyFFI
  - Lets you create or modify files instead of just reading them. Fields can refer to blocks of data elsewhere in the file. Uses an XML format.
- Synalyze It!
  - Not completely imperative; if you declare optional structs where part of the data is constant, the correct struct will be displayed. Has a Graphviz export of file structure. Uses an XML format.
- Other free
  - Wireshark Generic Dissector.
- Other proprietary
  - Hex Editor Neo.
Less interesting tools
- Simple formats in hex editors
  - These support static fields and dynamic lengths only: FlexHex, HexEdit, Hex Workshop, and Okteta.
- Simple formats elsewhere
  - ffe, Node Packet, and Scapy can only handle trivial structures. Python’s struct and VStruct use concise string formats to describe simple structures. Hachoir uses Python for most things.
- Protocol definition formats
  - ASN.1, MIDL, Piqi, and other IPC implementations go in the other direction: they generate a binary format from a text description of a structure. ASN.1 in particular has many features.
- Wireshark and tcpdump
  - As the Construct wiki notes, you would expect these developers to have some sort of DSL, but they just use C for everything. Wireshark does use ASN.1, Diameter, and MIDL for protocols developed with them.
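For reference, the concise string formats used by Python’s struct module, mentioned in the list above, look like this. The header layout (a four-byte tag, a 16-bit version, and a 32-bit size) is made up for the example.

```python
import struct

# "<"  little-endian, no padding
# "4s" four raw bytes
# "H"  unsigned 16-bit integer
# "I"  unsigned 32-bit integer
HEADER = struct.Struct("<4sHI")

packed = HEADER.pack(b"DEMO", 1, 512)
magic, version, size = HEADER.unpack(packed)
```

The whole layout fits in one short string, but the format has no way to express a length or a field's presence that depends on another field.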