Changes between Initial Version and Version 1 of Bithenge


Ignore:
Timestamp:
2012-08-20T06:34:18Z (12 years ago)
Author:
Sean Bartell
Comment:

Legend:

Unmodified
Added
Removed
Modified
  • Bithenge

    v1 v1  
     1= Bithenge =
     2
     3[[PageOutline(2-3)]]
     4
     5Bithenge consists of a library and tools that make working with structured
     6binary data easier. The project was started as part of
     7[wiki:GSOC Google Summer of Code 2012] to address #317. The code is at
     8[https://code.launchpad.net/~wtachi/helenos/bithenge lp:~wtachi/helenos/bithenge]
     9and periodic updates are posted to
     10[http://lists.modry.cz/cgi-bin/listinfo/helenos-devel HelenOS-devel].
     11
     12Bithenge is intended to be a general‐purpose library with a variety of uses,
     13but only one tool has been implemented so far. It is a program that reads a
     14script file describing a binary format, applies it to binary data, and prints
     15the decoded result.
     16
     17== Quick start ==
     18
     19The Bithenge source code is in `uspace/app/bithenge` and `uspace/lib/bithenge`
     20in the bithenge branch. It is automatically built in HelenOS to create the
     21`bithenge` application. For Linux, it can be built by running
     22`make -f Makefile.linux`, first in `uspace/lib/bithenge` and then in
     23`uspace/app/bithenge`.
     24
     25The program can be run with `bithenge <script file> <binary file>` to decode a
     26file. There are some example files in
     27[https://bazaar.launchpad.net/~wtachi/helenos/bithenge/files/head:/uspace/dist/src/bithenge uspace/dist/src/bithenge];
     28see the [Bithenge/Syntax syntax documentation].
     29
     30== Overview ==
     31
     32=== Trees ===
     33
     34Bithenge represents all data in the form of a data structure called a “tree,”
     35similar to the data structure represented by JSON. A tree consists of a boolean
     36node, integer node, string node, or blob node, or an internal node with
     37children. A boolean node holds a boolean value, an integer node holds a signed
     38integer, and a string holds a Unicode string.
     39
     40A blob node represents an arbitrary sequence of raw bytes or bits. Blob nodes
     41are polymorphic, allowing any source of raw binary data to be used. Bithenge
     42includes blob node implementations for in‐memory buffers, files, and block
     43devices. An implementation has also been written that reads another task’s
     44virtual memory, but it hasn’t been committed because it’s unclear whether it
     45will be useful in its current form.
     46
     47An internal node has an arbitrary number of children, each with a unique key.
     48The key can be any node other than an internal node. Arrays can be represented
     49by internal nodes with integer keys starting at 0. The tree node can provide
     50children in an arbitrary order; the order will be used when displaying the
     51tree, but should have no semantic significance. Internal nodes are polymorphic
     52and can delay creation of child nodes until necessary, so keeping the whole
     53tree in memory can be avoided.
     54
     55Internal nodes are currently responsible for freeing their own children. In the
     56future, it should be possible for there to be multiple references to the same
     57node, but it isn’t clear whether this should be implemented with symbolic
     58links, an acyclic graph with reference counting, or a full graph.
     59
     60Note that all interpreted data is represented in Bithenge with nodes.
     61Therefore, the word “blob” usually refers to a blob node, and so on.
     62
     63A decoded tree for a FAT filesystem might look something like this:
     64{{{
     65○───bits─▶16
     66
     67├───fat──▶○
     68│         ├───0───▶0xfff0
     69│         ├───1───▶0xffff
     70│         └───2───▶0x0000
     71
     72└───root──▶○
     73           ├───0───▶○
     74           │        ├───name───▶README.TXT
     75           │        └───size───▶0x1351
     76           │
     77           └───1───▶○
     78                    ├───name───▶KERNEL.ELF
     79                    └───size───▶0x38e9a2
     80}}}
     81
     82=== Transforms ===
     83
     84A transform is a function from an input tree and zero or more argument trees to
     85an output tree. One example is `uint32le`, which takes a 4‐byte blob node as
     86the input tree and provides an integer node as the output tree. Another example
     87is `uint_le(len)`, which takes a bit blob node as the input tree and an integer
     88node as an argument tree, and produces an integer node as the output tree.
     89
     90Several transforms, including `uint32le` and `uint_le(len)`, are built in to
     91Bithenge. More complicated transforms can be created by a format provider, as
     92explained below. For instance, the
     93[https://bazaar.launchpad.net/~wtachi/helenos/bithenge/view/head:/uspace/dist/src/bithenge/fat.bh fat.bh]
     94file contains a `fat_filesystem` transform that takes a byte node as the input
     95tree and produces a complex output tree with various decoded information about
     96the filesystem.
     97
     98=== Data providers ===
     99
     100Generally speaking, a data provider is anything that provides data from outside
     101Bithenge in the form of a tree. The Bithenge library includes functions for
     102various data providers, and can automatically create a tree from a prefixed
     103string:
     104
     105||= Prefix =||= Tree Type =||= Example =||= Description =||
     106|| `file:`  || Byte blob node || `file:/textdemo` || Read the contents of a file. This is the default if no prefix is used. ||
     107|| `block:` || Byte blob node || `block:bd/initrd` || Read the contents of a block device. (HelenOS only.) ||
     108|| `hex:`   || Byte blob node || `hex:01000000` || Use a string of hexadecimal characters to create a blob node. ||
     109
     110These prefixes can also be used with the `bithenge` program.
     111
     112=== Format providers ===
     113
     114A format provider creates transforms based on external data, or vice versa.
     115The only implemented format provider is the script parser, which creates a
     116transform from a file written in the
     117[Bithenge/Syntax Bithenge scripting language]. A future idea is the DWARF
     118format provider, which would read DWARF information from an executable and
     119create a transform that could be applied to a task memory dump.
     120
     121=== Library ===
     122
     123The Bithenge library includes APIs for working with trees and creating and
     124applying transforms. It also contains the built‐in transforms, data providers,
     125and format providers. The [Bithenge/Library library overview] describes the API
     126and conventions used; detailed documentation is included in Doxygen comments in
     127the source.
     128
     129=== Clients ===
     130
     131A client is any program that provides a user interface to the Bithenge library.
     132The only implemented client is the simple `bithenge` program described above.
     133In the future, there could be an interactive browser for decoded trees, or a
     134program to generate C code based on a script file.
     135
     136== More information ==
     137
     138* Bithenge/Syntax: the syntax of Bithenge scripts.
     139* Bithenge/Library: the library API, conventions, and internals.
     140* Bithenge/Design: design decisions, limitations, ideas, and inspirations.