| 1 | = Bithenge = |
| 2 | |
| 3 | [[PageOutline(2-3)]] |
| 4 | |
| 5 | Bithenge consists of a library and tools that make working with structured |
| 6 | binary data easier. The project was started as part of |
| 7 | [wiki:GSOC Google Summer of Code 2012] to address #317. The code is at |
| 8 | [https://code.launchpad.net/~wtachi/helenos/bithenge lp:~wtachi/helenos/bithenge] |
| 9 | and periodic updates are posted to |
| 10 | [http://lists.modry.cz/cgi-bin/listinfo/helenos-devel HelenOS-devel]. |
| 11 | |
| 12 | Bithenge is intended to be a general‐purpose library with a variety of uses, |
| 13 | but only one tool has been implemented so far. It is a program that reads a |
| 14 | script file describing a binary format, applies it to binary data, and prints |
| 15 | the decoded result. |
| 16 | |
| 17 | == Quick start == |
| 18 | |
| 19 | The Bithenge source code is in `uspace/app/bithenge` and `uspace/lib/bithenge` |
| 20 | in the bithenge branch. It is automatically built in HelenOS to create the |
| 21 | `bithenge` application. For Linux, it can be built by running |
| 22 | `make -f Makefile.linux`, first in `uspace/lib/bithenge` and then in |
| 23 | `uspace/app/bithenge`. |
| 24 | |
| 25 | The program can be run with `bithenge <script file> <binary file>` to decode a |
| 26 | file. There are some example files in |
| 27 | [https://bazaar.launchpad.net/~wtachi/helenos/bithenge/files/head:/uspace/dist/src/bithenge uspace/dist/src/bithenge]; |
| 28 | see the [Bithenge/Syntax syntax documentation]. |
| 29 | |
| 30 | == Overview == |
| 31 | |
| 32 | === Trees === |
| 33 | |
| 34 | Bithenge represents all data in the form of a data structure called a “tree,” |
| 35 | similar to the data structure represented by JSON. A tree consists of a boolean |
| 36 | node, integer node, string node, or blob node, or an internal node with |
| 37 | children. A boolean node holds a boolean value, an integer node holds a signed |
| 38 | integer, and a string holds a Unicode string. |
| 39 | |
| 40 | A blob node represents an arbitrary sequence of raw bytes or bits. Blob nodes |
| 41 | are polymorphic, allowing any source of raw binary data to be used. Bithenge |
| 42 | includes blob node implementations for in‐memory buffers, files, and block |
| 43 | devices. An implementation has also been written that reads another task’s |
| 44 | virtual memory, but it hasn’t been committed because it’s unclear whether it |
| 45 | will be useful in its current form. |
| 46 | |
| 47 | An internal node has an arbitrary number of children, each with a unique key. |
| 48 | The key can be any node other than an internal node. Arrays can be represented |
| 49 | by internal nodes with integer keys starting at 0. The tree node can provide |
| 50 | children in an arbitrary order; the order will be used when displaying the |
| 51 | tree, but should have no semantic significance. Internal nodes are polymorphic |
| 52 | and can delay creation of child nodes until necessary, so keeping the whole |
| 53 | tree in memory can be avoided. |
| 54 | |
| 55 | Internal nodes are currently responsible for freeing their own children. In the |
| 56 | future, it should be possible for there to be multiple references to the same |
| 57 | node, but it isn’t clear whether this should be implemented with symbolic |
| 58 | links, an acyclic graph with reference counting, or a full graph. |
| 59 | |
| 60 | Note that all interpreted data is represented in Bithenge with nodes. |
| 61 | Therefore, the word “blob” usually refers to a blob node, and so on. |
| 62 | |
| 63 | A decoded tree for a FAT filesystem might look something like this: |
| 64 | {{{ |
| 65 | ○───bits─▶16 |
| 66 | │ |
| 67 | ├───fat──▶○ |
| 68 | │ ├───0───▶0xfff0 |
| 69 | │ ├───1───▶0xffff |
| 70 | │ └───2───▶0x0000 |
| 71 | │ |
| 72 | └───root──▶○ |
| 73 | ├───0───▶○ |
| 74 | │ ├───name───▶README.TXT |
| 75 | │ └───size───▶0x1351 |
| 76 | │ |
| 77 | └───1───▶○ |
| 78 | ├───name───▶KERNEL.ELF |
| 79 | └───size───▶0x38e9a2 |
| 80 | }}} |
| 81 | |
| 82 | === Transforms === |
| 83 | |
| 84 | A transform is a function from an input tree and zero or more argument trees to |
| 85 | an output tree. One example is `uint32le`, which takes a 4‐byte blob node as |
| 86 | the input tree and provides an integer node as the output tree. Another example |
| 87 | is `uint_le(len)`, which takes a bit blob node as the input tree and an integer |
| 88 | node as an argument tree, and produces an integer node as the output tree. |
| 89 | |
| 90 | Several transforms, including `uint32le` and `uint_le(len)`, are built in to |
| 91 | Bithenge. More complicated transforms can be created by a format provider, as |
| 92 | explained below. For instance, the |
| 93 | [https://bazaar.launchpad.net/~wtachi/helenos/bithenge/view/head:/uspace/dist/src/bithenge/fat.bh fat.bh] |
| 94 | file contains a `fat_filesystem` transform that takes a byte node as the input |
| 95 | tree and produces a complex output tree with various decoded information about |
| 96 | the filesystem. |
| 97 | |
| 98 | === Data providers === |
| 99 | |
| 100 | Generally speaking, a data provider is anything that provides data from outside |
| 101 | Bithenge in the form of a tree. The Bithenge library includes functions for |
| 102 | various data providers, and can automatically create a tree from a prefixed |
| 103 | string: |
| 104 | |
| 105 | ||= Prefix =||= Tree Type =||= Example =||= Description =|| |
| 106 | || `file:` || Byte blob node || `file:/textdemo` || Read the contents of a file. This is the default if no prefix is used. || |
| 107 | || `block:` || Byte blob node || `block:bd/initrd` || Read the contents of a block device. (HelenOS only.) || |
| 108 | || `hex:` || Byte blob node || `hex:01000000` || Use a string of hexadecimal characters to create a blob node. || |
| 109 | |
| 110 | These prefixes can also be used with the `bithenge` program. |
| 111 | |
| 112 | === Format providers === |
| 113 | |
| 114 | A format provider creates transforms based on external data, or vice versa. |
| 115 | The only implemented format provider is the script parser, which creates a |
| 116 | transform from a file written in the |
| 117 | [Bithenge/Syntax Bithenge scripting language]. A future idea is the DWARF |
| 118 | format provider, which would read DWARF information from an executable and |
| 119 | create a transform that could be applied to a task memory dump. |
| 120 | |
| 121 | === Library === |
| 122 | |
| 123 | The Bithenge library includes APIs for working with trees and creating and |
| 124 | applying transforms. It also contains the built‐in transforms, data providers, |
| 125 | and format providers. The [Bithenge/Library library overview] describes the API |
| 126 | and conventions used; detailed documentation is included in Doxygen comments in |
| 127 | the source. |
| 128 | |
| 129 | === Clients === |
| 130 | |
| 131 | A client is any program that provides a user interface to the Bithenge library. |
| 132 | The only implemented client is the simple `bithenge` program described above. |
| 133 | In the future, there could be an interactive browser for decoded trees, or a |
| 134 | program to generate C code based on a script file. |
| 135 | |
| 136 | == More information == |
| 137 | |
| 138 | * Bithenge/Syntax: the syntax of Bithenge scripts. |
| 139 | * Bithenge/Library: the library API, conventions, and internals. |
| 140 | * Bithenge/Design: design decisions, limitations, ideas, and inspirations. |