| | 1 | = Bithenge = |
| | 2 | |
| | 3 | [[PageOutline(2-3)]] |
| | 4 | |
| | 5 | Bithenge consists of a library and tools that make working with structured |
| | 6 | binary data easier. The project was started as part of |
| | 7 | [wiki:GSOC Google Summer of Code 2012] to address #317. The code is at |
| | 8 | [https://code.launchpad.net/~wtachi/helenos/bithenge lp:~wtachi/helenos/bithenge] |
| | 9 | and periodic updates are posted to |
| | 10 | [http://lists.modry.cz/cgi-bin/listinfo/helenos-devel HelenOS-devel]. |
| | 11 | |
| | 12 | Bithenge is intended to be a general‐purpose library with a variety of uses, |
| | 13 | but only one tool has been implemented so far. It is a program that reads a |
| | 14 | script file describing a binary format, applies it to binary data, and prints |
| | 15 | the decoded result. |
| | 16 | |
| | 17 | == Quick start == |
| | 18 | |
| | 19 | The Bithenge source code is in `uspace/app/bithenge` and `uspace/lib/bithenge` |
| | 20 | in the bithenge branch. It is automatically built in HelenOS to create the |
| | 21 | `bithenge` application. For Linux, it can be built by running |
| | 22 | `make -f Makefile.linux`, first in `uspace/lib/bithenge` and then in |
| | 23 | `uspace/app/bithenge`. |
| | 24 | |
| | 25 | The program can be run with `bithenge <script file> <binary file>` to decode a |
| | 26 | file. There are some example files in |
| | 27 | [https://bazaar.launchpad.net/~wtachi/helenos/bithenge/files/head:/uspace/dist/src/bithenge uspace/dist/src/bithenge]; |
| | 28 | see the [Bithenge/Syntax syntax documentation]. |
| | 29 | |
| | 30 | == Overview == |
| | 31 | |
| | 32 | === Trees === |
| | 33 | |
| | 34 | Bithenge represents all data in the form of a data structure called a “tree,” |
| | 35 | similar to the data structure represented by JSON. A tree consists of a boolean |
| | 36 | node, integer node, string node, or blob node, or an internal node with |
| | 37 | children. A boolean node holds a boolean value, an integer node holds a signed |
| | 38 | integer, and a string holds a Unicode string. |
| | 39 | |
| | 40 | A blob node represents an arbitrary sequence of raw bytes or bits. Blob nodes |
| | 41 | are polymorphic, allowing any source of raw binary data to be used. Bithenge |
| | 42 | includes blob node implementations for in‐memory buffers, files, and block |
| | 43 | devices. An implementation has also been written that reads another task’s |
| | 44 | virtual memory, but it hasn’t been committed because it’s unclear whether it |
| | 45 | will be useful in its current form. |
| | 46 | |
| | 47 | An internal node has an arbitrary number of children, each with a unique key. |
| | 48 | The key can be any node other than an internal node. Arrays can be represented |
| | 49 | by internal nodes with integer keys starting at 0. The tree node can provide |
| | 50 | children in an arbitrary order; the order will be used when displaying the |
| | 51 | tree, but should have no semantic significance. Internal nodes are polymorphic |
| | 52 | and can delay creation of child nodes until necessary, so keeping the whole |
| | 53 | tree in memory can be avoided. |
| | 54 | |
| | 55 | Internal nodes are currently responsible for freeing their own children. In the |
| | 56 | future, it should be possible for there to be multiple references to the same |
| | 57 | node, but it isn’t clear whether this should be implemented with symbolic |
| | 58 | links, an acyclic graph with reference counting, or a full graph. |
| | 59 | |
| | 60 | Note that all interpreted data is represented in Bithenge with nodes. |
| | 61 | Therefore, the word “blob” usually refers to a blob node, and so on. |
| | 62 | |
| | 63 | A decoded tree for a FAT filesystem might look something like this: |
| | 64 | {{{ |
| | 65 | ○───bits─▶16 |
| | 66 | │ |
| | 67 | ├───fat──▶○ |
| | 68 | │ ├───0───▶0xfff0 |
| | 69 | │ ├───1───▶0xffff |
| | 70 | │ └───2───▶0x0000 |
| | 71 | │ |
| | 72 | └───root──▶○ |
| | 73 | ├───0───▶○ |
| | 74 | │ ├───name───▶README.TXT |
| | 75 | │ └───size───▶0x1351 |
| | 76 | │ |
| | 77 | └───1───▶○ |
| | 78 | ├───name───▶KERNEL.ELF |
| | 79 | └───size───▶0x38e9a2 |
| | 80 | }}} |
| | 81 | |
| | 82 | === Transforms === |
| | 83 | |
| | 84 | A transform is a function from an input tree and zero or more argument trees to |
| | 85 | an output tree. One example is `uint32le`, which takes a 4‐byte blob node as |
| | 86 | the input tree and provides an integer node as the output tree. Another example |
| | 87 | is `uint_le(len)`, which takes a bit blob node as the input tree and an integer |
| | 88 | node as an argument tree, and produces an integer node as the output tree. |
| | 89 | |
| | 90 | Several transforms, including `uint32le` and `uint_le(len)`, are built in to |
| | 91 | Bithenge. More complicated transforms can be created by a format provider, as |
| | 92 | explained below. For instance, the |
| | 93 | [https://bazaar.launchpad.net/~wtachi/helenos/bithenge/view/head:/uspace/dist/src/bithenge/fat.bh fat.bh] |
| | 94 | file contains a `fat_filesystem` transform that takes a byte node as the input |
| | 95 | tree and produces a complex output tree with various decoded information about |
| | 96 | the filesystem. |
| | 97 | |
| | 98 | === Data providers === |
| | 99 | |
| | 100 | Generally speaking, a data provider is anything that provides data from outside |
| | 101 | Bithenge in the form of a tree. The Bithenge library includes functions for |
| | 102 | various data providers, and can automatically create a tree from a prefixed |
| | 103 | string: |
| | 104 | |
| | 105 | ||= Prefix =||= Tree Type =||= Example =||= Description =|| |
| | 106 | || `file:` || Byte blob node || `file:/textdemo` || Read the contents of a file. This is the default if no prefix is used. || |
| | 107 | || `block:` || Byte blob node || `block:bd/initrd` || Read the contents of a block device. (HelenOS only.) || |
| | 108 | || `hex:` || Byte blob node || `hex:01000000` || Use a string of hexadecimal characters to create a blob node. || |
| | 109 | |
| | 110 | These prefixes can also be used with the `bithenge` program. |
| | 111 | |
| | 112 | === Format providers === |
| | 113 | |
| | 114 | A format provider creates transforms based on external data, or vice versa. |
| | 115 | The only implemented format provider is the script parser, which creates a |
| | 116 | transform from a file written in the |
| | 117 | [Bithenge/Syntax Bithenge scripting language]. A future idea is the DWARF |
| | 118 | format provider, which would read DWARF information from an executable and |
| | 119 | create a transform that could be applied to a task memory dump. |
| | 120 | |
| | 121 | === Library === |
| | 122 | |
| | 123 | The Bithenge library includes APIs for working with trees and creating and |
| | 124 | applying transforms. It also contains the built‐in transforms, data providers, |
| | 125 | and format providers. The [Bithenge/Library library overview] describes the API |
| | 126 | and conventions used; detailed documentation is included in Doxygen comments in |
| | 127 | the source. |
| | 128 | |
| | 129 | === Clients === |
| | 130 | |
| | 131 | A client is any program that provides a user interface to the Bithenge library. |
| | 132 | The only implemented client is the simple `bithenge` program described above. |
| | 133 | In the future, there could be an interactive browser for decoded trees, or a |
| | 134 | program to generate C code based on a script file. |
| | 135 | |
| | 136 | == More information == |
| | 137 | |
| | 138 | * Bithenge/Syntax: the syntax of Bithenge scripts. |
| | 139 | * Bithenge/Library: the library API, conventions, and internals. |
| | 140 | * Bithenge/Design: design decisions, limitations, ideas, and inspirations. |