Bithenge Syntax
This page describes the syntax and semantics of Bithenge scripts.
Simple example
transform item(scale) = struct { .name_len <- uint8; .name <- ascii <- known_length(.name_len); .value <- (in * scale) <- uint32le; }; transform main = repeat {item(100)};
More complicated examples can be found in uspace/dist/src/bithenge.
Tokenization
Bithenge scripts consist of ASCII text. When a script is read, it is first
divided into pieces known as tokens. As in most programming languages, the
longest possible token is read, so ==
is always a single token rather than
two. Whitespace (space, form‐feed, newline, carriage return, horizontal tab,
and vertical tab) and comments (#
until newline) are allowed between tokens;
both LF (Unix) and CRLF (DOS) newline conventions can be used.
There are several types of tokens:
- Symbols
&& ++ == >= // <- <= != || < > = + - * % ; : { } ( ) .
- Keywords
do else false if in partial repeat struct switch transform true while
- Identifiers
- Any sequence of letters, digits, and underscores, beginning with a letter, that is not a keyword.
- Integer literals
- A sequence of decimal digits; negative literals are not yet possible.
Keywords and identifiers are case‐sensitive. Tokens can be at most 256 characters long.
Scripts
- script → *definition
- definition → "
transform
" identifier [parameters] "=
" transform ";
" - parameters → "
(
" [identifier *(",
" identifier)] ")
"
A Bithenge script consists of a series of transform definitions. It must define
a transform named main
, which is applied when the script is used. Transforms
may be defined with parameters, in which case corresponding arguments must be
provided when the transform is invoked.
Examples:
transform main = uint8;
defines an extremely simple main transform.transform is_odd(val) = (val % 2 == 1);
defines a transform that checks whether its argument is odd.
Transform invocation
- transform → identifier [arguments]
- arguments → "
(
" [ expression *(",
" expression) ] ")
"
A built‐in or previously defined transform can be invoked, with arguments if necessary.
Examples:
is_odd(17)
is a transform that appliesis_odd
with 17 as the argument.
Transform composition
- transform → transform "
<-
" transform
A transform can be created by composing two existing transforms. It will work by first applying the right transform to the input, then applying the left transform to the result.
Examples:
ascii <- zero_terminated
decodes a zero‐terminated ASCII string.repeat{uint32le} <- known_length(8)
decodes two 32‐bit integers.
Expression transforms
- transform → "
(
" expression ")
"
Expression transforms produce their output by calculating an expression. If
"in
" is used in the expression, it refers to the expression transform’s
input; otherwise, the input must be an empty blob. In either case, the
parentheses are mandatory.
Examples:
(in + 1)
adds 1 to its input.(.bytes_per_sector * .sectors_per_cluster)
calculates the number of bytes per cluster.
Struct transforms
- transform → "
struct
" "{
" *struct-field "}
" - struct-field → [ "
.
" identifier ] "<-
" transform ";
"
A struct transform applies each of its subtransforms in sequence to the input blob. The result is an internal node, with the specified key used for the result of each subtransform. If a subtransform has no corresponding key, its result must be an internal node; the result’s keys will be merged into the struct transform’s result.
It must be possible to determine the size of the subblob each subtransform
consumes. Bithenge currently only handles the simple cases: it knows uint32le
needs 4 bytes, but not that known_length(2) <- repeat{uint32le}
needs 8
bytes.
Example:
struct { .min <- uint32le; .max <- uint32le; .mid <- ((.min + .max) // 2); <- repeat(3) {uint8}; }
One possible output is {"min": 0, "max": 8, "mid": 4, 0: 127, 1: 126, 2: 125}
.
Partial transforms
- transform → "
partial
" [ "(
" expression ")
" ] "{
" transform "}
"
A partial transform applies its subtransform to a prefix of a blob. An expression can be given to specify the offset within the blob at which the subtransform is applied.
Examples:
partial {uint8}
decodes the first byte of the arbitrary‐length input as an integer.partial(1) {uint8}
decodes the second byte of the arbitrary‐length input as an integer.
Conditional transforms
- transform → "
if
" "(
" expression ")
" "{
" transform "}
" "else
" "{
" transform "}
" - struct-field → "
if
" "(
" expression ")
" "{
" *struct-field "}
" [ "else
" "{
" *struct-field "}
" ]
An if transform applies its first subtransform if the expression evaluates to
true, and the second subtransform if the expression evaluates to false. A
second form can be used in struct transforms:
struct { ... if (...) {...} ... }
is equivalent to
struct { ... <- if (...) { struct { ... } } else { struct { } }; ... }
.
- transform → "
switch
" "(
" expression ")
" "{
" *( (expression/"else
") ":
" transform ";
" ) "}
" - struct-field → "
switch
" "(
" expression ")
" "{
" *( (expression/"else
") ":
" "{
" *struct-field "}
" ";
" ) "}
"
A switch transform compares the first expression to each of the case
expressions in turn, stopping when they compare equal and using that
subtransform. The last case can be "else
", which will always be used if none
of the previous cases matched. The second form can be used in struct
transforms, much like the second form of "if
".
Examples:
if (little_endian) { uint32le } else { uint32be }
decodes a little‐ or big‐endian integer, depending onlittle_endian
.struct { switch (.type) { 0: {.x <- uint8; .y <- uint8;}; 1: {.val <- uint16;}; } }
decodes two 8‐bit integers or one 16‐bit integer depending on the value of.type
.
Repetition transforms
- transform → "
repeat
" [ "(
" expression ")
" ] "{
" transform "}
"
A repeat transform applies its subtransform repeatedly to sequential subblobs
of the input. The output is an internal node with a member for each repetition,
with keys starting at 0
. If an expression is given, it determines the number
of times to repeat; otherwise, the subtransform is applied until it fails or
there is no input remaining.
- transform → "
do
" "{
" transform "}
" "while
" "(
" expression ")
"
A do/while transform applies its subtransform repeatedly until the expression
evaluates to false
. The output has the same format as repeat transform
output. Scope member expressions should be used in the expression to access the
output of the subtransform and determine when to stop.
Examples:
repeat{uint8}
decodes each byte as an integer.repeat(256) {uint32le}
decodes an array of 256 integers.do { struct { .type <- uint8; .value <- uint32le; } } while (.type != 0)
decodes types and values, stopping after a type of 0 is seen.
Expressions
- expression → "
true
" - expression → "
false
" - expression → integer
Expressions can be boolean or integer literals.
- expression → identifier
Expressions can use parameters.
- expression → "
in
"
In an expression transform, "in
" refers to the transform’s input. Otherwise,
it refers to the input of the whole transform in the definition.
- expression → "
.
" identifier
A scope member expression looks in each containing struct transform for a member with the given key. It starts from the innermost struct transform and searches as far as the whole transform being defined.
- expression → expression "
.
" identifier - expression → expression "
[
" expression "]
"
An expression can look for a member of another expression with a given key.
- expression → expression "
[
" expression ":
" "]
" - expression → expression "
[
" expression ":
" expression "]
" - expression → expression "
[
" expression ",
" expression "]
"
An expression can create a subblob of another expression. The first expression specifies the blob and the second expression specifies the start offset within the blob. If a comma is used, the third expression specifies the length of the subblob; otherwise it specifies the end offset. If no third expression is given, the subblob extends to the end of the blob.
- expression → "
(
" expression ")
" - expression → expression operator expression
Expressions can, of course, use binary operators. The operators and their precedences are as follows:
Operator | Precedence | Associativity | Description |
---|---|---|---|
". " | 5 | Left‐to‐right | Member (see above) |
"[] " | 5 | Left‐to‐right | Member or subblob (see above) |
"* " | 4 | Left‐to‐right | Integer multiplication |
"// " | 4 | Left‐to‐right | Floored/euclidean integer division; divisor must be positive |
"% " | 4 | Left‐to‐right | Floored/euclidean modulo operation; divisor must be positive |
"+ " | 3 | Left‐to‐right | Integer addition |
"- " | 3 | Left‐to‐right | Integer subtraction |
"++ " | 3 | Left‐to‐right | Blob concatenation |
"< " | 2 | Left‐to‐right | Integer less‐than |
"<= " | 2 | Left‐to‐right | Integer less‐than‐or‐equal |
"> " | 2 | Left‐to‐right | Integer greater‐than |
">= " | 2 | Left‐to‐right | Integer greater‐than‐or‐equal |
"== " | 1 | Left‐to‐right | Equal‐to (not supported for internal nodes) |
"!= " | 1 | Left‐to‐right | Unequal‐to (not supported for internal nodes) |
"&& " | 0 | Left‐to‐right | Logical and |
"|| " | 0 | Left‐to‐right | Logical or |
Built‐in transforms
These transforms are implemented in C and included with Bithenge. Note that precise names are preferred; scripts can define shorter aliases if necessary.
name | input | output | description | example |
---|---|---|---|---|
ascii | byte blob node | string | decodes some bytes as ASCII characters | hex:6869 becomes "hi"
|
bit | 1‐bit blob node | boolean | decodes a single bit | 1 becomes true
|
bits_be | byte blob node | bit blob node | decodes bytes as bits, starting with the most‐significant bit | hex:0f becomes bit:00001111
|
bits_le | byte blob node | bit blob node | decodes bytes as bits, starting with the least‐significant bit | hex:0f becomes bit:11110000
|
known_length(len) | blob node | blob node | requires the input to have a known length | |
nonzero_boolean | integer | boolean | decodes a boolean where nonzero values are true | 0 becomes false
|
uint8 | 1‐byte blob node | integer node | decodes a 1‐byte unsigned integer | hex:11 becomes 17
|
uint16be | 2‐byte blob node | integer node | decodes a 2‐byte big‐endian unsigned integer | hex:0201 becomes 513
|
uint16le | 2‐byte blob node | integer node | decodes a 2‐byte little‐endian unsigned integer | hex:0101 becomes 257
|
uint32be | 4‐byte blob node | integer node | decodes a 4‐byte big‐endian unsigned integer | hex:00000201 becomes 513
|
uint32le | 4‐byte blob node | integer node | decodes a 4‐byte little‐endian unsigned integer | hex:01010000 becomes 257
|
uint64be | 8‐byte blob node | integer node | decodes a 8‐byte big‐endian unsigned integer | hex:0000000000000201 becomes 513
|
uint64le | 8‐byte blob node | integer node | decodes a 8‐byte little‐endian unsigned integer | hex:0101000000000000 becomes 257
|
uint_be(len) | bit blob node | integer node | decodes bits as an unsigned integer, starting with the most‐significant bit | |
uint_le(len) | bit blob node | integer node | decodes bits as an unsigned integer, starting with the least‐significant bit | |
zero_terminated | byte blob node | byte blob node | takes bytes up until the first 00 | hex:7f0400 becomes hex:7f04
|