Context Navigation

Bithenge Syntax

Simple example
Tokenization
Scripts
Transform invocation
Transform composition
Expression transforms
Struct transforms
Partial transforms
Conditional transforms
Repetition transforms
Expressions
Built‐in transforms

This page describes the syntax and semantics of Bithenge scripts.

Simple example

transform item(scale) = struct {
    .name_len <- uint8;
    .name <- ascii <- known_length(.name_len);
    .value <- (in * scale) <- uint32le;
};

transform main = repeat {item(100)};

More complicated examples can be found in uspace/dist/src/bithenge.

Tokenization

Bithenge scripts consist of ASCII text. When a script is read, it is first divided into pieces known as tokens. As in most programming languages, the longest possible token is read, so == is always a single token rather than two. Whitespace (space, form‐feed, newline, carriage return, horizontal tab, and vertical tab) and comments (# until newline) are allowed between tokens; both LF (Unix) and CRLF (DOS) newline conventions can be used.

There are several types of tokens:

Symbols: && ++ == >= // <- <= != || < > = + - * % ; : { } ( ) .
Keywords: do else false if in partial repeat struct switch transform true while
Identifiers: Any sequence of letters, digits, and underscores, beginning with a letter, that is not a keyword.
Integer literals: A sequence of decimal digits; negative literals are not yet possible.

Keywords and identifiers are case‐sensitive. Tokens can be at most 256 characters long.

Scripts

script → *definition
definition → "transform" identifier [parameters] "=" transform ";"
parameters → "(" [identifier *("," identifier)] ")"

A Bithenge script consists of a series of transform definitions. It must define a transform named main, which is applied when the script is used. Transforms may be defined with parameters, in which case corresponding arguments must be provided when the transform is invoked.

Examples:

transform main = uint8; defines an extremely simple main transform.
transform is_odd(val) = (val % 2 == 1); defines a transform that checks whether its argument is odd.

Transform invocation

transform → identifier [arguments]
arguments → "(" [ expression *("," expression) ] ")"

A built‐in or previously defined transform can be invoked, with arguments if necessary.

Examples:

is_odd(17) is a transform that applies is_odd with 17 as the argument.

Transform composition

transform → transform "<-" transform

A transform can be created by composing two existing transforms. It will work by first applying the right transform to the input, then applying the left transform to the result.

Examples:

ascii <- zero_terminated decodes a zero‐terminated ASCII string.
repeat{uint32le} <- known_length(8) decodes two 32‐bit integers.

Expression transforms

transform → "(" expression ")"

Expression transforms produce their output by calculating an expression. If "in" is used in the expression, it refers to the expression transform’s input; otherwise, the input must be an empty blob. In either case, the parentheses are mandatory.

Examples:

(in + 1) adds 1 to its input.
(.bytes_per_sector * .sectors_per_cluster) calculates the number of bytes per cluster.

Struct transforms

transform → "struct" "{" *struct-field "}"
struct-field → [ "." identifier ] "<-" transform ";"

A struct transform applies each of its subtransforms in sequence to the input blob. The result is an internal node, with the specified key used for the result of each subtransform. If a subtransform has no corresponding key, its result must be an internal node; the result’s keys will be merged into the struct transform’s result.

It must be possible to determine the size of the subblob each subtransform consumes. Bithenge currently only handles the simple cases: it knows uint32le needs 4 bytes, but not that known_length(2) <- repeat{uint32le} needs 8 bytes.

Example:

struct {
    .min <- uint32le;
    .max <- uint32le;
    .mid <- ((.min + .max) // 2);
    <- repeat(3) {uint8};
}

One possible output is {"min": 0, "max": 8, "mid": 4, 0: 127, 1: 126, 2: 125}.

Partial transforms

transform → "partial" [ "(" expression ")" ] "{" transform "}"

A partial transform applies its subtransform to a prefix of a blob. An expression can be given to specify the offset within the blob at which the subtransform is applied.

Examples:

partial {uint8} decodes the first byte of the arbitrary‐length input as an integer.
partial(1) {uint8} decodes the second byte of the arbitrary‐length input as an integer.

Conditional transforms

transform → "if" "(" expression ")" "{" transform "}" "else" "{" transform "}"
struct-field → "if" "(" expression ")" "{" *struct-field "}" [ "else" "{" *struct-field "}" ]

An if transform applies its first subtransform if the expression evaluates to true, and the second subtransform if the expression evaluates to false. A second form can be used in struct transforms: struct { ... if (...) {...} ... } is equivalent to struct { ... <- if (...) { struct { ... } } else { struct { } }; ... }.

transform → "switch" "(" expression ")" "{" *( (expression/"else") ":" transform ";" ) "}"
struct-field → "switch" "(" expression ")" "{" *( (expression/"else") ":" "{" *struct-field "}" ";" ) "}"

A switch transform compares the first expression to each of the case expressions in turn, stopping when they compare equal and using that subtransform. The last case can be "else", which will always be used if none of the previous cases matched. The second form can be used in struct transforms, much like the second form of "if".

Examples:

if (little_endian) { uint32le } else { uint32be } decodes a little‐ or big‐endian integer, depending on little_endian.
struct { switch (.type) { 0: {.x <- uint8; .y <- uint8;}; 1: {.val <- uint16;}; } } decodes two 8‐bit integers or one 16‐bit integer depending on the value of .type.

Repetition transforms

transform → "repeat" [ "(" expression ")" ] "{" transform "}"

A repeat transform applies its subtransform repeatedly to sequential subblobs of the input. The output is an internal node with a member for each repetition, with keys starting at 0. If an expression is given, it determines the number of times to repeat; otherwise, the subtransform is applied until it fails or there is no input remaining.

transform → "do" "{" transform "}" "while" "(" expression ")"

A do/while transform applies its subtransform repeatedly until the expression evaluates to false. The output has the same format as repeat transform output. Scope member expressions should be used in the expression to access the output of the subtransform and determine when to stop.

Examples:

repeat{uint8} decodes each byte as an integer.
repeat(256) {uint32le} decodes an array of 256 integers.
do { struct { .type <- uint8; .value <- uint32le; } } while (.type != 0) decodes types and values, stopping after a type of 0 is seen.

Expressions

expression → "true"
expression → "false"
expression → integer

Expressions can be boolean or integer literals.

expression → identifier

Expressions can use parameters.

expression → "in"

In an expression transform, "in" refers to the transform’s input. Otherwise, it refers to the input of the whole transform in the definition.

expression → "." identifier

A scope member expression looks in each containing struct transform for a member with the given key. It starts from the innermost struct transform and searches as far as the whole transform being defined.

expression → expression "." identifier
expression → expression "[" expression "]"

An expression can look for a member of another expression with a given key.

expression → expression "[" expression ":" "]"
expression → expression "[" expression ":" expression "]"
expression → expression "[" expression "," expression "]"

An expression can create a subblob of another expression. The first expression specifies the blob and the second expression specifies the start offset within the blob. If a comma is used, the third expression specifies the length of the subblob; otherwise it specifies the end offset. If no third expression is given, the subblob extends to the end of the blob.

expression → "(" expression ")"
expression → expression operator expression

Expressions can, of course, use binary operators. The operators and their precedences are as follows:

Operator	Precedence	Associativity	Description
"`.`"	5	Left‐to‐right	Member (see above)
"`[]`"	5	Left‐to‐right	Member or subblob (see above)
"`*`"	4	Left‐to‐right	Integer multiplication
"`//`"	4	Left‐to‐right	Floored/euclidean integer division; divisor must be positive
"`%`"	4	Left‐to‐right	Floored/euclidean modulo operation; divisor must be positive
"`+`"	3	Left‐to‐right	Integer addition
"`-`"	3	Left‐to‐right	Integer subtraction
"`++`"	3	Left‐to‐right	Blob concatenation
"`<`"	2	Left‐to‐right	Integer less‐than
"`<=`"	2	Left‐to‐right	Integer less‐than‐or‐equal
"`>`"	2	Left‐to‐right	Integer greater‐than
"`>=`"	2	Left‐to‐right	Integer greater‐than‐or‐equal
"`==`"	1	Left‐to‐right	Equal‐to (not supported for internal nodes)
"`!=`"	1	Left‐to‐right	Unequal‐to (not supported for internal nodes)
"`&&`"	0	Left‐to‐right	Logical and
"`\|\|`"	0	Left‐to‐right	Logical or

Built‐in transforms

These transforms are implemented in C and included with Bithenge. Note that precise names are preferred; scripts can define shorter aliases if necessary.

name	input	output	description	example
ascii	byte blob node	string	decodes some bytes as ASCII characters	`hex:6869` becomes `"hi"`
bit	1‐bit blob node	boolean	decodes a single bit	`1` becomes `true`
bits_be	byte blob node	bit blob node	decodes bytes as bits, starting with the most‐significant bit	`hex:0f` becomes `bit:00001111`
bits_le	byte blob node	bit blob node	decodes bytes as bits, starting with the least‐significant bit	`hex:0f` becomes `bit:11110000`
known_length(len)	blob node	blob node	requires the input to have a known length
nonzero_boolean	integer	boolean	decodes a boolean where nonzero values are true	`0` becomes `false`
uint8	1‐byte blob node	integer node	decodes a 1‐byte unsigned integer	`hex:11` becomes `17`
uint16be	2‐byte blob node	integer node	decodes a 2‐byte big‐endian unsigned integer	`hex:0201` becomes `513`
uint16le	2‐byte blob node	integer node	decodes a 2‐byte little‐endian unsigned integer	`hex:0101` becomes `257`
uint32be	4‐byte blob node	integer node	decodes a 4‐byte big‐endian unsigned integer	`hex:00000201` becomes `513`
uint32le	4‐byte blob node	integer node	decodes a 4‐byte little‐endian unsigned integer	`hex:01010000` becomes `257`
uint64be	8‐byte blob node	integer node	decodes a 8‐byte big‐endian unsigned integer	`hex:0000000000000201` becomes `513`
uint64le	8‐byte blob node	integer node	decodes a 8‐byte little‐endian unsigned integer	`hex:0101000000000000` becomes `257`
uint_be(len)	bit blob node	integer node	decodes bits as an unsigned integer, starting with the most‐significant bit
uint_le(len)	bit blob node	integer node	decodes bits as an unsigned integer, starting with the least‐significant bit
zero_terminated	byte blob node	byte blob node	takes bytes up until the first `00`	`hex:7f0400` becomes `hex:7f04`

Last modified 13 years ago Last modified on 2012-08-24T06:54:12Z

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text