Motivation
One of the major bottlenecks in all blockchain systems is storage: transactions occupy storage space, and need to be kept around forever. Transaction size also translates directly to cost, as transaction fees are generally proportional to the size of a transaction in bytes.
For a variety of reasons, the current binary encoding for A-Block transactions is both inefficient and difficult to upgrade:
- Byte arrays are often stored as hexadecimal strings, which occupy two bytes per byte of actual data.
  - Among other things, this includes all binary data on the stack, TxOut destination addresses and OutPoint transaction hashes.
- Enums are encoded with a 32-bit type ID, even when far fewer bytes would suffice.
  - Most significantly, this includes script opcodes, so every opcode in a script ends up occupying 8 bytes: 4 to indicate that the stack entry is a StackEntry::Op, and another 4 for the actual opcode number.
- All arrays/vectors are encoded with a 64-bit length prefix. This is wider than any variable-length sequence needs (such as the number of transaction inputs/outputs), and the prefix is completely unnecessary in the many cases where the array length is fixed (such as the number of bytes in a public key).
- A transaction's version number is stored near the end of the transaction, making it impossible to determine which version the transaction was serialized with without first deserializing all the inputs and outputs (whose format itself depends on the transaction version!).
- A full script is encoded in every TxIn, and is then matched against known script patterns to determine the actual transaction type. As most transaction inputs and outputs follow a consistent format (e.g. P2PKH), it would be nice to split TxIn and TxOut into enums which store only the minimal required information for each type of transaction, rather than having to serialize a full script (which is space-inefficient) and then pattern-match against the script contents (which is error-prone and adds code complexity).
- Scripts currently operate on integers of type usize, which is non-portable as its size may vary across systems. This should be changed to an integer type with known size, likely u64 or u128, so as to ensure consistent behavior across all platforms.
- Script entries have separate item types for public keys, signatures, integers and binary data. Not only does this currently require two enum discriminators per script entry in serialized scripts, but it also has further disadvantages:
  - Numbers and byte strings are not interoperable. For instance, there is no way for a script to hash a number, or to concatenate a number with a byte string. Even if we choose to treat them separately at the opcode level (to allow more compact serialization using VarInts), it would be nice if the interpreter simply treated numbers as byte strings with a fixed length.
  - Public keys and signatures could be trivially treated as byte strings. To ensure compact representation, we could dedicate some opcodes to constant-length byte data, so that public keys and signatures could be stored in the script without a length prefix.
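The overheads listed above can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that a hex string carries the same 64-bit length prefix as any other array, which the list above implies but does not state outright.

```rust
/// Size in bytes of `data_len` bytes of data under the legacy encoding:
/// an 8-byte (64-bit) length prefix followed by two hexadecimal characters
/// per byte of data.
/// ASSUMPTION: hex strings get the same length prefix as other arrays.
const fn legacy_hex_size(data_len: usize) -> usize {
    8 + 2 * data_len
}

/// Size in bytes of the same data stored as a raw fixed-length byte array:
/// no length prefix and no hex expansion.
const fn fixed_array_size(data_len: usize) -> usize {
    data_len
}

/// Legacy size of one script opcode: a 4-byte discriminant marking the entry
/// as StackEntry::Op, plus a 4-byte opcode number.
const LEGACY_OPCODE_SIZE: usize = 4 + 4;
```

Under these assumptions a 32-byte hash costs 72 bytes in the legacy format versus 32 bytes as a raw array, and each opcode costs 8 bytes where a single byte would do.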
Potential difficulties:
- Currently, transaction hashes consist of the letter g followed by 31 (!!!) hexadecimal digits. This means that in their current state, they cannot be decoded into an array of bytes, as they only contain 15.5 bytes of information. I’ve chosen to treat them as 16-byte arrays where the 4 trailing bits are zero; however, we could take this opportunity to extend them.
- The same goes for P2SH addresses, which currently consist of the letter H followed by 63 hexadecimal digits. However, as there are no existing transactions on-chain which use P2SH, this can be changed to use 64 digits without any issues.
- While the hash signed by P2PKH transactions is fortunately based on a textual representation of a transaction (see construct_tx_in_out_signable_hash()), the actual transaction hash is not. Instead, it’s based on the bincode serialization of the transaction object. This means that in order to handle legacy transactions, it will still be necessary to have code to serialize old transactions in the old format so that their hash can be determined.
- get_stack_entry_signable_string, used when computing the signable message for druid transactions, relies on the exact types of stack elements. Changing the stack entry types would therefore also break existing transactions; however, it seems that there are no existing druid transactions on-chain, so this should be a safe change.
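The treatment of legacy transaction hashes described above (31 hex digits padded out to 16 bytes with four trailing zero bits) can be sketched as follows. The function name is hypothetical, and the sketch assumes that the legacy string is the letter g followed by exactly 31 ASCII hex digits.

```rust
/// Decodes a legacy transaction hash ("g" followed by 31 hex digits) into a
/// 16-byte array. The missing 32nd nibble is filled with zero, so the low
/// four bits of the last byte are always zero.
/// Returns None if the prefix, length or any digit is invalid.
fn decode_legacy_tx_hash(s: &str) -> Option<[u8; 16]> {
    let hex = s.strip_prefix('g')?;
    if hex.len() != 31 {
        return None;
    }
    // 32 nibbles; the last one stays zero because only 31 are provided.
    let mut nibbles = [0u8; 32];
    for (i, c) in hex.chars().enumerate() {
        nibbles[i] = c.to_digit(16)? as u8;
    }
    // Pack each pair of nibbles into one byte, high nibble first.
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = (nibbles[2 * i] << 4) | nibbles[2 * i + 1];
    }
    Some(out)
}
```

For example, a hash whose digits end in ...cde decodes to a final byte of 0xe0.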
Specification
Transaction
| Field name | Field type | Notes |
| --- | --- | --- |
| version | varint(u64) | |
| inputs_len | varint(u32) | Number of elements in the following array |
| inputs | Array(TxIn) | |
| outputs_len | varint(u32) | Number of elements in the following array |
| outputs | Array(TxOut) | |
| fees_len | varint(u32) | Number of elements in the following array |
| fees | Array(TxOut) | |
| druid_info | DruidInfo | |
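This specification uses varint(u32) and varint(u64) throughout but does not state which variable-length integer encoding is meant. As an illustration only, the sketch below assumes an unsigned LEB128-style encoding (7 data bits per byte, least significant group first, high bit set on every byte except the last). Both the choice of encoding and the function names are assumptions, not part of the spec.

```rust
/// Appends `value` to `out` as an unsigned LEB128 varint.
/// ASSUMPTION: the spec does not name its varint format; LEB128 is used here
/// purely as an example.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last group: continuation bit clear.
            out.push(byte);
            return;
        }
        // More groups follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Reads a varint from the start of `input`.
/// Returns the value and the number of bytes consumed, or None if the input
/// is truncated or does not fit in a u64.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate() {
        if i >= 10 {
            return None; // a u64 never needs more than 10 groups
        }
        let shift = 7 * i as u32;
        let chunk = (byte & 0x7f) as u64;
        if shift == 63 && chunk > 1 {
            return None; // would overflow 64 bits
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of input before the final group
}
```

With this encoding, a length such as 300 occupies two bytes (0xAC 0x02) rather than the eight bytes of a fixed 64-bit prefix, and any value below 128 fits in a single byte.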
Address
| Field name | Field type | Notes |
| --- | --- | --- |
| type | u8 enum | See below |

| type | Field name | Field type | Notes |
| --- | --- | --- | --- |
| 0: P2PKH | pubkey_hash | Array(u8) (32) | SHA3-256 hash of the receiver’s public key |
| 1: P2SH | lock_script_hash | Array(u8) (32) | SHA3-256 hash of the stringified lock script (see below) |
| 2: Burn | no fields | | |
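The Address layout maps naturally onto a Rust enum with a one-byte tag. The following is a minimal sketch of that mapping; the Rust type, variant and method names are hypothetical, while the tag values and field sizes come from the table above.

```rust
/// A destination address, tagged with a single u8 on the wire.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Address {
    P2pkh { pubkey_hash: [u8; 32] },
    P2sh { lock_script_hash: [u8; 32] },
    Burn,
}

impl Address {
    /// Appends the tag byte followed by the variant's fields (if any).
    fn encode(&self, out: &mut Vec<u8>) {
        match self {
            Address::P2pkh { pubkey_hash } => {
                out.push(0);
                out.extend_from_slice(pubkey_hash);
            }
            Address::P2sh { lock_script_hash } => {
                out.push(1);
                out.extend_from_slice(lock_script_hash);
            }
            Address::Burn => out.push(2),
        }
    }

    /// Decodes an address from the start of `input`, returning it together
    /// with the number of bytes consumed. Unknown tags and truncated input
    /// yield None.
    fn decode(input: &[u8]) -> Option<(Address, usize)> {
        let (&tag, rest) = input.split_first()?;
        match tag {
            0 | 1 => {
                let mut hash = [0u8; 32];
                hash.copy_from_slice(rest.get(..32)?);
                let address = if tag == 0 {
                    Address::P2pkh { pubkey_hash: hash }
                } else {
                    Address::P2sh { lock_script_hash: hash }
                };
                Some((address, 33))
            }
            2 => Some((Address::Burn, 1)),
            _ => None,
        }
    }
}
```

Because the hash length is fixed at 32 bytes, no length prefix is written: a P2PKH or P2SH address takes 33 bytes and a burn address takes 1.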
Asset
| Field name | Field type | Notes |
| --- | --- | --- |
| type | u8 enum | See below |

| type | Field name | Field type | Notes |
| --- | --- | --- | --- |
| 0: Token | amount | varint(u64) | The value in AIBCOIN |
| 1: Item | amount | varint(u64) | The value in item asset tokens |
| | genesis_hash | AssetGenesisHash | Identifies the item by the transaction which created it (see AssetGenesisHash) |
AssetGenesisHash
| Field name | Field type | Notes |
| --- | --- | --- |
| type | u8 enum | See below |

| type | Field name | Field type | Notes |
| --- | --- | --- | --- |
| 0: Create | no fields | | Only allowed in a TxOut in an item creation transaction. This indicates that the genesis hash is equal to the enclosing transaction’s hash. |
| 1: Hash | hash | TxHash | The hash of the transaction which created the item. |
| 2: Default | no fields | | Dummy value, seems to be used as a placeholder for a response from a DRUID transaction? |
DruidExpectation
| Field name | Field type | Notes |
| --- | --- | --- |
| from | Array(u8) (32) | SHA3-256 hash of the transaction inputs which need to be spent (TODO: this is currently completely broken?) |
| to | Address | The address to which the value must be sent |
| asset | Asset | The asset which must be transferred to the to address |
DruidInfo
| Field name | Field type | Notes |
| --- | --- | --- |
| participants | varint(u32) | If 0, the DruidInfo is considered absent and the subsequent fields are omitted |
| druid_len | varint(u32) | The length of the following array |
| druid | Array(u8) | UTF-8 string |
| expectations_len | varint(u32) | Number of elements in the following array |
| expectations | Array(DruidExpectation) | |
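The rule that a participants value of 0 marks the whole DruidInfo as absent can be sketched as below. This is a simplified illustration under several assumptions: the varint format is taken to be LEB128-style (the spec does not say), each DruidExpectation is represented as already-encoded bytes, and all Rust names are hypothetical.

```rust
/// ASSUMPTION: LEB128-style varint; the spec does not name its varint format.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

struct DruidInfo {
    /// Must be non-zero, since zero is reserved to mean "no DruidInfo".
    participants: u32,
    druid: String,
    /// Simplification: each entry is an already-encoded DruidExpectation.
    expectations: Vec<Vec<u8>>,
}

/// Encodes an optional DruidInfo. An absent value is written as a single
/// zero varint with no further fields.
fn encode_druid_info(
    info: Option<&DruidInfo>,
    out: &mut Vec<u8>,
) -> Result<(), &'static str> {
    let info = match info {
        None => {
            write_varint(0, out);
            return Ok(());
        }
        Some(info) => info,
    };
    if info.participants == 0 {
        // Would be indistinguishable from an absent DruidInfo on the wire.
        return Err("a present DruidInfo must have at least one participant");
    }
    write_varint(u64::from(info.participants), out);
    write_varint(info.druid.len() as u64, out);
    out.extend_from_slice(info.druid.as_bytes());
    write_varint(info.expectations.len() as u64, out);
    for expectation in &info.expectations {
        out.extend_from_slice(expectation);
    }
    Ok(())
}
```

One consequence of this layout is that a transaction without druid information pays only a single byte for the field.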
OutPoint
| Field name | Field type | Notes |
| --- | --- | --- |
| t_hash | TxHash | A transaction hash |
| n | varint(u32) | The index of an output in the transaction with the specified hash |
TxHash
| Field name | Field type | Notes |
| --- | --- | --- |
| hash | Array(u8) (16) | SHA3-256 hash of a transaction (see below). The hash is truncated to 16 bytes, and the 16th byte is masked with 0xF0. |
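The truncate-and-mask rule can be sketched as follows. The digest is taken as an input so that the example needs no hashing library, and the function names are hypothetical. The conversion back to the legacy textual form assumes lowercase hex digits, which this document does not confirm.

```rust
/// Builds a 16-byte TxHash from a full 32-byte SHA3-256 digest by keeping the
/// first 16 bytes and clearing the low four bits of the 16th byte.
fn tx_hash_from_digest(digest: &[u8; 32]) -> [u8; 16] {
    let mut hash = [0u8; 16];
    hash.copy_from_slice(&digest[..16]);
    hash[15] &= 0xF0;
    hash
}

/// Renders a TxHash in the legacy textual form: the letter g followed by 31
/// hex digits. The 32nd digit is dropped because it is always zero.
/// ASSUMPTION: lowercase hex digits.
fn to_legacy_string(hash: &[u8; 16]) -> String {
    let mut s = String::with_capacity(33);
    s.push('g');
    for byte in hash {
        s.push_str(&format!("{:02x}", byte));
    }
    s.pop(); // remove the final (zero) nibble
    s
}
```

Masking the last byte keeps the binary form in one-to-one correspondence with the existing 31-digit strings, so legacy hashes round-trip without loss.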
TxIn
| Field name | Field type | Notes |
| --- | --- | --- |
| type | u8 enum | See below |

| type | Field name | Field type | Notes |
| --- | --- | --- | --- |
| 0: P2PKH | previous_out | OutPoint | The P2PKH transaction output being spent |
| | public_key | PublicKey | |
| | signature | Signature | |
| 1: P2SH | previous_out | OutPoint | The P2SH transaction output being spent |
| | unlock_script | Script | |
| | lock_script | Script | |
| 2: CreateItem | block_number | varint(u64) | The current block number, used as a placeholder to prevent replay attacks |
| | public_key | PublicKey | |
| | signature | Signature | |
| 3: Coinbase | block_number | varint(u64) | The mined block’s block number |
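As a rough illustration of how the TxIn variants might look in code, the sketch below mirrors the table as a Rust enum. All type and method names are hypothetical. Script is left as an opaque placeholder because its format is still undecided, and only the mapping from variant to type ID is shown.

```rust
/// Reference to a previous transaction output.
struct OutPoint {
    t_hash: [u8; 16],
    n: u32,
}

/// Placeholder: the script format is TBD, so this is just opaque bytes.
struct Script(Vec<u8>);

/// One variant per transaction input type, each storing only the fields that
/// type needs rather than a full script.
enum TxIn {
    P2pkh {
        previous_out: OutPoint,
        public_key: [u8; 32],
        signature: [u8; 64],
    },
    P2sh {
        previous_out: OutPoint,
        unlock_script: Script,
        lock_script: Script,
    },
    CreateItem {
        block_number: u64,
        public_key: [u8; 32],
        signature: [u8; 64],
    },
    Coinbase {
        block_number: u64,
    },
}

impl TxIn {
    /// The u8 type ID written before the variant's fields.
    fn type_id(&self) -> u8 {
        match self {
            TxIn::P2pkh { .. } => 0,
            TxIn::P2sh { .. } => 1,
            TxIn::CreateItem { .. } => 2,
            TxIn::Coinbase { .. } => 3,
        }
    }
}
```

With this shape, the transaction type is read directly from the tag byte, so no pattern-matching against script contents is needed for the common cases.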
TxOut
| Field name | Field type | Notes |
| --- | --- | --- |
| value | Asset | |
| locktime | varint(u64) | |
| script_public_key | Address | In spite of the name (which is preserved for legacy reasons), this is actually the receiver’s address. |
PublicKey
| Field name | Field type | Notes |
| --- | --- | --- |
| public_key | Array(u8) (32) | An Ed25519 public key |
Signature
| Field name | Field type | Notes |
| --- | --- | --- |
| signature | Array(u8) (64) | An Ed25519 signature |
Script
| Field name | Field type | Notes |
| --- | --- | --- |
| len | varint(u32) | Number of elements in the following array |
| ops | Array(TBD) | Undecided as to how scripts should be stored. Do we break existing scripts and switch to a new format which unifies numbers, public keys, signatures and bytes? |
Example
TBD
Considerations
TBD