REGEX matching Bitcoin Script templates - OP_INBLOCK pre-requisites

Introduction

This document explores approaches to implementing OP_INBLOCK functionality, also known as Atomic Transfer Pairs.

The goal of this functionality within the A-Block blockchain is to enable a first transaction to describe another “dependent output” that must exist and be validated simultaneously in order for the first transaction to be valid. Validation includes being added to a block. In most cases the other output will depend on an input that bears a similar requirement on the first transaction, so the whole transfer pair is expected to be included in the same block. Where the condition is only enforced in one direction, however, it is feasible that the dependent output could be included in a prior block; whether to allow this is a design decision to be made.

It is a goal that such transactions can be constructed both cooperatively by the parties and independently, where the counterparty may not yet be known at the time the first transaction is constructed.

Scenarios

  1. Collaborative pair creation (CPC)

    • This is where both parties are in communication and negotiate the terms of a trade directly, which includes being able to provide each other with necessary parameters for transaction construction.

  2. Offer first, acceptance later (OFAL)

    • In this scenario Alice defines the terms of a transaction she will accept and what she is willing to pay as her side of the trade, effectively creating an offer that can be accepted by anyone that fulfills Alice’s stated terms.

  3. Independent offerings, later matched by a 3rd party (3PM)

Construction basics

Enforcement of transaction scripts only occurs on the input side of a transaction. In standard Bitcoin no validation whatsoever is done on output scripts. This is in part because it would usually be impossible to run the script successfully without providing a child transaction that spends it in order to calculate a sighash and a valid signature over that hash.

Inputs a party may wish to use are unlikely to have OP_INBLOCK conditions attached to them, because at the time they are created future spending circumstances (e.g. relative market values) are probably not known.

We propose that OP_INBLOCK validation can happen within the scriptSig itself. The simplest mechanism would be to prefix the usual <pub_key> push operations so that the script must pass the OP_INBLOCK test before proceeding to validation of the signatures.

Risks: In standard Bitcoin script, the only operations allowed in scriptSig are pushdata and all of its variants. Functional opcodes are disallowed and if present will cause the script to fail. This wasn’t always true, but the measure was put in place by Satoshi whilst he was closing some security holes. Importantly, at the time he also changed the way that the two parts (scriptSig + scriptPubKey) were executed, which protected against the specific attacks he was patching. Nevertheless it will be important to conduct a thorough security analysis on relaxing this rule, and we may choose to relax it in the most minimal way possible by only allowing OP_INBLOCK rather than allowing all opcodes.

Key problems to solve

  1. Template definition

  2. Variable placeholder validation

  3. Assume principle of defaulting to restrictive definition (i.e. minimal encoding)

  4. Complex structures are out of scope, but this might limit multisig-type arrangements.

  5. How does P2PK fit in?

  6. Return address for OFAL and 3PM scenarios

    • In these cases the recipient address for the payment that forms part of the offer is unknown at the time the transaction inputs are signed. The alternative is to defer signing until the counterparty is found, which reverts to the CPC scenario and is not the goal.

    • This could be implemented as a variant of OP_INBLOCK that returns a value from the matched transaction to be plugged into the standard transfer part of the script. Alternatively this could be a separate op code.

      • A point to note is that this is a subset of the functionality of the proposed OP_PUSHTX op code.

  7. Indexability

    • It is assumed that a 3rd party service will aid in matching buyers and sellers of assets for either CPC, OFAL or 3PM trades.

    • It is further assumed that (initially) a majority of trades will be made with well known script templates where mechanisms for extracting important data fields (value, asset type, addresses etc.) are also well developed. This is analogous to block explorers and ElectrumX servers constructing address indexes from standard transaction types (p2pkh, p2pk, p2sh, bare multisig). The shortcoming of this approach is that unknown transaction types are impossible to index without accompanying code to recognise and extract such data.

      • Possibly the solution is to define a standard mechanism for describing a script template schema that can be widely distributed, so that a new transaction template can be made recognizable and indexable by a wide variety of services by way of a plugin.

A note on OP_PUSHTX

It should be acknowledged that the template matching functionality is a subset of the capabilities of OP_PUSHTX when combined with in-script validation. As long as we have string op codes enabled (OP_LEFT, OP_RIGHT, OP_SUBSTR), we have all the primitives necessary to compare one script to another and enforce forward conditions. However, such in-script parsing of transactions pushes a lot of complexity onto the script authors that we will be seeking to avoid. It is important however to note the similarities in capability as we will prefer to avoid creating two overlapping systems as much as possible.

3 use cases for template matching

We note that in preceding sections we’ve identified more than one use case for a template matching language resembling REGEX.

  1. Proof of match

    • That is the consensus ruleset that takes a candidate paired transaction and validates that it fits the script-defined criteria.

  2. Data extraction - indexing

    • A consistent mechanism for defining how key indexing variables can be extracted from a new script template (and whether a script matches that template).

  3. Data extraction - OP_INBLOCK return values

    • A consistent mechanism for defining how variables (like pay-to addresses) can be extracted from the candidate paired transaction and thus injected as part of spending conditions.

  4. (bonus use case) Script hashing

    • A mechanism for zeroing out variable parts of the script such that a script hash can be generated for easy lookup. E.g. the <pubkey_hash> part of p2pkh is variable, so if we replace it with a zero and hash the script we have a reliable key for looking up the data extraction schema for this script type.
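The script hashing idea can be illustrated with a minimal Python sketch. It is not a proposed specification: the function names parse_script and template_key are hypothetical, SHA-256 is an arbitrary choice of hash, and "replace it with a zero" is interpreted here as replacing each whole data push (length byte plus payload) with a single 0x00 byte. Only direct pushes of 1 to 75 bytes are handled.

```python
import hashlib

def parse_script(script: bytes):
    """Split raw script bytes into (opcode, data) pairs.

    Sketch only: handles direct pushes (opcodes 0x01-0x4b) and rejects
    OP_PUSHDATA1/2/4, which a real parser would also need to support.
    """
    items = []
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            data = script[i:i + op]
            if len(data) != op:
                raise ValueError("truncated push")
            i += op
            items.append((op, data))
        elif 0x4C <= op <= 0x4E:
            raise ValueError("OP_PUSHDATA1/2/4 not handled in this sketch")
        else:
            items.append((op, None))
    return items

def template_key(script: bytes) -> str:
    """Hash the script with every data push replaced by a single zero byte.

    Assumption: the whole push is collapsed to 0x00, so pushes of
    different lengths map to the same key.
    """
    skeleton = bytearray()
    for op, data in parse_script(script):
        skeleton.append(op if data is None else 0x00)
    return hashlib.sha256(bytes(skeleton)).hexdigest()

# Two p2pkh scripts that differ only in the <pubkey_hash> push
p2pkh_a = bytes.fromhex("76a914") + b"\x11" * 20 + bytes.fromhex("88ac")
p2pkh_b = bytes.fromhex("76a914") + b"\x22" * 20 + bytes.fromhex("88ac")
# A p2pk script: <33-byte pubkey> OP_CHECKSIG
p2pk = b"\x21" + b"\x02" * 33 + b"\xac"
```

Under these assumptions both p2pkh scripts produce the same key while the p2pk script produces a different one, so the key could index a table of data extraction schemas. Whether the push length should be preserved in the hashed form is left open here.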

Complications

Some notes on unresolved issues.

The ability to sign a transaction before the output scripts are completed necessarily requires a mechanism to exclude those incomplete parts from the sighash. OP_CODESEPARATOR may be a candidate mechanism for this purpose. We also need to preserve the ability for validation to be self-contained given only the transaction itself and its accepted pair.

Note that if we want to be able to chain transactions within blocks it will be necessary to make a choice between:

  • Modifying transaction ID to exclude these incomplete portions (this smells a bit like segwit). This could be in the form of a pre-confirmation and post-confirmation txid with the post-confirmation ID being the one that is used as final, and is part of the merkle tree. OR;
  • Accept that child transactions will not be able to be properly formed and signed until their parents are accepted into a block candidate.

Simple REGEX like scheme

We propose to keep the first iteration of the scheme as simple as possible whilst preserving the ability to extend the language further in the future.

The simplest script template for matching is the exact script itself.

  OP_DUP OP_HASH160 <pubkey_hash> OP_EQUALVERIFY OP_CHECKSIG
  76 a9 14 <20-byte pubkey_hash> 88 ac

This is the Regex equivalent of string literals, where script opcodes are treated as characters. Variable length PUSHDATA operations would be treated as exact-match-or-fail.

This is the simplest but least flexible option, and might still require placeholders for PUSHDATA values that might not be known at the time of creating the transaction.

The next level of generality is to introduce placeholders somewhat analogous to the regex “.” (dot) operator. We propose to distinguish between placeholders for op codes and placeholders for constant data pushes, possibly only allowing the latter initially.
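The difference between a literal template and one with a data-push placeholder can be sketched as follows. This is an illustration under stated assumptions rather than a proposed encoding: scripts are represented as already-parsed Python lists in which opcodes are strings and data pushes are bytes, and ANY_PUSH is a hypothetical stand-in for the dot-like placeholder.

```python
# Hypothetical placeholder that matches any single data push
# (the analogue of the regex "." restricted to pushdata items).
ANY_PUSH = object()

def matches(template, script) -> bool:
    """Return True if every script item matches the template item by item.

    Literal items must be equal; ANY_PUSH accepts any data push (bytes)
    but never an opcode (str).
    """
    if len(template) != len(script):
        return False
    for templ_item, script_item in zip(template, script):
        if templ_item is ANY_PUSH:
            if not isinstance(script_item, bytes):
                return False
        elif templ_item != script_item:
            return False
    return True

pkh_a = b"\xaa" * 20
pkh_b = b"\xbb" * 20
script_a = ["OP_DUP", "OP_HASH160", pkh_a, "OP_EQUALVERIFY", "OP_CHECKSIG"]
script_b = ["OP_DUP", "OP_HASH160", pkh_b, "OP_EQUALVERIFY", "OP_CHECKSIG"]

# Literal template: the exact script itself (exact-match-or-fail)
literal_template = list(script_a)
# Placeholder template: any pushed value is accepted in the third position
generic_template = ["OP_DUP", "OP_HASH160", ANY_PUSH, "OP_EQUALVERIFY", "OP_CHECKSIG"]
```

Here literal_template matches only script_a, whereas generic_template matches both scripts but still rejects a script that has an opcode where the push is expected.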

Subscript validation

Following that we may wish to impose further validation on the content of such placeholders, as well as defining whether such content should be part of returned data. For this we propose to introduce the notion of ‘subscripts’. A subscript is run in a second invocation of the script engine: the data item matched by the placeholder is placed on a fresh stack and the subscript is executed against that input.

By way of example if we are defining a template for p2pkh we may want to validate that the value is exactly 20 bytes long. We will use a fictitious meta-opcode as the placeholder and might define our template as such:

OP_INBLOCK {DUP HASH160 OP_VAR{ SIZE 20 EQUALVERIFY } EQUALVERIFY CHECKSIG}

Blocks of opcodes in brackets are wrapped as a sequence of script bytes in a pushdata, so they appear to the original invocation of the script engine as a single data blob. They only have a meaning within the OP_INBLOCK execution context.
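To make the wrapping concrete, the sketch below serializes the bracketed blocks of the p2pkh template into nested pushdata blobs. The opcode values for DUP, HASH160, SIZE, EQUALVERIFY and CHECKSIG are the real Bitcoin ones and push_data follows the standard push encoding; the byte value 0xF0 for OP_VAR is purely an assumption for illustration, since OP_VAR is a fictitious meta-opcode with no assigned value.

```python
# Real Bitcoin opcode values
OP_DUP, OP_HASH160, OP_SIZE = 0x76, 0xA9, 0x82
OP_EQUALVERIFY, OP_CHECKSIG = 0x88, 0xAC
# Hypothetical value for the fictitious OP_VAR meta-opcode
OP_VAR = 0xF0

def push_data(data: bytes) -> bytes:
    """Encode data as a script push using the shortest push opcode."""
    n = len(data)
    if n <= 75:
        return bytes([n]) + data                          # direct push
    if n <= 0xFF:
        return b"\x4c" + bytes([n]) + data                # OP_PUSHDATA1
    if n <= 0xFFFF:
        return b"\x4d" + n.to_bytes(2, "little") + data   # OP_PUSHDATA2
    return b"\x4e" + n.to_bytes(4, "little") + data       # OP_PUSHDATA4

# Inner block: { SIZE 20 EQUALVERIFY }
subscript = bytes([OP_SIZE]) + push_data(bytes([20])) + bytes([OP_EQUALVERIFY])

# Outer block: { DUP HASH160 OP_VAR{...} EQUALVERIFY CHECKSIG }
template = (
    bytes([OP_DUP, OP_HASH160, OP_VAR])
    + push_data(subscript)
    + bytes([OP_EQUALVERIFY, OP_CHECKSIG])
)

# The whole template becomes one data blob for the outer script engine
template_blob = push_data(template)
```

With these assumed values the outer engine sees template_blob as a single 10-byte push; the opcodes inside it are only interpreted once OP_INBLOCK unpacks the blob.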

In (pythonesque) pseudo code the behavior looks like this:

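def OP_INBLOCK(script_template, candidate_tx):
  # Return the first output of the candidate transaction whose script
  # matches the template, or None if no output matches.
  for output in candidate_tx:
    script_iter = iter(output.script)
    templ_iter = iter(script_template)
    valid = True
    for templ_opcode in templ_iter:
      other_opcode = next(script_iter, None)
      if other_opcode is None:
        valid = False  # script is shorter than the template
      elif templ_opcode == OP_VAR:
        # the item following OP_VAR is the validation subscript
        validation_script = next(templ_iter)
        # assume a pushdata opcode has a 'data' attribute
        stack = [other_opcode.data]
        valid = invoke_script(validation_script, stack)
      else:
        valid = other_opcode == templ_opcode
      if not valid:
        break
    # assumed rule: opcodes left over after the template ends are a mismatch
    if valid and next(script_iter, None) is None:
      return output
  return None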
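The pseudo code above leaves the script engine invocation undefined. The following is a runnable toy version, given as a sketch under several assumptions: scripts are pre-parsed Python lists (opcode names as strings, data pushes as bytes, small numbers as ints), the subscript engine supports only SIZE, number pushes and EQUALVERIFY, a subscript is treated as passing if no VERIFY fails, and the function returns the index of the matching output. The names run_subscript, match_output and op_inblock are hypothetical.

```python
OP_VAR = "OP_VAR"  # fictitious meta-opcode used as the placeholder

def run_subscript(subscript, data: bytes) -> bool:
    """Toy second invocation of a script engine on a fresh stack.

    Assumption: the subscript passes if no VERIFY operation fails.
    """
    stack = [data]
    for token in subscript:
        if isinstance(token, int):
            stack.append(token)                # number push
        elif token == "SIZE":
            stack.append(len(stack[-1]))       # push size of the top item
        elif token == "EQUALVERIFY":
            b, a = stack.pop(), stack.pop()
            if a != b:
                return False
        else:
            raise ValueError(f"opcode not supported in this sketch: {token}")
    return True

def match_output(template, script) -> bool:
    """Match one output script against a template containing OP_VAR."""
    templ_iter = iter(template)
    script_iter = iter(script)
    for templ_token in templ_iter:
        other = next(script_iter, None)
        if other is None:
            return False                       # script shorter than template
        if templ_token == OP_VAR:
            subscript = next(templ_iter)       # subscript follows OP_VAR
            if not isinstance(other, bytes) or not run_subscript(subscript, other):
                return False
        elif templ_token != other:
            return False
    return next(script_iter, None) is None     # no trailing script items

def op_inblock(template, candidate_outputs):
    """Return the index of the first matching output, or None."""
    for index, script in enumerate(candidate_outputs):
        if match_output(template, script):
            return index
    return None

P2PKH_TEMPLATE = ["DUP", "HASH160", OP_VAR, ["SIZE", 20, "EQUALVERIFY"],
                  "EQUALVERIFY", "CHECKSIG"]
outputs = [
    ["DUP", "HASH160", b"\x01" * 19, "EQUALVERIFY", "CHECKSIG"],  # 19 bytes: rejected
    ["DUP", "HASH160", b"\x02" * 20, "EQUALVERIFY", "CHECKSIG"],  # 20 bytes: accepted
]
```

With these inputs op_inblock(P2PKH_TEMPLATE, outputs) selects the second output, because the first one fails the SIZE 20 EQUALVERIFY subscript. A consensus implementation would operate on raw script bytes and reuse the real script engine rather than this toy interpreter.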