Schema-based general computational model for Lineage

1) High level design

  • We propose an ITEM instance can be the manifest for a transaction-set: it identifies the set (start/stop transactions), declares a job card that tells miners how to extract features and run OP_AI_EVAL, and declares how results must be written back (UPDATE tx format).

  • Keep the on-chain ITEM compact: use short keys, canonical JSON (no whitespace), and a fixed small selector DSL so miners can deterministically compute features.

  • For heavy models: prefer a two-tier pattern (light deterministic on-chain eval; heavy off-chain eval with commit/verify).
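The commit/verify half of that pattern can be sketched minimally. A sketch only: the hashing scheme and the function names `commit`/`verify` are illustrative assumptions, not part of the spec.

```python
import hashlib

def commit(result_blob: bytes) -> str:
    # off-chain side: publish only a hash commitment of the heavy result
    return hashlib.sha256(result_blob).hexdigest()

def verify(result_blob: bytes, commitment: str) -> bool:
    # miner/validator side: cheap deterministic check against the commitment
    return hashlib.sha256(result_blob).hexdigest() == commitment
```

The expensive evaluation happens once off-chain; on-chain participants only recompute one hash, which keeps the deterministic, bounded-work property of the light tier.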

2) Compact ITEM metadata schema (short keys)

Use canonical JSON with short keys (each key is listed below with its meaning).

Top level:

  • v = schema version

  • id = ITEM id (optional if implicit)

  • s = set_start_txid

  • e = set_end_txid (stop)

  • j = job card object

  • o = output / update policy

  • p = provenance {creator,ts,nonce}

  • ctl = control flags (deterministic,gasmeter,maxnodes)

Job card j:

  • jid = job id

  • m = engine (must be "OP_AI_EVAL" for miner exec)

  • mod = model id or light op id (e.g., "m:v1" or "linreg:1")

  • spec = [ feature spec objects ]

  • seed = optional deterministic seed

  • limit = max nodes / blocks to traverse

Feature spec object (each element in spec):

  • n = feature name (short)

  • sel = selector expression (see DSL below)

  • agg = aggregation: sum|avg|count|min|max|first|last|hist

  • t = type: num|cat|bool|text

  • w = window param, e.g., 10blocks or 24h (optional)

  • d = default value if missing (optional)

Output / update o:

  • ut = update tx type/name (e.g., “UPDATE”)

  • ref = reference field name to link back to ITEM (e.g., item_id)

  • fields = list of fields to populate in the update {name->short key} or “cid” for off-chain blob

  • sig = boolean (require updater sig)

Control ctl:

  • det = true/false (must be deterministic)

  • maxN = max nodes to crawl

  • gas = gas limit hint
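The schema above can be pinned down as typed structures. A hypothetical Python sketch: field names follow the short keys defined above; the class names (`FeatureSpec`, `JobCard`, `Item`) are mine, not part of the schema.

```python
from typing import TypedDict, List

class FeatureSpec(TypedDict, total=False):
    n: str       # feature name (short)
    sel: str     # selector expression (mini DSL)
    agg: str     # sum|avg|count|min|max|first|last|hist
    t: str       # num|cat|bool|text
    w: str       # optional window, e.g. "10blocks" or "24h"
    d: object    # optional default when value is missing

class JobCard(TypedDict, total=False):
    jid: str
    m: str                  # engine, must be "OP_AI_EVAL" for miner exec
    mod: str                # model id, e.g. "linreg:v1"
    spec: List[FeatureSpec]
    seed: str               # optional deterministic seed
    limit: int              # max nodes/blocks to traverse

class Item(TypedDict, total=False):
    v: str      # schema version
    s: str      # set_start_txid
    e: str      # set_end_txid (stop)
    j: JobCard
    o: dict     # output / update policy
    p: dict     # provenance {creator, ts, nonce}
    ctl: dict   # control flags {det, maxN, gas}
```

`total=False` mirrors the fact that several keys (id, seed, w, d) are optional.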

3) Mini DSL for selectors (sel)

Make this small but expressive. Examples:

  • meta.price → value in this tx metadata

  • meta.items.w → iterate array items, pick w field

  • tx.value → transaction value (cash/payment)

  • rel.parent or rel.parents → txids listed in this tx's metadata under parent links

  • blk.height → block height

  • time → tx timestamp

  • functions: sum(…), avg(…), count(…) can be used inside selector or use agg in spec.

  • filters: meta.items[cat=grain].q → select the q field of items whose cat = grain.
    Keep it unambiguous and implementable with a tiny deterministic parser.
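A tiny deterministic interpreter of this shape is straightforward. The sketch below (an assumption, not the normative grammar) handles dotted paths, array fan-out, and the [k=v] filter; a real spec would fix the grammar precisely.

```python
import re

def eval_sel(sel, tx):
    """Evaluate a dotted selector against a tx dict, e.g. {"meta": {...}}.
    Selecting through a list fans out: the result is always a list of values.
    A [key=value] segment filters array elements by field equality."""
    values = [tx]
    for part in sel.split("."):
        m = re.fullmatch(r"(\w+)(?:\[(\w+)=(\w+)\])?", part)
        if not m:
            raise ValueError("bad selector part: " + part)
        key, fk, fv = m.groups()
        stepped = [v[key] for v in values if isinstance(v, dict) and key in v]
        # flatten one level of arrays so the next segment applies per-element
        flat = []
        for v in stepped:
            flat.extend(v) if isinstance(v, list) else flat.append(v)
        if fk is not None:
            flat = [x for x in flat if isinstance(x, dict) and str(x.get(fk)) == fv]
        values = flat
    return values
```

Because the grammar is regular and the traversal is a pure function of the tx dict, every miner evaluating the same selector on the same metadata gets the same list.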

4) BFS crawl algorithm (pseudocode)

Use adjacency stored in each transaction’s metadata (each tx includes rel.parents or rel.children links). ITEM gives start/stop hints and maxN.

queue = [ITEM.s]
seen = set()
count = 0
while queue and count < JOB.limit and count < CTL.maxN:
    txid = queue.pop(0)
    if txid in seen:
        continue
    tx = fetch_tx(txid)
    process_tx_for_features(tx)  # extract meta fields per job.spec
    seen.add(txid); count += 1
    for child in tx.meta.rel.children:
        if child not in seen:
            queue.append(child)
    for parent in tx.meta.rel.parents:
        if parent not in seen:
            queue.append(parent)
    if txid == ITEM.e:
        break

Miners must be able to fetch rel.* from each tx deterministically.
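The pseudocode above can be made concrete. A sketch under one assumption: `txs` is a txid → tx-dict map standing in for deterministic chain lookup, with rel links stored under `meta.rel`.

```python
from collections import deque

def crawl(txs, start, stop, max_n):
    """BFS over rel.parents/rel.children links in tx metadata.
    Returns the visited txids in deterministic BFS order."""
    queue = deque([start])
    seen, visited = set(), []
    while queue and len(visited) < max_n:
        txid = queue.popleft()
        if txid in seen:
            continue
        tx = txs.get(txid)
        if tx is None:
            continue  # unknown txid: skip deterministically
        seen.add(txid)
        visited.append(txid)
        if txid == stop:
            break
        rel = tx.get("meta", {}).get("rel", {})
        for nxt in rel.get("children", []) + rel.get("parents", []):
            if nxt not in seen:
                queue.append(nxt)
    return visited
```

Using a FIFO queue and a fixed children-then-parents enqueue order keeps the visit order, and therefore the extracted feature vector, identical across miners.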

5) Feature extraction semantics

For each spec entry:

  1. Evaluate sel on each tx’s metadata in the visited set.

  2. Coerce values to t type.

  3. Aggregate over the visited set using agg.

  4. Apply w window if present (time or block window).

  5. Missing values use d or defined imputation (0 or null).
    Return a dense feature vector in a canonical order (order = spec array order).

6) OP_AI_EVAL execution & output

  • j.mod must point to a deterministic, lightweight algorithm representation that miners can execute (e.g., small parametric model, ruleset, linear model, decision tree). Model should be encoded as compact JSON or bytecode with a well-specified interpreter built into miner validation.

  • The evaluation result object to be written back should include:

    • item (ITEM id)

    • job (jid)

    • feat_hash (hash of canonicalized feature vector)

    • model (model id and version)

    • result (score, label, action)

    • optional cid pointing to full result blob (if too big)

  • This result is packed into the UPDATE tx declared in o. If o.sig true, updater must sign.
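Assembling that result object can be sketched as follows. The field names track section 6; the canonicalization and hashing rules (sorted keys, compact separators, SHA-256) are one possible choice, not a mandated one.

```python
import hashlib
import json

def canonical(obj) -> bytes:
    # canonical JSON: sorted keys, no whitespace, UTF-8 encoding
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def build_result(item_id, jid, model, features, result, cid=None):
    """Build the object packed into the UPDATE tx declared in o."""
    out = {
        "item": item_id,
        "jid": jid,
        "model": model,
        "feature_hash": "0x" + hashlib.sha256(canonical(features)).hexdigest(),
        "result": result,   # e.g. {"score": ..., "label": ..., "action": ...}
    }
    if cid is not None:
        out["cid"] = cid    # pointer to the full result blob when it is too big
    return out
```

Hashing the canonicalized feature vector rather than embedding it keeps the UPDATE tx small while still letting any validator recompute and compare.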

7) Example compact ITEM metadata (fits well within ~800 chars)

Canonical JSON, no spacing (short keys). This is ready to be placed as an ITEM metadata field:

{"v":"1","s":"txStart123","e":"txStop999","j":{"jid":"J1","m":"OP_AI_EVAL","mod":"linreg:v1","seed":"0xabc","limit":500,"spec":[{"n":"wt_sum","sel":"meta.items.w","agg":"sum","t":"num"},{"n":"price_avg","sel":"meta.price","agg":"avg","t":"num"},{"n":"tx_cnt","sel":"txid","agg":"count","t":"num"}]},"o":{"ut":"UPDATE","ref":"item_id","fields":["score","action","cid"],"sig":true},"p":{"creator":"addr1","ts":1728000000,"nonce":"42"},"ctl":{"det":true,"maxN":200,"gas":200000}}

8) UPDATE tx format (recommended)

When writing the evaluation back, use canonical JSON in the UPDATE tx metadata:

{"item":"<ITEM.id>","jid":"J1","model":"linreg:v1","feature_hash":"0x…","result":{"score":0.71,"label":"ok","action":"settle"},"cid":"bafy…","ts":1728…,"sig":"0x…"}

If cid present, it points to full payload stored elsewhere (IPFS/CID or on-chain blob).

9) Determinism, miner constraints & security

  • Determinism: OP_AI_EVAL must be deterministic. No randomness unless seed is provided and used consistently.

  • Complexity limits: include limit/maxN/gas to bound work miners must do. Reject JOBs without limits.

  • Model size: require compact models (small relative to on-chain compute/time budgets). For heavier models use commit/verify: miners verify a commit hash, and a zero-knowledge or fraud-proof style scheme attests the result off-chain.

  • Canonicalization: define canonical JSON ordering, encoding, float precision, rounding and hashing rules (so different miners compute same feature_hash).

  • Access control: use o.sig and updater whitelist in o to prevent unauthorized writes.

  • Audit trail: include provenance (p) and ensure every UPDATE references item and jid.
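The canonicalization point deserves a concrete illustration. One workable rule set (an assumption; the spec must fix the exact rules, including float precision and rounding): UTF-8 bytes, lexicographically sorted keys, no whitespace. Under such rules, key order in the source no longer affects the hash, while value order in arrays still does:

```python
import hashlib
import json

def feature_hash(features) -> str:
    """Hash a feature vector/object under fixed canonicalization rules
    (sorted keys, compact separators, UTF-8), so all miners agree."""
    blob = json.dumps(features, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "0x" + hashlib.sha256(blob).hexdigest()
```

Two miners that build the same logical feature map in different insertion orders will still emit identical feature_hash values, which is the property validators rely on.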

10) Best practices & patterns

  • Short keys to save space. Provide a mapping doc off-chain for human readability.

  • Version v so you can evolve the schema safely.

  • Keep selectors simple and implement a tiny deterministic interpreter inside miner code — avoid full Turing-complete logic inside selectors.

  • Use CIDs for large payloads (store full evaluation details off-chain and put CID in the UPDATE).

  • Two-tier model: on-chain for quick deterministic scoring (rules, linear models, trees), off-chain for heavy ML with a commit/verify or oracle pattern.

  • Gas & timeouts: require each JOB to specify a maximum gas/time; miners skip jobs that exceed those limits, and validators reject noncompliant executions.

  • Testing harness: release a local test harness that canonicalizes txs, runs BFS, extracts features, runs OP_AI_EVAL, and produces UPDATEs — helps miners/validators implement behavior identically.