Twitter/X

Architectural changes for SmolLM2

Brief

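λ (@lambda0xE) posted a mini-update on SmolLM2 describing storage and arithmetic redesigns that let the ML state manage memory segments and, if necessary, distribute multiplications to available GPU resources. The post reports that single-tx execution within the ML state is now 16,000× faster and that token cluster size has grown about 10×. Octra's first program with fully public inference for SmolLM2-135M is live and verified, but for now it is informational only: a full cycle costs about 4,000 OCT because it performs about 1 billion FP64 operations. A webcli update with an interaction wrapper was promised for the day after the post.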

Why it matters

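The update speeds up inference ops by redesigning the storage structure and adding fast elements for complex multiplication; the ML state itself can now manage memory segments and, if necessary, distribute multiplication work to available GPU resources. The author also frames inference as useful work: part of a complex inference could become part of a PoUW (proof of useful work) algorithm inside a single consensus mechanism, alongside HFHE processing and proof systems.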
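
The post does not say how multiplication work is split across GPU resources. As a rough illustration only, one common pattern is to partition a matrix product by row blocks, hand each block to a separate worker, and stitch the results back together in order. The sketch below assumes that pattern and uses threads as stand-ins for GPUs; `matmul_rows`, `split_rows`, and `distributed_matmul` are hypothetical names, not Octra APIs.

```python
from concurrent.futures import ThreadPoolExecutor

Matrix = list[list[float]]


def matmul_rows(a_rows: Matrix, b: Matrix) -> Matrix:
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def split_rows(a: Matrix, parts: int) -> list[Matrix]:
    """Split A into at most `parts` contiguous row blocks of near-equal size."""
    parts = max(1, min(parts, len(a)))
    size, extra = divmod(len(a), parts)
    blocks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        blocks.append(a[start:end])
        start = end
    return blocks


def distributed_matmul(a: Matrix, b: Matrix, workers: int) -> Matrix:
    """Fan row blocks out to workers (hypothetical stand-ins for GPUs)."""
    blocks = split_rows(a, workers)
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        # pool.map preserves input order, so blocks come back in row order.
        results = pool.map(matmul_rows, blocks, [b] * len(blocks))
        return [row for block in results for row in block]


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    identity = [[1, 0], [0, 1]]
    print(distributed_matmul(a, identity, workers=2))
```

Row partitioning is chosen here because each block needs only its own rows of A plus a read-only copy of B, so workers never write to shared memory. Whether the ML state uses this or another scheme is not stated in the post.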

Key details

  • Performance and scale claims: the post reports that single-tx execution within the ML state is now 16,000× faster and that token cluster size has increased about 10×. No baseline figures or benchmarks are given.
  • Public rollout and cost: Octra's first program with fully public inference for SmolLM2-135M (training, weight loading, and state are all public) is live and verified, but for now it is for informational purposes only. A full cycle costs about 4,000 OCT because it performs about 1 billion FP64 arithmetic ops. A webcli update with a wrapper interface for interaction was promised for the day after the post.
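
For a sense of scale, the two cost figures in the post imply an average price per FP64 operation. The sketch below is simple arithmetic on the reported, approximate numbers and assumes every operation is priced uniformly, which the post does not state.

```python
# Figures as reported in the post; both are approximate.
FULL_RUN_COST_OCT = 4_000          # "4k OCT" per full cycle
FULL_RUN_FP64_OPS = 1_000_000_000  # "about 1 billion FP64 arithmetic ops"


def cost_per_op(total_cost_oct: float, total_ops: int) -> float:
    """Average cost of one FP64 op in OCT (assumes uniform pricing)."""
    return total_cost_oct / total_ops


def ops_per_oct(total_cost_oct: float, total_ops: int) -> float:
    """Average number of FP64 ops bought by 1 OCT (same assumption)."""
    return total_ops / total_cost_oct


if __name__ == "__main__":
    per_op = cost_per_op(FULL_RUN_COST_OCT, FULL_RUN_FP64_OPS)
    per_oct = ops_per_oct(FULL_RUN_COST_OCT, FULL_RUN_FP64_OPS)
    print(f"{per_op:.1e} OCT per FP64 op")   # 4.0e-06 OCT per FP64 op
    print(f"{per_oct:,.0f} FP64 ops per OCT")  # 250,000 FP64 ops per OCT
```

On these numbers, one OCT buys roughly 250,000 FP64 operations, which is consistent with the author's description of the full cycle as expensive.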

Source evidence

mini-update about SmoLm2, we could speed up the exec of inference ops by redesigning the storage struct, adding fast elements for complex multiplication, now the state itself can manage memory segments and, if necessary, distribute multiplication calculations to available gpu resources

useful work is useful in everything - part of a complex inference can be one of the parts of a PoUW algo (the very first idea that inspired us 3 years ago), we should not stop only at complex hfhe processing and proof systems, but cover any part of a complex work without division, which will be part of a single consensus mechanism

it's worth noting that the speed of single tx execution within the ML state has become 16k times faster, and the token cluster size has been increased by approx 10 times

new programs will be added shortly with public access via webcli (in addition to technical and security updates)

if anything interesting happens, we'll let you know later

the public inference state and storage program on octra side for your reference is here:
octrascan.io/address.html?ad…

λ (@lambda0xE)

the first octra program with fully public inference for SmolLM2-135M (training, weight loading, and state are fully public)

currently, it's only for informational purposes, because the full cycle is expensive (4k OCT) since it performs about 1 billion FP64 arithmetic ops

so, wait for an update of the webcli with an interface for interaction via a wrapper (will be available tomorrow)

now you can look at the process of loading weights:
octrascan.io/epoch.html?id=6… (using this epoch as an example)

example exec with resp: octrascan.io/tx.html?hash=a3…

  • the program is verified and completely open

— https://nitter.net/lambda0xE/status/2048164614597190004#m