IPU Assembly Instruction Reference

This document describes all available IPU assembly instructions.

Compound Instruction Layout

CompoundInst Layout: 179 bits total, [178:0]. Each instruction slot occupies a contiguous bit range. Fields are listed from the most significant bit down.

Slot                                    Bits        Fields
BreakInst (Break / Debug)               [178:157]   break inst opcode [178:177], lr reg [176:173], break immediate [172:157]
XmemInst (Extended Memory)              [156:140]   xmem inst opcode [156:154], mult stage reg [153:152], lr reg [151:148], lr reg [147:144], cr reg [143:140]
MultInst (Multiply)                     [139:113]   mult inst opcode [139:137], mult stage reg [136:135], lr reg [134:131], lr reg [130:127], lr reg [126:123], lr reg [122:119], cr reg [118:115], aaq reg [114:113]
AccInst (Accumulator)                   [112:96]    acc inst opcode [112:109], aaq reg [108:107], elements in row [106:105], horizontal stride [104:102], vertical stride [101:100], lr reg [99:96]
AaqInst (Activation and Quantization)   [95:85]     aaq inst opcode [95:94], agg mode [93], post fn [92:91], cr reg [90:87], aaq reg [86:85]
LrInst (Loop Register)                  [84:53]     lr inst opcode [84:83], lr reg [82:79], lcr reg [78:74], lcr reg [73:69], lr immediate [68:53]
LrInst (Loop Register)                  [52:21]     lr inst opcode [52:51], lr reg [50:47], lcr reg [46:42], lcr reg [41:37], lr immediate [36:21]
CondInst (Conditional)                  [20:0]      cond inst opcode [20:18], lr reg [17:14], lr reg [13:10], label token [9:0]
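As a reading aid, the bit layout above can be turned into field accessors. This Python sketch treats a compound instruction as a single integer and extracts raw field values using the inclusive [hi:lo] ranges from the layout. It lists only the opcode field of each slot. The slot keys (including `lr_upper`/`lr_lower` for the two LrInst slots) are labels introduced for the sketch. This reference does not give the numeric opcode values for each mnemonic, so the sketch stops at raw numbers.

```python
# Minimal sketch: treat a 179-bit CompoundInst as one Python integer and
# read/write fields by their inclusive [hi:lo] bit ranges from the layout.

COMPOUND_BITS = 179

# Opcode field of each slot, (hi, lo) inclusive. Slot names are labels chosen
# for this sketch; "lr_upper"/"lr_lower" are the two LrInst slots.
OPCODE_FIELDS = {
    "break": (178, 177),
    "xmem": (156, 154),
    "mult": (139, 137),
    "acc": (112, 109),
    "aaq": (95, 94),
    "lr_upper": (84, 83),
    "lr_lower": (52, 51),
    "cond": (20, 18),
}


def get_field(word: int, hi: int, lo: int) -> int:
    """Return bits [hi:lo] of word as an unsigned integer."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def set_field(word: int, hi: int, lo: int, value: int) -> int:
    """Return a copy of word with bits [hi:lo] replaced by value."""
    width = hi - lo + 1
    if value < 0 or value >> width:
        raise ValueError(f"{value} does not fit in a {width}-bit field")
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | (value << lo)


def decode_opcodes(word: int) -> dict:
    """Extract the raw opcode field of every slot."""
    if word < 0 or word >> COMPOUND_BITS:
        raise ValueError("not a 179-bit compound instruction")
    return {slot: get_field(word, hi, lo) for slot, (hi, lo) in OPCODE_FIELDS.items()}


# Usage: set a cond opcode and a 16-bit break immediate, then read them back.
word = set_field(0, 20, 18, 0b101)        # cond inst opcode [20:18]
word = set_field(word, 172, 157, 0xBEEF)  # break immediate [172:157]
print(decode_opcodes(word)["cond"])       # 5
print(hex(get_field(word, 172, 157)))     # 0xbeef
```

Arbitrary-precision Python integers make a 179-bit word easy to handle. A hardware-side encoder would more likely pack the word into fixed-size chunks.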


XMEM Instructions

Memory access instructions for loading and storing data between registers and memory.

str_acc_reg - Store Accumulator

Store accumulator to memory.

Syntax: str_acc_reg offset base

Operands:
- offset: Offset register (lr0-lr15)
- base: Base address register (cr0-cr15)

Operation:

Memory[offset + base] = r_acc
Example:
str_acc_reg lr0 cr1;;

ldr_mult_reg - Load Register

Load data from memory into a multiplication stage register.

Syntax: ldr_mult_reg dest offset base

Operands:
- dest: Mult stage register (r0, r1, or mem_bypass)
- offset: Offset register (lr0-lr15)
- base: Base address register (cr0-cr15)

Operation:

dest = Memory[offset + base]
Example:
set lr0 0x1000;;
ldr_mult_reg r0 lr0 cr0;;

ldr_cyclic_mult_reg - Load Cyclic Register

Load with cyclic addressing into r_cyclic.

Syntax: ldr_cyclic_mult_reg offset base index

Operands:
- offset: Offset register (lr0-lr15)
- base: Base address register (cr0-cr15)
- index: Index inside the cyclic register (lr0-lr15)

Operation:

r_cyclic[index % 512:128] = Memory[offset + base]

ldr_mult_mask_reg - Load Mask Register

Load mask data from memory.

Syntax: ldr_mult_mask_reg offset base mask_idx

Operands:
- offset: Offset register (lr0-lr15)
- base: Base address register (cr0-cr15)

Operation:

r_mask = Memory[offset + base]

xmem_nop - No Operation (XMEM)

No operation for xmem slot.

Syntax: xmem_nop

xmem.store_aaq_result - Store AAQ Result

Write the 128-byte AAQ quantization result register to external memory.

Syntax: xmem.store_aaq_result offset base

Operands:
- offset: Offset register (lr0-lr15)
- base: Base address register (cr0-cr15)

Operation:

Memory[offset + base] = aaq_result  # 128 bytes
Example:
xmem.store_aaq_result lr0 cr0;;


MULT Instructions

Multiplication instructions for element-wise and element-vector operations. The multiplication result (mult_result) is not stored in any register. It is forwarded directly to the ACC stage, where the operation descriptions below refer to it as multiply_result or mult_res.

mult.ee - Element-wise Multiply

Multiply the elements of Ra by the elements of the cyclic register (RC), element by element, starting at cyclic_offset in RC.

Syntax: mult.ee ra cyclic_offset mask_offset mask_shift

Operands:
- ra: Multiplicand register (r0, r1, or mem_bypass)
- cyclic_offset: Base offset for the multiplier from RC (cyclic register)
- mask_offset: Offset to select the mask from RM (mask register)
- mask_shift: Shift applied to the mask register

Operation:

Element-wise multiply with masking
Example:
mult.ee r0 lr0 lr1 lr2;;

mult.ev - Element-Cyclic Multiply (Deprecated)

[DEPRECATED: use mult.ve.cr or mult.ve.aaq] Multiply each element of Ra by a fixed element of the cyclic register.

Syntax: mult.ev ra fixed_cyclic_idx mask_offset mask_shift

Operands:
- ra: Multiplicand register (r0, r1, or mem_bypass)
- fixed_cyclic_idx: Fixed index for element selection from the cyclic register
- mask_offset: Offset to select the mask from RM (mask register)
- mask_shift: Shift applied to the mask register

Operation:

Multiply each Ra element by the fixed cyclic element, with masking
Example:
mult.ev r0 lr0 lr1 lr2;;

mult.ve - Vector-Element Multiply

Multiply a fixed element of Ra by each element of the cyclic register.

Syntax: mult.ve ra cyclic_offset mask_offset mask_shift fixed_ra_idx

Operands:
- ra: Multiplicand register (r0, r1, or mem_bypass)
- cyclic_offset: Base offset for the multiplier from RC (cyclic register)
- mask_offset: Offset to select the mask from RM (mask register)
- mask_shift: Shift applied to the mask register
- fixed_ra_idx: Fixed index for element selection from the Ra register

Operation:

Multiply the fixed Ra element by the cyclic elements, with masking
Example:
mult.ve r0 lr0 lr1 lr2 lr3;;

mult_nop - No Operation (MULT)

No operation for multiply slot.

Syntax: mult_nop

mult.ve.cr - Vector-Element Multiply (CR scalar)

Multiply each element of RC[cyclic_offset:cyclic_offset+128] by a scalar from a CR register. Elements beyond the RC boundary are treated as 1 (dtype-specific).

Syntax: mult.ve.cr cyclic_offset mask_offset mask_shift cr_idx

Operands:
- cyclic_offset: Base offset into RC (cyclic register); non-cyclic, so out-of-bounds elements are padded with 1
- mask_offset: Offset to select the mask from RM (mask register)
- mask_shift: Shift applied to the mask register
- cr_idx: CR register whose low byte supplies the fixed scalar multiplier (cr0-cr15)

Operation:

For i in [0,128): rb = RC[cyclic_offset+i] if in bounds else dtype_one; mult_res[i] = CR[cr_idx][0] * rb
Example:
mult.ve.cr lr0 lr15 lr15 cr3;;
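The per-element rule for mult.ve.cr can be modeled directly. This Python sketch makes three simplifying assumptions. RC is passed as a plain list, because this section does not state its hardware length. The mask operands are left out entirely. The scalar is taken as an already-extracted integer, because the signedness of the CR low byte is not specified here. The same function would cover mult.ve.aaq, which differs only in where the scalar comes from.

```python
def mult_ve_scalar(rc: list, cyclic_offset: int, scalar: int, dtype_one: int = 1) -> list:
    """Sketch of mult.ve.cr / mult.ve.aaq, ignoring mask_offset and mask_shift.

    rc:        contents of the cyclic register RC (its length is an assumption).
    scalar:    low byte of CR[cr_idx] (or AAQ[aaq_rf_idx]), already extracted.
    dtype_one: the value "1" in the active dtype; 1 for integer types.
    """
    mult_res = []
    for i in range(128):
        j = cyclic_offset + i
        # Non-cyclic read: past the end of RC, pad with dtype_one instead of wrapping.
        rb = rc[j] if 0 <= j < len(rc) else dtype_one
        mult_res.append(scalar * rb)
    return mult_res


# Usage: with a 200-element RC and cyclic_offset=100, lanes 100..127 fall off the end.
res = mult_ve_scalar(list(range(200)), cyclic_offset=100, scalar=3)
print(res[0], res[99], res[100])  # 300 597 3
```

Because out-of-bounds lanes read as 1, they carry the scalar itself rather than 0.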

mult.ve.aaq - Vector-Element Multiply (AAQ scalar)

Multiply each element of RC[cyclic_offset:cyclic_offset+128] by a scalar from an AAQ register. Elements beyond the RC boundary are treated as 1 (dtype-specific).

Syntax: mult.ve.aaq cyclic_offset mask_offset mask_shift aaq_rf_idx

Operands:
- cyclic_offset: Base offset into RC (cyclic register); non-cyclic, so out-of-bounds elements are padded with 1
- mask_offset: Offset to select the mask from RM (mask register)
- mask_shift: Shift applied to the mask register
- aaq_rf_idx: AAQ register whose low byte supplies the fixed scalar multiplier (aaq0-aaq3)

Operation:

For i in [0,128): rb = RC[cyclic_offset+i] if in bounds else dtype_one; mult_res[i] = AAQ[aaq_rf_idx][0] * rb
Example:
mult.ve.aaq lr0 lr15 lr15 aaq1;;


ACC Instructions

Accumulation instructions for combining values with optional masking and shifting.

acc - Accumulate

Add the multiply result to the accumulator.

Syntax: acc

Operation:

r_acc += multiply_result

acc.first - Accumulate First

Set the accumulator to the multiply result (do not add to the previous r_acc).

Syntax: acc.first

Operation:

r_acc = multiply_result
Example:
acc.first;;

reset_acc - Reset Accumulator

Reset accumulator to zero.

Syntax: reset_acc

Operation:

r_acc = 0
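The three basic ACC operations can be modeled as list updates over the 128 accumulator words. In this Python sketch, r_acc and multiply_result are plain lists of integers, 32-bit overflow is not modeled, and the function names are illustrative. Under the formulas above, acc.first produces the same r_acc as reset_acc followed by acc, but in a single ACC slot.

```python
ACC_WORDS = 128  # r_acc holds 128 accumulator words


def reset_acc() -> list:
    """reset_acc: r_acc = 0"""
    return [0] * ACC_WORDS


def acc(r_acc: list, multiply_result: list) -> list:
    """acc: r_acc += multiply_result (element by element)"""
    return [a + m for a, m in zip(r_acc, multiply_result)]


def acc_first(multiply_result: list) -> list:
    """acc.first: r_acc = multiply_result (previous r_acc is discarded)"""
    return list(multiply_result)


# Usage: a two-step reduction that starts with acc.first.
m1 = [1] * ACC_WORDS
m2 = [2] * ACC_WORDS
r = acc_first(m1)
r = acc(r, m2)
print(r[0], acc(reset_acc(), m1) == acc_first(m1))  # 3 True
```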

acc_nop - No Operation (ACC)

No operation for accumulator slot.

Syntax: acc_nop

acc.add_aaq - Accumulate and Add AAQ

Accumulate the multiply result, then add the selected AAQ register (32-bit) to each of the 128 accumulator words.

Syntax: acc.add_aaq aaq_rf_idx

Operands:
- aaq_rf_idx: AAQ register index (aaq0-aaq3)

Operation:

r_acc += multiply_result;
for i in [0, 128): r_acc[i] += aaq_regs[aaq_rf_idx]
Example:
acc.add_aaq aaq0;;

acc.add_aaq.first - Accumulate and Add AAQ (First)

Set the accumulator to the multiply result plus the selected AAQ register (do not add to the previous r_acc).

Syntax: acc.add_aaq.first aaq_rf_idx

Operands:
- aaq_rf_idx: AAQ register index (aaq0-aaq3)

Operation:

r_acc = multiply_result;
for i in [0, 128): r_acc[i] += aaq_regs[aaq_rf_idx]
Example:
acc.add_aaq.first aaq0;;
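acc.add_aaq and acc.add_aaq.first differ only in whether the previous r_acc contributes. The Python sketch below passes the AAQ register as a plain integer (`aaq_value`). It uses a hypothetical `first` flag to select the .first variant, and it does not model 32-bit wrap-around.

```python
def acc_add_aaq(r_acc: list, multiply_result: list, aaq_value: int, first: bool = False) -> list:
    """Sketch of acc.add_aaq (first=False) and acc.add_aaq.first (first=True).

    aaq_value stands for aaq_regs[aaq_rf_idx]; the same scalar is added to all words.
    """
    base = [0] * len(multiply_result) if first else r_acc
    return [a + m + aaq_value for a, m in zip(base, multiply_result)]


# Usage: the AAQ value is broadcast to every accumulator word, like a bias.
r_acc = [1] * 128
mult = [2] * 128
print(acc_add_aaq(r_acc, mult, 10)[0])              # 13
print(acc_add_aaq(r_acc, mult, 10, first=True)[0])  # 12
```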

acc.max - Accumulator Max

For each element, set r_acc[i] = max(r_acc[i], mult_res[i], aaq_regs[aaq_rf_idx]).

Syntax: acc.max aaq_rf_idx

Operands:
- aaq_rf_idx: AAQ register index (aaq0-aaq3)

Operation:

for i in [0, 128): r_acc[i] = max(r_acc[i], mult_res[i], aaq_regs[aaq_rf_idx])
Example:
acc.max aaq0;;

acc.max.first - Accumulator Max (First)

For each element, set r_acc[i] = max(mult_res[i], aaq_regs[aaq_rf_idx]). Previous r_acc is ignored (treated as 0).

Syntax: acc.max.first aaq_rf_idx

Operands:
- aaq_rf_idx: AAQ register index (aaq0-aaq3)

Operation:

for i in [0, 128): r_acc[i] = max(mult_res[i], aaq_regs[aaq_rf_idx])
Example:
acc.max.first aaq0;;
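Here is a Python sketch of both max variants, following the Operation formulas above. `aaq_value` stands for aaq_regs[aaq_rf_idx], and `first` is a flag introduced for the sketch. Under these formulas, acc.max.first with an AAQ register holding 0 computes an element-wise ReLU of mult_res.

```python
def acc_max(r_acc: list, mult_res: list, aaq_value: int, first: bool = False) -> list:
    """Sketch of acc.max (first=False) and acc.max.first (first=True).

    Follows the Operation lines: the .first variant computes
    max(mult_res[i], aaq_value) and does not read r_acc at all.
    """
    if first:
        return [max(m, aaq_value) for m in mult_res]
    return [max(a, m, aaq_value) for a, m in zip(r_acc, mult_res)]


# Usage: with the AAQ register holding 0, acc.max.first acts like ReLU(mult_res).
mult = [-3, 5] + [0] * 126
print(acc_max([], mult, 0, first=True)[:2])  # [0, 5]
print(acc_max([7] * 128, mult, -10)[:2])     # [7, 7]
```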

acc.stride - Accumulator Stride

Reorder the multiplication result into r_acc using horizontal/vertical stride decimation. Only the RACC (r_acc) entries that are written are updated; the rest are left unchanged.

Syntax: acc.stride elements_in_row horizontal_stride vertical_stride offset

Operands:
- elements_in_row: Elements per row (8, 16, 32, or 64)
- horizontal_stride: Horizontal stride mode (off, enabled, inverted, or expand)
- vertical_stride: Vertical stride mode (off, enabled, or inverted)
- offset: LR register; value % 4 gives the start index in RACC (0, 32, 64, or 96)

Operation:

Decimate mult_res as rows×cols; apply horizontal stride (take every 2nd column, optional expand); then vertical stride (take every 2nd row). Write result into r_acc[start:start+N] where start = (offset%4)*32, N = 32|64|128.
Example:
acc.stride 8 off off lr0;;
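The decimation described above can be sketched in Python under several assumptions that this reference does not state:
- mult_res is laid out row-major with elements_in_row columns.
- "inverted" keeps the odd columns or rows instead of the even ones.
- A write that would run past r_acc[127] is rejected rather than wrapped.

The "expand" horizontal mode is not modeled, and the parameter names are illustrative.

```python
def acc_stride(r_acc, mult_res, elements_in_row, h_stride=False, v_stride=False,
               offset=0, h_inverted=False, v_inverted=False):
    """Sketch of acc.stride; the 'expand' horizontal mode is not modeled.

    Assumptions: mult_res (128 elements) is row-major with elements_in_row
    columns, and 'inverted' keeps the odd columns/rows instead of the even ones.
    """
    if elements_in_row not in (8, 16, 32, 64):
        raise ValueError("elements_in_row must be 8, 16, 32, or 64")
    rows = [mult_res[r:r + elements_in_row] for r in range(0, 128, elements_in_row)]
    if h_stride:  # keep every 2nd column
        rows = [row[(1 if h_inverted else 0)::2] for row in rows]
    if v_stride:  # keep every 2nd row
        rows = rows[(1 if v_inverted else 0)::2]
    flat = [x for row in rows for x in row]  # N = 128, 64, or 32 elements
    start = (offset % 4) * 32
    if start + len(flat) > 128:
        raise ValueError("write would pass r_acc[127]; behavior not modeled")
    out = list(r_acc)  # entries outside [start, start + N) stay unchanged
    out[start:start + len(flat)] = flat
    return out


# Usage: 16 rows x 8 columns, both strides on, offset 1 -> writes r_acc[32:64].
out = acc_stride([-1] * 128, list(range(128)), 8, h_stride=True, v_stride=True, offset=1)
print(out[32:40])  # [0, 2, 4, 6, 16, 18, 20, 22]
```

Each enabled stride halves the element count, which gives the N = 128, 64, or 32 sizes listed in the Operation line.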


AAQ Instructions

Activation and quantization instructions: aggregate r_acc into AAQ registers, or quantize r_acc into the aaq_result register.

aaq_nop - No Operation (AAQ)

No operation for AAQ slot.

Syntax: aaq_nop

agg - Accumulator Aggregate

Collapse the 128 r_acc words into one value (SUM or MAX), apply a post function, and store the result in the selected AAQ register.

Syntax: agg agg_mode post_fn cr_idx aaq_rf_idx

Operands:
- agg_mode: sum or max
- post_fn: value, value_cr, inv, or inv_sqrt
- cr_idx: CR register for the value_cr post function (cr0-cr15)
- aaq_rf_idx: AAQ register to store the result (aaq0-aaq3)

Operation:

If sum: v = sum(r_acc[0..127]). If max: v = max(r_acc[0..127], aaq[aaq_rf_idx]). Apply post_fn(v): value→v, value_cr→v*cr[cr_idx], inv→1/v, inv_sqrt→1/sqrt(v). aaq[aaq_rf_idx] = result.
Example:
agg sum value cr0 aaq0;;
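The agg Operation line can be modeled as a reduction followed by a post function. This Python sketch uses plain Python numbers because the hardware number format of the AAQ registers is not specified here. `aaq_prev` stands for the previous aaq[aaq_rf_idx] and `cr_value` stands for cr[cr_idx]; both names are introduced for the sketch. Because max mode folds in the previous AAQ value, repeating agg max with post_fn value over successive accumulator tiles keeps a running maximum in that AAQ register.

```python
import math


def agg(r_acc, agg_mode, post_fn, aaq_prev=0.0, cr_value=1.0):
    """Sketch of agg; returns the value written to aaq[aaq_rf_idx].

    aaq_prev is used only by max mode; cr_value only by the value_cr post function.
    """
    if agg_mode == "sum":
        v = sum(r_acc)
    elif agg_mode == "max":
        v = max(max(r_acc), aaq_prev)
    else:
        raise ValueError(f"unknown agg_mode {agg_mode!r}")

    if post_fn == "value":
        return v
    if post_fn == "value_cr":
        return v * cr_value
    if post_fn == "inv":
        return 1 / v
    if post_fn == "inv_sqrt":
        return 1 / math.sqrt(v)
    raise ValueError(f"unknown post_fn {post_fn!r}")


# Usage
print(agg([1] * 128, "sum", "value"))              # 128
print(agg([4] + [0] * 127, "sum", "inv_sqrt"))     # 0.5
print(agg([1] * 128, "max", "value", aaq_prev=7))  # 7
```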

aaq - AAQ Quantize

Quantize the 128-word accumulator from INT32 to INT8, storing clamped results in the aaq_result register. Requires INT8 mode.

Syntax: aaq

Operation:

Requires INT8 mode (cr15 == DType.INT8). For i in [0, 128): aaq_result[i] = clamp(trunc(r_acc[i]), -128, 127)
Example:
aaq;;
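Because r_acc holds INT32 values, trunc leaves them unchanged, so the aaq step is in effect saturation to the INT8 range: out-of-range values clamp to -128 or 127 rather than wrapping. The Python sketch below keeps `math.trunc` to mirror the formula and leaves out the INT8-mode check on cr15. The 128 results together form the 128-byte aaq_result that xmem.store_aaq_result writes to memory.

```python
import math

INT8_MIN, INT8_MAX = -128, 127


def aaq_quantize(r_acc):
    """Sketch of aaq: clamp(trunc(r_acc[i]), -128, 127) for all 128 words.

    The INT8-mode requirement (cr15 == DType.INT8) is not modeled.
    """
    if len(r_acc) != 128:
        raise ValueError("r_acc must hold 128 words")
    return [max(INT8_MIN, min(INT8_MAX, math.trunc(x))) for x in r_acc]


# Usage: out-of-range values saturate instead of wrapping.
r_acc = [-200, -128, 0, 127, 300] + [0] * 123
print(aaq_quantize(r_acc)[:5])  # [-128, -128, 0, 127, 127]
```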


LR Instructions

Loop register manipulation instructions for controlling loop counters and addresses.

incr - Increment Loop Register

Increment a loop register by an immediate value.

Syntax: incr reg value

Operands:
- reg: Loop register to increment (lr0-lr15)
- value: Immediate value to add

Operation:

reg += value
Example:
incr lr0 1;;

set - Set Loop Register

Set a loop register to an immediate value.

Syntax: set reg value

Operands:
- reg: Loop register (lr0-lr15)
- value: 32-bit immediate value

Operation:

reg = value
Example:
set lr0 0x1000;;

add - Add Registers

Add two registers and store in destination.

Syntax: add dest src_a src_b

Operands:
- dest: Destination loop register (lr0-lr15)
- src_a: First source register (lr0-lr15 or cr0-cr15)
- src_b: Second source register (lr0-lr15 or cr0-cr15)

Operation:

dest = src_a + src_b
Example:
add lr0 lr1 lr2;;

sub - Subtract Registers

Subtract two registers and store in destination.

Syntax: sub dest src_a src_b

Operands:
- dest: Destination loop register (lr0-lr15)
- src_a: First source register (lr0-lr15 or cr0-cr15)
- src_b: Second source register (lr0-lr15 or cr0-cr15)

Operation:

dest = src_a - src_b
Example:
sub lr0 lr1 lr2;;


Conditional Branch Instructions

Control flow instructions for branching based on conditions or unconditionally.

beq - Branch if Equal

Branch if two registers are equal.

Syntax: beq reg1 reg2 label

Operands:
- reg1: First register to compare (lr0-lr15)
- reg2: Second register to compare (lr0-lr15)
- label: Branch target label

Operation:

if (reg1 == reg2) PC = label
Example:
beq lr0 lr1 end;;

bne - Branch if Not Equal

Branch if two registers are not equal.

Syntax: bne reg1 reg2 label

Operands:
- reg1: First register to compare (lr0-lr15)
- reg2: Second register to compare (lr0-lr15)
- label: Branch target label

Operation:

if (reg1 != reg2) PC = label
Example:
bne lr0 lr1 different;;

blt - Branch if Less Than

Branch if the first register is less than the second.

Syntax: blt reg1 reg2 label

Operands:
- reg1: First register to compare (lr0-lr15)
- reg2: Second register to compare (lr0-lr15)
- label: Branch target label

Operation:

if (reg1 < reg2) PC = label
Example:
blt lr0 lr1 smaller;;

bnz - Branch if Not Zero

Branch if the test register is not equal to the base register.

Syntax: bnz test_reg base_reg label

Operands:
- test_reg: Register to test (lr0-lr15)
- base_reg: Base comparison register (lr0-lr15)
- label: Branch target label

Operation:

if (test_reg != base_reg) PC = label
Example:
bnz lr3 lr0 loop;;

bz - Branch if Zero

Branch if the test register equals the base register.

Syntax: bz test_reg base_reg label

Operands:
- test_reg: Register to test (lr0-lr15)
- base_reg: Base comparison register (lr0-lr15)
- label: Branch target label

Operation:

if (test_reg == base_reg) PC = label
Example:
bz lr0 lr1 zero;;

b - Unconditional Branch

Always branch to label.

Syntax: b label

Operands:
- label: Branch target label

Operation:

PC = label
Example:
b start;;

br - Branch Register

Branch to the address held in a register.

Syntax: br reg

Operands:
- reg: Register containing the target address (lr0-lr15)

Operation:

PC = reg

bkpt - Breakpoint

Conditional breakpoint.

Syntax: bkpt

Operation:

Halt execution (debugging)


Break Instructions

Debug break instructions for halting execution and entering debug mode.

break - Break

Unconditional break.

Syntax: break

Operation:

Halt execution

break.ifeq - Break if Equal

Break execution if a register equals an immediate value.

Syntax: break.ifeq reg value

Operands:
- reg: Register to test (lr0-lr15)
- value: Immediate value to compare against

Operation:

if (reg == value) BREAK
Example:
break.ifeq lr0 10;;

break_nop - No Operation (BREAK)

No operation for break slot.

Syntax: break_nop