Responsible disclosure: A potential sequencer-prover inconsistency in the Cairo VM
Overview
On Sunday, January 26th, Starkware informed us that they had found a critical issue in the Cairo VM related to a program that would successfully execute on the VM but would violate the AIR constraints. The bug was found while investigating a separate issue reported by a third party and a fix was already implemented in a PR. The PR was merged, and a release was cut, which is already updated. You can read Starkware’s disclosure post here.
Technical Implementation
The fix in pull request #1925 adds two changes:
- Additional verification while decoding instructions
- Additional verification on
verify_secure_runner
.
Instruction Decoding: Call Instruction
The call instruction does roughly the following:
- Saves the current frame pointer to
[ap]
- Saves the call return address to
[ap + 1]
- Updates both
fp
andap
toap + 2
, skipping over the saved data. - Updates the
pc
to the start of the target function
As some of the flags of the call instruction are fixed, we can verify that:
- The
dst
register holdsap+0
, where the current frame pointer will be stored.
dst_register == AP
dst_offset == 0
- The
op0
register holdsap+1
, where the call return address will be stored.
op0_register == AP
op0_offset == 1
- Both
fp
andap
are updated toap+2
:
ap_update == Add2
fp_update == APPlus2
If these conditions are not met, the decoding fails.
Instruction Decoding: Return Instruction
The return instruction does roughly the following:
- Restores the previous frame pointer (at
[fp - 2]
) - Jumps to the call return address (at
[fp - 1]
)
As some of the flags of the return instruction are fixed, we can verify that:
- The program counter is updated with an absolute jump
pc_update == Jump
- The jump location is taken from
res
, which equalsfp-1
:
res_logic == Op1
op1_offset == -1
op1_address == FP
- The next frame pointer is taken from
dst
, which equalsfp-2
fp_update == Dst
dst_register == FP
dst_offset == -2
If these conditions are not met, the decoding also fails.
Conditional Jump
This PR also enforces that when pc_update
is equal to 4
(conditional jump), then res_logic
must equal 0
(which implies ignoring that field).
This behavior is documented in the Cairo Whitepaper, page 33:
if pc_update == 4:
if res_logic == 0 && opcode == 0 && ap_update != 1:
res = Unused
else:
Undefined Behavior
Secure runner verification
The verify_secure_runner
function verifies that the completed run in a runner is safe to be relocated and used by other Cairo programs.
The PR verifies that the final frame pointer coincides with the caller's frame pointer, stored at [initial_frame_pointer - 2]
.
- When using
ExecutionMode::ProofModeCanonical
, the whole address must match. - When using
ExecutionMode::RunnerMode
, only the offset must match.
Impact Analysis
As noted in Starkware’s release:
”Since the missing check was in the sequencer and not the prover this has no implication whatsoever on the correctness or security of Starknet. In theory, it could have created a situation that a transaction that appears to have passed will later be reverted (reorg)”
the main risk was having transactions from Cairo0
contracts execute on the sequencer and revert instead of being proved. Since the transaction would not pass the prover there is no risk of incorrect transaction being proved but the revert would impact user experience.
Conclusion
As we’ve stated before, issues such as this one are always possible and likely in complex software and highlight the importance of having multiple teams paying attention to security, close collaboration between them, having simple codebases, and scrutinizing the interactions between components.
Many thanks to Starkware for the notice and quick fix!