R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

Background

Existing acceleration methods like Speculative Decoding have limitations:

Rigid Consistency: They require the Small Language Model (SLM) to match the LLM’s tokens exactly. If the SLM phrases a correct reasoning step differently, speculative decoding rejects it, wasting computation.

Low Agreement: In complex reasoning tasks, token-level agreement between SLMs and LLMs is often low, leading to frequent rollbacks and minimal speed gains.

...
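To make the rigid-consistency problem concrete, here is a minimal sketch of token-level verification in the greedy setting. It is a toy under stated assumptions, not the implementation of any particular library: the function name `accept_draft`, the whitespace tokenization, and the example sentences are all hypothetical, and real systems compare token IDs and probability distributions rather than strings.

```python
from typing import List, Tuple


def accept_draft(draft: List[str], target: List[str]) -> Tuple[List[str], int]:
    """Toy greedy verification (hypothetical helper, for illustration only).

    Keeps draft tokens up to the first position where the draft and the
    target disagree, then substitutes the target's token at that position.
    Returns the accepted tokens and the number of draft tokens discarded.
    """
    accepted: List[str] = []
    for i, (d, t) in enumerate(zip(draft, target)):
        if d != t:
            # First mismatch: take the target's token and drop the rest
            # of the draft, including the mismatching token itself.
            accepted.append(t)
            return accepted, len(draft) - i
        accepted.append(d)
    return accepted, 0


# The SLM's step means the same thing but opens with a different word.
slm_draft = "so the total is 12 apples".split()
llm_tokens = "thus the total is 12 apples".split()

accepted, wasted = accept_draft(slm_draft, llm_tokens)
print(accepted, wasted)
```

In this toy run the two steps differ only in their first word, yet all six drafted tokens are discarded and only the LLM's own first token is kept. That is the rollback behavior the limitations above describe.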

February 2, 2026 · Last updated on February 2, 2026 · 3 min · KKKZOZ