ArXiv-2507

R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning

Background

Existing acceleration methods such as speculative decoding have limitations:

Rigid consistency: Speculative decoding requires the small language model (SLM) to match the LLM's tokens exactly. If the SLM phrases a correct reasoning step differently, the step is rejected and the computation spent drafting it is wasted.
Low agreement: In complex reasoning tasks, token-level agreement between SLMs and LLMs is often low, which causes frequent rollbacks and yields only a small speedup.
...
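The sketch below makes the two limitations concrete. It shows greedy (exact-match) speculative-decoding verification and is not R-Stitch's own method. Several things here are assumptions for illustration: token IDs are plain integers, and `verify_greedy` and `expected_tokens_per_target_call` are hypothetical names. The second function uses the simplified analysis from the speculative decoding paper of Leviathan et al. (2023). If each draft token is accepted independently with probability α and the draft length is γ, the expected number of tokens committed per large-model forward pass is (1 − α^(γ+1)) / (1 − α).

```python
from typing import List, Sequence


def verify_greedy(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Greedy speculative-decoding verification (exact-match acceptance).

    draft: the k tokens proposed by the small model (hypothetical setup).
    target_argmax: the large model's greedy token at each of the k + 1
        positions, computed in one parallel pass over prefix + draft.
    Returns the tokens actually committed in this step.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have exactly len(draft) + 1 entries")
    accepted: List[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            # First mismatch: the rest of the draft is discarded (a rollback),
            # even if the small model's wording was also a valid reasoning step.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so the same target pass yields one bonus token.
    accepted.append(target_argmax[len(draft)])
    return accepted


def expected_tokens_per_target_call(alpha: float, gamma: int) -> float:
    """Expected tokens committed per large-model pass, assuming i.i.d.
    per-token acceptance probability alpha and draft length gamma."""
    if alpha >= 1.0:
        return gamma + 1.0
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


# A draft that diverges at its third token keeps only two draft tokens.
print(verify_greedy([11, 12, 99, 14], [11, 12, 13, 14, 15]))  # [11, 12, 13]

# High vs. low agreement with a 4-token draft.
print(round(expected_tokens_per_target_call(0.9, 4), 4))  # 4.0951
print(round(expected_tokens_per_target_call(0.3, 4), 4))  # 1.4251
```

With α = 0.9, each large-model pass commits about 4.1 tokens. With α = 0.3, it commits only about 1.43, barely above the single token per pass of ordinary decoding, and that is before counting the cost of running the small model.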