SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Extensive Reading

Author Info
- SEC: CCF C

Background

Insights
- A plain implementation of speculative decoding in edge scenarios
- The edge device holds the draft model
- The edge server holds the verifier model

Approaches
- Route to the server when the confidence score of a token generated by the edge device falls below a given threshold
- Two details:
  - While sent tokens are awaiting verification, the edge device keeps generating draft tokens, on the expectation that the verifier will accept all of the sent tokens
  - When retrying after a network failure, the edge device can append newly generated tokens to the draft sequence

Evaluation ...
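The threshold-based routing described under Approaches can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy models, the 0.8 threshold, and the draft-length cap are all assumptions, verification is greedy (accept the longest matching prefix, then substitute the verifier's token), and the low-confidence token is assumed to be included in the sequence sent to the server.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a draft model returns (token, confidence) for a
# prefix; a verifier returns the token it would emit for that prefix.
DraftFn = Callable[[List[int]], Tuple[int, float]]
VerifyFn = Callable[[List[int]], int]


def draft_until_uncertain(draft: DraftFn, prefix: List[int],
                          threshold: float, max_draft: int) -> List[int]:
    """Edge side: keep drafting until a token's confidence drops below threshold."""
    drafted: List[int] = []
    while len(drafted) < max_draft:
        token, confidence = draft(prefix + drafted)
        drafted.append(token)
        if confidence < threshold:
            break  # low confidence: route the draft sequence to the server
    return drafted


def verify(verifier: VerifyFn, prefix: List[int], drafted: List[int]) -> List[int]:
    """Server side: accept the longest matching prefix of the draft and
    replace the first mismatching token with the verifier's own token."""
    accepted: List[int] = []
    for token in drafted:
        expected = verifier(prefix + accepted)
        if token != expected:
            accepted.append(expected)
            return accepted
        accepted.append(token)
    return accepted


def generate(draft: DraftFn, verifier: VerifyFn, prompt: List[int],
             threshold: float = 0.8, max_draft: int = 4,
             max_new: int = 8) -> List[int]:
    """Alternate between edge drafting and server verification."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        drafted = draft_until_uncertain(draft, out, threshold, max_draft)
        out.extend(verify(verifier, out, drafted))
    return out[:len(prompt) + max_new]


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_verifier(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> Tuple[int, float]:
    nxt = (prefix[-1] + 1) % 10
    if len(prefix) % 3 == 0:
        return (nxt + 5) % 10, 0.3  # deliberately wrong and unsure
    return nxt, 0.95


if __name__ == "__main__":
    print(generate(toy_draft, toy_verifier, [0], max_new=6))
```

The sketch runs drafting and verification sequentially. It does not model the two details noted above: continuing to draft while a verification request is in flight, and appending newly drafted tokens on a retry. Both would require an asynchronous transport layer that is out of scope here.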

December 7, 2025 · Last updated on February 2, 2026 · 1 min · KKKZOZ