# Streaming: Fix LS exposing internal details (intent detection)
## Problem
When we implemented streaming, we chose a design that was faster to implement but will be harder to maintain. The design decision is documented in a status report on the epic &11722 (comment 1698672696).

TL;DR: for every suggestion request we make two requests to the LS, one to get the intent and the other to get the suggestion/generation.
## Solution
The fix is to hide the intent detection inside the LS (Option B below).

This issue affects both the LS and the VS Code Extension.
## Details
The proposed solution will extend the standard `inlineCompletion` response with information that tells the client a stream is coming. The best way to attach this "stream metadata" is to use the `command` parameter of the `InlineCompletionItem`: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#textDocument_inlineCompletion

The `command` can have a string name and arbitrarily typed arguments. Currently, we use the `command` parameter for the "suggestion accepted" command. This issue would introduce another command, something like `gitlab.ls.streamAttached`, with `streamId` and `trackingId` as arguments (mind you, the client currently generates the `streamId`, but after this change the server will generate it).

The client would then decide whether to simply show the `InlineCompletionItem` (in case there is no stream) or start streaming.
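As a rough illustration, here is a minimal TypeScript sketch of the response the LS could build, assuming the `gitlab.ls.streamAttached` command name and the `streamId`/`trackingId` argument shape proposed above; the payload shape is illustrative, not final:

```typescript
// Minimal local mirrors of the LSP 3.18 `Command` and `InlineCompletionItem`
// shapes (only the fields relevant to this proposal).
interface Command {
  title: string;
  command: string;
  arguments?: unknown[];
}

interface InlineCompletionItem {
  insertText: string;
  command?: Command;
}

// Hypothetical payload attached when a stream follows the response.
interface StreamAttachedArgs {
  streamId: string;   // generated by the server after this change
  trackingId: string; // correlates telemetry events for the suggestion
}

// Server side: when intent detection resolves to "generation", the LS returns
// an (initially empty) item that announces the follow-up stream.
function buildStreamingItem(streamId: string, trackingId: string): InlineCompletionItem {
  const args: StreamAttachedArgs = { streamId, trackingId };
  return {
    insertText: '',
    command: {
      title: 'Stream attached',
      command: 'gitlab.ls.streamAttached',
      arguments: [args],
    },
  };
}
```

When the item carries no such command, the client treats it as an ordinary, complete suggestion.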
## Other considerations
- there should be an LS configuration that indicates whether the client can handle streaming; if it cannot, the LS should use the current method (or alternatively consume the full stream? [this should be discussed with the backend AI Framework team]); see the sketch below
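A minimal sketch of how the LS could branch on such a flag; the `streamingEnabled` setting name is a hypothetical placeholder, not an agreed-upon option:

```typescript
// Hypothetical client capability flag advertised to the LS.
interface ClientStreamingConfig {
  streamingEnabled: boolean;
}

// When handling a generation request, the LS picks the delivery mode based on
// the flag: attach a stream, or fall back to the current non-streaming method
// (or possibly consume the full stream server-side, pending discussion with
// the backend AI Framework team).
function pickDeliveryMode(config: ClientStreamingConfig): 'attach-stream' | 'current-method' {
  return config.streamingEnabled ? 'attach-stream' : 'current-method';
}
```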
## Original decision description
### Option A: 2 calls to LS
Use @shekharpatnaik's approach of keeping the streaming completely separate. For each completion request from the IDE, we'll first need to ask the LS whether we should do completion or generation, and then call the appropriate LS endpoint.
```mermaid
sequenceDiagram
    VS Code Extension->>+LS: should I do completion or generation
    LS->>-VS Code Extension: generation
    VS Code Extension->>+LS: get me stream for position XYZ
    LS->>-VS Code Extension: stream
```
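For concreteness, a TypeScript sketch of the two round trips; the request method names `gitlab/intent` and `gitlab/stream` are hypothetical placeholders for whatever the LS would actually expose:

```typescript
type Intent = 'completion' | 'generation';

// Minimal stand-in for a language client's request API.
interface LanguageClient {
  sendRequest<T>(method: string, params: unknown): Promise<T>;
}

interface Position {
  line: number;
  character: number;
}

async function getSuggestion(client: LanguageClient, position: Position): Promise<unknown> {
  // Round trip 1: ask the LS whether to do completion or generation.
  const intent = await client.sendRequest<Intent>('gitlab/intent', { position });
  if (intent === 'generation') {
    // Round trip 2: request the stream for this position.
    return client.sendRequest('gitlab/stream', { position });
  }
  // Completion path: the standard inline completion request.
  return client.sendRequest('textDocument/inlineCompletion', { position });
}
```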
- Pros:
  - the completely separate code means the lowest risk to the existing suggestion code
  - fast delivery because we ignore existing code
- Cons:
  - we'll have to duplicate logic for debouncing, telemetry, cancellation, and other features
  - we expose implementation details (the intent) to the client
  - we increase the maintenance cost of the feature
  - more complex implementation for all clients
### Option B: 1 call to LS
We extend the LS protocol for inline completion with the possibility of a follow-up stream. The LS decides whether it should do completion or generation; for generation, it will start streaming to the client.
```mermaid
sequenceDiagram
    VS Code Extension->>+LS: give me inline completion for position XYZ
    LS->>-VS Code Extension: completion or stream based on intent
```
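And a client-side sketch of Option B's single round trip, reusing the item shape from the Details section; `renderSuggestion` and `subscribeToStream` are hypothetical client helpers:

```typescript
interface StreamAttachedArgs {
  streamId: string;
  trackingId: string;
}

interface InlineCompletionItem {
  insertText: string;
  command?: { title: string; command: string; arguments?: unknown[] };
}

// Hypothetical client helpers: render a finished suggestion, or subscribe to
// the follow-up stream and append chunks as they arrive.
declare function renderSuggestion(text: string): void;
declare function subscribeToStream(args: StreamAttachedArgs): void;

function handleInlineCompletionItem(item: InlineCompletionItem): void {
  if (item.command?.command === 'gitlab.ls.streamAttached') {
    // Stream metadata present: the server generated the streamId; subscribe.
    const [args] = item.command.arguments ?? [];
    subscribeToStream(args as StreamAttachedArgs);
  } else {
    // No stream coming: show the item as an ordinary suggestion.
    renderSuggestion(item.insertText);
  }
}
```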
- Pros:
  - unified logic, the LS fully controls the feature
  - reuse of existing debouncing, cancellation, and possibly parts of the telemetry (once we know what telemetry looks like)
  - less maintenance cost
  - simpler implementation for all clients
- Cons:
  - it takes longer at the beginning because we have to think about what parts of completion and generation should be shared