Dynamic Model Fan Out
Problem to solve
Our pipeline currently runs on Apache Beam on top of Dataflow, and we call all of our models in a single step (see the step labeled "Request Vertex Completions" in figure 1). This prevents us from taking advantage of the parallelism that Beam offers.
```mermaid
graph TD
    subgraph fig1["Figure 1: Current State"]
        readCode(Read Code)
        filterCode(Filter Code)
        chunkCode(Chunk Code)
        applyPromptTransformations(Apply Prompt Transformations)
        batchCodeChunks(Batch Code Chunks)
        throttleCompletion(Throttle Completions API Call)
        requestVertexCompletion(Request Vertex Completions)
        batchCompletions(Batch Code Completions)
        throttleEmbedding(Throttle Embedding API Call)
        requestEmbeddingSimilarity(Compute Similarity)
        postProcessCompletions(Post Process Completions)
        sendDataToBigQuery(Send Data to BigQuery)
        readCode --> filterCode --> chunkCode --> applyPromptTransformations --> batchCodeChunks --> throttleCompletion --> requestVertexCompletion --> postProcessCompletions --> batchCompletions --> throttleEmbedding --> requestEmbeddingSimilarity --> sendDataToBigQuery
    end
```
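In effect, the single "Request Vertex Completions" step loops over every configured model inside one pipeline step, so each chunk waits on every model in turn. A minimal sketch of that shape (the model names and the `request_completion` callables are illustrative placeholders, not the real prompt-library API):

```python
# Simplified illustration of the current single-step dispatch:
# one function calls every model in turn, so per-model work is serialized
# inside a single pipeline step.
def request_all_completions(chunk, models):
    """Call each model for one code chunk inside one step."""
    results = {}
    for name, request_completion in models.items():
        # Each call blocks until the previous model has responded.
        results[name] = request_completion(chunk)
    return results

# Hypothetical stand-ins for real model clients.
models = {
    "code-gecko": lambda chunk: f"gecko:{chunk}",
    "text-bison": lambda chunk: f"bison:{chunk}",
}

completions = request_all_completions("def add(a, b):", models)
```

Because the loop is sequential, total latency per chunk is roughly the sum of all model latencies, and Beam has no opportunity to schedule the calls on separate workers.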
While adding Hugging Face models, a method of separating the Hugging Face calls from the Vertex calls was derived (see figure 2). However, this doesn't fully leverage the parallelism gains that Beam offers, and it doesn't meet the goal of making models from different platforms equal citizens within the prompt-library. As such, this proposal was drafted.
```mermaid
graph TD
    subgraph fig2["Figure 2: Hugging Face and Vertex Split"]
        readCode(Read Code)
        filterCode(Filter Code)
        chunkCode(Chunk Code)
        applyPromptTransformations(Apply Prompt Transformations)
        batchCodeChunks(Batch Code Chunks)
        throttleCompletion(Throttle Completions API Call)
        requestVertexCompletion(Request Vertex Completions)
        requestHuggingFaceCompletion(Request HuggingFace Completions)
        batchCompletions(Batch Code Completions)
        throttleEmbedding(Throttle Embedding API Call)
        requestEmbeddingSimilarity(Compute Similarity)
        mergeResults(Merge Completions)
        postProcessCompletions(Post Process Completions)
        sendDataToBigQuery(Send Data to BigQuery)
        readCode --> filterCode --> chunkCode --> applyPromptTransformations --> batchCodeChunks --> throttleCompletion
        throttleCompletion --> requestVertexCompletion --> mergeResults
        throttleCompletion --> requestHuggingFaceCompletion --> mergeResults
        mergeResults --> postProcessCompletions --> batchCompletions --> throttleEmbedding --> requestEmbeddingSimilarity --> sendDataToBigQuery
    end
```
Proposal
@HongtaoYang and @tle_gitlab proposed, and I agree, that we should fan those calls out across all the models: instead of calling all the models in one step, or dividing the calling steps by platform, we call each model in its own parallel step. This aligns with the work being done by @HongtaoYang to make Vertex, Hugging Face, and Anthropic models equal citizens in our architecture, allows for faster processing times (read: improved scale), and, when the two initiatives come together, will make adding models faster in the future. For a visual, see figure 3 below.
```mermaid
graph TD
    subgraph fig3["Figure 3: Proposed Idea"]
        readCode(Read Code)
        filterCode(Filter Code)
        chunkCode(Chunk Code)
        applyPromptTransformations(Apply Prompt Transformations)
        codeGeckoBatchCodeChunks("code-gecko: Batch Code Chunks")
        codeGeckoThrottleCompletion("code-gecko: Throttle Completions API Call")
        codeGeckoRequestCompletion("code-gecko: Request Completions")
        textBisonBatchCodeChunks("text-bison: Batch Code Chunks")
        textBisonThrottleCompletion("text-bison: Throttle Completions API Call")
        textBisonRequestCompletion("text-bison: Request Completions")
        codeLlama13bBatchCodeChunks("CodeLlama13b: Batch Code Chunks")
        codeLlama13bThrottleCompletion("CodeLlama13b: Throttle Completions API Call")
        codeLlama13bRequestCompletion("CodeLlama13b: Request Completions")
        phindCodeLlamaBatchCodeChunks("Phind-CodeLlama-34B-v2: Batch Code Chunks")
        phindCodeLlamaThrottleCompletion("Phind-CodeLlama-34B-v2: Throttle Completions API Call")
        phindCodeLlamaRequestCompletion("Phind-CodeLlama-34B-v2: Request Completions")
        mergeResults(Merge Completions)
        postProcessCompletions(Post Process Completions)
        batchCompletions(Batch Code Completions)
        throttleEmbedding(Throttle Embedding API Call)
        requestEmbeddingSimilarity(Compute Similarity)
        sendDataToBigQuery(Send Data to BigQuery)
        readCode --> filterCode --> chunkCode --> applyPromptTransformations
        applyPromptTransformations --> codeGeckoBatchCodeChunks --> codeGeckoThrottleCompletion --> codeGeckoRequestCompletion --> mergeResults
        applyPromptTransformations --> textBisonBatchCodeChunks --> textBisonThrottleCompletion --> textBisonRequestCompletion --> mergeResults
        applyPromptTransformations --> codeLlama13bBatchCodeChunks --> codeLlama13bThrottleCompletion --> codeLlama13bRequestCompletion --> mergeResults
        applyPromptTransformations --> phindCodeLlamaBatchCodeChunks --> phindCodeLlamaThrottleCompletion --> phindCodeLlamaRequestCompletion --> mergeResults
        mergeResults --> postProcessCompletions --> batchCompletions --> throttleEmbedding --> requestEmbeddingSimilarity --> sendDataToBigQuery
    end
```