Add pipeline to create code explanation dataset
What does this merge request do and why?
This MR introduces a new pipeline that creates a code-explanation dataset from the GitLab codebase. The pipeline does the following:
- Read raw code files from the BigQuery table input_raw_v1 (dataset code_suggestion): https://console.cloud.google.com/bigquery?authuser=0&project=dev-ai-research-0e2f8974&ws=!1m5!1m4!4m3!1sdev-ai-research-0e2f8974!2scode_suggestion!3sinput_raw_v1
- Extract whole functions and classes from the raw code
- Store the extracted code snippets in BigQuery (BQ) as our code-explanation dataset.
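The extraction step can be illustrated with a minimal sketch. This is not the pipeline's actual implementation: it assumes Python source files and uses the standard-library ast module, whereas the MR does not state which languages or parser the pipeline uses. The function name extract_definitions and the output field names are hypothetical, and the BigQuery read and write steps are left out.

```python
import ast


def extract_definitions(source: str) -> list[dict]:
    """Return each top-level function and class in `source` as a whole snippet.

    Assumption: only top-level definitions are collected, so methods stay
    inside the snippet of the class that contains them.
    """
    tree = ast.parse(source)
    snippets = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            snippets.append(
                {
                    # Hypothetical field names, chosen for illustration only.
                    "name": node.name,
                    "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                    "start_line": node.lineno,
                    # Exact source text of the definition, from its first
                    # line (the `def`/`class` line) to its last line.
                    "code": ast.get_source_segment(source, node),
                }
            )
    return snippets


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "def greet(name):\n"
        "    return 'Hello, ' + name\n"
        "\n"
        "class Counter:\n"
        "    def increment(self):\n"
        "        self.count += 1\n"
    )
    for snippet in extract_definitions(sample):
        print(snippet["kind"], snippet["name"], "at line", snippet["start_line"])
```

Each returned dictionary corresponds to one candidate row of the dataset. Statements that are not function or class definitions, such as the import in the sample, are skipped.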
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
Merge request checklist
- I've run the affected pipeline(s) to validate that nothing is broken.
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Closes #131 (closed)
Edited by Tan Le