Expand the raw dataset by additionally checking the number of commits for each repo
This MR expands the raw dataset by introducing a new heuristic: the number of commits in a repo. We're trying to grow the dataset while keeping source code quality good (quality is estimated empirically). After this MR is merged, we'll rely on the number of stars, watchers, and commits when selecting repos to include in the raw dataset.
The unreview-poc-390200e5.gl_code_suggestions.repo_contents_v2 table contains the results of running the updated SQL scripts.
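
For reviewers who want a quick mental model of the selection rule described above, here is a minimal Python sketch. It is not the SQL from this MR. The threshold values, the `Repo` type, and the function names are all hypothetical, and the sketch assumes a repo qualifies when any one of the three signals meets its threshold; the actual scripts may combine the signals differently.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the MR description does not state the real values.
MIN_STARS = 10
MIN_WATCHERS = 5
MIN_COMMITS = 100


@dataclass
class Repo:
    name: str
    stars: int
    watchers: int
    commits: int


def is_selected(repo: Repo) -> bool:
    """Assumption: a repo qualifies if any single signal meets its threshold."""
    return (
        repo.stars >= MIN_STARS
        or repo.watchers >= MIN_WATCHERS
        or repo.commits >= MIN_COMMITS
    )


def select_repos(repos: list[Repo]) -> list[Repo]:
    """Keep only the repos that pass the selection heuristic, preserving order."""
    return [repo for repo in repos if is_selected(repo)]
```

Under this assumed "any signal" rule, adding the commit count can only widen the selection: a repo with few stars and watchers but a long commit history would now be included.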
Edited by Alexander Chueshev