Allow Zoekt to index private repositories
Problem
There are some security details at !107891 (comment 1248586481).
Zoekt currently clones repositories through GitLab's normal HTTPS clone API. Without a valid token that grants it permission, it can't access private repositories, so the clone simply fails.
Proposal
We should generate a token on the GitLab side that Zoekt can use to clone the repository. Ideally this token should not be regenerated too frequently, as that would add unnecessary load on Postgres. It should probably also expire periodically for security reasons. We could use project access tokens, so that each token is only useful for its own project, or we could implement a new kind of token that can be used to clone any repo. We need Appsec to help us design a good scheme.
sequenceDiagram
participant user as User
participant gitlab_git as GitLab Git
participant gitlab_sidekiq as GitLab Sidekiq
participant zoekt as Zoekt
user->>gitlab_git: git push git@gitlab.com:gitlab-org/gitlab.git
gitlab_git->>gitlab_sidekiq: ZoektIndexerWorker.perform_async(278964)
gitlab_sidekiq->>zoekt: POST /index {"RepoUrl":"https://zoekt:SECRET_TOKEN@gitlab.com/gitlab-org/gitlab.git","RepoId":278964}
zoekt->>gitlab_git: git clone https://zoekt:SECRET_TOKEN@gitlab.com/gitlab-org/gitlab.git
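To make the shape of the request in the diagram concrete, here is a minimal Go sketch of how the indexserver side might decode the POST /index body. The field names RepoUrl and RepoId come from the payload above; the indexRequest type and parseIndexRequest function are hypothetical names used for illustration, not existing Zoekt code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// indexRequest mirrors the JSON body shown in the diagram.
// The type name is hypothetical; only the JSON keys come from the issue.
type indexRequest struct {
	RepoURL string `json:"RepoUrl"`
	RepoID  int64  `json:"RepoId"`
}

// parseIndexRequest decodes the body and rejects requests without a URL.
func parseIndexRequest(body []byte) (indexRequest, error) {
	var req indexRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return indexRequest{}, err
	}
	if req.RepoURL == "" {
		return indexRequest{}, errors.New("RepoUrl is required")
	}
	return req, nil
}

func main() {
	body := []byte(`{"RepoUrl":"https://zoekt:SECRET_TOKEN@gitlab.com/gitlab-org/gitlab.git","RepoId":278964}`)
	req, err := parseIndexRequest(body)
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	// Only the ID is printed here: the URL carries the token.
	fmt.Println("indexing repo", req.RepoID)
}
```

Note that with this payload the token travels inside RepoUrl, so anything that prints the request verbatim would also print the token.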
We also need to consider whether the zoekt-dynamic-indexserver needs updates to support this. Some things that come to mind:
- It uses git clone, then git fetch. If the token is used on the initial clone, is it still available during git fetch? If the token changes, will it get updated for subsequent fetches?
- Zoekt usually logs the full commands being executed (also again when there are errors). We don't want these tokens to appear in logs, so they may need to be redacted in Zoekt.
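One way the two points above could be handled is sketched below in Go. It assumes that git keeps the clone URL, token included, as the remote URL in the repository config, so a later git fetch would reuse the old token unless the URL is rewritten first. syncCommands and redactArgs are hypothetical helper names, not existing Zoekt functions, and the remote name origin is simply git's default. Redaction relies on url.URL.Redacted from the Go standard library, which masks the password part of a URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// syncCommands returns the git invocations for one indexing run.
// On the first run it clones. On later runs it rewrites the stored remote
// URL (which carries the token) before fetching, so a changed token is used.
func syncCommands(repoURL, dir string, alreadyCloned bool) [][]string {
	if !alreadyCloned {
		return [][]string{{"git", "clone", repoURL, dir}}
	}
	return [][]string{
		{"git", "-C", dir, "remote", "set-url", "origin", repoURL},
		{"git", "-C", dir, "fetch", "origin"},
	}
}

// redactArgs returns a copy of args in which any URL-embedded password
// is masked, so the result is safe to write to logs.
func redactArgs(args []string) []string {
	out := make([]string, len(args))
	for i, a := range args {
		out[i] = a
		u, err := url.Parse(a)
		if err != nil || u.User == nil {
			continue // not a URL with credentials; keep as-is
		}
		if _, hasPassword := u.User.Password(); hasPassword {
			out[i] = u.Redacted()
		}
	}
	return out
}

func main() {
	// The directory path is an assumption for illustration.
	repoURL := "https://zoekt:SECRET_TOKEN@gitlab.com/gitlab-org/gitlab.git"
	for _, cmd := range syncCommands(repoURL, "/data/repos/278964", true) {
		// Log the redacted form; run the original.
		fmt.Println(strings.Join(redactArgs(cmd), " "))
	}
}
```

Logging redactArgs(cmd) instead of cmd would keep the token out of the command logs, including the copy logged on errors. Whether rewriting the remote URL on every fetch is acceptable, or whether the token should be kept out of the repository config entirely, is part of what still needs deciding with Appsec.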