Scrub URLs from log keys (!1541) · Merge requests · GitLab.org / gitlab-runner

Steve Xuereb requested to merge 4625-confidential-issue into master Aug 19, 2019

What does this MR do?

Scrub URLs from log keys

Why was this MR needed?

SecretsCleanupHook removes secrets from log messages such as X-AMZ-Signature. It does not remove them for the keys, so when a URL with tokens are logged they are not scrubbed as expected.

Iterate over the keys and scrub them as as well, since this is quite a hot path (every log uses this hook) a benchmark was added to test performance to calculate the before and after and there is a 3x performance hit. After an attempt of adding concurrency the performance was even worse since a mutex had to be introduced.

Before

pkg: gitlab.com/gitlab-org/gitlab-runner/log BenchmarkSecretsCleanupHook_Fire-8 1000000 1326 ns/op PASS

After

pkg: gitlab.com/gitlab-org/gitlab-runner/log BenchmarkSecretsCleanupHook_Fire-8 300000 4813 ns/op PASS

Looking at a flame graph of the CPU this is because of the multiple calls to regex since is quite expensive. After trying to add concurrency the performance as even worse:

pkg: gitlab.com/gitlab-org/gitlab-runner/log
BenchmarkSecretsCleanupHook_Fire-8   	  200000	      9990 ns/op
PASS

Diff of the concurrenct change

diff --git a/log/secrets_cleanup.go b/log/secrets_cleanup.go
index d9929748f..e47be1709 100644
--- a/log/secrets_cleanup.go
+++ b/log/secrets_cleanup.go
@@ -1,12 +1,15 @@
 package log

 import (
+       "sync"
+
        "github.com/sirupsen/logrus"

-       "gitlab.com/gitlab-org/gitlab-runner/helpers/url"
+       url_helpers "gitlab.com/gitlab-org/gitlab-runner/helpers/url"
 )

-type SecretsCleanupHook struct{}
+type SecretsCleanupHook struct {
+}

 func (s *SecretsCleanupHook) Levels() []logrus.Level {
        return logrus.AllLevels
@@ -14,9 +17,38 @@ func (s *SecretsCleanupHook) Levels() []logrus.Level {

 func (s *SecretsCleanupHook) Fire(entry *logrus.Entry) error {
        entry.Message = url_helpers.ScrubSecrets(entry.Message)
+
+       wg := sync.WaitGroup{}
+       wg.Add(len(entry.Data))
+
+       for key, value := range entry.Data {
+               safeEntry := &entrySafe{
+                       Entry: entry,
+                       Mutex: sync.Mutex{},
+               }
+
+               go s.scrubKey(&wg, safeEntry, key, value)
+       }
+
+       wg.Wait()
+
        return nil
 }

+func (s *SecretsCleanupHook) scrubKey(wg *sync.WaitGroup, entry *entrySafe, key string, value interface{}) {
+       defer wg.Done()
+       entry.Lock()
+       defer entry.Unlock()
+
+       str, ok := value.(string)
+       if !ok {
+               // If we fail to parse it to remove it to be safe.
+               delete(entry.Data, key)
+       }
+
+       entry.Data[key] = url_helpers.ScrubSecrets(str)
+}
+
 func AddSecretsCleanupLogHook(logger *logrus.Logger) {
        if logger == nil {
                logger = logrus.StandardLogger()
@@ -24,3 +56,8 @@ func AddSecretsCleanupLogHook(logger *logrus.Logger) {

        logger.AddHook(new(SecretsCleanupHook))
 }
+
+type entrySafe struct {
+       *logrus.Entry
+       sync.Mutex
+}

Scrub URLs from log keys

What does this MR do?

Why was this MR needed?

Before

After

Are there points in the code the reviewer needs to double check?

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Scrub URLs from log keys

What does this MR do?

Why was this MR needed?

Before

After

Are there points in the code the reviewer needs to double check?

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Merge request reports