Add Snowplow event tracker and client wrapper
What does this MR do and why?
This adds the ability to track code suggestions events using the Snowplow Python tracker.
- Introduce Snowplow standard synchronous tracker and emitter
- Introduce some environment variables to configure Snowplow tracking
  - SNOWPLOW_ENABLED - default to false, only enable in prod
  - SNOWPLOW_ENDPOINT - default to none, only set in prod
- Add specs and ensure the dependency is stubbed
- Use AsyncEmitter instead of synchronous emitter
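A minimal sketch of how the two environment variables could be read with the defaults described above. `TrackingSettings` and `load_tracking_settings` are hypothetical names used for illustration only, and the set of values accepted as "true" is an assumption; the configuration code in this MR may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TrackingSettings:
    """Hypothetical settings container, not the MR's actual config class."""

    enabled: bool
    endpoint: Optional[str]


def load_tracking_settings(env: Mapping[str, str] = os.environ) -> TrackingSettings:
    # SNOWPLOW_ENABLED defaults to false; which spellings count as true is an assumption.
    enabled = env.get("SNOWPLOW_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
    # SNOWPLOW_ENDPOINT defaults to none; an empty string is treated as unset.
    endpoint = env.get("SNOWPLOW_ENDPOINT") or None
    return TrackingSettings(enabled=enabled, endpoint=endpoint)
```

With nothing set, `load_tracking_settings({})` yields a disabled tracker with no endpoint, which matches the "only enable in prod" intent.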
Related to #192 (closed)
How to validate and test locally
We need to set up Snowplow Micro locally to inspect the events emitted by the Model Gateway.
- Clone Snowplow Micro repo
- Run bash ./snowplow-micro.sh
- In another terminal session, build and run a Model Gateway container
    $ docker buildx build --platform linux/amd64 -t code-suggestions-api:dev .
    $ docker run -it --platform linux/amd64 --rm -v $PWD:/app -it code-suggestions-api:dev bash
    $ poetry run python
- Run the following script in the Python session started by Poetry
    >>> from codesuggestions.tracking import *
    None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
    >>> client = SnowplowClient(configuration=SnowplowClientConfiguration(endpoint="http://host.docker.internal:9090"))
    INFO:snowplow_tracker.emitters:Emitter initialized with endpoint http://host.docker.internal:9090/com.snowplowanalytics.snowplow/tp2
    >>> client.track(SnowplowEvent(context=SnowplowEventContext(request_counts=[RequestCount(requests=1, errors=0, accepts=1, lang="python", model_engine="vertex-ai", model_name="code-gecko")], prefix_length=2048, suffix_length=1024, language="python", user_agent="vs-code-gitlab-workflow", gitlab_realm="saas")))
    INFO:snowplow_tracker.emitters:Attempting to send 1 events
    INFO:snowplow_tracker.emitters:Sending POST request to http://host.docker.internal:9090/com.snowplowanalytics.snowplow/tp2...
- Verify the events from Snowplow Micro via http://localhost:9090/micro/good
    $ curl -s http://localhost:9090/micro/good | jq '.[0].rawEvent'
    {
      "api": {
        "vendor": "com.snowplowanalytics.snowplow",
        "version": "tp2"
      },
      "parameters": {
        "e": "se",
        "eid": "e86c72b2-8272-476c-98b2-0a9a4b575e07",
        "aid": "gitlab_ai_gateway",
        "cx": "eyJzY2hlbWEiOiAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvY29udGV4dHMvanNvbnNjaGVtYS8xLTAtMSIsICJkYXRhIjogW3sic2NoZW1hIjogImlnbHU6Y29tLmdpdGxhYi9jb2RlX3N1Z2dlc3Rpb25zX2NvbnRleHQvanNvbnNjaGVtYS8xLTAtMCIsICJkYXRhIjogeyJyZXF1ZXN0X2NvdW50cyI6IFt7InJlcXVlc3RzIjogMSwgImVycm9ycyI6IDAsICJhY2NlcHRzIjogMSwgImxhbmciOiAicHl0aG9uIiwgIm1vZGVsX2VuZ2luZSI6ICJ2ZXJ0ZXgtYWkiLCAibW9kZWxfbmFtZSI6ICJjb2RlLWdlY2tvIn1dLCAicHJlZml4X2xlbmd0aCI6IDIwNDgsICJzdWZmaXhfbGVuZ3RoIjogMTAyNCwgImxhbmd1YWdlIjogInB5dGhvbiIsICJ1c2VyX2FnZW50IjogInZzLWNvZGUtZ2l0bGFiLXdvcmtmbG93IiwgImdpdGxhYl9yZWFsbSI6ICJzYWFzIn19XX0=",
        "tna": "gl",
        "stm": "1691159810000",
        "tv": "py-1.0.1",
        "se_ac": "suggestions_requested",
        "se_ca": "code_suggestions",
        "p": "pc",
        "dtm": "1691159810428"
      },
      "contentType": "application/json",
      "source": {
        "name": "snowplow-micro-1.7.2-stdout$",
        "encoding": "UTF-8",
        "hostname": "host.docker.internal"
      },
      "context": {
        "timestamp": "2023-08-04T14:36:50.597Z",
        "ipAddress": "172.17.0.1",
        "useragent": "python-requests/2.31.0",
        "refererUri": null,
        "headers": [
          "Timeout-Access: <function1>",
          "Host: host.docker.internal:9090",
          "User-Agent: python-requests/2.31.0",
          "Accept-Encoding: gzip, deflate",
          "Accept: */*",
          "Connection: keep-alive",
          "application/json"
        ],
        "userId": "e4d1fce8-2737-4e5d-af85-878fda9a3267"
      }
    }
- The context payload cx can be Base64-decoded

    $ echo -n "eyJzY2hlbWEiOiAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvY29udGV4dHMvanNvbnNjaGVtYS8xLTAtMSIsICJkYXRhIjogW3sic2NoZW1hIjogImlnbHU6Y29tLmdpdGxhYi9jb2RlX3N1Z2dlc3Rpb25zX2NvbnRleHQvanNvbnNjaGVtYS8xLTAtMCIsICJkYXRhIjogeyJyZXF1ZXN0X2NvdW50cyI6IFt7InJlcXVlc3RzIjogMSwgImVycm9ycyI6IDAsICJhY2NlcHRzIjogMSwgImxhbmciOiAicHl0aG9uIiwgIm1vZGVsX2VuZ2luZSI6ICJ2ZXJ0ZXgtYWkiLCAibW9kZWxfbmFtZSI6ICJjb2RlLWdlY2tvIn1dLCAicHJlZml4X2xlbmd0aCI6IDIwNDgsICJzdWZmaXhfbGVuZ3RoIjogMTAyNCwgImxhbmd1YWdlIjogInB5dGhvbiIsICJ1c2VyX2FnZW50IjogInZzLWNvZGUtZ2l0bGFiLXdvcmtmbG93IiwgImdpdGxhYl9yZWFsbSI6ICJzYWFzIn19XX0=" | base64 --decode
    {"schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1", "data": [{"schema": "iglu:com.gitlab/code_suggestions_context/jsonschema/1-0-0", "data": {"request_counts": [{"requests": 1, "errors": 0, "accepts": 1, "lang": "python", "model_engine": "vertex-ai", "model_name": "code-gecko"}], "prefix_length": 2048, "suffix_length": 1024, "language": "python", "user_agent": "vs-code-gitlab-workflow", "gitlab_realm": "saas"}}]}
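The last two verification steps can also be scripted. The sketch below uses only the standard library; the function names are hypothetical helpers, not part of this MR. It assumes Snowplow Micro is reachable on http://localhost:9090 as in the steps above, and it decodes with the URL-safe Base64 alphabet on the assumption that the tracker may emit that variant (the payload above decodes the same way with either alphabet).

```python
import base64
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_good_events(base_url: str = "http://localhost:9090") -> List[Dict[str, Any]]:
    """Return the events Snowplow Micro accepted (needs Micro to be running)."""
    with urlopen(f"{base_url}/micro/good") as response:
        return json.load(response)


def decode_context(cx: str) -> Dict[str, Any]:
    """Decode the Base64 `cx` parameter into the contexts JSON document."""
    # Restore padding in case it was stripped; the URL-safe decoder maps
    # '-' and '_' back to '+' and '/' before decoding.
    padded = cx + "=" * (-len(cx) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def code_suggestions_contexts(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect the decoded context data of every code_suggestions event."""
    contexts = []
    for event in events:
        params = event.get("rawEvent", {}).get("parameters", {})
        if params.get("se_ca") == "code_suggestions" and "cx" in params:
            for entity in decode_context(params["cx"])["data"]:
                contexts.append(entity["data"])
    return contexts
```

After sending an event from the container, `code_suggestions_contexts(fetch_good_events())` should return a list containing the same fields that were passed to `SnowplowEventContext`.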
Edited by Tan Le