Integrated Browser Performance Testing
Customer Problem
While a load testing tool can provide specific insight into the performance and responsiveness of the server, it does not provide insight into the full end-to-end experience. This is an important gap in coverage, especially when the user's experience is the most critical metric.
In some cases a server could be responding quickly to some, but not all, of the requests for a given page. Or a page could be structured so that paint is delayed until a subsequent request has fully completed. There are also other important metrics to consider, such as overall page download size, which can have a significant impact on the mobile browsing experience.
These tests can be run during a CI job to gather early data on UX impact before a change reaches an actual customer. They can also be run continuously on production, as changes outside of code can have a large impact on these metrics.
Potential GitLab solution
There are a few different tools that can be used to gather this information:
Prometheus has the capability of running these tests for us, utilizing the Chrome WebDriver. This is essentially a Prometheus endpoint which runs the headless version of Chrome. It provides timings across all phases of loading and rendering the page, giving us visibility into how long each phase took.
Alternatively, SiteSpeed is also a contender, although we would need Graphite to get data into Prometheus. The benefit here is that it includes a "Coach" which could potentially provide additional insights into fixes.
We could also look at other methods of generating these results. The benefit of tools with Prometheus support, however, is that we can use the same integration and technology to monitor these environments continuously over time, in production and other long-running environments.
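To make the idea of a Prometheus-compatible endpoint more concrete, here is a minimal sketch of reading raw timing samples from a scrape response in the Prometheus text format. The function name `parse_samples`, the `page` label, and the sample values are hypothetical and only for illustration; the sketch assumes one sample per metric name and no trailing timestamps.

```python
def parse_samples(scrape_text):
    """Parse 'name{labels} value' lines into {name: float}.

    Assumptions: labels are ignored, each metric name appears once,
    and samples carry no trailing timestamp.
    """
    samples = {}
    for line in scrape_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field.
        name_and_labels, _, value = line.rpartition(" ")
        # Drop the optional '{...}' label set to get the bare metric name.
        name = name_and_labels.split("{", 1)[0].strip()
        samples[name] = float(value)
    return samples


# Hypothetical scrape output; real values would come from the endpoint.
EXAMPLE_SCRAPE = """\
# TYPE navigation_timing_start_seconds gauge
navigation_timing_start_seconds 0
navigation_timing_response_start_seconds{page="home"} 0.25
navigation_timing_load_event_start_seconds 1.5
"""

samples = parse_samples(EXAMPLE_SCRAPE)
```

A CI job could use something along these lines to collect results directly, while a long-running Prometheus server would scrape the same endpoint on its own schedule.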
The general steps would be:
- Pick the monitoring tool, and build a CI job (and container if needed) which can easily run these tests and funnel data into Prometheus.
- Stitch together a CI job which would: stand up a temporary environment, run the monitoring tool, and collect results
- Automatically generate a CI job against / for Auto DevOps
- Display the results on the MR widget
- Persist the results, and use them for later comparison
- Allow customers to specify additional URLs to test (https://gitlab.com/gitlab-org/gitlab-ee/issues/3540)
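The "persist and compare" step above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `compare_results`, the metric names, the 10% threshold, and the status labels are all hypothetical, and the real MR widget may present results differently.

```python
def compare_results(baseline, current, threshold=0.10):
    """Compare two runs of the same metrics (name -> seconds).

    Returns, per metric present in both runs, the before/after values,
    the absolute delta, and a status. The 10% default threshold is an
    assumed value, not a recommendation from this proposal.
    """
    report = {}
    for name in sorted(baseline.keys() & current.keys()):
        before, after = baseline[name], current[name]
        delta = after - before
        if before == 0:
            status = "unknown"  # relative change is undefined
        elif delta / before > threshold:
            status = "degraded"
        elif delta / before < -threshold:
            status = "improved"
        else:
            status = "unchanged"
        report[name] = {
            "before": before,
            "after": after,
            "delta": delta,
            "status": status,
        }
    return report


# Hypothetical persisted baseline (e.g. from the target branch) and MR run.
baseline = {"page_load_seconds": 2.0, "back_end_seconds": 0.5}
current = {"page_load_seconds": 2.5, "back_end_seconds": 0.5}
report = compare_results(baseline, current)
```

With these sample values, the page load metric would be flagged as degraded and the back end metric as unchanged, which is the kind of summary the MR widget could surface.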
And the next step:
- Execute the tests periodically over time for monitoring long-running environments like staging or production
An example of the results of longer-term monitoring can be seen below. Here we see the results of polling a few different pages in production, and you can see how much slower the MR pages are responding than issues or other workflows.
An example of the detailed events that can be collected:
instance:navigation_timing_back_end_seconds{} = navigation_timing_response_start_seconds{job="webdriver"} - navigation_timing_start_seconds{job="webdriver"}
instance:navigation_timing_dom_content_loaded_seconds{} = navigation_timing_dom_content_loaded_event_start_seconds{job="webdriver"} - navigation_timing_start_seconds{job="webdriver"}
instance:navigation_timing_dom_interactive_seconds{} = navigation_timing_dom_interactive_seconds{job="webdriver"} - navigation_timing_start_seconds{job="webdriver"}
instance:navigation_timing_domain_lookup_seconds{} = navigation_timing_domain_lookup_end_seconds{job="webdriver"} - navigation_timing_domain_lookup_start_seconds{job="webdriver"}
instance:navigation_timing_front_end_seconds{} = navigation_timing_load_event_start_seconds{job="webdriver"} - navigation_timing_response_end_seconds{job="webdriver"}
instance:navigation_timing_page_download_seconds{} = navigation_timing_response_end_seconds{job="webdriver"} - navigation_timing_response_start_seconds{job="webdriver"}
instance:navigation_timing_page_load_seconds{} = navigation_timing_load_event_start_seconds{job="webdriver"} - navigation_timing_start_seconds{job="webdriver"}
instance:navigation_timing_redirection_seconds{} = navigation_timing_fetch_start_seconds{job="webdriver"} - navigation_timing_start_seconds{job="webdriver"}
instance:navigation_timing_server_connection_seconds{} = navigation_timing_connect_end_seconds{job="webdriver"} - navigation_timing_connect_start_seconds{job="webdriver"}
instance:navigation_timing_latency_seconds{} = navigation_timing_response_start_seconds{job="webdriver"} - navigation_timing_fetch_start_seconds{job="webdriver"}
instance:navigation_timing_transfer_seconds{} = navigation_timing_response_end_seconds{job="webdriver"} - navigation_timing_response_start_seconds{job="webdriver"}
instance:navigation_timing_dom_processing_interactive_seconds{} = navigation_timing_dom_interactive_seconds{job="webdriver"} - navigation_timing_dom_loading_seconds{job="webdriver"}
instance:navigation_timing_dom_processing_complete_seconds{} = navigation_timing_dom_complete_seconds{job="webdriver"} - navigation_timing_dom_interactive_seconds{job="webdriver"}
instance:navigation_timing_onload_seconds{} = navigation_timing_load_event_end_seconds{job="webdriver"} - navigation_timing_load_event_start_seconds{job="webdriver"}
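Each rule above is a difference between two raw navigation timing values. As a rough illustration of the same arithmetic outside Prometheus, the sketch below derives a subset of these metrics from a dictionary of raw samples. The names `DERIVED` and `derive_metrics` and the sample values are hypothetical; the pairings mirror the corresponding rules above, with the `navigation_timing_` prefix and `_seconds` suffix dropped for brevity.

```python
# Derived metric -> (end event, start event), mirroring a subset of the rules.
DERIVED = {
    "back_end": ("response_start", "start"),
    "page_download": ("response_end", "response_start"),
    "front_end": ("load_event_start", "response_end"),
    "page_load": ("load_event_start", "start"),
    "onload": ("load_event_end", "load_event_start"),
}


def derive_metrics(raw):
    """Compute each derived duration as end minus start, in seconds."""
    return {
        name: raw[end] - raw[start]
        for name, (end, start) in DERIVED.items()
    }


# Hypothetical raw timings in seconds, relative to navigation start.
raw = {
    "start": 0.0,
    "response_start": 0.25,
    "response_end": 0.5,
    "load_event_start": 1.5,
    "load_event_end": 1.75,
}
derived = derive_metrics(raw)
```

Computing these in a CI job gives the same quantities the recording rules would produce in a long-running Prometheus setup, which keeps the MR results and the production dashboards comparable.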
Design
Design should use the same component used for both Code Quality and Security (https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/3207)