The missing behavioral API documentation regarding delays, background tasks and caching
While working on the GitLab Terraform Provider, we've noticed that a lot of REST API endpoints execute work in the background or rely on caching.
As a result, when a project is created, for example, the default branch sometimes hasn't been created yet, or the README hasn't been initialized, by the time the API responds.
A few concrete examples I could recall (there are probably many more):
- The group `POST` API responds with a `201 CREATED` before the group is actually created
- The group `DELETE` API responds with a `202 ACCEPTED` before the group is actually deleted
- The PAT list API still lists already deleted tokens for a few seconds
- When a project is created with `container_registry_enabled` / `container_expiration_policy_attributes.enabled` = `true`, the `container_expiration_policy_attributes.enabled` flag is only returned as `true` after some delay
- Apparently there is a delay when rotating runner keys
- When creating a project with `initialize_with_readme = true`, the `README.md` may not exist when the API returns
- When creating a project, the `default_branch` may not be protected when the API returns
- When changing the application settings, they may be cached for a while and not take effect, sometimes for minutes
- Probably all the endpoints found by `git grep async_execute -- lib/api`, too
To me, this indicates a shortcoming of the current API documentation (I suppose the GraphQL and REST APIs suffer alike): it more or less documents the inputs and outputs, but most often does not document the behavior. This can lead to all sorts of problems and confusion.
As mentioned above, here I mainly want to address the missing documentation of behavior regarding delays, background tasks, and caching. This is especially important for automation (which is arguably why there is an API in the first place), such as Terraform, where these "unknown behaviors" can be a big problem because API calls very often follow each other within milliseconds. If the delays and behaviors described above are not known, the automation will eventually fail in nasty, sporadic ways.
Besides the missing documentation, I assume (my assumption!) that most people intuitively expect the API to work in a blocking way: for example, if I create a new project with `initialize_with_readme = true`, the `README.md` file actually exists by the time the API responds and is ready to be used. Even if that is not the case and the behavior is documented, how would the automation efficiently wait for consistency?
These behaviors can be challenged and changed for each API endpoint individually, but I think we would all benefit a lot if we started rigorously documenting not only the API syntax but also its semantics and behavior, especially when things are asynchronous in some way, have side effects, or are subject to caching.
My intention with this issue is to start a discussion around this and hopefully come to an agreement or plan on how to solve the general problem described here :) (if this isn't already addressed?!) (... and maybe even get some help with one or another concrete issue :))
/cc @nagyv-gitlab @nmezzopera this is quite an important recurring topic for the provider. Could you help me find the right people to address it, please? Or if such a discussion already exists, maybe point me to it?
/cc @g.hickman I remember that we talked a few months ago about the API and the needs of the Terraform provider, so maybe you can help out here, too?
/cc @PatrickRice @armsnyder hello fellow provider maintainers, you may be interested in this, too