Labels as a facet: Index labels for Issues
This is the first step of &8514 (closed): Indexing labels of issues for Advanced Search.
Details
We're going to index labels for issues, which would allow us to efficiently aggregate them and improve Advanced Search facets.
Here's the technical plan we have right now:
- Index label_ids as an array of keywords, for example ["123", "124", "201"]
- Use aggregations to load label_ids sorted by popularity for the issue results. We need to determine the size (limit) for that
- Load labels from the database, where(id: label_ids), to get meta fields like color and group/project ids
- Filter (search) labels on the frontend side
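The first two steps of the plan could look roughly like the sketch below. It only builds and reads plain hashes, so no Elasticsearch client is involved. The helper names (`label_aggregation_body`, `label_ids_from`), the aggregation name, the sample query, and the choice of `size: 0` are assumptions made for illustration; the `terms` aggregation and the `buckets` response shape are standard Elasticsearch.

```ruby
require 'json'

# Mapping fragment: a keyword field also accepts an array of values,
# such as ["123", "124", "201"].
LABEL_IDS_MAPPING = { properties: { label_ids: { type: 'keyword' } } }.freeze

# Hypothetical helper: builds a search body that aggregates label_ids over
# the issues matching issue_query. A terms aggregation returns buckets
# ordered by doc_count (descending) by default, which gives "popularity".
def label_aggregation_body(issue_query, size: 100)
  {
    query: issue_query,
    size: 0, # assumption: this request only needs the aggregation, not the hits
    aggs: {
      label_ids: { terms: { field: 'label_ids', size: size } }
    }
  }
end

# Hypothetical helper: extracts label ids as integers from a parsed
# aggregation response, keeping the bucket (popularity) order.
def label_ids_from(response)
  response.dig('aggregations', 'label_ids', 'buckets').map { |bucket| Integer(bucket['key']) }
end

body = label_aggregation_body({ match: { title: 'search term' } }, size: 100)
puts JSON.pretty_generate(body)

# Sample response in the shape Elasticsearch uses for a terms aggregation.
sample_response = JSON.parse(<<~RESPONSE)
  { "aggregations": { "label_ids": { "buckets": [
    { "key": "124", "doc_count": 42 },
    { "key": "123", "doc_count": 17 },
    { "key": "201", "doc_count": 3 }
  ] } } }
RESPONSE

label_ids = label_ids_from(sample_response)
puts label_ids.inspect
```

The resulting `label_ids` array is what would be passed to the database lookup in the third step. The `size` argument is the limit that still has to be decided.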
This is essentially the original option 2, but with a faster first iteration: labels are loaded from the database for a limited number of ids. It also means we would only need to update the labels field in the index when labels are added or removed. As our next iteration, we might want to consider creating a separate Elasticsearch index for labels and using it instead of the database.
I think that with a reasonable limit (maybe 100-300?) both the database lookup and the frontend filtering should be quite fast.
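One detail worth keeping in mind: a lookup such as where(id: label_ids) does not guarantee any row order, so the popularity order from the aggregation has to be re-applied after loading. The sketch below shows that step together with a simple title filter of the kind the frontend could run over a few hundred labels. It is plain Ruby with a `Label` struct standing in for the database record; the struct, its fields, and both helper names are assumptions for illustration.

```ruby
# Stand-in for the database record; the field names are assumptions.
Label = Struct.new(:id, :title, :color, keyword_init: true)

# Hypothetical helper: re-sorts loaded records to follow the popularity
# order of the aggregated ids, dropping records that were not requested.
def sort_by_popularity(labels, label_ids)
  position = label_ids.each_with_index.to_h
  labels.select { |label| position.key?(label.id) }
        .sort_by { |label| position[label.id] }
end

# Hypothetical helper: case-insensitive substring filter over label titles.
def filter_labels(labels, term)
  needle = term.downcase
  labels.select { |label| label.title.downcase.include?(needle) }
end

label_ids = [124, 123, 201] # most popular first, as returned by the aggregation
rows = [
  Label.new(id: 123, title: 'bug', color: '#d9534f'),
  Label.new(id: 201, title: 'backend', color: '#428bca'),
  Label.new(id: 124, title: 'Bug::regression', color: '#f0ad4e')
]

ordered = sort_by_popularity(rows, label_ids)
matches = filter_labels(ordered, 'bug')
puts matches.map(&:title).inspect
```

With a limit in the 100-300 range, both steps are linear passes over a small in-memory list.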
Technical challenges
Note: labels cannot be transferred out of projects or groups.
- We want to avoid database queries, so all the information needed would be stored in the index
  - We do not want to store the description (too big)
- How to handle global searches when many labels are potentially returned
  - Return the top X labels (100? 50?)
- How to propagate a label title or color change if labels are stored in an array with those fields
  - Update by query is an option
  - Reindexing all issues with the label is another option
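To make the update-by-query option more concrete, here is a rough sketch of a request body for Elasticsearch's `_update_by_query` API. It assumes a document layout that the plan does not confirm: each issue document holds `labels` as an array of plain objects with `id`, `title`, and `color`. If `labels` were mapped as a `nested` field, the `term` query would need to be wrapped in a `nested` query. The helper name and the script are illustrative only.

```ruby
require 'json'

# Painless script (assumed layout): rewrite the matching label in place
# inside each issue document selected by the query.
LABEL_UPDATE_SCRIPT = <<~PAINLESS.strip
  for (def label : ctx._source.labels) {
    if (label.id == params.id) {
      label.title = params.title;
      label.color = params.color;
    }
  }
PAINLESS

# Hypothetical helper: builds the _update_by_query body for one changed label.
def label_update_by_query_body(label_id:, title:, color:)
  {
    # Select only the issues that reference the changed label.
    query: { term: { 'labels.id' => label_id } },
    script: {
      lang: 'painless',
      source: LABEL_UPDATE_SCRIPT,
      params: { id: label_id, title: title, color: color }
    }
  }
end

update_body = label_update_by_query_body(label_id: 124, title: 'bug::regression', color: '#f0ad4e')
puts JSON.pretty_generate(update_body)
```

The reindexing option avoids the script entirely, at the cost of rebuilding every issue document that carries the label.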
Future iteration
- Use a separate labels index for global searches
- Use update_by_query for updated labels if reindexing is not performant enough
- Adding the 3rd queue (operational) for updating labels for existing documents