Improve LatinTerms test
What does this MR do?
In this Slack thread, @eread noted a false positive for the word `viable`:
For example, `doc/development/contributing/merge_request_workflow.md` contains edge cases that would've been flagged before: `viable` and `trivial`.
First iteration
I looked at `LatinTerms.yml` and can see why it's happening. Periods and spaces aren't word characters, so word boundaries don't work around `e.g.` and `i.e.`. That's why we created a non-word rule to try to catch their variants. Because the rule is non-word, it also matches `via` inside longer words such as `viable` and `trivial`. I did some digging and found another approach (https://github.com/errata-ai/Google/blob/master/Google/Latin.yml), but I can't figure out how to add the version of each abbreviation that has a space between the period and the second letter:
# Won't catch 'e. g.' with a space in between
'\b(?:eg|e\.g\.)(?=[\s,;])': for example
# Won't catch 'i. e.' with a space in between
'\b(?:ie|i\.e\.)(?=[\s,;])': that is
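To see the gap concretely, here is a minimal sketch that runs the two patterns above through Python's `re` module. It assumes `re` handles this lookahead syntax the same way Vale does, which we haven't verified. The helper `latin_hits` and the sample sentences are hypothetical, written only for this illustration.

```python
import re

# Patterns from the Google Latin.yml approach quoted above. The lookahead
# (?=[\s,;]) requires whitespace, a comma, or a semicolon after the match.
EG = re.compile(r"\b(?:eg|e\.g\.)(?=[\s,;])")
IE = re.compile(r"\b(?:ie|i\.e\.)(?=[\s,;])")


def latin_hits(text):
    """Hypothetical helper: return every e.g./i.e. variant the two patterns flag."""
    return EG.findall(text) + IE.findall(text)


for sample in [
    "Use a tool, e.g. Vale, to lint.",   # flagged
    "Use a tool, e. g. Vale, to lint.",  # missed: space before the 'g'
    "That is, i.e. the default.",        # flagged
    "That is, i. e. the default.",       # missed: space before the second 'e'
]:
    print(repr(sample), "->", latin_hits(sample))
```

Neither alternation allows whitespace after the first period, so the spaced forms slip through.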
However, @cynthia noted that the fix for `via` is simpler. We don't want to flag it when it's part of a longer word, so we test for word boundaries, like this: '\bvia\b'
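A quick comparison shows why the word boundaries fix the false positives. This is a sketch in Python's `re`. The bare `via` pattern only approximates what a non-word rule matches, and the sample sentence is made up for the illustration.

```python
import re

# Without word boundaries (roughly what a non-word rule does), 'via' also
# matches inside longer words.
NONWORD_VIA = re.compile(r"via")
# The suggested fix: only match 'via' as a whole word.
WORD_VIA = re.compile(r"\bvia\b")

sentence = "Send it via the API. Both options are viable, and the change is trivial."

print(len(NONWORD_VIA.findall(sentence)))  # 3: 'via', 'viable', 'trivial'
print(len(WORD_VIA.findall(sentence)))     # 1: only the standalone 'via'
```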
Second iteration
@marcel.amirault found ways to improve the regex further and capture more variations of the phrases.
Before and after
vale --no-wrap --filter='.Name=="gitlab.LatinTerms"' doc/**/*.md
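The `e.g.` and `i.e.` counts shift slightly in round 1 and rise in round 2, because the improved regex captures more variations. As expected, findings for `via` drop by a much larger amount: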
Type of finding | Before | Round 1 | Round 2 |
---|---|---|---|
`e.g.` variants | 133 | 128 | 155 |
`i.e.` variants | 27 | 25 | 37 |
`via` | 791 | 705 | 704 |
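The net change from before to round 2 can be worked out from the table with a short script. The numbers are copied from the table; the `summarize` helper is hypothetical.

```python
# Findings per type, copied from the table: (before, round 1, round 2).
COUNTS = {
    "e.g. variants": (133, 128, 155),
    "i.e. variants": (27, 25, 37),
    "via": (791, 705, 704),
}


def summarize(counts):
    """Return {name: (net change, percent change)} comparing round 2 with before."""
    result = {}
    for name, (before, _round1, round2) in counts.items():
        change = round2 - before
        result[name] = (change, round(100 * change / before, 1))
    return result


for name, (change, pct) in summarize(COUNTS).items():
    print(f"{name}: {change:+d} ({pct:+.1f}%)")
```

`via` loses 87 findings (about 11%), while the `e.g.` and `i.e.` counts grow by 22 and 10.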
Related issues
Author's checklist
- Optional. Consider taking the GitLab Technical Writing Fundamentals course.
- Follow the:
- If you're adding a new page, add the product availability details under the H1 topic title.
- If you are a GitLab team member, request a review based on:
  - The documentation page's metadata.
  - The associated Technical Writer.
- If you are a GitLab team member and only adding documentation, do not add any of the following labels:
  - ~"frontend"
  - ~"backend"
  - ~"type::bug"
  - ~"database"

  These labels cause the MR to be added to code verification QA issues.
Reviewer's checklist
Documentation-related MRs should be reviewed by a Technical Writer for a non-blocking review, based on Documentation Guidelines and the Style Guide.
If you aren't sure which tech writer to ask, use roulette or ask in the #docs Slack channel.
- If the content requires it, ensure the information is reviewed by a subject matter expert.
- Technical writer review items:
  - Ensure docs metadata is present and up-to-date.
  - Ensure the appropriate labels are added to this MR.
  - Ensure a release milestone is set.
  - If relevant to this MR, ensure content topic type principles are in use, including:
    - The headings should be something you'd do a Google search for. Instead of `Default behavior`, say something like `Default behavior when you close an issue`.
    - The headings (other than the page title) should be active. Instead of `Configuring GDK`, say something like `Configure GDK`.
    - Any task steps should be written as a numbered list.
  - If the content still needs to be edited for topic types, you can create a follow-up issue with the docs-technical-debt label.
- Review by assigned maintainer, who can always request/require the reviews above. Maintainer's review can occur before or after a technical writer review.