Explain this Vulnerability: Content moderation blocking
Why are we doing this work
Vertex introduced content moderation blocking around 2023-06-04. Both the prompt and the response are now evaluated, and roughly half of this week's requests are being blocked. Blocked responses include `"safetyAttributes"=>{"blocked"=>true}` in the returned data object.
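A response can be checked for this condition by inspecting the `safetyAttributes` field shown above. A minimal Ruby sketch, assuming the standard Vertex response shape with a `predictions` array (the helper name is ours; the field names come from the logged response):

```ruby
require "json"

# Returns true when Vertex flagged the response as moderation-blocked.
# The safetyAttributes/blocked keys are taken from the logged response
# body; the helper itself is illustrative, not production code.
def moderation_blocked?(response_body)
  data = JSON.parse(response_body)
  predictions = data["predictions"] || []
  predictions.any? { |p| p.dig("safetyAttributes", "blocked") == true }
end

blocked = '{"predictions":[{"content":"","safetyAttributes":{"blocked":true}}]}'
allowed = '{"predictions":[{"content":"...","safetyAttributes":{"blocked":false}}]}'

moderation_blocked?(blocked) # => true
moderation_blocked?(allowed) # => false
```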
47 of the 98 uses of Explain this Vulnerability failed moderation between 2023-06-04 and 2023-06-08. Notably (likely due to temperature/non-determinism), the same vulnerability prompt can be blocked on one try and succeed on the next; the outcome appears random.
Filters used above:
- json.message: "Broadcasting AI response"
- json.meta.caller_id: Llm::CompletionWorker
- json.data.model_name: Vulnerability
- json.data.response_body: "I'm not able to help with that..." (for moderation blocked)
We need to determine:
- What modifications we can make to improve the success rate (we can use the export tool to A/B test)
- Are there resources we can follow to stay ahead of feature changes like content moderation?
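Since the same prompt can fail on one attempt and pass on the next, one candidate mitigation worth measuring is a bounded retry when a response comes back blocked. A hedged sketch with stubbed lambdas standing in for the real client call and block check (both helpers are hypothetical, not the actual worker API):

```ruby
MAX_ATTEMPTS = 3

# Retries a completion request up to MAX_ATTEMPTS times while the
# response is moderation-blocked. `request` and `blocked` are
# hypothetical stand-ins for the real client call and block check.
def complete_with_retry(prompt, request:, blocked:)
  attempt = 0
  response = nil
  loop do
    attempt += 1
    response = request.call(prompt)
    break unless blocked.call(response) && attempt < MAX_ATTEMPTS
  end
  response
end

# Stubbed example: first call blocked, second succeeds.
calls = 0
request = ->(_prompt) { calls += 1; calls == 1 ? :blocked : :ok }
blocked = ->(resp) { resp == :blocked }
complete_with_retry("explain CWE-502", request: request, blocked: blocked) # => :ok
```

A retry trades latency and token cost for success rate, so it is something to A/B test rather than adopt outright.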
Below is an example prompt that was blocked. We speculate that the word "exploit" triggered it, but as noted above, the same vulnerability can both fail and succeed in succession.
You are a software vulnerability developer.
Explain the vulnerability "CWE-502 in SQLiteProfileProvider.cs - (SCS0028, CWE-502, security_code_scan.SCS0028-1)".
The file "SQLiteProfileProvider.cs" has this vulnerable code:
```csharp
return xs.Deserialize(sr);
```
Provide a code example with syntax highlighting on how to exploit it.
Provide a code example with syntax highlighting on how to fix it.
Provide the response in markdown format with headers.
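If the word "exploit" is indeed the trigger, one experiment via the export tool could compare the original wording against a variant that avoids it. A minimal sketch for generating the variant (the substitutions below are guesses to A/B test, not confirmed fixes):

```ruby
# Hypothetical rewordings for an A/B test: softer phrasing that keeps
# the intent of the prompt while avoiding "exploit". These mappings
# are assumptions to measure, not a confirmed moderation workaround.
SUBSTITUTIONS = {
  "how to exploit it" => "how an attacker could trigger it",
  "vulnerability developer" => "security analyst"
}.freeze

def reword_prompt(prompt)
  SUBSTITUTIONS.reduce(prompt) { |text, (from, to)| text.gsub(from, to) }
end

original = "Provide a code example with syntax highlighting on how to exploit it."
reword_prompt(original)
# => "Provide a code example with syntax highlighting on how an attacker could trigger it."
```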
Vertex documentation: https://cloud.google.com/vertex-ai/docs/generative-ai/learn/responsible-ai#safety_filter_threshold
Relevant links
Non-functional requirements
- Documentation: -
- Feature flag: -
- Performance: -
- Testing: -