Explain this Vulnerability: Content moderation blocking
Why are we doing this work
Vertex introduced content moderation blocking around 2023-06-04. Both the prompt and the response are now evaluated, and roughly half of this week's requests are being blocked. Blocked responses include `"safetyAttributes"=>{"blocked"=>true}` in the returned data object.
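A response can be checked for this condition by inspecting the `safetyAttributes` field shown above. A minimal Ruby sketch, assuming the standard Vertex response shape with a `predictions` array (the helper name is ours; the field names come from the logged response):

```ruby
require "json"

# Returns true when Vertex flagged the response as moderation-blocked.
# The safetyAttributes/blocked keys are taken from the logged response
# body; the helper itself is illustrative, not production code.
def moderation_blocked?(response_body)
  data = JSON.parse(response_body)
  predictions = data["predictions"] || []
  predictions.any? { |p| p.dig("safetyAttributes", "blocked") == true }
end

blocked = '{"predictions":[{"content":"","safetyAttributes":{"blocked":true}}]}'
allowed = '{"predictions":[{"content":"...","safetyAttributes":{"blocked":false}}]}'

moderation_blocked?(blocked) # => true
moderation_blocked?(allowed) # => false
```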
47 of the 98 uses of Explain this Vulnerability failed moderation between 2023-06-04 and 2023-06-08. Notably (likely due to temperature/non-determinism), the same vulnerability prompt can be blocked on one try and succeed on the next; the outcome appears random.
Filters used above:
- json.message: "Broadcasting AI response"
- json.meta.caller_id: Llm::CompletionWorker
- json.data.model_name: Vulnerability
- json.data.response_body: "I'm not able to help with that..." (for moderation blocked)
We need to determine:
- What modifications we can make to improve the success rate (we can use the export tool to A/B test)
- Are there resources we can follow to stay ahead of feature changes like content moderation?
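Since the same prompt can fail on one attempt and pass on the next, one candidate mitigation worth measuring is a bounded retry when a response comes back blocked. A hedged sketch with stubbed lambdas standing in for the real client call and block check (both helpers are hypothetical, not the actual worker API):

```ruby
MAX_ATTEMPTS = 3

# Retries a completion request up to MAX_ATTEMPTS times while the
# response is moderation-blocked. `request` and `blocked` are
# hypothetical stand-ins for the real client call and block check.
def complete_with_retry(prompt, request:, blocked:)
  attempt = 0
  response = nil
  loop do
    attempt += 1
    response = request.call(prompt)
    break unless blocked.call(response) && attempt < MAX_ATTEMPTS
  end
  response
end

# Stubbed example: first call blocked, second succeeds.
calls = 0
request = ->(_prompt) { calls += 1; calls == 1 ? :blocked : :ok }
blocked = ->(resp) { resp == :blocked }
complete_with_retry("explain CWE-502", request: request, blocked: blocked) # => :ok
```

A retry trades latency and token cost for success rate, so it is something to A/B test rather than adopt outright.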
Below is an example prompt that was blocked. We speculate that the word "exploit" triggered it, but as noted above, the same vulnerability can both fail and succeed in succession.
You are a software vulnerability developer.
Explain the vulnerability "CWE-502 in SQLiteProfileProvider.cs - (SCS0028, CWE-502, security_code_scan.SCS0028-1)".
The file "SQLiteProfileProvider.cs" has this vulnerable code:
```csharp
return xs.Deserialize(sr);
```
Provide a code example with syntax highlighting on how to exploit it.
Provide a code example with syntax highlighting on how to fix it.
Provide the response in markdown format with headers.
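If the word "exploit" is indeed the trigger, one experiment via the export tool could compare the original wording against a variant that avoids it. A minimal sketch for generating the variant (the substitutions below are guesses to A/B test, not confirmed fixes):

```ruby
# Hypothetical rewordings for an A/B test: softer phrasing that keeps
# the intent of the prompt while avoiding "exploit". These mappings
# are assumptions to measure, not a confirmed moderation workaround.
SUBSTITUTIONS = {
  "how to exploit it" => "how an attacker could trigger it",
  "vulnerability developer" => "security analyst"
}.freeze

def reword_prompt(prompt)
  SUBSTITUTIONS.reduce(prompt) { |text, (from, to)| text.gsub(from, to) }
end

original = "Provide a code example with syntax highlighting on how to exploit it."
reword_prompt(original)
# => "Provide a code example with syntax highlighting on how an attacker could trigger it."
```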
Vertex documentation: https://cloud.google.com/vertex-ai/docs/generative-ai/learn/responsible-ai#safety_filter_threshold
Relevant links
Non-functional requirements
- Documentation: -
- Feature flag: -
- Performance: -
- Testing: -