
Add Benchmarks for Validators

Marian Lingsch-Rosenfeld requested to merge witness-benchmark-set into main

This MR adds benchmarks for validators to SV-COMP. The goal of these benchmarks is to aid validator developers by providing benchmarks with which to evaluate their tools. To achieve this, the benchmarks cover a wide variety of witnesses that exercise the features of the witness format as exhaustively as possible. In particular, this should separate the concerns of exporting and validating witnesses in tools, since validators can now be benchmarked independently of verifiers.

Each proposed benchmark is composed of a witness, a program, and a specification, and has an expected verdict according to the semantics of witness validation (the witness is correct, i.e., confirmed, if the validation verdict equals the verification verdict). This information is encoded in a task-definition file, which is distinguished from a verification task-definition file by the option witness. Tools use this option to distinguish the witness from the other input files. An example of such a task definition is shown below; it indicates that the witness is not valid for the given program, i.e., it should be rejected by a validator.

format_version: '2.0'

input_files:
  - '../linear-inequality-inv-a.c'
  - 'linear-inequality-inv-a.1.witness-2.0.yml'

properties:
  - property_file: ../../properties/unreach-call.prp
    expected_verdict: false

options:
  language: C
  data_model: ILP32
  witness: 'linear-inequality-inv-a.1.witness-2.0.yml'
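
The referenced witness itself is a YAML file in witness format 2.0. The following is a minimal sketch of a correctness witness in that format, not the actual witness of this task: the producer, hashes, location, and invariant value are placeholders, and only the overall structure (a list containing an invariant_set entry) is intended to be representative.

- entry_type: invariant_set
  metadata:
    format_version: '2.0'
    uuid: '00000000-0000-0000-0000-000000000000'  # placeholder
    creation_time: '2023-01-01T00:00:00Z'         # placeholder
    producer:
      name: 'example-producer'                    # placeholder
      version: '1.0'
    task:
      input_files:
        - '../linear-inequality-inv-a.c'
      input_file_hashes:
        '../linear-inequality-inv-a.c': '0000'    # placeholder
      specification: 'G ! call(reach_error())'
      data_model: ILP32
      language: C
  content:
    - invariant:
        type: loop_invariant
        location:
          file_name: '../linear-inequality-inv-a.c'
          line: 10                                # placeholder
          column: 0
          function: main
        value: 'x <= n'                           # placeholder invariant
        format: c_expression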

The goal of these and future benchmarks is to have high-quality, hand-crafted benchmarks that explore new or complex aspects of validators, not to keep track of all witnesses that tools can generate across the whole of SV-Benchmarks.

To separate the benchmarks for validators from those for verifiers, the validation benchmarks and the corresponding witnesses are kept in sub-directories of the verification benchmarks. For example, if loop-invariants contains verification tasks, then loop-invariants/witnesses contains the witnesses and the validation task-definition files, as sketched below.
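
A hedged sketch of the resulting layout, using the file names from the example above (the exact name of the validation task-definition file inside witnesses/ is illustrative):

loop-invariants/
  linear-inequality-inv-a.c
  linear-inequality-inv-a.yml                    (verification task definition)
  witnesses/
    linear-inequality-inv-a.1.witness-2.0.yml    (witness)
    linear-inequality-inv-a.1.yml                (validation task definition, name illustrative)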

Please note that the CI will remain red until the new container for the sanity checks has been pushed, since WitnessLint has been added to it to check the syntax of the witnesses.

Further references:

The checklist required when adding new benchmarks:

  • programs added to a new and appropriately named directory
  • license present and acceptable (in a machine-readable comment at the beginning of the program, as specified by the REUSE project)
  • contributed-by present (either in a README file or as a comment at the beginning of the program)
  • programs added to a .set file of an existing category, or a new sub-category established (if justified)
  • addition to the benchmark definitions (.xml) in the benchmark-definitions repositories suggested
  • intended property matches the corresponding .prp file (see the example after this list)
  • programs and expected answer added to a .yml file according to the task-definition format
  • data model present in the task-definition file
  • original (ideally not preprocessed) sources present
  • Makefile added with correct content and without overly broad suppression of warnings
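
For reference, the property file used in the example above, ../../properties/unreach-call.prp, contains the unreachability-of-calls property; the line below assumes the usual SV-Benchmarks formulation of that property.

CHECK( init(main()), LTL(G ! call(reach_error())) )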
