
Improve API token argument validation

Tan Le requested to merge improve-api-key-args-validation into main

What does this merge request do and why?

Provider API tokens are required when the corresponding models are specified in the model list. The existing implementation only surfaces a missing token deep inside the pipeline, as a raw ValueError with a verbose traceback. This MR validates the token up front, at CLI argument parsing, so the failure is reported as a concise usage error.

❯ poetry run promptlib code-suggestions \
          eval \
          --input-bq-table unreview-poc-390200e5.gl_gitlab_codebase.input_raw_v1 \
          --output-bq-table unreview-poc-390200e5:gl_gitlab_experiments.tl-test-eval \
          --batch-size 24 \
          --min-length 25 \
          --throttle-sec 0.01 \
          --language rust \
          --model claude-2
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/Users/tanle/code/gitlab/prompt-library/.venv/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery.py:2792: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
  project_id = pcoll.pipeline.options.view_as(GoogleCloudOptions).project
/Users/tanle/code/gitlab/prompt-library/.venv/lib/python3.11/site-packages/apache_beam/io/gcp/bigquery_read_internal.py:150: BeamDeprecationWarning: options is deprecated since First stable release. References to <pipeline>.options will not be supported
  pipeline_options = input.pipeline.options
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/tanle/code/gitlab/prompt-library/promptlib/code_suggestions.py:671 in eval                │
│                                                                                                  │
│   668 │   │   │   by_model, len(models)                                                          │
│   669 │   │   )                                                                                  │
│   670 │   │                                                                                      │
│ ❱ 671 │   │   completions_by_model = [                                                           │
│   672 │   │   │   run_completion(model, part) for model, part in zip(models, preprocessed_part   │
│   673 │   │   ]                                                                                  │
│   674                                                                                            │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │               anthropic_api_token = None                                                     │ │
│ │                        batch_size = 24                                                       │ │
│ │                          by_model = <function eval.<locals>.by_model at 0x157af2980>         │ │
│ │                               ctx = <click.core.Context object at 0x157af7bd0>               │ │
│ │             huggingface_api_token = None                                                     │ │
│ │                    include_suffix = False                                                    │ │
│ │                    input_bq_table = 'unreview-poc-390200e5.gl_gitlab_codebase.input_raw_v1'  │ │
│ │                         languages = [<LanguageId.RUST: 'rust'>]                              │ │
│ │                        min_length = 25                                                       │ │
│ │                            models = [<ModelCollection.CLAUDE_2: 'claude-2'>]                 │ │
│ │                   output_bq_table = 'unreview-poc-390200e5:gl_gitlab_experiments.tl-test-ev… │ │
│ │                          pipeline = <apache_beam.pipeline.Pipeline object at 0x157c002d0>    │ │
│ │                  pipeline_options = <apache_beam.options.pipeline_options.PipelineOptions    │ │
│ │                                     object at 0x157af7c90>                                   │ │
│ │              post_transformations = []                                                       │ │
│ │                        preprocess = <PCollection[Apply prompt transformations.None] at       │ │
│ │                                     0x157d27a10>                                             │ │
│ │ preprocessed_partitioned_by_model = <DoOutputsTuple main_tag=None tags=('0',)                │ │
│ │                                     transform=<ParDo(PTransform)                             │ │
│ │                                     label=[ParDo(ApplyPartitionFnFn)]> at 0x157c12190>       │ │
│ │                    run_completion = <function eval.<locals>.run_completion at 0x157af2a20>   │ │
│ │                      throttle_sec = 0.01                                                     │ │
│ │                       to_testcase = <function eval.<locals>.to_testcase at 0x157af28e0>      │ │
│ │                   transformations = []                                                       │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /Users/tanle/code/gitlab/prompt-library/promptlib/code_suggestions.py:672 in <listcomp>          │
│                                                                                                  │
│   669 │   │   )
│   670 │   │                                                                                      │
│   671 │   │   completions_by_model = [
│ ❱ 672 │   │   │   run_completion(model, part) for model, part in zip(models, preprocessed_part   │
│   673 │   │   ]
│   674 │   │                                                                                      │
│   675 │   │   completions = completions_by_model | "Merging Completions" >> beam.Flatten()
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │             .0 = <zip object at 0x157d7acc0>                                                 │ │
│ │          model = <ModelCollection.CLAUDE_2: 'claude-2'>                                      │ │
│ │           part = <PCollection[Partition by                                                   │ │
│ │                  model/ParDo(ApplyPartitionFnFn)/ParDo(ApplyPartitionFnFn).0] at             │ │
│ │                  0x157d337d0>                                                                │ │
│ │ run_completion = <function eval.<locals>.run_completion at 0x157af2a20>                      │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│                                                                                                  │
│ /Users/tanle/code/gitlab/prompt-library/promptlib/code_suggestions.py:635 in run_completion      │
│                                                                                                  │
│   632 │   │   │   )
│   633 │   │   elif model_meta.provider == ModelProvider.ANTHROPIC:                               │
│   634 │   │   │   if anthropic_api_token is None:                                                │
│ ❱ 635 │   │   │   │   raise ValueError("Please provide anthropic api token for calling anthrop   │
│   636 │   │   │                                                                                  │
│   637 │   │   │   completions = batches | f"Anthropic request completions {model}" >> beam.Par   │
│   638 │   │   │   │   BatchRequestCompletionsAnthropic(model_meta=model_meta, api_token=anthro   │
│                                                                                                  │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │   anthropic_api_token = None                                                                 │ │
│ │            batch_size = 24                                                                   │ │
│ │               batches = <PCollection[Throttle completions API call claude-2.None] at         │ │
│ │                         0x157d56f50>                                                         │ │
│ │ huggingface_api_token = None                                                                 │ │
│ │        include_suffix = False                                                                │ │
│ │                 model = <ModelCollection.CLAUDE_2: 'claude-2'>                               │ │
│ │            model_meta = ModelMeta(                                                           │ │
│ │                         │   name=<AnthropicModel.CLAUDE_2: 'claude-2'>,                      │ │
│ │                         │   provider=<ModelProvider.ANTHROPIC: 3>,                           │ │
│ │                         │   default_parameters={'max_output_tokens': 64},                    │ │
│ │                         │   input_token_limit=8192                                           │ │
│ │                         )                                                                    │ │
│ │  preprocess_partition = <PCollection[Partition by                                            │ │
│ │                         model/ParDo(ApplyPartitionFnFn)/ParDo(ApplyPartitionFnFn).0] at      │ │
│ │                         0x157d337d0>                                                         │ │
│ │          throttle_sec = 0.01                                                                 │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Please provide anthropic api token for calling anthropic api.
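
A minimal sketch of how this kind of up-front check can be wired with a Click option callback. The ANTHROPIC_MODELS set, the callback name, and the simplified option declarations are illustrative assumptions, not the actual promptlib code:

import click

# Illustrative only: in promptlib the model list is an enum (ModelCollection).
ANTHROPIC_MODELS = {"claude-2"}

def validate_anthropic_token(ctx, param, value):
    # Fail fast during argument parsing when an Anthropic model is selected
    # but no token was supplied. Click renders BadParameter as
    # "Invalid value for '--anthropic-token': ...", matching the output in
    # the validation steps below.
    models = ctx.params.get("model", ())
    if value is None and any(m in ANTHROPIC_MODELS for m in models):
        raise click.BadParameter(
            "Please provide Anthropic API token option for calling Anthropic API."
        )
    return value

@click.command()
# --model is declared before --anthropic-token so its parsed value is already
# available in ctx.params when the token callback runs.
@click.option("--model", multiple=True)
@click.option("--anthropic-token", callback=validate_anthropic_token)
def eval(model, anthropic_token):
    ...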

How to set up and validate locally

  1. Check out this merge request's branch.
  2. Run the following command to verify the improved validation error:
❯ poetry run promptlib code-suggestions \
          eval \
          --input-bq-table unreview-poc-390200e5.gl_gitlab_codebase.input_raw_v1 \
          --output-bq-table unreview-poc-390200e5:gl_gitlab_experiments.tl-test-eval \
          --batch-size 24 \
          --min-length 25 \
          --throttle-sec 0.01 \
          --language rust \
          --model claude-2

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Usage: promptlib code-suggestions eval [OPTIONS]
Try 'promptlib code-suggestions eval --help' for help.
╭─ Error ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Invalid value for '--anthropic-token': Please provide Anthropic API token option for calling Anthropic API.                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
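
With the token supplied, the same command should pass validation and start the pipeline. The invocation below is an assumption based on the option name in the error message, with ANTHROPIC_API_KEY as a placeholder environment variable:

❯ poetry run promptlib code-suggestions \
          eval \
          --input-bq-table unreview-poc-390200e5.gl_gitlab_codebase.input_raw_v1 \
          --output-bq-table unreview-poc-390200e5:gl_gitlab_experiments.tl-test-eval \
          --batch-size 24 \
          --min-length 25 \
          --throttle-sec 0.01 \
          --language rust \
          --model claude-2 \
          --anthropic-token "$ANTHROPIC_API_KEY"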

Merge request checklist

  • I've run the eval_codebase.py pipeline to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
