Tools for probing the reasoning ability of LLMs and for logging the logprobs (log probabilities) of generated tokens to identify weak points in an LLM's response.
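
As a rough illustration of how token logprobs can point to weak spots in a response, here is a minimal sketch in plain Python. The function name `find_weak_tokens`, the 0.5 probability threshold, and the sample tokens and logprobs are all hypothetical and chosen for illustration; they are not taken from these tools.

```python
import math


def find_weak_tokens(tokens, logprobs, threshold=0.5):
    """Return (index, token, probability) for tokens below the threshold.

    `threshold` is a probability cutoff (hypothetical default of 0.5).
    """
    weak = []
    for i, (tok, lp) in enumerate(zip(tokens, logprobs)):
        p = math.exp(lp)  # convert the log probability back to a probability
        if p < threshold:
            weak.append((i, tok, p))
    return weak


# Made-up sample data: one logprob per generated token.
tokens = ["The", " answer", " is", " 42"]
logprobs = [-0.05, -0.10, -0.02, -1.90]

weak = find_weak_tokens(tokens, logprobs)
for index, token, prob in weak:
    print(f"token {index} {token!r}: p={prob:.3f}")
```

In this made-up data only the final token falls below the cutoff, which is the kind of low-confidence position one might inspect more closely.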