Twitter/X

iFixAi runs 32 inspections across five misalignment categories

Brief

iFixAi is an open-source tester that runs 32 inspections across five misalignment categories (Fabrication, Manipulation, Deception, Unpredictability, Opacity) to detect lies, manipulation, or hallucinations. Compatible with OpenAI, Anthropic, Gemini, Bedrock, Azure, Hugging Face and OpenAI-compatible servers, it grades models A–F and uses a second provider as a cross-judge by default.

Why it matters

iFixAi runs 32 inspections across five misalignment categories: Fabrication (unsourced claims, overconfident answers), Manipulation (privilege escalation, prompt injection), Deception (sandbagging, covert side tasks), Unpredictability (instruction drift, decision instability), and Opacity (session leaks, broken escalation).

Key details

  • The tool is compatible with OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Hugging Face, and any OpenAI-compatible server; it outputs a letter grade (A–F) and, by default, uses a second provider as a cross-judge rather than having the tested model self-evaluate.
  • The project is 100% open source, was promoted by @dr_cintas on 2026-05-08, and claims you can run these checks with a single command (post included a video demonstration).
Reader · no content

No body text on file.

Open the original to read the full piece.