Maybe very few folks noticed that DS4 supports both generating the refusal vector (or whatever behavior vector you can extract with prompt pairs) and then applying it to the model activations at runtime with different strengths.
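To make the idea concrete, here is a minimal sketch of one common way to do this kind of steering: take the difference between the mean activations of two sets of prompts, then add that direction back to an activation scaled by a strength. This is not DS4's actual code or interface; the function names (`behavior_vector`, `steer`), the toy 3-dimensional activations, and the unit-length normalization are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def behavior_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Difference of means between two prompt sets, scaled to unit length.

    pos_acts: activations captured on prompts that show the behavior.
    neg_acts: activations captured on the paired prompts that do not.
    (Hypothetical helper, not a DS4 function.)
    """
    diff = [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("the two prompt sets produce identical mean activations")
    return [d / norm for d in diff]


def steer(activation: Vector, direction: Vector, strength: float) -> Vector:
    """Add the direction to one activation; a negative strength suppresses it."""
    return [a + strength * d for a, d in zip(activation, direction)]


# Toy data: two activations per prompt set, three dimensions each.
pos = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neg = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = behavior_vector(pos, neg)
amplified = steer([1.0, 1.0, 1.0], direction, 2.0)
suppressed = steer([1.0, 1.0, 1.0], direction, -2.0)
```

In a real model the activations would be captured at a chosen layer for each prompt in the pair, and the strength would be tuned by trying a few values and checking the output.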