Beyond the score itself, the reasons behind a score are even more valuable for aligning large language models.
Paper title: Reasons to Reject? Aligning Language Models with Judgments
Paper link: https://arxiv.org/abs/2312.14591
GitHub link: https://github.com/wwxu21/CUT
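Pros: training is stable; implementation is simple. Cons: collecting high-quality, diverse demonstration data is expensive; the model cannot learn from incorrect responses; the demonstration data is usually not specific to the model being trained.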
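Pros: both correct and incorrect responses can be used; the feedback signal is specific to the model being trained. Cons: the feedback signal is sparse; the training procedure tends to be complex.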
![Image](https://image.jiqizhixin.com/uploads/editor/948ad10a-fb31-4e94-a772-83075194a5da/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/b53dc156-ca89-4d6f-982d-0827e4be76e7/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/4afb71d1-692c-4f21-b601-d0ec20022e10/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/b01654dd-b8e2-41e7-8f85-445b753061bc/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/10571886-6ddf-42fd-afff-41a53d8b5cf0/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/b840cd08-8e10-4dac-9a7c-81b0adf7c599/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/80acd583-9de7-4d06-bfeb-cf7702b9d73d/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/8294523b-83ec-4578-b6d7-bd0606cfca57/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/915631ea-51d0-4a70-bb10-17753d835631/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/f4695a28-0afa-4b27-a72c-113853b2379e/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/9324a1c0-fdaf-4f11-bb26-c7eeda93022a/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/69436451-18d4-4db9-a85d-3b56a4f70098/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/96811741-d16e-429e-b3ac-3d04097fcaf3/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/678a1169-8c54-4db7-b9ff-694a8083b9f1/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/762e432a-b24c-4ffc-8661-da36a41a36e5/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/44d0d655-7c16-4190-93dc-2a4f78803f32/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/3522cf83-22b5-4c24-bfce-c5b166fb4ef7/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/b8851b7a-d7d7-416a-90d2-9f94ad1885a5/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/ee2a2434-856e-4607-a996-b1f8fd5de02b/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/cae4adbb-b9c7-4b25-930e-ba83477201a3/640.png)
![Image](https://image.jiqizhixin.com/uploads/editor/0b5a38e0-c0e2-4ae1-b65c-25dde8fdd11c/640.png)
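Step 1: Collect instructions and obtain the target LLM's response to each one.
Step 2: Annotate language feedback (a judgment) for each of the instruction-response pairs above.
Step 3: Apply CUT to fine-tune the target LLM on the collected triples of instruction, response, and judgment.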
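
The three steps above can be sketched as a small data-collection loop. This is only an illustrative outline, not the authors' implementation: the `Triple` class and the `generate`, `annotate`, and `fine_tune` callables are hypothetical names introduced here, and the CUT training objective itself is left behind the `fine_tune` placeholder rather than reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Triple:
    """One training example: instruction, model response, language feedback."""
    instruction: str
    response: str
    judgment: str


def collect_triples(
    instructions: Iterable[str],
    generate: Callable[[str], str],        # hypothetical: target LLM's response
    annotate: Callable[[str, str], str],   # hypothetical: judgment annotator
) -> List[Triple]:
    triples = []
    for instruction in instructions:
        response = generate(instruction)            # Step 1: get the response
        judgment = annotate(instruction, response)  # Step 2: annotate feedback
        triples.append(Triple(instruction, response, judgment))
    return triples


def align_once(
    instructions: Iterable[str],
    generate: Callable[[str], str],
    annotate: Callable[[str, str], str],
    fine_tune: Callable[[List[Triple]], None],  # hypothetical: CUT fine-tuning
) -> List[Triple]:
    triples = collect_triples(instructions, generate, annotate)
    fine_tune(triples)                              # Step 3: fine-tune on triples
    return triples


# Toy stand-ins so the sketch runs without a real model or annotator.
def toy_generate(instruction: str) -> str:
    return "draft answer to: " + instruction


def toy_annotate(instruction: str, response: str) -> str:
    return "The response restates the instruction without answering it."


collected: List[Triple] = []
align_once(["Summarize the paper."], toy_generate, toy_annotate, collected.extend)
```

Keeping the three callables separate reflects the point made in the steps: the responses come from the target model itself, so the judgments attached to them are specific to that model.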