"Can a walnut that has been squeezed in a door still nourish the brain?"
![图片](https://image.jiqizhixin.com/uploads/editor/531c0daf-ddf5-4b06-a883-a04a5ee57337/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/a873e1b3-a851-41b3-a0a8-b19e652ad1bf/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/63571ce0-8d22-4009-a0e5-80ec97e44552/640.png)
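Proposes a high-quality Chinese instruction-tuning dataset, designed specifically to align with human interaction and built through a rigorous filtering process; explores how various data sources (including social media, encyclopedias, and traditional NLP tasks) affect model performance, offering important insights for selecting training data from the Chinese Internet; benchmarks and human evaluation confirm that models fine-tuned on the CQIA dataset perform excellently, making CQIA a valuable resource for the Chinese NLP community.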
![图片](https://image.jiqizhixin.com/uploads/editor/1b1d458c-d643-4b09-9541-0199ed29a7bc/640.png)
Paper: https://arxiv.org/pdf/2403.18058.pdf
Dataset: https://huggingface.co/datasets/m-a-p/COIG-CQIA
Paper title: COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
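Social media and forums: Zhihu, SegmentFault, Douban, Xiaohongshu, and Ruozhiba. World knowledge: encyclopedias and data from four specific domains (medicine, economics and management, electronics, and agriculture). NLP datasets: COIG-PC, COIG Human Value, and others. Exam questions: middle school and college entrance exams, graduate entrance exams, logical reasoning tests, and traditional Chinese culture.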
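To study how each kind of source affects a fine-tuned model, records first need to be grouped by where they came from. The following is a minimal Python sketch of that grouping, not the authors' code: the category keys, the `source` field, and the toy records are all hypothetical names chosen for illustration, and the NLP dataset list is incomplete because the article only names two of them.

```python
from collections import Counter

# Source taxonomy as listed in the article. The dictionary keys are
# illustrative labels, not identifiers used by the dataset itself.
SOURCE_CATEGORIES = {
    "social_media_and_forums": [
        "Zhihu", "SegmentFault", "Douban", "Xiaohongshu", "Ruozhiba",
    ],
    "world_knowledge": [
        "encyclopedia", "medicine", "economics_and_management",
        "electronics", "agriculture",
    ],
    # The article lists these two "and others", so this list is partial.
    "nlp_datasets": ["COIG-PC", "COIG Human Value"],
    "exams": [
        "middle_school_and_college_entrance", "graduate_entrance",
        "logical_reasoning", "traditional_chinese_culture",
    ],
}

# Reverse lookup: source name -> category label.
SOURCE_TO_CATEGORY = {
    source: category
    for category, sources in SOURCE_CATEGORIES.items()
    for source in sources
}


def count_by_category(records):
    """Count records per category.

    Each record is a dict with a hypothetical 'source' field; sources that
    are not in the taxonomy are counted under 'unknown'.
    """
    return Counter(
        SOURCE_TO_CATEGORY.get(record["source"], "unknown")
        for record in records
    )


# Toy records with placeholder text, only to show the call shape.
sample = [
    {"source": "Ruozhiba", "instruction": "..."},
    {"source": "Zhihu", "instruction": "..."},
    {"source": "COIG-PC", "instruction": "..."},
    {"source": "some_blog", "instruction": "..."},
]
counts = count_by_category(sample)
```

With a grouping like this, one subset per category can be used to fine-tune and evaluate separately, which is the kind of per-source comparison the article describes.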
![图片](https://image.jiqizhixin.com/uploads/editor/2be17484-ed54-4099-9a26-3ff1780b8bdf/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/30570569-0776-4402-a519-bc73e7fa3b80/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/0f660f38-adb1-44d2-b404-55535e7f190f/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/eeaa006d-5809-4df7-b677-06ee621e90b4/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/243ab0b2-92cf-46b0-ac28-4110b4fd6951/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/13fb48d6-d73e-4fc0-9ac6-8d1979d61805/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/041f67cd-d686-451d-9cbb-93a8aad4420a/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/458079da-8c70-4a8f-8717-91755d9ab6d1/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/2ba755af-4096-4266-b16e-28948ac05996/640.png)