![图片](https://image.jiqizhixin.com/uploads/editor/723bccfd-e572-4add-9ca2-2f7faa4d69bf/640.png)
AIxiv is the column in which 机器之心 publishes academic and technical content. Over the past several years, the AIxiv column has received and reported on more than 2,000 pieces of work, covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work you would like to share, feel free to submit it or contact us for coverage. Submission emails: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
From a pure image-recognition standpoint, the two dog breeds look extremely similar, with nearly identical color patches at the pixel level; relying only on the information intrinsic to the data (i.e., the images themselves), it may be difficult to tell them apart. With the help of external data and knowledge, however, the situation can improve dramatically.
- Paper title: Image Clustering with External Guidance
- Paper link: https://arxiv.org/abs/2310.11989
- Code link: https://github.com/XLearning-SCU/2024-ICML-TAC
- How to construct textual representations for images;
- How to cluster images and text collaboratively.
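As a rough sketch of the first step, one common approach with CLIP is to embed images and a vocabulary of candidate nouns into CLIP's shared space, retrieve each image's nearest nouns, and average their text embeddings to form a text-space counterpart of the image. The random vectors, vocabulary size, and `topk` below are placeholders standing in for real CLIP features, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for CLIP features: real usage would encode images and candidate
# nouns with a pretrained CLIP model; these random vectors are placeholders.
n_images, n_nouns, dim, topk = 8, 50, 32, 3
img_emb = rng.normal(size=(n_images, dim))
noun_emb = rng.normal(size=(n_nouns, dim))

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

img_emb, noun_emb = l2_normalize(img_emb), l2_normalize(noun_emb)

# Cosine similarity between every image and every candidate noun.
sim = img_emb @ noun_emb.T                    # (n_images, n_nouns)

# For each image, average the embeddings of its top-k nearest nouns to
# form a text-space counterpart of that image.
nearest = np.argsort(-sim, axis=1)[:, :topk]  # (n_images, topk)
text_counterpart = l2_normalize(noun_emb[nearest].mean(axis=1))

print(text_counterpart.shape)                 # (8, 32)
```

The resulting text counterparts can then be clustered jointly with the image embeddings, which is the second challenge listed above.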
![图片](https://image.jiqizhixin.com/uploads/editor/06525c20-582a-4235-95d2-0a4f03c44a38/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/965abc14-1b53-4c87-bc26-71590480c471/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/6c1ebe32-6aaf-496b-a513-216b1e81265e/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/77efb11d-e816-4ae6-93dd-467d11fae77b/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/f0555ad0-75e7-4744-91d2-0e34d0b5a65b/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/5ae1f27e-9fec-4a4b-acc3-cbf8549f6a44/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/18882ea4-896b-4aa2-b77e-fd9ba0c48071/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/4d894188-e1ed-40f4-aba0-043e68742cf2/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/5b417bf8-289e-4d0c-a2a0-2ee38ae19bf7/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/a8a5e349-6141-46f0-a8a2-110a430fffa6/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/af0e82bf-617a-48d1-ba68-54fee48f258c/640.png)
where p_i and p̃_i denote the cluster assignments of image i and its neighbor, respectively; P and P̃ are both n×K matrices, with K being the target number of clusters.
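A minimal numpy sketch of such a neighbor-consistency objective is shown below: each image's soft assignment is encouraged to agree with that of its nearest neighbor via a cross-entropy term. The random logits are placeholders, and the actual loss in the paper may include additional regularization terms:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n, K = 6, 4

# P and P_tilde: soft cluster assignments (n x K) of each image and of one
# of its nearest neighbors; the logits here are random placeholders.
P = softmax(rng.normal(size=(n, K)))
P_tilde = softmax(rng.normal(size=(n, K)))

# Neighbor-consistency term: push each image and its neighbor toward the
# same cluster by minimizing the cross-entropy between their assignments.
consistency_loss = -np.mean(np.sum(P_tilde * np.log(P + 1e-8), axis=1))
print(consistency_loss)
```

Minimizing this term drives the rows of P and P̃ toward matching one-hot vectors, i.e., confident and neighbor-consistent cluster assignments.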
![图片](https://image.jiqizhixin.com/uploads/editor/6b2e636b-418b-4202-88ba-32ed0620c794/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/1118dca2-0bca-46ce-99d2-ba27f3541433/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/eb8841da-0b98-4358-8120-ad407830d887/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/a60ce37e-74f2-4549-aa9e-37c398669afa/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/4d42f509-2d25-4f3a-a210-d04c5305faef/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/1d464554-4848-4a82-87e9-69d368a73f6f/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/ed24b94b-a810-4e76-9e7d-0b69ddce76ff/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/bc4a2cb4-bf5b-4622-8a69-eb994f72ae49/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/d5bc97c8-ecef-473c-93fa-08bad72f850f/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/0b194e62-155f-481f-8b34-05cb9e68af3e/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/208f5ad1-02c5-40d5-ab1c-68cbda8e3217/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/d3a37911-9c9a-4c26-bbbe-637962d58af5/640.png)
![图片](https://image.jiqizhixin.com/uploads/editor/ca8858a3-fdb9-41af-bcb6-047f30cc435c/640.png)
with a weighting parameter balancing the terms. Note that the above loss is used only to optimize the additionally introduced clustering network and does not modify CLIP's pretrained text and image encoders, so the overall training overhead is small: experiments show that the proposed method needs only about one minute to train on the 60,000 images of CIFAR-10.
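The "frozen encoders, lightweight head" setup can be sketched as follows: the features stay fixed throughout training, and gradient updates touch only a small linear clustering head. The random features and the toy self-training objective below are placeholders for CLIP features and the paper's actual loss:

```python
import numpy as np

rng = np.random.default_rng(2)
n, dim, K, lr = 64, 16, 4, 0.5

# Frozen features: in the actual method these come from CLIP's pretrained
# encoders, which are never updated; random vectors stand in for them here.
features = rng.normal(size=(n, dim))

# The only trainable parameters: a small linear clustering head.
W = rng.normal(scale=0.1, size=(dim, K))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

init_conf = softmax(features @ W).max(axis=1).mean()

for _ in range(100):
    P = softmax(features @ W)             # soft cluster assignments (n x K)
    # Toy self-training objective (a stand-in for the paper's actual loss):
    # pull each assignment toward its own argmax, updating only W.
    target = np.eye(K)[P.argmax(axis=1)]
    grad = features.T @ (P - target) / n  # gradient w.r.t. the head only
    W -= lr * grad

final_conf = softmax(features @ W).max(axis=1).mean()
print(init_conf, final_conf)
```

Because only the head's `dim × K` parameters receive gradients, each training step is a single matrix multiply plus a softmax, which is consistent with the reported minute-scale training time.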
- How to select appropriate external knowledge;
- How to effectively integrate external knowledge to assist clustering.