Auto Byte



Science AI



ArXiv Weekly Radiostation:本周NLP、CV、ML精选论文30篇(12.29-1.4)

机器之心联合由楚航、罗若天发起的ArXiv Weekly Radiostation,精选每周NLP、CV、ML领域各10篇重要论文,本周详情如下:

10 NLP Papers You May Want to Read


Why you may want to read this: Newest paper from Jason Weston (Facebook).

All-in-One Image-Grounded Conversational Agents. 

Da Ju, Kurt Shuster, Y-Lan Boureau, Jason Weston

As single-task accuracy on individual language and image tasks has improved substantially in the last few years, the long-term goal of a generally skilled agent that can both see and talk becomes more feasible to explore. In this work, we focus on leveraging existing individual language and image tasks, along with resources that incorporate both vision and language towards that objective. We explore architectures that combine state-of-the-art Transformer and ResNeXt modules fed into a multimodal module to produce a combined model trained on many tasks. We provide a thorough analysis of the components of the model, and transfer performance when training on one, some, or all of the tasks. Our final models provide a single system that obtains good results on all vision and language tasks considered, and improves the state of the art in image-grounded conversational applications.


Why you may want to read this: Newest paper from Katia Sycara (Professor School of Computer Science, Carnegie Mellon University).

Simultaneous Identification of Tweet Purpose and Position. 

Rahul Radhakrishnan Iyer, Yulong Pei, Katia Sycara

Tweet classification has attracted considerable attention recently. Most of the existing work on tweet classification focuses on topic classification, which classifies tweets into several predefined categories, and sentiment classification, which classifies tweets into positive, negative and neutral. Since tweets are different from conventional text in that they generally are of limited length and contain informal, irregular or new words, so it is difficult to determine user intention to publish a tweet and user attitude towards certain topic. In this paper, we aim to simultaneously classify tweet purpose, i.e., the intention for user to publish a tweet, and position, i.e., supporting, opposing or being neutral to a given topic. By transforming this problem to a multi-label classification problem, a multi-label classification method with post-processing is proposed. Experiments on real-world data sets demonstrate the effectiveness of this method and the results outperform the individual classification methods.


Why you may want to read this: Newest paper from Gary D. Bader (Professor of Molecular Genetics and Computer Science, The Donnelly Centre, University of …).

End-to-end Named Entity Recognition and Relation Extraction using Pre-trained Language Models. 

John Giorgi, Xindi Wang, Nicola Sahar, Won Young Shin, Gary D. Bader, Bo Wang

Named entity recognition (NER) and relation extraction (RE) are two important tasks in information extraction and retrieval (IE \& IR). Recent work has demonstrated that it is beneficial to learn these tasks jointly, which avoids the propagation of error inherent in pipeline-based systems and improves performance. However, state-of-the-art joint models typically rely on external natural language processing (NLP) tools, such as dependency parsers, limiting their usefulness to domains (e.g. news) where those tools perform well. The few neural, end-to-end models that have been proposed are trained almost completely from scratch. In this paper, we propose a neural, end-to-end model for jointly extracting entities and their relations which does not rely on external NLP tools and which integrates a large, pre-trained language model. Because the bulk of our model's parameters are pre-trained and we eschew recurrence for self-attention, our model is fast to train. On 5 datasets across 3 domains, our model matches or exceeds state-of-the-art performance, sometimes by a large margin.


Why you may want to read this: Newest paper from Pawan Kumar (Assistant Professor).

Deep Attentive Ranking Networks for Learning to Order Sentences. 

Pawan Kumar, Dhanajit Brahma, Harish Karnick, Piyush Rai

We present an attention-based ranking framework for learning to order sentences given a paragraph. Our framework is built on a bidirectional sentence encoder and a self-attention based transformer network to obtain an input order invariant representation of paragraphs. Moreover, it allows seamless training using a variety of ranking based loss functions, such as pointwise, pairwise, and listwise ranking. We apply our framework on two tasks: Sentence Ordering and Order Discrimination. Our framework outperforms various state-of-the-art methods on these tasks on a variety of evaluation metrics. We also show that it achieves better results when using pairwise and listwise ranking losses, rather than the pointwise ranking loss, which suggests that incorporating relative positions of two or more sentences in the loss function contributes to better learning.


Why you may want to read this: Newest paper from Tie-Yan Liu (Assistant Managing Director, Microsoft Research Asia; IEEE Fellow, ACM Distinguished …).

A Study of Multilingual Neural Machine Translation. 

Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu

Multilingual neural machine translation (NMT) has recently been investigated from different aspects (e.g., pivot translation, zero-shot translation, fine-tuning, or training from scratch) and in different settings (e.g., rich resource and low resource, one-to-many, and many-to-one translation). This paper concentrates on a deep understanding of multilingual NMT and conducts a comprehensive study on a multilingual dataset with more than 20 languages. Our results show that (1) low-resource language pairs benefit much from multilingual training, while rich-resource language pairs may get hurt under limited model capacity and training with similar languages benefits more than dissimilar languages; (2) fine-tuning performs better than training from scratch in the one-to-many setting while training from scratch performs better in the many-to-one setting; (3) the bottom layers of the encoder and top layers of the decoder capture more language-specific information, and just fine-tuning these parts can achieve good accuracy for low-resource language pairs; (4) direct translation is better than pivot translation when the source language is similar to the target language (e.g., in the same language branch), even when the size of direct training data is much smaller; (5) given a fixed training data budget, it is better to introduce more languages into multilingual training for zero-shot translation.


Why you may want to read this: Newest paper from Kenneth Loparo (Case Western Reserve University).

Knowledge-guided Text Structuring in Clinical Trials. 

Yingcheng Sun, Kenneth Loparo

Clinical trial records are variable resources or the analysis of patients and diseases. Information extraction from free text such as eligibility criteria and summary of results and conclusions in clinical trials would better support computer-based eligibility query formulation and electronic patient screening. Previous research has focused on extracting information from eligibility criteria, with usually a single pair of medical entity and attribute, but seldom considering other kinds of free text with multiple entities, attributes and relations that are more complex for parsing. In this paper, we propose a knowledge-guided text structuring framework with an automatically generated knowledge base as training corpus and word dependency relations as context information to transfer free text into formal, computer-interpretable representations. Experimental results show that our method can achieve overall high precision and recall, demonstrating the effectiveness and efficiency of the proposed method.


Why you may want to read this: Newest paper from Abhishek Kumar Singh (Masters student at IIIT Hyderabad).

Unity in Diversity: Learning Distributed Heterogeneous Sentence Representation for Extractive Summarization. 

Abhishek Kumar Singh, Manish Gupta, Vasudeva Varma

Automated multi-document extractive text summarization is a widely studied research problem in the field of natural language understanding. Such extractive mechanisms compute in some form the worthiness of a sentence to be included into the summary. While the conventional approaches rely on human crafted document-independent features to generate a summary, we develop a data-driven novel summary system called HNet, which exploits the various semantic and compositional aspects latent in a sentence to capture document independent features. The network learns sentence representation in a way that, salient sentences are closer in the vector space than non-salient sentences. This semantic and compositional feature vector is then concatenated with the document-dependent features for sentence ranking. Experiments on the DUC benchmark datasets (DUC-2001, DUC-2002 and DUC-2004) indicate that our model shows significant performance gain of around 1.5-2 points in terms of ROUGE score compared with the state-of-the-art baselines.


Why you may want to read this: Newest paper from Abhishek Kumar Singh (Masters student at IIIT Hyderabad).

Hybrid MemNet for Extractive Summarization. 

Abhishek Kumar Singh, Manish Gupta, Vasudeva Varma

Extractive text summarization has been an extensive research problem in the field of natural language understanding. While the conventional approaches rely mostly on manually compiled features to generate the summary, few attempts have been made in developing data-driven systems for extractive summarization. To this end, we present a fully data-driven end-to-end deep network which we call as Hybrid MemNet for single document summarization task. The network learns the continuous unified representation of a document before generating its summary. It jointly captures local and global sentential information along with the notion of summary worthy sentences. Experimental results on two different corpora confirm that our model shows significant performance gains compared with the state-of-the-art baselines.


Why you may want to read this: Newest paper from Sanja Fidler (University of Toronto, NVIDIA).

The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries. 

Atef Chaudhury, Makarand Tapaswi, Seung Wook Kim, Sanja Fidler

Understanding stories is a challenging reading comprehension problem for machines as it requires reading a large volume of text and following long-range dependencies. In this paper, we introduce the Shmoop Corpus: a dataset of 231 stories that are paired with detailed multi-paragraph summaries for each individual chapter (7,234 chapters), where the summary is chronologically aligned with respect to the story chapter. From the corpus, we construct a set of common NLP tasks, including Cloze-form question answering and a simplified form of abstractive summarization, as benchmarks for reading comprehension on stories. We then show that the chronological alignment provides a strong supervisory signal that learning-based methods can exploit leading to significant improvements on these tasks. We believe that the unique structure of this corpus provides an important foothold towards making machine story comprehension more approachable.


Why you may want to read this: Newest paper from Yoav Goldberg (Professor, Bar Ilan University).

oLMpics -- On what Language Model Pre-training Captures.

Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant

Recent success of pre-trained language models (LMs) has spurred widespread interest in the language capabilities that they possess. However, efforts to understand whether LM representations are useful for symbolic reasoning tasks have been limited and scattered. In this work, we propose eight reasoning tasks, which conceptually require operations such as comparison, conjunction, and composition. A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data. To address this, we propose an evaluation protocol that includes both zero-shot evaluation (no fine-tuning), as well as comparing the learning curve of a fine-tuned LM to the learning curve of multiple controls, which paints a rich picture of the LM capabilities. Our main findings are that: (a) different LMs exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely; (b) LMs do not reason in an abstract manner and are context-dependent, e.g., while RoBERTa can compare ages, it can do so only when the ages are in the typical range of human ages; (c) On half of our reasoning tasks all models fail completely. Our findings and infrastructure can help future work on designing new datasets, models and objective functions for pre-training.

10 CV Papers You May Want to Read


Why you may want to read this: Newest paper from Pietro Perona (California Institute of Technology).

HMM-guided frame querying for bandwidth-constrained video search. 

Bhairav Chidambaram, Mason McGill, Pietro Perona

We design an agent to search for frames of interest in video stored on a remote server, under bandwidth constraints. Using a convolutional neural network to score individual frames and a hidden Markov model to propagate predictions across frames, our agent accurately identifies temporal regions of interest based on sparse, strategically sampled frames. On a subset of the ImageNet-VID dataset, we demonstrate that using a hidden Markov model to interpolate between frame scores allows requests of 98% of frames to be omitted, without compromising frame-of-interest classification accuracy.


Why you may want to read this: Newest paper from Leonidas Guibas (Professor of Computer Science, Stanford University).

Category-Level Articulated Object Pose Estimation. 

Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song

This paper addresses the task of category-level pose estimation for articulated objects from a single depth image. We present a novel category-level approach that correctly accommodates object instances not previously seen during training. A key aspect of the work is the new Articulation-Aware Normalized Coordinate Space Hierarchy (A-NCSH), which represents the different articulated objects for a given object category. This approach not only provides the canonical representation of each rigid part, but also normalizes the joint parameters and joint states. We developed a deep network based on PointNet++ that is capable of predicting an A-NCSH representation for unseen object instances from single depth input. The predicted A-NCSH representation is then used for global pose optimization using kinematic constraints. We demonstrate that constraints associated with joints in the kinematic chain lead to improved performance in estimating pose and relative scale for each part of the object. We also demonstrate that the approach can tolerate cases of severe occlusion in the observed data. Project webpage


Why you may want to read this: Newest paper from Larry S. Davis (Professor of Computer Science, University of Maryland).

Recognizing Instagram Filtered Images with Feature De-stylization. 

Zhe Wu, Zuxuan Wu, Bharat Singh, Larry S. Davis

Deep neural networks have been shown to suffer from poor generalization when small perturbations are added (like Gaussian noise), yet little work has been done to evaluate their robustness to more natural image transformations like photo filters. This paper presents a study on how popular pretrained models are affected by commonly used Instagram filters. To this end, we introduce ImageNet-Instagram, a filtered version of ImageNet, where 20 popular Instagram filters are applied to each image in ImageNet. Our analysis suggests that simple structure preserving filters which only alter the global appearance of an image can lead to large differences in the convolutional feature space. To improve generalization, we introduce a lightweight de-stylization module that predicts parameters used for scaling and shifting feature maps to "undo" the changes incurred by filters, inverting the process of style transfer tasks. We further demonstrate the module can be readily plugged into modern CNN architectures together with skip connections. We conduct extensive studies on ImageNet-Instagram, and show quantitatively and qualitatively, that the proposed module, among other things, can effectively improve generalization by simply learning normalization parameters without retraining the entire network, thus recovering the alterations in the feature space caused by the filters.


Why you may want to read this: Newest paper from Ming-Hsuan Yang (University of California at Merced), Shuicheng Yan (Yitu Tech, CTO;  National University of Singapore).

RC-DARTS: Resource Constrained Differentiable Architecture Search. 

Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng

Recent advances show that Neural Architectural Search (NAS) method is able to find state-of-the-art image classification deep architectures. In this paper, we consider the one-shot NAS problem for resource constrained applications. This problem is of great interest because it is critical to choose different architectures according to task complexity when the resource is constrained. Previous techniques are either too slow for one-shot learning or does not take the resource constraint into consideration. In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy. Specifically, we propose to formulate the RC-DARTS task as a constrained optimization problem by adding the resource constraint. An iterative projection method is proposed to solve the given constrained optimization problem. We also propose a multi-level search strategy to enable layers at different depths to adaptively learn different types of neural architectures. Through extensive experiments on the Cifar10 and ImageNet datasets, we show that the RC-DARTS method learns lightweight neural architectures which have smaller model size and lower computational complexity while achieving comparable or better performances than the state-of-the-art methods.


Why you may want to read this: Newest paper from Yi Yang (University of Technology Sydney), Shuicheng Yan (Yitu Tech, CTO;  National University of Singapore).

Very Long Natural Scenery Image Prediction by Outpainting. 

Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

Comparing to image inpainting, image outpainting receives less attention due to two challenges in it. The first challenge is how to keep the spatial and content consistency between generated images and original input. The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input. To solve the two problems, we devise some innovative modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our designed encoder-decoder structure. By this design, our network can generate highly realistic outpainting prediction effectively and efficiently. Other than that, our method can generate new images with very long sizes while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collect a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset have demonstrated the efficacy of our proposed network. The code and dataset are available from


Why you may want to read this: Newest paper from Shuicheng Yan (Yitu Tech, CTO;  National University of Singapore).

Asymmetric GAN for Unpaired Image-to-image Translation. 

Yu Li, Sheng Tang, Rui Zhang, Yongdong Zhang, Jintao Li, Shuicheng Yan

Unpaired image-to-image translation problem aims to model the mapping from one domain to another with unpaired training data. Current works like the well-acknowledged Cycle GAN provide a general solution for any two domains through modeling injective mappings with a symmetric structure. While in situations where two domains are asymmetric in complexity, i.e., the amount of information between two domains is different, these approaches pose problems of poor generation quality, mapping ambiguity, and model sensitivity. To address these issues, we propose Asymmetric GAN (AsymGAN) to adapt the asymmetric domains by introducing an auxiliary variable (aux) to learn the extra information for transferring from the information-poor domain to the information-rich domain, which improves the performance of state-of-the-art approaches in the following ways. First, aux better balances the information between two domains which benefits the quality of generation. Second, the imbalance of information commonly leads to mapping ambiguity, where we are able to model one-to-many mappings by tuning aux, and furthermore, our aux is controllable. Third, the training of Cycle GAN can easily make the generator pair sensitive to small disturbances and variations while our model decouples the ill-conditioned relevance of generators by injecting aux during training. We verify the effectiveness of our proposed method both qualitatively and quantitatively on asymmetric situation, label-photo task, on Cityscapes and Helen datasets, and show many applications of asymmetric image translations. In conclusion, our AsymGAN provides a better solution for unpaired image-to-image translation in asymmetric domains.


Why you may want to read this: Newest paper from Yang Wang (Research Scientist, Siemens Corporate Research), Shuicheng Yan (Yitu Tech, CTO;  National University of Singapore), Meng Wang (Hefei University of Technology).

Convolutional Dictionary Pair Learning Network for Image Representation Learning. 

Zhao Zhang, Yulin Sun, Yang Wang, Zhengjun Zha, Shuicheng Yan, Meng Wang

Both Convolutional Neural Networks (CNN) and Dictionary Learning (DL) are powerful image representation learning sys-tems based on different mechanisms and principles, so whether we can integrate them to improve the performance is notewor-thy exploring. To address this issue, we propose a novel general-ized end-to-end representation learning architecture, dubbed Convolutional Dictionary Pair Learning Network (CDPL-Net) in this paper, which seamlessly integrates the learning schemes of CNN and dictionary pair learning into a unified framework. Generally, the architecture of CDPL-Net includes two convolu-tional/pooling layers and two dictionary pair learning (DPL) layers in the representation learning module. Besides, it uses two fully-connected layers as the multi-layer perception layer in the nonlinear classification module. In particular, the DPL layer can jointly formulate the discriminative synthesis and analysis representations driven by minimizing the batch based recon-struction error over the flatted feature maps from the convolu-tion/pooling layer. Moreover, DPL layer uses the l1-norm on the analysis dictionary so that sparse representation can be delivered, and the embedding process will also be robust to noise. To speed up the training process of DPL layer, the efficient stochastic gradient descent is used. Extensive simulations on public data-bases show that our CDPL-Net can deliver enhanced perfor-mance over other state-of-the-art methods.


Why you may want to read this: Newest paper from Ming-Hsuan Yang (University of California at Merced).

Controllable and Progressive Image Extrapolation. 

Yijun Li, Lu Jiang, Ming-Hsuan Yang

Image extrapolation aims at expanding the narrow field of view of a given image patch. Existing models mainly deal with natural scene images of homogeneous regions and have no control of the content generation process. In this work, we study conditional image extrapolation to synthesize new images guided by the input structured text. The text is represented as a graph to specify the objects and their spatial relation to the unknown regions of the image. Inspired by drawing techniques, we propose a progressive generative model of three stages, i.e., generating a coarse bounding-boxes layout, refining it to a finer segmentation layout, and mapping the layout to a realistic output. Such a multi-stage design is shown to facilitate the training process and generate more controllable results. We validate the effectiveness of the proposed method on the face and human clothing dataset in terms of visual results, quantitative evaluations and flexible controls.


Why you may want to read this: Newest paper from Samuel Madden (MIT).

RoadTagger: Robust Road Attribute Inference with Graph Neural Networks. 

Songtao He, Favyen Bastani, Satvat Jagwani, Edward Park, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Samuel Madden, Mohammad Amin Sadeghi

Inferring road attributes such as lane count and road type from satellite imagery is challenging. Often, due to the occlusion in satellite imagery and the spatial correlation of road attributes, a road attribute at one position on a road may only be apparent when considering far-away segments of the road. Thus, to robustly infer road attributes, the model must integrate scattered information and capture the spatial correlation of features along roads. Existing solutions that rely on image classifiers fail to capture this correlation, resulting in poor accuracy. We find this failure is caused by a fundamental limitation -- the limited effective receptive field of image classifiers. To overcome this limitation, we propose RoadTagger, an end-to-end architecture which combines both Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) to infer road attributes. The usage of graph neural networks allows information propagation on the road network graph and eliminates the receptive field limitation of image classifiers. We evaluate RoadTagger on both a large real-world dataset covering 688 km^2 area in 20 U.S. cities and a synthesized micro-dataset. In the evaluation, RoadTagger improves inference accuracy over the CNN image classifier based approaches. RoadTagger also demonstrates strong robustness against different disruptions in the satellite imagery and the ability to learn complicated inductive rules for aggregating scattered information along the road network.


Why you may want to read this: Newest paper from Joshua B. Tenenbaum (MIT).

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation. 

Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

A crucial aspect of mobile intelligent agents is their ability to integrate the evidence from multiple sensory inputs in an environment and plan a sequence of actions to achieve their goals. In this paper, we attempt to address the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data. To accomplish this task, the agent is required to learn from various modalities, i.e. relating the audio signal to the visual environment. Here we describe an approach to the audio-visual embodied navigation that can take advantage of both visual and audio pieces of evidence. Our solution is based on three key ideas: a visual perception mapper module that can construct its spatial memory of the environment, a sound perception module that infers the relative location of the sound source from the agent, and a dynamic path planner that plans a sequence of actions based on the visual-audio observations and the spatial memory of the environment, and then navigates towards the goal. Experimental results on a newly collected Visual-Audio-Room dataset using the simulated multi-modal environment demonstrate the effectiveness of our approach over several competitive baselines.

10 ML Papers You May Want to Read


Why you may want to read this: Newest paper from Leonidas Guibas (Professor of Computer Science, Stanford University), Jitendra Malik (Professor of EECS, UC Berkeley).

Side-Tuning: Network Adaptation via Additive Side Networks. 

Jeffrey O Zhang, Alexander Sax, Amir Zamir, Leonidas Guibas, Jitendra Malik

When training a neural network for a desired task, one may prefer to adapt a pre-trained network rather than start with a randomly initialized one -- due to lacking enough training data, performing lifelong learning where the system has to learn a new task while being previously trained for other tasks, or wishing to encode priors in the network via preset weights. The most commonly employed approaches for network adaptation are fine-tuning and using the pre-trained network as a fixed feature extractor, among others.

In this paper, we propose a straightforward alternative: Side-Tuning. Side-tuning adapts a pre-trained network by training a lightweight "side" network that is fused with the (unchanged) pre-trained network using summation. This simple method works as well as or better than existing solutions while it resolves some of the basic issues with fine-tuning, fixed features, and several other common baselines. In particular, side-tuning is less prone to overfitting when little training data is available, yields better results than using a fixed feature extractor, and does not suffer from catastrophic forgetting in lifelong learning. We demonstrate the performance of side-tuning under a diverse set of scenarios, including lifelong learning (iCIFAR, Taskonomy), reinforcement learning, imitation learning (visual navigation in Habitat), NLP question-answering (SQuAD v2), and single-task transfer learning (Taskonomy), with consistently promising results.


Why you may want to read this: Newest paper from Philip S. Yu (Professor of Computer Science, University of Illinons at Chicago).

Leveraging Semi-Supervised Learning for Fairness using Neural Networks. 

Vahid Noroozi, Sara Bahaadini, Samira Sheikhi, Nooshin Mojab, Philip S. Yu

There has been a growing concern about the fairness of decision-making systems based on machine learning. The shortage of labeled data has been always a challenging problem facing machine learning based systems. In such scenarios, semi-supervised learning has shown to be an effective way of exploiting unlabeled data to improve upon the performance of model. Notably, unlabeled data do not contain label information which itself can be a significant source of bias in training machine learning systems. This inspired us to tackle the challenge of fairness by formulating the problem in a semi-supervised framework. In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data to not just improve the performance but also improve the fairness of the decision-making process. The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.


Why you may want to read this: Newest paper from Philip S. Yu (Professor of Computer Science, University of Illinons at Chicago).

Deep Graph Similarity Learning: A Survey. 

Guixiang Ma, Nesreen K. Ahmed, Theodore L. Willke, Philip S. Yu

In many domains where data are represented as graphs, learning a similarity metric among graphs is considered a key problem, which can further facilitate various learning tasks, such as classification, clustering, and similarity search. Recently, there has been an increasing interest in deep graph similarity learning, where the key idea is to learn a deep learning model that maps input graphs to a target space such that the distance in the target space approximates the structural distance in the input space. Here, we provide a comprehensive review of the existing literature of deep graph similarity learning. We propose a systematic taxonomy for the methods and applications. Finally, we discuss the challenges and future directions for this problem.


Why you may want to read this: Newest paper from Bernhard Pfahringer (Professor of Computer Science, University of Waikato), Eibe Frank (Professor, Department of Computer Science, University of Waikato).

Classifier Chains: A Review and Perspectives. 

Jesse Read, Bernhard Pfahringer, Geoff Holmes, Eibe Frank

The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves linking together off-the-shelf binary classifiers in a chain structure, such that class label predictions become features for other classifiers. Such methods have proved flexible and effective and have obtained state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance led to further studies of how exactly it works, and how it could be improved, and in the recent decade numerous studies have explored classifier chains mechanisms on a theoretical level, and many improvements have been made to the training and inference procedures, such that this method remains among the state-of-the-art options for multi-label learning. Given this past and ongoing interest, which covers a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions provided in the literature, as well as perspectives for this approach in the domain of multi-label classification in the future. We conclude positively, with a number of recommendations for researchers and practitioners, as well as outlining a number of areas for future research.


Why you may want to read this: Newest paper from Georgios B. Giannakis (Endowed Chair Prof., Dept. of ECE and DTC, University of Minnesota).

Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks. 

Zhaoxian Wu, Qing Ling, Tianyi Chen, Georgios B. Giannakis

This paper deals with distributed finite-sum optimization for learning over networks in the presence of malicious Byzantine attacks. To cope with such attacks, most resilient approaches so far combine stochastic gradient descent (SGD) with different robust aggregation rules. However, the sizeable SGD-induced stochastic gradient noise makes it challenging to distinguish malicious messages sent by the Byzantine attackers from noisy stochastic gradients sent by the 'honest' workers. This motivates us to reduce the variance of stochastic gradients as a means of robustifying SGD in the presence of Byzantine attacks. To this end, the present work puts forth a Byzantine attack resilient distributed (Byrd-) SAGA approach for learning tasks involving finite-sum optimization over networks. Rather than the mean employed by distributed SAGA, the novel Byrd- SAGA relies on the geometric median to aggregate the corrected stochastic gradients sent by the workers. When less than half of the workers are Byzantine attackers, the robustness of geometric median to outliers enables Byrd-SAGA to attain provably linear convergence to a neighborhood of the optimal solution, with the asymptotic learning error determined by the number of Byzantine workers. Numerical tests corroborate the robustness to various Byzantine attacks, as well as the merits of Byrd- SAGA over Byzantine attack resilient distributed SGD.


Why you may want to read this: Newest paper from Andrew McCallum (Distinguished Professor of Computer Science, University of Massachusetts Amherst).

Scalable Hierarchical Clustering with Tree Grafting. 

Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael Glass, Andrew McCallum

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.


Why you may want to read this: Newest paper from Leonidas Guibas (Professor of Computer Science, Stanford University).

Quaternion Equivariant Capsule Networks for 3D Point Clouds. 

Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, Federico Tombari

We present a 3D capsule architecture for processing of point clouds that is equivariant with respect to the SO(3) rotation group, translation and permutation of the unordered input sets. The network operates on a sparse set of local reference frames, computed from an input point cloud and establishes end-to-end equivariance through a novel 3D quaternion group capsule layer, including an equivariant dynamic routing procedure. The capsule layer enables us to disentangle geometry from pose, paving the way for more informative descriptions and a structured latent space. In the process, we theoretically connect the process of dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving \emph{iterative re-weighted least squares (IRLS)} problems with provable convergence properties, enabling robust pose estimation between capsule layers. Due to the sparse equivariant quaternion capsules, our architecture allows joint object classification and orientation estimation, which we validate empirically on common benchmark datasets.


Why you may want to read this: Newest paper from Yang Wang (Research Scientist, Siemens Corporate Research), Shuicheng Yan (Yitu Tech, CTO;  National University of Singapore), Meng Wang (Hefei University of Technology).

Learning Hybrid Representation by Robust Dictionary Learning in Factorized Compressed Space. 

Jiahuan Ren, Zhao Zhang, Sheng Li, Yang Wang, Guangcan Liu, Shuicheng Yan, Meng Wang

In this paper, we investigate the robust dictionary learning (DL) to discover the hybrid salient low-rank and sparse representation in a factorized compressed space. A Joint Robust Factorization and Projective Dictionary Learning (J-RFDL) model is presented. The setting of J-RFDL aims at improving the data representations by enhancing the robustness to outliers and noise in data, encoding the reconstruction error more accurately and obtaining hybrid salient coefficients with accurate reconstruction ability. Specifically, J-RFDL performs the robust representation by DL in a factorized compressed space to eliminate the negative effects of noise and outliers on the results, which can also make the DL process efficient. To make the encoding process robust to noise in data, J-RFDL clearly uses sparse L2, 1-norm that can potentially minimize the factorization and reconstruction errors jointly by forcing rows of the reconstruction errors to be zeros. To deliver salient coefficients with good structures to reconstruct given data well, J-RFDL imposes the joint low-rank and sparse constraints on the embedded coefficients with a synthesis dictionary. Based on the hybrid salient coefficients, we also extend J-RFDL for the joint classification and propose a discriminative J-RFDL model, which can improve the discriminating abilities of learnt coeffi-cients by minimizing the classification error jointly. Extensive experiments on public datasets demonstrate that our formulations can deliver superior performance over other state-of-the-art methods.


Why you may want to read this: Newest paper from Zhihua Zhang (Professor of Computer Science, Shanghai Jiao Tong University), Tong Zhang (HKUST).

Fast Generalized Matrix Regression with Applications in Machine Learning. 

Haishan Ye, Shusen Wang, Zhihua Zhang, Tong Zhang

Fast matrix algorithms have become the fundamental tools of machine learning in big data era.

The generalized matrix regression problem is widely used in the matrix approximation such as CUR decomposition, kernel matrix approximation, and stream singular value decomposition (SVD), etc.

In this paper, we propose a fast generalized matrix regression algorithm (Fast GMR) which utilizes sketching technique to solve the GMR problem efficiently.

Given error parameter 0<\epsilon<1, the Fast GMR algorithm can achieve a (1+\epsilon) relative error with the sketching sizes being of order \cO(\epsilon^{-1/2}) for a large group of GMR problems.

We apply the Fast GMR algorithm to the symmetric positive definite matrix approximation and single pass singular value decomposition and they achieve a better performance than conventional algorithms.

Our empirical study also validates the effectiveness and efficiency of our proposed algorithms.


Why you may want to read this: Newest paper from Pieter Abbeel (UC Berkeley | Covariant.AI).

Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards. 

Xingyu Lu, Stas Tiomkin, Pieter Abbeel

While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge. In this work, we propose an effective reward shaping method through predictive coding to tackle sparse reward problems. By learning predictive representations offline and using these representations for reward shaping, we gain access to reward signals that understand the structure and dynamics of the environment. In particular, our method achieves better learning by providing reward signals that 1) understand environment dynamics 2) emphasize on features most useful for learning 3) resist noise in learned representations through reward accumulation. We demonstrate the usefulness of this approach in different domains ranging from robotic manipulation to navigation, and we show that reward signals produced through predictive coding are as effective for learning as hand-crafted rewards.

ArXiv Weekly Radiostation
ArXiv Weekly Radiostation

Weekly selection and podcast of the latest CV,NLP, ML papers.