ArXiv Weekly Radiostation:本周NLP、CV、ML精选论文30篇(1.5-1.12)

机器之心联合由楚航、罗若天发起的ArXiv Weekly Radiostation,精选每周NLP、CV、ML领域各10篇重要论文,本周详情如下:

10 NLP Papers You May Want to Read


Why you may want to read this: Newest paper from Hongming Zhang (Technology Expert,

Multiplex Word Embeddings for Selectional Preference Acquisition. 

Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu

Conventional word embeddings represent words with fixed vectors, which are usually trained based on co-occurrence patterns among words. In doing so, however, the power of such representations is limited, where the same word might be functionalized separately under different syntactic relations. To address this limitation, one solution is to incorporate relational dependencies of different words into their embeddings. Therefore, in this paper, we propose a multiplex word embedding model, which can be easily extended according to various relations among words. As a result, each word has a center embedding to represent its overall semantics, and several relational embeddings to represent its relational dependencies. Compared to existing models, our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness. Moreover, to accommodate various relations, we use a small dimension for relational embeddings and our model is able to keep their effectiveness. Experiments on selectional preference acquisition and word similarity demonstrate the effectiveness of the proposed model, and a further study of scalability also proves that our embeddings only need 1/20 of the original embedding size to achieve better performance.


Why you may want to read this: Newest paper from Michalis Faloutsos (UC Riverside).

REST: A thread embedding approach for identifying and classifying user-specified information in security forums. 

Joobin Gharibshah, Evangelos E. Papalexakis, Michalis Faloutsos

How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offering of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the "topics" of interest efficiently, and (b) the unstructured and informal nature of the text. We propose, REST, a systematic methodology to: (a) identify threads of interest based on a, possibly incomplete, bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes in appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164k posts and 21K threads. First, REST robustness to initial keyword selection can extend the user-provided keyword set and thus, it can recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3-76.9%. We see our approach as a first step for harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest.


Why you may want to read this: Newest paper from Yang Liu (Computer science, Harbin institute of technology).

A Neural Approach to Discourse Relation Signal Detection. 

Amir Zeldes, Yang Liu

Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result' has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of 'and'), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context ('anti-signals' or 'distractors'). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, {\Delta}s (or 'delta-softmax'), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized words embeddings, the metric represents each word's positive or negative contribution to the identifiability of a relation in specific instances in context. Based on an English corpus annotated for discourse relations using Rhetorical Structure Theory and signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.


Why you may want to read this: Newest paper from Chin-Yew Lin (Principal Research Manager of Knowledge Computing Group, Microsoft Research Asia).

Improving Entity Linking by Modeling Latent Entity Type Information. 

Shuang Chen, Jinpeng Wang, Feng Jiang, Chin-Yew Lin

Existing state of the art neural entity linking models employ attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which causes the models often link mentions to incorrect entities with incorrect type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms the state-of-the-art entity linking models on standard benchmark (AIDA-CoNLL). Detailed experiment analysis demonstrates that our model corrects most of the type errors produced by the direct baseline.


Why you may want to read this: Newest paper from Kai-Wei Chang (Assistant professor, UCLA).

Learning Bilingual Word Embeddings Using Lexical Definitions. 

Weijia Shi, Muhao Chen, Yingtao Tian, Kai-Wei Chang

Bilingual word embeddings, which representlexicons of different languages in a shared em-bedding space, are essential for supporting se-mantic and knowledge transfers in a variety ofcross-lingual NLP tasks. Existing approachesto training bilingual word embeddings requireoften require pre-defined seed lexicons that areexpensive to obtain, or parallel sentences thatcomprise coarse and noisy alignment. In con-trast, we propose BilLex that leverages pub-licly available lexical definitions for bilingualword embedding learning. Without the needof predefined seed lexicons, BilLex comprisesa novel word pairing strategy to automati-cally identify and propagate the precise fine-grained word alignment from lexical defini-tions. We evaluate BilLex in word-level andsentence-level translation tasks, which seek tofind the cross-lingual counterparts of wordsand sentences respectively.BilLex signifi-cantly outperforms previous embedding meth-ods on both tasks.


Why you may want to read this: Newest paper from Iryna Gurevych (Professor of Computer Science, Technische Universität Darmstadt).

Stance Detection Benchmark: How Robust Is Your Stance Detection? 

Benjamin Schiller, Johannes Daxenberger, Iryna Gurevych

Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim and has become a key component in applications like fake news detection, claim validation, and argument search. However, while stance is easily detected by humans, machine learning models are clearly falling short of this task. Given the major differences in dataset sizes and framing of StD (e.g. number of classes and inputs), we introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning (MDL) setting, as well as from related tasks via transfer learning. Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets. Yet, the models still perform well below human capabilities and even simple adversarial attacks severely hurt the performance of MDL models. Deeper investigation into this phenomenon suggests the existence of biases inherited from multiple datasets by design. Our analysis emphasizes the need of focus on robustness and de-biasing strategies in multi-task learning approaches. The benchmark dataset and code is made available.


Why you may want to read this: Newest paper from Zhuang Liu (PhD student at UC Berkeley).

Leveraging Prior Knowledge for Protein-Protein Interaction Extraction with Memory Network. 

Huiwei Zhou, Zhuang Liu, Shixian Ning, Yunlong Yang, Chengkun Lang, Yingyu Lin, Kun Ma

Automatically extracting Protein-Protein Interactions (PPI) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein-protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to a new state-of-the-art performance on the BioCreative VI PPI dataset. The paper also shows that multiple computational layers over an external memory are superior to long short-term memory networks with the local memories.


Why you may want to read this: Newest paper from Marie-Francine Moens (Professor of Computer Science KU Leuven).

Binary and Multitask Classification Model for Dutch Anaphora Resolution: Die/Dat Prediction. 

Liesbeth Allein, Artuur Leeuwenberg, Marie-Francine Moens

The correct use of Dutch pronouns 'die' and 'dat' is a stumbling block for both native and non-native speakers of Dutch due to the multiplicity of syntactic functions and the dependency on the antecedent's gender and number. Drawing on previous research conducted on neural context-dependent dt-mistake correction models (Heyman et al. 2018), this study constructs the first neural network model for Dutch demonstrative and relative pronoun resolution that specifically focuses on the correction and part-of-speech prediction of these two pronouns. Two separate datasets are built with sentences obtained from, respectively, the Dutch Europarl corpus (Koehn 2015) - which contains the proceedings of the European Parliament from 1996 to the present - and the SoNaR corpus (Oostdijk et al. 2013) - which contains Dutch texts from a variety of domains such as newspapers, blogs and legal texts. Firstly, a binary classification model solely predicts the correct 'die' or 'dat'. The classifier with a bidirectional long short-term memory architecture achieves 84.56% accuracy. Secondly, a multitask classification model simultaneously predicts the correct 'die' or 'dat' and its part-of-speech tag. The model containing a combination of a sentence and context encoder with both a bidirectional long short-term memory architecture results in 88.63% accuracy for die/dat prediction and 87.73% accuracy for part-of-speech prediction. More evenly-balanced data, larger word embeddings, an extra bidirectional long short-term memory layer and integrated part-of-speech knowledge positively affects die/dat prediction performance, while a context encoder architecture raises part-of-speech prediction performance. This study shows promising results and can serve as a starting point for future research on machine learning models for Dutch anaphora resolution.


Why you may want to read this: Newest paper from John P. Cunningham (Associate Professor, Columbia University).

Paraphrase Generation with Latent Bag of Words. 

Yao Fu, Yansong Feng, John P. Cunningham

Paraphrase generation is a longstanding important problem in natural language processing.In addition, recent progress in deep generative models has shown promising results on discrete latent variables for text generation.

Inspired by variational autoencoders with discrete latent structures, in this work, we propose a latent bag of words (BOW) model for paraphrase generation.We ground the semantics of a discrete latent variable by the BOW from the target sentences.We use this latent variable to build a fully differentiable content planning and surface realization model.Specifically, we use source words to predict their neighbors and model the target BOW with a mixture of softmax.We use Gumbel top-k reparameterization to perform differentiable subset sampling from the predicted BOW distribution.We retrieve the sampled word embeddings and use them to augment the decoder and guide its generation search space.

Our latent BOW model not only enhances the decoder, but also exhibits clear interpretability.

We show the model interpretability with regard to \emph{(i)} unsupervised learning of word neighbors \emph{(ii)} the step-by-step generation procedure.

Extensive experiments demonstrate the transparent and effective generation process of this model.\footnote{Our code can be found at \url{}}


Why you may want to read this: Newest paper from Asim Karim (Professor, Lahore University of Management Sciences (LUMS)).

Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text. 

Muhammad Haroon Shakeel, Asim Karim

Nowadays, an abundance of short text is being generated that uses nonstandard writing styles influenced by regional languages. Such informal and code-switched content are under-resourced in terms of labeled datasets and language models even for popular tasks like sentiment classification. In this work, we (1) present a labeled dataset called MultiSenti for sentiment classification of code-switched informal short text, (2) explore the feasibility of adapting resources from a resource-rich language for an informal one, and (3) propose a deep learning-based model for sentiment classification of code-switched informal short text. We aim to achieve this without any lexical normalization, language translation, or code-switching indication. The performance of the proposed models is compared with three existing multilingual sentiment classification models. The results show that the proposed model performs better in general and adapting character-based embeddings yield equivalent performance while being computationally more efficient than training word-based domain-specific embeddings.

ArXiv Weekly: 10 CV Papers You May Want to Read


Why you may want to read this: Newest paper from Thomas S. Huang (University of Illinois, Urbana-Champaign).

Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis. 

Mang Tik Chiu, Xingqian Xu, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose, David Wilson, Adrian Tudor, Naira Hovakimyan, Thomas S. Huang, Honghui Shi

The success of deep learning in visual recognition tasks has driven advancements in multiple fields of research. Particularly, increasing attention has been drawn towards its application in agriculture. Nevertheless, while visual pattern recognition on farmlands carries enormous economic values, little progress has been made to merge computer vision and crop sciences due to the lack of suitable agricultural image datasets. Meanwhile, problems in agriculture also pose new challenges in computer vision. For example, semantic segmentation of aerial farmland images requires inference over extremely large-size images with extreme annotation sparsity. These challenges are not present in most of the common object datasets, and we show that they are more challenging than many other aerial image datasets. To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns. We collected 94,986 high-quality aerial images from 3,432 farmlands across the US, where each image consists of RGB and Near-infrared (NIR) channels with resolution as high as 10 cm per pixel. We annotate nine types of field anomaly patterns that are most important to farmers. As a pilot study of aerial agricultural semantic segmentation, we perform comprehensive experiments using popular semantic segmentation models; we also propose an effective model designed for aerial agricultural pattern recognition. Our experiments demonstrate several challenges Agriculture-Vision poses to both the computer vision and agriculture communities. Future versions of this dataset will include even more aerial images, anomaly patterns and image channels. More information at


Why you may want to read this: Newest paper from Shih-Fu Chang (Professor of Electrical Engineering and Computer Science, Columbia University).

Weakly Supervised Visual Semantic Parsing. 

Alireza Zareian, Svebor Karaman, Shih-Fu Chang

Scene Graph Generation (SGG) aims to extract entities, predicates and their intrinsic structure from images, leading to a deep understanding of visual content, with many potential applications such as visual reasoning and image retrieval. Nevertheless, computer vision is still far from a practical solution for this task. Existing SGG methods require millions of manually annotated bounding boxes for scene graph entities in a large set of images. Moreover, they are computationally inefficient, as they exhaustively process all pairs of object proposals to predict their relationships. In this paper, we address those two limitations by first proposing a generalized formulation of SGG, namely Visual Semantic Parsing, which disentangles entity and predicate prediction, and enables sub-quadratic performance. Then we propose the Visual Semantic Parsing Network, \textsc{VSPNet}, based on a novel three-stage message propagation network, as well as a role-driven attention mechanism to route messages efficiently without a quadratic cost. Finally, we propose the first graph-based weakly supervised learning framework based on a novel graph alignment algorithm, which enables training without bounding box annotations. Through extensive experiments on the Visual Genome dataset, we show \textsc{VSPNet} outperforms weakly supervised baselines significantly and approaches fully supervised performance, while being five times faster.


Why you may want to read this: Newest paper from Shih-Fu Chang (Professor of Electrical Engineering and Computer Science, Columbia University).

Bridging Knowledge Graphs to Generate Scene Graphs. 

Alireza Zareian, Svebor Karaman, Shih-Fu Chang

Scene graphs are powerful representations that encode images into their abstract semantic elements, i.e, objects and their interactions, which facilitates visual comprehension and explainable reasoning. On the other hand, commonsense knowledge graphs are rich repositories that encode how the world is structured, and how general concepts interact. In this paper, we present a unified formulation of these two constructs, where a scene graph is seen as an image-conditioned instantiation of a commonsense knowledge graph. Based on this new perspective, we re-formulate scene graph generation as the inference of a bridge between the scene and commonsense graphs, where each entity or predicate instance in the scene graph has to be linked to its corresponding entity or predicate class in the commonsense graph. To this end, we propose a heterogeneous graph inference framework allowing to exploit the rich structure within the scene and commonsense at the same time. Through extensive experiments, we show the proposed method achieves significant improvement over the state of the art.


Why you may want to read this: Newest paper from Shih-Fu Chang (Professor of Electrical Engineering and Computer Science, Columbia University).

General Partial Label Learning via Dual Bipartite Graph Autoencoder. 

Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang

We formulate a practical yet challenging problem: General Partial Label Learning (GPLL). Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level --- a label set partially labels an instance --- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed --- instances in a group may be partially linked to the label set from another group. Such ambiguous group-level supervision is more practical in real-world scenarios as additional annotation on the instance-level is no longer required, e.g., face-naming in videos where the group consists of faces in a frame, labeled by a name set in the corresponding caption. In this paper, we propose a novel graph convolutional network (GCN) called Dual Bipartite Graph Autoencoder (DB-GAE) to tackle the label ambiguity challenge of GPLL. First, we exploit the cross-group correlations to represent the instance groups as dual bipartite graphs: within-group and cross-group, which reciprocally complements each other to resolve the linking ambiguities. Second, we design a GCN autoencoder to encode and decode them, where the decodings are considered as the refined results. It is worth noting that DB-GAE is self-supervised and transductive, as it only uses the group-level supervision without a separate offline training stage. Extensive experiments on two real-world datasets demonstrate that DB-GAE significantly outperforms the best baseline over absolute 0.159 F1-score and 24.8% accuracy. We further offer analysis on various levels of label ambiguities.


Why you may want to read this: Newest paper from Luca Benini (ETH Zürich, Università di Bologna).

RPR: Random Partition Relaxation for Training; Binary and Ternary Weight Neural Networks. 

Lukas Cavigelli, Luca Benini

We present Random Partition Relaxation (RPR), a method for strong quantization of neural networks weight to binary (+1/-1) and ternary (+1/0/-1) values. Starting from a pre-trained model, we quantize the weights and then relax random partitions of them to their continuous values for retraining before re-quantizing them and switching to another weight partition for further adaptation. We demonstrate binary and ternary-weight networks with accuracies beyond the state-of-the-art for GoogLeNet and competitive performance for ResNet-18 and ResNet-50 using an SGD-based training method that can easily be integrated into existing frameworks.


Why you may want to read this: Newest paper from Michael J. Black (Max Planck Institute for Intelligent Systems and Amazon), Konrad Schindler (Professor of Photogrammetry and Remote Sensing, ETH Zurich).

Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations. 

Nadine Rueegg, Christoph Lassner, Michael J. Black, Konrad Schindler

The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems.


Why you may want to read this: Newest paper from Philip H. S. Torr (Professor, University of Oxford).

Few-shot Learning with Multi-scale Self-supervision. 

Hongguang Zhang, Philip H. S. Torr, Piotr Koniusz

Learning concepts from the limited number of datapoints is a challenging task usually addressed by the so-called one- or few-shot learning. Recently, an application of second-order pooling in few-shot learning demonstrated its superior performance due to the aggregation step handling varying image resolutions without the need of modifying CNNs to fit to specific image sizes, yet capturing highly descriptive co-occurrences. However, using a single resolution per image (even if the resolution varies across a dataset) is suboptimal as the importance of image contents varies across the coarse-to-fine levels depending on the object and its class label e. g., generic objects and scenes rely on their global appearance while fine-grained objects rely more on their localized texture patterns. Multi-scale representations are popular in image deblurring, super-resolution and image recognition but they have not been investigated in few-shot learning due to its relational nature complicating the use of standard techniques. In this paper, we propose a novel multi-scale relation network based on the properties of second-order pooling to estimate image relations in few-shot setting. To optimize the model, we leverage a scale selector to re-weight scale-wise representations based on their second-order features. Furthermore, we propose to a apply self-supervised scale prediction. Specifically, we leverage an extra discriminator to predict the scale labels and the scale discrepancy between pairs of images. Our model achieves state-of-the-art results on standard few-shot learning datasets.


Why you may want to read this: Newest paper from Krystian Mikolajczyk (Associate Professor, Imperial College London).

Compression of convolutional neural networks for high performance imagematching tasks on mobile devices. 

Roy Miles, Krystian Mikolajczyk

Deep neural networks have demonstrated state-of-the-art performance for feature-based image matching through the advent of new large and diverse datasets. However, there has been little work on evaluating the computational cost, model size, and matching accuracy tradeoffs for these models. This paper explicitly addresses these practical constraints by considering state-of-the-art L2Net architecture. We observe a significant redundancy in the L2Net architecture, which we exploit through the use of depthwise separable layers and an efficient Tucker decomposition. We demonstrate that a combination of these methods is more effective, but still sacrifices the top-end accuracy. We therefore propose the Convolution-Depthwise-Pointwise (CDP) layer, which provides a means of interpolating between the standard and depthwise separable convolutions. With this proposed layer, we are able to achieve up to 8 times reduction in the number of parameters on the L2Net architecture, 13 times reduction in the computational complexity, while sacrificing less than 1% on the overall accuracy across the HPatches benchmarks. To further demonstrate the generalisation of this approach, we apply it to the SuperPoint model. We show that CDP layers improve upon the accuracy while using significantly less parameters and floating-point operations for inference.


Why you may want to read this: Newest paper from Yang Gao (Beijing Institute of Technology).

MW-GAN: Multi-Warping GAN for Caricature Generation with Multi-Style Geometric Exaggeration. 

Haodi Hou, Jing Huo, Jing Wu, Yu-Kun Lai, Yang Gao

Given an input face photo, the goal of caricature generation is to produce stylized, exaggerated caricatures that share the same identity as the photo. It requires simultaneous style transfer and shape exaggeration with rich diversity, and meanwhile preserving the identity of the input. To address this challenging problem, we propose a novel framework called Multi-Warping GAN (MW-GAN), including a style network and a geometric network that are designed to conduct style transfer and geometric exaggeration respectively. We bridge the gap between the style and landmarks of an image with corresponding latent code spaces by a dual way design, so as to generate caricatures with arbitrary styles and geometric exaggeration, which can be specified either through random sampling of latent code or from a given caricature sample. Besides, we apply identity preserving loss to both image space and landmark space, leading to a great improvement in quality of generated caricatures. Experiments show that caricatures generated by MW-GAN have better quality than existing methods.


Why you may want to read this: Newest paper from Zheng Wang (Lakehead University).

FrequentNet : A New Deep Learning Baseline for Image Classification. 

Yifei Li, Zheng Wang, Kuangyan Song, Yiming Sun

In this paper, we generalize the idea from the method called "PCANet" to achieve a new baseline deep learning model for image classification. Instead of using principal component vectors as the filter vector in "PCANet", we use basis vectors in discrete Fourier analysis and wavelets analysis as our filter vectors. Both of them achieve comparable performance to "PCANet" in benchmark datasets. It is noticeable that our algorithms do not require any optimization techniques to get those basis.

ArXiv Weekly: 10 ML Papers You May Want to Read


Why you may want to read this: Newest paper from Vipin Kumar (University of Minnesota).

Semi-supervised Classification using Attention-based Regularization on Coarse-resolution Data. 

Guruprasad Nayak, Rahul Ghosh, Xiaowei Jia, Varun Mithal, Vipin Kumar

Many real-world phenomena are observed at multiple resolutions. Predictive models designed to predict these phenomena typically consider different resolutions separately. This approach might be limiting in applications where predictions are desired at fine resolutions but available training data is scarce. In this paper, we propose classification algorithms that leverage supervision from coarser resolutions to help train models on finer resolutions. The different resolutions are modeled as different views of the data in a multi-view framework that exploits the complementarity of features across different views to improve models on both views. Unlike traditional multi-view learning problems, the key challenge in our case is that there is no one-to-one correspondence between instances across different views in our case, which requires explicit modeling of the correspondence of instances across resolutions. We propose to use the features of instances at different resolutions to learn the correspondence between instances across resolutions using an attention mechanism.Experiments on the real-world application of mapping urban areas using satellite observations and sentiment classification on text data show the effectiveness of the proposed methods.


Why you may want to read this: Newest paper from Ruslan Salakhutdinov (Associate Professor, Machine Learning Department, CMU), Louis-Philippe Morency (Associate professor,  Carnegie Mellon University).

Think Locally, Act Globally: Federated Learning with Local and Global Representations. 

Paul Pu Liang, Terrance Liu, Liu Ziyin, Ruslan Salakhutdinov, Louis-Philippe Morency

Federated learning is an emerging research paradigm to train models on private data distributed over multiple devices. A key challenge involves keeping private all the data on each device and training a global model only by communicating parameters and updates. Overcoming this problem relies on the global model being sufficiently compact so that the parameters can be efficiently sent over communication channels such as wireless internet. Given the recent trend towards building deeper and larger neural networks, deploying such models in federated settings on real-world tasks is becoming increasingly difficult. To this end, we propose to augment federated learning with local representation learning on each device to learn useful and compact features from raw data. As a result, the global model can be smaller since it only operates on higher-level local representations. We show that our proposed method achieves superior or competitive results when compared to traditional federated approaches on a suite of publicly available real-world datasets spanning image recognition (MNIST, CIFAR) and multimodal learning (VQA). Our choice of local representation learning also reduces the number of parameters and updates that need to be communicated to and from the global model, thereby reducing the bottleneck in terms of communication cost. Finally, we show that our local models provide flexibility in dealing with online heterogeneous data and can be easily modified to learn fair representations that obfuscate protected attributes such as race, age, and gender, a feature crucial to preserving the privacy of on-device data.


Why you may want to read this: Newest paper from Dieter Fox (Professor of Computer Science and Engineering, University of Washington, and Nvidia …).

Information Theoretic Model Predictive Q-Learning. 

Mohak Bhardwaj, Ankur Handa, Dieter Fox, Byron Boots

Model-free Reinforcement Learning (RL) algorithms work well in sequential decision-making problems when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both of these assumptions can be violated in real world problems such as robotics, where querying the system can be prohibitively expensive and real-world dynamics can be difficult to model accurately. Although sim-to-real approaches such as domain randomization attempt to mitigate the effects of biased simulation,they can still suffer from optimization challenges such as local minima and hand-designed distributions for randomization, making it difficult to learn an accurate global value function or policy that directly transfers to the real world. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real-robots in a systematic manner.


Why you may want to read this: Newest paper from Daniel H. Mathalon (Professor of Psychiatry, University of California San Francisco), Judith M. Ford (UCSF, Stanford,Yale), Jessica A. Turner (Professor of Psychology and Neuroscience, Georgia State University), Vince D. Calhoun (Director-Translational Research in Neuroimaging and Data Science (TReNDS;GSU/GAtech …).

Meta-modal Information Flow: A Method for Capturing Multimodal Modular Disconnectivity in Schizophrenia. 

Haleh Falakshahi, Victor M. Vergara, Jingyu Liu, Daniel H. Mathalon, Judith M. Ford, James Voyvodic, Bryon A. Mueller, Aysenil Belger, Sarah McEwen, Steven G. Potkin, Adrian Preda, Hooman Rokham, Jing Sui, Jessica A. Turner, Sergey Plis, Vince D. Calhoun

Objective: Multimodal measurements of the same phenomena provide complementary information and highlight different perspectives, albeit each with their own limitations. A focus on a single modality may lead to incorrect inferences, which is especially important when a studied phenomenon is a disease. In this paper, we introduce a method that takes advantage of multimodal data in addressing the hypotheses of disconnectivity and dysfunction within schizophrenia (SZ). Methods: We start with estimating and visualizing links within and among extracted multimodal data features using a Gaussian graphical model (GGM). We then propose a modularity-based method that can be applied to the GGM to identify links that are associated with mental illness across a multimodal data set. Through simulation and real data, we show our approach reveals important information about disease-related network disruptions that are missed with a focus on a single modality. We use functional MRI (fMRI), diffusion MRI (dMRI), and structural MRI (sMRI) to compute the fractional amplitude of low frequency fluctuations (fALFF), fractional anisotropy (FA), and gray matter (GM) concentration maps. These three modalities are analyzed using our modularity method. Results: Our results show missing links that are only captured by the cross-modal information that may play an important role in disconnectivity between the components. Conclusion: We identified multimodal (fALFF, FA and GM) disconnectivity in the default mode network area in patients with SZ, which would not have been detectable in a single modality. Significance: The proposed approach provides an important new tool for capturing information that is distributed among multiple imaging modalities.


Why you may want to read this: Newest paper from Ming Li (University Professor, University of Waterloo).

Deep Time-Stream Framework for Click-Through Rate Prediction by Tracking Interest Evolution. 

Shu-Ting Shi, Wenhao Zheng, Jun Tang, Qing-Guo Chen, Yao Hu, Jianke Zhu, Ming Li

Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation. Recently, deep learning models have been proposed to learn the representation of users' overall interests, while ignoring the fact that interests may dynamically change over time. We argue that it is necessary to consider the continuous-time information in CTR models to track user interest trend from rich historical behaviors. In this paper, we propose a novel Deep Time-Stream framework (DTS) which introduces the time information by an ordinary differential equations (ODE). DTS continuously models the evolution of interests using a neural network, and thus is able to tackle the challenge of dynamically representing users' interests based on their historical behaviors. In addition, our framework can be seamlessly applied to any existing deep CTR models by leveraging the additional Time-Stream Module, while no changes are made to the original CTR models. Experiments on public dataset as well as real industry dataset with billions of samples demonstrate the effectiveness of proposed approaches, which achieve superior performance compared with existing methods.


Why you may want to read this: Newest paper from Ge Wang (Rensselaer Polytechnic Institute).

On Interpretability of Artificial Neural Networks. 

Fenglei Fan, Jinjun Xiong, Ge Wang

Deep learning has achieved great successes in many important areas to dealing with text, images, video, graphs, and so on. However, the black-box nature of deep artificial neural networks has become the primary obstacle to their public acceptance and wide popularity in critical applications such as diagnosis and therapy. Due to the huge potential of deep learning, interpreting neural networks has become one of the most critical research directions. In this paper, we systematically review recent studies in understanding the mechanism of neural networks and shed light on some future directions of interpretability research (This work is still in progress)


Why you may want to read this: Newest paper from Laurens van der Maaten (Research Scientist, Facebook AI Research), Kilian Q. Weinberger (Associate Professor of Computer Science, Cornell University, ASAPP Research).

Convolutional Networks with Dense Connectivity. 

Gao Huang, Zhuang Liu, Geoff Pleiss, Laurens van der Maaten, Kilian Q. Weinberger

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion.Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less parameters and computation to achieve high performance.


Why you may want to read this: Newest paper from Chin-Teng Lin (University of Technology Sydney).

Supervised Discriminative Sparse PCA with Adaptive Neighbors for Dimensionality Reduction. 

Zhenhua Shi, Dongrui Wu, Yu-Kai Wang, Chin-Teng Lin

Dimensionality reduction is an important operation in information visualization, feature extraction, clustering, regression, and classification, especially for processing noisy high dimensional data. However, most existing approaches preserve either the global or the local structure of the data, but not both. Approaches that preserve only the global data structure, such as principal component analysis (PCA), are usually sensitive to outliers. Approaches that preserve only the local data structure, such as locality preserving projections, are usually unsupervised (and hence cannot use label information) and uses a fixed similarity graph. We propose a novel linear dimensionality reduction approach, supervised discriminative sparse PCA with adaptive neighbors (SDSPCAAN), to integrate neighborhood-free supervised discriminative sparse PCA and projected clustering with adaptive neighbors. As a result, both global and local data structures, as well as the label information, are used for better dimensionality reduction. Classification experiments on nine high-dimensional datasets validated the effectiveness and robustness of our proposed SDSPCAAN.


Why you may want to read this: Newest paper from Chin-Teng Lin (University of Technology Sydney).

EEG-based Drowsiness Estimation for Driving Safety using Deep Q-Learning. 

Yurui Ming, Dongrui Wu, Yu-Kai Wang, Yuhui Shi, Chin-Teng Lin

Fatigue is the most vital factor of road fatalities and one manifestation of fatigue during driving is drowsiness. In this paper, we propose using deep Q-learning to analyze an electroencephalogram (EEG) dataset captured during a simulated endurance driving test. By measuring the correlation between drowsiness and driving performance, this experiment represents an important brain-computer interface (BCI) paradigm especially from an application perspective. We adapt the terminologies in the driving test to fit the reinforcement learning framework, thus formulate the drowsiness estimation problem as an optimization of a Q-learning task. By referring to the latest deep Q-Learning technologies and attending to the characteristics of EEG data, we tailor a deep Q-network for action proposition that can indirectly estimate drowsiness. Our results show that the trained model can trace the variations of mind state in a satisfactory way against the testing EEG data, which demonstrates the feasibility and practicability of this new computation paradigm. We also show that our method outperforms the supervised learning counterpart and is superior for real applications. To the best of our knowledge, we are the first to introduce the deep reinforcement learning method to this BCI scenario, and our method can be potentially generalized to other BCI cases.


Why you may want to read this: Newest paper from Sebastien Ourselin (Professor of Healthcare Engineering, King's College London).

SGD with Hardness Weighted Sampling for Distributionally Robust Deep Learning. 

Lucas Fidon, Sebastien Ourselin, Tom Vercauteren

Distributionally Robust Optimization (DRO) has been proposed as an alternative to Empirical Risk Minimization (ERM) in order to account for potential biases in the training data distribution. However, its use in deep learning has been severely restricted due to the relative inefficiency of the optimizers available for DRO in comparison to the wide-spread Stochastic Gradient Descent (SGD) based optimizers for deep learning with ERM. We propose SGD with Hardness weighted sampling, an efficient optimization method for machine learning with DRO with a focus on deep learning. In this work, we propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly suited in the context of deep learning. We show that our optimization method can be interpreted as a principled Hard Example Mining strategy. Similar to an online hard example mining strategy in essence and in practice, the proposed algorithm is straightforward to implement and computationally as efficient as SGD-based optimizers used for deep learning. It only requires adding a softmax layer and maintaining a history of the loss values for each training example to compute adaptive sampling probabilities. In contrast to typical ad hoc hard mining approaches, and exploiting recent theoretical results in deep learning optimization, we We also prove the convergence of our DRO algorithm for over-parameterized deep learning networks with ReLU activation and finite number of layers and parameters. Preliminary results demonstrate the feasibility and usefulness of our approach.


ArXiv Weekly Radiostation
ArXiv Weekly Radiostation

Weekly selection and podcast of the latest CV,NLP, ML papers.