ArXiv Weekly Radiostation:本周NLP、CV、ML精选论文30篇(2.3-2.9)

机器之心联合由楚航、罗若天发起的ArXiv Weekly Radiostation,精选每周NLP、CV、ML领域各10篇重要论文,本周详情如下:

ArXiv Weekly: 10 NLP Papers You May Want to Read

NLP paper 1/10

Why you may want to read this: Newest paper from Erik Cambria (Associate Professor, Nanyang Technological University, Singapore), Philip S. Yu (Professor of Computer Science, University of Illinons at Chicago).

A Survey on Knowledge Graphs: Representation, Acquisition and Applications. 

Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, Philip S. Yu

Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review on knowledge graph covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference and logical rule reasoning are reviewed. We further explore several emerging topics including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions.

NLP paper 2/10

Why you may want to read this: Newest paper from Noah D. Goodman (Stanford University), Adele E. Goldberg (Professor of Psychology, Princeton University), Thomas L. Griffiths (Professor of Psychology and Computer Science, Princeton University).

Generalizing meanings from partners to populations: Hierarchical inference supports convention formation on networks. 

Robert D. Hawkins, Noah D. Goodman, Adele E. Goldberg, Thomas L. Griffiths

A key property of linguistic conventions is that they hold over an entire community of speakers, allowing us to communicate efficiently even with people we have never met before. At the same time, much of our language use is partner-specific: we know that words may be understood differently by different people based on local common ground. This poses a challenge for accounts of convention formation. Exactly how do agents make the inferential leap to community-wide expectations while maintaining partner-specific knowledge? We propose a hierarchical Bayesian model of convention to explain how speakers and listeners abstract away meanings that seem to be shared across partners. To evaluate our model's predictions, we conducted an experiment where participants played an extended natural-language communication game with different partners in a small community. We examine several measures of generalization across partners, and find key signatures of local adaptation as well as collective convergence. These results suggest that local partner-specific learning is not only compatible with global convention formation but may facilitate it when coupled with a powerful hierarchical inductive mechanism.

NLP paper 3/10

Why you may want to read this: Newest paper from Minlie Huang (computer science, Tsinghua University).

CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation. 

Fei Huang, Dazhen Wan, Zhihong Shao, Pei Ke, Jian Guan, Yilin Niu, Xiaoyan Zhu, Minlie Huang

In text generation evaluation, many practical issues, such as inconsistent experimental settings and metric implementations, are often ignored but lead to unfair evaluation and untenable conclusions. We present CoTK, an open-source toolkit aiming to support fast development and fair evaluation of text generation. In model development, CoTK helps handle the cumbersome issues, such as data processing, metric implementation, and reproduction. It standardizes the development steps and reduces human errors which may lead to inconsistent experimental settings. In model evaluation, CoTK provides implementation for many commonly used metrics and benchmark models across different experimental settings. As a unique feature, CoTK can signify when and which metric cannot be fairly compared. We demonstrate that it is convenient to use CoTK for model development and evaluation, particularly across different experimental settings.

NLP paper 4/10

Why you may want to read this: Newest paper from Maarten de Rijke (University of Amsterdam), Ryen W. White (Partner Research Manager, Microsoft).

Conversations with Documents. An Exploration of Document-Centered Assistance. 

Maartje ter Hoeve, Robert Sim, Elnaz Nouri, Adam Fourney, Maarten de Rijke, Ryen W. White

The role of conversational assistants has become more prevalent in helping people increase their productivity. Document-centered assistance, for example to help an individual quickly review a document, has seen less significant progress, even though it has the potential to tremendously increase a user's productivity. This type of document-centered assistance is the focus of this paper. Our contributions are three-fold: (1) We first present a survey to understand the space of document-centered assistance and the capabilities people expect in this scenario. (2) We investigate the types of queries that users will pose while seeking assistance with documents, and show that document-centered questions form the majority of these queries. (3) We present a set of initial machine learned models that show that (a) we can accurately detect document-centered questions, and (b) we can build reasonably accurate models for answering such questions. These positive results are encouraging, and suggest that even greater results may be attained with continued study of this interesting and novel problem space. Our findings have implications for the design of intelligent systems to support task completion via natural interactions with documents.

NLP paper 5/10

Why you may want to read this: Newest paper from Hongyuan Zha (College of Computing, Georgia Institute of Technology).

Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning. 

Qianming Xue, Wei Zhang, Hongyuan Zha

Domain-adapted sentiment classification refers to training on a labeled source domain to well infer document-level sentiment on an unlabeled target domain. Most existing relevant models involve a feature extractor and a sentiment classifier, where the feature extractor works towards learning domain-invariant features from both domains, and the sentiment classifier is trained only on the source domain to guide the feature extractor. As such, they lack a mechanism to use sentiment polarity lying in the target domain. To improve domain-adapted sentiment classification by learning sentiment from the target domain as well, we devise a novel deep adversarial mutual learning approach involving two groups of feature extractors, domain discriminators, sentiment classifiers, and label probers. The domain discriminators enable the feature extractors to obtain domain-invariant features. Meanwhile, the label prober in each group explores document sentiment polarity of the target domain through the sentiment prediction generated by the classifier in the peer group, and guides the learning of the feature extractor in its own group. The proposed approach achieves the mutual learning of the two groups in an end-to-end manner. Experiments on multiple public datasets indicate our method obtains the state-of-the-art performance, validating the effectiveness of mutual learning through label probers.

NLP paper 6/10

Why you may want to read this: Newest paper from Xunying Liu (Chinese University of Hong Kong).

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech. 

Xu Li, Xixin Wu, Xunying Liu, Helen Meng

Second language (L2) speech is often labeled with the native, phone categories. However, in many cases, it is difficult to decide on a categorical phone that an L2 segment belongs to. These segments are regarded as non-categories. Most existing approaches for Mispronunciation Detection and Diagnosis (MDD) are only concerned with categorical errors, i.e. a phone category is inserted, deleted or substituted by another. However, non-categorical errors are not considered. To model these non-categorical errors, this work aims at exploring non-categorical patterns to extend the categorical phone set. We apply a phonetic segment classifier to generate segmental phonetic posterior-grams (SPPGs) to represent phone segment-level information. And then we explore the non-categories by looking for the SPPGs with more than one peak. Compared with the baseline system, this approach explores more non-categorical patterns, and also perceptual experimental results show that the explored non-categories are more accurate with increased confusion degree by 7.3% and 7.5% under two different measures. Finally, we preliminarily analyze the reason behind those non-categories.

NLP paper 7/10

Why you may want to read this: Newest paper from Noah A. Smith (University of Washington; Allen Institute for Artificial Intelligence).

Citation Text Generation. 

Kelvin Luu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, Noah A. Smith

We introduce the task of citation text generation: given a pair of scientific documents, explain their relationship in natural language text in the manner of a citation from one text to the other. This task encourages systems to learn rich relationships between scientific texts and to express them concretely in natural language. Models for citation text generation will require robust document understanding including the capacity to quickly adapt to new vocabulary and to reason about document content. We believe this challenging direction of research will benefit high-impact applications such as automatic literature review or scientific writing assistance systems. In this paper we establish the task of citation text generation with a standard evaluation corpus and explore several baseline models.

NLP paper 8/10

Why you may want to read this: Newest paper from Abhinav Gupta (Associate Professor, Robotics Institute, Carnegie Mellon University), Joelle Pineau (School of Computer Science, McGill University).

On the interaction between supervision and self-play in emergent communication. 

Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training. However, recent work suggests that current machine learning methods are too data inefficient to be trained in this way from scratch. In this paper, we investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency: imitating human language data via supervised learning, and maximizing reward in a simulated multi-agent environment via self-play (as done in emergent communication), and introduce the term supervised self-play (S2P) for algorithms using both of these signals. We find that first training agents via supervised learning on human data followed by self-play outperforms the converse, suggesting that it is not beneficial to emerge languages from scratch. We then empirically investigate various S2P schedules that begin with supervised learning in two environments: a Lewis signaling game with symbolic inputs, and an image-based referential game with natural language descriptions. Lastly, we introduce population based approaches to S2P, which further improves the performance over single-agent methods.

NLP paper 9/10

Why you may want to read this: Newest paper from Abhinav Gupta (Associate Professor, Robotics Institute, Carnegie Mellon University).

Exploring Structural Inductive Biases in Emergent Communication. 

Agnieszka Słowik, Abhinav Gupta, William L. Hamilton, Mateja Jamnik, Sean B. Holden, Christopher Pal

Human language and thought are characterized by the ability to systematically generate a potentially infinite number of complex structures (e.g., sentences) from a finite set of familiar components (e.g., words). Recent works in emergent communication have discussed the propensity of artificial agents to develop a systematically compositional language through playing co-operative referential games. The degree of structure in the input data was found to affect the compositionality of the emerged communication protocols. Thus, we explore various structural priors in multi-agent communication and propose a novel graph referential game. We compare the effect of structural inductive bias (bag-of-words, sequences and graphs) on the emergence of compositional understanding of the input concepts measured by topographic similarity and generalization to unseen combinations of familiar properties. We empirically show that graph neural networks induce a better compositional language prior and a stronger generalization to out-of-domain data. We further perform ablation studies that show the robustness of the emerged protocol in graph referential games.

NLP paper 10/10

Why you may want to read this: Newest paper from Jimmy Lin (University of Waterloo).

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents. 

Ruixue Zhang, Wei Yang, Luyun Lin, Zhengkai Tu, Yuqing Xie, Zihang Fu, Yuhao Xie, Luchen Tan, Kun Xiong, Jimmy Lin

Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient. This problem can be formulated as a sequence labeling task, and we demonstrate the adaption of BERT to two types of business documents: regulatory filings and property lease agreements. There are aspects of this problem that make it easier than "standard" information extraction tasks and other aspects that make it more difficult, but on balance we find that modest amounts of annotated data (less than 100 documents) are sufficient to achieve reasonable accuracy. We integrate our models into an end-to-end cloud platform that provides both an easy-to-use annotation interface as well as an inference interface that allows users to upload documents and inspect model outputs.

ArXiv Weekly: 10 CV Papers You May Want to Read

CV paper 1/10

Why you may want to read this: Newest paper from Pietro Perona (California Institute of Technology).

Geocoding of trees from street addresses and street-level images. 

Daniel Laumer, Nico Lang, Natalie van Doorn, Oisin Mac Aodha, Pietro Perona, Jan Dirk Wegner

We introduce an approach for updating older tree inventories with geographic coordinates using street-level panorama images and a global optimization framework for tree instance matching. Geolocations of trees in inventories until the early 2000s where recorded using street addresses whereas newer inventories use GPS. Our method retrofits older inventories with geographic coordinates to allow connecting them with newer inventories to facilitate long-term studies on tree mortality etc. What makes this problem challenging is the different number of trees per street address, the heterogeneous appearance of different tree instances in the images, ambiguous tree positions if viewed from multiple images and occlusions. To solve this assignment problem, we (i) detect trees in Google street-view panoramas using deep learning, (ii) combine multi-view detections per tree into a single representation, (iii) and match detected trees with given trees per street address with a global optimization approach. Experiments for > 50000 trees in 5 cities in California, USA, show that we are able to assign geographic coordinates to 38 % of the street trees, which is a good starting point for long-term studies on the ecosystem services value of street trees at large scale.

CV paper 2/10

Why you may want to read this: Newest paper from Joshua B. Tenenbaum (MIT).

Visual Concept-Metaconcept Learning. 

Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color). In this paper, we propose the visual concept-metaconcept learner (VCML) for joint learning of concepts and metaconcepts from images and associated question-answer pairs. The key is to exploit the bidirectional connection between visual concepts and metaconcepts. Visual representations provide grounding cues for predicting relations between unseen pairs of concepts. Knowing that red and green describe the same property of objects, we generalize to the fact that cube and sphere also describe the same property of objects, since they both categorize the shape of objects. Meanwhile, knowledge about metaconcepts empowers visual concept learning from limited, noisy, and even biased data. From just a few examples of purple cubes we can understand a new color purple, which resembles the hue of the cubes instead of the shape of them. Evaluation on both synthetic and real-world datasets validates our claims.

CV paper 3/10

Why you may want to read this: Newest paper from Bernt Schiele (Professor, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarland …).

Analyzing the Dependency of ConvNets on Spatial Information. 

Yue Fan, Yongqin Xian, Max Maria Losch, Bernt Schiele

Intuitively, image classification should profit from using spatial information. Recent work, however, suggests that this might be overrated in standard CNNs. In this paper, we are pushing the envelope and aim to further investigate the reliance on spatial information. We propose spatial shuffling and GAP+FC to destroy spatial information during both training and testing phases. Interestingly, we observe that spatial information can be deleted from later layers with small performance drops, which indicates spatial information at later layers is not necessary for good performance. For example, test accuracy of VGG-16 only drops by 0.03% and 2.66% with spatial information completely removed from the last 30% and 53% layers on CIFAR100, respectively. Evaluation on several object recognition datasets (CIFAR100, Small-ImageNet, ImageNet) with a wide range of CNN architectures (VGG16, ResNet50, ResNet152) shows an overall consistent pattern.

CV paper 4/10

Why you may want to read this: Newest paper from Tinne Tuytelaars (KU Leuven - PSI, Belgium).

Deep-Geometric 6 DoF Localization from a Single Image in Topo-metric Maps. 

Tom Roussel, Punarjay Chakravarty, Gaurav Pandey, Tinne Tuytelaars, Luc Van Eycken

We describe a Deep-Geometric Localizer that is able to estimate the full 6 Degree of Freedom (DoF) global pose of the camera from a single image in a previously mapped environment. Our map is a topo-metric one, with discrete topological nodes whose 6 DoF poses are known. Each topo-node in our map also comprises of a set of points, whose 2D features and 3D locations are stored as part of the mapping process. For the mapping phase, we utilise a stereo camera and a regular stereo visual SLAM pipeline. During the localization phase, we take a single camera image, localize it to a topological node using Deep Learning, and use a geometric algorithm (PnP) on the matched 2D features (and their 3D positions in the topo map) to determine the full 6 DoF globally consistent pose of the camera. Our method divorces the mapping and the localization algorithms and sensors (stereo and mono), and allows accurate 6 DoF pose estimation in a previously mapped environment using a single camera. With potential VR/AR and localization applications in single camera devices such as mobile phones and drones, our hybrid algorithm compares favourably with the fully Deep-Learning based Pose-Net that regresses pose from a single image in simulated as well as real environments.

CV paper 5/10

Why you may want to read this: Newest paper from Dacheng Tao (The University of Sydney).

Towards High Performance Human Keypoint Detection. 

Jing Zhang, Zhe Chen, Dacheng Tao

Human keypoint detection from a single image is very challenging due to occlusion, blur, illumination and scale variance. In this paper, we address this problem from three aspects by devising an efficient network structure, proposing three effective training strategies, and exploiting four useful postprocessing techniques. First, we find that context information plays an important role in reasoning human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer (CCM), which efficiently integrates spatial and channel context information and progressively refines them. Then, to maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy by exploiting abundant unlabeled data. It enables CCM to learn discriminative features from massive diverse poses. Third, we present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy. Extensive experiments on the MS COCO keypoint detection benchmark demonstrate the superiority of the proposed method over representative state-of-the-art (SOTA) methods. Our single model achieves comparable performance with the winner of the 2018 COCO Keypoint Detection Challenge. The final ensemble model sets a new SOTA on this benchmark.

CV paper 6/10

Why you may want to read this: Newest paper from Kevin W. Bowyer (Schubmehl-Prein Family Professor of Computer Science and Engineering, University of …).

Analysis of Gender Inequality In Face Recognition Accuracy. 

Vítor Albiero, Krishnapriya K.S., Kushal Vangara, Kai Zhang, Michael C. King, Kevin W. Bowyer

We present a comprehensive analysis of how and why face recognition accuracy differs between men and women. We show that accuracy is lower for women due to the combination of (1) the impostor distribution for women having a skew toward higher similarity scores, and (2) the genuine distribution for women having a skew toward lower similarity scores. We show that this phenomenon of the impostor and genuine distributions for women shifting closer towards each other is general across datasets of African-American, Caucasian, and Asian faces. We show that the distribution of facial expressions may differ between male/female, but that the accuracy difference persists for image subsets rated confidently as neutral expression. The accuracy difference also persists for image subsets rated as close to zero pitch angle. Even when removing images with forehead partially occluded by hair/hat, the same impostor/genuine accuracy difference persists. We show that the female genuine distribution improves when only female images without facial cosmetics are used, but that the female impostor distribution also degrades at the same time. Lastly, we show that the accuracy difference persists even if a state-of-the-art deep learning method is trained from scratch using training data explicitly balanced between male and female images and subjects.

CV paper 7/10

Why you may want to read this: Newest paper from Jiaya Jia (Distinguished Scientist, Tencent; Professor, CUHK), Philip Torr (Professor, University of Oxford).

Global Texture Enhancement for Fake Face Detection in the Wild. 

Zhengzhe Liu, Xiaojuan Qi, Jiaya Jia, Philip Torr

Generative Adversarial Networks (GANs) can generate realistic fake face images that can easily fool human beings.On the contrary, a common Convolutional Neural Network(CNN) discriminator can achieve more than 99.9% accuracyin discerning fake/real images. In this paper, we conduct an empirical study on fake/real faces, and have two important observations: firstly, the texture of fake faces is substantially different from real ones; secondly, global texture statistics are more robust to image editing and transferable to fake faces from different GANs and datasets. Motivated by the above observations, we propose a new architecture coined as Gram-Net, which leverages global image texture representations for robust fake image detection. Experimental results on several datasets demonstrate that our Gram-Net outperforms existing approaches. Especially, our Gram-Netis more robust to image editings, e.g. down-sampling, JPEG compression, blur, and noise. More importantly, our Gram-Net generalizes significantly better in detecting fake faces from GAN models not seen in the training phase and can perform decently in detecting fake natural images.

CV paper 8/10

Why you may want to read this: Newest paper from Xiaogang Wang (Associate Professor of Electronic Engineering, the Chinese University of Hong Kong).

Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation. 

Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, Xiaogang Wang

Monocular 3D object detection task aims to predict the 3D bounding boxes of objects based on monocular RGB images. Since the location recovery in 3D space is quite difficult on account of absence of depth information, this paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Different from the widely studied 2D bounding boxes, the proposed novel structured polygon in the 2D image consists of several projected surfaces of the target object. Compared to the widely-used 3D bounding box proposals, it is shown to be a better representation for 3D detection. In order to inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the following depth recovery task uses the object height prior to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments are conducted on the challenging KITTI benchmark, in which our method achieves state-of-the-art detection accuracy.

CV paper 9/10

Why you may want to read this: Newest paper from Philip H.S. Torr (Professor, University of Oxford), Nicu Sebe (University of Trento).

Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation. 

Hao Tang, Dan Xu, Yan Yan, Jason J. Corso, Philip H.S. Torr, Nicu Sebe

We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation, where we translate an input image into another while respecting an external semantic guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance information and consists of two stages. In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results by using the proposed multi-scale spatial pooling \& channel selection module and the multi-channel attention selection module. Moreover, uncertainty maps automatically learned from attention maps are used to guide the pixel loss for better network optimization. Exhaustive experiments on four challenging guided image-to-image translation tasks (face, hand, body and street view) demonstrate that our SelectionGAN is able to generate significantly better results than the state-of-the-art methods. Meanwhile, the proposed framework and modules are unified solutions and can be applied to solve other generation tasks, such as semantic image synthesis. The code is available at https://github.com/Ha0Tang/SelectionGAN.

CV paper 10/10

Why you may want to read this: Newest paper from P. Jonathon Phillips (National Institute of Standards and Technology).

Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms. 

P. Jonathon Phillips, Mark Przybocki

Traditionally, researchers in automatic face recognition and biometric technologies have focused on developing accurate algorithms. With this technology being integrated into operational systems, engineers and scientists are being asked, do these systems meet societal norms? The origin of this line of inquiry is `trust' of artificial intelligence (AI) systems. In this paper, we concentrate on adapting explainable AI to face recognition and biometrics, and we present four principles of explainable AI to face recognition and biometrics. The principles are illustrated by \it{four} case studies, which show the challenges and issues in developing algorithms that can produce explanations.

ArXiv Weekly: 10 ML Papers You May Want to Read

ML paper 1/10

Why you may want to read this: Newest paper from Toru Shimizu (産業技術総合研究所).

Learning Fine Grained Place Embeddings with Spatial Hierarchy from Human Mobility Trajectories. 

Toru Shimizu, Takahiro Yabe, Kota Tsubouchi

Place embeddings generated from human mobility trajectories have become a popular method to understand the functionality of places. Place embeddings with high spatial resolution are desirable for many applications, however, downscaling the spatial resolution deteriorates the quality of embeddings due to data sparsity, especially in less populated areas. We address this issue by proposing a method that generates fine grained place embeddings, which leverages spatial hierarchical information according to the local density of observed data points. The effectiveness of our fine grained place embeddings are compared to baseline methods via next place prediction tasks using real world trajectory data from 3 cities in Japan. In addition, we demonstrate the value of our fine grained place embeddings for land use classification applications. We believe that our technique of incorporating spatial hierarchical information can complement and reinforce various place embedding generating methods.

ML paper 2/10

Why you may want to read this: Newest paper from Francisco Herrera (Professor Computer Science and AI, Granada Univ.; Senior Associate Researcher in …).

LUNAR: Cellular Automata for Drifting Data Streams. 

Jesus L. Lobo, Javier Del Ser, Francisco Herrera

With the advent of huges volumes of data produced in the form of fast streams, real-time machine learning has become a challenge of relevance emerging in a plethora of real-world applications. Processing such fast streams often demands high memory and processing resources. In addition, they can be affected by non-stationary phenomena (concept drift), by which learning methods have to detect changes in the distribution of streaming data, and adapt to these evolving conditions. A lack of efficient and scalable solutions is particularly noted in real-time scenarios where computing resources are severely constrained, as it occurs in networks of small, numerous, interconnected processing units (such as the so-called Smart Dust, Utility Fog, or Swarm Robotics paradigms). In this work we propose LUNAR, a streamified version of cellular automata devised to successfully meet the aforementioned requirements. It is able to act as a real incremental learner while adapting to drifting conditions. Extensive simulations with synthetic and real data will provide evidence of its competitive behavior in terms of classification performance when compared to long-established and successful online learning methods.

ML paper 3/10

Why you may want to read this: Newest paper from David Heckerman (Microsoft Research).

A Tutorial on Learning With Bayesian Networks. 

David Heckerman

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

ML paper 4/10

Why you may want to read this: Newest paper from Klaus-Robert Müller (Professor for Machine Learning, TU Berlin, Germany and Korea University, Seoul, Korea …).

Forecasting Industrial Aging Processes with Machine Learning Methods. 

Mihail Bogojeski, Simeon Sauer, Franziska Horn, Klaus-Robert Müller

By accurately predicting industrial aging processes (IAPs), it is possible to schedule maintenance events further in advance, thereby ensuring a cost-efficient and reliable operation of the plant. So far, these degradation processes were usually described by mechanistic models or simple empirical prediction models. In this paper, we evaluate a wider range of data-driven models for this task, comparing some traditional stateless models (linear and kernel ridge regression, feed-forward neural networks) to more complex recurrent neural networks (echo state networks and LSTMs). To examine how much historical data is needed to train each of the models, we first examine their performance on a synthetic dataset with known dynamics. Next, the models are tested on real-world data from a large scale chemical plant. Our results show that LSTMs produce near perfect predictions when trained on a large enough dataset, while linear models may generalize better given small datasets with changing conditions.

ML paper 5/10

Why you may want to read this: Newest paper from Guido Imbens (Stanford University).

Decoupling Learning Rates Using Empirical Bayes Priors. 

Sareh Nabi, Houssam Nassif, Joseph Hong, Hamed Mamani, Guido Imbens

In this work, we propose an Empirical Bayes approach to decouple the learning rates of first order and second order features (or any other feature grouping) in a Generalized Linear Model. Such needs arise in small-batch or low-traffic use-cases. As the first order features are likely to have a more pronounced effect on the outcome, focusing on learning first order weights first is likely to improve performance and convergence time. Our Empirical Bayes method clamps features in each group together and uses the observed data for the deployed model to empirically compute a hierarchical prior in hindsight. We apply our method to a standard classification setting, as well as a contextual bandit setting in an Amazon production system. Both during simulations and live experiments, our method shows marked improvements, especially in cases of small traffic. Our findings are promising, as optimizing over sparse data is often a challenge. Furthermore, our approach can be applied to any problem instance modeled as a Bayesian framework.

ML paper 6/10

Why you may want to read this: Newest paper from William Stafford Noble (Professor of Genome Sciences, University of Washington).

Robust saliency maps with decoy-enhanced saliency score. 

Yang Lu, Wenbo Guo, Xinyu Xing, William Stafford Noble

Saliency methods help to make deep neural network predictions more interpretable by identifying particular features, such as pixels in an image, that contribute most strongly to the network's prediction. Unfortunately, recent evidence suggests that many saliency methods perform poorly when gradients are saturated or in the presence of strong inter-feature dependence or noise injected by an adversarial attack. In this work, we propose to infer robust saliency scores by integrating the saliency scores of a set of decoys with a novel decoy-enhanced saliency score, in which the decoys are generated by either solving an optimization problem or blurring the original input. We theoretically analyze that our method compensates for gradient saturation and considers joint activation patterns of pixels. We also apply our method to three different CNNs---VGGNet, AlexNet, and ResNet trained on ImageNet data set. The empirical results show both qualitatively and quantitatively that our method outperforms raw scores produced by three existing saliency methods, even in the presence of adversarial attacks.

ML paper 7/10

Why you may want to read this: Newest paper from Noga Alon (Princeton University and Tel Aviv University), Elad Hazan (Professor at Princeton University and Co-Director Google AI Princeton).

Boosting Simple Learners. 

Noga Alon, Alon Gonen, Elad Hazan, Shay Moran

We consider boosting algorithms under the restriction that the weak learners come from a class of bounded VC-dimension. In this setting, we focus on two main questions: (i) \underline{Oracle Complexity:} we show that the restriction on the complexity of the weak learner significantly improves the number of calls to the weak learner. We describe a boosting procedure which makes only~\tilde O(1/\gamma) calls to the weak learner, where \gamma denotes the weak learner's advantage. This circumvents a lower bound of \Omega(1/\gamma^2) due to Freund and Schapire ('95, '12) for the general case. Unlike previous boosting algorithms which aggregate the weak hypotheses by majority votes, our method use more complex aggregation rules, and we show this to be necessary.

(ii) \underline{Expressivity:} we consider the question of what can be learned by boosting weak hypotheses of bounded VC-dimension?

Towards this end we identify a combinatorial-geometric parameter called the \gamma-VC dimension which quantifies the expressivity of a class of weak hypotheses when used as part of a boosting procedure.

We explore the limits of the \gamma-VC dimension and compute it for well-studied classes such as halfspaces and decision stumps.

Along the way, we establish and exploit connections with {\it Discrepancy theory}.

ML paper 8/10

Why you may want to read this: Newest paper from Lei Zhang (Chair Professor, Dept. of Computing, The Hong Kong Polytechnic University), Xian-Sheng Hua (Alibaba DAMO Academy), Cuntai Guan (Professor, Nanyang Technological University (NTU), Singapore; IEEE Fellow).

Towards a Fast Steady-State Visual Evoked Potentials (SSVEP) Brain-Computer Interface (BCI). 

Aung Aung Phyo Wai, Yangsong Zhang, Heng Guo, Ying Chi, Lei Zhang, Xian-Sheng Hua, Seong Whan Lee, Cuntai Guan

Steady-state visual evoked potentials (SSVEP) brain-computer interface (BCI) provides reliable responses leading to high accuracy and information throughput. But achieving high accuracy typically requires a relatively long time window of one second or more. Various methods were proposed to improve sub-second response accuracy through subject-specific training and calibration. Substantial performance improvements were achieved with tedious calibration and subject-specific training; resulting in the user's discomfort. So, we propose a training-free method by combining spatial-filtering and temporal alignment (CSTA) to recognize SSVEP responses in sub-second response time. CSTA exploits linear correlation and non-linear similarity between steady-state responses and stimulus templates with complementary fusion to achieve desirable performance improvements. We evaluated the performance of CSTA in terms of accuracy and Information Transfer Rate (ITR) in comparison with both training-based and training-free methods using two SSVEP data-sets. We observed that CSTA achieves the maximum mean accuracy of 97.43\pm2.26 % and 85.71\pm13.41 % with four-class and forty-class SSVEP data-sets respectively in sub-second response time in offline analysis. CSTA yields significantly higher mean performance (p<0.001) than the training-free method on both data-sets. Compared with training-based methods, CSTA shows 29.33\pm19.65 % higher mean accuracy with statistically significant differences in time window less than 0.5 s. In longer time windows, CSTA exhibits either better or comparable performance though not statistically significantly better than training-based methods. We show that the proposed method brings advantages of subject-independent SSVEP classification without requiring training while enabling high target recognition performance in sub-second response time.

ML paper 9/10

Why you may want to read this: Newest paper from Zhi-Hua Zhou (Nanjing University).

Exploratory Machine Learning with Unknown Unknowns. 

Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou

In conventional supervised learning, a training dataset is given with ground-truth labels from a known label set, and the learned model will classify unseen instances to the known labels. In this paper, we study a new problem setting in which there are unknown classes in the training dataset misperceived as other labels, and thus their existence appears unknown from the given supervision. We attribute the unknown unknowns to the fact that the training dataset is badly advised by the incompletely perceived label space due to the insufficient feature information. To this end, we propose the exploratory machine learning, which examines and investigates the training dataset by actively augmenting the feature space to discover potentially unknown labels. Our approach consists of three ingredients including rejection model, feature acquisition, and model cascade. The effectiveness is validated on both synthetic and real datasets.

ML paper 10/10

Why you may want to read this: Newest paper from Dacheng Tao (The University of Sydney).

On Positive-Unlabeled Classification in GAN. 

Tianyu Guo, Chang Xu, Jiajun Huang, Yunhe Wang, Boxin Shi, Chao Xu, Dacheng Tao

This paper defines a positive and unlabeled classification problem for standard GANs, which then leads to a novel technique to stabilize the training of the discriminator in GANs. Traditionally, real data are taken as positive while generated data are negative. This positive-negative classification criterion was kept fixed all through the learning process of the discriminator without considering the gradually improved quality of generated data, even if they could be more realistic than real data at times. In contrast, it is more reasonable to treat the generated data as unlabeled, which could be positive or negative according to their quality. The discriminator is thus a classifier for this positive and unlabeled classification problem, and we derive a new Positive-Unlabeled GAN (PUGAN). We theoretically discuss the global optimality the proposed model will achieve and the equivalent optimization goal. Empirically, we find that PUGAN can achieve comparable or even better performance than those sophisticated discriminator stabilization methods.

ArXiv Weekly Radiostation
ArXiv Weekly Radiostation

Weekly selection and podcast of the latest CV,NLP, ML papers.