ArXiv Weekly Radiostation:本周NLP、CV、ML精选论文30篇(3.15-3.21)

机器之心联合由楚航、罗若天发起的ArXiv Weekly Radiostation,精选每周NLP、CV、ML领域各10篇重要论文,本周详情如下:

ArXiv Weekly: 10 NLP Papers You May Want to Read

[NLP paper 1/10]

Why you may want to read this: Newest paper from Christopher D. Manning (Professor of Computer Science and Linguistics, Stanford University).

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages.

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, Christopher D. Manning

We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionalities to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at

[NLP paper 2/10]

Why you may want to read this: Newest paper from Minlie Huang (computer science, Tsinghua University).

Recent Advances and Challenges in Task-oriented Dialog System.

Zheng Zhang, Ryuichi Takanobu, Minlie Huang, Xiaoyan Zhu

Due to the significance and value in human-computer interaction and natural language processing, task-oriented dialog systems are attracting more and more attention in both academic and industrial communities. In this paper, we survey recent advances and challenges in an issue-specific manner. We discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog system modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model in both pipeline and end-to-end models. We also review the recent progresses in dialog evaluation and some widely-used corpora. We believe that this survey can shed a light on future research in task-oriented dialog systems.

[NLP paper 3/10]

Why you may want to read this: Newest paper from James Pustejovsky (TJX Feldberg Chair of Computer Science, Brandeis University).

A Formal Analysis of Multimodal Referring Strategies Under Common Ground.

Nikhil Krishnaswamy, James Pustejovsky

In this paper, we present an analysis of computationally generated mixed-modality definite referring expressions using combinations of gesture and linguistic descriptions. In doing so, we expose some striking formal semantic properties of the interactions between gesture and language, conditioned on the introduction of content into the common ground between the (computational) speaker and (human) viewer, and demonstrate how these formal features can contribute to training better models to predict viewer judgment of referring expressions, and potentially to the generation of more natural and informative referring expressions.

 [NLP paper 4/10]

Why you may want to read this: Newest paper from Michael Mahoney (Professor of Statistics, UC Berkeley), Kurt Keutzer (Professor of the Graduate School, EECS, University of California, Berkeley).

Rethinking Batch Normalization in Transformers.

Sheng Shen, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer

The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN). This is different than batch normalization (BN), which is widely-adopted in Computer Vision. The preferred use of LN in NLP is principally due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks; however, a thorough understanding of the underlying reasons for this is not always evident. In this paper, we perform a systematic study of NLP transformer models to understand why BN has a poor performance, as compared to LN. We find that the statistics of NLP data across the batch dimension exhibit large fluctuations throughout training. This results in instability, if BN is naively implemented. To address this, we propose Power Normalization (PN), a novel normalization scheme that resolves this issue by (i) relaxing zero-mean normalization in BN, (ii) incorporating a running quadratic mean instead of per batch statistics to stabilize fluctuations, and (iii) using an approximate backpropagation for incorporating the running statistics in the forward pass. We show theoretically, under mild assumptions, that PN leads to a smaller Lipschitz constant for the loss, compared with BN. Furthermore, we prove that the approximate backpropagation scheme leads to bounded gradients. We extensively test PN for transformers on a range of NLP tasks, and we show that it significantly outperforms both LN and BN. In particular, PN outperforms LN by 0.4/0.6 BLEU on IWSLT14/WMT14 and 5.6/3.0 PPL on PTB/WikiText-103.

[NLP paper 5/10]

Why you may want to read this: Newest paper from Jimmy Lin (University of Waterloo).

TTTTTackling WinoGrande Schemas.

Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis. Our first (and only) submission to the official leaderboard yielded 0.7673 AUC on March 13, 2020, which is the best known result at this time and beats the previous state of the art by over five points.

[NLP paper 6/10]

Why you may want to read this: Newest paper from Siddharth Singh (Graduate Student, Ohio State University).

Developing a Multilingual Annotated Corpus of Misogyny and Aggression.

Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha

In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project). The dataset is collected from comments on YouTube videos and currently contains a total of over 20,000 comments. The comments are annotated at two levels - aggression (overtly aggressive, covertly aggressive, and non-aggressive) and misogyny (gendered and non-gendered). We describe the process of data collection, the tagset used for annotation, and issues and challenges faced during the process of annotation. Finally, we discuss the results of the baseline experiments conducted to develop a classifier for misogyny in the three languages.

[NLP paper 7/10]

Why you may want to read this: Newest paper from Maosong Sun (Professor of Computer Science and Technology, Tsinghua University).

MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space.

Xiaoyuan Yi, Ruoyu Li, Cheng Yang, Wenhao Li, Maosong Sun

As an essential step towards computer creativity, automatic poetry generation has gained increasing attention these years. Though recent neural models make prominent progress in some criteria of poetry quality, generated poems still suffer from the problem of poor diversity. Related literature researches show that different factors, such as life experience, historical background, etc., would influence composition styles of poets, which considerably contributes to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into some subspaces, with each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experiment results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.

[NLP paper 8/10]

Why you may want to read this: Newest paper from Mona Diab (Professor of Computer Science, George Washington University).

Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections.

Yi-An Lai, Xuan Zhu, Yi Zhang, Mona Diab

Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point. However, as natural language processing methods flourish, there are still insufficient characteristic metrics to describe a collection of texts in terms of the words, sentences, or paragraphs they comprise. In this work, we propose metrics of diversity, density, and homogeneity that quantitatively measure the dispersion, sparsity, and uniformity of a text collection. We conduct a series of simulations to verify that each metric holds desired properties and resonates with human intuitions. Experiments on real-world datasets demonstrate that the proposed characteristic metrics are highly correlated with text classification performance of a renowned model, BERT, which could inspire future applications.

[NLP paper 9/10]

Why you may want to read this: Newest paper from Simone Paolo Ponzetto (Professor of Information Systems, University of Mannheim).

Word Sense Disambiguation for 158 Languages using Word Embeddings Only.

Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.

[NLP paper 10/10]

Why you may want to read this: Newest paper from Pushpak Bhattacharyya (Professor of Computer Science and Engineering, IIT Bombay).

Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent.

Anoop Kunchukuttan, Pushpak Bhattacharyya

In this work, we present an extensive study of statistical machine translation involving languages of the Indian subcontinent. These languages are related by genetic and contact relationships. We describe the similarities between Indic languages arising from these relationships. We explore how lexical and orthographic similarity among these languages can be utilized to improve translation quality between Indic languages when limited parallel corpora is available. We also explore how the structural correspondence between Indic languages can be utilized to re-use linguistic resources for English to Indic language translation. Our observations span 90 language pairs from 9 Indic languages and English. To the best of our knowledge, this is the first large-scale study specifically devoted to utilizing language relatedness to improve translation between related languages.

ArXiv Weekly: 10 CV Papers You May Want to Read

[CV paper 1/10]

Why you may want to read this: Newest paper from Anil K. Jain (Michigan State University).

Child Face Age-Progression via Deep Feature Aging. 

Debayan Deb, Divyansh Aggarwal, Anil K. Jain

Given a gallery of face images of missing children, state-of-the-art face recognition systems fall short in identifying a child (probe) recovered at a later age. We propose a feature aging module that can age-progress deep face features output by a face matcher. In addition, the feature aging module guides age-progression in the image space such that synthesized aged faces can be utilized to enhance longitudinal face recognition performance of any face matcher without requiring any explicit training. For time lapses larger than 10 years (the missing child is found after 10 or more years), the proposed age-progression module improves the closed-set identification accuracy of FaceNet from 16.53% to 21.44% and CosFace from 60.72% to 66.12% on a child celebrity dataset, namely ITWCC. The proposed method also outperforms state-of-the-art approaches with a rank-1 identification rate of 95.91%, compared to 94.91%, on a public aging dataset, FG-NET, and 99.58%, compared to 99.50%, on CACD-VS. These results suggest that aging face features enhances the ability to identify young children who are possible victims of child trafficking or abduction.

 [CV paper 2/10]

Why you may want to read this: Newest paper from Anil K. Jain (Michigan State University).

Generalizing Face Representation with Unlabeled Data.

Yichun Shi, Anil K. Jain

In recent years, significant progress has been made in face recognition due to the availability of large-scale labeled face datasets. However, since the faces in these datasets usually contain limited degree and types of variation, the models trained on them generalize poorly to more realistic unconstrained face datasets. While collecting labeled faces with larger variations could be helpful, it is practically infeasible due to privacy and labor cost. In comparison, it is easier to acquire a large number of unlabeled faces from different domains which would better represent the testing scenarios in real-world problems. We present an approach to use such unlabeled faces to learn generalizable face representations, which can be viewed as an unsupervised domain generalization framework. Experimental results on unconstrained datasets show that a small amount of unlabeled data with sufficient diversity can (i) lead to an appreciable gain in recognition performance and (ii) outperform the supervised baseline when combined with less than half of the labeled data. Compared with the state-of-the-art face recognition methods, our method further improves their performance on challenging benchmarks, such as IJB-B, IJB-C and IJB-S.

[CV paper 3/10]

Why you may want to read this: Newest paper from Wen-mei Hwu (Professor and Sanders-AMD Chair of Electrical and Computer Engineering, University of …), Thomas S. Huang (University of Illinois, Urbana-Champaign).

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation.

Zhonghao Wang, Mo Yu, Yunchao Wei, Rogerior Feris, Jinjun Xiong, Wen-mei Hwu, Thomas S. Huang, Honghui Shi

We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work. State-of-the-art approaches prove that performing semantic-level alignment is helpful in tackling the domain shift issue. Based on the observation that stuff categories usually share similar appearances across images of different domains while things (i.e. object instances) have much larger differences, we propose to improve the semantic-level alignment with different strategies for stuff regions and for things: 1) for the stuff categories, we generate feature representation for each class and conduct the alignment operation from the target domain to the source domain; 2) for the thing categories, we generate feature representation for each individual instance and encourage the instance in the target domain to align with the most similar one in the source domain. In this way, the individual differences within thing categories will also be considered to alleviate over-alignment. In addition to our proposed method, we further reveal the reason why the current adversarial loss is often unstable in minimizing the distribution discrepancy and show that our method can help ease this issue by minimizing the most similar stuff and instance features between the source and the target domains. We conduct extensive experiments in two unsupervised domain adaptation tasks, i.e. GTA5 to Cityscapes and SYNTHIA to Cityscapes, and achieve the new state-of-the-art segmentation accuracy.

 [CV paper 4/10]

Why you may want to read this: Newest paper from Thomas S. Huang (University of Illinois, Urbana-Champaign).

Deep Affinity Net: Instance Segmentation via Affinity.

Xingqian Xu, Mang Tik Chiu, Thomas S. Huang, Honghui Shi

Most of the modern instance segmentation approaches fall into two categories: region-based approaches in which object bounding boxes are detected first and later used in cropping and segmenting instances; and keypoint-based approaches in which individual instances are represented by a set of keypoints followed by a dense pixel clustering around those keypoints. Despite the maturity of these two paradigms, we would like to report an alternative affinity-based paradigm where instances are segmented based on densely predicted affinities and graph partitioning algorithms. Such affinity-based approaches indicate that high-level graph features other than regions or keypoints can be directly applied in the instance segmentation task. In this work, we propose Deep Affinity Net, an effective affinity-based approach accompanied with a new graph partitioning algorithm Cascade-GAEC. Without bells and whistles, our end-to-end model results in 32.4% AP on Cityscapes val and 27.5% AP on test. It achieves the best single-shot result as well as the fastest running time among all affinity-based models. It also outperforms the region-based method Mask R-CNN.

[CV paper 5/10]

Why you may want to read this: Newest paper from Jian Sun (Chief Scientist | Managing Director of Research, Megvii (Face++)).

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification.

Guan'an Wang, Shuo Yang, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, ErjinZhou, Jian Sun

Occluded person re-identification (ReID) aims to match occluded person images to holistic ones across dis-joint cameras. In this paper, we propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. At first, we use a CNN backbone and a key-points estimation model to extract semantic local features. Even so, occluded images still suffer from occlusion and outliers. Then, we view the local features of an image as nodes of a graph and propose an adaptive direction graph convolutional (ADGC)layer to pass relation information between nodes. The proposed ADGC layer can automatically suppress the message-passing of meaningless features by dynamically learning di-rection and degree of linkage. When aligning two groups of local features from two images, we view it as a graph matching problem and propose a cross-graph embedded-alignment (CGEA) layer to jointly learn and embed topology information to local features, and straightly predict similarity score. The proposed CGEA layer not only take full use of alignment learned by graph matching but also re-place sensitive one-to-one matching with a robust soft one. Finally, extensive experiments on occluded, partial, and holistic ReID tasks show the effectiveness of our proposed method. Specifically, our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.

 [CV paper 6/10]

Why you may want to read this: Newest paper from Xiangyu Zhang (Research Leader, Megvii Technology), Jian Sun (Chief Scientist | Managing Director of Research, Megvii (Face++)), Jiaya Jia (Distinguished Scientist, Tencent; Professor, CUHK).

PointINS: Point-based Instance Segmentation.

Lu Qi, Xiangyu Zhang, Yingcong Chen, Yukang Chen, Jian Sun, Jiaya Jia

A single-point feature has shown its effectiveness in object detection. However, for instance segmentation, it does not lead to satisfactory results. The reasons are two folds. Firstly, it has limited representation capacity. Secondly, it could be misaligned with potential instances. To address the above issues, we propose a new point-based framework, namely PointINS, to segment instances from single points. The core module of our framework is instance-aware convolution, including the instance-agnostic feature and instance-aware weights. Instance-agnostic feature for each Point-of-Interest (PoI) serves as a template for potential instance masks. In this way, instance-aware features are computed by convolving this template with instance-aware weights for following mask prediction. Given the independence of instance-aware convolution, PointINS is general and practical as a one-stage detector for anchor-based and anchor-free frameworks. In our extensive experiments, we show the effectiveness of our framework on RetinaNet and FCOS. With ResNet101 backbone, PointINS achieves 38.3 mask mAP on challenging COCO dataset, outperforming its competitors by a large margin. The code will be made publicly available.

[CV paper 7/10]

Why you may want to read this: Newest paper from Trevor Darrell (Professor of Computer Science, UC Berkeley).

Frustratingly Simple Few-Shot Object Detection.

Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu

Detecting rare objects from a few examples is an emerging problem. Prior works show meta-learning is a promising approach. But, fine-tuning techniques have drawn scant attention. We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. Such a simple approach outperforms the meta-learning methods by roughly 2~20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods. However, the high variance in the few samples often leads to the unreliability of existing benchmarks. We revise the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons and build new benchmarks based on three datasets: PASCAL VOC, COCO and LVIS. Again, our fine-tuning approach establishes a new state of the art on the revised benchmarks. The code as well as the pretrained models are available at

 [CV paper 8/10]

Why you may want to read this: Newest paper from Ramesh Jain (Professor of Computer Science, University of California, Irvine).

Personalized Taste and Cuisine Preference Modeling via Images.

Nitish Nag, Bindu Rajanna, Ramesh Jain

With the exponential growth in the usage of social media to share live updates about life, taking pictures has become an unavoidable phenomenon. Individuals unknowingly create a unique knowledge base with these images. The food images, in particular, are of interest as they contain a plethora of information. From the image metadata and using computer vision tools, we can extract distinct insights for each user to build a personal profile. Using the underlying connection between cuisines and their inherent tastes, we attempt to develop such a profile for an individual based solely on the images of his food. Our study provides insights about an individual's inclination towards particular cuisines. Interpreting these insights can lead to the development of a more precise recommendation system. Such a system would avoid the generic approach in favor of a personalized recommendation system.

 [CV paper 9/10]

Why you may want to read this: Newest paper from Michal Irani (Professor of Computer Science, Weizmann Institute), William T. Freeman (Professor of Computer Science, MIT).

Semantic Pyramid for Image Generation.

Assaf Shocher, Yossi Gandelsmam, Inbar Mosseri, Michal Yarom, Michal Irani, William T. Freeman, Tali Dekel

We present a novel GAN-based model that utilizes the space of deep features learned by a pre-trained classification model. Inspired by classical image pyramid representations, we construct our model as a Semantic Generation Pyramid -- a hierarchical framework which leverages the continuum of semantic information encapsulated in such deep features; this ranges from low level information contained in fine features to high level, semantic information contained in deeper features. More specifically, given a set of features extracted from a reference image, our model generates diverse image samples, each with matching features at each semantic level of the classification model. We demonstrate that our model results in a versatile and flexible framework that can be used in various classic and novel image generation tasks. These include: generating images with a controllable extent of semantic similarity to a reference image, and different manipulation tasks such as semantically-controlled inpainting and compositing; all achieved with the same model, with no further training.

[CV paper 10/10]

Why you may want to read this: Newest paper from Ram Nevatia (), Leonidas J. Guibas (Professor of Computer Science, Stanford University).

Curriculum DeepSDF.

Yueqi Duan, Haidong Zhu, He Wang, Li Yi, Ram Nevatia, Leonidas J. Guibas

When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions. In this paper, we design a "shape curriculum" for learning continuous Signed Distance Function (SDF) on shapes, namely Curriculum DeepSDF. Inspired by how humans learn, Curriculum DeepSDF organizes the learning task in ascending order of difficulty according to the following two criteria: surface accuracy and sample difficulty. The former considers stringency in supervising with ground truth, while the latter regards the weights of hard training samples near complex geometry and fine structure. More specifically, Curriculum DeepSDF learns to reconstruct coarse shapes at first, and then gradually increases the accuracy and focuses more on complex local details. Experimental results show that a carefully-designed curriculum leads to significantly better shape reconstructions with the same training data, training epochs and network architecture as DeepSDF. We believe that the application of shape curricula can benefit the training process of a wide variety of 3D shape representation learning methods.

ArXiv Weekly: 10 ML Papers You May Want to Read

[ML paper 1/10]

Why you may want to read this: Newest paper from Michael I. Jordan (Professor of EECS and Professor of Statistics, University of California, Berkeley).

Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information.

Esther Rolf, Michael I. Jordan, Benjamin Recht

Observational data are often accompanied by natural structural indices, such as time stamps or geographic locations, which are meaningful to prediction tasks but are often discarded. We leverage semantically meaningful indexing data while ensuring robustness to potentially uninformative or misleading indices. We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction. Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks, with no need to retrain models. Our theoretical analysis details simple conditions under which post-estimation smoothing will improve accuracy over that of the original predictor. Our experiments on large scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice. Together, these results illuminate a novel way to consider and incorporate the natural structure of index variables in machine learning.

[ML paper 2/10]

Why you may want to read this: Newest paper from Maurizio Pierini (CERN), Zhenbin Wu (Baylor University, University of Illinois at Chicago).

Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML.

Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Duc Hoang, Sergo Jindariani, Edward Kreinar, Mia Liu, Vladimir Loncar, Jennifer Ngadiuba, Kevin Pedro, Maurizio Pierini, Dylan Rankin, Sheila Sagear, Sioni Summers, Nhan Tran, Zhenbin Wu

We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with FPGA firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.

 [ML paper 3/10]

Why you may want to read this: Newest paper from Soo-Young Lee (Korea Advanced Institute ofScience and Technology).

Semi-supervised Disentanglement with Independent Vector Variational Autoencoders.

Bo-Kyeong Kim, Sungjin Park, Geonmin Kim, Soo-Young Lee

We aim to separate the generative factors of data into two latent vectors in a variational autoencoder. One vector captures class factors relevant to target classification tasks, while the other vector captures style factors relevant to the remaining information. To learn the discrete class features, we introduce supervision using a small amount of labeled data, which can simply yet effectively reduce the effort required for hyperparameter tuning performed in existing unsupervised methods. Furthermore, we introduce a learning objective to encourage statistical independence between the vectors. We show that (i) this vector independence term exists within the result obtained on decomposing the evidence lower bound with multiple latent vectors, and (ii) encouraging such independence along with reducing the total correlation within the vectors enhances disentanglement performance. Experiments conducted on several image datasets demonstrate that the disentanglement achieved via our method can improve classification performance and generation controllability.

 [ML paper 4/10]

Why you may want to read this: Newest paper from Georgios B. Giannakis (Endowed Chair Prof., Dept. of ECE and DTC, University of Minnesota).

Tensor Graph Convolutional Networks for Multi-relational and Robust Learning.

Vassilis N. Ioannidis, Antonio G. Marques, Georgios B. Giannakis

The era of "data deluge" has sparked renewed interest in graph-based learning methods and their widespread applications ranging from sociology and biology to transportation and communications. In this context of graph-aware methods, the present paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs, that are represented by a tensor. Key aspects of the novel TGCN architecture are the dynamic adaptation to different relations in the tensor graph via learnable weights, and the consideration of graph-based regularizers to promote smoothness and alleviate over-parameterization. The ultimate goal is to design a powerful learning architecture able to: discover complex and highly nonlinear data associations, combine (and select) multiple types of relations, scale gracefully with the graph size, and remain robust to perturbations on the graph edges. The proposed architecture is relevant not only in applications where the nodes are naturally involved in different relations (e.g., a multi-relational graph capturing family, friendship and work relations in a social network), but also in robust learning setups where the graph entails a certain level of uncertainty, and the different tensor slabs correspond to different versions (realizations) of the nominal graph. Numerical tests showcase that the proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.

[ML paper 5/10]

Why you may want to read this: Newest paper from Klaus-Robert Müller (Professor for Machine Learning, TU Berlin, Germany and Korea University, Seoul, Korea …).

Toward Interpretable Machine Learning: Transparent Deep Neural Networks and Beyond.

Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J. Anders, Klaus-Robert Müller

With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for explainable AI. Interpretability and explanation methods for gaining a better understanding about the problem solving abilities and strategies of nonlinear Machine Learning such as Deep Learning (DL), LSTMs, and kernel methods are therefore receiving increased attention. In this work we aim to (1) provide a timely overview of this active emerging field and explain its theoretical foundations, (2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations, (3) outline best practice aspects i.e. how to best include interpretation methods into the standard usage of machine learning and (4) demonstrate successful usage of explainable AI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of machine learning.

 [ML paper 6/10]

Why you may want to read this: Newest paper from Richard Hartley (Australian National University, National ICT Australia (NICTA)).

Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks.

Amir Rahimi, Amirreza Shaban, Ching-An Cheng, Byron Boots, Richard Hartley

Predicting calibrated confidence scores for multi-class deep networks is important for avoiding rare but costly mistakes. A common approach is to learn a post-hoc calibration function that transforms the output of the original network into calibrated confidence scores while maintaining the network's accuracy. However, previous post-hoc calibration techniques work only with simple calibration functions, potentially lacking sufficient representation to calibrate the complex function landscape of deep networks. In this work, we aim to learn general post-hoc calibration functions that can preserve the top-k predictions of any deep network. We call this family of functions intra order-preserving functions. We propose a new neural network architecture that represents a class of intra order-preserving functions by combining common neural network components. Additionally, we introduce order-invariant and diagonal sub-families, which can act as regularization for better generalization when the training data size is small. We show the effectiveness of the proposed method across a wide range of datasets and classifiers. Our method outperforms state-of-the-art post-hoc calibration methods, namely temperature scaling and Dirichlet calibration, in multiple settings.

[ML paper 7/10]

Why you may want to read this: Newest paper from Philip Torr (Professor, University of Oxford).

Deep Multi-Agent Reinforcement Learning for Decentralized Continuous Cooperative Control.

Christian Schroeder de Witt, Bei Peng, Pierre-Alexandre Kamienny, Philip Torr, Wendelin Böhmer, Shimon Whiteson

Deep multi-agent reinforcement learning (MARL) holds the promise of automating many real-world cooperative robotic manipulation and transportation tasks. Nevertheless, decentralised cooperative robotic control has received less attention from the deep reinforcement learning community, as compared to single-agent robotics and multi-agent games with discrete actions. To address this gap, this paper introduces Multi-Agent Mujoco, an easily extensible multi-agent benchmark suite for robotic control in continuous action spaces. The benchmark tasks are diverse and admit easily configurable partially observable settings. Inspired by the success of single-agent continuous value-based algorithms in robotic control, we also introduce COMIX, a novel extension to a common discrete action multi-agent Q-learning algorithm. We show that COMIX significantly outperforms state-of-the-art MADDPG on a partially observable variant of a popular particle environment and matches or surpasses it on Multi-Agent Mujoco. Thanks to this new benchmark suite and method, we can now pose an interesting question: what is the key to performance in such settings, the use of value-based methods instead of policy gradients, or the factorisation of the joint Q-function? To answer this question, we propose a second new method, FacMADDPG, which factors MADDPG's critic. Experimental results on Multi-Agent Mujoco suggest that factorisation is the key to performance.

[ML paper 8/10]

Why you may want to read this: Newest paper from Martin Ester (Professor of Computer Science, Simon Fraser University).

ParKCa: Causal Inference with Partially Known Causes.

Raquel Aoki, Martin Ester

Causal Inference methods based on observational data are an alternative for applications where collecting the counterfactual data or realizing a more standard experiment is not possible. In this work, our goal is to combine several observational causal inference methods to learn new causes in applications where some causes are well known. We validate the proposed method on The Cancer Genome Atlas (TCGA) dataset to identify genes that potentially cause metastasis.

[ML paper 9/10]

Why you may want to read this: Newest paper from Pramod K. Varshney (Distinguished Professor of EECS, Syracuse University).

Anomalous Instance Detection in Deep Learning: A Survey.

Saikiran Bulusu, Bhavya Kailkhura, Bo Li, Pramod K. Varshney, Dawn Song

Deep Learning (DL) is vulnerable to out-of-distribution and adversarial examples resulting in incorrect outputs. To make DL more robust, several posthoc anomaly detection techniques to detect (and discard) these anomalous samples have been proposed in the recent past. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection for DL based applications. We provide a taxonomy for existing techniques based on their underlying assumptions and adopted approaches. We discuss various techniques in each of the categories and provide the relative strengths and weaknesses of the approaches. Our goal in this survey is to provide an easier yet better understanding of the techniques belonging to different categories in which research has been done on this topic. Finally, we highlight the unsolved research challenges while applying anomaly detection techniques in DL systems and present some high-impact future research directions.

 [ML paper 10/10]

Why you may want to read this: Newest paper from Babak Hassibi (Mose and Lilian S. Bohn Professor of Electrical Engineering).

Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems.

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and propose a new approach for closed-loop system identification, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt, and prove the regret upper bound of \tilde{\mathcal{O}}(\sqrt{T}) for adaptive control of linear quadratic Gaussian (LQG) systems, where T is the time horizon of the problem.


ArXiv Weekly Radiostation
ArXiv Weekly Radiostation

Weekly selection and podcast of the latest CV,NLP, ML papers.