AI Weekly Update Preview — April 12th, 2021

Major Themes of the Latest Papers

  • Self-Supervised Learning
  • Vision-Language Learning
  • Generative Modeling (mostly GANs)
  • Meta-Learning
  • NLP
  • Generalization
  • Model-based RL
  • Code Examples (GPT-Neo, RAPIDS+Determined, PT Lightning+DeepSpeed)
  • Meta (Ruder Newsletter, Commentary on Medical AI approval)

Self-Supervised Learning

Large-scale forecasting: Self-supervised learning framework for hyperparameter tuning

  • Learn more about this bottleneck in a video I recently made explaining Determined’s ASHA algorithm and other ideas related to HP optimization.
  For model selection (SSL-MS):
  1. “Offline training data preparation. We obtain (a) time series features for each time series, and (b) the best performing model for each time series via offline exhaustive hyperparameter tuning.
  2. Offline training. A classifier (self-supervised learner) is trained with the data from Step (1), where the input feature (predictor) is the time series feature and the label is the best performing model.
  3. Online model prediction. In our online services, for a new time series data, we first extract features, then make inference with our pre-trained classifier, such as random forest.”
  For hyper-parameter tuning:
  1. “Offline training data preparation. Similar to SSL-MS, we also need to obtain the time series features, then perform offline exhaustive parameter tuning to get the best performing hyper-parameters for each model and data combination.
  2. Offline training. A multi-task neural network (self-supervised learner) is trained with the datasets from Step (1) for each model.
  3. Online hyper-parameters tuning. In our online system, for a new time series data, we first extract features, then make inference with our pre-trained multi-task neural network.”
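The model-selection recipe above can be sketched end-to-end on synthetic data. This is a minimal illustration, not the paper's actual pipeline: the feature set, candidate model names, and labels are all invented stand-ins, and a random forest plays the role of the pre-trained classifier.

```python
# Minimal sketch of the SSL-MS recipe: featurize series offline, label each
# with its "best model" (here assigned randomly in place of exhaustive
# tuning), train a classifier, then pick a model for a new series online.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
MODELS = ["arima", "prophet", "holt_winters"]  # illustrative candidates

def extract_features(series):
    """Toy time-series features: mean, std, lag-1 autocorrelation."""
    lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]
    return np.array([series.mean(), series.std(), lag1])

# Step 1: offline training data preparation.
train_series = [rng.normal(size=100).cumsum() for _ in range(200)]
X = np.stack([extract_features(s) for s in train_series])
y = rng.integers(len(MODELS), size=len(train_series))  # stand-in labels

# Step 2: offline training of the self-supervised learner (a classifier).
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Step 3: online prediction -- featurize a new series, pick a model.
new_series = rng.normal(size=100).cumsum()
best_model = MODELS[clf.predict([extract_features(new_series)])[0]]
print(best_model)
```

The point of the design is that the expensive exhaustive search happens once, offline; serving a new time series costs only a feature extraction and a single classifier inference.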
“We notice that a sudden change of gradients (a ‘spike’ in Fig. 4) causes a ‘dip’ in the training curve.”
  • “By comparing all layers’ gradients, we observe that the gradient spikes happen earlier in the first layer (patch projection), and are delayed by couples of iterations in the last layers (see Fig. 4).
  • Based on this observation, we hypothesize that the instability happens earlier in the shallower layers.
  • Motivated by this, we explore freezing the patch projection layer during training.
  • We use a fixed random patch projection layer to embed the patches, which is not learned.
  • This can be easily done by applying a stop-gradient operation right after this layer.”
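The fixed random patch projection can be shown in a few lines of PyTorch. This is a sketch of the idea, with illustrative dimensions rather than the paper's: the first layer is randomly initialized and never updated, and `.detach()` supplies the stop-gradient.

```python
# Frozen random patch projection with a stop-gradient, as described above.
import torch
import torch.nn as nn

class FrozenPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=64, patch_size=4):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        for p in self.proj.parameters():   # keep the random init, never train
            p.requires_grad = False

    def forward(self, x):
        # .detach() is the stop-gradient: no gradient flows into this layer.
        return self.proj(x).detach()

embed = FrozenPatchEmbed()
head = nn.Linear(64, 10)               # stand-in for the rest of the network

x = torch.randn(2, 3, 32, 32)
tokens = embed(x).flatten(2).mean(-1)  # (2, 64) pooled patch embeddings
loss = head(tokens).sum()
loss.backward()
# The head still trains; the patch projection receives no gradient at all.
print(head.weight.grad is not None, embed.proj.weight.grad)
```

Because the projection is the very first layer, cutting its gradient changes nothing downstream: every later layer still receives a full backward signal.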

Vision-Language Learning

Towards General Purpose Vision Systems

“GPV-I can be trained end-to-end on any task that demands a box or text output without any architecture modifications such as adding a new task-head.”
  1. “Generality of architecture: The system can learn and perform any task within a broad domain without change to network structure (e.g. learn to classify bird species, without adding new output heads, by re-using ability to encode images, interpret task from text, and produce words)
  2. Generality of concepts across skills: The system can perform tasks in skill-concept combinations not seen during training (e.g. localize ‘muskrat’ after learning to answer questions about ‘muskrats’)
  3. Generality of learning: The system can learn new tasks sample-efficiently with minimal loss to performance on previously learned tasks”
“GPV-I consists of a visual encoder, a language encoder, a vision-language co-attention module, and output heads for the supported output modalities: boxes, relevance scores, and text.”

Generative Modeling

Regularizing Generative Adversarial Networks under Limited Data

  1. “Improves the generalization performance and stabilizes the learning dynamics of GAN models under limited training data.
  2. Complements the recent data augmentation methods.”
  • “Preserve the underlying global structure of the target character
  • Present the diverse local reference style”


Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

Natural Language Processing

What will it Take to Fix Benchmarking in Natural Language Understanding?

Proposed criteria for future NLP benchmarks
Example of a prompt (red text) applied to a sample of the BoolQ dataset
  • “We believe that prompt-based fine-tuning should become a standard tool: especially for small- and middle-sized task-specific datasets, designing a prompt yourself is a small effort for a sizable data advantage.
  • Why is the same prompt worth 3500 MNLI data points but only 282 RTE data points?
  • How are prompts related to standard ML supervision?
  • Do they react differently to adversarial or out-of-domain examples, since they have some zero-shot behaviour?”
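To make the figure's idea concrete: a prompt wraps a raw dataset sample in natural-language scaffolding so a language model can fill in the answer. The template wording and the sample below are my own illustrative stand-ins, not the exact figure from the paper.

```python
# Illustrative prompt template in the spirit of the BoolQ example above:
# the passage and yes/no question are wrapped in connective text, and the
# model predicts the answer at the <mask> position.
def boolq_prompt(passage, question):
    return f"{passage} Based on the previous passage, {question}? <mask>"

sample = {
    "passage": "Posthumous marriage is legal in France.",
    "question": "can you marry someone who has died in france",
}
prompt = boolq_prompt(sample["passage"], sample["question"])
print(prompt)
```

The "data advantage" claim is then measured by how many labeled examples a model trained without the prompt needs before it matches the prompted model.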

Deep Learning with Code Data

CodeTrans: Towards Cracking the Language of Silicon’s Code Through Self-Supervised Deep Learning and High Performance Computing

  • Code Documentation generation
  • Source Code Summarization
  • Code Comment Generation
  • Git Commit Message Generation
  • API Sequence Recommendation
  • Program Synthesis.”


Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

Model-Based RL

Debugging Deep Model-based Reinforcement Learning Systems

  1. “Overview of model-based RL,
  2. Core things to tinker with in these systems,
  3. Other considerations that may come up (e.g. when working with robotics),
  4. Practical tips: quick things to change or run for a big potential improvement, and
  5. Conclusion”



Code Examples

GPT-Neo

  • How to load GPT-Neo
  • Generate from “My name is Zack and I like to”
  • Generate from “Below is React code for a to-do list app:”
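A minimal sketch of the GPT-Neo steps above, assuming Hugging Face's `transformers` library (the walkthrough's actual code may differ). The 125M checkpoint keeps the download small; swap in "EleutherAI/gpt-neo-1.3B" or "EleutherAI/gpt-neo-2.7B" for better samples.

```python
# Load GPT-Neo via the transformers text-generation pipeline and sample
# continuations of the two prompts from the walkthrough.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

for prompt in ("My name is Zack and I like to",
               "Below is React code for a to-do list app:"):
    out = generator(prompt, max_new_tokens=30, do_sample=True)
    print(out[0]["generated_text"])
```

Since sampling is stochastic, the continuations will differ run to run; the pipeline returns the prompt plus the generated text in `generated_text`.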
RAPIDS + Determined

  • “Read location and historical sales CSVs into cuDF DataFrames residing in GPU memory.
  • Join these data sets into a denormalized DataFrame. This GPU-accelerated join is handled by cuDF.
  • Construct a PyTorch Dataset from the denormalized DataFrame.
  • Train with Determined!”
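The join-then-Dataset steps above can be sketched with pandas standing in for cuDF, since cuDF deliberately mirrors the pandas API (on a GPU box the usual swap is `import cudf` in place of `import pandas`). Column names and data here are invented for illustration.

```python
# Denormalize location + sales tables with a join, then wrap the result in
# an object implementing the PyTorch Dataset protocol (__len__/__getitem__).
import pandas as pd

locations = pd.DataFrame({"store_id": [1, 2], "city": ["Austin", "Boston"]})
sales = pd.DataFrame({"store_id": [1, 1, 2], "units": [10, 12, 7]})

# One wide frame: the location columns joined onto every sale row.
denorm = sales.merge(locations, on="store_id", how="left")

class SalesDataset:
    """Minimal PyTorch-style Dataset over the denormalized frame."""
    def __init__(self, df):
        self.units = df["units"].to_numpy()
    def __len__(self):
        return len(self.units)
    def __getitem__(self, i):
        return self.units[i]

ds = SalesDataset(denorm)
print(len(ds), ds[0])
```

The appeal of the RAPIDS version is that the CSV reads and the join never leave GPU memory, so the data is already resident where training happens.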


Sebastian Ruder’s Newsletter

  • ICLR 2021 Outstanding Papers
  • Char Wars
  • Speech-first NLP
  • Virtual conference ideas


