AI Weekly Update Preview — April 12th, 2021

This article presents a few salient quotes from each of the papers that will be covered on the next AI Weekly Update on Henry AI Labs!

Major Themes of the Latest Papers

  • Self-Supervised Learning
  • Vision-Language Learning
  • Generative Modeling (mostly GANs)
  • Meta-Learning
  • NLP
  • Generalization
  • Model-based RL
  • Code Examples (GPT-Neo, RAPIDS+Determined, PT Lightning+DeepSpeed)
  • Meta (Ruder Newsletter, Commentary on Medical AI approval)

Self-Supervised Learning

Large-scale forecasting: Self-supervised learning framework for hyperparameter tuning

  1. “Offline training. A classifier (self-supervised learner) is trained with the data from Step (1), where the input feature (predictor) is the time-series feature and the label is the best-performing model.
  2. Online model prediction. In our online services, for a new time series, we first extract features, then make inference with our pre-trained classifier, such as a random forest.”
  1. “Offline training. A multi-task neural network (self-supervised learner) is trained with the datasets from Step (1) for each model.
  2. Online hyper-parameter tuning. In our online system, for a new time series, we first extract features, then make inference with our pre-trained multi-task neural network.”
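The quoted two-stage flow (offline training on features of historical series, online inference for a new series) can be sketched in a few lines. Everything below is illustrative, not the paper's code: the features are toy stand-ins, and a nearest-neighbour lookup stands in for the paper's random-forest classifier.

```python
import math

def extract_features(series):
    """Toy, scale-invariant features: normalized trend and volatility."""
    rng = (max(series) - min(series)) or 1
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return ((series[-1] - series[0]) / rng,
            sum(diffs) / len(diffs) / rng)

# Offline training: historical series labelled with their best-performing model.
history = [
    ([1, 2, 3, 4, 5, 6], "trend_model"),     # steadily trending series
    ([6, 5, 6, 5, 6, 5], "seasonal_model"),  # oscillating series
]
train = [(extract_features(s), label) for s, label in history]

def predict_best_model(series):
    """Online inference: extract features, return the closest label."""
    f = extract_features(series)
    return min(train, key=lambda row: math.dist(f, row[0]))[1]

best = predict_best_model([10, 20, 30, 40, 50, 60])  # trending, larger scale
```

Because the toy features are scale-invariant, the larger-scale trending query still matches the trending training series.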
“We notice that a sudden change of gradients (a ‘spike’ in Fig. 4) causes a ‘dip’ in the training curve.”
  • Based on this observation, we hypothesize that the instability happens earlier in the shallower layers.
  • Motivated by this, we explore freezing the patch projection layer during training.
  • We use a fixed random patch projection layer to embed the patches, which is not learned.
  • This can be easily done by applying a stop-gradient operation right after this layer.
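The bullets above describe a fixed random patch projection with a stop-gradient right after it. A minimal PyTorch sketch of that idea follows; the dimensions and the `Linear` projection are illustrative choices, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class FrozenPatchEmbed(nn.Module):
    """Fixed random patch projection: weights are never updated."""
    def __init__(self, patch_dim=48, embed_dim=64):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)
        for p in self.proj.parameters():
            p.requires_grad = False  # keep the random initialization

    def forward(self, patches):
        # detach() acts as the stop-gradient right after the layer:
        # downstream layers still train, but nothing flows back here.
        return self.proj(patches).detach()

embed = FrozenPatchEmbed()
patches = torch.randn(8, 16, 48)  # (batch, num_patches, patch_dim)
out = embed(patches)              # (8, 16, 64)
```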

Vision-Language Learning

Towards General Purpose Vision Systems

“GPV-I can be trained end-to-end on any task that demands a box or text output without any architecture modifications such as adding a new task-head.”
  1. “Generality of concepts across skills: The system can perform tasks in skill-concept combinations not seen during training (e.g. localize ‘muskrat’ after learning to answer questions about ‘muskrats’)
  2. Generality of learning: The system can learn new tasks sample-efficiently with minimal loss to performance on previously learned tasks”
“GPV-I consists of a visual encoder, language encoder, vision-language co-attention module, and output heads for the supported output modalities — boxes, relevance scores, and text.”
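A toy PyTorch sketch of the quoted layout — visual encoder, language encoder, co-attention, and the three output heads. Every size, module choice, and stand-in encoder here is a placeholder, not GPV-I's actual implementation.

```python
import torch
import torch.nn as nn

class TinyGPV(nn.Module):
    """Minimal sketch: encoders, co-attention, box/relevance/text heads."""
    def __init__(self, d=64, vocab=1000):
        super().__init__()
        self.vis_enc = nn.Linear(2048, d)        # stand-in visual encoder
        self.lang_enc = nn.Embedding(vocab, d)   # stand-in language encoder
        self.co_attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.box_head = nn.Linear(d, 4)          # box coordinates
        self.rel_head = nn.Linear(d, 1)          # relevance score
        self.text_head = nn.Linear(d, vocab)     # text logits

    def forward(self, regions, tokens):
        v = self.vis_enc(regions)                # (B, R, d)
        t = self.lang_enc(tokens)                # (B, T, d)
        fused, _ = self.co_attn(v, t, t)         # vision attends to language
        return self.box_head(fused), self.rel_head(fused), self.text_head(fused)

model = TinyGPV()
boxes, rel, text = model(torch.randn(2, 5, 2048), torch.randint(0, 1000, (2, 7)))
```

Because all three heads read the same fused representation, a new task needing a box or text output reuses the existing heads rather than adding a new one — the property the first quote highlights.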

Generative Modeling

Regularizing Generative Adversarial Networks under Limited Data

  1. “Complements the recent data augmentation methods.”
  • “Present the diverse local reference style”


Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

Natural Language Processing

What will it Take to Fix Benchmarking in Natural Language Understanding?

Proposed criteria for future NLP benchmarks
Example of a prompt (red text) applied to a sample of the BoolQ dataset
  • “How are prompts related to standard ML supervision?
  • Do they react differently to adversarial or out-of-domain examples, since they have some zero-shot behaviour?”
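For readers unfamiliar with prompting, here is a minimal sketch of applying a template to a BoolQ-style example, in the spirit of the figure caption above. The template wording is made up; BoolQ examples do carry `passage` and `question` fields.

```python
def boolq_prompt(passage, question):
    """Wrap a BoolQ example in a (made-up) yes/no prompt template."""
    return (f"{passage}\n"
            f"Question: {question}?\n"
            "Answer (yes or no):")

sample = {  # illustrative example in BoolQ's passage/question format
    "passage": "The Matrix was released in cinemas in 1999.",
    "question": "was the matrix released in the 1990s",
}
prompt = boolq_prompt(sample["passage"], sample["question"])
```

The prompt turns the classification example into a natural-language completion task, which is what gives prompted models their partial zero-shot behaviour.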

Deep Learning with Code Data

CodeTrans: Towards Cracking the Language of Silicone’s Code Through Self-Supervised Deep Learning and High Performance Computing

  • “Source Code Summarization
  • Code Comment Generation
  • Git Commit Message Generation
  • API Sequence Recommendation
  • Program Synthesis.”


Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

Model-Based RL

Debugging Deep Model-based Reinforcement Learning Systems

  1. “Core things to tinker with in these systems
  2. Other considerations that may come up (e.g. when working with robotics),
  3. Practical tips: quick things to change or run for a big potential improvement, and
  4. Conclusion”



Code Examples

  • Generate from “My name is Zack and I like to”
  • Generate from “Below is React code for a to-do list app:”
  • “Join these data sets into a denormalized DataFrame. This GPU-accelerated join is handled by cuDF.
  • Construct a PyTorch Dataset from the denormalized DataFrame.
  • Train with Determined!”
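A rough sketch of the three quoted steps, using pandas as a CPU stand-in for cuDF (cuDF mirrors much of the pandas `merge` API; on GPU you would import cuDF instead). All table and column names are invented, and the Determined training step is only indicated in a comment.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset

# Step 1: GPU-accelerated join (cuDF in the original; pandas stand-in here).
users = pd.DataFrame({"user_id": [1, 2], "age": [30, 40]})
events = pd.DataFrame({"user_id": [1, 1, 2], "clicked": [0, 1, 1]})
denorm = events.merge(users, on="user_id", how="left")  # denormalized frame

# Step 2: wrap the denormalized DataFrame in a PyTorch Dataset.
class FrameDataset(Dataset):
    def __init__(self, df, label_col="clicked"):
        self.X = torch.tensor(df.drop(columns=[label_col]).values,
                              dtype=torch.float32)
        self.y = torch.tensor(df[label_col].values, dtype=torch.float32)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, i):
        return self.X[i], self.y[i]

ds = FrameDataset(denorm)
# Step 3: hand `ds` to a Determined trial's data loader and train.
```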


Sebastian Ruder’s Newsletter

  • Char Wars
  • Speech-first NLP
  • Virtual conference ideas

Check out my Deep Learning YouTube Channel!