Programming languages, such as Python or Java, have become a core application area of Deep Learning. OpenAI and GitHub recently unveiled “Copilot”, along with a paper describing the technology and the underlying “Codex” models. Copilot is powered by taking the GPT-3 language modeling show on the road to datasets made of code. These datasets are typically scraped from open-source code repositories on GitHub. Platforms that help prospective Software Engineers prepare for coding interviews have also been used, with Codeforces as a notable source of this data. More particularly Codex collects a filtered 159 GB data…


This article presents a few salient quotes from each of the papers that will be covered on the next AI Weekly Update on Henry AI Labs!

Major Themes of the Latest Papers

  • Self-Supervised Learning
  • Vision-Language Learning
  • Generative Modeling (mostly GANs)
  • Meta-Learning
  • NLP
  • Generalization
  • Model-based RL
  • Code Examples (GPT-Neo, RAPIDS+Determined, PT Lightning+DeepSpeed)
  • Meta (Ruder Newsletter, Commentary on Medical AI approval)

Self-Supervised Learning

Large-scale forecasting: Self-supervised learning framework for hyperparameter tuning

“The SSL-HPT algorithm estimates hyperparameters 6–20x faster when compared with baseline search-based algorithms, while producing comparably accurate forecasting results in various applications.”

“Most existing hyperparameter tuning methods — such as grid search, random search, and Bayesian optimal search —…


Computer Vision, taken over by Transformers

Dear Readers,

Thank you for checking out the AI Weekly Update Newsletter from Henry AI Labs! This newsletter tours updates in Deep Learning and Artificial Intelligence, providing quotes and images that tell each story.

I am working on publishing my first experimental paper in Contrastive Learning. More than anything else, this has really tested my ability to manage a large code repository. This inspired a quick video explaining why (in my opinion) you should get away from exclusively writing code in Jupyter notebooks as soon as possible.

I’ve also started a cohort to walk through MIT’s open-source “Machine Learning for Healthcare” course.


Scientific overload is one of the toughest challenges facing scientists today. As Machine Learning researchers, we constantly complain about the fast pace of arXiv uploads and praise organization tools like Arxiv Sanity Preserver. The scientific response to COVID-19 is another example of information overload. The CORD-19 dataset documents over 100K papers containing relevant information. No single person or group of people could be expected to interpret this amount of information.

We need better search engines for Scientific Papers. Deep Learning powered search engines, question answering systems, or even chatbots and summarizers seem possible. We have data, and we have a…


This article will explain an exciting development in Natural Language Processing. The paper presents a Semi-Supervised Learning algorithm that significantly improves RoBERTa’s performance with Self-Training. If you prefer a video explanation of the paper, please check this out!

Transfer Learning has been extremely successful in Deep Learning. The term describes initializing a Deep Neural Network with weights learned from another task. In Computer Vision, this other task is commonly ImageNet Supervised Learning. In Natural Language Processing, this other task is commonly Self-Supervised Language Modeling with an internet-scale corpus.
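As a rough illustration of what "initializing with weights learned from another task" means, here is a minimal sketch in plain Python. It assumes a toy model represented as a dictionary of named weight lists; the helper names `init_params` and `transfer` and the layer names are hypothetical and do not come from any framework.

```python
import random


def init_params(layer_sizes, seed):
    """Randomly initialize one weight list per named layer (toy stand-in for a network)."""
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 0.1) for _ in range(size)]
        for name, size in layer_sizes.items()
    }


def transfer(pretrained, target, skip=("head",)):
    """Copy pretrained weights into target for shared, same-sized layers not listed in skip."""
    transferred = []
    for name, weights in pretrained.items():
        if name in skip or name not in target:
            continue  # task-specific or missing layers keep their fresh initialization
        if len(weights) != len(target[name]):
            continue  # shapes differ, so the weights cannot be reused
        target[name] = list(weights)
        transferred.append(name)
    return transferred


# "Source task" model with a 1000-way head (an ImageNet-like assumption).
source = init_params({"conv1": 8, "conv2": 16, "head": 1000}, seed=0)
# New model for a 2-class task: same backbone layers, different head.
model = init_params({"conv1": 8, "conv2": 16, "head": 2}, seed=1)

copied = transfer(source, model)
print(copied)  # ['conv1', 'conv2']
```

The backbone layers start from the source task's weights, while the head keeps its random initialization and is learned on the new task.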

The success of Transfer Learning has inspired Deep Learning researchers to explore…


DeepMind’s MuZero algorithm reaches superhuman ability in 57 different Atari games. This article will explain the context leading up to it!

DeepMind recently released their MuZero algorithm, headlined by superhuman ability in 57 different Atari games.

Reinforcement Learning agents that can play Atari games are interesting because, in addition to a visually complex state space, agents playing Atari games don’t have a perfect simulator they can use for planning as in Chess, Shogi, and Go.

This need for a “perfect simulator” is one of the key limitations that keeps AlphaGo and its successors, such as AlphaGo Zero and AlphaZero, limited to Chess, Shogi, and Go and makes them unsuitable for certain real-world applications such as Robotic Control.

Reinforcement Learning problems are framed within…


This article explores changes made in StyleGAN2 such as weight demodulation, path length regularization and removing progressive growing!

The first version of the StyleGAN architecture yielded incredibly impressive results on the facial image dataset known as Flickr-Faces-HQ (FFHQ). The most impressive characteristic of these results, compared to early iterations of GANs such as Conditional GANs or DCGANs, is the high resolution (1024²) of the generated images. In addition to resolution, GANs are compared along dimensions such as the diversity of images generated (avoiding mode collapse) and a suite of quantitative metrics comparing real and generated images such as FID, Inception Score, and Precision and Recall.

Facial images generated from StyleGAN2

Fréchet Inception Distance (FID) is one of the most common automated metrics used…
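For reference, FID is conventionally computed by fitting a Gaussian to Inception features of the real images and another to those of the generated images, then measuring the Fréchet distance between the two. The symbols below follow the common convention and are not notation taken from this article:

```latex
\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
\]
```

Here \(\mu_r, \Sigma_r\) and \(\mu_g, \Sigma_g\) are the mean and covariance of the features for real and generated images respectively; lower values indicate that the two distributions are closer.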


GPU acceleration is commonly associated with Deep Learning. GPUs power Convolutional Neural Networks for Computer Vision and Transformers for Natural Language Processing. They do this through parallel computation, which makes them much faster than CPUs for certain tasks.

RAPIDS is expanding the utilization of GPUs by bringing traditional Machine Learning and Data Science algorithms, such as t-SNE or XGBoost, to GPUs.

This article will compare t-SNE implementations in RAPIDS cuml (GPU) and Sklearn (CPU); the result is 3 seconds vs. 30 minutes.

Hardware: This experiment was run on the Data Science PC by Digital Storm.

t-SNE visualization of intermediate CNN features

t-SNE is an algorithm for visualizing high-dimensional data. Shown above is an example of transforming the 512-dimensional vectors from intermediate CNN activations into 2-dimensional vectors. Each…


Before starting this article, I want to ease your skepticism about switching from pandas to RAPIDS cudf: cudf uses the same API as pandas!

RAPIDS is moving traditional Data Science workflows on tabular datasets to GPUs. Recently, George Seif posted an article on Towards Data Science showing that the RAPIDS cudf library can compute the mean of a column with 100M rows in 5.12 ms, versus 82.2 ms with pandas. This article will further explore the speedups achieved with RAPIDS and cudf in the context of feature engineering for the Kaggle NFL Data Bowl challenge. …


Data Science PC by Digital Storm

The recently announced Data Science PC from Digital Storm is a very interesting step forward for Artificial Intelligence and Deep Learning. This article will highlight the power of the PC’s 2 Titan RTX GPUs in tandem with the easy syntax of TensorFlow 2.0’s new Distributed Training API for Computer Vision applications! In this example, distributed training achieves a surprising ~2.3x speedup, averaging 63s / epoch vs. 143s / epoch when training on a single GPU on the same machine.
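The speedup can be checked directly from the two epoch times quoted above. The sketch below is plain Python with hypothetical helper names (`speedup`, `parallel_efficiency`); it is only the arithmetic, not the article's benchmarking code.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / accelerated_seconds


def parallel_efficiency(speedup_factor: float, num_devices: int) -> float:
    """Speedup divided by device count; 1.0 corresponds to perfectly linear scaling."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return speedup_factor / num_devices


single_gpu_epoch = 143.0  # seconds per epoch on one GPU (from the text)
two_gpu_epoch = 63.0      # seconds per epoch with distributed training on 2 GPUs

s = speedup(single_gpu_epoch, two_gpu_epoch)
e = parallel_efficiency(s, num_devices=2)
print(f"speedup: {s:.2f}x, efficiency: {e:.2f}")  # speedup: 2.27x, efficiency: 1.13
```

An efficiency above 1.0 means the scaling is better than linear in the number of GPUs, which is why the result is surprising.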

This end-to-end tutorial will build a binary image classifier to process video frames. The motivation behind…

Connor Shorten

Check out my Deep Learning YouTube Channel! https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw
