Code languages, such as Python or Java, have become a core application area of Deep Learning. OpenAI and GitHub have recently unveiled “Copilot” and the corresponding paper describing the technology and underlying “Codex” models. Copilot is powered by taking the GPT-3 language modeling show on the road to datasets made of code. These datasets are typically scraped from the GitHub repository of open-source code. Platforms used to help prospective Software Engineers prepare for the coding interview have also been used as well, with Codeforces as a notable source of this data. More particularly Codex collects a filtered 159 GB data…

Connor Shorten

Check out my Deep Learning YouTube Channel! https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store