This repo contains links and instructional materials to help prepare for industry machine learning (ML) interviews (data/applied/research scientist, ML engineer, etc.). It is primarily aimed at Master's and Ph.D. students. While there are innumerable resources online for learning ML, this amalgamation of courses, papers, blog articles, and Twitter threads forms a daunting list of concepts and algorithms. The goal of this repo is to distill a much smaller list of topics that are useful for the recruiting process for graduate-level internships and full-time roles. Roughly, I see this as five topics.
- Statistics Basics: Theoretical and applied statistics concepts such as deriving the distribution of a random variable, testing for equality of predictive performance, etc. See statistics.md.
- Implementation Basics: Simple algorithms which you may be asked to implement on the spot. Expect, for example, to be able to implement $k$-nearest neighbors or $k$-means clustering from memory.
- Trade-offs: Enumeration of various trade-offs between desirable properties that come up in discussions, such as bias-variance, precision-recall, and accuracy-fairness. See tradeoffs.md.
- Advanced Topics: Modern methods and problems in machine learning. See CSE 599i taught during Autumn 2020 at the University of Washington for a survey of generative models, which will in turn cover many relevant topics in deep learning. Pay particular attention to the transformer architecture, and try to implement it yourself.
- Applied ML: Examples of a question in which the interviewer asks you to design an ML solution to create a business product. See applied.md.
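
To give a sense of the scale expected under Implementation Basics, here is a minimal sketch of $k$-nearest neighbors and $k$-means. It assumes NumPy, squared Euclidean distance, and non-negative integer class labels. The function names `knn_predict` and `kmeans`, the random initialization scheme, and the toy data are illustrative choices, not something taken from this repo.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Indices of the k closest training points for each query.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; assumes labels are non-negative integers (needed by bincount).
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


def assign_clusters(X, centroids):
    """Return the index of the closest centroid for every point in X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen uniformly at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        labels = assign_clusters(X, centroids)
        # Update step: mean of each cluster; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    return centroids, assign_clusters(X, centroids)


# Toy data: two well-separated groups of three points each.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

demo_predictions = knn_predict(X_demo, y_demo, np.array([[0.5, 0.5], [10.5, 10.5]]), k=3)
demo_centroids, demo_labels = kmeans(X_demo, k=2)
```

In an interview it is worth stating choices like these out loud: how ties and empty clusters are handled, why the distance matrix is computed with broadcasting rather than loops, and that $k$-means only finds a local optimum that depends on initialization.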
This material is a work in progress, and I am happy to receive corrections and feedback! Please see my website for contact information.