Kenneth W. Church
Thomas J. Watson Research Center
Abstract: When Minsky and Chomsky were at Harvard in the 1950s, they started out their careers questioning a number of machine learning methods that have since regained popularity. Minsky's Perceptrons was a reaction to neural nets, and Chomsky's Syntactic Structures was a reaction to n-gram language models. Many of their objections are being ignored and forgotten (perhaps for good reasons, and perhaps not). Future work ought to characterize what deep nets are good for (and what they are not). Can we come up with a theory of generative capacity for deep nets? How much more can we generate with more layers? In practice, deep nets have been effective in vision, speech, and machine translation, where (1) we have lots of data, (2) representations and scale don't matter much, and (3) nothing else has been all that effective. Conversely, deep nets are probably less appropriate where representations have been reasonably effective (e.g., symbolic calculus), or for large problems beyond finite-state complexity (e.g., sorting large lists, multiplying large matrices).