Microsoft Research has a recent paper (Machine Teaching: A New Paradigm for Building Machine Learning Systems) that explores the eventual evolution of Machine Learning. The paper makes a clear distinction between Machine Learning and Machine Teaching. The authors explain that Machine Learning is what is practiced in research organizations and Machine Teaching is what will eventually be practiced by engineering organizations. The teaching perspective not only differs from the learning perspective, it also has an obvious advantage in that the disentanglement of concepts is known a priori.
The paper concludes with three key developments that will be required by Machine Teaching to make progress:
To truly meet this demand, we need to advance the discipline of machine teaching. This shift is identical to the shift in the programming field in the 1980s and 1990s. This parallel yields a wealth of benefits. This paper takes inspiration from three lessons from the history of programming.
The first one is problem decomposition and modularity, which has allowed programming to scale with complexity.
The second lesson is the standardization of programming languages: write once, run everywhere.
The final lesson is the process discipline, which includes separation of concerns, and the building of standard tools and libraries.
If you have been actively following this blog, it should be apparent by now that it has a distinctly software engineering spin on the application of Deep Learning technology. We are inundated on a daily basis with astonishing discoveries in Deep Learning. To avoid being overwhelmed, we specifically seek the kinds of discoveries that can lead to accelerated development of Deep Learning solutions. This accelerated development, as the paper above also alludes to, will likely mirror the history of Software Engineering.
With any new science or technology, we humans try to fit new concepts into a framework that is more familiar. Deep Learning is one of those newer sciences that many experts have trouble getting a good grasp of. This is because we understand neither how it works nor where the limits of the technology lie. Our collective theoretical understanding of the field is in its infancy. Most progress has been spearheaded by experimentation and not theory.
Software Engineering (SE) practices have been developed over the past several decades with the primary objective of controlling complexity. SE is driven by the goal of ‘keeping reasoning under control’. That is, the practice of SE focuses on information boundaries, separation of concerns, modularity and composition to build systems that we can evolve in the context of increasing complexity. Software engineering recognizes that different components of a system evolve over time at different rates. The principle of loose coupling is what makes this manageable, and I have written about it earlier in the context of Deep Learning.
Peter Norvig of Google has a short video on the difference between conventional software engineering and this new paradigm of Deep Learning development:
Monolithic Deep Learning networks that are trained end-to-end, as we typically find today, are so complex that we are incapable of interpreting their inference or behavior. Recent research has shown that an incremental training approach is viable. Networks have been demonstrated to work well when smaller units are trained first and then combined to perform more complex behavior. Google’s DeepMind and Microsoft’s Maluuba have made significant progress this year in this line of research.
Enabling Software Engineering practices in the realm of Deep Learning requires mechanisms that support modularity. This is still a topic of research, and there are many recent advances in this area. The key areas are Domain Adaptation, Transfer Learning, Meta-learning, Multi-objective systems and Curriculum Learning.
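To make the idea of training small units and then combining them concrete, here is a minimal sketch in plain Python. It is only an illustration and not the method used by DeepMind or Maluuba: the `Unit` class, the `trained_unit` helper and the choice of logic gates are all assumptions made for this example. Each unit is a single perceptron trained in isolation on one simple task (OR, NAND, AND). A single perceptron cannot represent XOR, but the three separately trained units can be wired together to compute it.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[int], int]  # (inputs, target in {0, 1})
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


class Unit:
    """Hypothetical small module: one perceptron with a step activation."""

    def __init__(self, n_inputs: int) -> None:
        self.weights = [0.0] * n_inputs
        self.bias = 0.0

    def predict(self, x: Sequence[int]) -> int:
        activation = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if activation > 0 else 0

    def train(self, examples: List[Example], epochs: int = 25) -> None:
        # Classic perceptron rule: nudge the weights only when the unit is wrong.
        for _ in range(epochs):
            for x, target in examples:
                error = target - self.predict(x)
                self.weights = [w + error * xi for w, xi in zip(self.weights, x)]
                self.bias += error


def trained_unit(truth_table: Sequence[int]) -> Unit:
    """Train a fresh unit, in isolation, on one two-input truth table."""
    unit = Unit(n_inputs=2)
    unit.train(list(zip(INPUTS, truth_table)))
    return unit


# Step 1: train each small unit separately on a simple, separable task.
or_unit = trained_unit([0, 1, 1, 1])
nand_unit = trained_unit([1, 1, 1, 0])
and_unit = trained_unit([0, 0, 0, 1])


# Step 2: combine the trained units, with no further training, into XOR.
def xor(x: Sequence[int]) -> int:
    return and_unit.predict((or_unit.predict(x), nand_unit.predict(x)))


if __name__ == "__main__":
    for x in INPUTS:
        print(x, xor(x))
```

Because each unit has a narrow, known job, its behavior can be checked on its own before it is composed with the others, which is the kind of interpretability a monolithic end-to-end network lacks.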
François Chollet, the developer of Keras, recently wrote a piece titled “The Future of Deep Learning” in which he makes some speculative predictions. He writes:
At a high-level, the main directions in which I see promise are:
Models closer to general-purpose computer programs, built on top of far richer primitives than our current differentiable layers.
Models that require less involvement from human engineers — it shouldn’t be your job to tune knobs endlessly.
Greater, systematic reuse of previously learned features and architectures; meta-learning systems based on reusable and modular program subroutines.
All of this reflects the current pain points of Deep Learning development, which is still extremely experimental, and the desire for higher abstractions that lead to increased productivity.
Although Chollet starts his presentation from the perspective of a programmer, he concludes with the idea of ‘growing’ a system. Deep Learning systems will most likely not be programmed in the manner that we program today. Rather, building them will be more like working with a biological system, where we purposely condition the system to achieve our objectives.
The Japanese have an art form called Bonsai in which miniature trees are grown. Bonsai doesn’t use genetically dwarfed trees; rather, it uses cultivation techniques like pruning and grafting to create trees that mimic full-grown trees in miniature. Wired has an article, “Soon We Won’t Program Computers. We’ll Train Them Like Dogs”, that alludes to the change in paradigm from coding to teaching.
So rather than having a library of modular programs that we compose together, we will have a library of teaching programs that we compose together to train a new system.
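What might a library of composable teaching programs look like? The following is a toy sketch under stated assumptions, not an established API: `CentroidLearner`, `teach_concept`, `compose` and `LESSON_LIBRARY` are hypothetical names invented for this illustration, and the learner is a deliberately trivial nearest-centroid classifier rather than a Deep Learning model. The point is only the shape of the idea: a lesson is a reusable function that teaches one concept, and lessons compose into a course the same way functions compose into a program.

```python
from typing import Callable, Dict, List, Sequence


class CentroidLearner:
    """Toy learner: remembers the mean of the examples seen for each label."""

    def __init__(self) -> None:
        self.sums: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def observe(self, x: Sequence[float], label: str) -> None:
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + xi for s, xi in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x: Sequence[float]) -> str:
        # Return the label whose centroid is closest to x.
        def squared_distance(label: str) -> float:
            n = self.counts[label]
            return sum((xi - s / n) ** 2 for xi, s in zip(x, self.sums[label]))

        return min(self.counts, key=squared_distance)


# A "teaching program" is just a function that teaches a learner something.
Lesson = Callable[[CentroidLearner], None]


def teach_concept(label: str, examples: List[Sequence[float]]) -> Lesson:
    """Build a reusable lesson that teaches a single concept."""
    def lesson(learner: CentroidLearner) -> None:
        for x in examples:
            learner.observe(x, label)
    return lesson


def compose(*lessons: Lesson) -> Lesson:
    """Combine lessons into a course that delivers them in order."""
    def course(learner: CentroidLearner) -> None:
        for lesson in lessons:
            lesson(learner)
    return course


# Hypothetical library of lessons, written once and reused across courses.
LESSON_LIBRARY: Dict[str, Lesson] = {
    "small": teach_concept("small", [(1.0, 1.0), (2.0, 1.5)]),
    "medium": teach_concept("medium", [(5.0, 5.0), (6.0, 4.5)]),
    "large": teach_concept("large", [(9.0, 8.0), (10.0, 9.5)]),
}

if __name__ == "__main__":
    course = compose(*LESSON_LIBRARY.values())
    student = CentroidLearner()
    course(student)
    print(student.predict((5.5, 5.0)))
```

A different system can be taught by picking a different subset of the same library, for example `compose(LESSON_LIBRARY["small"], LESSON_LIBRARY["large"])`, without rewriting any lesson.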
The second lesson from the history of programming that the Microsoft researchers allude to is the need for a universal machine that permits the easy porting of Deep Learning models to different servers or devices. I have written previously about the current developments in Deep Learning Virtual Machines. The most active projects in this space are Google’s TensorFlow XLA project and Intel Nervana’s NNVM project. In the next few years, we will see the introduction of specialized Deep Learning hardware from many companies (see: GraphCore, Wave Computing, Groq, Fujitsu DLU, Microsoft HPU etc.). This new hardware can be exploited only if adequate high-level frameworks are available. Many hardware vendors will likely be hit by the brutal reality that they need to invest significantly in porting existing Deep Learning frameworks to their hardware. Targeting a universal virtual machine is the easiest route to this. Unfortunately, the present reality is that this approach is very far from being ready.
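The appeal of a universal virtual machine can be sketched in a few lines. The example below is a hypothetical illustration and does not reflect the actual design or API of XLA or NNVM: the three-op intermediate representation and the functions `run_interpreter` and `compile_to_python` are assumptions made up for this sketch. A model is described once as data, and each "backend" only has to understand that description, not the framework that produced it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical intermediate representation (IR): a model is an ordered list
# of (operation, parameter) steps applied element-wise to a vector.
Program = List[Tuple[str, Optional[float]]]

model: Program = [
    ("scale", 2.0),
    ("add_bias", -1.0),
    ("relu", None),
]


def run_interpreter(program: Program, inputs: Sequence[float]) -> List[float]:
    """Backend 1: walk the IR one operation at a time."""
    values = list(inputs)
    for op, param in program:
        if op == "scale":
            values = [v * param for v in values]
        elif op == "add_bias":
            values = [v + param for v in values]
        elif op == "relu":
            values = [max(0.0, v) for v in values]
        else:
            raise ValueError(f"unsupported op: {op}")
    return values


def compile_to_python(program: Program) -> Callable[[Sequence[float]], List[float]]:
    """Backend 2: fuse all ops into one expression and compile it once."""
    expr = "v"
    for op, param in program:
        if op == "scale":
            expr = f"({expr} * {param!r})"
        elif op == "add_bias":
            expr = f"({expr} + {param!r})"
        elif op == "relu":
            expr = f"max(0.0, {expr})"
        else:
            raise ValueError(f"unsupported op: {op}")
    source = f"def fused(inputs):\n    return [{expr} for v in inputs]\n"
    namespace: dict = {}
    exec(source, namespace)  # stands in for code generation on a real target
    return namespace["fused"]


if __name__ == "__main__":
    data = [-1.0, 0.5, 2.0]
    print(run_interpreter(model, data))
    print(compile_to_python(model)(data))
```

In this picture a hardware vendor would write one new backend for the shared representation instead of porting every framework separately. The hard part in practice, and the reason the approach is not yet ready, is agreeing on a representation rich enough for real models.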
The final lesson from the Microsoft Research paper is the need for process methodology. Most of what has been explored to date focuses on the training of Deep Learning systems (see: “Best Practices for Training Deep Learning Networks”). There is very little on a process method for ‘Teaching’. This is of course understandable, because our “teaching methods” are still being discovered in Deep Learning laboratories. I predict that it will take at least a year for these tools to reach the level of maturity required by engineering teams.
Back in 2012, Data Science was labeled the sexiest job of the 21st century. That prediction was of course made before Deep Learning emerged onto the scene. The sexiest job of the 21st century is likely to be teaching: not teaching humans, but teaching automation to perform the jobs that need to be done. With this, permit me the luxury of coining a new term: “Deep Teaching.”