You never know what a meeting for a quick coffee in Palo Alto can turn into.
What was supposed to be an ‘informal’ chat (if there is such a thing when talking with PhDs) about feedforward-feedback machine learning models turned into a philosophical discussion on duck-rabbit paradigm shifts.
(disclaimer 1: I’m just a nerd without credentials in either topic, though with a genuine interest)
First, the theory:
Thomas Kuhn described the nature of scientific revolutions back in 1962, in his book The Structure of Scientific Revolutions.
He was a contrarian at the time, as he redefined progress by moving from development-by-accumulation (on pre-established assumptions) to paradigm shifts: revolutions in scientific progress that come from looking into anomalies which imply a drastic change of assumptions.
In other words, Kuhn advocated for a change of rules over the pre-existing framework as the ultimate scientific progression method.
The ethos of this theory of scientific progress rests on identifying the right anomalies, the ones that support new paradigms. Anomalies come up as revolutions in disguise and ultimately (and I love this) expand on the previous paradigm, which ends up nested within the new one, remaining perfectly valid.
Anomalies create rejection by opposition (it’s a duck! no it’s not, it’s a rabbit!), but after the new paradigm takes over (…I can see the duck now?!) both paradigms co-exist (it’s a duck AND a rabbit! See the illustration above).
OK, enough of the lecture: where is the machine learning anomaly this click-bait headline was all about?
It’s coming, bear with me; it will be worth the read. While we get there, here are a couple of jaw-dropping no-longer-anomalies-but-paradigm-shifts: the first one sets out to explain the origin (and meaning) of life, the second one may redefine physics forever.
1. Dissipation-driven adaptation
Jeremy England. MIT, biophysics
This incredibly simple idea is intuitively so powerful, and makes so much sense, that it is difficult to resist. It explains Darwinian evolution and survival of the fittest, ultimately dwelling on the inherent reasons why life comes to exist.
At an intuition level, in Jeremy’s words:
Jeremy, an MIT researcher, has developed a mathematical model based on current physics, asserting that a given set of atoms, exposed to a continuous source of energy (e.g. the Sun) and surrounded by a heat bath (e.g. the ocean), will self-organize to dissipate energy in the most efficient way (e.g. life).
We, carbon-based lifeforms (in Spock’s Vulcan language), are much better at dissipating heat than inanimate objects. Both living and non-living matter show this efficiency-driven, self-organizing dissipation behavior.
Photosynthesis and self-replication (of RNA molecules, the precursor to DNA-based life) are consequences of dissipation-driven adaptation. Photosynthesis is about capturing sunlight energy, then transforming and storing it chemically (as sugar) so it can be transported and reprocessed for plant growth and replication (hence forests).
Don’t believe it yet? See for yourself: here is the experiment by Stanford professor Dr. Hubler on self-wiring ball bearings, an example of dissipation-driven reorganisation of the structure of matter.
2. Timeless physics
Julian Barbour. British Physicist. Quantum gravity.
Remember the school/college days? Speed = distance / time, power = energy / time, the derivatives of calculus (df/dt), Maxwell’s equations, Einstein’s relativity, thermodynamics, etc. In physics, anything dealing with change requires t (time) as a variable, doesn’t it? …maybe not anymore.
How does anyone dare to defy physics by removing time from centuries-old, proven equations?
If you think about it, time is just an abstraction we use to facilitate our understanding of how things (matter in particular) transition from one state to another (change). Because we live in a universe governed by the 2nd law of thermodynamics (with its ever-increasing entropy), we perceive linear time as our most reliable and dependable reference.
At an intuition level, if we look at the Universe as a simple but immense ‘cloud’ of matter in permanent change (motion) since the big bang occurred, and then reduce our view to atoms transitioning from one state to another, we could remove time entirely.
Our Universe could be viewed as a continuum of matter in ‘motion’ (actually, according to Barbour, not motion, but matter in permanent change, removing in full the spacetime continuum).
Our senses and limited computing capacities can’t deal with such an enormous entity, so we take partial ‘pictures’ with a reference point (time) to deal with reality and make sense of it (a constrained and partial view).
Another intuitive line of thought: if Newton’s physics was based on linear time (absolute, fixed time), and then Einstein’s relativity made time relative, hence flexible (unlocking a bigger scope for physics), what if we make time super-mighty-flexible, to the point of making it irrelevant? Wouldn’t this offer an even wider view, as we remove the constraints of the time dimension itself?
…. and now, for something completely different (Monty Python)
3. Flexible recognition in machine learning
Tsvi Achler. Neuroscience (PhD), Medicine (MD), Electrical Engineering (BS-EECS Berkeley) — Optimizing Mind
Our brains are ‘computationally flexible’: we can immediately learn and use new patterns as we encounter them in the environment.
We actually ‘like’ to develop those patterns, as we unleash our curiosity, see and try new things for the sake of enjoyment.
Learning, tasting and traveling feed our brains with new patterns. Riding a hover-wheel, flying a drone, speaking to an Amazon Echo or playing a new game are examples of behaviors where our brains confront and develop new patterns for different uses and purposes.
Now, let’s look into it from a machine learning perspective:
(disclaimer 2: as said at the beginning of this post, I’m just a nerd without credentials trying to convey the message. I was standing on the shoulders of giants when I wrote what you’re about to read.)
Tsvi Achler has been studying the brain from multidisciplinary perspectives, looking for a single, compact network, with new machine learning algorithms and models, that can display brain phenomena as seen in electrode recordings while performing flexible recognition.
The majority of popular models of the brain and machine learning algorithms remain feedforward, and the problem is that even when they are able to recognize, they are not optimal for recall, symbolic reasoning or analysis.
For example, you can ask a 4-year-old why they recognized something the way they did, or what they expect a bicycle to look like. However, it is difficult to do the same with current machine learning algorithms. Let’s take the example of recognizing a bicycle over a dataset of pictures. A bicycle, from a pattern perspective, would consist of two wheels of the same or similar size, a handle, and some sort of supporting triangular structure.
In feedforward models the weights are optimised for successful recognition over the dataset (of a bicycle in our example). Feedforward methods will learn what is unique to a bicycle compared to all other items in the training set, and learn to ignore what is not unique. The problem is that it is subsequently not easy to recall the original components (two wheels of the same or similar size, a handle, a supporting triangular structure), which may or may not be shared with other items.
Moreover, when something new must be learned, feedforward models have to figure out what is unique to the new item but not to the bicycle and the other items they already know how to recognize. This requires redoing the learning and rehearsing over the whole dataset.
What Tsvi suggests is to use a feedforward-feedback machine learning model that estimates uniqueness during recognition, by performing optimization on the current pattern being recognised and thereby determining neuron activation (this is NOT optimization to learn weights, by the way).
With this distinct model, weights are no longer feedforward, learning is more flexible and can be much faster, as there is no need to do rehearsal over the whole dataset.
In other words, this model is closer to how our brain actually works, as we don’t need to rehearse a whole dataset of samples to recognize new things.
Think about it: how many samples of the much-hyped hoverwheels do you need to see before recognizing the next one on the street? Same for a bicycle.
And, the most important thing: with feedforward-feedback models, learning happens with significantly less data.
Much less data required to learn, and much faster learning.
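To make the idea a bit more tangible, here is a minimal sketch of what "optimization during recognition" could look like. Nerd disclaimer applies: this is my own toy illustration, not Tsvi's actual algorithm. The feature names, the `patterns` table, the functions `recognize` and `best_match`, and the multiplicative update rule are all assumptions I picked for the example. Each known item is stored simply as the components we expect to see, and recognition iterates between feedback (what do the active items predict?) and feedforward (how well does the input match that prediction?).

```python
# Hypothetical toy sketch of recognition as a feedforward-feedback loop.
# The update rule below is an assumption for illustration only.

FEATURES = ["wheel", "handle", "triangular_frame", "engine", "footboard"]

# Each known item is stored as the components we expect to see (1 = expected).
patterns = {
    "bicycle":    [1, 1, 1, 0, 0],
    "motorcycle": [1, 1, 0, 1, 0],
}


def recognize(x, patterns, steps=50, eps=1e-9):
    """Iteratively adjust activations so the known patterns explain input x."""
    y = {name: 1.0 for name in patterns}  # every item starts equally active
    for _ in range(steps):
        # Feedback: how strongly do the active items jointly predict each feature?
        predicted = [sum(y[n] * p[j] for n, p in patterns.items())
                     for j in range(len(x))]
        # Compare the actual input with that prediction, feature by feature.
        ratio = [x[j] / (predicted[j] + eps) for j in range(len(x))]
        # Feedforward: each item re-reads only the features it expects.
        y = {n: y[n] * sum(p[j] * ratio[j] for j in range(len(x))) / sum(p)
             for n, p in patterns.items()}
    return y


def best_match(x, patterns):
    """Return the name of the item with the highest final activation."""
    scores = recognize(x, patterns)
    return max(scores, key=scores.get)


print(best_match([1, 1, 1, 0, 0], patterns))  # a bicycle-like input

# "Learning" a new item is just storing its expected components:
# the existing entries are untouched and no dataset is rehearsed.
patterns["hoverwheel"] = [1, 0, 0, 0, 1]
print(best_match([1, 0, 0, 0, 1], patterns))  # a hoverwheel-like input
print(best_match([1, 1, 1, 0, 0], patterns))  # the bicycle is still found
```

Note that the stored rows stay readable as components (two items share the wheel and the handle), and that the iteration count `steps` is where a speed versus accuracy choice shows up in this toy version: fewer steps give a quicker but fuzzier answer.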
Optimization during recognition also displays properties observed in brain behaviour and cognitive experiments, such as prediction, oscillations, initial bursting with unrecognized patterns (followed by a more gradual return to the original activation) and, even more importantly, a speed-accuracy trade-off (so here is your catch, if you were looking for it).
I met Tsvi for the first time at a talk in Mountain View: available here. I will be helping him and his startup along his journey which (as all new ventures) starts with funding, so if anyone has an interest or wants to know more please do not hesitate to reach out and leave a message for Tsvi or me in the comments, or even better, tweet me at @efernandez.
Thanks also to Bart Peintner, Co-founder & CTO at Loop.ai, for his advice, insights and shared interest in the ideas mentioned in this article (note-to-ourselves: always keep some bandwidth in your mind to entertain challenging ‘anomalies’).