Originally a numenta.com email newsletter (May 5, 2010)
A message from Jeff Hawkins:
Hi everyone,
I would like to update you on recent advances in Numenta's algorithm development.
Numenta was founded to create a computing platform based on the principles of Hierarchical Temporal Memory (HTM). The core ideas of HTM were described in my book "On Intelligence". Interest in hierarchical learning models has grown over the past few years. We continue to believe these principles are fundamental to building truly intelligent machines and solving many machine learning problems.
A key part of HTM theory is that each node in an HTM hierarchy learns sequences of patterns over time, and passes stable representations of those sequences to nodes higher in the hierarchy. This process is analogous to learning melodies and passing the name of the current melody to the next node. Learning and representing sequences is the essential ingredient in forming invariant representations and making predictions.
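As a rough illustration of the melody analogy (a toy sketch, not Numenta's node algorithm): the names `MELODIES` and `name_stream` are hypothetical, and the sequences are hard-coded here, whereas an HTM node would learn them from data. The point is only that the output stays stable while the input changes from note to note.

```python
# Hypothetical, hard-coded "learned" sequences; a real node would learn these.
MELODIES = {
    "ode_to_joy": ("E", "E", "F", "G"),
    "twinkle": ("C", "C", "G", "G"),
}

def name_stream(notes, melodies):
    """Yield the one melody name consistent with the notes heard so far, else None."""
    heard = []
    for note in notes:
        heard.append(note)
        matches = [name for name, seq in melodies.items()
                   if tuple(heard) == seq[:len(heard)]]
        # Pass a name upward only when it is unambiguous.
        yield matches[0] if len(matches) == 1 else None

outputs = list(name_stream(["C", "C", "G", "G"], MELODIES))
print(outputs)  # ['twinkle', 'twinkle', 'twinkle', 'twinkle']
```

The input changes at every step, but the value passed to the next level does not.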
Over the last five years we have tried many approaches to learning sequences and forming stable representations of them. All the methods we tried worked to some extent but all had shortcomings. I won't review all the issues we faced but suffice it to say that real-world data is complex and messy. The challenge we face is analogous to learning melodies by listening to snippets of music and noise all occurring at the same time.
All releases of NuPIC (our development environment) and our Vision Toolkit were based on the best node learning algorithms we had at the time, but we knew there were limitations and that we had to do better.
Last fall I took a fresh look at the problems we faced. I started by returning to biology and asking what the anatomy of the neocortex suggests about how the brain solves these problems. Over the course of three weeks we sketched out a new set of node learning algorithms that are much more biologically grounded than our previous algorithms and have the promise of dramatically improving the robustness and performance of our HTM networks. We have been implementing these new algorithms for the past six months and they continue to look good.
We call these new algorithms FDR for Fixed-sparsity Distributed Representations. The name refers to a core feature of the new algorithms. The FDR algorithms force representations to be large sparse vectors, where the level of sparsity (the percentage of active bits) is relatively fixed. For example, we might have 10,000 bits (or neurons, if you prefer) of which roughly 1,000 are active at any time, not significantly more, not significantly less. Thus the "sparsity" is fixed.
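A minimal sketch of what a fixed-sparsity constraint could look like, using the 10,000-bit, 1,000-active example above. Selecting the highest-scoring bits is an assumption made for illustration, not a description of how FDR chooses active bits, and `fixed_sparsity` is a hypothetical helper name.

```python
import random

def fixed_sparsity(scores, k):
    """Return the indices of the k highest-scoring bits (illustrative assumption)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

rng = random.Random(0)
n_bits, n_active = 10_000, 1_000

# Stand-in activation scores; in a real system these would come from the input.
scores = [rng.random() for _ in range(n_bits)]

active = fixed_sparsity(scores, n_active)
sparsity = len(active) / n_bits
print(sparsity)  # 0.1
```

Whatever the input scores are, exactly `n_active` bits come out active, so the sparsity does not drift.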
The new FDR algorithms learn sequences of sparse distributed representations and form stable representations of those sequences. The methods are elegant, robust, and a little non-intuitive. Here are some properties of FDR representations and algorithms.
High capacity: With even just 100 bits active out of 1,000 bits total, you can represent a very large number of unique objects.
Robust to noise: You can add lots of noise or drop out many bits and still be able to discern a pattern, even when you are differentiating tens of thousands of patterns.
Representations are easily compared for semantic meaning: If two FDR representations have even a modest amount of overlap over chance, they will have similar meanings. Comparing two FDR patterns tells you how they are similar and how they are different.
On-line learning: FDR can learn continuously, even while it is recognizing. Our previous algorithms had separate learning and inference modes. Brains can learn all the time. On-line learning is important for many applications where the statistics can change over time.
Variable order sequence memory: "Variable order" means that sequences can be of varying lengths. Sometimes you need to go back a long way in time to make a prediction and sometimes you only need to go back a tiny bit in time. FDR figures this out automatically.
Sub-sampling: One important property of sparse distributed representations is that knowing only a few active bits of a representation is almost as good as knowing all of them. Nowhere in the FDR algorithms do we store copies of entire patterns. Learning is based on small sub-samples of patterns which, among other things, enables new means of generalization.
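Several of these properties can be checked numerically with random patterns. The sketch below assumes 100 active bits out of 1,000 (the figures from the capacity example) and uses simple set overlap as the comparison. It stores whole patterns purely for the demonstration, which the FDR algorithms themselves do not do, and all function names are hypothetical.

```python
import math
import random

N_BITS, N_ACTIVE = 1_000, 100

# High capacity: number of distinct patterns with exactly 100 of 1,000 bits active.
capacity = math.comb(N_BITS, N_ACTIVE)

def random_pattern(rng):
    """A random pattern, stored as the set of its active bit indices."""
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))

def overlap(a, b):
    """Number of active bits two patterns share."""
    return len(a & b)

def keep_bits(pattern, n_keep, rng):
    """Keep only n_keep of the active bits (models dropout noise or sub-sampling)."""
    return frozenset(rng.sample(sorted(pattern), n_keep))

def best_match(query, stored):
    """Index of the stored pattern with the largest overlap with the query."""
    return max(range(len(stored)), key=lambda i: overlap(query, stored[i]))

rng = random.Random(42)
stored = [random_pattern(rng) for _ in range(10_000)]
target = 123

noisy = keep_bits(stored[target], 50, rng)      # half of the active bits dropped
subsample = keep_bits(stored[target], 20, rng)  # only 20 of the 100 bits known

print(capacity > 10**100)                       # True
print(best_match(noisy, stored) == target)      # True
print(best_match(subsample, stored) == target)  # True
```

Two unrelated random patterns of this size share about 10 active bits on average (100 x 100 / 1,000), which is the "chance" level that a meaningful overlap has to exceed.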
Everyone at Numenta is currently focused on developing and testing the FDR algorithms. Progress is steady and encouraging, yet we still have a ways to go. In addition to the challenges of achieving our inference benchmarks, the FDR algorithms are more computationally demanding than the previous algorithms, requiring additional work on optimization. As to application focus, we currently are testing the algorithms in vision tasks, but intend to broaden to non-vision tasks as the work progresses.
Let me briefly describe our communications plan going forward. We hope to provide regular updates to you as our work progresses. We plan to document both the biological basis of the FDR algorithms and how we have implemented them in software. We may also create a series of web-based talks as an alternate way of sharing our progress. Of course, as soon as we feel that the algorithms are in a state where developers can use them, we will provide a software release. We do not yet have a schedule for the release, but will keep you posted.
If you want to get a flavor for FDR, I invite you to watch a talk I gave at the University of British Columbia recently.
Thank you for your ongoing interest in Numenta. I am excited about the development of our next generation algorithms, and look forward to sharing more of our progress over the course of this year.
Jeff Hawkins
Questions
- Can a memristor/FPGA combination implement this algorithm?
