Nov 11, 2022

Avoid catastrophic forgetting with Bondzai’s event-driven continuous and real-time embedded machine learning

This paper describes how Bondzai’s event-driven continuous learning avoids the catastrophic forgetting problem in deep neural networks.

Deep learning of static datasets

Existing deep learning technologies are all based on remote training of neural networks, using substantial computing resources and huge databases. As a result, the models are not designed to handle operating variability, whether it comes from the hardware or from changes in the local environment, and adapting them to such changes or to noise is not easy.

All these limitations have a mathematical origin: networks are predefined parametric functions whose parameters must be identified by solving an optimization problem that fits a dataset. Of course, this dataset cannot be embedded on the device to adapt the model to local changes in its environment.

Deeplomath, Bondzai’s deep learning engine, removes this fundamental lock by introducing an additive, non-parametric network generation algorithm that no longer separates the design of the network architecture from the identification of the weights. The network is built up additively, much as an object is built in 3D printing. The algorithm identifies the network parameters with a direct method rather than an iterative one, in the same sense that a linear system can be solved by a direct method instead of an iterative one. This is a fundamental point because gradient-based learning algorithms, which are in near-universal use, are not suitable for real-time adaptation of the model to changes in the environment.

This algorithm makes it possible to rebuild the network directly on microcontrollers, with everything processed locally and no need for remote access.
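The contrast between direct and iterative identification can be illustrated with the linear-system analogy used above. The sketch below is only that analogy, not Deeplomath's algorithm: `solve_direct`, `solve_iterative`, the step size and the 2x2 system are all illustrative assumptions. The direct solver finishes in a number of operations fixed by the problem size, while the gradient-based solver needs a data-dependent number of iterations and a tuned step size.

```python
# Analogy only: direct vs. iterative solution of A x = b (not Deeplomath itself).

def solve_direct(a, b):
    """Gaussian elimination with partial pivoting: the work is fixed by the size n."""
    n = len(b)
    # Build the augmented matrix [A | b] so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def solve_iterative(a, b, step=0.05, tol=1e-8, max_iters=100_000):
    """Gradient descent on 0.5 * ||A x - b||^2: the iteration count depends on the data."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iters + 1):
        residual = [sum(a[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        grad = [sum(a[i][j] * residual[i] for i in range(n)) for j in range(n)]
        x = [x[j] - step * grad[j] for j in range(n)]
        if max(abs(g) for g in grad) < tol:
            return x, it
    return x, max_iters


# Illustrative 2x2 system (an assumption for the demo).
matrix = [[4.0, 1.0], [1.0, 3.0]]
rhs = [1.0, 2.0]

x_direct = solve_direct(matrix, rhs)
x_iter, iterations = solve_iterative(matrix, rhs)
print("direct:   ", x_direct)
print("iterative:", x_iter, "after", iterations, "iterations")
```

Both routes reach the same solution here, but the iterative one diverges if `step` is chosen too large for the matrix, which is the kind of tuning that is awkward to do on a device in real time.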

What about catastrophic forgetting?

Catastrophic forgetting happens, for instance, when one adapts a neural network to new incoming information, pushing the weights away from the values obtained by training the network on the initial big dataset. Remember that this dataset is no longer available: it is too big to be handled locally and, in any case, processing it would be costly and would introduce huge latency. This forgetting is therefore natural, and it would not be a problem if retraining were event-driven, used only local data (so no stored dataset), and ran in quasi-real time. This is exactly what Bondzai’s Deeplomath brings to AI.

Therefore, not only do we not worry about catastrophic forgetting, we embrace it: Deeplomath networks are by design evanescent, ephemeral beings, destined to live only as long as the environment has not changed.
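The weight drift behind catastrophic forgetting can be reproduced with a toy model. The sketch below is a minimal illustration under stated assumptions: a one-weight linear model, plain stochastic gradient descent, and two made-up tasks (`task_a`, `task_b`). It is not a Deeplomath network. The model is first fitted to task A, then adapted to task B without access to task A's data, and its error on task A is measured before and after.

```python
# Toy illustration of catastrophic forgetting with gradient-based retraining.

def mse(w, data):
    """Mean squared error of the one-weight model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.05, epochs=200):
    """Plain stochastic gradient descent on the squared error."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2.0 * (w * x - y) * x
    return w


# Two hypothetical tasks that need different weights (y = 2x, then y = -x).
task_a = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]
task_b = [(x, -1.0 * x) for x in (0.5, 1.0, 1.5)]

w = train(0.0, task_a)
err_a_before = mse(w, task_a)   # small: the model fits task A

w = train(w, task_b)            # adapt to task B only; task A's data is gone
err_a_after = mse(w, task_a)    # large: the weight has drifted away from task A

print("error on task A before adaptation:", err_a_before)
print("error on task A after adaptation: ", err_a_after)
```

If the model is only ever needed for the current environment, the loss of task A is harmless, which is the point made above.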

Catastrophic forgetting and sequential tasking

The previous situation was an example of sequential learning. Asking a model to carry out different tasks in sequence is another example, and it is again marred by catastrophic forgetting: the model forgets how earlier tasks were achieved because the network weights are now far from the values that suited them. (This supposes that a single network architecture is suitable for all the tasks, which is a very strong hypothesis.)

Davinsy’s answer to this problem is divide and conquer. Instead of asking a single, usually rather big, model to achieve several tasks, Davinsy builds and continuously updates a dedicated model for each task, each with a suitable architecture. An example is addressing a multi-label problem by coupling several multi-class models.

This is done by Davinsy’s Application Service Layer, which offers a high-level software interface to describe how a problem is solved as a conditional sequence of virtual models. “Virtual” means that these are templates describing how the embedded Deeplomath Augmented Learning Engine (DALE) generates the evanescent models.
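The idea of one dedicated model per task, each generated from a template, can be sketched as follows. This is not Davinsy's interface: `VirtualModel`, `CentroidModel`, the task names and the feature values are hypothetical, and a nearest-centroid classifier stands in for a generated network. Two multi-class models together answer one multi-label question, and rebuilding one of them from its template leaves the other untouched.

```python
from dataclasses import dataclass


class CentroidModel:
    """Stand-in multi-class model: one mean vector per class (not Deeplomath)."""

    def __init__(self, labels):
        self.labels = tuple(labels)
        self.sums = {}
        self.counts = {}

    def update(self, features, label):
        if label not in self.labels:
            raise ValueError(f"unknown label {label!r}")
        total = self.sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, features):
        if not self.sums:
            return None  # nothing learned yet

        def distance(label):
            n = self.counts[label]
            return sum((f - s / n) ** 2 for f, s in zip(features, self.sums[label]))

        return min(self.sums, key=distance)


@dataclass(frozen=True)
class VirtualModel:
    """Hypothetical template: describes a model without holding learned state."""
    task: str
    labels: tuple

    def build(self):
        return CentroidModel(self.labels)


# Two multi-class templates that together answer one multi-label question.
templates = [
    VirtualModel("source", ("voice", "machine")),
    VirtualModel("level", ("quiet", "loud")),
]
models = {t.task: t.build() for t in templates}

models["source"].update([1.0, 0.2], "voice")
models["source"].update([0.1, 0.9], "machine")
models["level"].update([0.2, 0.2], "quiet")
models["level"].update([0.9, 0.9], "loud")

sample = [0.9, 0.3]
before = models["source"].predict(sample)

# Rebuilding the "level" model from its template leaves "source" untouched.
models["level"] = templates[1].build()
models["level"].update([0.8, 0.8], "loud")
after = models["source"].predict(sample)

multi_label = {task: m.predict(sample) for task, m in models.items()}
print(before, after, multi_label)
```

Because each task owns its model, forgetting in one model cannot degrade another, at the cost of keeping several small models instead of one large one.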

To define how those models connect, we introduce “Application Charts”, which are executed by the Application Service Layer. An Application Chart describes how the data flows through the models, choosing which model to execute according to the outcome of previous inferences in the sequence. Of course, a model can be coupled to several others, and not necessarily to its immediate neighbors in the sequence.
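A chart of this kind can be pictured as a small graph in which each node names a model and maps its possible outcomes to the next node. The sketch below assumes a plain dictionary representation and stub models written as simple functions; `run_chart`, the node names and the sample fields are hypothetical and do not reflect Davinsy's actual chart format.

```python
def run_chart(chart, models, start, sample, max_steps=16):
    """Walk the chart from `start`, letting each outcome pick the next node."""
    trace = []
    node = start
    for _ in range(max_steps):  # guard against accidental cycles
        if node is None:
            break
        step = chart[node]
        outcome = models[step["model"]](sample)
        trace.append((node, outcome))
        node = step["next"].get(outcome)  # an unlisted outcome ends the run
    return trace


# Stub models: plain callables standing in for generated classifiers.
models = {
    "activity": lambda s: "sound" if s["energy"] > 0.5 else "silence",
    "source": lambda s: "voice" if s["pitch"] > 100 else "alarm",
    "speaker": lambda s: "known" if s["speaker_id"] in {"alice", "bob"} else "unknown",
    "command": lambda s: s["word"],
    "logger": lambda s: "logged",
}

# "report" is reachable from both "classify" and "identify", so the coupling
# is not limited to neighboring models in the sequence.
chart = {
    "detect": {"model": "activity", "next": {"sound": "classify"}},
    "classify": {"model": "source", "next": {"voice": "identify", "alarm": "report"}},
    "identify": {"model": "speaker", "next": {"known": "understand", "unknown": "report"}},
    "understand": {"model": "command", "next": {}},
    "report": {"model": "logger", "next": {}},
}

voice_sample = {"energy": 0.9, "pitch": 180, "speaker_id": "alice", "word": "lights_on"}
alarm_sample = {"energy": 0.9, "pitch": 50, "speaker_id": "", "word": ""}
quiet_sample = {"energy": 0.1, "pitch": 0, "speaker_id": "", "word": ""}

voice_trace = run_chart(chart, models, "detect", voice_sample)
alarm_trace = run_chart(chart, models, "detect", alarm_sample)
quiet_trace = run_chart(chart, models, "detect", quiet_sample)
print(voice_trace)
print(alarm_trace)
print(quiet_trace)
```

Only the models on the chosen path are executed for a given sample, so a quiet input costs a single inference while a recognized voice command runs four.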