Novelty Producing Synaptic Plasticity
Based on Yaman et al. (2020); readers are referred to the paper for details.
The goal of Novelty Producing Synaptic Plasticity (NPSP) is to enable learning in Artificial Neural Networks (ANNs) by producing novel behaviors in cases where there is no reinforcement signal.
Consider the maze navigation task shown on the right. The agent must first find the button area to open the door, and then navigate to the second room to reach the goal location.
We refer to this environment as a deceptive maze, since a direct performance measure (i.e. distance to the goal position) does not provide a useful learning signal. Furthermore, maximizing exploration may also be deceptive, because the button must be pressed first.
To find the goal in the deceptive maze, the agent may need to try different sequences of behaviors. We therefore use NPSP rules to produce as many novel behaviors as possible by modifying the ANN through Hebbian plasticity, which changes connection weights based on the local activations of the neurons they connect.
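As a point of reference, the basic Hebbian idea can be sketched in a few lines of Python. This is plain Hebbian learning (weight change proportional to the product of pre- and postsynaptic activations), not the NPSP rule itself; the function name hebbian_update and the learning rate eta are assumptions made for illustration.

```python
def hebbian_update(weights, pre, post, eta=0.1):
    """Return new weights after one plain Hebbian step.

    weights[i][j] connects presynaptic neuron j to postsynaptic neuron i.
    The change to each weight uses only the two activations it connects,
    which is what makes the rule "local". eta is a hypothetical learning rate.
    """
    return [
        [w + eta * post[i] * pre[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


# Two presynaptic and two postsynaptic neurons, all weights start at zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
pre = [1.0, 0.0]   # only presynaptic neuron 0 is active
post = [0.0, 1.0]  # only postsynaptic neuron 1 is active
updated = hebbian_update(weights, pre, post)
```

Only the connection between the two co-active neurons (weights[1][0]) changes; all other weights stay at zero.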
One of the main differences between NPSP and other population-based exploration algorithms that do not use a direct measure of fitness (e.g. Novelty Search and MAP-Elites) is that NPSP updates the connection weights of a single ANN without storing a history of behaviors.
Defining how the NPSP rules should change the weights of an ANN is a challenging task. We therefore use a genetic algorithm to optimize these rules. Each rule is represented as a binary string that specifies how to change a connection between two neurons depending on their activations within a certain period.
Starting from a randomly initialized population of NPSP rules, we iteratively apply evaluation, selection, and reproduction with variation to find NPSP rules whose synaptic changes produce as many novel behaviors as possible.
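The evaluation, selection and reproduction loop described above can be sketched as a generic genetic algorithm over binary strings. The specific operators here (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions for illustration, not the settings used in the paper. The fitness argument stands in for "number of novel behaviors produced by a rule"; toy_fitness below is only a placeholder so the sketch runs without a maze simulator.

```python
import random


def evolve(fitness, n_bits=8, pop_size=20, generations=30, mut_rate=0.05, seed=0):
    """Evolve binary strings to maximize fitness; returns the best string found."""
    rng = random.Random(seed)
    # Random initial population of binary strings.
    pop = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation and truncation selection: the better half become parents.
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            # Reproduction with variation: one-point crossover, then bit flips.
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            child = "".join(
                ("1" if c == "0" else "0") if rng.random() < mut_rate else c
                for c in child
            )
            children.append(child)
        pop = children
    return max(pop, key=fitness)


def toy_fitness(bits):
    """Placeholder fitness (count of 1 bits); NOT a novelty measure."""
    return bits.count("1")


best = evolve(toy_fitness)
```

In an actual NPSP setting, evaluating a rule would mean running an ANN in the maze with that rule applied and counting the distinct behaviors it produces, which is far more expensive than this placeholder.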
We trained recurrent neural networks (RNNs) using the NPSP rules and compared them with random search. The results showed that the NPSP rules produce more novel behaviors than random search. Video recordings of some of the behaviors found by the NPSP rules are shown below.
 Yaman, A., Iacca, G., Mocanu, D. C., Fletcher, G., & Pechenizkiy, M. (2020). Novelty Producing Synaptic Plasticity. GECCO'20 Companion. arXiv preprint arXiv:2002.03620.