Computational neuroscience for learning from a small sample
 
Abstract 
Deep neural networks have been remarkably useful for image 
classification and phoneme recognition. Combined with reinforcement 
learning algorithms, deep neural networks have outperformed human 
experts in simulated video games and the game "Go". To achieve such
successes, millions of images, hundreds of millions of phonemes, and
tens of millions of games have been utilized as training data in
supervised learning or as training trials in reinforcement learning.
Meanwhile, in the 2015 DARPA Robotics Challenge Finals, many humanoid
robots fell while walking on sand, climbing stairs, turning valves, or
getting out of a car. A small number of humanoids completed all the
tasks, but they were far slower than humans. By age 5, human children
can execute all of the above tasks more quickly and reliably than
humanoid robots developed by the world's premier researchers. What
could be the reasons for this dramatic contrast between success and
failure for simulated versus real-world tasks by artificial
intelligence? In the simulated video games and "Go", the degrees of
freedom of the controlled system were relatively small, there were no
hidden variables, and state transitions were deterministic, noise-free,
and perfectly described by simple rules. Thus, the computer simulations
were exact, without errors. Because of this last property, tens of
millions of simulated games can be generated by software players and
used efficiently for deep Q-learning (a Q-learning algorithm of
reinforcement learning combined with deep neural network learning). In
contrast, a humanoid robot in the real world is a complicated nonlinear
dynamical system with an enormous number of degrees of freedom. Indeed,
many hidden states lie far above the measured sensory signals and far
below the issued motor commands. Many physical processes, including
contact and friction, are difficult to model. Mainly for this last
reason, quantitatively reliable simulations of humanoid robots in
real-world environments are extremely difficult, if not impossible.
Thus, reinforcement learning in humanoids designed to operate in the
real world has typically been conducted using real experimental trials.
However, when humanoids fall, they are often damaged such that no
further trials can be accumulated until painful, expensive, and
laborious repairs are made. In artificial intelligence, or more
precisely, in neural network learning and machine learning, it is well
established that a learning system with a fixed number of degrees of
freedom n requires approximately 10n training samples. If it is
possible to conduct tens of millions of learning trials, a large
learning system, such as a deep neural network, can be utilized.
However, if only 100 trials can be accumulated, only a very simple
learning system with about ten degrees of freedom should be utilized to
avoid over-fitting. I postulate that these differences in the number of
training samples, and the consequently allowed degrees of freedom of
the control systems, readily explain the dramatic contrast between the
success of the simulated learning and the failure of the real-world
learning described above.
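
As a minimal numerical sketch of this rule of thumb (the factor of 10
is the heuristic stated above; the toy polynomial-fitting task is my
illustrative assumption, not part of the original argument):

    import numpy as np

    # Rule of thumb from the text: a learning system with n degrees of
    # freedom needs roughly 10 * n training samples.
    def max_safe_dof(num_samples: int, samples_per_dof: int = 10) -> int:
        """Largest model size the heuristic permits for a trial budget."""
        return num_samples // samples_per_dof

    print(max_safe_dof(50_000_000))  # tens of millions of games -> millions of DOF
    print(max_safe_dof(100))         # 100 real robot trials -> ~10 DOF

    # Toy illustration of the over-fitting the heuristic guards against:
    # fit polynomials of increasing degree to 100 noisy samples and
    # compare training error with error on held-out data.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=100)
    y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(100)
    x_test = rng.uniform(-1.0, 1.0, size=1000)
    y_test = np.sin(3.0 * x_test)

    for degree in (3, 9, 30):
        coeffs = np.polyfit(x, y, degree)
        train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"dof={degree + 1:2d}  train={train_err:.4f}  test={test_err:.4f}")

The expected pattern is that a model far beyond the sample budget
drives the training error down while the held-out error grows.
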
    Animal brains are confronted with sensorimotor problems that are 
much more challenging than those faced by humanoid robots. Animal bodies
 are flexible and possess an enormous number of muscles, sensors, and 
motor neurons. Neurons are slow-computing devices with a significant 
degree of noise. Thus, physical modeling of animal movements is very 
difficult, as there are many degrees of freedom, hidden variables, a 
high noise level, and a risk of injury or death in the case of failure. 
The human brain contains 10^11 neurons and 10^14 synapses. As a
learning control system, it has enormous degrees of freedom. If we
assume that the number of synapses corresponds to the degrees of
freedom of the learning system, and that a single reinforcement
learning trial can be obtained within 10 seconds, then it follows that
an animal brain would need 10^15 training trials, and thus 10^16
seconds of learning time, to avoid over-fitting. This period is much
longer than an animal's lifetime.
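
Spelled out as a back-of-the-envelope computation (the numbers are
exactly those assumed above):

    # Back-of-the-envelope estimate from the text.
    synapses = 1e14           # assumed degrees of freedom of the brain
    samples_per_dof = 10      # the 10n rule of thumb
    seconds_per_trial = 10    # duration of one reinforcement learning trial

    trials = samples_per_dof * synapses    # 1e15 trials
    seconds = trials * seconds_per_trial   # 1e16 seconds
    years = seconds / 3.15e7               # seconds in a year

    print(f"{trials:.0e} trials, {seconds:.0e} s, ~{years:.1e} years")

Roughly 3 x 10^8 years: more than a million times longer than a human
lifespan.
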
In contrast to this estimate, humans learn motor control very quickly.
For example, humans can learn a new dynamic environment within a few
trials. Human infants learn to walk after only several thousand falls.
Through computational neuroscience research on sensorimotor learning, I
hope to understand a mystery that breaks the common sense of artificial
intelligence: a learning system with 10^11 degrees of freedom can learn
to control an extremely complicated nonlinear dynamical system after
only 1,000 failures. Kawato and Samejima (2007) reviewed several
computational schemes for enabling efficient reinforcement learning
from small training samples. These include internal models, sparse
estimation algorithms, multiple paired forward and inverse models, and
hierarchical reinforcement learning algorithms. Attention,
consciousness, metacognition, and episodic memory are important
research topics in cognitive neuroscience, and have recently attracted
the interest of artificial intelligence researchers in the hope that
they could provide computational mechanisms to reduce the high
dimensionality of data in learning. They may play essential roles in
constructing abstract concepts, dimensions, and attributes: the
high-level representations necessary in the upper layers of
hierarchical reinforcement learning. With respect to reducing the
dimensionality of high-dimensional data, electrical synapses that
transmit information via gap junctions are attractive elements in
neuronal circuits, because they tend to synchronize neurons and
effectively reduce the degrees of freedom of the circuit.
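
To make this intuition concrete, here is a minimal sketch using
Kuramoto-style phase oscillators with mean-field coupling as a crude
stand-in for gap-junction-coupled neurons (the model, parameters, and
order-parameter readout are my illustrative assumptions, not the IO
models discussed below):

    import numpy as np

    # Phase oscillators with diffusive (gap-junction-like) coupling:
    # d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i),
    # where r * exp(i * psi) is the population mean of exp(i * theta_j).
    def order_parameter(coupling, n=100, steps=5000, dt=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        omega = rng.normal(10.0, 1.0, n)      # heterogeneous frequencies
        theta = rng.uniform(0.0, 2.0 * np.pi, n)
        for _ in range(steps):
            mf = np.exp(1j * theta).mean()    # mean field r * exp(i*psi)
            theta += dt * (omega + coupling * np.abs(mf)
                           * np.sin(np.angle(mf) - theta))
        # r in [0, 1]: ~0 asynchronous, ~1 fully synchronized.
        return float(np.abs(np.exp(1j * theta).mean()))

    print(f"weak coupling:   r = {order_parameter(0.1):.2f}")   # ~N DOF
    print(f"strong coupling: r = {order_parameter(10.0):.2f}")  # ~1 DOF

When the oscillators lock, the circuit behaves approximately as a
single collective unit, which is one way electrical coupling could
trade raw degrees of freedom for sample efficiency.
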
    The cerebellum is important for motor control and motor learning,
and is crucial for multi-joint movements such as walking. The inferior
olivary (IO) nucleus sends climbing fiber inputs to Purkinje cells
(PCs), the sole output neurons of the cerebellar cortex, and possesses
the highest density of gap junctions in the mammalian brain. I
therefore focus on the olivo-cerebellar system as a good candidate for
a neuronal system that plays a central role in motor learning and that
may be useful for investigating the above-mentioned disparity between
the large degrees of freedom of learning systems and the small number
of available training trials.
Of special interest is the network of IO neurons, which may control the
degrees of freedom by adjusting its synchronous/asynchronous firing
activities, thereby providing an adaptive framework for the learning
machinery. In cerebellar motor learning, the IO neurons are known to
transmit error signals to the PCs, inducing plasticity at the parallel
fiber-PC synapses. Recent investigations have also revealed multiple
plasticity mechanisms, as well as evidence that parallel fiber-evoked
simple spikes in PCs contribute to cerebellum-dependent learning to
some extent. Nevertheless, one dominant view over the last several
decades suggests that complex spikes transmitted through the climbing
fibers provide instructive signals to the PCs to drive learning.
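
A minimal sketch of this classical view (a delta-rule caricature of
climbing-fiber-gated plasticity at the parallel fiber-PC synapses; the
linear readout, signs, and learning rate are simplifying assumptions of
mine):

    import numpy as np

    # Caricature of the classical view: the PC's simple spike output is
    # a weighted sum of parallel fiber (PF) inputs, and the climbing
    # fiber (CF) complex spike carries an error that gates plasticity
    # at the PF-PC synapses (LTD for positive error, LTP for negative).
    rng = np.random.default_rng(0)
    n_pf = 200                            # number of parallel fibers
    w_target = rng.standard_normal(n_pf)  # hypothetical desired mapping
    w = np.zeros(n_pf)                    # learned PF-PC weights
    eta = 0.005                           # learning rate

    for trial in range(2001):
        pf = rng.standard_normal(n_pf)    # parallel fiber activity
        output = w @ pf                   # simple spike rate (caricature)
        target = w_target @ pf            # desired output
        cf_error = output - target        # error carried by the CF
        w -= eta * cf_error * pf          # CF-gated delta rule
        if trial % 500 == 0:
            print(f"trial {trial:4d}: |error| = {abs(cf_error):.3f}")

Written this way, the CF signal plays the role of the teaching signal
in supervised learning, which is why the IO is modeled as the carrier
of error information in the simulation studies below.
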
To examine the functions of the IO, computational modeling has been a
promising driving force. As the carrier of the teaching signals, the IO
has been modeled to provide the climbing fiber inputs in simulation
studies of cerebellar learning. To explore the IO dynamics in detail, a
class of simplified conductance-based models has been developed to
reproduce experimental observations of sub-threshold oscillations.
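
As a caricature of the behavior such models reproduce, the sketch below
uses a supercritical Hopf normal form tuned to IO-like numbers (roughly
10 Hz and a few millivolts); it is my illustrative stand-in for, not a
reimplementation of, the published conductance-based models:

    import numpy as np

    # Hopf-normal-form caricature of IO sub-threshold oscillations:
    # dz/dt = (mu + i*omega) * z - mu * |z|^2 * z settles onto a small
    # limit cycle of unit radius, scaled here to a few millivolts.
    mu = 1.0                  # distance past the Hopf bifurcation
    omega = 2 * np.pi * 10.0  # ~10 Hz oscillation frequency
    amp_mv = 3.0              # oscillation amplitude in mV (illustrative)
    dt = 1e-4
    steps = 20000             # 2 s of simulated time

    z = 0.1 + 0.0j
    v = np.empty(steps)
    for t in range(steps):
        z += dt * ((mu + 1j * omega) * z - mu * abs(z) ** 2 * z)
        v[t] = amp_mv * z.real

    print(f"steady-state amplitude ~ {v[steps // 2:].max():.2f} mV")
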
Further details of the electrophysiological properties of IO neurons
have been described by multi-compartment models, which have been
applied to elucidate experimental observations of sub-threshold
activities, to examine their information transmission capabilities, and
to estimate conductance levels of the IO network from experimental
data. Owing to advanced experimental methods as well as the rapid
growth in computer power, computational models are nowadays utilized
for quantitative understanding of experimentally measured IO dynamics
and, furthermore, for testing hypotheses regarding IO functions. Here,
I review recent advances in the computational modeling of the
olivo-cerebellar system.