Lecture by Richard Born at Harvard University. These are my personal takeaways from auditing the lecture.
Course overview at https://klab.tch.harvard.edu/academia/classes/BAI/bai.html
Biological and Artificial Intelligence
Warren Weaver was a director at the Rockefeller Foundation in the 1950s, and he said that the future of engineering is to understand the tricks that nature has come up with over the millennia.
Anatomy of visual pathways
The visual system spans large areas of the brain, so damage to almost any part of the brain often impairs vision.
The world is mirrored on the retina. About one million axons connect the retina to the rest of the brain. Vision is also connected to the brainstem to orient the head in space; this is a semi-automatic system for directing attention. It is also connected to the circadian rhythm, which manages the sleep cycle through brightness.
The important area in primates is V1, the striate cortex (area 17). A lesion here makes humans blind. The brain has no visual understanding as such; it only produces action potentials that are interpreted as visual. In monkeys, there are more than 30 visual areas, roughly grouped into two streams. The ventral stream (down) is concerned with the what (object recognition) and the dorsal stream (up) is concerned with the where (spatial perception). Retinotopic representations are aligned with retinal space, but object recognition ought to be object-centred. How the brain converts retinal coordinates to world coordinates is still an open question. Mishkin showed in 1983 that monkeys taught to associate food with a specific object or with a specific location performed at chance level if they had a lesion in the corresponding brain area.
Receptive Fields
A receptive field is the region of the sensory epithelium that can influence a given neuron's firing rate. Hubel and Wiesel showed that neurons in the lateral geniculate nucleus (LGN) are excited by a light signal within their receptive field. Hartline showed that surround suppression helps to locate points of interest. The brain is interested in points in visual space where the derivative is not zero. Brains detect luminance contrast (space), color contrast (wavelength), transience (time), motion (space and time), and combinations of space and color.
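To make the "derivative is not zero" idea concrete, here is a minimal sketch of a one-dimensional center-surround receptive field. The kernel weights and the test signal are my own assumptions for illustration, not values from the lecture.

```python
def center_surround(signal, kernel=(-0.5, 1.0, -0.5)):
    """Apply a 1D center-surround kernel (excitatory center, inhibitory surround).

    The weights sum to zero, so a uniform input produces no response.
    Borders are handled by repeating the edge value.
    """
    half = len(kernel) // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]

# A luminance step: dark on the left, bright on the right.
step = [0, 0, 0, 0, 1, 1, 1, 1]
response = center_surround(step)
print(response)
```

The response is zero over both uniform regions and non-zero only at the two positions next to the step, which is where the signal changes.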
Hierarchical receptive fields
There is a hierarchical elaboration of receptive fields. Hubel and Wiesel also measured signals in the primary visual cortex and found neurons that encode the orientation of an edge, with a stronger off response on one side but no response to diffuse light. Essentially, we can think of these neurons as a filter or a convolution (a simplification). A brain does this in parallel, in contrast to a computer. Horace Barlow noted that the brain focuses on suspicious coincidences (e.g. unusual changes).
We go from LGN (center-surround) to simple cells (orientation) to complex cells (contrast-invariant across an area, comparable to pooling or a softmax). In the 1950s, the psychologist Attneave found the 38 points of maximal curvature on an image of a cat, connected them with straight lines, and produced an abstract representation that was still recognizable as a cat.
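The "neuron as a filter" view can be sketched as a small oriented kernel slid over an image. The 3x3 kernel and the toy images are assumptions of mine, and the function computes a cross-correlation (no kernel flip), which is what deep learning libraries usually call a convolution.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (valid region only) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b]
                for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical filter tuned to vertical edges (dark left, bright right).
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# 4x6 image with a vertical edge between columns 2 and 3.
vertical_image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
# 6x4 image with a horizontal edge between rows 2 and 3.
horizontal_image = [[0] * 4 for _ in range(3)] + [[1] * 4 for _ in range(3)]

print(correlate2d(vertical_image, vertical_edge))    # strong response at the edge
print(correlate2d(horizontal_image, vertical_edge))  # no response anywhere
```

The same filter responds to the vertical edge and stays silent for the horizontal one, which is the kind of orientation selectivity described above.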
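The step from simple to complex cells can be sketched as pooling over simple cells that share a preferred feature but sit at different positions. This is a toy model under my own assumptions (a two-tap edge detector and a plain max), not the circuit proposed in the lecture.

```python
def simple_cells(signal, kernel=(-1, 1)):
    """Rectified responses of identical edge detectors at every position."""
    k = len(kernel)
    return [
        max(0, sum(w * signal[i + j] for j, w in enumerate(kernel)))
        for i in range(len(signal) - k + 1)
    ]

def complex_cell(signal):
    """Pool the simple cells over the whole patch with a max."""
    return max(simple_cells(signal))

edge_left = [0, 1, 1, 1, 1, 1]   # edge near the left border
edge_right = [0, 0, 0, 0, 1, 1]  # same edge, shifted to the right

print(simple_cells(edge_left), simple_cells(edge_right))
print(complex_cell(edge_left), complex_cell(edge_right))
```

The simple cells report where the edge is, so their response patterns differ between the two inputs, while the pooled complex cell gives the same output for both.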
Convolutional Neural Networks
An engineered alternation of selectivity (convolution) and generalization (pooling) led to great success early on in vision research, but then came deep networks. However, deep networks actually do apply convolutions, rectification (ReLU), pooling, and lastly normalization. Learning these features instead of engineering them by hand improved performance.
Yamins et al., 2014 showed that AlexNet has some non-trivial similarity with monkey brains in the ventral stream visual areas.
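The four operations named above can be chained into one stage, shown here as a rough one-dimensional sketch. The kernel is fixed by hand for illustration, whereas in a deep network it would be learned; the function names, pool size, and L2 normalization are my own choices.

```python
import math

def conv1d(signal, kernel):
    """Valid cross-correlation of a 1D signal with a kernel."""
    k = len(kernel)
    return [sum(w * signal[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectification: negative responses are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool(values, size=2):
    """Non-overlapping max pooling."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def normalize(values, eps=1e-8):
    """Divide by the L2 norm so the output has roughly unit length."""
    norm = math.sqrt(sum(v * v for v in values)) + eps
    return [v / norm for v in values]

def layer(signal, kernel):
    """One stage: convolution -> ReLU -> pooling -> normalization."""
    return normalize(max_pool(relu(conv1d(signal, kernel))))

signal = [0, 0, 1, 1, 0, 0, 1, 1, 0]
features = layer(signal, kernel=[-1.0, 1.0])
print(features)
```

Stacking several such stages, each feeding the next, gives the alternation of selectivity and generalization described above.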
What is missing?
When noise was added to images, CNNs failed quickly at 20% noise, whereas human performance degraded gracefully with the level of noise. Even worse, CNNs can learn to cope with one specific kind of noise but end up failing if the noise changes.
In Ponce et al., 2019, random codes are fed to a generative neural network to synthesize images. The neuron's response is used as an objective function to rank the synthesized images. A genetic algorithm is applied to find the codes that maximally trigger the neuron.
PredNet makes predictions on video streams with unsupervised learning.
Tootell and Born showed in 1990 that the organization of the visual cortex is still very retinotopic, but in area MT it is organized in hypercolumns that detect motion direction rather than by spatial connectivity.
Neurons near each other tend to do the same thing. Brains are not just look-up tables; their spatial organisation has a semantic structure.
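A noise experiment of this kind needs a way to corrupt images at a controlled level. The sketch below assumes that "20% noise" means replacing 20% of the pixels with random values; the lecture did not specify the noise model, so this definition and the function name are my assumptions.

```python
import random

def add_noise(image, level, rng):
    """Replace a fraction `level` of the pixels with uniform random values."""
    noisy = [row[:] for row in image]  # copy, leave the input untouched
    coords = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    n_noisy = round(level * len(coords))
    for r, c in rng.sample(coords, n_noisy):
        noisy[r][c] = rng.random()
    return noisy

rng = random.Random(0)
clean = [[0.0] * 10 for _ in range(10)]
noisy = add_noise(clean, level=0.2, rng=rng)
changed = sum(1 for r in range(10) for c in range(10) if noisy[r][c] != clean[r][c])
print(changed, "of 100 pixels changed")
```

Sweeping `level` from 0 to 1 and measuring accuracy at each step would give the kind of degradation curve that was compared between CNNs and humans.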
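The loop described here can be sketched as follows. Both `generator` and `neuron_response` are hypothetical stand-ins (a real setup would use a trained generative network and a recorded firing rate), and the population size, crossover, and mutation settings are my assumptions rather than those of the paper.

```python
import random

def generator(code):
    """Hypothetical stand-in for a generative network: code -> 'image'."""
    return [2.0 * x for x in code]

def neuron_response(image, preferred=(1.0, -0.5, 0.25, 0.0)):
    """Hypothetical stand-in for a firing rate that peaks at `preferred`."""
    return -sum((p - q) ** 2 for p, q in zip(image, preferred))

def evolve(n_codes=20, code_len=4, generations=50, n_keep=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    codes = [[rng.gauss(0, 1) for _ in range(code_len)] for _ in range(n_codes)]
    history = []
    for _ in range(generations):
        # Rank codes by the response their synthesized image evokes.
        codes.sort(key=lambda c: neuron_response(generator(c)), reverse=True)
        history.append(neuron_response(generator(codes[0])))
        parents = codes[:n_keep]  # the best codes survive unchanged
        children = []
        while len(children) < n_codes - n_keep:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append([rng.choice(pair) + rng.gauss(0, sigma)
                             for pair in zip(a, b)])
        codes = parents + children
    return codes[0], history

best_code, history = evolve()
print(history[0], history[-1])
```

Because the best codes are carried over unchanged, the best response never decreases from one generation to the next. The key point is that only the ranking of responses is needed, so no gradient of the neuron is required.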