Neuro 140/240 – Lecture 7

Lecture by Jan Drugowitsch at Harvard University. My personal takeaway on auditing the presented content.

Course overview at

Biological and Artificial Intelligence

A simple view holds that a brain has a state that is changed by a function of the input to produce a behaviour. However, the complexity of the brain makes this function intractable. It is more practical to make smart hypotheses about how such functions could be structured and use them as a proxy for the real behaviour.

Ideal observer modelling

The main axiom is that information is uncertain. Typical approaches include Boltzmann machines (stochastic Hopfield networks), Bayesian networks, statistical learning (support vector machines), variational Bayes and MCMC. Deep learning did not initially account for uncertainty, but more recent work does.

To understand the environment, the brain needs an understanding of uncertainty. If we can understand how the brain represents and uses uncertainty, we can improve AI algorithms.

In Bayesian decision theory, uncertainty is handled by combining a prior on the state of the world, P(s_w), with the likelihood of the sensory evidence given that state, P(e_s | s_w), to obtain a posterior P(s_w | e_s). Each P is a distribution over multiple possible states.
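As a minimal sketch of this update, the posterior is the prior multiplied by the likelihood and then renormalised. The three world states and all numbers below are made up for illustration and do not come from the lecture.

```python
import numpy as np

# Hypothetical example with three possible world states.
prior = np.array([0.5, 0.3, 0.2])        # P(state of the world)
likelihood = np.array([0.1, 0.6, 0.3])   # P(observed evidence | state)

# Bayes' rule: posterior is proportional to likelihood times prior.
unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()

print(posterior)
```

Even though the first state has the highest prior, the evidence shifts most of the posterior mass to the second state.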

A typical application in the brain is to combine uncertain evidence from multiple sources such as auditory and visual information. Each source provides an estimate of the value together with an uncertainty estimate. The linear combination of the estimates is usually sharper than either one and is biased towards the more certain estimate.
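The combination can be sketched under the common assumption that both cues are independent Gaussian estimates, in which case each cue is weighted by its inverse variance. The function name and the numbers are hypothetical and only illustrate the idea.

```python
def combine_cues(mu_a, var_a, mu_v, var_v):
    """Combine two independent Gaussian estimates by inverse-variance weighting."""
    precision = 1 / var_a + 1 / var_v
    w_a = (1 / var_a) / precision        # weight of the auditory cue
    w_v = (1 / var_v) / precision        # weight of the visual cue
    mu = w_a * mu_a + w_v * mu_v         # linear combination of the estimates
    var = 1 / precision                  # never larger than either input variance
    return mu, var

# Assumed values: a noisy auditory estimate and a sharper visual estimate.
mu, var = combine_cues(mu_a=10.0, var_a=4.0, mu_v=12.0, var_v=1.0)
print(mu, var)
```

The combined estimate lands closer to the visual cue, and its variance is smaller than that of either cue alone.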

Priors allow us to explain several optical illusions (Weiss, Simoncelli & Adelson, 2002). We seem to have a prior that favours slow speeds, so the barber pole illusion arises because we prefer the slower upward velocity over the sideways velocity.

Neuro 140/240 – Lecture 5

Lecture by Tomer Ullman at Harvard University. My personal takeaway on auditing the presented content.

The development of intuitive physics and intuitive psychology

Turing proposed that an AI could be developed very much like a human: starting from an empty notebook, like a child, and developing into an adult. However, early development research in psychology and cognitive science has shown that the notebook may not be “empty”. There is some core knowledge that seems to be either innate or very early developing in humans, but the notion is contested and still under research.

Evolutionarily, it may make sense to kickstart a “new” being with some innate knowledge to give it a head start instead of having to acquire the knowledge on its own. The core knowledge is limited to a few domains. In core physics knowledge, infants have expectations about objects, among them permanence, cohesion, solidity, smooth paths and contact causality. There is not much more than these principles. At the moment, the limiting factor for these core knowledge expectations seems to be how we can probe for the knowledge rather than whether infants have it. Research is actively trying to find a lower age limit.

Side note: For preverbal infants, surprise is measured by the time they spend looking at something, but this can be confounded by their tendency to look at things or people they are attached to (like parents).

Alternatives to Core Physics?

Physical Reasoning Systems

There could be physical reasoning systems (Luo & Baillargeon, 2005) in which visually observable features are evaluated to decide what physical outcome will occur. For infants, it appears that the reasoning system is refined with development. A feedforward deep network by Lerer et al. (2016) was trained to evaluate whether a tower of blocks is stable, but the system did not generalize.

Since 2010, a cognitive revolution of sorts has happened in neural network architecture: decoders, LSTMs, memory and attention have become available “off the shelf”. Using these components (Piloto et al., 2018), it is possible to generalize better (51% success classifying surprise).

Mental Game Engine Proposal

Maybe the human brain works like a game engine that emulates physics to approximate reality (Battaglia et al., 2013, in PNAS). A minimal example consists of a model, test stimuli and data. This is an ongoing area of research. A model of physics understanding at 4 months, consisting of approximate objects, dynamics, priors, re-sampling and memory (Smith et al., 2019), is used to predict the next state, which is then compared to the real next state. In this context, surprise can be defined as the difference between the prediction and the outcome.
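A toy sketch of surprise as prediction error is shown here. The constant-velocity forward model and the Euclidean distance are assumptions chosen for brevity; they are far simpler than the models cited above.

```python
import numpy as np

def predict_next(position, velocity, dt=1.0):
    """Toy forward model (an assumption): objects keep moving at constant velocity."""
    return position + velocity * dt

def surprise(predicted, observed):
    """Surprise as the Euclidean distance between prediction and outcome."""
    return float(np.linalg.norm(observed - predicted))

position = np.array([0.0, 0.0])
velocity = np.array([1.0, 0.0])
predicted = predict_next(position, velocity)

expected_outcome = np.array([1.0, 0.0])    # the object continues on its path
impossible_outcome = np.array([1.0, 3.0])  # the object suddenly jumps sideways

print(surprise(predicted, expected_outcome), surprise(predicted, impossible_outcome))
```

An outcome that matches the prediction yields zero surprise, while a physically implausible jump yields a large value.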

Core Psychology

Mental planning engine proposal

There are also expectations about agents. Infants have ideas about agents' goals, actions and planning.


Models have many possible routes.

  1. Human brains could be the way to model intelligence.
  2. Intelligence can be modelled in another way and need not be human-like.
  3. A general/universal function approximator may eventually converge to human behaviour/ability.
  4. A general/universal function approximator actually represents human behaviour/ability.

The problem is that any input-output problem can be represented with a look-up table, which involves no intelligence. Many models may eventually end up in “look-up-table land”, where they do not learn an actual model but only a simple look-up. These models can be useful for solving some tasks, but they lack common sense and fail easily under variation.

At the moment, reinforcement learning is a solution only inasmuch as evolution seems to have “used” it to produce human behaviour. But how that worked, and under what conditions it works, is still unknown, so reinforcement learning is not yet the route to human-like behaviour.

Neuro 140/240 – Lecture 4

Lecture by Cengiz Pehlevan at Harvard University. My personal takeaway on auditing the presented content.

Inductive bias of neural networks

A brain can be understood as a network with about 10^11 neurons (nodes) and 10^14 synapses (parameters). Geoffrey Hinton cleverly observed that “The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than (supervised, C.P.) data.” But the biologist Anthony Zador argues that animal behaviour is not the result of learning algorithms (supervised or unsupervised) but is encoded in the genome. When born, an animal's structured brain connectivity enables it to learn very rapidly.

Deep learning is modelled on brain function. While we cannot (yet) answer why brains do not overfit, we can maybe understand why modern deep learning networks with up to 10^11 parameters do not overfit even when they have orders of magnitude more parameters than data. Deep double descent means that the test error peaks around the interpolation threshold (roughly one parameter per data point) and then falls again as more parameters are added.

Using the simplest possible architecture, we analyze how to map x -> y(x) with two hidden layers and 100 units per layer. We obtain about 10,000 parameters, which ought to be heavily overparameterized. Any curve between two data points would be possible, but the network produces something close to a straight line. It is as if the neural network applied Occam’s Razor. It seems that neural networks are strongly biased towards simple functions.
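The parameter count can be checked with a few lines, assuming a fully connected network with scalar input, scalar output and a bias on every unit. These details are my assumptions; the lecture only gave the rough figure.

```python
def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 1 input -> 100 hidden -> 100 hidden -> 1 output
n_params = count_parameters([1, 100, 100, 1])
print(n_params)  # 10401
```

Almost all of the parameters sit in the 100 x 100 weight matrix between the two hidden layers.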

  1. What are the inductive biases of overparametrized neural networks?
  2. What makes a function easy/hard to approximate? How many samples do you need?
  3. Can we have a theory that actually applies to real data?
  4. What are the signatures of inductive biases in natural population codes?

Goal of network Training

Cost function for training: $\min_{\theta} \frac{1}{2} \sum_{\mu=1}^{P} \left( f(x^{\mu};\theta) - f_T(x^{\mu}) \right)^2$

However, with more unknowns than equations, we end up with a hyperplane of possible solutions. Gradient descent ends up somewhere on this hyperplane and is therefore biased towards a specific point (which depends on the random initialization). To understand the bias, we need to understand the function space (what can the network express?), the loss function (how do we define a good match?) and the learning algorithm (how do we update?).
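The dependence on initialization can be illustrated with a linear model standing in for the network. This is my simplification, not the setup from the lecture. With 3 data points and 10 weights, gradient descent reaches zero training error from any start, but the solution it finds depends on where it started.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))   # 3 equations (data points), 10 unknowns (weights)
y = rng.normal(size=3)

def gradient_descent(w_init, steps=5000):
    """Minimise 0.5 * ||X w - y||^2 by plain gradient descent."""
    w = w_init.copy()
    lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size from the largest singular value
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w

w_zero = gradient_descent(np.zeros(10))          # start at the origin
w_rand = gradient_descent(rng.normal(size=10))   # start at a random point

print(np.abs(X @ w_zero - y).max(), np.abs(X @ w_rand - y).max())
print(np.linalg.norm(w_zero - w_rand))
```

Both runs fit the data, yet the weight vectors differ. Starting from zero, gradient descent lands on the minimum-norm solution (the pseudo-inverse solution), which is one concrete example of an implicit bias of the learning algorithm.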

My own thoughts, which I needed to check for correctness: A neural network projects such a hyperplane into the output space. Therefore the simplest/closest projection is probably a line or an approximation of a line. The answer is that this holds only for linear regression and special setups. The theta space is too complex; only in the weight space can we argue for linearization.

Can we simplify this to solve this hard problem?

Looking at an infinitely wide network, we can examine the function space. With the neural tangent kernel, we can see that most of the time we produce something close to a line. The wider the network, the easier it is to fit the points starting from the random initialization with only minimal changes to the weights. Looking at a Taylor expansion, we see that wide networks linearize around their initialization during gradient flow.

Kernel Regression

The goal is to learn a function f: X -> R from a finite number of observations. The function f lives in a reproducing kernel Hilbert space (RKHS), essentially a special kind of smooth function space with an inner product. The regression is then a minimization of the squared loss plus lambda times an inner product term that penalizes complex functions. In an RKHS this problem has a unique solution, which describes the function learned by an infinitely wide neural network trained on the quadratic loss to zero training error. Kernel regression is easier to study than neural networks and can shed light on how neural networks work. The eigenfunctions of the kernel form an orthonormal basis (analogous to eigenvectors) for the function space.
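A minimal kernel ridge regression sketch follows. It uses an RBF kernel as a stand-in, which is an assumption: the lecture was about the neural tangent kernel, which is not implemented here. The helper names, the length scale and the toy target are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """RBF kernel matrix between two sets of 1-D points (assumed kernel choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2 * length_scale ** 2))

def fit_kernel_ridge(x_train, y_train, lam=1e-6):
    """Solve (K + lam * I) alpha = y; lam penalises complex functions."""
    K = rbf_kernel(x_train, x_train)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def predict(x_train, alpha, x_new):
    """The learned function is a weighted sum of kernels centred on the data."""
    return rbf_kernel(x_new, x_train) @ alpha

x_train = np.linspace(-1, 1, 8)
y_train = np.sin(3 * x_train)              # toy target function
alpha = fit_kernel_ridge(x_train, y_train)
train_pred = predict(x_train, alpha, x_train)

print(np.abs(train_pred - y_train).max())
```

With a very small lambda the fit passes (almost) exactly through the training points, which mirrors the zero-training-error regime described above.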

The functions that a neural network can express in the infinite-width scenario are part of the RKHS. Taking the space of these functions, we can say that a neural network of infinite width is just the set of eigenfunctions together with a particular choice of weights.

Own thought: The eigenfunctions are fixed and the weights can be learned like in a perceptron. Can we stack layers of these eigenfunctions and get something new, or is that just linearizable as well? The answer is that it linearizes again!

Application to real datasets

Image data sets can be analyzed with kernel PCA (KPCA). Take the kernel eigenvalues and eigenvectors from KPCA and project the target values onto the kernel eigenbasis to get the target weights for the one-hot encoding.

Applying the generalization error based on the eigenfunctions, we can look at the relative error of the network's weights compared to the weights in the eigenfunction space and obtain a spectral bias that tells us which eigenfunctions are primarily selected. The larger the spectral bias, the more likely a network is to rely on the particular eigenfunction to produce the output.

We can use KPCA to understand how the data is split and how many eigenfunctions (KPCA principal components) we need to discriminate between outputs.
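The projection step can be sketched on toy data. Two Gaussian blobs stand in for two image classes, the kernel is an RBF kernel, and the centring step of standard kernel PCA is skipped for brevity. All of these are my assumptions, not choices from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: two blobs in 5 dimensions, 20 samples each.
X = np.vstack([rng.normal(-2, 1, size=(20, 5)), rng.normal(2, 1, size=(20, 5))])
labels = np.array([0] * 20 + [1] * 20)
Y = np.eye(2)[labels]                      # one-hot targets

# Gram matrix of an RBF kernel (assumed kernel and bandwidth).
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dist / (2 * 5.0))

# Eigenvectors of K play the role of kernel eigenfunctions on the samples.
vals, vecs = np.linalg.eigh(K)
order = np.argsort(vals)[::-1]             # sort by decreasing eigenvalue
vals, vecs = vals[order], vecs[:, order]

# Target weights: projection of the one-hot targets onto the eigenbasis.
target_weights = vecs.T @ Y
power = (target_weights ** 2).sum(axis=1)
cumulative_power = np.cumsum(power) / power.sum()

print(cumulative_power[:5])
```

If the cumulative power rises quickly, a handful of leading components is enough to tell the classes apart.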

Neuro 140/240 – Lecture 2

Lecture by Richard Born at Harvard University. My personal takeaway on auditing the presented content.

Warren Weaver was a director at the Rockefeller Foundation in the 1950s, and he said that the future of engineering is to understand the tricks that nature has come up with over the millennia.

Anatomy of visual pathways

The visual system spans large areas of the brain, and damage to the brain often causes malfunctions of the visual system.

The world is mirrored on the retina. About one million axons are connected to the retina. Vision is also connected to the brainstem to orient the head in space. This is a semi-automatic system for paying attention. It is also connected to the circadian rhythm to manage the sleep cycle through brightness.

The important area in primates is V1 (striate cortex, area 17). A lesion here makes humans blind. A brain has no visual understanding. It only produces action potentials that are interpreted as visual. In monkeys, there are more than 30 visual areas, roughly grouped into two streams. The ventral stream (down) is concerned with the what (object recognition) and the dorsal stream (up) is concerned with the where (spatial perception). Retinotopic representations are aligned with retinal space, but object recognition ought to be object-centred. How the brain converts this to world coordinates is still an open question. Mishkin showed in 1983 that monkeys taught to associate food with a specific object or with a specific location perform at chance if they have a lesion in the respective brain area.

Receptive Fields

A receptive field is the region of the sensory epithelium that can influence a given neuron’s firing rate. Hubel and Wiesel showed that neurons in the lateral geniculate nucleus (LGN) are triggered by light signals. Hartline showed that surround suppression helps to locate points of interest. The brain is interested in points in the visual space where the derivative is not zero. Brains detect contrast (space), color contrast (wavelength), transience (time), motion (space & time), and combinations of space and color.
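A one-dimensional sketch of a centre-surround filter is given below. The kernel values are an assumption chosen so that excitation and inhibition cancel on uniform input, which means the filter only responds where the signal changes.

```python
import numpy as np

# 1-D "image": a dark region followed by a bright region (a single edge).
signal = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Centre-surround kernel: excitatory centre, inhibitory surround, sums to zero.
kernel = np.array([-0.5, 1.0, -0.5])

response = np.convolve(signal, kernel, mode="valid")
print(response)  # [ 0.   0.  -0.5  0.5  0.   0. ]
```

The response is zero inside both uniform regions and non-zero only on the two positions next to the edge.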

Hierarchical receptive fields

There is a hierarchical elaboration of receptive fields. Hubel & Wiesel also measured signals in the primary visual cortex and found that the neurons encode the orientation of an edge, with a stronger off response on one side but no response to diffuse edges. Essentially, we can think of the neurons as filters or convolutions (a simplification). In contrast to a computer, a brain does this in parallel. Horace Barlow noted that the brain focuses on suspicious coincidences (e.g. unusual changes).

We go from LGN (center-surround) to simple cells (orientation) to complex cells (contrast invariant across an area, i.e. pooling/softmax). In the 1950s, the psychologist Attneave found the 17 points of maximal curvature on an image of a cat, connected them with lines and produced an abstract representation that was recognizable as a cat.

Convolutional Neural Networks

An engineered alternation of selectivity (convolution) and generalization (pooling) led to great success early on in vision research, but then came deep networks. However, deep networks actually do apply convolutions, rectification (ReLU), pooling and, lastly, normalization. Learning these features rather than engineering them improved performance.

Yamins et al. (2014) showed that AlexNet has some non-trivial similarity with the ventral stream visual areas of monkey brains.

What is missing?

When noise was added to images, CNNs failed quickly at 20% noise, whereas human performance degraded gracefully with the level of noise. Even worse, CNNs can learn solutions for a specific kind of noise but end up failing if the noise changes.

In Ponce et al. (2019), random codes are fed to a generative neural network to synthesize images. The neuron's response is used as an objective function to rank the synthesized images. A genetic algorithm is applied to find the codes that maximally trigger the neuron.
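The search loop can be sketched with a toy genetic algorithm. This is not the procedure from the paper. The generative network is left out and the "neuron" is a made-up function that responds most strongly to a hidden preferred code, so the codes are scored directly. Population size, mutation scale and the number of generations are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
preferred = rng.normal(size=16)   # hypothetical hidden "preferred code" of the unit

def neuron_response(code):
    """Stand-in for a recorded firing rate: highest when code equals `preferred`."""
    return -np.sum((code - preferred) ** 2)

population = rng.normal(size=(30, 16))    # random initial codes
best_per_generation = []
for _ in range(50):
    scores = np.array([neuron_response(c) for c in population])
    order = np.argsort(scores)[::-1]
    best_per_generation.append(scores[order[0]])
    parents = population[order[:10]]      # selection: keep the top 10 codes
    # Recombination: each child takes every gene from one of two random parents.
    pa = parents[rng.integers(0, 10, size=20)]
    pb = parents[rng.integers(0, 10, size=20)]
    mask = 0.5 > rng.random((20, 16))
    children = np.where(mask, pa, pb)
    children += rng.normal(scale=0.1, size=children.shape)   # mutation
    population = np.vstack([parents, children])              # elitism: parents survive

print(best_per_generation[0], best_per_generation[-1])
```

Because the best codes are carried over unchanged, the best response can never decrease from one generation to the next.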

PredNet predicts future frames of video streams with unsupervised learning.

Tootell and Born showed in 1990 that the primary visual cortex is still organized very retinotopically, whereas area MT is organized in hypercolumns that detect motion direction rather than by spatial connectivity.

Neurons near each other seem to like to do the same thing. Brains are not just look-up tables but have a semantic structure in their spatial organisation.

Geometry of Big Data – Tuesday session

All talks are summarised in my words which may not accurately represent the authors’ opinion. The focus is on aspects I found interesting. Please refer to the authors’ work for more details.

Session 1 – Graph-based persistence

The talk On the density of expected persistence diagrams and its kernel based estimation is given by Frederic Chazal. A draft is available on arxiv.

Grow circles around the data points and add an edge to a graph whenever circles meet other points, producing the persistent homology of filtered simplicial complexes (e.g. adding edges may change the homology). Persistence barcodes and persistence diagrams encode the same information produced by this process.
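For the simplest case, connected components (dimension 0), the process can be sketched with a union-find structure. The function name and the 1-D toy points are hypothetical, and the pairwise distance is used as the scale (the circle radius would be half of it). Higher-dimensional features such as loops are not covered by this sketch.

```python
import itertools

def zero_dim_persistence(points):
    """Death scales of connected components as the distance threshold grows.

    Every point is born at scale 0; a component dies when it merges into another.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (1-D points, so abs is the distance).
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # two components meet: one of them dies
            parent[ri] = rj
            deaths.append(dist)
    return deaths

deaths = zero_dim_persistence([0.0, 1.0, 1.5, 5.0])
print(deaths)  # [0.5, 1.0, 3.5]
```

The long-lived component (death at 3.5) reflects the isolated point at 5.0, while the short bars come from the tight cluster near 1.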

Measures are nicer to work with than sets of points for statistical purposes. If the persistence diagram D is a random variable, then E[D] is a deterministic measure on R². Persistence images reveal E[D] and are more interpretable than persistence diagrams, which may be too crowded for visual inspection with a large sample.

Persistence can be used as an additional feature on a dataset. For example, a random sample from the data set can be taken and the persistence diagram/image can be computed and compared between random samples giving us an idea of the stability of the homology.

Session 2 – Log-concave density estimation

The talk Log-concave density estimation: adaptation and high dimensions is given by Richard Samworth. The paper is available at Project Euclid.

To estimate a density f_0 from a random sample, there are generally two approaches: parametric and non-parametric methods. A density f is log-concave if log f is concave. The super-level sets need to be convex. Univariate examples are the normal, the logistic and more. The class is closed under marginalisation, conditioning, convolution and linear transformations.
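The definition can be checked numerically on a grid: log f is concave if its second differences are never positive. This is only a rough sketch on a finite, evenly spaced grid, and the helper name and the two example densities are my own choices.

```python
import numpy as np

def is_log_concave(density, grid):
    """Check that log f has non-positive second differences on an even grid."""
    log_f = np.log(density(grid))
    second_diff = log_f[:-2] - 2 * log_f[1:-1] + log_f[2:]
    return not bool(np.any(second_diff > 1e-9))

def normal(x):
    """Standard normal density."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

def mixture(x):
    """Equal mixture of two well-separated normals (bimodal)."""
    return 0.5 * normal(x - 2) + 0.5 * normal(x + 2)

grid = np.linspace(-4, 4, 401)
print(is_log_concave(normal, grid), is_log_concave(mixture, grid))
```

The normal density passes the check, while the bimodal mixture fails it because log f curves upward in the dip between the two modes.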

With an unbounded likelihood, the estimated density surface is spiky. The log-concavity constraint addresses this.

Session 3 – Infinite Width Neural Nets

The talk Infinite-Width Bounded-Norm Networks: A View from Function Space given by Nathan Srebro has two parts Infinite Width ReLU Nets and Geometry of Optimization Regularization and Inductive Bias.

Part 1: When we are learning, we find a good fit (of weights) for the data. What kind of functions can be approximated by a neural net? Essentially all, but the question is how large the network has to be to approximate f to within error e. The question should be: what class of functions can be approximated by low-norm neural nets? Another question should be: given a bounded number of units, what norm is required to approximate f to within any error e? The cost of the weights is taken as the parameter. This results in linear splines. A neural net with infinite width and one hidden layer solves the Green’s function.

Part 2: How does depth influence this? Deep learning should be considered with infinite width and implemented with a finite approximation. Deep learning focuses on searching a parameter space that maps into a richer function space.

Session 4

The talk Some geometric surprises in modern machine learning is given by Andrea Montanari.

Session 5

The talk Multi-target detection and cryo-EM imaging by autocorrelation analysis is given by Amit Singer.

Session 6

The talk Learning to Solve Inverse Problems in Imaging is given by Rebecca Willett.

Geometry of Big Data – Monday session

All talks are summarised in my words which may not accurately represent the authors’ opinion. The focus is on aspects I found interesting. Please refer to the authors’ work for more details.

Session 1 – Learning DAGs

The talk DAGs with NO TEARS: Continuous Optimization for Structure Learning is given by Pradeep Ravikumar. A draft is available on arxiv.

Learning directed acyclic graphs (DAGs) can traditionally be done in two ways: conditional-independence-based and score-based. The latter poses a local search problem without a clear answer. More recently, the problem has been posed as a continuous (global) optimisation for undirected graphs.

The loss function is a log-likelihood of the data, and we need to find the most appropriate W such that X = XW + E. They provide a new M-estimator.
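The sketch below generates data from the linear model X = XW + E and evaluates a smooth acyclicity measure, h(W) = tr(exp(W ∘ W)) − d, which as far as I understand the NO TEARS paper is the function that makes the continuous formulation possible. This formula comes from the paper rather than from my notes. The matrix exponential is computed with a truncated power series to stay within NumPy, and the example graphs are made up.

```python
import numpy as np

def acyclicity(W, terms=20):
    """h(W) = tr(exp(W * W)) - d via a truncated power series (W * W is elementwise)."""
    d = W.shape[0]
    A = W * W
    power = np.eye(d)
    trace_sum = 0.0
    for k in range(1, terms + 1):
        power = power @ A / k      # now equals A^k / k!
        trace_sum += np.trace(power)
    return trace_sum               # the k = 0 term tr(I) = d cancels the "- d"

rng = np.random.default_rng(0)
W_dag = np.array([[0.0, 0.8, 0.0],
                  [0.0, 0.0, -0.5],
                  [0.0, 0.0, 0.0]])          # edges 1 -> 2 -> 3, no cycle
E = rng.normal(size=(1000, 3))               # noise
X = E @ np.linalg.inv(np.eye(3) - W_dag)     # solves X = X W + E

W_cyclic = W_dag.copy()
W_cyclic[2, 0] = 0.7                         # adds 3 -> 1, closing a cycle

print(acyclicity(W_dag), acyclicity(W_cyclic))
```

The measure is zero for the DAG and strictly positive once a cycle is present, so it can serve as an equality constraint in a continuous optimiser.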

Session 2 – Parallel transport for data alignment

The talk Data Analysis with the Riemannian Geometry of Symmetric Positive-Definite Matrices is given by Ronan Talmon. A draft is available on arxiv.

The talk focuses on how to align data when the intersubject variation is large but consistent and the intrasubject variation could be mapped. Parallel transport has the goal of aligning the intersubject values on a symmetric positive-definite (SPD) embedding in n-dimensional space. SPD matrices are embedded on a hyperboloid and all computations can be performed in closed form.

For data from multiple subjects and multiple sessions, it does not matter whether the sessions or the subjects are adapted first. This only holds for parallel transport and not for identity transformations.

Session 3 – Persistence framework for data analysis

The talk Metric learning for persistence-based summaries and application to graph classification is given by Yusu Wang. An underlying paper is available on PlosOne.

Persistence diagrams can be used to describe complexity. The features are simpler but persistent to the underlying object. Viewing a geometric object through a filtration produces a summary. A filtration is a growing sequence of spaces. The times at which features get created and destroyed can be mapped onto a persistence diagram with death time on the y-axis and birth time on the x-axis.

The bottleneck distance is based on a matching between two persistence diagrams, chosen so that the largest distance between matched features is as small as possible. Features may be matched to a zero-feature on the diagonal (capturing noise) if they are too close to the diagonal. More complex approaches include persistence images, which transform the diagram into a kernel density.

The weight function should be application dependent and thus can be learned instead of pre-assigned. We can just take the difference between two persistence images as a weighted kernel for persistence images (WKPI).

For graphs, the following descriptor functions can be used for persistence. The discrete Ricci curvature captures the local curvature on the manifold. The Jaccard index function compares how many neighbors two nodes have in common, which is good for noisy networks.

In general, a descriptive function must be found for the domain and may even encode meaningful knowledge on how the object behaves. High weights would describe the more distinct features.

Session 4 – Behold the spikes

The talk Proper regularizers for semi-supervised learning is given by Dejan Slepcev.

A d-dimensional point cloud can be converted to a graph representation using a kernel that connects nearby points with edges (with a fall-off or discontinuity). As the number of nodes n goes to infinity, the kernel bandwidth should shrink to 0.

The error bandwidth is critical. The take-away is that instead of producing single labeled data points, the label should be extended beyond the kernel bandwidth. A single data label can produce spikes because essentially the minimiser obtains smaller values for a flat surface with a single spike than for an appropriate surface.

Session 5

The talk Solving for committor functions in high dimension is given by Jianfeng Lu.

Session 6 – Finding structure in loss

The talk A consistent framework for structure machine learning is given by Lorenzo Rosasco.

Structured machine learning is not structure learning. It refers to learning functional dependencies between arbitrary output and input data. Classical approaches include likelihood estimation models (struct-svm, conditional random fields, but limited guarantees) and surrogate approaches (strong theoretical guarantees but ad hoc and specific).

Applying empirical risk minimisation (ERM) from statistical learning we can expect that the mean of the empirical data is close to the mean of the class. However, it is hard to pick a class. The inner risk (decomposing into marginal probability) reduces the class size. Making a strong assumption the structured encoding loss function (SELF) requires a Hilbert space and two maps such that the loss function can be presented as an inner product. Using a linear loss function helps. For a crazy space Y (need not be linear) the SELF gives enough structure to proceed. This enlarges the scope of structured learning to inner risk minimisation (IRM).

There is a function psi hidden in the loss function that encodes and decodes from Y to the Hilbert space. The steps are encode Y in H, learn from X to H, and decode H to Y. In linear estimation with least squares, the encoding/decoding disappears and the output space Y is not needed for computation.

About Ujung Kulon: The final Chapter

We arrived in Tamanjaya thinking our car would be waiting for us and we would go straight back home. This turned out to be far from the truth. Our driver hadn’t arrived yet, but we weren’t worried at that point. We took care of our business, like paying for the guide and the boat and getting a free lunch. Getting the boat (and the lunch) had been another story: we met the owner and talked to him when we first arrived in Tamanjaya four days earlier. First we talked about general things, then about personal things, and last about the boat. We drank coffee as we talked and later bargained. It was a hard deal, but in the end the extras made the deal. We would have a dinner at his house once we came back, and some fresh coconuts. We paid for the boat, went back to the ranger station and waited for our driver to arrive. But as the hours passed and nobody arrived, we started to worry. As it happened, we had chartered our boat from the Sunda Jaya Homestay, and as nobody arrived the owner invited us to stay with him. He offered to let us stay one night for free since our driver hadn’t come. So we sat down with him and ate dinner. We talked a lot and had a good time. We would have to get up early in the morning to catch the first bus back to Jakarta, so we prepared to go to bed early, and just as we wanted to go to sleep, our driver arrived. Happy as could be, we packed our things, thanked the owner of the Sunda Jaya Homestay for his hospitality and left. Our driver had taken one wrong turn on his way to Tamanjaya, which had cost him several hours. At 3 a.m. we were back in Bogor. An adventure had ended more adventurously than expected.

When we arrived in Tamanjaya, we thought our car would be waiting for us and we would drive straight home. We were very wrong. Our driver had not arrived yet, but we were not worried yet. We still had a few things to take care of anyway. We paid the guide and the boat, and there was also that free lunch. Renting the boat (and getting the lunch) was a story in itself: we met the boat owner when we arrived in Tamanjaya four days earlier. We chatted, first about very general things, then about more personal ones, and suddenly we had arrived at the boat. Coffee was served during the negotiations. We haggled hard, and in the end the extras were decisive. We got a lunch on our return and fresh coconuts to drink. So we paid for our boat and went back to the ranger station to wait for the driver. But the hours passed and nobody came, so we began to worry. Fortunately, the boat owner was also the owner of the Sunda Jaya Homestay, and since our driver did not come, he offered to let us stay with him. We could stay one night for free because our driver had not turned up. We sat down with him and had dinner. We talked a lot and had an entertaining evening. Since we would have to get up early to catch the bus to Jakarta in time, we wanted to go to bed early. Just as we were about to lie down, our driver arrived. Overjoyed, we packed our things, thanked the owner of the Sunda Jaya Homestay for his hospitality and drove off. Our driver had taken a wrong turn once, which had cost him several hours. At 3 a.m. we arrived in Bogor. An adventure that had ended more adventurously than expected.

When we arrived in Tamanjaya we thought we would only have to get into the car and could go home. But it did not happen that way. Our driver had not arrived, but we were not worried yet. First we had a few things to do: pay the guide and the boat, and eat lunch. Renting that boat (and getting the free lunch) is another story. We met the owner of the boat when we arrived in Tamanjaya four days earlier. We talked, first in general, then about private matters and finally about the boat. We drank coffee the whole time. We haggled for a while and finally settled on an acceptable price plus a lunch and fresh coconuts. So we paid for the boat and went back to the ranger station to wait for the driver. The hours passed and nobody arrived, and eventually we started to worry. Luckily the owner of the boat was also the owner of the Sunda Jaya Homestay, and he invited us to stay at the homestay since our driver had not arrived. We had dinner with him and talked for a while. It was a good time. The bus to Jakarta leaves early, so we wanted to go to sleep early. Just as we were ready to go to sleep, the driver arrived. Very happy, we grabbed our things, thanked the owner of the Sunda Jaya Homestay for his hospitality and left. The driver had taken a wrong turn once and it cost him a few hours. At 3 in the morning we arrived in Bogor. An adventure that ended more adventurously than expected.


Ujung Kulon: The Beach and the Moon

At the first shelter we had some hours to spend at the beach, which we gladly did. We sat there silently for hours, just watching the waves and the clouds passing by. It was quiet at that beach; only the surf could be heard. Inner peace comes upon you when you’re in such a place. Later on I walked along the beach for a while and found some crabs, curious plants and an indescribable atmosphere. With the sunset coming, the scenery was perfect. The colours ranged from the brightest yellow to the deepest red. Hypnotized by the colours, it took me a while to notice the moon. What a great first day in the wilderness.

After we had arrived at the first shelter, we had some time to spend at the beach, and we did so overjoyed. So there we sat, hour after hour, without a sound. We watched the waves and the clouds drifting by. It was quiet on the beach; only the surf murmured softly. In a place like this you find inner peace. Later I walked along the beach, where I came across crabs, strange plants and an incredible atmosphere. When the sunset began, the scene was perfect. The colours ranged from the brightest yellow to the deepest red. Enchanted by the colours, I only noticed the moon after a while. What a great first day in the wilderness.

Arriving at the first shelter, we had a few hours to spend at the beach, and we did so gladly. We sat there silently for hours, simply watching the waves and the clouds passing by. Nothing could be heard but the surf. In places like that you find peace within yourself. Later I walked along the beach and found crabs, strange plants and an indescribable atmosphere. When the sun set, the scene became perfect. The colours ranged from luminous yellow to dark red. Enchanted by the colours, it took me a while to notice the moon. What a good first day in the jungle.
