
Geometry of Big Data – Monday session

All talks are summarised in my own words, which may not accurately represent the authors' opinions. The focus is on aspects I found interesting. Please refer to the authors' work for more details.

Session 1 – Learning DAGs

The talk DAGs with NO TEARS: Continuous Optimization for Structure Learning is given by Pradeep Ravikumar. A draft is available on arXiv.

Learning directed acyclic graphs (DAGs) can traditionally be done in two ways: via conditional-independence tests or via score-based methods. The latter poses a local search problem without a clear answer. More recently, the problem has been posed as a continuous (global) optimisation, though for undirected graphs.

The loss function is the log-likelihood of the data, and we need to find the most appropriate weight matrix W such that X = XW + E. The authors provide a new M-estimator.
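
As a rough sketch of the two ingredients in this formulation (function names are mine, not the authors'): a least-squares score for the linear model X = XW + E, and the smooth acyclicity function h(W) = tr(e^{W∘W}) − d from the paper, which is zero exactly when W encodes a DAG.

```python
import numpy as np
from scipy.linalg import expm

def least_squares_loss(X, W):
    """Score for the linear model X = XW + E (lower is better)."""
    n = X.shape[0]
    return 0.5 / n * np.linalg.norm(X - X @ W, "fro") ** 2

def notears_acyclicity(W):
    """Smooth acyclicity measure h(W) = tr(e^{W∘W}) - d; zero iff W is a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is the elementwise (Hadamard) square
```

The constrained problem min_W loss subject to h(W) = 0 can then be handed to a standard augmented-Lagrangian solver, which is what makes the search continuous rather than combinatorial.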

Session 2 – Parallel transport for data alignment

The talk Data Analysis with the Riemannian Geometry of Symmetric Positive-Definite Matrices is given by Ronen Talmon. A draft is available on arXiv.

The talk focuses on how to align data when the inter-subject variation is large but consistent, and the intra-subject variation can be mapped. The goal of parallel transport is to align the inter-subject values on a symmetric positive-definite (SPD) embedding in n-dimensional space. SPD matrices are embedded on a hyperboloid, and all computations can be performed in closed form.

With data from multiple subjects and multiple sessions, it does not matter whether one first adapts across sessions or across subjects – this order-invariance holds for parallel transport but not for identity transformations.
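
For concreteness, here is a minimal sketch of that closed-form computation, assuming the affine-invariant metric on the SPD manifold as in the related literature: parallel transport along the geodesic from B to A is Gamma(S) = E S E^T with E = (A B^{-1})^{1/2}. The function name is my own.

```python
import numpy as np
from scipy.linalg import sqrtm

def parallel_transport_spd(S, B, A):
    """Transport the SPD matrix S along the geodesic from B to A."""
    E = np.real(sqrtm(A @ np.linalg.inv(B)))  # E = (A B^{-1})^{1/2}
    return E @ S @ E.T
```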

Session 3 – Persistence framework for data analysis

The talk Metric learning for persistence-based summaries and application to graph classification is given by Yusu Wang. An underlying paper is available on PLOS ONE.

Persistence diagrams can be used to describe complexity. The features are simpler than the underlying object but persist across it. Viewing a geometric object through a filtration perspective produces a summary. A filtration is a growing sequence of spaces. The times at which features are created and destroyed can be mapped onto a persistence diagram, with birth time on the x-axis and death time on the y-axis.
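
As a hedged example (the talk does not prescribe a tool), such a diagram can be computed for a point cloud with the ripser package:

```python
import numpy as np
from ripser import ripser  # assumption: the ripser package is installed

X = np.random.rand(100, 2)        # a toy 2-d point cloud
diagrams = ripser(X)['dgms']      # one diagram per homology dimension
births, deaths = diagrams[1].T    # 1-dimensional features (loops)
```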

The bottleneck distance is a matching between two persistence diagrams such that each feature is matched at the shortest distance. Features may be matched to a zero-persistence feature on the diagonal (capturing noise) if they are too close to it. More complex approaches include persistence images, which transform the diagram (after mapping death times to persistence values) into a kernel density estimate.

The weight function should be application-dependent and can thus be learned instead of pre-assigned. The difference between two persistence images can then be used as a weighted kernel for persistence images (WKPI).
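
A minimal sketch of the persistence-image construction with a pluggable weight function, which is exactly the ingredient that can be learned; the grid, bandwidth, and names are my own choices:

```python
import numpy as np

def persistence_image(diagram, weight, xs, ys, sigma=0.1):
    """Rasterise (birth, death) pairs into a weighted kernel density
    over a (birth, persistence) grid given by pixel centres xs, ys."""
    img = np.zeros((len(ys), len(xs)))
    for b, d in diagram:
        p = d - b  # persistence; near-diagonal (noisy) points have p ~ 0
        gx = np.exp(-((np.asarray(xs) - b) ** 2) / (2 * sigma ** 2))
        gy = np.exp(-((np.asarray(ys) - p) ** 2) / (2 * sigma ** 2))
        img += weight(b, p) * np.outer(gy, gx)
    return img
```

Comparing two such images pixelwise then yields the weighted kernel between the underlying diagrams.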

For graphs, the following metrics can be used for persistence: the discrete Ricci curvature captures the local curvature on the manifold, and the Jaccard index compares how many neighbours two nodes have in common, which works well for noisy networks.
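
For illustration, the Jaccard index of two nodes is just the overlap of their neighbourhoods; a plain-Python sketch, with the graph given as a dict from node to neighbour set:

```python
def jaccard_index(neighbors, u, v):
    """Intersection over union of the neighbour sets of nodes u and v."""
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union) if union else 0.0
```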

In general, a descriptive function must be found for the domain and may even encode meaningful knowledge on how the object behaves. High weights would describe the more distinct features.

Session 4 – Behold the spikes

The talk Proper regularizers for semi-supervised learning is given by Dejan Slepcev.

A d-dimensional point cloud can be converted to a graph representation using a kernel that connects nearby points (with a fall-off or a hard cutoff). As the number of nodes n goes to infinity, the kernel bandwidth should shrink to 0.
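
A minimal sketch of this construction (the Gaussian kernel, the cutoff choice, and the names are mine, not the speaker's):

```python
import numpy as np

def kernel_graph_laplacian(X, eps):
    """Unnormalised graph Laplacian L = D - W of a point cloud X, using a
    Gaussian kernel of bandwidth eps with a hard cutoff (the 'fall-off or
    discontinuity' mentioned above)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-D2 / eps ** 2) * (D2 <= (3 * eps) ** 2)  # kernel weights, cut off
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W
```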

The choice of kernel bandwidth is critical. The take-away is that instead of providing single labelled data points, each label should be extended beyond the kernel bandwidth. A single labelled point can produce spikes, because the minimiser essentially attains a smaller value for a flat surface with a single spike than for an appropriately interpolating surface.

Session 5 – Committor functions

The talk Solving for committor functions in high dimension is given by Jianfeng Lu.

Session 6 – Finding structure in loss

The talk A consistent framework for structured machine learning is given by Lorenzo Rosasco.

Structured machine learning is not structure learning. It refers to learning functional dependencies between arbitrary output and input data. Classical approaches include likelihood estimation models (struct-SVM, conditional random fields; limited guarantees) and surrogate approaches (strong theoretical guarantees, but ad hoc and problem-specific).

Applying empirical risk minimisation (ERM) from statistical learning, we can expect the mean over the empirical data to be close to the mean over the class. However, it is hard to pick a class. The inner risk (obtained by decomposing over the marginal probability) reduces the class size. Under a strong assumption, the structured encoding loss function (SELF) requires a Hilbert space and two maps such that the loss function can be represented as an inner product. Using a linear loss function helps. Even for a "crazy" output space Y (it need not be linear), SELF provides enough structure to proceed. This enlarges the scope of structured learning to inner risk minimisation (IRM).

There is a function psi hidden in the loss function that encodes from Y into the Hilbert space and decodes back. The steps are: encode Y into H, learn a map from X to H, and decode from H back to Y. With linear least-squares estimation, the explicit encoding/decoding disappears and the output space Y is not needed during computation.
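
To make the encode-learn-decode pipeline concrete, here is a minimal sketch in the kernel least-squares case, loosely following the structured-prediction literature; the function name, the explicit candidate set, and the regularisation parameter are my own simplifications. Note that only the loss is ever evaluated; psi itself never has to be computed.

```python
import numpy as np

def self_predict(K, k_x, Y_train, loss, candidates, lam=1e-3):
    """alpha(x) = (K + n*lam*I)^{-1} k_x from kernel ridge regression, then
    decode f(x) = argmin over candidate y of sum_i alpha_i * loss(y, y_i)."""
    n = K.shape[0]
    alpha = np.linalg.solve(K + n * lam * np.eye(n), k_x)
    scores = [np.dot(alpha, [loss(y, y_i) for y_i in Y_train]) for y in candidates]
    return candidates[int(np.argmin(scores))]
```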


Starting “Geometry and Learning from Data in 3D and Beyond” at IPAM, UCLA

Today is the first day of my stay at the Institute for Pure and Applied Mathematics (IPAM) at the University of California, Los Angeles (UCLA). Over the coming weeks I will try to discuss interesting talks here at the long course Geometry and Learning from Data in 3D and Beyond. Stay tuned for the first workshop on Geometry of Big Data.


A Manifesto to Cite 50/50

I recently came across Women Also Know Stuff. I think it is a great initiative that helps to slowly combat systemic and structural inequality. They point to many female scientists in most social sciences, and I wondered whether I could find a similar program in computer science. The answer was no, because apparently we first need to get women into computer science. I would still love to see #WomenAlsoKnowComputerScience on Twitter, alas the search results are empty. It is not that I don't know great female computer scientists, but maybe they lack exposure, which makes it all the harder to convince women to join the field.

What I thought could help would be greater exposure in scientific citations. I will need to go a bit off-topic to explain my thinking, but bear with me. Citations produce scale-free networks (Klemm & Eguiluz, 2002).

Comparison of a random network and a scale-free network; the scale-free network shows super-connected nodes (hubs) in grey. Taken from Wikipedia.

That means that a few super-connected nodes (so-called hubs) take up almost all the citations. In general, if we as scientists need a citation to underline a concept, we are much more likely to end up citing such a super-connected node. This means that highly cited scientists get cited even more, while less cited scientists remain less cited – even if their science were better. Network effects (or economies of scale) ensure that it is not necessarily the best science that is cited most, but usually the science that preserves the status quo (Wang, Veugelers, & Stephan, 2017). The effect is even stronger than that: the big names (not only their citations) dominate a field to such an extent that alternative explanations favoured by other scientists are locked out of the discussion until such a star departs from the field (Azoulay, Fons-Rosen, & Zivin, 2015).
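
As a toy illustration of this rich-get-richer effect (my own example, not taken from the cited papers), a preferential-attachment graph concentrates edges on a few hubs, while a random graph of the same size does not:

```python
import networkx as nx

n = 1000
ba = nx.barabasi_albert_graph(n, 2)                # preferential attachment
er = nx.gnm_random_graph(n, ba.number_of_edges())  # random graph, same size

def top_degrees(G, k=5):
    """The k largest node degrees in G."""
    return sorted((d for _, d in G.degree()), reverse=True)[:k]

print("hub degrees (preferential attachment):", top_degrees(ba))
print("hub degrees (random):", top_degrees(er))
```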

So where does that leave us with citing female scientists? They are at a triple disadvantage:

  1. They have been structurally excluded from the discipline
  2. They (usually) don't have a big name, so their citation counts don't increase
  3. As there are no role models, young women may not take up the field

However, and this is what I would like to stress most, it is not about the quality of their research. Now, if citations are usually awarded not on merit alone but largely for structural reasons, why not use them to start shifting the scales today, so that some day in the future women are as well represented in this field (and in many others) as the statistical distribution of people would predict.

The Manifesto to Cite 50/50

Making a citation to underline a concept does not require us to always cite the one reference we have always used. We can vary whom we cite, and we can choose to cite female scientists as well.

  • Citing a female scientist does not cost us anything in our career, but it may help build the careers that will eventually bring equality.
  • Citing a female scientist when we only have male scientists at hand makes us critically reflect on our own field, and possibly helps us engage more deeply with the research to find female scientists.

We probably won't reach a 50/50 quota in our citation lists any time soon, but maybe we can start climbing towards it. I admit I am not there yet, and I haven't done this for any publication I have produced so far, but I am of a mind to change this. Maybe you would like to contribute as well? Change is hard, so my first goal is to have around 50% of my cited publications include a female co-author (though first author would be preferable). I am sure I will fail miserably to reach that goal in the next few publications I make. But yesterday I sat down and tried to find a few women in the field whom I could cite, and it was surprising how relevant their research was, and shocking how rarely I had heard of any of them (except those who, despite the odds, managed to become big names of their own). I think that in the long term this practice will also make me a better and more engaged scholar who (at least sometimes) manages to look beyond the in-group in which my work circulates.

Computer Science and more

Now, I know I specifically focused on computer science, but such an attempt should probably not be confined to one discipline. It should be a truly interdisciplinary endeavor.

Azoulay, P., Fons-Rosen, C., & Zivin, J. S. G. (2015). Does science advance one funeral at a time? National Bureau of Economic Research.
Klemm, K., & Eguiluz, V. M. (2002). Highly clustered scale-free networks. Physical Review E. APS.
Wang, J., Veugelers, R., & Stephan, P. (2017). Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy. Elsevier.
