
Off to the Chicago Forum on Global Cities

Today I write to you as part of a mini-series on my stay at the Chicago Forum on Global Cities (CFGC). I have been kindly sponsored by ETH Zurich and the Chicago Forum to participate in the event. I am currently sitting on the train to Zurich Airport and am looking forward to three days of intensive discussions on the future of global cities. You will also find a post about this event on the ETH Ambassadors Blog and on the ETH Global Facebook page, and you may look out for some tweets.

I hope for many interesting meetings and conversations at the Forum, especially about my main topic of interest, Big Data in Smart Cities – for which I am bringing a short policy brief designed in the Argumentation and Science Communication course of the ISTP – as well as about ways to design better cities based on Big Data and knowledge of human (navigation) behaviour, the topic of my soon-to-start PhD.


CGSS: Introduction

Complexity and Global Systems Science (CGSS) covers game theory and mechanism design, complex networks, sociophysics, and critical-thinking essays on these topics.

Complexity science deals with systems that are made up of thousands of interacting units, whereas global systems science deals with systems of global scale. Systemic instabilities are of major interest and need to be understood.

Collateral costs are hard to track but can be associated with financial crises, conflicts, terrorism, crime and corruption, epidemics, and cybercrime. Reducing collateral costs provides a new opportunity to tackle each of these problems. However, this requires an understanding of complex systems.

In a complex system, a large number of interacting elements follow non-linear interdependencies. Such systems are dynamic and probabilistic and therefore elude easy description.

It is necessary to distinguish between a complicated and a complex system. On the one hand, a car is a complicated system made of thousands of parts; however, each part fulfils a specific task and can (mostly) be understood. On the other hand, car traffic is complex, and predicting traffic jams is fiendishly hard (e.g. the phenomenon of phantom traffic jams).
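As a rough illustration of why traffic is complex, the following sketch (my own toy example, not from the lecture; all parameters are illustrative) simulates cars on a ring road with an optimal-velocity-style following rule. Perturbing a single car by half a metre is enough to destroy the uniform flow and let a jam emerge without any bottleneck.

```python
import numpy as np

N = 30          # number of cars
L = 750.0       # ring road length [m] (average headway 25 m)
a = 1.0         # driver sensitivity [1/s]
dt = 0.1        # time step [s]
steps = 3000    # simulate 300 s

def optimal_velocity(gap):
    """Desired speed as a function of the gap to the car ahead (Bando-style)."""
    return 16.8 * (np.tanh(0.086 * (gap - 25.0)) + 0.913)

# Start from the uniform solution (equal spacing, equal speed) ...
x = np.linspace(0.0, L, N, endpoint=False)
v = np.full(N, optimal_velocity(L / N))
x[0] += 0.5     # ... and perturb a single car by half a metre.

for _ in range(steps):
    gap = (np.roll(x, -1) - x) % L              # distance to the car ahead
    v += a * (optimal_velocity(gap) - v) * dt   # relax towards the desired speed
    v = np.clip(v, 0.0, None)                   # cars do not reverse
    x = (x + v * dt) % L

print("spread of speeds (initially ~0 m/s):", round(float(v.max() - v.min()), 1), "m/s")
```

If the cars were truly independent parts, the perturbation would simply be damped away; because each driver reacts to the car ahead, the disturbance is passed backwards and amplified instead.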

Complex systems often exhibit self-organisation (e.g. pedestrians forming lanes when walking in opposite directions); however, self-organisation does not guarantee an efficient solution (e.g. the Love Parade disaster).

The predictability of complex systems is limited. The dynamics of such systems are usually highly sensitive, so small differences in the initial setup can lead to vastly different results (e.g. the butterfly effect in weather forecasting).
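A minimal illustration of this sensitivity (again my own toy example, not from the lecture) uses the logistic map in its chaotic regime: two trajectories starting 10⁻¹⁰ apart become completely decorrelated within a few dozen iterations.

```python
# Sensitivity to initial conditions with the logistic map x -> r*x*(1-x).
r = 4.0                     # chaotic regime of the logistic map
x, y = 0.2, 0.2 + 1e-10     # two almost identical initial conditions

for step in range(1, 61):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}: difference = {abs(x - y):.3e}")
```

The difference grows roughly exponentially until it reaches the size of the attractor itself, after which the two trajectories are effectively unrelated.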

Control over a complex system is an illusion due to irreducible randomness, delayed consequences, and regime shifts (i.e. a change only becomes (catastrophically) visible once a threshold is crossed). Goodhart's law and the principle of Le Chatelier state that a system tends to counteract external control attempts.

Unstable supply chains and phantom traffic jams are caused by delays in the system that are amplified and sustained without any possibility of stopping them. They can, however, be modelled. Such models often show oscillations that propagate and make it difficult to reach a specific state. The tragedy of the commons is another classical case where oscillations eventually cause a breakdown.
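The following toy sketch (all numbers invented for illustration, in the spirit of the classic "beer game") shows how a delivery delay plus a naive ordering rule turns a one-time demand step into exactly the overshoot-and-undershoot oscillations described above.

```python
from collections import deque

delay = 4                      # weeks between placing an order and delivery
target = 20                    # desired inventory level
inventory = 20
pipeline = deque([0] * delay)  # orders placed but not yet delivered

for week in range(30):
    demand = 4 if week < 5 else 8          # demand doubles in week 5
    inventory += pipeline.popleft()        # receive the order from `delay` weeks ago
    inventory -= demand                    # serve this week's demand
    # naive rule: reorder demand plus the full inventory gap, ignoring the pipeline
    order = max(0, demand + (target - inventory))
    pipeline.append(order)
    print(f"week {week:2d}: inventory {inventory:4d}, order placed {order:3d}")
```

Because the ordering rule ignores what is already in the pipeline, the retailer over-orders while inventory is low and then sits on excess stock once the delayed deliveries arrive, and the cycle repeats.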

Strongly coupled systems behave differently: they show faster dynamics, extreme events, self-organisation, emergent system behaviour, and low predictability.

Cascade effects in networks, together with probabilistic events and delays, make causal analysis difficult. A blackout is such an event, where the failure of a single node can bring down the whole system. Whereas more connectivity allows positive effects to spread more quickly, it also allows negative effects to spread more quickly. In addition to the initial catastrophic event, secondary and tertiary disasters may follow. A causality network can be modelled to identify the n-ary disasters that a specific disaster can trigger. If an effective decoupling strategy could be set up, the catastrophic spread could be interrupted.
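A minimal cascade sketch (a generic load-redistribution model I am using for illustration, not a lecture example): every node carries a load with limited headroom, and a failed node dumps its load onto its neighbours, which can push them over capacity in turn.

```python
import random

random.seed(1)
n = 50
edges = {i: set() for i in range(n)}
for i in range(n):                          # random network, a few links per node
    for j in random.sample([k for k in range(n) if k != i], 2):
        edges[i].add(j)
        edges[j].add(i)

load = {i: 1.0 for i in range(n)}
capacity = 1.1                              # only 10 % headroom per node
failed = set()

def fail(node):
    """Mark a node as failed and push its load onto surviving neighbours."""
    failed.add(node)
    neighbours = [m for m in edges[node] if m not in failed]
    for m in neighbours:
        load[m] += load[node] / max(len(neighbours), 1)

fail(0)                                     # the initial, purely local failure
changed = True
while changed:                              # propagate until nothing new fails
    changed = False
    for i in range(n):
        if i not in failed and load[i] > capacity:
            fail(i)
            changed = True

print(f"failure of 1 node cascaded to {len(failed)} of {n} nodes")
```

With generous headroom the failure stays local; with tight headroom or denser coupling, one local failure can take down most of the network, which is the decoupling argument in miniature.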

Decentralisation seems to be a useful tool in reducing inherent risk.

Big Data is a double-edged sword. The more data you have, the more patterns you find. However, those patterns are mere correlations and do not represent causation, so simply sifting through data does not reveal causes. One idea was to create AIs that detect patterns and find causation. However, AIs are themselves driven by data and can therefore be manipulated by data (chatbots learn racism from users on the Internet; police machine-learning systems discriminate against Black and Hispanic people in the US).
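The first claim is easy to demonstrate on purely random data (a sketch of my own, not from the lecture): with a fixed number of samples, the number of variable pairs that look strongly correlated grows with the number of variables, even though no real link exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100

for n_vars in (10, 100, 500):
    data = rng.normal(size=(n_samples, n_vars))       # pure, independent noise
    corr = np.corrcoef(data, rowvar=False)
    mask = np.triu(np.ones_like(corr, dtype=bool), k=1)  # each pair once, no diagonal
    spurious = int((np.abs(corr[mask]) > 0.3).sum())
    print(f"{n_vars:4d} noise variables -> {spurious:5d} 'strong' correlations")
```

Every "pattern" found here is spurious by construction, which is why correlation mining alone cannot establish causation.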


Cornerstone Course – Day 5: Digital Society – Big Data

Digital Society is a very elastic phrase. We will explore three examples:

  • Network Neutrality
  • Privacy and Surveillance
  • Big Data

All three are focused on how technology changes society. Whether the impact is positive or negative is contested. The issues lie at the intersection of information and communications technologies with society, law, and public policy.

Big Data

Mass collection of personal information is essentially discrimination, yet it is widely used in credit rating (Gandy, 1993). The Internet aggravated the situation through the commercial use of targeted marketing. This leads to fine-grained market segmentation and systematic discrimination, which in turn are hard to detect or resist. Even worse, most companies cannot pinpoint the discrimination that their services apply.

The traditional scientific approach describes itself as follows:

  1. Formulate hypothesis
  2. Design and conduct experiments
  3. Use the results to confirm or disprove the hypothesis
  4. Basis for decisions and actions

It is arguably not how science actually works, but it is how science presents itself.
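For contrast with the Big Data route below, here is a toy, simulated instance of the hypothesis-driven route (the data and effect size are invented): one pre-registered hypothesis, one experiment, one test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)   # simulated control measurements
group_b = rng.normal(loc=11.0, scale=2.0, size=30)   # simulated treatment measurements

# Hypothesis (fixed before looking at the data): treatment B raises the outcome.
t_stat, p_value = stats.ttest_ind(group_b, group_a)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("hypothesis supported" if p_value < 0.05 else "hypothesis not supported")
```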

Big Data contrasts with the scientific approach:

  1. Existing large data set (not necessarily what you were looking for)
  2. Mine data for correlations (patterns)
  3. Infer links between factors (a sort of hypothesis)
  4. Basis for decisions and actions.

The approach is completely automated and carried out by a computer, with no humans involved (other than devising the algorithms). The resulting models of the world are highly complicated and incomprehensible to humans (beyond any possibility of human understanding). Big Data further focuses on correlations rather than causation. The complete data set is used rather than a sample, and statistical accuracy is accepted in place of individual accuracy. To make this work, you must collect all the data in advance; more specifically, you must collect any data you can.
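A minimal sketch of the four steps above on synthetic data (the column names and the embedded relationship are made up for the example): take whatever data already exists, rank correlations with the outcome, and turn the strongest one into a decision rule with no human in the loop.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
# 1. an existing data set (columns originally collected for other purposes)
features = {
    "page_visits":   rng.poisson(5, n).astype(float),
    "basket_value":  rng.gamma(2.0, 20.0, n),
    "support_calls": rng.poisson(1, n).astype(float),
}
# outcome of interest (synthetically tied to basket_value for the example)
churn_risk = 0.01 * features["basket_value"] + rng.normal(0, 0.5, n)

# 2. mine the data for correlations
correlations = {name: float(np.corrcoef(col, churn_risk)[0, 1])
                for name, col in features.items()}

# 3. infer a link from the strongest correlation (a stand-in for a hypothesis)
best = max(correlations, key=lambda k: abs(correlations[k]))

# 4. base a decision on it
print("correlations:", {k: round(v, 2) for k, v in correlations.items()})
print(f"flag customers by '{best}' (r = {correlations[best]:.2f})")
```

Note that nothing in the pipeline asks whether the selected feature causes the outcome; it is simply the strongest available correlation.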

Why now?

Computational power has become much cheaper, data is available, and data mining and machine learning have become viable. The Internet of Things (IoT) is drastically increasing the amount of available data. Processing the data is difficult, and it is not clear how malicious actors could influence the process. Most IoT services are useful, but they generate a huge amount of data that is shared with and used by the provider of the IoT service.

Google Translate is a case in point for Big Data. Previously, people tried to deconstruct a language by understanding its grammar and then reassemble the text in another language. Google learns nothing about grammar but correlates the same text in two languages to obtain a statistical connection between them. The EU provided a great source of data, as (nearly) all its texts are (manually) translated into all 24 official languages.
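A toy sketch of the statistical idea (my illustration, not Google's actual system): count which target-language words co-occur with a source word across aligned sentence pairs of a tiny, made-up parallel corpus, and treat the most frequent partner as a translation candidate.

```python
from collections import Counter, defaultdict

# tiny, invented English-German parallel corpus (aligned sentence pairs)
parallel_corpus = [
    ("the treaty was signed", "der vertrag wurde unterzeichnet"),
    ("the treaty entered into force", "der vertrag trat in kraft"),
    ("the council signed the agreement", "der rat unterzeichnete das abkommen"),
]

cooccurrence = defaultdict(Counter)
for english, german in parallel_corpus:
    for en_word in set(english.split()):
        cooccurrence[en_word].update(set(german.split()))

# German words most often co-occurring with "treaty"
print("treaty ->", cooccurrence["treaty"].most_common(3))
# (frequent function words like "der" tie with "vertrag" here;
#  real systems normalise for overall word frequency)
```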

Another example is Google Flu Trends, which automatically found search terms that were correlated with influenza cases in order to create a prediction system. The system worked well for data between 2004 and 2010, but then it broke down. The question is whether public policy can be based on this.

“Personal data is the new oil of the Internet and the new currency of the digital world.” – Meglena Kuneva, European Consumer Commissioner, 2009

Buying habits, how likely you are to vote for a party, the likelihood of accidents, and health habits can all (tentatively) be predicted. Statistical learning works better with larger data sets, which favours larger players. Data has unexpected and unpredictable uses when it is correlated with apparently unrelated information. All this makes data look like a natural monopoly.

An example is the exploding manhole covers in New York, which seemed inexplicable. However, they could be correlated with requests for telephone line repairs. An investigation found that old (broken) lines produced explosive chemicals that would eventually ignite. Replacing the lines solved the issue.

However, correlation is not causation and therefore it is dangerous to base policy solely on Big Data.

References

Gandy, O. H. (1993). The Panoptic Sort: A Political Economy of Personal Information (Critical Studies in Communication and in the Cultural Industries). Boulder, CO: Westview Press.
