Cornerstone Course – Day 5: Digital Society – Big Data

Digital Society is a very elastic phrase. We will explore three examples:

  • Network Neutrality
  • Privacy and Surveillance
  • Big Data

All three are about how technology changes society. Whether the impact is positive or negative is contested. The issues lie at the intersection of information and communications technologies with society, law, and public policy.

Big Data

Mass collection of personal information is essentially a means of discrimination; it has nonetheless long been widely used in credit rating (Gandy, 1993). The Internet aggravated the situation through the commercial use of targeted marketing. This leads to fine-grained market segmentation and systematic discrimination, which in turn is hard to detect or resist. Even worse, most companies cannot pinpoint the discrimination that they apply in their services.

The traditional scientific approach describes itself as

  1. Formulate hypothesis
  2. Design and conduct experiments
  3. Use results to confirm or disprove
  4. Basis for decisions and actions

This is arguably not how science actually works, but it is how science presents itself.

Big Data contrasts with the scientific approach:

  1. Start from an existing large data set (not necessarily what you were looking for)
  2. Mine data for correlations (patterns)
  3. Infer links between factors (sort of a hypothesis)
  4. Basis for decisions and actions

The approach is completely automated and carried out by a computer, with no humans involved (other than in devising the algorithms). The resulting models of the world are highly complicated and incomprehensible to humans, possibly even beyond what humans could understand in principle. Big Data further focuses on correlation rather than causation. The complete data set is used rather than a sample, and statistical accuracy is accepted in place of accuracy for each individual. To make this work, you must collect the data in advance; more specifically, you must collect any data you can.
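The "mine data for correlations" step can be sketched in a few lines. This is a minimal illustration only: the column names and numbers are made up, the threshold of 0.8 is an arbitrary assumption, and real systems use far larger data sets and more elaborate models.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)

def mine_correlations(table, threshold=0.8):
    """Return column pairs whose |r| meets the threshold, strongest first."""
    hits = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    return sorted(hits, key=lambda hit: -abs(hit[2]))

# Hypothetical, made-up data set: no hypothesis was formulated beforehand.
data = {
    "late_night_logins": [1, 2, 2, 4, 5, 7],
    "support_tickets": [10, 19, 22, 41, 48, 70],
    "shoe_size": [5, 3, 6, 2, 7, 4],
}
links = mine_correlations(data)
print(links)
```

The output is a list of linked factors, not an explanation: nothing in the procedure says why two columns move together.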

Why now?

Computational power has become much cheaper. Data is available and data mining & machine learning have become viable. The Internet of Things (IoT) is increasing the amount of data available drastically. Processing the data is difficult and it is not clear how malicious actors could influence the process. Most IoT services are useful, but they generate a huge amount of data that is shared and used by the provider of the IoT services.

Google Translate is a case in point for Big Data. Previously, people tried to deconstruct a text by understanding its grammar and then reassemble it in another language. Google learns nothing about grammar; it correlates the same text in two languages to obtain a statistical connection between the languages. The EU provided a great source of data, as (nearly) all its texts are (manually) translated into all 24 official languages.
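The idea of correlating the same text in two languages can be illustrated with a toy sketch. This is not how Google Translate is implemented; it is an assumption-laden miniature that counts which words co-occur in aligned sentence pairs and scores them with the Dice coefficient. The four-sentence corpus is made up.

```python
from collections import Counter

def learn_word_links(sentence_pairs):
    """Link each source word to the target word it co-occurs with most
    consistently across aligned sentences (Dice coefficient)."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update((s, t) for s in src_words for t in tgt_words)

    best = {}
    for (s, t), together in pair_count.items():
        score = 2 * together / (src_count[s] + tgt_count[t])
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}

# Hypothetical miniature parallel corpus (English, German).
corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
    ("a book", "ein buch"),
]
word_links = learn_word_links(corpus)
print(word_links)
```

No grammar rule appears anywhere in the code; the word pairs fall out of the counts alone, which is why the amount of parallel text matters so much.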

Another example is Google Flu Trends, which automatically found search terms that were correlated with influenza cases to create a prediction system. The system worked well for data between 2004 and 2010, but then it broke down. The question is whether public policy can be based on this.

“Personal data is the new oil of the Internet and the new currency of the digital world.” – Meglena Kuneva, European Consumer Commissioner, 2009

Buying habits, how likely you are to vote for a party, the likelihood of accidents, and health habits can all be predicted (or at least people try to). Statistical learning works better with larger data sets, which favours larger players. Data has unexpected and unpredictable uses when it is correlated with apparently unrelated information. All this makes data look like a natural monopoly.

An example is the manhole covers in New York that exploded inexplicably. However, the explosions could be correlated with requests for telephone line repairs. After investigation it was found that old (broken) lines produce explosive chemicals that would eventually go off. Replacing the lines solved the issue.

However, correlation is not causation and therefore it is dangerous to base policy solely on Big Data.
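Why mined correlations can be misleading is easy to show with random numbers. The sketch below is an illustration under stated assumptions, not a reconstruction of any real system: it generates 1000 made-up "search term" series and one made-up "flu" series, all pure noise, picks the term that correlates best over the first 10 weeks, and then checks the same term on the following 10 weeks.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the sketch is repeatable
weeks_train, weeks_test, n_terms = 10, 10, 1000
total = weeks_train + weeks_test

# Everything is independent noise: no term has any real link to "flu".
flu = [random.gauss(0, 1) for _ in range(total)]
terms = [[random.gauss(0, 1) for _ in range(total)] for _ in range(n_terms)]

def r_on(term, start, stop):
    """Correlation between one term and the flu series on a window of weeks."""
    return pearson(term[start:stop], flu[start:stop])

# "Mine" the term that fits the training weeks best, then test it later.
best = max(terms, key=lambda term: abs(r_on(term, 0, weeks_train)))
r_train = r_on(best, 0, weeks_train)
r_test = r_on(best, weeks_train, total)
print(f"in-sample r = {r_train:+.2f}, out-of-sample r = {r_test:+.2f}")
```

With enough candidate series, some will fit the past well by chance alone; a strong in-sample correlation therefore says little about whether the link will hold in the future.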


Gandy, O. H., Jr. (1993). The Panoptic Sort: A Political Economy of Personal Information. Critical Studies in Communication and in the Cultural Industries. Boulder, CO: Westview Press.

Cornerstone Course – Day 5: Digital Society – Privacy and Surveillance

Privacy and Surveillance

“You have zero privacy anyway. Get over it.” – Scott McNealy (Co-founder of Sun Microsystems, 1999)

In the physical world you could go to remote locations that are private, and data was stored on paper – hard to copy, easy to destroy and costly to move. Additionally, information sharing was limited and specialized, based on a regulated and trusted system – e.g. bank, post office, phone company, state, etc.

The biggest change is that most personal data is now online – easy to copy, hard to destroy and cheap to move. Infrastructure capital cost is nigh zero, with little regulation and very rapid adoption. Additionally, technology spans jurisdictions and there are big incentives to monetize personal information.

A positive outlook could be personalized services. Price discrimination is easier to perform, which makes the economy more efficient. On the flip side, personalized services in the area of news create gaps in society (e.g. according to Republican news sources, and inside the information bubble of their audience, Obama would lose). This issue is discussed broadly in (Sunstein, 2001). However, there seems to be a noticeable portion of the population that consumes news of opposing views (“Ideological Segregation Online and Offline,” 2011).

Is regulation futile?

Facebook makes it possible to identify people by simply photographing them and uploading the photo to Facebook, which then recommends who is in the picture. Privacy cannot be protected without forbidding such services.

On a darker note, surveillance by governments has been facilitated by the same technology. Surveillance infrastructure is a commodity and large-scale computing power is available, possibly soon bolstered by quantum computing.

US vs EU regulation

In the US, activities are usually allowed unless specifically forbidden, whereas in Europe they are usually forbidden unless specifically allowed. In Europe there are privacy protection agencies which can impose sanctions (up to 2% of annual worldwide turnover).

The EU anti-trust body has become prominent because it can generate income for the EU.

The Safe Harbor Principles were an agreement that regulated data exports from the EU to the US; however, the European Court of Justice ruled the agreement incompatible with European law, and it was consequently replaced by the Privacy Shield.

In Europe you have to give your consent as a consumer before companies may process your data. Today, consent is usually given simply by using the service. With this form of consent, people often do not understand what they are consenting to. Research has shown that people care about privacy but nonetheless ignore it by giving immediate consent to internet services (Staddon, Acquisti, & LeFevre, 2013). The EU introduced the “Right to be forgotten” to wipe content off the internet that has been stored under this kind of consent.


Ideological Segregation Online and Offline. (2011). Quarterly Journal of Economics, 126(4), 1–2.
Staddon, J., Acquisti, A., & LeFevre, K. (2013). Self-Reported Social Network Behavior: Accuracy Predictors and Implications for the Privacy Paradox. Presented at the 2013 International Conference on Social Computing (SocialCom), Washington, DC, USA.
Sunstein, C. R. (2001). Republic.com (1st ed.). Princeton University Press.

Cornerstone Course – Day 5: Digital Society – Network Neutrality

Network Neutrality

The term dates back to the telephone line networks. Today it is focused on the Internet, but there are more kinds of networks that rely on the same principle.

The main idea is that all traffic over the network should be treated the same. No packet should be singled out for its user, content, origin or destination.

“The principle that Internet service providers and governments should treat all data of the Internet equally, not discriminating or charging differentially by user, content, site, platform, application, type of attached equipment, or mode of communication” – FCC
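The difference between neutral and non-neutral handling can be made concrete with a small sketch. Everything here is hypothetical: the `Packet` type, the origin names and the two forwarding functions are made up for illustration and do not model any real router.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    origin: str
    payload: str

def neutral_forward(queue):
    """Neutral: forward strictly in arrival order, ignoring who sent what."""
    return list(queue)

def preferential_forward(queue, favoured_origin):
    """Non-neutral: packets from the favoured origin jump the queue."""
    # sorted() is stable, so arrival order is kept within each group.
    return sorted(queue, key=lambda packet: packet.origin != favoured_origin)

# Hypothetical arrivals at a provider that also runs its own video service.
arrivals = [
    Packet("video-startup", "frame 1"),
    Packet("isp-video", "frame 1"),
    Packet("video-startup", "frame 2"),
    Packet("isp-video", "frame 2"),
]

print([p.origin for p in neutral_forward(arrivals)])
print([p.origin for p in preferential_forward(arrivals, "isp-video")])
```

In the neutral case the forwarding decision never looks at `origin` or `payload`; the preferential variant needs to inspect the packet before it can discriminate.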

The Wall Street Journal said in 1990 that the major effect of the Internet would be the replacement of the fax machine – which would be a major change. However, the Internet eventually failed at that, as the fax machine provides a receipt for having sent and printed a document, which the US legal system recognizes as serving a subpoena.

Minitel was a French initiative in the 1980s to look up phone numbers on a small computer. Later it offered weather forecasts and soon allowed private companies to offer services. It was quite successful; however, it was always run by France Télécom S.A. (top-down), and eventually the Internet overtook it because the Internet was bottom-up.

In 2008, two thirds of the Internet bandwidth was used by BitTorrent, although the share was actually decreasing. The FCC investigated whether Comcast had slowed BitTorrent connections. This set the precedent for the FCC to enforce net neutrality. The next round was fought between Netflix and Comcast in 2012, when Netflix accused Comcast of throttling Netflix's video services to favour Comcast's own. Out-of-court settlements were reached and end-user speed on Netflix increased. Another case happened in 2015, when the Marriott hotels blocked wireless spectrum to prevent customers from operating their own mobile Wi-Fi hotspots. After a public outcry they retreated from the policy.

In Switzerland, communication providers offer their own video services without limitation (Salt/Zattoo and Swisscom/Swisscom TV Air).

Networks and layers

Networks in general are built in layers. Specifically, upper layers build on lower layers.

The Internet is fixed only at the IP layer; the technology above and below it can be changed. This is called the Internet hourglass model. The network types at the bottom of the glass and the applications at the top of the glass have evolved tremendously.
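The hourglass can be sketched as simple wrapping of bytes. The text-based header format below is a made-up simplification (real IP and link-layer headers are binary and carry many more fields); it only shows that the IP packet stays the same while the layers below and above it are swapped.

```python
def ip_packet(src: str, dst: str, payload: bytes) -> bytes:
    """Wrap application data in a (toy) IP header: the narrow waist."""
    return f"IP|{src}|{dst}|".encode() + payload

def ip_payload(packet: bytes) -> bytes:
    """Strip the toy IP header again; the payload may itself contain '|'."""
    return packet.split(b"|", 3)[3]

def link_frame(link_type: str, packet: bytes) -> bytes:
    """Wrap an IP packet for one particular network technology."""
    return f"{link_type}|".encode() + packet

def unwrap_link(frame: bytes) -> bytes:
    """Remove the link-layer wrapper, recovering the IP packet."""
    return frame.split(b"|", 1)[1]

# Top of the glass: any application data (hypothetical HTTP request here).
packet = ip_packet("10.0.0.1", "10.0.0.2", b"GET / HTTP/1.1")

# Bottom of the glass: the same packet carried over two link technologies.
over_ethernet = link_frame("ethernet", packet)
over_wifi = link_frame("wifi", packet)

print(unwrap_link(over_ethernet) == unwrap_link(over_wifi) == packet)
print(ip_payload(packet))
```

Neither the application nor the link technology needs to know about the other; both only agree on the packet format in the middle.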

On the downside, IP is very constrained. Improving it is very difficult, but denial-of-service (DoS) attacks, security concerns and the limited number of addresses require changes.

Network neutrality has a technical foundation: the Internet relies on end-to-end communication. There are two perspectives on which functions belong in the network itself:

  1. A function should only be implemented in a lower layer if it can be completely and correctly implemented at that layer
  2. In addition, the function should be needed by all clients of that layer
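A classic illustration of the first point is reliable transfer: only the endpoints can decide whether the data arrived correctly, so the check has to live there. The sketch below is a minimal, hypothetical example; `FlakyNetwork` is a made-up stand-in for a network that corrupts the first delivery, and SHA-256 is simply one convenient choice of checksum.

```python
import hashlib

def digest(data: bytes) -> str:
    """Checksum computed by the endpoints, not by the network."""
    return hashlib.sha256(data).hexdigest()

class FlakyNetwork:
    """Hypothetical network that corrupts the first delivery, then behaves."""

    def __init__(self):
        self.deliveries = 0

    def deliver(self, data: bytes) -> bytes:
        self.deliveries += 1
        if self.deliveries == 1:
            return b"X" + data[1:]  # first byte is damaged in transit
        return data

def send_end_to_end(network, data: bytes, max_attempts: int = 3):
    """Retry until the receiver's checksum matches the sender's."""
    expected = digest(data)  # sender computes the checksum at its end
    for attempt in range(1, max_attempts + 1):
        received = network.deliver(data)
        if digest(received) == expected:  # receiver verifies at its end
            return received, attempt
    raise RuntimeError("gave up after repeated corruption")

received, attempts = send_end_to_end(FlakyNetwork(), b"hello world")
print(received, attempts)
```

The network in this sketch does nothing but move bytes; correctness is established entirely by the two ends, which is the sense in which the network "doesn't know anything".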

On the one hand, the advantages are that the network doesn’t know anything and doesn’t do things that are better done at the endpoints. The argument emerges naturally from the technical structure of the Internet. Additionally, it provides the following advantages:

  • Long-term evolvability
  • Application autonomy
  • Reliability
  • Minimizes interfaces between modules
  • Reduces complexity

Those advantages are claims and not necessarily correct. On the other hand, the disadvantage is that the Internet does nothing particularly well, but it does everything. Take phone calls, for instance. Phone lines were optimized for verbal communication. When a fax machine was connected, the network had to detect that and stop echo cancelling and compression. The phone system typically ran at 90% utilization, whereas the Internet rarely runs above 20%.

Carrying public internet traffic is not economically viable, as the traffic only incurs costs and generates no income. Sprint subsidized the public internet with its phone systems, and today the internet is subsidized by large companies buying private networks.

Modularity as a design principle offers advantages by minimizing interfaces between modules and reducing complexity. Facebook started out as a website out of a dorm room. However, the design tends to be static and can have negative effects on

Virtual Private Networks (VPNs) and tunnels make it possible to disguise the content of your communication (e.g. human rights activists hiding communication from a repressive government, or drug cartels organising their deliveries). This poses a problem for lawmakers if they want to change net neutrality. Encrypted content cannot be identified, and therefore the loss of net neutrality would most likely result in VPNs becoming illegal.


The US has different legal frameworks for DSL and cable modem providers. Telecommunication services and information services have different legal requirements. Telecommunication services are more restricted and are subject to enforced net neutrality, whereas information services are not. DSL is a telecommunication service and cable modems are an information service. The legal definitions are not clear and are still contested in the courts. The FCC is trying to reclassify cable modems as telecommunication services.


  • Does a neutral network discriminate against Quality of Service applications? (e.g. Skype)
  • Does the argument hold in a world of competition between DSL and cable?
  • Do we need special regulations? Isn’t this topic for antitrust policy?
  • Is this a debate about economic effects or about freedom?
  • Situation in Europe and Switzerland?
  • Technical solutions to the network neutrality problem? How to resolve this transparently?