ISN: Communities and cliques

Dyads are not yet interesting for network research. However, starting at triads interesting behaviour appear. In triads, balance and control appear. Triads appear more commonly in social networks than in random graphs.

Clustering coeffcient

The clustering coefficient measures the amount of transitivity in a network. When A is related to B, and B is in turn related to C, then A is also related to C. The index ranges from 0 to 1. In social networks it is usually between 0.3 and 0.6.

Triad counts

Four patterns are counted between all vertices. Empty (three vertices with no connections), 1-Edge (two vertices are connected via an edge and one vertex is not), 2-Edge (One vertex is connected to both other vertices with an edge) and Triangle (all vertices are connected).

Groups of more than 3

There are a few technical description that are extensions (and modifications) of the complete for more than 3 nodes:

  • Clique: Maximal complete subgraph of [latex]n \geq 3[/latex]
  • k-cliques: Relaxation to [latex]k > 1[/latex], where [latex]k[/latex] is geodesic length
  • k-core: Subgraph where each node adjacent to at least [latex]k[/latex] other nodes
  • k-plex: Maximal subgraph of [latex]g[/latex] nodes where each node adjacent to no fewer than [latex]g-k[/latex] nodes

Communities

Communities are densely connected within and sparsely connected with others. A community structure can affect individuals, groups, networks and give insights into how social systems work.

Community detection

Community detection is a computationally difficult problem. Knowing the optimal solution is not always possible. Algorithmic approximations are often used to detect communities.

Modularity

Modularity is always smaller than 1, but can also take negative values. Higher values means more edges within modules.

[latex]Q = \frac{1}{2m}\sum_{ij}\delta(C_i,C_j)(A_{ij}-P_{ij})[/latex]

Where [latex]A_{ij} encodes that an edge exists and [latex]P_{ij}[/latex] the probability of an edge existing and [latex]m[/latex] the number of edges and [latex]\delta(C_i,C_j)[/latex] describes whether two nodes are inside the same module.

Kernighan-Lin Algorithm

Based on a pre-determined number of communities is randomly assigned and the modularity score is computed for switching any node. The highest achievable modularity with a single switch is assigned. The process is repeated until no more switches could improve the score. The solution, however, is not necessarily optimal, a local maxima may be chosen based on the random initial assignment. The algorithm should be repeated

Edge-Betweenness Clustering Algorithm

Evaluate the edge-betweenness of each edge in the network. Find the edge with the highest score and delete it. As long as the disconnection between two components increases modularity, the algorithm continues. While there is no random variation involved, it may not find the optimal solution, it may not maximise modularity and modularity is slow.

Fast-Greedy Clustering Algorithm

Starting with an empty graph where each node is its own community. The modularity for each possibly join between two nodes is computed and the one with the highest modularity is chosen. The process is repeated until no further increase in modularity is possible. An issue is, that small communities are easily missed. However, a dendrogram allows to judge how many communities could be present.