1. Whitfield, J. (2007). Survival of the likeliest? PLoS Biology, 5(5), e142.

It has long been thought that thermodynamics and evolution pull in opposite directions. The Second Law of thermodynamics holds that the universe, as an isolated system, tends toward greater disorder: perfume diffuses, and a broken mirror never reassembles itself. From this perspective, it is remarkable that life and society emerge and sustain themselves at all. This article introduces a bold hypothesis: the Second Law is not in conflict with evolution but is its underlying principle. The living systems and societies that evolution favors are those that consume energy and produce disorder most efficiently.

2. Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620.

Claude Shannon quantified information using the entropy formula that Ludwig Boltzmann had created to explain thermodynamic phenomena by connecting the microscopic and macroscopic statistics of particles; it is said that John von Neumann suggested that Shannon use the name "entropy". Is the entropy of thermodynamics and statistical mechanics actually the same thing as the entropy of information theory? Jaynes was the first to answer yes and to systematically reinterpret statistical mechanics from an information-theoretic perspective: physical laws are the patterns that are most likely to be observed.
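To make the shared formula concrete, here is a minimal sketch of Shannon entropy, H = -sum p_i log p_i, alongside the Boltzmann (Gibbs) distribution, p_i proportional to exp(-beta * E_i), which is the maximum-entropy distribution for a fixed mean energy. The energy levels and the values of beta are made up for illustration and do not come from the paper.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i * log(p_i), in nats; zero-probability terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def boltzmann(energies, beta):
    """Gibbs distribution with p_i proportional to exp(-beta * E_i)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0, 3.0]          # hypothetical energy levels
uniform = boltzmann(energies, beta=0.0)  # beta = 0: every state equally likely
cold = boltzmann(energies, beta=2.0)     # larger beta: mass shifts to low energy

print(shannon_entropy(uniform))  # equals log(4), the maximum for four states
print(shannon_entropy(cold))     # strictly smaller
```

The same function serves both readings: applied to message probabilities it is Shannon's information measure, and applied to microstate probabilities it is (up to Boltzmann's constant) the thermodynamic entropy.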

3. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554-2558.

What is intelligence? How did it emerge from such a disordered universe? How can we create an adaptive system that behaves intelligently by extracting information from the environment and storing it in a form that is constantly transforming? The Hopfield network is an innovative design toward this goal and a pioneering attempt at energy-based learning. It is a network whose link structure stores external information. It connects knowledge and energy: connection structures that correspond to "correct" answers are defined as low-energy states, allowing the system to remember by using the power of statistical mechanics. Labels correspond to macrostates, and training data (e.g., images) correspond to microstates. The system remembers by rolling across the underlying network of microstates until it stops at a low-energy attractor (a community).
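The storage-and-recall idea can be sketched in a few lines. The example below assumes the standard textbook formulation: Hebbian outer-product weights, the energy E = -1/2 s^T W s, and asynchronous sign updates. The two stored patterns are arbitrary toy vectors, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def train_hebbian(patterns):
    """Hebbian storage: W = (1/N) * sum of outer products, with a zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def energy(w, s):
    """Hopfield energy E = -1/2 * s^T W s (no bias terms)."""
    return -0.5 * s @ w @ s

def recall(w, s, sweeps=5):
    """Asynchronous updates; no single update can increase the energy."""
    s = s.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1.0 if w[i] @ s >= 0 else -1.0
    return s

# Two toy patterns (orthogonal, so they do not interfere with each other).
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                   [1, 1, 1, 1, -1, -1, -1, -1]], dtype=float)
w = train_hebbian(stored)

noisy = stored[0].copy()
noisy[0] = -noisy[0]           # corrupt one bit of the first pattern
restored = recall(w, noisy)

print(restored)
print(energy(w, noisy), energy(w, restored))
```

The corrupted input sits higher on the energy landscape than the stored pattern, and the update rule moves it downhill until it settles in the attractor.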

4. Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147-169.

How do we push a learning, adaptive system to the state of minimum energy (maximum probability)? There are several ways, including Monte Carlo simulation, the Hebbian rule ("fire together, wire together"), and backpropagation. In this paper, Ackley, Hinton, and Sejnowski show that the second of these, a Hebbian-style rule, can be used to minimize the Kullback-Leibler divergence (also known as information gain) between the model (the system) and the data (the environment). The paper also invites a deeper question: if an evolving network can learn, is the evolution of real-world networks, including social networks, evidence of learning? And if so, what are they learning?
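As a rough illustration of how a Hebbian-style rule reduces the Kullback-Leibler divergence, the sketch below trains a tiny, fully visible Boltzmann machine with three units. It is a simplification under stated assumptions: there are no hidden units, and model expectations are computed exactly by enumerating all eight states instead of by sampling as in the paper. The data distribution, learning rate, and step count are made up.

```python
import itertools
import numpy as np

# All 2^3 configurations of three +/-1 units.
states = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

def model_dist(w):
    """Boltzmann distribution p(s) proportional to exp(-E(s)), E(s) = -1/2 s^T W s."""
    e = -0.5 * np.einsum("ki,ij,kj->k", states, w, states)
    p = np.exp(-e)
    return p / p.sum()

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def correlations(p):
    """Expected co-activation <s_i s_j> under distribution p."""
    return np.einsum("k,ki,kj->ij", p, states, states)

# Hypothetical environment: units 0 and 1 tend to agree.
data = np.array([3, 3, 1, 1, 1, 1, 3, 3], dtype=float)
data /= data.sum()

w = np.zeros((3, 3))
history = []
for _ in range(200):
    p = model_dist(w)
    history.append(kl(data, p))
    # Strengthen links that co-fire more in the data than in the model.
    delta = correlations(data) - correlations(p)
    np.fill_diagonal(delta, 0.0)
    w += 0.1 * delta

print(history[0], history[-1])
```

The update uses only the co-activation of the two units at each end of a link, which is what makes it Hebbian, yet it follows the gradient of the divergence between model and data.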

5. Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.

Modularity (Q) measures the degree to which a network has community structure, that is, the degree to which nodes are more densely connected in some places than in others. Random networks have low modularity, whereas real-world networks, such as social networks, tend to have high modularity. Note that the formula for modularity closely resembles the energy function defined for the Hopfield network, with maximizing Q playing the role of minimizing energy. Can we assume that social networks constantly learn from the environment and transform themselves by forming and reorganizing communities? According to the principle of energy-based learning, systems seek the lowest-energy state. Does the definition of modularity actually take advantage of this underlying process, using an energy formula to describe community structure?
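For reference, Newman's modularity can be written as Q = (1/2m) sum_ij (A_ij - k_i k_j / 2m) delta(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the number of edges, and c_i the community of node i. The sketch below computes it directly from that definition; the six-node graph (two triangles joined by one edge) is a toy example.

```python
import numpy as np

def modularity(adj, labels):
    """Q = (1/2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    k = adj.sum(axis=1)                    # node degrees
    two_m = k.sum()                        # twice the number of edges
    b = adj - np.outer(k, k) / two_m       # modularity matrix
    same = labels[:, None] == labels[None, :]
    return float((b * same).sum() / two_m)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1

q_triangles = modularity(adj, [0, 0, 0, 1, 1, 1])   # split along the bridge
q_mixed = modularity(adj, [0, 1, 0, 1, 0, 1])       # arbitrary split
print(q_triangles, q_mixed)
```

The term `b * same` has the same pairwise form as the Hopfield energy: a symmetric coupling matrix summed over pairs of nodes that share a state.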

6. McPherson, J. M., & Ranger-Moore, J. R. (1991). Evolution on a dancing landscape: organizations and networks in dynamic Blau space. Social Forces, 70(1), 19-42.

Identifying the social groups into which people are divided, and predicting the behavior of individuals from the groups they belong to, is one of the main frameworks of the quantitative social sciences. Blau space is a model based on simple Euclidean geometry in which each dimension represents a demographic attribute: race, gender, income, education, religion, and so on. An individual is a point in this high-dimensional space. The regression models used in the social sciences can be regarded as an application of this geometric model. A popular assumption in sociology, "homophily in social networks" (McPherson et al., 2001), can also be understood from this perspective: points close to each other in this space are more likely to connect. This assumption provides insight into the sociology of organizations: an organization, such as a labor union or a church, occupies a region of Blau space, and organizations may compete with each other for the same niche.
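The geometric reading of homophily can be sketched as follows. Individuals are points whose coordinates are standardized demographic attributes, and the chance of a tie falls off with Euclidean distance. The exponential kernel, the `scale` parameter, and the three example people are assumptions made for illustration; the paper does not prescribe this functional form.

```python
import math

def blau_distance(a, b):
    """Euclidean distance between two individuals in Blau space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tie_probability(a, b, scale=1.0):
    """Assumed homophily kernel: probability decays exponentially with distance."""
    return math.exp(-blau_distance(a, b) / scale)

# Hypothetical individuals: (age, years of education, income), standardized.
alice = (0.1, 0.5, 0.3)
bob = (0.2, 0.4, 0.3)      # demographically close to alice
carol = (1.8, -1.2, -0.9)  # demographically distant from alice

print(tie_probability(alice, bob))
print(tie_probability(alice, carol))
```

Under this sketch, an organization's niche would be the region of the space from which it draws its members, and two organizations compete when those regions overlap.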

7. Abbe, E. (2017). Community detection and stochastic block models: recent developments. arXiv preprint arXiv:1703.10146.

Harrison White was one of the first scholars to introduce the concept of block modeling (White et al., 1976). This model suggests that the linking behavior of nodes is determined by the underlying social "blocks" they belong to. Within a block, nodes are interchangeable. A block configuration can also be understood as a macrostate that corresponds to many different networks (microstates). Stochastic block models (SBMs) connect block models to the Ising model and use Monte Carlo simulation to infer the most likely block structure. Interesting questions to think over include: What is the relationship between blocks and communities? Can blocks be considered regions of Blau space? Is describing individuals by block or community vectors a form of embedding? And if so, can we achieve it with artificial neural networks?
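The macrostate and microstate reading can be made concrete by sampling from a stochastic block model. In the sketch below, the block assignment and the block-to-block probability matrix `p` form the macrostate, and each sampled adjacency matrix is one microstate consistent with it. The sizes, probabilities, and function name are illustrative assumptions, and the code covers only the generative direction, not inference.

```python
import numpy as np

def sample_sbm(blocks, p, rng):
    """Draw an undirected network whose edge probabilities depend only on blocks."""
    blocks = np.asarray(blocks)
    n = len(blocks)
    probs = p[blocks[:, None], blocks[None, :]]        # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # sample each pair once
    return (upper | upper.T).astype(int)               # symmetrize, no self-loops

rng = np.random.default_rng(0)
blocks = [0] * 20 + [1] * 20                 # two blocks of 20 nodes each
p = np.array([[0.80, 0.05],
              [0.05, 0.80]])                 # dense within, sparse between
adj = sample_sbm(blocks, p, rng)

within = adj[:20, :20].sum() / (20 * 19)     # density inside block 0
between = adj[:20, 20:].mean()               # density across blocks
print(within, between)
```

Because nodes in the same block share one row of `p`, swapping two of them leaves the distribution unchanged, which is the interchangeability described above.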

8. Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.

The DeepWalk model is based on a neural language model called word2vec (Mikolov et al., 2013). Word2vec represents words as vectors by embedding them in a high-dimensional Euclidean space. It uses an artificial neural network as an optimization algorithm to find the optimal positions of words. The optimization goal is to predict the context words from a focal word, which is expected to form the smallest angle with its neighbors in the embedding space. DeepWalk simulates random walks on a social network and generates sequences of nodes as input to the word2vec model, thereby embedding social networks in Euclidean space to predict clustering patterns or shared labels. The principle behind word2vec is the "distributional hypothesis", i.e., the meaning of a word is defined by its neighbors. In a similar way, DeepWalk applies this principle to social networks in the sense that the social roles of individuals are defined by the individuals around them.
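The first half of DeepWalk, turning a graph into "sentences" of nodes, can be sketched with the standard library alone. The function below generates uniform random walks from every node; the resulting sequences would then be passed to a skip-gram trainer, a step that is omitted here. The graph, walk length, and function name are illustrative assumptions.

```python
import random

def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks; each walk plays the role of a sentence."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)                    # visit start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:             # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy social network as adjacency lists: two triangles joined by c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
walks = random_walks(graph, walk_length=5, walks_per_node=3)
print(walks[0])
```

Nodes that share many neighbors appear in similar walk contexts, just as words with similar meanings appear in similar sentences, which is why the word2vec objective transfers to networks.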

9. Kozlowski, A. C., Taddy, M., & Evans, J. A. (2018). The geometry of culture: Analyzing meaning through word embeddings. arXiv preprint arXiv:1803.09288.

This paper demonstrates that word embeddings reveal the cultural dimensions (e.g., man - woman, rich - poor, black - white, liberal - conservative) underlying collective narratives and may help us understand the worldviews of societies. It also reveals a hierarchy in language: nouns such as "man" or "woman" can be represented as vectors, whereas adjectives and verbs such as "masculine" or "feminine", which act as operations on nouns and are more abstract, can be represented as vector differences. This provides insight into the hierarchy of society and the functions of individuals.
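A cultural dimension of this kind can be sketched as the average difference between the vectors of antonym pairs, with other words placed on the dimension by cosine similarity. The three-dimensional vectors below are invented purely to make the arithmetic visible; they are not trained embeddings and not results from the paper, and the helper names are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cultural_dimension(pairs, emb):
    """Average the difference vectors of antonym pairs (positive pole first)."""
    return np.mean([emb[pos] - emb[neg] for pos, neg in pairs], axis=0)

# Invented toy vectors; real embeddings have hundreds of dimensions.
emb = {
    "rich": np.array([1.0, 0.2, 0.0]),
    "poor": np.array([-1.0, 0.1, 0.0]),
    "affluent": np.array([0.9, 0.0, 0.3]),
    "impoverished": np.array([-0.8, 0.0, 0.2]),
    "golf": np.array([0.6, 0.5, 0.1]),
    "boxing": np.array([-0.5, 0.6, 0.1]),
}

affluence = cultural_dimension([("rich", "poor"), ("affluent", "impoverished")], emb)
print(cosine(emb["golf"], affluence))    # positive: toward the "rich" pole
print(cosine(emb["boxing"], affluence))  # negative: toward the "poor" pole
```

Averaging over several pairs is meant to cancel the idiosyncrasies of any single word pair, leaving the direction the pairs have in common.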

10. Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903, 1(2).

Blau space and block models use demographic variables to understand and predict linking structure, whereas graph neural networks (GNNs) do the reverse, inferring demographic variables (node features) from social networks. Unlike DeepWalk, a GNN is trained by supervised machine learning. This leads to an "emergent" interpretation of demographic attributes: they are fluid labels that arise from evolving social networks, which restlessly learn and store environmental knowledge. Features on nodes are temporary stores of knowledge. This formulation, while in the same spirit as social construction theory, defines the social construction process precisely and quantitatively as a social computing process and therefore goes beyond the philosophical interpretation.
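To show how node features are recomputed from the network, here is a minimal NumPy sketch of a single graph attention head: scores e_ij = LeakyReLU(a^T [W h_i || W h_j]) are normalized with a softmax over each node's neighbors, and the new feature of node i is the attention-weighted sum of its neighbors' transformed features. The weights are random rather than trained, the output nonlinearity and multi-head averaging are omitted, and the graph is a toy path with self-loops.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, w, a):
    """One attention head. h: (n, f), adj: (n, n) 0/1 with self-loops,
    w: (f, f2), a: (2 * f2,). Returns new features and attention weights."""
    z = h @ w
    f2 = z.shape[1]
    # a^T [z_i || z_j] splits into a source part plus a target part.
    e = leaky_relu((z @ a[:f2])[:, None] + (z @ a[f2:])[None, :])
    e = np.where(adj > 0, e, -np.inf)            # attend only to neighbors
    e = e - e.max(axis=1, keepdims=True)         # stabilize the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))                      # 4 nodes, 3 input features
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])                   # path graph with self-loops
w = rng.normal(size=(3, 2))                      # untrained projection
a = rng.normal(size=4)                           # untrained attention vector

out, alpha = gat_layer(h, adj, w, a)
print(alpha.round(3))
```

Each row of `alpha` describes how much a node draws on each neighbor, so a node's features are a running summary of its surroundings, in line with the "temporary store of knowledge" reading above.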

How Society Thinks
