The Geometry of Thinking

Thinking the unthinkable: can humans do it with the help of machines? This study group investigates the duality between reasoning and the geometry of data representations learned by artificial neural networks. Based on this investigation, we will discuss how to achieve intelligence augmentation.

Thinking Unthinkable

“Just as there are odors that dogs can smell and we cannot, as well as sounds that dogs can hear and we cannot, so too there are wavelengths of light we cannot see and flavors we cannot taste. Why then, given our brains wired the way they are, does the remark “Perhaps there are thoughts we cannot think,” surprise you?

Evolution, so far, may possibly have blocked us from being able to think in some directions; there could be unthinkable thoughts.”

- Richard Hamming, The Unreasonable Effectiveness of Mathematics

Outline for discussion 19-09-28

1. Machines, and not humans, define computability
1.1 Leibniz: a symbolic calculus of thought to settle disputes
1.2 Church-Turing thesis on "effective computability": Turing machine, lambda computability, and general recursive functions
1.3 Three schools in the foundations of mathematics: D. Hilbert - formalism, L.E.J. Brouwer - intuitionism, and B. Russell - logicism

2. Shapes know the answers: landscape is intelligence
2.1 Two worldviews: Newtonian (complex dynamics + simple geometry) vs Einsteinian (simple dynamics + complex geometry)
2.2 Introduction to artificial neural networks: machines see worlds differently to solve puzzles

Reading list

1. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554-2558.

What is intelligence? How does it emerge from such a disordered universe? Can we design a system that exhibits intelligent behavior by extracting information from the environment and hardwiring it into its topology? The Hopfield network is an innovative brainchild of these questions. It suggests looking at machine learning tasks from the perspective of statistical mechanics and founds the tradition of energy-based learning. It connects information and energy by representing information as a linking structure on which an energy function is defined. "Correct" structures have low energy, whereas "wrong" structures have high energy. The system evolves randomly and converges to low-energy states, exhibiting resilience to intervention, or "remembering."
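The idea of storing a "correct" structure as a low-energy state can be sketched in a few lines. This is a minimal illustration, not the paper's own code: the function names, the single 8-unit pattern, and the number of update sweeps are all assumptions made for the example.

```python
import random

def train_hebbian(patterns):
    """Build symmetric weights from +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / len(patterns)
    return weights

def energy(weights, state):
    """E = -1/2 * sum_ij w_ij * s_i * s_j; stored patterns sit in low-energy wells."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))

def recall(weights, state, sweeps=5, seed=0):
    """Asynchronous updates in random order; each update never raises the energy."""
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1, 1, -1]   # hypothetical memory
weights = train_hebbian([stored])

noisy = list(stored)
noisy[0] = -noisy[0]                    # corrupt two units
noisy[3] = -noisy[3]

recovered = recall(weights, noisy)
print(recovered == stored)                                # True for this small example
print(energy(weights, recovered) < energy(weights, noisy))  # True: recall moved downhill
```

The corrupted input plays the role of an intervention, and the return to the stored pattern is the "remembering" described above.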

2. Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147-169.

How do we evolve an energy-based learning system to the state of minimum energy or maximum entropy? A naive way is to run a Monte Carlo simulation and count on chance. In this paper, Ackley, Hinton, and Sejnowski show that a local learning rule in the spirit of the Hebbian rule, also known as "fire together, wire together," can be used to evolve the system. The rule minimizes the Kullback-Leibler divergence (information gain) between the model (system) and the data (environment). This paper also inspires us to ask: if we evolve (neural) networks to learn, then what are evolving (social) networks learning?
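For reference, the learning rule can be written out as follows. The notation is the one commonly used when presenting this paper and should be read as a summary, not a derivation: P(V_α) is the probability of visible state α when the environment clamps the visible units, P'(V_α) is the same probability when the network runs freely, p_ij and p'_ij are the probabilities that units i and j are both on in the clamped and free-running phases at equilibrium, T is the temperature, and ε is a step size.

```latex
G = \sum_{\alpha} P(V_\alpha) \ln \frac{P(V_\alpha)}{P'(V_\alpha)},
\qquad
\frac{\partial G}{\partial w_{ij}} = -\frac{1}{T}\left(p_{ij} - p'_{ij}\right),
\qquad
\Delta w_{ij} = \varepsilon \left(p_{ij} - p'_{ij}\right)
```

The update for a weight depends only on co-activation statistics of the two units it connects, which is why the rule is described as Hebbian in spirit.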

3. Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.

Modularity (Q) measures the degree to which a network has community structure, that is, the degree to which nodes are more likely to connect within communities than between them. Random networks have low Q values, whereas most real-world networks have high Q values. Note that the formula for modularity is very similar to the energy function defined in the Hopfield network. Can we assume that social networks are continuously learning from the environment and forming communities to store information? According to the principle of energy-based learning, systems converge to the lowest-energy state. Can modularity Q be understood as an energy function that describes networks?
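To make the comparison with an energy function concrete, here is a small sketch that computes Q for an undirected, unweighted network as the fraction of edges inside communities minus the fraction expected if edges were rewired at random with degrees preserved. The function name and the toy network (two triangles joined by one edge) are assumptions for illustration. For a split into two groups labeled s_i = +1 or -1, Newman's paper writes Q as (1/4m) Σ_ij B_ij s_i s_j with B_ij = A_ij - k_i k_j / 2m, which has the same quadratic form as the Hopfield energy up to sign and scale.

```python
def modularity(edges, community):
    """Q = (fraction of edges within communities) - (expected fraction at random)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Observed share of edges whose endpoints are in the same community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m

    # Expected share under random rewiring that keeps every node's degree.
    degree_sum = {}
    for node, k in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())

    return inside - expected

# Hypothetical toy network: two triangles connected by a single bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # 6/7 - 1/2, about 0.357
q_single = modularity(edges, one_group)   # 0: no structure is detected
```

Maximizing Q over assignments is then analogous to searching for a low-energy configuration.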

4. McPherson, J. M., & Ranger-Moore, J. R. (1991). Evolution on a dancing landscape: organizations and networks in dynamic Blau space. Social Forces, 70(1), 19-42.

Understanding individuals' behavior through the social groups they belong to is one of the dominant perspectives in the social sciences. Blau space is a geometric model of this perspective. It is a Euclidean space whose dimensions are demographic variables such as race, gender, and income. Linear regression models following the i.i.d. assumption are applications of this model. An individual is a data point in this high-dimensional space, and the statement that "birds of a feather flock together" (McPherson et al., 2001) assumes that points close to each other are more likely to connect. Meanwhile, social organizations compete with each other for the same niches in Blau space.
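The homophily assumption can be illustrated with a toy calculation. The exponential decay of tie probability with distance, the `scale` parameter, and the three standardized coordinates below are assumptions chosen for the sketch; the paper does not prescribe this functional form.

```python
import math

def tie_probability(person_a, person_b, scale=1.0):
    """Assumed homophily rule: ties become less likely as Blau-space distance grows."""
    distance = math.dist(person_a, person_b)   # Euclidean distance (Python 3.8+)
    return math.exp(-distance / scale)

# Hypothetical individuals as points (standardized age, education, income).
alice = (0.1, 0.5, 0.3)
bob = (0.2, 0.4, 0.3)      # demographically close to alice
carol = (2.0, -1.5, 1.8)   # demographically distant from alice

p_close = tie_probability(alice, bob)
p_far = tie_probability(alice, carol)
```

Under this rule `p_close` exceeds `p_far`, which is the geometric reading of "birds of a feather flock together."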

5. Abbe, E. (2017). Community detection and stochastic block models: recent developments. arXiv preprint arXiv:1703.10146.

Harrison White was one of the first scholars to introduce the concept of block modeling (White et al., 1976). This model suggests that the linking behavior of nodes is determined by the underlying social "blocks" to which they belong. Within a block, nodes are interchangeable with each other. Stochastic block models (SBMs) connect block models to the Ising model and use Monte Carlo simulation to infer the most likely block structure.
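A minimal sketch of the generative side of an SBM follows, together with the log-likelihood that an inference procedure (for example a Monte Carlo search over block assignments) would try to maximize. The function names, the two-block probability matrix, and the six-node example are assumptions for illustration, and no inference loop is included.

```python
import math
import random

def sample_sbm(blocks, probs, seed=0):
    """Draw an undirected graph: nodes i, j link with probability probs[block_i][block_j]."""
    rng = random.Random(seed)
    n = len(blocks)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[blocks[i]][blocks[j]]:
                edges.add((i, j))
    return edges

def log_likelihood(edges, blocks, probs):
    """Log-probability of the observed edges under a candidate block assignment."""
    n = len(blocks)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = probs[blocks[i]][blocks[j]]
            total += math.log(p) if (i, j) in edges else math.log(1.0 - p)
    return total

# Hypothetical setting: dense within blocks, sparse between them.
probs = [[0.9, 0.1],
         [0.1, 0.9]]
true_blocks = [0, 0, 0, 1, 1, 1]
shuffled_blocks = [0, 1, 0, 1, 0, 1]

observed = sample_sbm(true_blocks, probs, seed=1)
ll_true = log_likelihood(observed, true_blocks, probs)
ll_shuffled = log_likelihood(observed, shuffled_blocks, probs)
```

Because nodes within a block share the same linking probabilities, swapping two nodes of the same block leaves the likelihood unchanged, which is the interchangeability mentioned above.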

6. Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.

The DeepWalk model is based on a neural language model called word2vec (Mikolov et al., 2013). Word2vec represents words as vectors by embedding them in a high-dimensional Euclidean space. It uses an artificial neural network to find the positions of words that best predict nearby words. DeepWalk simulates random walks on social networks and generates sequences of nodes that can be fed to the word2vec model to predict community structure. The word2vec model follows the "distributional hypothesis," which defines the meaning of a word by its neighbors. Similarly, DeepWalk defines the social roles of individuals by their surrounding nodes.
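The first half of this pipeline, turning a network into "sentences" of nodes, can be sketched as below. The function names, the toy adjacency list, and the parameter values are assumptions for illustration; the embedding step itself is left to any skip-gram implementation and is not shown.

```python
import random

def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)                 # visit start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:          # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

def context_pairs(walks, window=2):
    """(center, context) pairs, the training examples a skip-gram model consumes."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Hypothetical friendship network as an adjacency list.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(adjacency)
pairs = context_pairs(walks)
```

Each walk is treated like a sentence and each node like a word, so nodes that co-occur on many walks end up with similar vectors.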

7. Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings. American Sociological Review, 0003122419877135.

Word embeddings capture cultural dimensions (e.g., man vs. woman, rich vs. poor, black vs. white, liberal vs. conservative) underlying collective narratives and reveal how society thinks and what society knows. They also model the hierarchy of language. For example, nouns like "man" or "woman" can be modeled as vectors, while adjectives and verbs such as "masculine" or "feminine," which operate on nouns, can be represented as vector differences. This provides insight into the hierarchy of society and the functions of individuals.
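A rough sketch of how such a dimension can be built and used: average the vector differences of a few antonym pairs, then measure how strongly other words point along that direction. The two-dimensional vectors below are invented for illustration and do not come from any trained model; the function names are likewise assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def cultural_dimension(pairs, vectors):
    """Mean of the difference vectors (first word minus second) over antonym pairs."""
    size = len(next(iter(vectors.values())))
    diffs = [[a - b for a, b in zip(vectors[first], vectors[second])]
             for first, second in pairs]
    return [sum(d[k] for d in diffs) / len(diffs) for k in range(size)]

# Made-up toy embeddings; real ones would have hundreds of dimensions.
vectors = {
    "rich": [1.0, 0.2],
    "poor": [-1.0, 0.2],
    "affluent": [0.8, 0.1],
    "impoverished": [-0.9, 0.3],
    "yacht": [0.7, 0.5],
    "shack": [-0.6, 0.4],
}

affluence = cultural_dimension([("rich", "poor"), ("affluent", "impoverished")], vectors)
yacht_score = cosine(vectors["yacht"], affluence)   # positive: leans "rich"
shack_score = cosine(vectors["shack"], affluence)   # negative: leans "poor"
```

The sign and size of the projection place each word along the rich vs. poor axis, which is the sense in which a cultural dimension is a vector difference.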

8. Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903, 1(2).

Blau space and block models use demographic variables to predict social connections. In contrast, graph neural networks (GNNs) infer demographic variables (node features) from social networks. Unlike DeepWalk, a GNN is typically trained as a supervised machine learning model. It assumes and models the diffusion of label information between neighboring nodes.
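The diffusion step in a graph attention network can be sketched for a single attention head: each node scores its neighbors with e_ij = LeakyReLU(a · [W h_i || W h_j]), normalizes the scores with a softmax, and takes the weighted average of the transformed neighbor features. The sketch below uses plain Python lists, omits the output nonlinearity and any training, and uses made-up features and parameters; the function names are assumptions.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_layer(features, neighbors, W, a):
    """One attention head. Returns (new_features, attention_weights)."""
    transformed = {node: matvec(W, h) for node, h in features.items()}
    new_features, attention = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalized score for each neighbor j: LeakyReLU(a . [Wh_i || Wh_j]).
        scores = [leaky_relu(sum(c * x for c, x in
                                 zip(a, transformed[i] + transformed[j])))
                  for j in nbrs]
        top = max(scores)                          # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        alphas = [e / sum(exps) for e in exps]     # softmax over the neighborhood
        attention[i] = dict(zip(nbrs, alphas))
        size = len(transformed[i])
        new_features[i] = [sum(alpha * transformed[j][d]
                               for alpha, j in zip(alphas, nbrs))
                           for d in range(size)]
    return new_features, attention

# Hypothetical 3-node network; each neighborhood includes the node itself.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]          # made-up feature transform
a = [0.5, -0.5, 1.0, 0.0]             # made-up attention vector

new_features, attention = attention_layer(features, neighbors, W, a)
```

Stacking such layers lets information from labeled nodes reach nodes several steps away, with the attention weights deciding which neighbors matter most.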

Study Group Members

Di Tong

MA Student
Masters in Computational Social Science
University of Chicago

Alternative Geometries by Neural Networks
Reasoning vs. Geometry

Networks have directions: money flow

in-in for complementarity (flow); in-out for substitution (orthogonal to flow)

Study Group Members

Likun Cao

Colin Yuanhao Liu

Wendy Chengwei Wang

Haochuan Cui

Yiling Lin

Yanbo Zhang

Huimin Xu

Linzhou Li

Hongbo Fang

Di Tong
