Connectionism: Neural Networks from a Cognitive Science Lens

Siddharth Gupta
May 10, 2022

In the philosophy of mind, connectionism aims to explain mental processes with the aid of artificial neural networks.

Connectionism

According to connectionism, the human mind is a network made up of many small units or nodes. Such a network may consist of an input layer, a hidden layer, and an output layer. The three layers are arranged one after the other in sequence, while the nodes within each layer are arranged in parallel.

In the example network above, there are nine units in total: three in the input layer, each connected to the four nodes of the hidden layer, which in turn connect to the two nodes of the output layer.

In a connectionist network, each layer is connected to the one before it: it takes information from the previous layer and, using an activation function, decides which pieces of that information, and how much of each, should be passed further forward.
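As a rough sketch of that idea (my own illustration, not taken from the article), here is how the 3-4-2 network described above might compute a forward pass, assuming sigmoid activations and made-up weights:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# A tiny 3-4-2 network like the example above, with random weights
# chosen purely for illustration (an untrained network).
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # 4 hidden units, each connected to the 3 input units
W_output = rng.normal(size=(2, 4))   # 2 output units, each connected to the 4 hidden units

x = np.array([0.2, 0.7, 0.1])        # values held by the 3 input units

hidden = sigmoid(W_hidden @ x)       # each hidden unit: weighted sum, then activation
output = sigmoid(W_output @ hidden)  # each output unit does the same with the hidden values
print(hidden, output)
```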

Example of a Neural Network

We can take a real-life scenario and discuss how such a neural network tackles an everyday problem. Let us consider the task of telling a picture of a dog from a picture of a cat. There are many possibilities for what kind of pictures these may be: small, large, coloured, black and white, 3D, and so on. For the simplicity of our discussion, we will consider the pictures to be of dimensions 128x128 and grayscale, i.e. each pixel has a value between 0 and 255, where 0 means black, 255 represents total white, and all values in between are shades of grey.

How can we teach a neural network to decide, given a picture, whether it shows a cat or a dog?

Firstly, let’s start with a loose analogy of how humans might take up this task. When we meet someone for the first time, we ask their name, and some internal mapping gets built between the name of the person and how they look. If we meet them frequently, that association grows stronger and stronger, and with time we have no trouble recognising them. For a computational equivalent, we can say that the neural network will require a plethora of pictures of dogs and cats, each with an associated label. In a sense, we tell the network, “hey, here’s a picture, and it happens to be a picture of a cat”, and we slowly hope that the network learns the mapping.

Now that we have set up the premise, let’s discuss what it could mean to ‘learn’ the mapping.

Each unit of the hidden layer has a set of values, called weights, associated with it. These weights measure how much information the unit takes from a particular unit of the previous layer relative to the other units. A weight of 1 would mean that the hidden-layer unit takes the information passed to it from a certain unit of the first layer as is; a weight of 0 would imply that the hidden-layer node simply doesn’t accept any information from that node.
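As a toy illustration of this gating intuition (my own example, not from the article), a weight of 1 passes a sending unit’s value through unchanged, 0 blocks it, and intermediate or negative values scale or inhibit it:

```python
sending_value = 0.8  # activation of one unit in the previous layer

for weight in (1.0, 0.5, 0.0, -0.5):
    contribution = weight * sending_value
    print(f"weight {weight:+.1f} -> contribution {contribution:+.2f}")
# +1.0 passes the value as is, 0.0 blocks it entirely,
# and a negative weight inhibits (pushes the receiving unit's sum down).
```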

Classification Network/Source: https://github.com/ReiCHU31/Cat-Dog-Classification-Flask-App

The weights may also be seen from the point of view of strengthening or inhibiting a connection. These weight values can be any real numbers, and finding suitable values is exactly what the nodes of the hidden layer need to do in order to perform the end task. The hope is that, after going through a huge number of training images, the network will have fine-tuned these values; this is exactly what learning means in the context of neural networks.

What gets passed? During the training phase, the images get passed. From a computational perspective, an image is a 2D array of pixel values, 128x128 of them, each ranging from 0 to 255. This information is given to the input layer, implying that we need 128x128 input nodes in order to hold the information of one picture. The label gets passed during the training phase as well: in the case of cats we may pass the value 1, and in the case of dogs the value 0. The key idea is that once the neural network computes its take on the image, it gets to compare that take with reality, and it can then use the difference as feedback for improving its future decision making, which happens by tweaking the weight values.
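Concretely, preparing one training example might look like this (a hypothetical sketch; the array here is random stand-in data, not a real photograph):

```python
import numpy as np

# A stand-in for one 128x128 grayscale training image (pixel values 0-255).
image = np.random.randint(0, 256, size=(128, 128))

# Flatten into 128*128 = 16384 values, one per input node, and rescale
# to the 0-1 range so the activations stay numerically convenient.
x = image.reshape(-1) / 255.0
label = 1   # 1 = cat, 0 = dog, following the convention above

print(x.shape)   # (16384,)
```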

One of the most popular weight-update mechanisms is backpropagation. Here the idea is that the weights are tweaked in a backward fashion, starting from the output layer, and this happens because of how the chain rule works: the influence of a weight in an early layer on the final output passes through all the weights of the later layers, so computing its update requires working backwards from the output layer through every layer in between.
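To make the chain-rule point concrete, here is a small worked expression in notation of my own (not from the article): let x_i be an input value, z_j the weighted sum at hidden unit j, h_j = f(z_j) its activation, o the network’s output (a function g of the weighted sum of the h_j with output weights w^(2)), and L the loss. Then

$$
\frac{\partial L}{\partial w^{(1)}_{ij}}
= \frac{\partial L}{\partial o}\cdot
  \frac{\partial o}{\partial h_j}\cdot
  \frac{\partial h_j}{\partial z_j}\cdot
  \frac{\partial z_j}{\partial w^{(1)}_{ij}}
= \frac{\partial L}{\partial o}\;
  g'\!\Big(\sum_k w^{(2)}_k h_k\Big)\, w^{(2)}_j\; f'(z_j)\; x_i .
$$

The gradient for the early-layer weight contains the later-layer weight w^(2)_j, which is why the computation has to sweep backwards from the output towards the input.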

Backpropagation/Source: https://medium.com/hackernoon/classifying-images-using-tensorflow-js-keras-58431c4df04

With backpropagation, the hope is that the weights are changed in such a way that the network’s capability at classification or regression improves; in our case the problem statement is one of classification. Further, to check whether the network is improving, the cross-entropy loss value may be calculated (in the case of a classification problem).
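For the two-class cat/dog setup, the binary form of that loss could be computed as follows (a generic sketch, not code from the article):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: 1 for cat, 0 for dog; y_pred: the network's predicted probability of "cat".
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid taking log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])
print(binary_cross_entropy(y_true, y_pred))  # lower is better; perfect predictions give ~0
```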

Besides the traditional feedforward network, a view of which I elaborated on above, there are nowadays various variants of artificial neural networks which do not have a uniform notion of direction, for example Recurrent Neural Networks and Long Short-Term Memory networks.

My explanation of the properties of connectionist networks (by Ravenscroft)

Connectionist networks can learn: In order to distinguish between a cat and a dog, the network must be able to tell apart the striking features of the cat and those of the dog. For example, cats may be recognised by their whiskers, while dogs often have their tongue hanging out (panting to cool themselves down). There is also variability within each species, such as different breeds, to account for. The ability of the network to learn from a dataset of cat and dog images so as to make a correct classification is what makes neural networks stand out.

Neural networks are also reported to have strong generalization properties, i.e. if a new dataset containing dogs and cats of other breeds is checked, it is expected that the network will still be able to make the distinction.

Connectionist networks process in parallel: As discussed earlier, the nodes or units within a particular layer are arranged in parallel with each other, i.e. the nodes of a layer may send information to, or receive information from, the neighbouring layers simultaneously rather than one after another. Several units get activated or deactivated at the same time, which should in theory result in faster overall computation.

For example, all the input units receive their respective pixel values from the dataset at the same time, and the values from these input units are then passed on together to the next layer, where further computations likewise take place all at once.
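Computationally, this parallelism is usually realised as matrix operations: one multiplication updates every unit in a layer, for a whole batch of images, at once. Here is a hedged sketch (sizes and weights are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
batch = rng.random((32, 16384))          # 32 flattened 128x128 images processed together
W_hidden = rng.normal(size=(16384, 64))  # 64 hidden units, each connected to every input node

# One matrix multiplication computes every hidden unit's weighted sum,
# for every image in the batch, "in togetherness": no unit waits for another.
hidden = sigmoid(batch @ W_hidden)
print(hidden.shape)   # (32, 64)
```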

Representations in connectionist networks are distributed: Information in an artificial neural network is represented in both serial and parallel ways: while the nodes or units of a single layer are arranged in parallel, the layers themselves are arranged one after another in a linear manner.

The passing of information takes place in a feedforward manner (or bidirectionally, in the case of Recurrent Neural Networks), while the processing within each layer takes place in parallel.

Processing in connectionist networks is local: The output of each unit in the hidden layer depends on two factors: first, the information or activation it receives from the previous (input) layer, and second, its own activation function. In a way, this suggests that in a neural network there is no central authority; rather, responsibility is distributed equally amongst the nodes or units of the hidden layer. One analogy is how people behave in a free environment versus a controlled one.

For example, when you go to a park you can choose to run, exercise, play on the rides, and so on; there are no restrictions on which ride you can take or boundaries on where in the park you can explore. Now imagine a function is going on in the park: there are limits to what you can do, and some areas are out of bounds. In the first case there is unrestricted, parallel decision making; in the second there is an invisible controller deciding what you may or may not do. The first case is how a neural network works.

Connectionist networks tolerate poor-quality input: There may be scenarios where new images are not clear enough, visually, to easily judge whether they are a picture of a dog or of a cat.

Experimentally, it turns out that neural networks are fairly robust and proficient even when an image is not as clear as the images they were trained on.
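One way to probe this (a hypothetical sketch, with a single sigmoid unit standing in for a trained network) is to degrade an input with noise and compare the outputs for the clean and noisy versions:

```python
import numpy as np

def predict(x, W):
    # Stand-in for a trained network's forward pass (a single sigmoid unit here).
    return 1.0 / (1.0 + np.exp(-(W @ x)))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=16384)       # pretend these are learned weights
clean = rng.random(16384)                    # stand-in for a flattened, normalised image
noisy = np.clip(clean + rng.normal(scale=0.1, size=16384), 0.0, 1.0)  # degraded version

# A network that tolerates poor-quality input gives similar answers for both.
print(predict(clean, W), predict(noisy, W))
```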

Connectionist networks degrade gracefully: The expertise of a neural network degrades when its weights get reset; it is as if the memory of what was learned is being lost. But it turns out that unless a lot of weights have lost their correct values, the neural network is still able to give reasonable predictions, albeit at reduced accuracy. And, as one would intuitively expect, the quality degrades roughly in proportion to the number of weights losing their learning.
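A rough way to see this (again a made-up sketch with random stand-in weights) is to reset a growing fraction of a layer's weights and watch how far its outputs drift from the intact version:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16384))   # pretend these are learned hidden-layer weights
x = rng.random(16384)              # one flattened input image
reference = sigmoid(W @ x)         # hidden activations of the intact network

for fraction in (0.0, 0.1, 0.3, 0.6, 0.9):
    damaged = W.copy()
    damaged[rng.random(W.shape) < fraction] = 0.0   # "reset" a fraction of the weights
    drift = np.abs(sigmoid(damaged @ x) - reference).mean()
    print(f"{int(fraction * 100):>2d}% of weights reset -> average drift {drift:.3f}")
# The drift grows with the damage instead of jumping straight to total failure.
```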

My explanation of the evidence in favor of Connectionism (by Ravenscroft)

Good at what we’re good at; bad at what we’re bad at: We have seen in the previous discussion that connectionist networks have a great generalization capability: recognising something that only partially matches what was seen before is the essence of learning and adapting. This ability is very much in sync with how humans perform at similar tasks.

Both humans and connectionist networks perform quite well on visual recognition tasks. This is a point where we can reiterate, “connectionist models are good at what we are good at”. But is it merely a unique case, an outlier, or are there more such “we’re good, they’re good” situations?

Well, over the years neural networks have been successful in a plethora of situations that were believed to be possible only for a human being. For starters, chess. In the 1990s no one imagined that a world chess champion like Garry Kasparov could be beaten by a machine; chess was seen as one key distinction between human capabilities and machine capabilities.

Kasparov loses to IBM Deep Blue/Source: https://theconversation.com/twenty-years-on-from-deep-blue-vs-kasparov-how-a-chess-match-started-the-big-data-revolution-76882

The IBM Deep Blue win was seen as a monumental moment of progress in understanding and implementing human-like intelligence in a machine. Since then there has been a flurry of examples where neural networks have re-enacted human cognitive capabilities almost flawlessly; further examples include language translation, patient diagnosis, and so on. Just like their human counterparts, connectionist models have the capability to learn and adapt.

Brain-like architecture: Just like artificial neural networks, the brain has a notion of parallelism attached to its structure, with every neuronal unit connected to other neurons. But beyond that, there are differences between artificial nodes and the brain. If we allow ourselves to engage in a comparison with the biological aspects of neural communication, we will be in a better position to form an opinion about the applicability of neural networks to the study of the human mind. Earlier we discussed how neural networks are trained with the backpropagation algorithm.

With backpropagation, we said, the hope is that the weights are changed in such a way that the network’s capability at classification or regression improves. It is yet to be established whether some notion of backpropagation is how human beings make corrections to their own ‘model’ of decision making. Communication at the neuronal level in human beings, on the other hand, has been studied extensively.

According to neuroscientists, neurons converse with each other with the help of chemicals called neurotransmitters. In comparison, artificial neural networks don’t take such factors into account while transferring information from one layer to another.

Reasons for Skepticism towards Connectionism

Issue of Language and Thought Systematicity: One of the biggest reasons for skepticism toward connectionism comes from its inability to account for the systematicity argument. Fodor and Pylyshyn claimed that connectionists will find it difficult to explain the systematicity of language. “The systematicity of language talks about the fact that the ability to produce, understand or think some sentences is intrinsically connected to the ability to produce, understand or think others of related structure”.

For example, if someone understands the sentiment and semantics behind the sentence “David admires Rahul”, then they can be expected to understand the sentiment and semantics behind the sentence “Rahul admires David”.

There is no settled opinion as to whether connectionist models are capable of ensuring systematicity. Systematicity, on the other hand, can easily be shown to exist within the Computational Theory of Mind school of thought, since on that view thoughts are built up from concepts in a structured way: knowing the meanings of, and the interrelationships between, the words ‘David’, ‘admires’ and ‘Rahul’ implies that the listener has the syntactic knowledge to draw the required conclusions.

Systematicity of thought, according to Fodor and Pylyshyn, requires syntactic and semantic relations among mental representations, which is what the language of thought provides; in the case of connectionist theory there is no account of syntactic and semantic relationships within the mental representations. This makes a strong case against connectionism, as the theory is unable to comment on the systematicity of thought.

Nativism: Nativism also pops up as a big argument against connectionism. According to nativists, some skills are native to the human brain, i.e. they are inherently ingrained in it.

Goldstein and Jack A. Naglieri write about nativism, “A set of theories that contend that human abilities and developmental processes are innate and hard-wired at birth. These theories inform beliefs about developmental processes most closely associated with initial language acquisition.”

Anti-nativists believe that human brains enter the world tabula rasa, i.e. as a blank slate: all the skills and capabilities required to make sense of the world, including language comprehension, are not inherently present in humans but instead get developed through external stimuli and interaction with the external environment.

Nativists, on the other hand, believe that there is some inherent understanding of key skills such as language. According to the linguist Noam Chomsky, children are born with a language acquisition device embedded in their brains.

We should be cautious not to strip the connectionist model of every nativistic notion, though; connectionist models are not fully aligned with the empiricist picture either. As with nativism, a connectionist network also begins with a set of initial values, and although these are of little significance due to their randomness, they are still something for the network to start with. Hence there are elements of nativism in the connectionist approach as well, but how well defined they can be, and what their implications are, is a question for research.

Connectionism assumes that the network initially produces responses no better than a coin flip, and only after it acquires more knowledge about the world do its predictions get better. Remember that all the weights are initially assigned random values; the network therefore starts with essentially no usable knowledge of the task (in practice, random rather than all-zero starting values are used so that the units do not all end up behaving identically). Hence we can see why nativists would be irked by connectionists.
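As a hedged illustration of that starting point (random stand-in data, not a real experiment), an untrained network with random weights classifies a balanced set of examples at roughly chance level:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=16384)      # untrained, random weights: no task knowledge
images = rng.random((200, 16384))           # stand-ins for 100 cat and 100 dog images
labels = np.array([1] * 100 + [0] * 100)

predictions = (images @ W > 0).astype(int)  # the untrained network's guesses
print("accuracy:", (predictions == labels).mean())   # hovers around 0.5, i.e. a coin flip
```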

Rationality: Connectionist networks have over the years been designed to determine the validity of simple arguments. Experiments have been successfully conducted in which a model draws conclusions on the basis of the premises.

For instance, below is one such premise-and-conclusion scenario.

Premise 1: If A then B

Premise 2: A

Conclusion: B

This is a valid argument of the modus ponens form.

An example of an invalid argument would be:

Premise 1: If A then B

Premise 2: Not A

Conclusion: Not B

This is an invalid argument (denying the antecedent): the happening of A implies the happening of B, but the premises say nothing about the relationship between A and B when A does not happen.

For example, take “If India wins the World Cup, I will go to watch a movie.” The meaning is that if the premise of India winning the World Cup comes out to be true, then “I will go to the movie” will also be realised. Nothing is said about what happens if India does not win: I may still go to the movie if India loses, but if India wins I am definitely going.
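This is not the network that Bechtel and Abrahamsen trained (discussed next); it is just a plain truth-table check, written by me in Python, of why the first argument form is valid and the second is not:

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# An argument form is valid if the conclusion is true in every case
# where all the premises are true.
modus_ponens_valid = all(
    b                                       # conclusion: B
    for a, b in product([True, False], repeat=2)
    if implies(a, b) and a                  # premises: "if A then B", and A
)
denying_antecedent_valid = all(
    not b                                   # conclusion: not B
    for a, b in product([True, False], repeat=2)
    if implies(a, b) and not a              # premises: "if A then B", and not A
)

print(modus_ponens_valid)        # True: there is no counterexample
print(denying_antecedent_valid)  # False: A false, B true is a counterexample
```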

Bechtel and Abrahamsen conducted research on six arguments of the kind given in the example. After training, the model was capable, with some accuracy, of making the distinction between valid and invalid arguments, and with a larger corpus the accuracy increased by a few percentage points. Hence, while not full-blown evidence, this can be taken as a step in the right direction for the connectionists.

Explainability of connectionist models: A connectionist model that solves a classification problem such as the one we discussed still doesn’t say a lot about how it achieves its success at the task. Given a picture, we may imagine that each layer is somehow looking for one particular feature and that together they produce a consensus on whether an image is of a dog or of a cat: for example, the first hidden layer noting whether whiskers are present, the second hidden layer (if any) making a judgment on tongue visibility, and so on. Hence, even when we see a huge success rate with connectionist models at tackling such problems, we are yet to fully understand the nuances of the network and its functioning. And without a full account of this inner functioning, it might be difficult to take this approach far in understanding the human mind.

Bibliography

Buckner, Cameron, and James Garson. “Connectionism,” 1997. https://plato.stanford.edu/entries/connectionism/#SysDeb.

Goldstein, Sam, and Jack A. Naglieri, eds. Encyclopedia of Child Behavior and Development. Boston, MA: Springer US, 2011. https://doi.org/10.1007/978-0-387-79061-9.

Ravenscroft, Ian. Philosophy of Mind: A Beginner’s Guide. Oxford; New York: Oxford University Press, 2005.

This paper was written as part of the Philosophy of Mind course requirement at the IIT Delhi MSc Cognitive Science program taught by Professor Smita Sirker.

The author can be reached at https://www.linkedin.com/in/sidgupta234
