Unedited audio transcription from Google Recorder
Hi everyone. Welcome back to Ethics, Analytics and the Duty of Care. I'm Stephen Downes. We're in module seven, "The Decisions We Make," focused on AI workflows and the decisions made as we apply AI and analytics to education and development. Today's video is entitled "Tools and Algorithms," and we're going to be looking at some of the mechanisms that are used by AI and analytics developers in order to create the kinds of tools that we're using in learning analytics.
This is a big subject and I'm going to try to do it in less than an hour. Hopefully less; well, maybe not less than half an hour. I'm going to be gliding over the details. I'm not going to get into the many, many algorithms that have been developed over the years. The tools and algorithms I will talk about are ones that were developed a while ago and are now regarded as classics in the field. The intent here isn't to teach you artificial intelligence; that's not going to happen. But it is to give you a sense of the range of decisions that get made on a day-to-day basis while these tools are being developed and applied in real-world situations.
And that's why I begin with this particular quote from Zimmermann et al. in 2020: developers cannot just ask, "What do I need to do to fix my algorithm?" They must rather ask, "How does my algorithm interact with society at large, and as it currently is, including its structural qualities?"
We're not designing AI, algorithms, and learning technology for some abstract future where everything is hunky-dory. We live in an existing, flawed, human world, and the tools we use will reflect that, and we have to ask: how do we make things better, given that reality? Now, we're not going to answer that question in this presentation either, but we'll inch forward bit by bit and try to come up with an answer overall in the course.
So I'm going to be using terms a bit loosely during this talk, and indeed I have throughout the course. Strictly speaking, there are different terms we could use for different things. Artificial intelligence is a very broad field that developed well after the Second World War, almost with the beginning of computer technology, and basically it's any technology designed to emulate human intelligence. That includes a lot of rule-based algorithms, expert systems, stuff like that, that mostly we're not talking about here. Machine learning kind of began in the 1980s. It's a subset of artificial intelligence where, instead of us telling machines what to do, we begin to think about how machines could learn on their own.
Now, sometimes this includes neural networks, sometimes it doesn't; it's all very fuzzy. It doesn't really matter, but we need to keep that in mind. Deep learning has been around for a while; I think it got popular around 2010 or so. The concept has been around since the neural networks work that was done in the 1980s, but it really didn't take off until we got the processor power that would make it possible. Deep learning is, for the most part, neural network learning. And as I'll say later on in the presentation, the "deep" in deep learning comes from the idea that there are multiple layers of processing that take place, the "deep" referring to the number of layers.
So neural networks are mostly about deep learning, but they do kind of extend into machine learning as well, because you can have one-layer neural networks, as we've seen with perceptrons. Strictly speaking, perceptrons are a kind of machine learning. You see why the terminology is kind of messy. Don't sweat the terminology, really. My perspective is that each one of these tools, each one of these algorithms, is a thing in itself. We can classify them and categorize them all we want, but really, there's one thing, and there's another thing, and there's another thing, and then there's another thing. And that's probably the best way to look at it. All right?
So, speaking of categorizing: machine learning is generally broken down into three major subcategories, supervised, unsupervised, and reinforcement. Supervised machine learning uses labeled data sets. So we have data, and each bit of data has a label; these labels are created by humans. So, for example, it might be a string of numbers and a human said, that's temperature; or it might be a string of values and a human said, those are house prices.
You get the idea. So, with supervised machine learning, we use the data with the labels, and that produces output, and then a human critic looks at the output and decides whether it's acceptable or not, and then feeds back any error into the system. We saw an example of that in the earlier presentation on back propagation, and back propagation is typically thought of as the paradigm instance of supervised learning; it does go all the way back to the 1980s. Unsupervised learning uses data without labels. Now, it could have labels, but we don't care in this case. And what the machine is trying to do is discover hidden patterns in the data without human intervention. There are different ways to do that. So really, the human intervention in this case is in the design of the network itself, and the functions in the network, and how the neurons interact with each other. But there's nobody looking at the output, if there is indeed even an output, and saying: yes, this is good; no, that is bad. That's why we call it unsupervised. Reinforcement is kind of like supervised, but we're not saying that an error has necessarily been found. What we're doing is applying machine learning within a context, like a game, say, and the state of affairs of the game is fed back into the machine learning algorithm, and then it uses that in order to correct what it's doing.
So those are the three major types. Really, it breaks down into two, right? One in which some kind of external experience corrects the machine, and the other in which the machine kind of learns on its own. All right, so machine learning, let's say, goes back a while. There's a couple of examples I want to talk about here; I could linger on them, but I won't. They are types of machine learning, but they're not neural networks. One of them is k-means. The idea of k-means is to take some data and organize it into clusters. So imagine you have a bunch of data points, you know, it could be time and temperature or whatever; represent that on a two-dimensional graph, much like the one we see pictured here. Now, what we want to do is draw circles around the clusters of the data, maybe not circles necessarily, but we want to organize the data into clusters. How many clusters? Maybe two, maybe four; maybe as many clusters as there are individual data points, which would be like perfect clustering, but we never do that.
So, what we do is, we present that data, we pick a couple of points called centroids, and then we cluster all of the data points around those centroids according to a couple of rules or principles. One is, you know, which centroid is closest to the data point itself. So that'll break it down into two groups, and then we try another pair of centroids and create clusters again. We keep trying it a bunch of times, and what we're going to try to do is pick the best clustering. But what counts as best? Well, perhaps one way of counting: the best is the longest distance between the two centroids. We can see on the diagram that we have here that they're pretty far apart; it would be pretty hard to imagine them further apart, given these points. Another way of looking at it might be the tightness of the cluster, right? How close are the individual data points to the centroid? So these are decisions we make in picking the best way of clustering these individual data points.
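As a rough sketch of the loop just described, here's a minimal k-means in plain Python; the data points, the choice of two clusters, and the move-centroid-to-the-mean update rule are my own illustrative assumptions, not from the talk:

```python
import random

def k_means(points, k, iterations=10, seed=0):
    """Cluster 2-D points by repeatedly assigning each point to its
    nearest centroid, then moving each centroid to its cluster's mean."""
    random.seed(seed)
    centroids = random.sample(points, k)        # start from k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # rule one from the talk: which centroid is closest to this point?
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2
                                      + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        for i, cl in enumerate(clusters):
            if cl:                              # move centroid to cluster mean
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

# two obvious clumps: one near (0, 0), one near (10, 10)
data = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, clusters = k_means(data, k=2)
```

Choosing k, the distance measure, and what counts as the "best" clustering (spread of the centroids, tightness of the clusters) are exactly the design decisions being pointed at here.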
(And here comes the train, because I have no control over my environment.) All right, here's another type of machine learning. Again, it's for clustering, but it's kind of like predicting cluster membership. So let's say we have a bunch of data points sitting on a graph, and they've already been identified as this type of thing or that type of thing.
So we've produced some clusters; we can see the clusters there on the graph. And now we have a new data point, and we wonder: what kind of data point is this? Well, what we do is we look at the closest data point. So here on the right-hand side, here's our new data point; the closest data point is a red triangle, so we could group it with the red triangles. Well, we don't have to just pick the closest; we could pick the closest two data points: still red triangles. Okay, what if we do the closest three data points? Well, two red triangles and one blue square. Okay, so we're still going to call it a red triangle. What if we took the closest five data points? Now we have three blue squares and two red triangles, so now we might want to call this a data point of the type blue square. So the size of the circle here, or in other words the number of data points k that we look at, really matters.
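The nearest-neighbour vote described above can be sketched as follows; the points, labels and query are made up for illustration, but they reproduce the situation in the talk where the answer flips as k grows:

```python
from collections import Counter

def knn_classify(points, labels, query, k):
    """Label a new point by majority vote among its k nearest neighbours."""
    by_distance = sorted(range(len(points)),
                         key=lambda i: (points[i][0] - query[0]) ** 2
                                     + (points[i][1] - query[1]) ** 2)
    votes = [labels[i] for i in by_distance[:k]]
    return Counter(votes).most_common(1)[0][0]

# two red triangles and three blue squares around a new point at (5, 5)
points = [(5, 4.6), (5.7, 5), (5, 5.8), (4.2, 5), (5, 6)]
labels = ["red triangle", "red triangle",
          "blue square", "blue square", "blue square"]

print(knn_classify(points, labels, (5, 5), k=1))   # red triangle
print(knn_classify(points, labels, (5, 5), k=3))   # still red triangle (2 to 1)
print(knn_classify(points, labels, (5, 5), k=5))   # now blue square (3 to 2)
```

The same data, the same query, and three different answers depending only on k: that is the design decision at stake.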
So when we're using nearest neighbor, that's the sort of decision that we want to look at, right? How close or how broad should these circles be? What size of k, in other words, are we going to use? Each produces different results for new data points. And, you know, it really depends on how these things are organized.
With both k-means and k-nearest neighbors, these data points are plotted on the graph according to their features. So we need to pick out which features are relevant to plot them on the graph. And in fact, we might even decide to use a bunch of different features and different graphs and then average them out, or something like that, or maybe have them compete against each other, whatever; there are different ways of looking at that. All of these are design decisions. It's almost like an art, right? Because you try these different designs and then see if the results are really what you think they ought to be. You know, if this diagram here classified my green dot as a yellow circle, I'd really wonder about that. I'd say, no, that's not really the result I'm looking for. So, in all of these cases, the machine learning, or the neural network involved in the deep learning, is going to have to do some learning; it's going to have to reorganize the relations between, or the connections between, the data points. And I'm using the term data points a little bit loosely here.
By a data point, what I mean basically is the value of, in this case, one neuron in the network, or it might be one piece of input; it's a little bit loose here. But what we're doing in these learning algorithms is basically reorganizing the weight of the connections, or perhaps the bias of the activation function, or maybe messing with the activation function itself, etc. These are ways in which the machine learning algorithm or the neural network can reshape itself in response to feedback, either internally or from the environment. So there's a bunch of different learning algorithms. And just as an aside, when people ask, you know, what is a learning theory: these are the things that I think of as learning theories. Educators talk about learning theories a lot, and they'll talk about, well, social constructivism or something like that.
But those aren't how things learn, whether machines or otherwise, right? Social constructivism isn't a description of how a person learns. It's a description of an entire educational and pedagogical approach that hopefully gets people to learn. But I think about a learning theory, or, as I've described here, a learning algorithm, as the story we have that explains exactly what's going on as the connections between individual neurons, or the associations between different data points, are shaped and reshaped by, well, whatever the learning theory says reshapes them. So let's look at a few of these. This is the oldest and in many ways the best: it's called Hebbian learning. People have heard me talking about Hebbian learning for decades.
It's often summarized as: cells that fire together, wire together. So suppose we have a neural network, here it is, and we present some phenomenon to the neural network that results in a certain number of cells firing. We might think of this as a layer, maybe, or maybe a hidden layer, whatever. Okay. So these neurons are all connected to each other. But what is the strength of these connections? Well, what we can say is, if two cells fire together, then we create or strengthen the connection between them. If two cells fire together here, these two cells fire together, you see, we're going to draw lines, basically connecting the red dots and strengthening those connections. If a different set of neurons fire, then we'll draw the connections differently, and so here we'd be connecting the green dots. That's all it means; that's what it is, basically, right? So things that fire together wire together; they become associated with each other. And that's pretty neat, because if you draw all these connections, then whenever these green dots fire, suppose maybe two-thirds of them fire,
well, we're probably going to, because the connections are so strong, induce the firing of the remaining dots. That's called a partial activation. And so we can create an experience, as though it were the whole set of green dots, by a partial activation of some of the green dots. Hebbian learning, you know, was anticipated as far back as David Hume.
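The fire-together-wire-together rule can be sketched as a simple weight update; the three-unit network and the learning rate are illustrative assumptions:

```python
def hebbian_step(weights, activations, rate=0.1):
    """Cells that fire together wire together: strengthen the connection
    between every pair of units that are active at the same time."""
    n = len(activations)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * activations[i] * activations[j]

# three units; units 0 and 1 repeatedly fire together, unit 2 stays silent
w = [[0.0] * 3 for _ in range(3)]
for _ in range(10):
    hebbian_step(w, [1, 1, 0])

# the 0-1 connection has been strengthened; connections to unit 2 have not
print(w[0][1], w[0][2])
```

Notice there is no supervision anywhere: the weights grow purely from co-activation, which is what makes partial activation later able to re-evoke the whole pattern.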
You know, when things happen together, we tend to associate them together; that is, Hume said, how we develop concepts like cause and effect, or necessary connection, or personal identity. The second one we've talked about already, in one of the previous presentations: it's called back propagation.
So in back propagation, as we discussed before in quite a bit more detail, errors are measured, by a human presumably, typically, and then a correction is sent back through the network so that each layer assumes part of the responsibility for the error in the output. The mechanism by which that happens is called a descent function, something like that, and basically we're just looking at the slope of the difference between the result that we wanted and the result that we got. That's all I'll say about that; I talked about it earlier. Group Method of Data Handling isn't one thing, it's a bunch of things, so it's kind of hard to get a handle on it, but basically what we're trying to do is lighten the load for neural network engines.
There are all kinds of ways of doing that. I'm not going to get into the details, because I'd spend the rest of the talk trying to get into the details. But here's an example of what GMDH might do for you. Here's the complete graph that we have at the start of GMDH processing. It looks for those connections that are actually relevant to whatever it is we're trying to produce. For example, we're really interested in this particular data point as an output; forget the other stuff. So that means we're only going to be interested in the connections that lead to that data point, which really produces a kind of subset of the overall neural network for us. Why would we do that? Well, processing the optimum graph, as it's called, is going to be a lot faster and easier than processing the whole complete graph. As well, we might be able to emphasize some types of data in the optimum graph that wouldn't necessarily be emphasized if we were doing calculations on the whole complete graph.
So there's a bunch, like I said, there's a bunch of different methods that can be used in GMDH; if you're interested, follow the links. But, you know, again, for our purposes, these algorithms bring in a whole set of factors that allow us to tweak how we look at the learning process in one of these networks. And each one of those, and there's no time for a discussion of them, but each one of those may have ethical implications. Maybe not.
Competitive learning is interesting. What happens here? This is, I mean, a type of unsupervised learning, although it can also be used in supervised learning. Basically, we have our nodes in layers, just like before, but in a given layer the nodes are connected to each other, and the firing of one layer might prompt or inhibit the firing of another. Sorry: the firing of one neuron, one unit, in this layer might prompt or amplify or inhibit the firing of another one in the same layer. So in a sense, the individual units in a layer are competing with each other for the right to be, basically, the neuron that eventually fires in that layer. And the usefulness of this is that over time, and possibly with training, each neuron will tend to focus on a characteristic set of input. So they'll be what we call feature detectors. So if x1 and x2 fire and xn does not, that might prompt unit one to fire; unit two, meanwhile, might gain the upper hand if x2 and xn fire and x1 does not; and similarly with m down here.
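A minimal winner-take-all sketch of this idea, assuming a standard competitive learning rule where only the winning unit's weights move toward the input (the exact rule isn't specified in the talk):

```python
def winner_take_all(inputs, weights, rate=0.5):
    """Each unit computes a response to the input; only the strongest unit
    (the 'winner') fires, and only the winner's weights move toward the
    input, so over time each unit specialises: a feature detector."""
    responses = [sum(w * x for w, x in zip(unit, inputs)) for unit in weights]
    winner = responses.index(max(responses))
    weights[winner] = [w + rate * (x - w)
                       for w, x in zip(weights[winner], inputs)]
    return winner

# two units with slightly different starting preferences,
# trained on two kinds of input pattern
weights = [[0.9, 0.1], [0.1, 0.9]]
for _ in range(5):
    winner_take_all([1, 0], weights)   # pattern A
    winner_take_all([0, 1], weights)   # pattern B
# unit 0 now responds to pattern A, unit 1 to pattern B
```

No one labels the patterns; the specialisation emerges entirely from the competition within the layer.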
So competitive learning is a way the machine learns how to handle the input data on its own, simply by feeding back information from the same layer that's doing the processing. Pretty neat idea. Neuroevolution isn't technically a part of deep learning or neural networks, but basically the idea here is to use what are called evolutionary algorithms in order to generate neural networks: neural network parameters, topology and rules.
So basically the genetic algorithm will respond to perhaps environmental conditions or other variables out there in the world, or, for example, k-fold cross-validation, etc. And what this will do is it will actually change the design, what we're calling here a meta-model, of the neural network that we're going to use to apply to a particular problem.
So it's kind of like, well, it's inspired by evolution, right? That's what the name suggests. Similarly, humans went through centuries, millennia, of evolution, and over that time human neural networks evolved; not the individual neural networks, but the overall design of human neural networks. You know, we've got the visual cortex in the back, and the hippocampus, and all of the different brain components, and the way neurons in general link to each other, and the two hemispheres of the brain connected by a corpus callosum; all of that was shaped by evolution. And so similarly, perhaps the actual design, the topology, of a neural network could also be shaped by evolutionary factors.
So there's a variety of approaches and a lot of work done on evolutionary algorithms; I've listed a whole bunch of resources here. Now, the sorts of decisions that we'll be looking at here are, you know: what counts as an evolutionary factor? How does an evolutionary factor interact with the design of the neural network? You know, there's a validation and testing process that happens. What is a good test for a meta-model, and what is an irrelevant test for a meta-model? Questions like that come up, and the answers to those depend partially on what you're trying to do, and partially on what you think evolution is. Is it really just a random process, or is it a process directed in some way toward a goal, like a better human, a smarter human? There are different approaches to evolution; think, for example, of the difference between Darwinian and Lamarckian evolution, right? Are we evolving physical features only, or rather contents, like, say, archetypes, ideas, that also pass down from generation to generation? Really interesting questions here. The Boltzmann machine is my personal favorite, just because I love the elegance of it. And the idea here is that you have a neural network, kind of an interesting way to design one here.
But, you know, again, they can have different topologies, and what you're trying to do is, for a given set of activations, so for the values of the units here being one or zero, you want to set up your connection weights so that the network achieves the lowest possible energy. So what'll happen is, and it's nicely described in a Geoffrey Hinton video, which I link to a little bit later, you keep adjusting your connection weights, and then you shake it up a bit, and adjust it again, and shake it up a bit and adjust it again, until you settle into the lowest possible state. I like to think of it by analogy as what happens, for example, when you throw a stone into a pond: that messes up the water, and the water will ripple for a while, but eventually it'll settle down into its stable state. Now, what's interesting about this is that how that happens depends a lot on the nature of the stone and the nature of the water, and we can think of different ways of creating these so that you get different results; you know, throwing a stone into water in zero gravity probably wouldn't work at all.
So, you know, the amount of gravity that we have also matters. But also, we can come up with different definitions as to what we think is a stable state. I'll be talking about attractors later on, and you'll get a visual representation of that. So those are some of the learning theories. It's not all of the learning theories, and it's not even necessarily all of the most important learning theories, but the idea to get across here is that there are different kinds of learning theories. In my talks in the past, I've basically focused on Hebbian learning, contiguity (which I didn't really mention here, but it's similar to competitive pooling), back propagation, and Boltzmann mechanisms. And those have been the four that I've emphasized over the years as the way we learn as humans, the kinds of mechanisms that we use to learn. You know, it doesn't translate nicely and directly into learning or pedagogical practice, nor does it translate nicely into ethical and non-ethical behaviors.
However, the sorts of decisions we make, given these different kinds of approaches to learning that we might invoke in this or that circumstance, may have pedagogical or ethical implications. Another set of decisions concerns the topology, or the organization, of the network. And there are two major kinds of topologies: a physical topology, which is the actual physical organization of the network itself, and then a logical topology, which shows how data flows within a network. For our purposes, we can just think of them as one and the same, but obviously there might be logical topologies within larger physical topologies; there might be multiple logical topologies within a single physical topology. Think, for example, of the way mail flows through a company. The physical topology is all of the people connected to each other in the company.
So it might look like this fully connected topology, or maybe a star topology, or a tree topology if it's hierarchical. But the information won't flow equally to everyone. An email that comes in to the company might flow up and then stop; an email that comes into a company might flow to three or four people and then stop. So the logical topology can be different from the physical topology. Topologies matter a lot. You know, the way you construct your network really influences how your network behaves; a fully connected network is going to behave very differently from a line.
I'll give you a simple example. Suppose you have a virus that has a 50% chance of being passed on from one person to the next. Well, if we have a line, this person has the virus; there's a 50% chance of it being passed on to the next person, which means only a 25% chance of it being passed on to the person after that, and so on. The probability gets lower and lower and lower, so it's almost impossible that the last person is going to catch the virus. Meanwhile, look at our fully connected network. If this person has a virus with a 50% chance of being passed along, well, look: one, two, three.
There are five individual connections, so that basically means that the odds of this being passed along are really good. You don't just add the five 50% chances; that would give you more than one, and that can't be it. Rather, the chance that at least one neighbour catches it is one minus the chance that all five escape, which is one minus one-half to the fifth power: about 97%, virtually certain. And even if it passes to only one person, now this person and the original both have the virus, and they each have a 50% chance of passing it along to everyone they're connected to; so, anyhow, everyone's going to get it, right? So that's really different. The ring? Well, the virus can go in one of two directions. With the mesh, it could go, in this case, depending on who gets it, in one of three. So, you know, basically the topology influences the propagation of a signal through the network, which makes it important.
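The arithmetic behind the line-versus-fully-connected contrast can be checked directly; the five-link line and five-contact hub are the numbers from the example:

```python
def line_reach(n_links, p=0.5):
    """Chance the virus travels all the way down a line of n links,
    each transmission being an independent 50/50 event."""
    return p ** n_links

def at_least_one(n_contacts, p=0.5):
    """Chance at least one direct contact catches it in a fully connected
    network: one minus the chance that every single contact escapes."""
    return 1 - (1 - p) ** n_contacts

print(line_reach(5))     # 0.03125: about a 3% chance for the last person
print(at_least_one(5))   # 0.96875: virtually certain someone catches it
```

Same transmission probability, wildly different outcomes: the difference is entirely in the topology.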
There are a lot of topologies. I'm going to look at a few networks that use some of these topologies; I'm not going to cover all of them, because that would be crazy. But remember, here's the perceptron that we talked about way back at the beginning. Here's an ordinary feed forward network with one layer. Here's a deep network with multiple layers. And there are other types. Here's a Boltzmann machine, which I talked about, which is a fully connected network in this case, but which is using a specific algorithm to weight the connections. So here's the feedforward neural network. This would be the first of the topologies that I'm going to look at.
I'll look at a few, and as we saw, it's a network where the data flows into the network and then out the other end. The perceptron and the multi-layer perceptron are examples of feed forward neural networks, and this internal layer, and there may be one or more than one, will do the processing for us.
Something called a radial basis network is organized differently, by means of changing the activation function. Notice how this activation function doesn't look like the ones that we talked about in the previous presentation. Previously, it was either a step, a single jump, or, later on, we talked about the sigmoid curve. This kind of activation function has a peak at the middle and a decline on each side. And what that means is that we're dividing the possible states of affairs here not into two, but into three, right? A zero if it's too low, a one if it's in that middle range, and a zero if it's too high. This allows us to organize data not just using a line, a single line, whether it's straight or curved, but by using non-linear classifiers. For example, here we have a case where this x was too low, this x was too high, but these o's were perfect. Similarly, here's the center, corresponding to the top of our activation function, and these are all the data points around it. So by changing the activation function, we can change how we do classification in our data.
That's really important. Let's think about that for a minute. There's an infinite number of ways we could define that activation function, and every one of them is going to treat the data differently. Now, there's no obvious ethical implication to treating the data one way as opposed to another way, although, you know, if you view the world as nothing but sigmoid functions, it's either this or that, then you can't do all of logic. And if you can't do all of logic, that seems to be a weakness; it might be a problem with the system that you're using.
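To make the contrast concrete, here's a sketch comparing a sigmoid with a bell-shaped (radial basis) activation; the Gaussian form and the parameter values are illustrative assumptions, not taken from the slide:

```python
import math

def sigmoid(x):
    # the S-shaped curve from the earlier presentation:
    # low on one side of the line, high on the other (two regions)
    return 1 / (1 + math.exp(-x))

def radial_basis(x, center=0.0, width=1.0):
    # a bell shape: near zero when x is too low OR too high,
    # near one only in the middle, around the center (three regions)
    return math.exp(-((x - center) / width) ** 2)

# the sigmoid divides the line into two regions; the radial basis into three
for x in (-3, 0, 3):
    print(round(sigmoid(x), 3), round(radial_basis(x), 3))
```

The radial basis fires only near its center, which is what lets it wrap a non-linear boundary around a cluster instead of drawing a single dividing line.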
And that's one of the things that led to the development of radial basis networks: if you want to do a logical operation like exclusive-or, for example, then you need this kind of network, and not a simple perceptron. This is a convolutional neural network, and it's a fascinating thing.
So suppose we're processing an image, represented by all the green squares here on the left-hand side, and what we want to do with this image is identify different features in the image. If you remember, from the presentation we did near the beginning of this module, we were recognizing different numbers, remember that? And we were recognizing numbers first by picking up the strokes or the circles, so detecting the features, and then combining the features so that there were parts of numbers, and then identifying the number. Well, this is how you do that. Basically, you work your way through the input data, looking at it subset by subset.
Basically, what you're doing here is you're taking what we call a kernel filter and applying it over and over and over to different sections of the input data. So this filter is looking for an x; can you see the x there in that diagram, right? So what you do is you apply it to the top three-by-three part of this matrix, and then you multiply the filter by the values in the matrix; that gives you a result, in this case four, and then you map out all of the results. And that way, you're doing feature detection for this particular filter on this particular data. Now, you could use different filters. So take your image data, make a couple dozen copies of it, and then apply different filters to each one of those, creating one of these individual grids. And in fact, you probably want to pool each grid down to one single number, and then put that into a further neural network layer, and then feed that forward; you've got a really reliable feature detector, and that's what these things are used for. So at first glance, the definition of the kernel filter seems completely arbitrary, and it is, right? It's just ones and zeros here.
One zero one, zero one zero, one zero one; that's the feature that we're looking for. But the question is, what do these ones and zeros stand for? What are the ones and zeros in this input data? You know, in a typical image, it might be the three primary colors, right? So we'd actually have three grids, each one representing one of the primary colors. So what are they? Red, cyan and yellow, or something like that; I can never remember. Once they introduced cyan into it, I was lost. I'm old school, right? Red, yellow and blue, those were my primary colors. Anyhow.
I also think there were only six colors in a rainbow, so there, that was one for Andrea, who believes that there are seven. Anyhow, what do those ones and zeros stand for? What are you detecting with those? Are you detecting simply dots in a sensory apparatus like a camera? What is your camera looking at? Where is it pointed? What is the focus? Is it large? Is it narrow? Is it a micro camera? Is it testing for infrared and ultraviolet as well? Maybe it's got a filter on the camera so it's not doing heat detection. Maybe, and this is an interesting application:
What's feeding into this matrix here is data that you've already processed, for example by computing what's called the gradient. So what you're doing there is you're looking for cases where you've got a high value and a low value next to each other; then you have a high gradient, as compared to cases where the two values are close together, where you have a low gradient. That allows you to detect edges. So this might be a gradient matrix that now we're analyzing with this filter in order to find features. So that's edge detection used to do feature recognition. That's very similar to how the human visual cortex actually works as well, where edge detection plays a major role in how we recognize objects in the environment.
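The sliding-filter idea can be sketched as follows; the vertical-edge kernel and the toy image are illustrative assumptions, not the x-shaped filter from the slide:

```python
def convolve(image, kernel):
    """Slide a kernel over an image and record the filter response at each
    position; high responses mark where the feature is present."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            # multiply the kernel by the patch under it and sum the result
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# a gradient-style kernel that responds to bright-to-dark transitions
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]

# an image that is bright on the left half and dark on the right half
image = [[9, 9, 9, 0, 0, 0]] * 5
response = convolve(image, edge_kernel)
# the response peaks over the columns where the brightness changes
```

Swapping in a different kernel (an x detector, a horizontal edge, a corner) is exactly the design decision being discussed: the same mechanism, a different feature.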
So take this, combine it with a time sequence detector so that we can persist data over time, and now we're well on the way to picking out objects in the world, aren't we? So, recurrent neural networks use the output from one neuron as the input for another. This is the basis for the competitive pooling networks that I talked about earlier.
This is the basis for Boltzmann machines as well, where neurons feed back into each other. So a fully recurrent neural network will connect all of the neurons to each other, but usually we have simple recurrent neural networks, where we just take one layer of the neural network and feed it back into itself.
This allows us to think about things like, well, long short-term memory. Let's see what the cognitive psychologists do with that, right? They love their short-term memory, and cognitive load, and all of that. So basically, this is the time sequence detection that I talked about just a few moments ago, where the network can process data that comes in sequentially and keep its hidden state over time.
So, you know, you see we have x coming in at time t, and coming out of here is h sub t, and then that will feed back in as h sub t minus one for the new h sub t, and so on. So, as you can see, the network is feeding back into itself here, so that it can keep this hidden state through time.
And so this is useful for things like detecting handwriting, detecting speech, finding anomalies, where one thing depends on the next thing, which depends on the next thing, which depends on the next thing, and you wouldn't be able to detect those unless you had something like an LSTM in order to do that.
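As a sketch of that feedback loop (the weights here are invented for illustration, and a real LSTM adds gating machinery on top of this bare recurrence), a simple recurrent step in Python looks like:

```python
import math

# Illustrative recurrent step: the new hidden state h_t is a function of the
# current input x_t and the previous hidden state h_{t-1}, squashed by tanh.
def rnn_step(x_t, h_prev, w_x=0.8, w_h=0.5):
    return math.tanh(w_x * x_t + w_h * h_prev)

def run(sequence):
    h = 0.0                  # initial hidden state
    for x in sequence:
        h = rnn_step(x, h)   # h_t feeds back in as the next h_{t-1}
    return h
```

Because the hidden state persists, the final value depends on where in the sequence things happened, which is what makes this style of network useful for handwriting, speech, and anomaly detection.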
So, this tanh, and this, I'm not sure what that symbol is; it's not a sigma, it's not an omega, it's something; please don't say it's an omicron. These are activation functions again. So, tanh: we had sigmoid, we had the bell curve kind of thing,
and now there's tanh, which is another kind of activation function. Hopfield networks, sometimes talked about under the heading of spin glass, have been around since forever, well, since 1982, and basically combine the idea of LSTMs with Boltzmann machines. This is the one that's described by Geoffrey Hinton
in this really nice video; I do recommend that you watch it. And the idea here is that memories could be, he suggests, energy minima of a neural net. So what's happening here is that the Hopfield net is storing a pattern, and then it uses that pattern in order to recall a full pattern based on partial input, the way I talked about before.
So what you're doing is trying to achieve a state of minimal energy. Here you have these different values, ones and zeros, but now what you do is work through these one at a time: you just pick one at random and ask yourself, what would the value be, given everything else?
So here's a one here; let's turn this into a question mark and ask, well, what would the value be? Well, it's, whoops, one times three plus two times one, which is five. Five is above zero, so we'll give that a one. Let's move over here to this zero and make that a question mark.
What would the value be? Well, one times minus four is minus four, plus one times three is three, so that's minus one, plus three times zero is zero, so it's still minus one. That's below zero, so we'll give it a zero, and so on. And basically the idea is that you get this stable state when you get this number,
well, the way I've been doing it, as high as possible, because this number corresponds to the negative energy. So the higher this number is, the lower the energy state is in this neural network. And so that's how it stores its memory: we have the weights of these connections, which are basically telling us what this memory should be.
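Here's a minimal sketch of that update rule in Python. The weights are toy values of my own, not the ones on the slide, but the procedure is the same: pick a unit, sum the weighted states of its neighbours, and set it to one if the total is above zero.

```python
# Illustrative Hopfield-style update with binary (0/1) units. Symmetric
# weights, zero diagonal. Each asynchronous update can only keep the energy
# the same or lower it, so the net settles into a stored pattern.

def update(state, weights, i):
    total = sum(weights[i][j] * state[j] for j in range(len(state)) if j != i)
    state[i] = 1 if total > 0 else 0

def energy(state, weights):
    # Negative of the "goodness" number in the lecture: lower is more stable.
    n = len(state)
    return -sum(weights[i][j] * state[i] * state[j]
                for i in range(n) for j in range(i + 1, n))

weights = [[0, 3, -4],
           [3, 0, 2],
           [-4, 2, 0]]

state = [1, 0, 1]
e_before = energy(state, weights)
for i in [0, 1, 2, 0, 1, 2]:   # sweep the units a couple of times
    update(state, weights, i)
e_after = energy(state, weights)
# e_after <= e_before: the network has moved toward a minimum.
```

With these particular toy weights the net settles into the pattern [0, 1, 1], which is its lowest-energy memory.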
This memory is the lowest energy state. Now, say I just drew a little parabola, or an upside-down parabola; the lowest energy would be the minimum, right? Of course, that's in a nice simple world. In real life, you might get ups and downs; you might even have a three-dimensional field which has different minimal points
that define what your lowest energy state is. And it even depends on how you're defining energy, how you're setting these weights, how you're writing these activation functions; all of those impact how you think of your lowest energy state. Still a neat idea. And now, people don't really use these anymore, but the concept hasn't gone away.
And it does give us a step toward what might be called an attractor network. Remember, I talked about different ways of finding these lowest energy states. Well, let's define those as attractors, and, just as a brief aside, all of chaos theory, throw that into the mix. So now these are my attractors.
Instead of, you know, just a single lowest point defining my most stable state, here I'm defining lowest points on this grid, or this two-dimensional or three-dimensional matrix. So there are different minima that I can be working towards. Here's a couple: here's one, here's one, here's one.
So as I calculate the different values for these neurons, I'm looking for the values that will lead us to this lowest possible state. So I'm going to do some calculations and work out where all the values are plotted on this graph.
And what I'm going to do is actually trace a path through the graph, whoops, as I move toward that lowest energy value, and so that's how my attractor network works. And that's also how the process of this attractor network reaching that lowest possible state can be described, and you have these paths that actually look
like, you know, the attractors that you see in chaos theory, because they don't go straight to the lowest point; they usually sort of circle it and then zero in on it. So there are different ways of defining these. These may represent real-world conditions; they may represent, you know, changing environments over time.
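That settling-in behaviour can be sketched with a toy energy surface, a simple quadratic bowl of my own choosing rather than anything from the lecture: repeatedly stepping against the gradient traces a path that homes in on the attractor.

```python
# Illustrative attractor dynamics: define an "energy" surface, then step
# downhill against its gradient, recording the path as it approaches the
# minimum (the attractor).

def grad(x, y):
    # Energy E(x, y) = (x - 2)**2 + (y + 1)**2, with its minimum at (2, -1).
    return 2 * (x - 2), 2 * (y + 1)

def descend(x, y, rate=0.1, steps=100):
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - rate * gx, y - rate * gy
        path.append((x, y))
    return path

path = descend(5.0, 3.0)
# The path ends very close to the minimum at (2, -1).
```

On a surface with several minima, different starting points would be drawn to different attractors, which is the picture being described above.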
There's any number of things that these could represent. In this particular example here, we're looking at cell sequencing, cell proliferation, cell death. I know nothing about cell sequencing and proliferation and death, frankly, but that's the sort of thing. This is the p53 regulatory network, and we're using these networks to determine whether there's going to be DNA damage, reparable DNA damage, or irreparable DNA damage in
this particular regulatory network. Fascinating stuff, but you can see how the real-world application of this feeds directly back into how we're designing these networks, and how we're designing the internal properties of these networks, these incredibly complex internal properties. So this actually may represent an actual set of physical connections between, say, cells or DNA strands or whatever in a human body.
And now we're predicting, as the result of some external influence, the state that they find themselves in. So, as I mentioned before, deep learning is taking all of these things and actually running them through multiple layers. So we talked about edge detection and feature detection a little bit earlier; that's how this process works, right?
So we have, perhaps, in our input layer, a bunch of light and dark pixels. Now our next layer might identify the edges; so, you know, we're computing gradients or something like that. Then in the middle we're going to find combinations of edges, which tells us we've found an eye, or a nose, or, I don't know what that is, and that lets us build up bigger features: brow, nose, left
and right. I still don't know what those are; what could those be? I have no idea. And then that eventually allows us to recognize, oh, it's the top of the head, and it's the chin, okay? And then combining those allows us to pick, from a set of possible outcomes, what we think this input is.
And in this case, the input is George Washington; the output here is "George". This is really important, because in a certain very significant respect, you can't recognize George unless you already have some idea of what George looks like. So, you know, recognition isn't something that just happens automatically; it isn't inherent in this data.
This data all by itself isn't going to tell you that it's George. You need to recognize it as George in some way. We can say that this input corresponds to this output, and that's all just ones and zeros, right? This set of ones and zeros corresponds to this set of ones and zeros.
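As an illustration of how stacked layers map one set of ones and zeros to another, here's a hedged sketch: the weights, the "pixel" inputs, and the label names are all invented, and a real recognizer would be trained rather than hand-wired. The point is that the label "George" is supplied by us; the network only produces numbers.

```python
import math

# Illustrative stacked layers: each layer is a weight matrix followed by a
# sigmoid activation; the final layer's strongest output selects a label.
def layer(inputs, weights):
    return [1 / (1 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in weights]

def forward(pixels, layers, labels):
    h = pixels
    for weights in layers:
        h = layer(h, weights)
    # The label names live outside the network, in this list we supply.
    return labels[max(range(len(h)), key=h.__getitem__)]

# Hypothetical tiny network: 4 "pixels" -> 3 edge detectors -> 2 classes.
layers = [
    [[5, -5, 0, 0], [0, 5, -5, 0], [0, 0, 5, -5]],  # "edge" layer
    [[4, -4, 4], [-4, 4, -4]],                      # "feature" layer
]
labels = ["George", "not George"]
```

Calling `forward([1, 0, 1, 0], layers, labels)` returns "George" here only because we attached that word to that output; the network itself just maps ones and zeros to ones and zeros.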
But from our perspective, we need to be able to say that this set of ones and zeros actually is George. So that's an important limitation on how we can interpret the reasoning that's conducted by artificial intelligence and neural networks. All right, one more thing and we're done: frameworks. Now, part of the reason I was slow with this particular presentation is that I thought I should look at the current state of some of these frameworks.
Well, there's no such thing as just looking at the current state of some of these frameworks. First of all, look at them all: you know, we have Google's machine learning kit, open neural network toolkits, automated machine learning, etc. I messed around with TensorFlow the other day, and I also messed around some with scikit-learn.
Most of these are written in Python; some of them are written in C. None of them are written in languages I'm happy and comfortable with, which is a real problem for me, and it's a real problem for people in the world at large, right? You know, if you're doing machine learning, really, you need to be comfortable with Python, really comfortable with Python.
And this is on the edge. Similarly with some other languages, it really matters a lot which version you're using. For example, I installed the most recent distribution of Python on my computer in order to work with TensorFlow, not realizing that, well, I had installed 3.10, the latest version, while TensorFlow works between,
I forget what it was exactly, but something like 3.5 and 3.7. So I couldn't run TensorFlow on my version of Python. Also, a lot of these machine learning and neural network applications will load whole libraries, for example matplotlib, which is a library that actually creates those graphs for you. Or there's, I forget what it's called; it's basically numbers for Python; no, it's not scikit, but anyway, whatever it is, and others.
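That kind of version mismatch is easy to check for up front. Here's a small sketch; the supported range is illustrative, roughly matching what's recalled above, not TensorFlow's actual requirement at any given moment:

```python
import sys

# Illustrative guard: check the interpreter version before trying to use a
# framework that only supports a given range. The bounds here (3.5 up to,
# but not including, 3.8) are made up for illustration.
SUPPORTED = ((3, 5), (3, 8))

def version_supported(version=None, supported=SUPPORTED):
    if version is None:
        version = sys.version_info
    lo, hi = supported
    return lo <= tuple(version[:2]) < hi
```

A check like this fails with a clear answer instead of a cryptic installation error, which is about the best you can hope for in a fast-moving ecosystem.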
There's also a specific whole framework for machine learning calculations in Python; it's SciPy or something like that, I'd have to look it up. And this all goes to show that, you know, developing and using these machine learning or neural network or deep learning applications,
the learning theories, the patterns, the design, the activation functions, the topographies: this isn't something that you and I and pretty much everyone watching this video are doing. It's a very specialized discipline, and that inherently carries with it risks. Any discipline that's really specialized will carry with it inherent risks. You know, the club of people who are expert in nuclear systems is very small.
They might not be looking at the long-term effects, or even outside their own field. It takes a lifetime to master all this. You don't have time to study all of the history of philosophy, or pedagogical implications and theories from the 1950s on. It's just not going to be part of your education.
And so, consequently, this is why we say things like: we need diverse teams working on these applications, because it takes so much to learn these things. You've seen some of the detail, right? The people who learn how to make these AI engines run aren't going to be the people who have a deep and informed understanding of, say, the ethics of care, or contract theory, or pedagogy, social constructivism, connectivism, and the rest. They're going to be different groups of people with different bodies of knowledge.
That's the nature of knowledge. I mean, a connectivist system is like that too. You know, any neural network is like that, where each neuron in the network has its own thing that it's doing, and only the network as a whole does things like recognize George Washington.
There's no individual neuron that's the George Washington recognizer, you know, just like there's no piece in my head that is the knowledge that Paris is the capital of France. It's all distributed in the connections in my head. So that's one thing. The other thing is that this discipline is still incredibly in flux.
And, you know, you don't necessarily get that sense reading about it in the newspapers, but you really do get that sense actually trying to work with some of these tools. A lot of it just simply doesn't work right now. Of course they all do work, but you can't just install them on your computer like Microsoft Word and have them take off, generally.
And as a solid piece of advice here, I would say: use something like a cloud container, like a Docker container, where it's all pre-installed and ready to run. Now, there's, you know, a gazillion decisions that have already been made in designing that particular container with that particular framework to run those particular algorithms.
A whole lot of work's been done. And so, in a sense, if you use one of these containers, you're viewing the world from the same perspective as the designer of those containers. Nonetheless, it's going to take you years to learn how to design that perspective for yourself. So this is a case where, you know, when people talk about having a theoretical lens to look at the world, that actually makes sense, but it's not theory in the sense of constructivism or whatever, right?
It's theory in the sense of: here is a neural network tool or framework, contained in the container, that I will use to run this data through. So you can either install it as a Docker container, or, what's also becoming popular, these are just made available as services by large companies.
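For example (a sketch only; `jupyter/tensorflow-notebook` is one of the community-maintained Jupyter Docker Stacks images, and the port mapping is just the conventional default):

```shell
# Run a pre-built notebook container with Python, TensorFlow and the usual
# scientific libraries already installed, serving Jupyter on port 8888.
docker run -p 8888:8888 jupyter/tensorflow-notebook
```

Everything inside that container, the versions, the libraries, the defaults, reflects decisions its designers have already made for you.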
For example, at a talk I did a while ago, I demonstrated one of these AI recognizers by Microsoft: I fed it an image and it sent me back a caption, right? So I just had a little bit of JavaScript on my side, and it had this great big AI with, you know, high processing power on its side.
So there's a cost to that, obviously, although in this case it was a demo, so I was able to do it for free. But if I tried to do that in production, now I'm going to incur a cost. And if I run it on my own machine rather than theirs, there's still going to be a cost.
But then it's a trade-off, right? Now I'm making economic trade-offs, and not just processing or algorithm decision-making trade-offs. You look at any of these and you will immediately see what I mean. You'll see that this is really hard to do, almost impossible for an individual person to do from scratch,
although you might be able to support it as an organizational capacity in a larger organization. And it's going to impose a way of looking at the world, but not a simple kind of lens like, you know, critical race theory or something like that. There's no such thing as a Docker container, containing machine learning algorithms, that is critical race theory informed.
I mean, it really is comparing apples to oranges, you know; they're just completely different kinds of things. But there is a theoretical perspective. Look at this image here, right? There's a theoretical perspective here. Look what's important: the big orange thing, the little red things, and these tall green things, right, and this purple thing.
So we know that the objective here is to keep the blue thing on the purple thing and to avoid the orange, red and green things. But that's not all there is to driving, right? And there are whole parts of the world not covered by orange, red and green things.
And that's the kind of perspective that's brought to bear here. You know, that's the, I don't want to say bias, because bias is the wrong word; it doesn't even make sense to say bias anymore in this sort of context. So that's the tools and algorithms presentation. It took me a bit to get this one out, but I'm glad I did. You know, I want to stop for a year and go back and keep redoing this presentation until I'm happy with it, but I can't; I've got to move on.
We've got to move on; the world's not going to wait for us. So I'm going to move on. In the next presentation we need to talk about models and interpretations, and I'm going to throw a whole bunch of other stuff into the mix. Like I said during the summary for this module, we're not taking into account five, ten, twenty factors
when we're making an ethical decision, using artificial intelligence or our own intelligence. We're taking into account thousands, tens of thousands, sixty thousand features. You look at this and you can see how that could be the case, and that's what I'm trying to show in this presentation. So that's it. I'm Stephen Downes.
Talk to you next time.