Unedited transcript from Google Recorder
Hi, everyone. Welcome to Ethics, Analytics and the Duty of Care. We're in module seven, The Decisions We Make, and this is the first of four videos in which I'm going to be looking at data. I'm splitting it into four videos so that we don't have one big, huge monster video, so I hope you appreciate that.
It does mean that we have four shorter videos, which would add up to a big, huge monster video, but I've nonetheless decided that I'd like to break things up a little bit. So basically, what we're going to look at is how to source data, where sources of data can be found, and the decisions we make there; then we'll look at things like classifying data, managing data, and working with data.
So, without further ado, let's begin. First of all, I guess it's kind of obvious to most people what data is, but for the sake of completeness, let's state the definition to be clear about it. I've just taken the definition from the first result on Google when I type in "what is data", and we get it as a noun: "facts and statistics collected together for reference or analysis", or similarly, "the quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media". Data could be recorded in other kinds of media as well,
but those are the big three right now, in the year 2021. In philosophy, there's a concept of data as "things known or assumed as facts, making the basis of reasoning or calculation". You kind of get the idea, right? I mean, data is the input to our analytics or AI systems.
It's probably going to have to be digital in some way, although it could be a digitization of some external source, and it's going to be taken as a given. You know, we would think of it as the premise of an inference, or the raw material, or something like that.
Similar terms include facts, figures, statistics, details, particulars, specifics, features. We often think data needs to be numerical or mathematical, but we know that that's not true. It could be qualitative data as well as quantitative data. It could be descriptions, accounts, pictures, and so on. So there's a wide range of different types of data
and different sources of data, all of which become the raw material for our system. Since we're looking at analytics, we're not so worried about the philosophical sense of data, although it always lingers in the background; we're thinking specifically of data for computation. So here, from Jack Vaughn, we'll use the following accounts.
I won't say definitions, because that's too strong. But "data is information that has been translated into a form that is efficient for movement or processing." Now, we need to be a bit careful about that, because "information" is a technical term. Data can provide information, but not all data provides information.
Let me give you an example. Suppose I know that Andrea's shirt is red, okay? I already know this. So if you came to me and said "Andrea's shirt is red", there's no information there, because you haven't added to my stock of knowledge about the world; or, another way of putting it,
you haven't reduced the number of possible states of affairs the world could be in. But you have given me data, and I might add it to my previous assertion or whatever. Certainly, I'd need to go through some sort of data processing in order to understand what it was that you were saying, etc.
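The idea that information reduces the number of possible states of affairs can be made concrete with Shannon's measure of information. The lecture doesn't name Shannon, so treat this as an illustration under stated assumptions: the possible states are equally likely, and a message leaves some subset of them open. The function `information_bits` is a hypothetical helper.

```python
import math

def information_bits(states_before: int, states_after: int) -> float:
    """Bits gained when a message narrows equally likely possible states
    of the world from states_before down to states_after."""
    if not 0 < states_after <= states_before:
        raise ValueError("a message can only narrow the set of possible states")
    return math.log2(states_before / states_after)

# Assume the shirt could be one of four equally likely colours.
# I don't know which, so "Andrea's shirt is red" leaves one state open.
print(information_bits(4, 1))  # 2.0
# I already know it's red: one state before, one after, so no information.
print(information_bits(1, 1))  # 0.0
```

In both cases the same sentence, the same data, is received; only the second case carries zero information, which is the distinction being drawn here.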
So data is the presentation of the content, if you will, but it doesn't by that fact acquire semantic importance. It might be information or it might not; it might be knowledge or it might not; it might be wisdom or it might not.
It's just the presentation. Similarly, when Vaughn says "data is information converted into binary digital form", no, not information; I think a better word is content. I don't know what a better word would be, to be quite honest: measurements, perhaps, or readings, or readouts, or creative output.
There's a range of things it could be. If we look at the little diagram there, we get a sense of the sorts of things data could be: quantities, information, graphs, measurements, observations, facts, numbers, and I could go on, right? Videos, speeches, weather reports. There's a whole range of things that can be data, and that's part of the problem.
Almost anything can be data. This video is data, and within this video, one segment of it, say beginning now and ending now, is also data. Vaughn also uses the term "raw data" to describe data in its most basic digital format. But the only kind of format other than digital is analog, which is how things are out there in the world,
perhaps as markings on a piece of paper or something like that. I don't think there's a distinction to be drawn between more basic and less basic digital formats, but I get the sense of what he means there. He means data as a series of ones and zeros, because that's the most basic data format we can get.
Ultimately, all data becomes a series of ones and zeros, for now, right? In the future we might be working with quantum computers, and things won't be so simple anymore. But for now, it's all ones and zeros. So maybe we'll just think of data as the ones and zeros that are presented for analysis by an AI or analytics system.
If we put it like that, we're actually pulling away some of our presuppositions as to what data must be. It's just ones and zeros; that's what it is. We might call it a number, we might call it a graph, we might call it a fact, but to the computer it's just ones and zeros, and that's what we're feeding into the computer.
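To see that the same ones and zeros can be read as text or as a number, here is a small sketch. It assumes UTF-8 text encoding and big-endian byte order; `to_bits` is a hypothetical helper, not something from the lecture.

```python
def to_bits(text: str) -> str:
    """Show the UTF-8 bytes of `text` as groups of eight 0s and 1s."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

raw = "red".encode("utf-8")

print(to_bits("red"))              # 01110010 01100101 01100100
# The very same bytes, interpreted as one unsigned big-endian integer:
print(int.from_bytes(raw, "big"))  # 7497060
```

Whether those 24 bits are "a word" or "a number" is decided by how we choose to interpret them, not by the bits themselves.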
So we entered the age of big data maybe 10 or 20 years ago. Chris Anderson, writing in 2008 in Wired, well after its demise as an actually useful and cyberpunk publication, wrote about data and the end of theory, writing: "This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear.
Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology." Now that's pretty extreme, right? Basically what he's saying is it's all ones and zeros. But I think it's too extreme, because in addition to computers, we have to have humans somehow interacting with this system, and humans don't work well with ones and zeros.
I mean, we sort of do in some limited cases, but over time we've developed things like language and art and music and the rest to communicate ideas, feelings, thoughts, and sentiments in a way that is more analog than ones and zeros. So there's always going to have to be this process of interpretation, using that word kind of in a technical sense and kind of in a loose sense; maybe I should say translation, between the ones and zeros and the kinds of noises and scribblings that humans make, or have in their thoughts.
Certainly Anderson's statement met with some resistance, most immediately from people like danah boyd and Kate Crawford. danah boyd was very active in the early days of social media and became a critic of social media early on, so it's natural that she and Crawford would also be critics of Anderson's article. They posed a set of what they called
"critical questions for big data". They asked: would it lead to better tools, services, and public goods, or would it lead to privacy incursions and invasive marketing? They asked whether it would help us understand online communities and political movements better, or instead be used to track protesters
and suppress speech. Would it, they asked, transform how we study human communication and culture, or instead narrow the palette of research options and alter what research means? Well, obviously we've seen a bit of both for all of these questions in the ten years since. So in a sense they were false dilemmas: we got better tools, services, and public goods;
we're using one right now. But we also got privacy incursions and invasive marketing, and similarly with the rest. So it never was an either/or question to begin with, and that's important, because we understand that data, and especially big data, actually is changing all the things that Chris Anderson said it would.
But it's eliminating none of the things that he said it would eliminate. And that creates a lot of tensions and a lot of disagreements about what the data is and how the data should be used. Because look, let's go back to this: this is a course about ethics, right?
And someone somewhere thinks that every one of the things on this page is good. Someone thinks the privacy incursions are good. Someone thinks the marketing is good, or tracking protesters, or narrowing the palette of research options; they think it's good that we do this. Clearly boyd and Crawford think it's bad, just from the way
they phrase these questions. But really, that determination is itself a subjective question, and not really answered by the data at all. They approached big data, then, as a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and what they call mythology.
Now, the technological aspect we understand pretty well. It might be sensors and tools, input devices like this keyboard or the camera that I'm recording this video on, maybe my phone. Technology would also include the processing of the data, the storage of the data, etc., pretty much everything on the left-hand side of this diagram.
The analysis part, which I'll be talking about throughout these presentations, is the work that we do to make the data useful to us. As I said, it comes to us in the form of ones and zeros. We can't work with that as humans, not really.
So there needs to be some kind of mechanism to make this data work for us, and all of that falls under analysis. And then third is the mythology. I've decided to quote that in this box here: "the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy."
Now, there's quite a bit to unpack in that, and I think it's quite unfair of them. There's some loaded terminology, for example "higher form of intelligence" or "aura of truth". Let's dispense with that. I don't think the proponents of big data are thinking in those terms. But they are thinking, and I agree with them,
that big data can generate insights that were previously impossible. For example, it can draw kinds of distinctions between populations that we couldn't draw before, because these distinctions are based on analyzing 50,000 characteristics instead of the usual 10 or 12 that humans use. There's also an aspect of truth and accuracy that's provided by big data.
For example, consider using big data to do automated marking of essays. I don't have it on the slide here, but there's a paper out there that says the results of AI-generated marking were more consistent than the results of marking done by traditional professors or instructors.
Now, is "better" in that sense more accurate, or is it maybe more precise? Maybe the AI is clustering its marks, but in the wrong place; that's certainly possible. But it's doing something that we weren't able to do before, for sure, and it's doing it in a way that at least appears
more objective and more accurate. Now, there are ways in which it won't be, and we'll talk about that. But it doesn't follow, simply from calling this a mythology, that these things aren't actually properties of big data. Of course, they have a whole paper in which they make these criticisms, and I do urge you to go read it, but I don't think we should be dismissive right off the top.
I think that would be a mistake. There is a discipline that has developed from that paper and similar observations, called critical data studies. It was most notably launched by Craig Dalton and Jim Thatcher about eight years ago. They looked at it from the perspective of geography, as geographers, and they launched it in the form of what they called seven provocations.
The first provocation is situating big data in time and space. Again, it's just a bunch of ones and zeros, but those ones and zeros come from specific times and specific places, and we need to recognize and realize that. The second provocation: technology is never as neutral as it appears.
We've talked about this already in this course. Any technology could be opinionated; technology might be designed for a certain purpose. It might be something that enables us to do new things that we weren't able to do before. It might be something that leads us to see things from a different perspective, like the telescope or the microscope. Big data,
they write, does not determine social forms. The data doesn't tell us how society is organized: it does not follow, from the fact that data reports an organization, say, of society, that that organization is actually in society. Data, they say, is never raw. We'll touch on that again later in these presentations: even though the data is nothing more than ones and zeros coming in, there are ways in which we have selected and organized
those ones and zeros; they don't just magically appear out of the ether. So it's not raw, in the sense that it has been altered or created in some way. They also said big isn't everything. I remember a while back there was a sort of counter-movement of small versus big; there was also a counter-movement of slow versus fast, because big data sometimes makes you think of fast data as well.
And it's true, big isn't everything. A course, for example, is one thing when it's really big, with 150,000 people in it, but that's not necessarily better or more informative, etc. They brought in the idea of counter-data. And then finally, they asked: what is our practice? What is the theoretical basis on which we plan and deploy
our work, our AI and analytics engines using big data? These are interesting questions, and obviously there's a perspective and a point of view behind them. But I want to again draw the distinction that Anderson raises, because we really do have two very different points of view of the world here.
One view of the world is what we'll call, quote-unquote, data-driven. Even given, or recognizing, all of these provocations, the idea is that the data is what we are studying, and the data is enough for us to study. Against that is what we might call a theoretical perspective.
This includes critical theory, but it also includes educational theory, pretty much anything you call theory, right? There was a whole movement that created theory, and that movement (I'm caricaturing it here a bit) in a sense argues that the theory comes first: you pick your theory based on various considerations, and then you use that as a quote-unquote lens to look at the data.
So you see the data through the perspective of the theory, or you see the world through the perspective of the theory. I think these are very different approaches, and I'll be honest, I fall more into the first than the second. I've never been a lover of theory-based approaches to anything; to me, pulling a theory out of the air and then using it often doesn't seem like the right way to approach things.
But then again, I would say that, because I'm more interested in the data than the theory. Still, we have to recognize that you can't just work with the data without any theory, and you can't just do the theory without any data. The data has to be theorized in some way; otherwise it's ones and zeros, and we just can't handle that.
But the theory has to, as Heidegger would say, save the phenomena: it has to respond to, and in some important way depend on, the phenomena. It can't just be right or wrong independently of what the data says. So we need to keep these two tensions in mind as we talk about how we work with data
and what the ethics of working with data happen to be. The context of critical data studies, for us, is the work that we're doing using data in order to perform analytics or artificial intelligence. So CDS, critical data studies, forces us to ask questions about how we define the data,
how we justify not using some data (or maybe we've just not considered some data), how the data are produced, how the data are conceptualized, organized, or classified (we'll talk about that), and then the actual practice of how we employ the data. On the right-hand side, and that's from Paul Prinsloo,
we have this data maturity model of the different stages of things that we do with data. First of all, defining it; then protecting the data (I don't know why that's a stage, but it's there); understanding the data, using data analytics tools and business intelligence, or BI; activating the data,
in other words, beginning to use the data in some way, such as creating custom audiences or personalization; optimizing the experience, for example with cross-channel journeys; and then prediction, using prediction models to predict customer behaviors, you know, to serve them better.
Well, this is kind of a messy slide, because there are different ways we can talk about data sources. Certainly in the field of learning analytics, to a large degree, we're looking at data from learning management systems, or LMSs, and there are a couple of references there. These could be divided into two types of statistics: engagement statistics,
like how often did you log on, or how often did you review a video; and performance statistics, like how well did you do on a task, for example, or what grade would the system give your short text response. Another data source might be students' interactions and discussion forum posts.
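The split between engagement statistics and performance statistics can be sketched as two aggregations over one LMS event log. This is a minimal illustration under assumptions: the event log, its field names (`student`, `kind`, `score`), and the event kinds are hypothetical, not taken from any particular LMS.

```python
from collections import Counter, defaultdict

# Hypothetical LMS event log: the field names and event kinds are assumptions.
EVENTS = [
    {"student": "s1", "kind": "login"},
    {"student": "s1", "kind": "video_view"},
    {"student": "s1", "kind": "video_view"},
    {"student": "s1", "kind": "task_score", "score": 80},
    {"student": "s2", "kind": "login"},
    {"student": "s2", "kind": "task_score", "score": 65},
    {"student": "s2", "kind": "task_score", "score": 75},
]

ENGAGEMENT_KINDS = {"login", "video_view"}  # activity counts
PERFORMANCE_KIND = "task_score"             # graded outcomes

def summarize(events):
    """Split raw events into engagement counts and mean performance scores."""
    engagement = defaultdict(Counter)  # student -> count per activity kind
    scores = defaultdict(list)         # student -> list of task scores
    for e in events:
        if e["kind"] in ENGAGEMENT_KINDS:
            engagement[e["student"]][e["kind"]] += 1
        elif e["kind"] == PERFORMANCE_KIND:
            scores[e["student"]].append(e["score"])
    performance = {s: sum(v) / len(v) for s, v in scores.items()}
    return dict(engagement), performance

engagement, performance = summarize(EVENTS)
print(engagement["s1"])  # Counter({'video_view': 2, 'login': 1})
print(performance)       # {'s1': 80.0, 's2': 70.0}
```

Note that deciding which event kinds count as "engagement" is itself a choice made by whoever writes `ENGAGEMENT_KINDS`, not something given by the data.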
Those posts would, of course, include names, videos, photographs, etc., that students may have uploaded and shared with each other. But data sources are not limited to educational tools or the educational context; they could include social media and other sources from outside that learning context: Twitter, Facebook, blog posts.
If it's sufficiently invasive, a system could log the websites you've read; it could even log the books you've borrowed from a library or purchased from a store. If it's really, really invasive, it tracks you around the house and watches you do your hobbies. We haven't reached that point yet, but I'm sure there are people out there who would like to. And then, of particular importance from the perspective of care and responsibility, is data resulting from active intervention.
A lot of analytics treats data as, well, just the exhaust from the LMS that we're analyzing. But in fact, a lot of data is produced when we actually experiment with subjects. It might be a minimal intervention, like giving them a survey; it might be sorting them into A/B groups for usability testing; or it might even be something like tricking them, by using an AI tutor on them
and seeing whether they detect that it's an AI. If we go beyond the field of learning and development into something like medicine, it might actually involve invasive probes, you know, cutting somebody open to do a biopsy, stuff like that. So the interventions can be very significant. And then of course there's a range of studies from psychology where they've done
interventions with groups. The sources of data are significantly influenced by the instruments that are used to collect the data, and we can think of "instruments" here very broadly: a survey is an instrument, a suggestion box is an instrument, but so is a sensor, a microscope, a telescope, etc. Reading Robert John Ackermann, from about 40 years ago,
you see that these instruments very much determine the nature of the data and the sources of the data. For example, if we have a motion detector in our house, by its very nature it is going to detect only motion. It is not going to detect, for example, the color of whatever is moving, or whether it is small and furry or big and non-furry,
etc., right? And this can lead you to draw conclusions that leave out important aspects of the environment you're describing. Your tool might be limited in terms of range: for example, it might scan only certain frequencies, or see only certain colors, and in photography it might have a coarse or a fine resolution, and so on. It's interesting how the tools themselves, in an important way, define what we think we're measuring. Think, for example, of a thermometer: degrees didn't exist until we developed a thermometer to measure degrees.
Before we had a thermometer, there was no real sense to be made of the temperature of the air. If anything, temperature would be completely subjective, and would include not just the plain air temperature but the humidity, the effect of the wind, even perhaps our mood, how fatigued we are,
how much we've sweated recently, etc. We even try to correct for that now. But there was a time when all we thought temperature was, was whatever the reading was on a thermometer. And it sort of makes you think: what else are we missing from temperature?
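The earlier motion-detector example, where the instrument decides what ends up in the data, can be sketched in a few lines. Everything here is hypothetical: `Event` stands in for a state of the world with more properties than the sensor can see, and `motion_detector` keeps only the one property it is built to measure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """Hypothetical state of the world, richer than any one sensor sees."""
    moving: bool
    colour: str
    furry: bool
    size_cm: int

def motion_detector(event: Event) -> dict:
    """A motion sensor reports motion only; every other property is discarded."""
    return {"motion": event.moving}

cat = Event(moving=True, colour="grey", furry=True, size_cm=45)
person = Event(moving=True, colour="blue", furry=False, size_cm=175)

print(motion_detector(cat))                             # {'motion': True}
print(motion_detector(person))                          # {'motion': True}
print(motion_detector(cat) == motion_detector(person))  # True
```

Two very different situations produce identical data, so any conclusion drawn from this data source alone cannot distinguish them.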
So there's very much a sense in which our picture of the world determines what our data sources are, but conversely, our data sources very often determine what our picture of the world is, and that's something we need to keep in mind. So that's the end of this presentation on data. I'll stop here, and we'll move on to the next presentation.
Well, for me it'll be just a couple of moments; for you, it can be as long as you want. So that's it for now. Thank you. I'm Stephen Downes.
--------------
The following wasn't included in the audio, but shouldn't be omitted:
The 1973 HEW report proposes three types of data, based on how the data are applied (HEW, 1973: 5-6):
- Administrative Records. The administrative record is often generated in the process of a transaction: marriage, graduation, obtaining a license or permit, buying on credit, or investing money. Usually a record that refers to an individual includes an address or other data sufficient for identification. Personal data in an administrative record tends to be self-reported or gathered through open inspection of the subject's affairs. Private firms usually treat administrative records pertaining to individuals as proprietary information, while administrative records held by the government are normally accessible to the public and may be shared for administrative purposes among various agencies. Administrative records sometimes serve as credentials for an individual; birth certificates, naturalization papers, bank records, and diplomas all serve to define a person's status.
- Intelligence Records. The intelligence record may take a variety of forms. Familiar examples are the security clearance file, the police investigative file, and the consumer credit report. Some of the information in an intelligence record may be drawn from administrative records, but much of it is the testimony of informants and the observations of investigators. Intelligence records tend to circulate among intelligence-gathering organizations and to be shared selectively with organizations that make administrative determinations about individuals. Intelligence records are seldom deliberately made public, except as evidence in legal proceedings.
- Statistical Records. A statistical record is typically created in a population census or sample survey. The data in it are usually gathered through a questionnaire, or by some other method designed to assure the comparability of individual responses. In nearly all cases, the identity of the record subject is eventually separated from the data in the record. If a survey must follow a given individual for a long time, his identity is often encoded, with the key to the code entrusted to a separate record to guard anonymity. Data from administrative records are sometimes used for statistical purposes, but statistical records about identifiable individuals are generally not used for administrative or intelligence purposes.
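The practice described for long-running surveys, encoding a subject's identity and entrusting the key to a separate record, is what is now usually called pseudonymization. Here is a minimal sketch, assuming records are Python dicts with a hypothetical `name` field; the function `pseudonymize` and the field names are illustrative, not from the report. Anyone holding the returned key can re-identify the subjects, which is why the report stresses keeping it separate.

```python
import secrets

def pseudonymize(records, id_field="name"):
    """Replace each subject's identity with a random code.

    Returns (coded_records, key). The key maps code -> identity and is
    meant to be stored apart from the coded records."""
    key = {}    # code -> identity: the separately held "key to the code"
    codes = {}  # identity -> code, so one person keeps one code across waves
    coded = []
    for rec in records:
        identity = rec[id_field]
        if identity not in codes:
            code = secrets.token_hex(8)
            while code in key:  # vanishingly rare, but keep codes unique
                code = secrets.token_hex(8)
            codes[identity] = code
            key[code] = identity
        new_rec = {k: v for k, v in rec.items() if k != id_field}
        new_rec["subject_code"] = codes[identity]
        coded.append(new_rec)
    return coded, key

# Hypothetical two-wave survey following the same person over time.
waves = [
    {"name": "Alice", "wave": 1, "answer": "yes"},
    {"name": "Bob", "wave": 1, "answer": "no"},
    {"name": "Alice", "wave": 2, "answer": "no"},
]
coded, key = pseudonymize(waves)
print(coded[0]["subject_code"] == coded[2]["subject_code"])  # True
print("name" in coded[0])                                    # False
print(key[coded[0]["subject_code"]])                         # Alice
```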
Decisions about data collection should depend on the purpose, says the report: "Religious data, for example, should not be recorded where there is no state supported church, and citizens should not be required to furnish extraneous data as the price of obtaining a benefit" (HEW, 1973: 6)
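The report's purpose principle can be sketched as a collection step that keeps only the fields a stated purpose needs and discards anything extraneous. This is a minimal illustration, assuming a hypothetical `REQUIRED_FIELDS` table and `collect` function; the report states a principle, not an implementation.

```python
# Hypothetical mapping from a stated purpose to the fields it actually needs.
REQUIRED_FIELDS = {
    "library_card": {"name", "address"},
    "course_enrolment": {"name", "student_id", "program"},
}

def collect(purpose: str, submitted: dict) -> dict:
    """Keep only the fields needed for `purpose`; refuse if any are missing."""
    allowed = REQUIRED_FIELDS[purpose]
    missing = allowed - set(submitted)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return {k: v for k, v in submitted.items() if k in allowed}

form = {"name": "A. Reader", "address": "1 Main St", "religion": "none given"}
print(collect("library_card", form))  # {'name': 'A. Reader', 'address': '1 Main St'}
```

Here the extraneous `religion` field never enters the stored record, so the benefit (a library card) is not conditional on supplying it.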