Ethical Practices: Part Three

By Stephen Downes
Dec 22, 2021
Transcript of Ethical Practices: Part Three

Unedited

Hello and welcome to Ethics, Analytics and the Duty of Care. We're in part three of our series on ethical practices. This is part of the eighth and final module of the course, Ethical Practices in Learning Analytics. Today we're starting on some of the meatier parts of ethical practices.

We've looked at the concept in general already and looked at some fairly lightweight, dare I say toy, examples of ethical practices frameworks. None of them really got into the sort of depth that we require in order to really understand what it takes to inform ourselves about ethical practices in learning analytics.

You know, the simple ones basically reduce to checklists, and it's nice to have a checklist, but a checklist doesn't tell you why we're up to what we're up to. In this video we're going to get a sense of some types of practices that are a lot deeper than that and actually do help us get to the why of the matter.

So we're going to begin with an area of study known as data governance. This is used in order to look at the ethics of AI and analytics. For example, the authors of the article Ethics at the Core use a data governance framework, and the idea here is that you're combining an ethical perspective on data with sound and by now already well-established and well-recognized data governance processes.

So this is a departure from a lot of the practices, well, indeed, all of the practices that we've been looking at so far, in the sense that it's actually looking at how the development of AI and analytics, and software in general, is actually conducted. In the previous section we looked at the very long and involved process or workflow in an AI or analytics application. Each one of those steps, including the steps about data in particular, is governed under a series of governance frameworks, and that's what we're going to look at here.

But with an eye, as this article suggests, toward keeping ethics at the core. Before that, though, let's think about what exactly we mean by governance. I'm not going to go into a discourse on political theory. We already did that to some degree with the section on social contracts, but it is worth keeping in mind that there are different approaches available for organizations to govern themselves.

And this applies to IT organizations, and therefore to AI and analytics development companies, or institutions such as colleges or universities that use or can use AI and analytics. So Weill and Ross identify six kinds of governance classifications. What they're doing is getting beyond the idea of, you know, centralized, federated, decentralized, and they're drawing out a classification of governance types that's more appropriate to understanding IT governance in particular.

So, here are the six classifications. First of all, business monarchy: IT decisions are all made by the CEOs or CTOs or CIOs or whatever. It's rule from the top, and we've seen that in a lot of enterprises. Similarly, you have what might be called the IT monarchy.

So it's a lot like the business monarchy in the sense that all the decisions are made at the top, but in this case the top isn't the business side of the operation. It's the IT side of the operation, and corporate IT professionals are the ones to make those decisions. Very often, if you have a separately managed, or sorry, centrally managed and relatively autonomous IT department, you'll get something like an IT monarchy.

Well, the use of the term monarchy for these types of governance is revealing, in that monarchy isn't the ideal form of government, and most nations around the world have abandoned it. There's a variety of reasons: the inability of the monarch to comprehend the complexities of managing the state; the idea that power corrupts and absolute power corrupts absolutely; the equation of the monarch's interests with the interests of the state, or, you know, as, was it Louis XIV, famously said, "L'état, c'est moi," the state, it is me. Another type of governance is what might be called feudal.

And we see that a lot in colleges and universities, where IT decisions are made by autonomous business units. So, instead of a single centralized IT department, you'll have one for the business faculty, one for the humanities, one for the sciences, one for the arts, etc. And you see that in the corporate environment as well. At NRC, for example, we're divided into various institutes, or today they're called research centers, and previously we had separate IT governance for each one of these research institutes or research centers.

Then there's the federal model, which is a hybrid decision-making process. That's kind of what we have now at NRC, where we do have a centralized IT function known as Shared Services Canada, who is... never mind. But we also have staff working for specific research centers, and they have a great deal to do with IT decisions themselves. So, we have something called kits in our research center.

In IT duopolies, it's a lot like a monarchy, but it's like a dual-headed monarchy, think, you know, the Austro-Hungarian Empire, where you have the IT executives and one business group, say marketing, making all of the IT decisions. And finally we have what is called here anarchy, where each small group makes decisions.

In NRC's case that would be like each work team making its own IT decisions, or even each individual making their own decisions. In a college or university it would be like each department making their own decisions. And so you get situations like we've seen before with learning management, where one college or university might be supporting eight different learning management systems on campus, and things like that.

So, what's interesting about these is that there's almost no intermediate point between the types of monarchies and anarchy. We don't have anything like a republic. We don't have anything like a democracy. It's either power and control or nothing. And I find that an interesting gap when we're thinking about governance, and this interesting gap, I think, is going to come back to bite us, okay?

So now let's look at governance frameworks generally. Now again, these are more than just checklists, right? They're going to involve checklists of various sorts and workflows of various sorts, but they're doing a lot more besides. So here's a quick summary. A governance framework will structure and delineate power and roles, except for the monarchies, in which case all the power is at the center.

But even so, the governance framework will say that. And it will set rules, procedures and other guidelines. So that's the legislative branch, if you will. And then it'll define, guide and provide for enforcement of these processes; we'll call that the executive branch. And then, most importantly, and this is the big difference from what we've been looking at so far, it'll be shaped by the goals, strategic mandates, financial incentives and established power structures and processes of the organization.

So for example, if you look at this corporate governance diagram here, and I don't have a source for this, I'll explain. I found the diagram on Google image search and I went to the website where the diagram was supposedly from. I was presented with a screen that said, "Firefox needs an update, download right now, Firefox.exe," and there's no way; it wasn't a Firefox website or anything like that.

So they were obviously trying to get me to install some malware, spyware or virus or something. So I'm not going to pass that link on to you, obviously.

So here are some of the things, you know, the mission, the vision, the objectives and the strategy, that might shape a governance framework: transparency and accountability, for example; the board and supervisory responsibilities; the values and ethics, thought of generically; policies and the regulatory framework, obviously; monitoring and internal control, required by law for audit purposes; and then risk and performance management. So all of these different kinds of values are coming in and shaping the corporate governance.

And then the corporate governance is realized by the four points, or rather by the remaining three points, that I've described here. So, in the context of a data governance framework specifically (here comes our usual two o'clock train), it's doing the same sort of thing, except with respect to data governance.

So here are the values and the mission part at the top feeding into it, and then we're going to see the remainder of the important aspects: organization, roles, a roadmap or a plan for the future, quality measurement, policies and standards, just like before, and something new, a business glossary. So, I thought there was another slide there, I'm sorry.

I wanted to linger on the business glossary because a couple of years ago we at NRC did a fairly in-depth data governance analysis for a federal department. We went in, and one of the main things that we did was to draw up and update their business glossary. One of the specific contributions that I made there was to analyze all of their public-facing materials, everything, the reports, the minister's public-facing publications, documents, etc., and list, you know, any time they talked about data, any evidence that they used or any statistics that they quoted. I pulled out what they called it and where it came from.

And what I discovered was a mishmash of cross-category, inconsistent, vaguely defined terms. And one of the realizations that they had is that if they're going to talk about the data that they have, they need to have words to talk about that data. And in a sense, I'm always hesitant to say it this way.

In a sense, they need a shared vocabulary, and I don't like saying shared vocabulary because that's not really the concept, I mean, because everybody interprets words differently. So maybe what I should say is they needed a vocabulary. Anyhow. Okay, let's look at some aspects of these data governance frameworks that are really important and really worth pulling out as being informative with respect to the ethics of analytics and AI.

So one issue that comes up is de-anonymization, and I saw another article about that just today, where somebody is arguing that in social media there can't be any more anonymous comments. And here we have Pernille Tranberg saying almost the opposite: we must prohibit the de-anonymization of anonymized data.

Anonymity is a tough one, and it's not clear necessarily which way we should go with respect to ethics on this, and probably it's going to be the muddy middle. But, you know, you had, for example, Facebook at the beginning of Facebook time, with the idea that everybody on Facebook should have a specific and real identity.

And they tried to enforce that for many years and eventually they just gave up and allowed people to have multiple identities. But in a lot of environments, a lot of applications, being anonymous is just not an option. Like, you know, you don't get a university degree anonymously, for example. You don't get a driver's license anonymously.

But on the other hand, in a lot of cases anonymization is very important. For example, if you live in a fascist state and you want to write a criticism of the government, it really makes sense to do that anonymously. Or if you're a member of a minority group that's persecuted in your country and you want to write about the affairs of that group, again, you would want to do it anonymously.

Now, in neither case are you doing anything unethical. In fact, you're probably contributing to the overall ethics of society, but being anonymous is protecting you. And then there's just plain privacy. And yeah, there's been more discussion of privacy than I can shake a stick at over the years.

And this is not going to be the final word on privacy, but there is a presumption on the part of many ethicists that people should be able to go about their day-to-day lives in private without being watched. The reasons for that are varied. They include not being targeted for no particular reason by law enforcement agencies; being free from manipulative advertisements based on a knowledge of your interests and preferences; and being free from undue political influence by micro-targeting advertisements to people of your specific demographic while saying something different to someone else. Three simple examples.

Now, de-anonymization is the process where you take multiple databases, here's one, here's one, here's one, and you cross-reference them in order to create individual profiles based on the partial records from each of the three databases. And we've seen in our discussion of analytics and AI that AI greatly eases and accelerates this process, greatly augments this process, making it possible to use partial data to identify whole individuals.
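To make the cross-referencing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three datasets, the field names, the people, and the choice of postal prefix plus birth year as the linking key are all made up, and real record linkage is usually fuzzier and more statistical than this exact-match version.

```python
# Three hypothetical datasets. Only the voter list contains names;
# the other two are "anonymized" in the sense that names were removed.
voter_list = [
    {"name": "A. Example", "postal": "K1A", "birth_year": 1975},
    {"name": "B. Sample", "postal": "E1C", "birth_year": 1990},
    {"name": "C. Person", "postal": "E1C", "birth_year": 1962},
]
health_records = [
    {"postal": "K1A", "birth_year": 1975, "diagnosis": "asthma"},
    {"postal": "E1C", "birth_year": 1990, "diagnosis": "diabetes"},
]
purchases = [
    {"postal": "K1A", "birth_year": 1975, "item": "inhaler"},
]


def quasi_key(record):
    """Fields that are not a name but still narrow down who someone is."""
    return (record["postal"], record["birth_year"])


def link_records(named, *partial_sets):
    """Cross-reference partial datasets against a named one.

    A profile is built only when the key matches exactly one named person.
    """
    by_key = {}
    for person in named:
        by_key.setdefault(quasi_key(person), []).append(person)

    profiles = []
    for key, people in by_key.items():
        if len(people) != 1:
            continue  # ambiguous match: more than one candidate
        profile = dict(people[0])
        for dataset in partial_sets:
            for record in dataset:
                if quasi_key(record) == key:
                    profile.update(record)  # merge the partial record in
        profiles.append(profile)
    return profiles


profiles = link_records(voter_list, health_records, purchases)
for profile in profiles:
    print(profile)
```

In this made-up data, the first person ends up with a name, a diagnosis and a purchase attached to one profile, even though no single dataset held all three. The exact match also shows where mistaken identity comes from: anyone who merely shares the same postal prefix and birth year would be merged into the wrong profile.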

Now, a couple of things here. First of all, they might get that re-identification wrong; you know, cases of mistaken identity could be generated by AI. But even where they don't get it wrong, there are cases where these identifications could be misused, and this takes us all the way back to the types of issues that we have with AI and analytics.

So you can see the argument that data should not be de-anonymized. Now what we have here from Tranberg is an effort to make it illegal, so that's taking us right out of the data practices area and putting it right back into laws and regulations, back up the staircase. And I find there's an awful lot of pull trying to drag us back up to the top of the staircase.

And, you know, the sort of question we have to ask here is not simply, is this ethical, is de-anonymization ethical? The sort of question we have to ask is, what sort of risk is involved in de-anonymization? Is it enough of a risk to make it something that we should regulate, or even prohibit under law, or is it not? Or are there ways we could subdivide it, so that in some cases it is and in some cases it's not? This is an aspect of data governance, and whether or not it's made illegal, we'd want our data governance practices to have some kind of basis for a determination of when and how and if data can be de-anonymized.

Another aspect of data governance that doesn't come up so much in the checklists, but shines through clearly in data governance, is the discussion around policies, roles and teams. Now, policies are a lot like ethical codes in the sense that they're really trying to do perhaps too much with too blunt an instrument. Policies, like codes, are sharp. I just said they're blunt instruments; well, they're blunt instruments with sharp points, and they tend to cleave practices in predefined ways that are not always appropriate to the circumstance.

But still, there are areas of policy, roles and teams that a data governance framework ought to describe, and if not deciding what they are exactly, at least suggest that these are things that need to be managed or considered. For example, the data management and classification policy: on what basis are you going to classify data? We looked at a number of the variables involved in classifying data in the previous module.

Now we come to the point of the question. On what basis are we going to classify data? Is it going to be to do with business models, departments, processes, value and benefits, or some sort of ontology? It's hard to say, right? And it's going to vary from case to case.

Similarly, there's the need for a data managers team, consisting of representatives from various departments, and presumably some sort of mechanism for data requests, which is required for openness, transparency and scalability, among other things. And so the values here would be being responsive, timely and consistent. Then there's data culture, helping people value the ethical use of data.

And we're going to come back to that in the next set of videos. And then, finally, everyone's concern, data security. Another aspect of data governance is informed consent, and like anonymization and de-anonymization, this is one of these areas that's murky and fuzzy and not really amenable to a simple principle, and probably better governed under a data governance process.

Why? Well, because as Anouk Ruhaak said in 2020, quote, the principle of informed consent is dead. You might think, no, no, no, you can't have responsible data management without informed consent. But think about it. First of all, if one person gives consent, many people are affected.

I raised the case earlier of the surveyor who wanted me to give information about Andrea; that's not my data to give them. There's also the case where somebody submitted their DNA for analysis and the analyzed DNA was used to convict the person's brother of a crime because of the similarities in the DNA. There are many cases like that.

Secondly, and I think this is true, it's unlikely that anyone is truly informed when they give consent. We give consent without reading the terms and conditions. It's just a box we check. And even when we read the terms and conditions, it turns out that the law will interpret many of these terms, many of the words, differently than the average person would.

And even then, these terms can change without notice. In fact, that's one of the terms, that these terms can change without notice. And then when the company is acquired, all bets are off and you're covered under a completely new set of terms and conditions.

And that's what's happened to me, actually, during this course. Mailchimp was acquired by Intuit, and so the terms and conditions of Mailchimp changed, with basically no notice, halfway through the course. Then third, consent is meaningless without the ability to opt out. And so you say, well, just give them the ability to opt out.

Well, put yourself in an educational environment where you're taking a course. You need to access the learning management system. You need to provide your personal information to that system, and possibly your credit card or banking information so you can access resources, plus your transcript from another institution, plus a range of other relevant information, like your phone number, your email address, your Twitter handle, whatever.

Try opting out of that and still graduating from the class. It's not going to happen. You know, participating in this data governance environment is a condition of getting your education. So there's no meaningful opt-out for consent in such a case.

And if there's no meaningful opt-out, then there's no meaningful sense in which consent has been given. You know, it's funny. And again, I sit on a research ethics board, and in pretty much every ethics application for research we see questions about, you know, was consent obtained.

And there's always that little condition in there that allows for the need for consent to be waived if there's no practical way of getting consent. And it's funny how often there's no practical way of getting consent. So Ruhaak argues for an alternative, a sort of collective consent, in which our rights are managed by a fiduciary, someone with a legal responsibility to look out for your interests rather than their own.

And arguably, the role I play on a research ethics board is that of a fiduciary, except that we don't actually have fiduciary responsibilities at all.

But there is the need for something like that. But now we're in the situation where, without really practical mechanisms for informed consent, we're being pulled back up the stairs towards some sort of regulatory framework, maybe similar to the European GDPR, the General Data Protection Regulation, in order to manage the data that will basically be taken without consent from people.

This is a tough issue. We give lip service to the principles of autonomy and consent, but in order to participate at all in a modern information-age economy, your data is going to be used, whether you want it to be or not. And so I think we need to be looking at more than simply a principle of informed consent.

Well, first of all, this broader framework of data governance, at least to handle the worst-case scenarios. But we're also going to need more of a sense of ethics generally on the handling of data. And if that troubles you, it probably should, because now what we're doing is we're relying on the ethics of the people who are doing the data collecting, and the marketing, and the spying, and all the rest of it.

And I think most people don't feel very comfortable about the ethics of those companies. But keep in mind the different models of governance that we have. We're not working in a democracy, and the people who write these rules, either institutionally or, for that matter, across government, are very often the people who are themselves managing the collection and use of that data.

So I don't think we can depend on the regulations. I think we have to depend on the ethics, and that should give us pause. Another aspect of data governance, and this one's kind of on a different tangent, but I think it's relatively important, is the concept known as single source of truth.

And I said the word truth, and in today's day and age of false information and fake news, no word has been more maligned than the word truth. In the field of database management, however, this is a technical term, but it's also really critical advice. The idea of a single source of truth is that for any given data point in your database there is one and only one source for that data, and that source is considered authoritative.

It is the single source of truth. You might think, well, that's going to be pretty obvious, isn't it? But take something like, say, a person's name, a student in one of your classes, say. Think about how that person's name ends up in the database. Well, there's their basic application in the application process; the document they submit when they actually register for a course; paperwork they submit if they're going for student loans or grants or some other kind of payment process; the name they provide to the professor or the teacher in the class; the name they put on their assignments; the name they use when they sign up for the learning management system; etc.

We can think offhand of half a dozen to a dozen ways a person's name could end up in the database. And the thing is, if two or more of those ways end up being ways for that name to end up in the database, you now have the possibility of that name existing in two places, which would be fine, except you hit the question of what happens when the name is different in one of those places than in the other. A common case where this happens is middle names, right? I simply refer to myself as Stephen Downes, but on some formal documents, like my passport, I have to use my full name, Stephen Frederick Downes.

All right, so out there in the world there are documents with my name, Stephen Downes, and documents with my name, Stephen Frederick Downes, on them. There is a possibility for difference. Also, when I give my name to people, if they're signing me up for something or whatever, I say Stephen Downes, right? S-T-E-V-E-N D-O-W-N-S. But my name is spelled S-T-E-P-H-E-N D-O-W-N-E-S. So there's all kinds of ways for my name to be misspelled. Again, possibilities of different records.

This is bad because the system might think there are four or five people where there's only one. It's bad because if Steven Downs with a V is trying to get records about Stephen Downes with a PH, those records can't be found or accessed. I've run into situations where Stephen Downes as registered through the Canadian version of a site, Xbox, is not the same as Stephen Downes registered through the American version of the site, Xbox, with the result that, well, that's really bad and expensive, and it made me not use any Microsoft games for years and years and years.

So you need a single source of truth. Well, what does that mean? Technically, it means no duplicate data entries or version control issues. Technically, it means timely data values at the right moment.

Technically, it means reduced time spent validating records and data types, improved data warehousing and intelligence, and improved communication and productivity. But how do you pull it off? Well, it goes back to your data glossary, but it also goes back to the time-consuming work of identifying each piece of data that goes into the system and asking yourself, where did it come from?

And if there's a place that it could have come from that is the single source of truth, that's where it should come from. What that means in terms of things like interface design is picking from a drop-down list instead of always typing in the name, or providing a single code that will always populate the name and other fields in forms, things like that.
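The "single code that populates the name" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PersonRegistry class, the ID format, the course name and the record layouts are all hypothetical, not a description of any particular student information system.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str
    full_name: str


class PersonRegistry:
    """Hypothetical authoritative store: the only place a name is written."""

    def __init__(self):
        self._people = {}

    def register(self, person_id, full_name):
        if person_id in self._people:
            raise ValueError(f"{person_id} is already registered")
        self._people[person_id] = Person(person_id, full_name)

    def rename(self, person_id, full_name):
        # A correction happens here, once, and nowhere else.
        self._people[person_id].full_name = full_name

    def name_of(self, person_id):
        return self._people[person_id].full_name


registry = PersonRegistry()
registry.register("S1001", "Steven Downs")  # misspelled at first entry

# Other records never store the name, only the code that points to it.
enrolment = {"course": "Ethics 101", "person_id": "S1001"}
lms_account = {"login": "sdownes", "person_id": "S1001"}

# Fix the spelling in the one authoritative place.
registry.rename("S1001", "Stephen Downes")

# Every record that looks the name up now agrees.
names = {
    registry.name_of(enrolment["person_id"]),
    registry.name_of(lms_account["person_id"]),
}
print(names)
```

The design choice is that the enrolment and account records hold a reference rather than a copy, so there is no second spelling to drift out of sync. The cost, as with any single source of truth, is that every record is now tied to one identifiable person.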

It also means, though, that it's definitely not anonymized, and it opens the door to wider uses of data without consent. So again, hard. A lot of the stuff on blockchain that you've heard about applies precisely at this point, where the single source of truth also needs to be unalterable and incontrovertible. You have to be able to count on this being the information of record, and putting it in the blockchain is a widely recommended method of doing that.

The other thing with a blockchain approach is that it supports not just a single centralized database but a distributed database, because in an environment where there are multiple data managers, like, say, a network of institutions, or perhaps a vertical cluster of service provider, application developer, institution, jobs board, you know, all of these separate databases and separate enterprises need to be able to refer to or reference a single source of truth about that person's name and that person's information.

Again, though, privacy is a lot harder. So you kind of want it to be managed by the person in question, but that's kind of hard to do as well. Data ethics issues are going to get involved with truth kinds of issues. Now, one of the things that I said in my E-Learning 3.0 course that I'll reiterate here is that as time goes by, we will define community as consensus.

And what I mean precisely by that is that a community will be defined by the single sources of truth that it uses, the mechanisms around that for establishing those, and the mechanisms around that for storing and retrieving those, typically a decentralized database kind of network along the lines of blockchain or some other type of consensus algorithm.

And that will, I mean, that's going to have a long-term ethical impact. You know, when community becomes consensus, what we believe together is what defines a community. And how we shape those beliefs, how we obtain those beliefs, who can alter those beliefs, become huge ethical issues. And that's why I think the European GDPR is so on point with putting control of information about an individual in the hands of that individual, so that the single source of truth about a person ultimately becomes the person themselves.

But making that an ethical principle is kind of hard, and implementing that in practice is even harder. And it needs to be supported not just with regulation, not just with data governance frameworks, but with a wider sense of community around this concept, which is where we will get into culture in the next set of videos.

There are various tools available for data governance. I won't go into them in detail. I've got a list of them at the bottom of the slide: Io-Tahoe, IBM's Watson and Cognos, which is what we worked with, Informatica and others. They will provide these various functions that you need: the glossary; data discovery; stewardship; reference data, and by that I mean, for example, you know, the list of provinces in a country, the list of days of the week, the list of zip codes, stuff like that, and where's all that coming from; policy management; data quality tools, which I really haven't talked about much at all; and of course data mapping, to draw out ways of organizing and connecting the individual data elements.

Finally, all of this is a diversion. We looked at the workflow in artificial intelligence and analytics, and data was one step in a multi-step workflow, and the ethical governance and management of AI and analytics is much bigger than data governance. I agree that, you know, things like having unbiased data, diverse data, etc., are important.

But as Julia Powles and Helen Nissenbaum say, solving for bias, or recognizing and acknowledging bias, can be seen as a strategic concession, one that subdues the scale of the challenge. There's so much more to ethics and AI than data bias. There's more in data. I mean, data bias doesn't even talk very much about single source of truth and where we're going to locate it.

It does talk about issues of consent and anonymity, but often makes the single-handed claim that those are good things without considering whether they're even possible things. And overall, data governance is going to include, as we've seen, just a huge string of different decisions related to the management, the governance, the organization, the structure, and the source of data.

So yeah, it's nice to talk about AI bias and biased data sets and all of that. But if that's the sum total of the AI ethics initiative, then it's missing most of the issues involved in ethics, analytics, and the duty of care. So I'll leave this discussion with this line, and we'll move on to part four of ethical practices in the next video.

I'm Stephen Downes. Talk to you then.
