Unedited audio transcription by Google Recorder
Hi everyone. Welcome back to Ethics, Analytics and the Duty of Care. We're still working our way through module seven, heading into the home stretch here. Module seven is about the decisions we make as we apply analytics and AI in learning technology. This video is about testing and application, and we'll look a bit at the huge role that testing and application play in the development and deployment of AI and analytics solutions in learning.
It's not something that's often mentioned by people talking about the ethics of AI, but I think it's probably one of the most significant aspects of AI, and certainly one where a number of ethical decisions come to the fore. So to begin, let's look first at testing and application generally.
Here's a broad look at some of the things that are considered, not just in AI and machine-learning-based systems but in applications generally: everything from configuration to data collection, feature extraction and verification, analysis tools, infrastructure, monitoring and the rest. There are some fairly significant differences between AI and analytics applications and routine software applications.
But over the years, first for regular software applications and then for AI applications, a significant testing infrastructure and methodology has developed, and it's worth taking a look at that to begin with. We want to think about the objectives of our testing protocol. This is a fairly typical process here.
It's reflected in different formats. The idea, of course, is to prevent defects, to evaluate the work product, to verify requirements, to build confidence in the application, to reduce the risk of using the application and, of course, to find failures and defects. There are a whole bunch of different models that talk about different areas and different objectives of testing and the evaluation of software.
In general, there's an overall approach, and it's interesting to note that the overall approach to testing software applications is very similar to the overall approach to the actual use of these applications. First, you define your goals and perhaps identify key performance indicators. Then you collect the data, in this case the testing and evaluation data, analyze that data, perform some tests of alternatives, which we'll talk about, and then implement whatever changes are required by the results of the tests.
Testing can be depicted in what's called a V model. I've reproduced the double-V model here; there's also a single-V model, which is somewhat simpler, and I even found a triple-V model, which I thought was probably a little too much for our tastes. Basically, the V model begins with what's known as a waterfall development framework and works from that. In the waterfall framework you think of software development as a waterfall: it flows from user requirements to system requirements, then to the architectural models, component design, and then unit design. There are tests that follow each of those. For the overall software testing, you go up the other side to form your V. So once you've done your unit design, you do unit testing. Once you've got your components developed, you do component testing, and in particular component integration testing. Similarly, with the architecture, you're now looking at subsystem testing, then system testing. And then finally we get back to the users, and we're looking at acceptance or operational testing.
The same process is going to be similar to what's adopted in an AI or analytics framework. Perhaps it won't be structured as a pure waterfall, because in a lot of applications today a much more dynamic or agile software development methodology is used.
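Just to make the bottom of the V concrete, here's a minimal unit-test sketch in Python, in the pytest style. The function and field names are my own assumptions for illustration, not anything from a particular analytics product.

```python
# Minimal unit-test sketch (pytest style) for a hypothetical
# feature-extraction step in a learning-analytics pipeline.

def extract_features(record):
    """Turn a raw student-activity record into model features."""
    return {
        "logins_per_week": record["logins"] / max(record["weeks"], 1),
        "completed": record["completed"],
    }

def test_extract_features_basic():
    record = {"logins": 12, "weeks": 4, "completed": True}
    features = extract_features(record)
    assert features["logins_per_week"] == 3.0
    assert features["completed"] is True

def test_extract_features_handles_zero_weeks():
    # Guard against division by zero for brand-new students.
    record = {"logins": 5, "weeks": 0, "completed": False}
    assert extract_features(record)["logins_per_week"] == 5.0
```

Component, system and acceptance testing then build on dozens or hundreds of small checks like these.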
But nonetheless, all of these testing steps are going to be required at some point or another in the process, and becoming more iterative in software design is reflected in something more iterative in software testing.
One of the things that makes artificial intelligence and analytics distinct is the necessity of testing data. There's, again, a huge industry devoted to defining the data collection, storage and management process, and I've illustrated the major steps of that in the diagram here, borrowed from the web page cited. Basically, you start with source data. You drop all of that into a data warehouse, which may contain a data lake, a whole pile of undifferentiated data, which is then divided into data pools, then staged and presented in what's called a data mart (or sometimes cubes) and output in the form of reports and statistics.
Now, the analytics and AI process takes advantage of this data flow and can actually pull data from any point in it. But the point here is that all of the quality assurance and data validity metrics that apply to data management generally also apply to data management for artificial intelligence.
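As a rough illustration of that flow, here's a small sketch in Python of moving raw records from a "lake" through a staging step into a little aggregated "mart" table. The column names, and the use of pandas, are my own assumptions for the example.

```python
import pandas as pd

# Sketch: raw "lake" records -> staged pool -> aggregated "mart".
lake = pd.DataFrame([
    {"student_id": 1, "course": "MATH101", "grade": 82,   "term": "F2021"},
    {"student_id": 2, "course": "MATH101", "grade": None, "term": "F2021"},
    {"student_id": 3, "course": "ENG200",  "grade": 74,   "term": "F2021"},
])

# Staging: drop records that fail a basic validity check.
staged = lake.dropna(subset=["grade"])

# "Mart": a small aggregate table ready for reporting or analytics.
mart = staged.groupby("course", as_index=False)["grade"].mean()
print(mart)  # one averaged grade per course
```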
In addition, AI and analytics look at what may be called the six Vs of data. Data volume, where you're testing for semantics and processing scalability. Data variety, where we're looking at different types of data, different types of objects, and how those objects interrelate, along with data federation, which is to say data that's located in multiple places, in a variety of locations and perhaps in a variety of formats. Data velocity, which is about real-time data: how real-time data is coming in, how data can be tested, as they say, on the fly, how real-time data can be integrated into the data process, and then on-demand storage, in other words a mechanism for bringing in and storing data as it arrives, so that you increase your storage capacity as you increase your data.
There's also the validity of data. Here we apply rules: for a simple example, making sure that your calendar dates have a month, day and year, making sure that your telephone numbers have the correct number of digits, making sure your addresses have a street name and maybe a postal code or a country if that applies, things like that, and then, of course, removing invalid data. There's variability in data: not all data comes in the same format. I just mentioned dates; we have the European system of dates versus the American system, where the month and the day are transposed. There are also wide variations in data regarding addresses and phone numbers. A lot of systems standardize on a North American model, but in a global world that would obviously be a mistake. Things like that.
And then, finally, there's the veracity of data. Is the data accurate? Is it a true reflection of what it purports to describe? Is the data on sales, for example, an accurate reflection of the sales themselves? Is the data on grades or marks the actual grades or marks that were submitted by the instructors?
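To give a sense of what validity rules like these look like in code, here's a small sketch in Python. The formats I check for (ISO dates, ten-digit North American phone numbers) are assumptions for the example; as I just said, a real system would have to accommodate international variation.

```python
import re
from datetime import datetime

def valid_date(value: str) -> bool:
    """A calendar date must have a year, month and day."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def valid_phone(value: str) -> bool:
    """Correct number of digits after stripping punctuation."""
    digits = re.sub(r"\D", "", value)
    return len(digits) == 10

records = [
    {"date": "2021-11-30", "phone": "613-555-0142"},
    {"date": "30/11/2021", "phone": "555-0142"},  # fails both rules above
]
clean = [r for r in records if valid_date(r["date"]) and valid_phone(r["phone"])]
print(clean)  # only the first record survives
```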
So all of these need to be tested. Once we're into the actual development of a learning analytics application, what we're in the process of doing is making requests of that system. Remember from previous episodes that the AI model is first developed or trained with data, and then in practice we typically feed it some new data and get the results back. So we need to test all of these stages, or at least, you know, that's best practice.
For requests, it's important to ensure that the correct data is being collected by the application that is going to send the request to the AI system, and that the format of the request is correct. Typically a JSON data object would be used to send the request to the AI system. This needs to be validated with a JSON parser, and it needs to be checked to make sure that all the fields, all the data variables or field elements, have been properly filled out, and then to ensure that the request is properly sent and actually gets to where you want it to go.
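Here's a minimal sketch in Python of that kind of request check. The required field names are hypothetical; a real system would validate against whatever schema the analytics service actually publishes.

```python
import json

# Hypothetical required fields for a request to a learning-analytics service.
REQUIRED_FIELDS = {"student_id", "course_id", "activity_log"}

def validate_request(raw: str) -> dict:
    """Parse the JSON and confirm all required fields are filled out."""
    payload = json.loads(raw)  # raises an error if this isn't valid JSON
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    empty = [k for k in REQUIRED_FIELDS if payload[k] in (None, "", [])]
    if empty:
        raise ValueError(f"empty fields: {empty}")
    return payload

good = '{"student_id": "s123", "course_id": "MATH101", "activity_log": ["login"]}'
bad = '{"student_id": "s123", "course_id": ""}'
validate_request(good)
try:
    validate_request(bad)  # should be rejected before it is ever sent
except ValueError as err:
    print("rejected:", err)
```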
And then the process of getting the response back also has to be validated. In general, again, you have dynamic data, what they call in the data world CRUD: create, read, update and delete. Now, delete is CRUD's least favourite operation. You don't want to delete data, ever, if you can avoid it; that's another of these design decisions. Really, for traceability and for reliability, it's normal to mark a piece of data as deleted but not actually delete it. Note, of course, that this has implications for things like general data protection regulations, such as the European GDPR.
We're also checking for things like duplicate requests. I've had that happen to me in my own programs, where I write a little subroutine and then for some reason I'm calling it twice. I don't know why I'm calling it twice; I get the exact same answer back both times. Sometimes when you do that, though, you get compounded results rather than a replacement result, which of course breaks the system. Then there are missing requests: if your AI system isn't responding to every request that is sent to it, that's obviously a problem.
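A common guard against duplicate requests like that is an idempotency check, where the service remembers requests it has already answered. Here's a minimal sketch, with hypothetical names; the real model call is just a placeholder.

```python
import hashlib
import json

# Sketch of a simple idempotency guard: identical requests get the cached
# answer instead of being processed (and possibly recorded) twice.
_seen = {}

def handle_request(payload: dict) -> dict:
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in _seen:
        return _seen[key]               # duplicate: return the first answer
    result = {"prediction": "at_risk"}  # placeholder for the real model call
    _seen[key] = result
    return result

request = {"student_id": "s123", "course_id": "MATH101"}
first = handle_request(request)
second = handle_request(request)        # the same request sent twice by mistake
assert first is second
```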
And then, obviously, cross-browser and cross-platform functionality: does it work in different browsers? Does it work on mobile devices, and so on? All of these may sound really picky, but all of them play a role in how an analytics and AI system performs when it is actually used, and therefore in how it continues to collect and analyze data. An application that rolls out broken, for whatever reason, is going to produce inaccurate predictions, projections or categorizations, and these will be carried over into the eventual use of that application and will result in consequences that may be harmful.
The application itself requires common, everyday software application testing processes. Again, this is very well covered in the field, so there's no need to go into it in depth. From the V model we saw before: unit, integration, system and acceptance testing. Then there's also non-functional testing, for example performance (how fast is it?), security (which should be obvious), usability, and compatibility with other software and with various platforms. There's nothing more annoying, trust me, I know, than an AI application that won't run because you've got some other, incompatible piece of software installed on the system.
I mentioned usability testing, and this is going to apply not just to the actual system that produces the analytics but also to things like dashboards and other systems that present the data or the results of the analytical process. In usability it's very common to use what's called A/B testing. That's where you present two different versions: a control version of your user interface, which is generally what you're using now, and then a new version, a variation, and you compare the results over a period of time. There's a certain comfort with the existing control version, and that will be reflected in the testing, but over time you might see the new version getting a better response rate, whatever that means.
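Just to make the mechanics concrete, here's a minimal sketch of comparing two versions with a two-proportion z-test. The numbers are invented, and of course the choice of metric and of a significance threshold is itself one of those value-laden decisions.

```python
from math import sqrt
from statistics import NormalDist

# Minimal two-proportion z-test sketch for an A/B comparison (invented counts).
control_clicks, control_n = 120, 1000  # existing dashboard
variant_clicks, variant_n = 150, 1000  # new dashboard

p1, p2 = control_clicks / control_n, variant_clicks / variant_n
p_pool = (control_clicks + variant_clicks) / (control_n + variant_n)
se = sqrt(p_pool * (1 - p_pool) * (1 / control_n + 1 / variant_n))
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"control={p1:.1%} variant={p2:.1%} z={z:.2f} p={p_value:.3f}")
# A small p-value suggests the difference is unlikely to be chance, but whether
# "response rate" is the right thing to measure is a separate question.
```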
And again, you need to define what that better response rate means, and that would suggest a reason to change your interface. Multivariate testing is similar to A/B testing, except that you can test multiple variables.
What's interesting in all of this discussion of testing is how much of it depends on what you value as you enter into the testing process. We began, remember, with what the testing objectives were, and we can go all the way back to what the objectives of the analytics system itself are, and that's what tells us what we're going to be testing for. There are some common parameters, like how well it functions, whether there are coding mistakes, whether it's dropping data when it sends it back and forth, which are independent of any particular ethical perspective you may have (maybe some people like random software results; most people don't). But other considerations really do come into play. For example, your attitude toward the user might have a lot to say about how you're going to evaluate user acceptance testing. Who you think the stakeholders are is a significant factor. You might not care, for example, how students react to the application because they're not the ones paying for it; they will just have to adapt. You may care a lot, however, about how the instructors are able to use the application in order to learn about their students. These kinds of decisions come into play at all steps of the process.
Then there's the end report, which I mentioned before, the usability aspects, and so on. One of the best remarks I read in this automated testing guide is that if you're testing at the end-report stage, you've probably started your testing too late. By the time you get to your end report, most of your testing really needs to have already been done, and your report should be the presentation only of fully validated and fully tested analytics. Now, of course, there's the usability of the report itself, and there are going to be aesthetics which are a concern, and there are also ways of presenting data and statistics that are more or less misleading. Ethics can apply here, but it's not so much a matter of testing as it is a matter of ethical decisions about how you're going to present your analytics. Are you going to do that in an honest, forthright way, or, as is so often the case, are you going to do it in a way that serves your best interests?
I don't have a slide for this, unfortunately, but I think I can find it. Can I find it quickly, right in the middle of the presentation? I'll bet you I can. So let's do the hack. Okay, I'm going to go to my Pocket application, just pop into that quickly, and there it is, right on top. How about that?
So this is something called Simpson's paradox, and as the slide here says, it's a phenomenon in statistics where trends that appear in different groups of data disappear or even reverse when those groups are combined. Here we have data grouped into pink and blue, presumably women and men, although it could be anything pink and blue. You see this nice upward trend in each of them. But if I combine all of these into one data set and ignore the fact that some are pink and some are blue, I get a very clear downward line, almost perpendicular to the upward line that was actually described by the data.
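Here's a tiny numeric illustration of the same effect, with made-up numbers: each group trends upward, but the pooled data trends downward.

```python
import numpy as np

# Made-up data: two groups, each with a clear upward trend,
# but the blue group sits lower and further to the right.
pink_x, pink_y = np.array([1, 2, 3, 4]), np.array([6.0, 6.5, 7.0, 7.5])
blue_x, blue_y = np.array([5, 6, 7, 8]), np.array([2.0, 2.5, 3.0, 3.5])

def slope(x, y):
    return np.polyfit(x, y, 1)[0]  # slope of a least-squares line

print("pink slope:    ", slope(pink_x, pink_y))  # +0.5
print("blue slope:    ", slope(blue_x, blue_y))  # +0.5
print("combined slope:", slope(np.concatenate([pink_x, blue_x]),
                               np.concatenate([pink_y, blue_y])))  # about -0.64
```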
We see this in a document called A Nation at Risk: The Imperative for Educational Reform. Here are the scatter plots that were used to show that the education system is declining. But look what happens: right here we have two groups, and you can see it's actually not declining at all. It's actually improving quite a bit. But if you eliminate the distinction between the groups, while keeping the exact same data, you can make it look like it's declining. This is a concern because, as it says here, it was claimed that the data showed a decline, and that resulted in a whole new approach to education.
Here we have the cartoon discussion of what happened. Quote: "Yeah, yeah, I heard it all before. I sleep through class, I don't study and never do my homework. When the teacher merit pay bill passes, then we'll see whose F that is." End quote. And that's what happened. The report led to the creation of No Child Left Behind, which tied education funding and teacher evaluation to standardized test scores, and so on, all based on a very particular representation of the data.
So that's the sort of thing that can happen when you're doing data analysis and you're not careful to present the results of your data in a forthright and ethical manner. What's really significant here is that this can still happen, except now it's happening when it's being done by an AI algorithm. And if you're only noticing something like that by the time it gets to the end report, where the end report says something like "education is declining" and you know from your own observation that you shouldn't be reaching that conclusion, it's probably too late to fix the problem. The problem isn't in the report; the problem is way back when you segmented the data into one group instead of two. We did some slides on how the data might be segmented, how the data might be clustered, and how many clusters you decide to have with your data.
Well, that has a direct bearing on what your end report is going to say. Just to note: a lot of testing is currently done by hand, or manually. But at large volumes, which is what's going to be needed if analytics are going to be deployed to any great degree in the education system, what will be necessary is automated testing, and then the whole process starts again, right? Because you need to design your automated testing application, you need to be sure you know what your automated testing application is testing for, how you're collecting the data from your application, whether you're matching it properly, and so on. There are reams and reams of material on automated AI testing, and it's interesting that I haven't seen discussion on the part of AI ethics about the ethics of automated AI testing.
Presumably all of the same concerns apply, and more. What is it that these automated testing systems are going to test for? How is that decided? Where is the transparency in those sorts of decisions? Usually they're designed by software engineers for software engineers, and it almost wouldn't make sense to have public input into automated testing, because what's the public going to be able to say about it? And yet automated testing is what determines whether the actual software applications are passing the tests and are considered usable by the wider community. So there needs to be some consideration of the ethics of automated testing as well.
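To make the idea concrete, here's a minimal sketch of the kind of automated regression check that might run every time a model is retrained. The accuracy threshold, the metric and the names are all my own assumptions, and choosing them is exactly the sort of decision I'm talking about.

```python
# Sketch of an automated check that could run on every retrain (pytest style).
# The 90% accuracy threshold and the field names are illustrative choices.

def predict(record):
    """Stand-in for the real model call."""
    return "at_risk" if record["logins_per_week"] < 1 else "ok"

HELD_OUT = [
    {"logins_per_week": 0.5, "label": "at_risk"},
    {"logins_per_week": 4.0, "label": "ok"},
    {"logins_per_week": 0.8, "label": "at_risk"},
]

def test_model_meets_accuracy_threshold():
    correct = sum(predict(r) == r["label"] for r in HELD_OUT)
    assert correct / len(HELD_OUT) >= 0.90
```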
This, and related issues, is where standards for validation and transparency come in. I've thrown in a diagram from the IAB; of course there are many standards organizations, ISO, the Standards Council of Canada, and many more. The standards define how rigorous the testing needs to be. If the analytics is for a process where there is considerable risk to the individual, for example medical procedures, then the rigour will be higher; but if it's for textbook recommendation, it's hard to imagine that the standards would be as rigorous. Generally, there's the requirement, which is tested for by the standards body, that data is not shared along with identifiable data. As well, there's a need to be clear about the use of variables such as age, gender, primary payer (it's a U.S. health system being described here), inpatient utilization, blood pressure, et cetera.
It would be interesting if they did blood pressure tests for educational purposes, but that's the sort of thing you really want to be transparent about, right? If you are testing students' blood pressure before and after exams, say, and then using that in an analytics process, that is something that should be transparent. People should know that the data is being used, and maybe even how it is being used. And then, of course, in education as well as health care, because of the potential for conflict of interest (I love the way they put this in the Health Affairs document), "scientific peer review and independent validation are desirable."
I'm going to include under testing and application something like outcomes assessment. This could also be categorized under the heading of evaluation, which I'll talk about later on. There's a distinction to be made between the testing of the software, to make sure that it actually works the way it's supposed to work, and the evaluation of the software, which describes whether the software is doing the sort of thing that you hoped it would do. That latter question will be the subject of the next video; I'm focused more right now on whether the software does what it is designed to do. But it's nonetheless relevant to talk about outcomes, because part of what it is designed to do, especially in education, is to improve things like, say, learning outcomes, and the actual impact that it has on learning outcomes is a significant part of the testing of any AI or analytics solution in education. The reason I put this here is that, as we'll see, oftentimes the testing and the development of the software go hand in hand. We'll talk a bit about how that works.
What I want to say here is that there are various kinds of outcomes, and it's important that developers and implementers look at, and take into account, all of these different kinds of outcomes. The disruptive educator paper, where I got this particular diagram (or sorry, the Health Affairs document that I've been quoting throughout), talks about three types of outcomes, and I've translated them so that they're education outcomes. First of all, what they call hard endpoints: they talk about readmissions and relapses and so on; in education we'd talk about readmission, having to take the grade again, failures, exam failures, grades, things like that. Then there are secondary outcomes, such as care, trust, anxiety and activation. And then, third, provider-centred outcomes. These are outcomes related, in our case, to teachers, instructors, professors and so on: what their workflow ends up looking like, how satisfied they are with the product and with their work in general. These outcomes are assessed using various assessment methods, presumably by the learning analytics tool, and that then feeds into teaching and learning activities.
When doing outcomes assessment, it is arguable, and I would argue, that multiple outcomes should be considered. You see so many assessments of learning technology based on something simple like course grades or, in the world of MOOCs, completions: a very simple, very one-dimensional hard outcome, to use the taxonomy from the previous slide. But it's important, I think, to collect multiple outcomes, as the Health Affairs document urges, because each outcome might tell a different story. Each outcome presents a different picture of how the analytics or AI system is working in this particular educational environment. Maybe scores are up, but maybe people hate it, for example.
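As a rough sketch of what reporting multiple outcomes might look like, here's a hypothetical summary in Python that puts a hard endpoint, a secondary outcome and a provider-centred outcome side by side instead of a single grade-based number. The field names and the records are invented.

```python
from statistics import mean

# Hypothetical per-student records from a pilot deployment.
records = [
    {"passed": True,  "satisfaction": 2, "instructor_hours_saved": 0.5},
    {"passed": True,  "satisfaction": 3, "instructor_hours_saved": 1.0},
    {"passed": False, "satisfaction": 4, "instructor_hours_saved": 0.0},
]

summary = {
    "pass_rate": mean(r["passed"] for r in records),                # hard endpoint
    "mean_satisfaction": mean(r["satisfaction"] for r in records),  # secondary outcome
    "mean_hours_saved": mean(r["instructor_hours_saved"] for r in records),  # provider-centred
}
print(summary)  # scores may be up while satisfaction is down; each tells its own story
```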
It's interesting, indeed, that when we look at multiple outcomes, we can also look at multiple applications in multiple contexts. And the key question that comes up here is whether the AI model that is developed (we've talked about that) can be used in multiple contexts, or whether you need to rebuild the model each time you apply it in a new context. Obviously the second option is a lot more expensive and time consuming; it doesn't even seem like AI makes a whole lot of sense in that kind of context. And that pressures a lot of people, I think, to say, well, I'll just take this model that was developed here and I'll use it over there. Whether it can be reconfigured to local contexts impacts the reliability of the model and the utility of the model, and it raises questions as to whether it should even be used in these alternative contexts, and then, again, how you're making that decision, and on what basis you're deciding that the model is, as they say, transferable. In determining whether a model is transferable, again, it's important not just to measure a single outcome, but to look at multiple outcomes and to ensure that different student voices and instructor voices are heard. To the extent that you're taking in these considerations, you are making, I would argue, ethical decisions.
Here's a look at the different levels of assessment and types of assessment that can be made of students and technology in an educational environment. Now, this particular diagram is looking at student assessment, but we can apply it as well to learning analytics applications. We can measure the knowledge that's developed, the skills, the attitudes and values, the behaviours (anything on Bloom's taxonomy, for that matter), from various perspectives, whether of the individual or of the group, and whether from a teaching perspective or from an accountability perspective. In other words, to use the terminology of the field, formative assessment, where we're trying to inform and help the individuals or groups, or summative assessment, where we're trying to evaluate the individual or the group.
And that same sort of process takes place with educational software as well. The other thing about testing the software is the individual settings where the software is tested. It's very common to test software applications in a lab using artificial or generated data, and there are tools out there that will create endless reams of test data that you can use; these are mostly useful for scalability tests, performance tests, and so on. But when you're testing the model itself, you need to make a decision about how real the data is that you'll be testing it against: whether it's simple forms or shapes, or characteristic shapes, or actual real objects, as in this case (this would be for a machine vision type of thing, or something like that). Similarly, you're going to be making decisions about the fidelity or the accuracy of the testing environment for any analytics or AI application you test. It's tempting, isn't it, to want to test it all the time on real people in a real educational environment. But questions come up, right?
What about consent? Certainly in the GDPR world in Europe it's arguable that consent is going to be required. But the Health Affairs document, which talks about testing of AI solutions in the health context, suggests that, I quote, "it is unclear whether explicit consent to the use of personal data in predictive analytics is legally or ethically required." That seems like an odd conclusion to draw, but look at the reasoning.
First of all, patients might not even be aware that their physicians are using computerized decision aids; similarly, students might not be aware that their teachers are using computerized decision aids. Secondly, if they could opt out of those systems, that might give them priority over other people. You can imagine how a student opting out of a computerized decision aid that the teacher is using would then, as a result, get better and more personal scrutiny from the teacher, and that could give them an advantage. It could also give them a disadvantage; it's really hard to judge. And then, third, quoting them, the institutions under consideration should be required to explain whatever predictive analytics development and evaluation they are undertaking, and the likely benefits and risks. Now, that's kind of a generic thing, and it's kind of like saying, at least as I'm interpreting it: tell patients, or students in this case, that you're using predictive analytics, that they're under development, that there is testing taking place, and here are the risks, if any, and then you don't need to worry about getting their permission per se, because of these other factors.
If you think about it, it kind of makes sense. Or maybe it doesn't make sense. But here's the thing: I use in my work, including when I'm teaching, an application called Microsoft Word. Now, this application is constantly being tested by Microsoft, and that includes the occasional prompt: do you consent to sending test data back to Microsoft? Yeah, sure, because I want Microsoft to have a better product. Now, it would seem odd to me to then have to turn around and tell my students that Microsoft Word is being tested by the developers of Microsoft Word. The reaction would be something like, "Oh yeah, of course they are." They'd be surprised if it weren't. The same could be the case using Excel, the spreadsheet program. I might be recording their grades in it; I could be using it to do statistical calculations about trends in my class. Excel is constantly being tested by Microsoft, and it's being tested using real-world data. But again, it doesn't seem to follow that there's an obligation for me to inform my students that Microsoft is testing Excel. Of course they're testing Excel; they're always testing it. Again, we would be surprised if they weren't.
So the argument here, I think (and I'm not sure it's a bad one), is that testing of AI and analytics applications is a necessary and continuous process that is going to happen. We can say, in general, that the software we're using is undergoing testing and evaluation, and that includes when we're using it in this class, but it doesn't make sense to have an opt-out clause for that process. How do you opt out of software testing when the application itself is being tested anyhow? Food for thought. On the other hand (and there is another hand here),
a lot of software, and indeed a lot of research in general, takes place in conditions where the users of the research are directly implicated in the development of the research. There are phases of this development. First is something called knowledge translation, a term that was coined by the Canadian Institutes of Health Research, or CIHR, back in 2000. It can be defined as, I quote, "the exchange, synthesis and ethically sound application of knowledge within a complex system of interactions among researchers and users." The idea here is that you're taking your research (you want to be able to realize some benefit from the research; in this case it's AI and analytics research), you translate that into practice, in our case into classroom or online learning practice, and then you evaluate for the benefits, or presumably realize the benefits.
Now, that felt a bit one-way, where all the research was being done by the researchers and all the implementation was being done by the implementers (that's a great word, actually, "implementers"), and never the twain shall meet. People envisioned, and wanted to work toward, I think reasonably, a more interactive process, and that's where we get knowledge mobilization from. This is defined in a SSHRC document, from the Social Sciences and Humanities Research Council of Canada, as, and I quote, "activities relating to the production and use of research results, including knowledge synthesis, dissemination, transfer, exchange, and co-creation or co-production by researchers and knowledge users," which is a mouthful. So I added a quote from a University of Winnipeg document that makes it a bit clearer: it's "a term used to define the connection between academic research or creative works and organizations, people and government." The idea of knowledge mobilization is that the research, and the implementation of the research, is designed and conducted by people working in the application area and people working on the research side in conjunction with each other. They're working together, rather than one group doing the research and the other group doing the implementation.
What this does is make the development and deployment of artificial intelligence and analytics applications something that is done not only by AI developers and researchers but also by the people on whom the AI or analytics is intended to operate. This is especially the case if the knowledge mobilization actually includes the students on whom these applications would be used and on whom the evaluations would occur. It's kind of hard to get that exactly right, because students are a moving target: the analytics that you plan with grade 10 students, by the time you apply them, those grade 10 students are in grade 12 and you're applying them to a brand new group of grade 10 students. Maybe you can design a system that grows as the students grow. That would take a more coordinated project, I think, but it might be worth doing. Again, it's hard to weigh the different options, and the best decision probably varies from context to context. We would arguably know the right approach when we saw it; as usual, we probably wouldn't agree on that.
So, with respect to application, and especially with things like knowledge translation and knowledge mobilization in mind, let's look at some of the issues and decisions made in the application of AI and analytics in the actual classroom or online learning environment. The first kind of question that comes up is a question of access.
There are a couple of ways to draw this out and draw some of the implications. First of all, there's a risk that not everybody will benefit equally from the models. This is especially the case if it costs money to use them, and it's the sort of thing where, you know, advanced analytics would be applied at Ashbury College, where the rich kids go to school in Ottawa, as opposed to a township high school out in the country, which would be among the last to get these sorts of benefits. It's argued in the Health Affairs document that, quote, "as a matter of fairness, those who contribute most to developing a model, including the patients who contribute their data, should proportionally enjoy its benefits." That's a very particular definition of fairness, right? It's basically a definition of fairness along the lines of "whoever pays gets the benefits." It's definitely not fairness as defined along the lines of "from each according to their means, to each according to their needs." So, a different way of looking at things.
It also brings up the point about the impact of fees, costs and other factors on the model itself. Stop and think about that for a second. If the only people who can access your analytics model are rich people, then only rich people are feeding into the development of your analytical model, which means that your analytical model is going to be designed to meet the needs of rich people. And so even if poor people can access this model (and maybe they can't), the model won't have been designed to meet their needs. So access barriers can actually have an influence on the design of the AI or analytics application to begin with.
We don't necessarily see that so much on a school-versus-school or institution-versus-institution basis. But with the bulk of the work in the development of analytics and artificial intelligence being done in North America, Europe and China, people who are living and working in other areas of the world (I'm thinking especially of the rest of Asia, Africa and South America) are looking at the development of AI and analytics and asking themselves, or they should be asking themselves: are these models being developed in such a way that they can be adapted to our circumstances as well, or are we going to have to do the whole work of developing AI ourselves? And that's a global problem, because it puts them further behind in international development and does not narrow, if you will, the gap between rich and poor nations.
Also, the application of analytics has a lot to do with access and power, and again this can have an impact on how the development of AI and analytics plays out. Analytics gives you a really excellent view of the data, such that it's almost like you can do anything with it.
As Stacey Higginbotham said on This Week in Google last year, I quote, "when you have near omniscience, how you choose to apply that becomes a matter of importance," to which I would say: no kidding. Look how analytics has been applied in the field of crime prevention, for example, where we see not just the resources of the police but also the media and politicians and everyone else applied to solve the robbery of a well-off white woman, but not of a poor Black woman. And if you think that's not really a thing outside the United States, we can point to cases in Canada of the, if I may say, disgraceful treatment of murdered and missing Indigenous women in this country, which was for many years simply felt to be unimportant by police investigators. So access and power and equity and justice are all going to play into the application of AI. And because the application of AI is so integrated with the development and testing of AI, they feed right back into how we evaluate and how we improve our AI and analytics systems. And if we end up tweaking them more and more to meet the needs only of those who have access and power, then it's arguable that there will be ethical uses (or, sorry, unethical uses) of this technology in the future, and I'll talk a little bit about that in the next presentation.
With respect to application, there's also the human element, where we have to keep in mind that it's humans who are using all of this technology, it's humans who are applying artificial intelligence and analytics, and it's humans who are being described by it. As James Clay writes: "We must not forget the human element of data in analytics. It's not enough to deliver accurate analysis, predictions and visualizations; staff and students in universities and colleges need to be data literate to enable them to understand and act on that data. Appropriate and effective interventions will only be possible if staff and students are able to understand what is being presented to them and know what and how they could act as a result." End quote.
Now, we've seen this a lot in software deployment and subsequent evaluation, where a perfectly good application or a perfectly good piece of technology is dumped on a school with no instructions and no support. There have been stories, right, of the laptops sitting in cupboards, or the training application that never gets used. The same sort of thing could, and probably will, happen with learning analytics. And so when we're testing and evaluating learning analytics, it's important to take into account how these are being used, in the sense of, first of all, are they being used? And secondly, for the people using them, were they properly informed or trained, and were they properly supported in the use of these applications?
Now, in an ideal world, perfectly well-developed software wouldn't really require support. I can't remember the last time I had elevator training, for example. I go into the elevator, I press the button, it takes me there. Perfect, right? But there was a time when it was hard, and there was a time when we had elevator operators. I actually remember seeing elevator operators; you never see them anymore. Similarly, ideally, you wouldn't need any support or help with an AI application, but this is all new for everyone. So if we're going to evaluate it, we need to evaluate it in conditions which are conducive to the successful use of the application. And again, what counts as support, how much support you need, what kind of support you need: the answers to all of these questions are going to have ethical implications, because they speak to how we expect the analytics to be used. And how we expect it to be used is a major component of whether or not its use is ethical. If the people who are helping instructors are not helping them toward ethical use of analytics, but instead toward unethical use, whatever that is, then our assessment of whether the technology is ethical or unethical will be adjusted or altered or changed.
Implementation errors for AI and analytics may be caused by zeal (that's a nice word; it's the Health Affairs document I've been referring to that uses that particular term) or by pressure to cut costs, and they may result from poorly constructed workflows. This is kind of a variation on garbage in, garbage out, except not quite; it's more that garbage is still garbage no matter... no, that's a bad phrasing, but you get the idea, right? If you haven't adapted your workflows to take advantage of the data and the analytics, then it doesn't matter what you do with the data and the analytics, you've still got an issue. And now you can't evaluate the data and the analytics without taking into consideration whether your workflows were designed to take advantage of them. There may be insufficient consideration of client preferences; there may be inadequate checks and balances on machine decision-making. And there may be cases where AIs designed for one purpose are improperly used for another purpose.
And it's all still very, as they say, brittle. This is especially the case for full-fledged, what they call end-to-end, techniques, which eliminate all the levels of human processing. This would be, for the most part, unsupervised learning: for example, a speech-to-text system that learns to map directly from sound waveforms through to letter strings, with no intervention in between. These are, as Mark Liberman says, especially brittle. So right now, and for the foreseeable future, these applications, no matter how they're designed, are going to be limited to specific domains. And that means that errors in how they're implemented are going to have a significant impact on how we test them and how they fare with respect to those tests.
Finally, and this goes back to what James Clay says about humans, there's always the implication of choice in the application of any system, and when we're testing a system this continues to apply. Now, the Health Affairs document says, and I think quite rightly, that to help consumers of the model, both patients and providers, the model must present them with choices. There are different kinds of choices: for example, what data to consider, what options would be considered unacceptable, perhaps demographic data that you want to input or not input, and so on. On the other hand, not all choices are free choices, and the term "choice architecture" refers specifically to a concept developed by Cass Sunstein and Richard Thaler, which says, basically, that decision-making is impacted by how the options are shown. Again, we could do a whole presentation on that, and I won't, but it's important to take into account how we've presented the options for how people make choices, which determines how well the artificial intelligence or analytics application works for them.
The results of that will show up in the testing that we do, especially testing at the higher end, with respect to systems testing and user satisfaction.
So that's what we have to say for now on testing and application of AI and analytics. Again, it's a broad area, and you've seen how many sorts of decisions are taken. A lot of what counts as good testing and good implementation has already been studied and reported on in other domains of software development, and in research, application and development generally. AI does bring in its own considerations and its own wrinkles, especially with respect to data and especially with respect to complexity, but also related to the brittleness of the technology and to its context sensitivity. It's not the sort of thing that you can just move from place to place the way you can move a word processor or a calculator or a video game. Nonetheless, even within those constraints, the way you approach testing, the things that you decide are worth testing, the outcomes that you want to measure with your testing process, what you expect to achieve as, you know, software functioning as it should: all of these have ethical implications.
How you see your AI and how you test for it is an ethical perspective, and it's not one where it's clearly the case that any of the ethical theories or any of the ethical codes that we've talked about applies directly. In fact, as I commented, most of the discussion that I've read on the ethics of artificial intelligence and analytics is pretty much silent on the testing process. It will talk about applying AI models out of scope or out of domain, but about the overall software testing methodology we don't hear a whole lot. As a result, I think there's a lot being assumed here about what constitutes ethical development, ethical testing and ethical application of artificial intelligence that perhaps could stand to be scrutinized more closely by ethicists, and that certainly should be taken into account in a complete and comprehensive look at the ethics of analytics and AI.
That's it for this video. I'll be back with another one on the deployment of AI, the evaluation of it, and the use of these applications out there in the real world. For now, I'm Stephen Downes. Talk to you next time.