Artificial intelligence, the most human of intelligences

By Karina Gibert, director of the Intelligent Data Science & Artificial Intelligence research center, and Javi Creus, founder of Ideas for Change

(Original publication in Spanish)

Algorithms based on Artificial Intelligence (AI) have multiple applications and are currently capable of guessing who likes whom, what music you love, how much water a crop needs, when a traffic light should turn red and how a business should adjust its prices, among many other things we do every day that involve the often invisible intervention of artificial intelligence.

TikTok's proposal represents an important paradigm shift that other social networks, such as Meta's, will follow: it moves us from a content recommendation system based on individual contact networks to one governed by an algorithm that proposes content based on continuous testing to maximize time spent on the service.

Currently, artificial intelligence can recommend at what temperature to program the combustion of waste based on its composition, what combination of molecules can save you from a deadly disease or, as in the DGT campaign for Easter 2022, anticipate who was going to die on the road during that holiday period.

In 2020 the Catalan Data Protection Authority published a comprehensive report reviewing algorithms that make automated decisions in different contexts in Catalonia [ACPD 2020], revealing countless systems in use in areas such as health, justice, education, mobility, banking, commerce, work, cybersecurity, communication and society at large.

In legal matters, Artificial Intelligence algorithms could complement or even replace (although they should not) regulations that can be consulted, are published in the Official Gazette and can be interpreted differently by the various parties, or decision-making bodies whose rulings are in most cases appealable.

In the field of law [Bahena 2012], beyond the intelligent search engines for documents relevant to the resolution of legal cases, which use AI to find legal precedents based on keywords, there are numerous intelligent systems that support the drafting of claims and responses, and even the drafting of judgments and their subsequent argumentation. These take forms of AI ranging from classic rule-based reasoning systems (such as GetAid, used by the Australian government to determine eligibility for free legal advice in criminal and family matters) [Zeleznikov 2022] to more advanced hybrid architectures that combine, for example, automated reasoning with artificial neural networks (such as Split-Up, which proposes the division of assets and custody of children in separation and divorce cases before the Australian courts) [Zeleznikov 2004].

A close example is the request made by the Civio Foundation for access to the source code of the BOSCO application, developed by the government and used by electricity companies to determine whether a user in a vulnerable situation can receive the "social bonus", that is, discounts on their energy bill. Despite having verified that many eligible applicants did not receive the aid, the claim was denied at first instance, against the opinion of the Transparency Council's report, on the grounds that disclosure posed a danger to public security (!). Civio has appealed this decision, which leaves citizens unprotected against automated decisions that do not respect their rights.

It is important to note that the European Commission has taken determined action to define a reference framework for the development of safe, ethical and trustworthy AI, and is in the process of drafting the European AI Act [CE AI act 2021], which attempts to ensure that AI in Europe is oriented towards the common good, puts the person at the center and respects people's fundamental rights, unlike the Chinese or North American visions, where, respectively, the government controls the data (surveillance and social credit systems) or the company that owns the application collects it (and monetizes it however it wants).

Indeed, as early as 2018, in a pioneering move, the EC elaborated its ethical recommendations for safe and reliable AI (Trustworthy AI, TWAI) [CE ethics 2018], the first axis of which is devoted to Human Agency and Human Oversight, in a clear attempt to prevent AI-based applications from making decisions autonomously. In other words, the place that Europe proposes to reserve for AI-based applications is that of an intelligent assistant to the user, who effectively makes the decision, so that there is always human validation of the recommendation, prediction or diagnosis suggested by the algorithm.

Meanwhile, the Charters of Digital Rights (Catalan [CDDcat 2019] and Spanish [CDDSpain 2021]) recognize the right of people to be informed of the intervention of an algorithm in any decision that affects them, as well as the right to know by what criteria the algorithm evaluated them and how the result of that evaluation was produced. This should make it possible to detect possible biases in the operation of the algorithm, which may eventually increase social inequalities or limit people's rights.

Bias and explainability

Among many others, there are known, and quite scandalous, cases of gender discrimination by AI-based algorithms that evaluate funding applications through different channels. Without going any further, AppleCard, the credit card launched by Apple in 2019, offered up to 20 times more credit and longer payment terms to men than to women, even under equal conditions [Telford 2019]. The granting of loans to entrepreneurs has also produced scandalous comparative grievances for applications headed by women in many countries, such as Chile [Montoya 2020], Turkey [Brock 2021] or even Italy [Alesina 2013], and although the problem was already detected in 2013, cases were still appearing in 2021.

Preventing this type of situation has a direct impact on the type of algorithm that can be used to assist in decisions that affect people: they must be algorithms capable of explaining or arguing why they make one recommendation and not another, or why they make a certain recommendation or prediction. There is in fact a relatively new branch of AI, known as explainable AI, that has gained great momentum and deals with these issues. For the moment, it is difficult to get deep learning methods, which are making very good and very fast predictions about very complex realities, to justify those predictions adequately. This in fact happens with all black-box algorithms, including not only deep learning but all those based on artificial neural networks or evolutionary computation.
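To make the idea tangible, one widely used workaround in explainable AI is to approximate a black-box model with an interpretable surrogate whose rules a human can read. The following is a minimal sketch in Python with scikit-learn; the synthetic data, model choice and tree depth are our own illustrative assumptions, not a reference implementation:

```python
# Minimal sketch: explaining a black-box model via an interpretable
# surrogate decision tree. Data and feature names are invented.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Train an opaque "black-box" model on synthetic data
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Train a shallow, human-readable tree to imitate the black box's decisions
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# The surrogate's rules give an approximate explanation of the black box
print(export_text(surrogate, feature_names=[f"feature_{i}" for i in range(5)]))
```

Note that the surrogate only mimics the black box, so its rules are an approximation of the real decision logic; this is precisely why the explainability of deep models remains an open problem.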

OpenAI's ChatGPT application, opened to the general public on November 30, 2022, has an amazing ability to write all kinds of texts or lines of code on any topic it is asked about; the barriers to the universalization of the electric car, the drafting of a standard rental contract or the code of an application that makes an avatar wave only when there is a human in front of the screen are some of the topics on which it can provide answers.

In all the tests carried out by the authors and many other users, the speed and credibility of the answers offered by the application are surprising, although there are also omissions of content or of relevant arguments in specific fields that make it difficult, in any case, to consider the results fully reliable.

ChatGPT has been trained on millions of documents published up to the year 2021, but its creators have not revealed which ones. The millions of people experimenting with the application at this moment would like to know what "universe" of knowledge was used, in order to interpret the possible biases in the results it offers.

Too promising to pass up

It seems clear that the ability of algorithms to encompass complexity and approach a desired objectivity, but above all their ability to generate economies of scale in knowledge-related activities, constitutes an opportunity too attractive for progress to pass up, despite the risks already known and those we will discover in the coming years.

Only through algorithmic organization can the Hong Kong subway organize, in the most efficient way, the ten thousand workers who every night carry out the 2,600 repair and maintenance tasks needed to run a public transport service with extremely high punctuality (99.9% since 2014!), generating savings of two days of repairs per week and about $800,000 per year [Chun 2005] [Hodson 2014]. The Barcelona metro has had an AI-based system since December 2020 that controls the occupancy of platforms and trains, opening or closing access to them to create the safest conditions for passengers from the point of view of the spread of the virus [Nac 2020].

When an algorithm is codified in a programming language understandable to machines, it becomes a tool that can scale its impact to a dimension not achievable through communication between humans. Thus, for example, updating the software of a fleet of connected vehicles or robots makes it possible to incorporate into each of them the improvements resulting from the learning of the whole. In parallel, each update of the algorithms that manage our search or navigation tools opens and closes opportunities for businesses and citizens to discover each other.

In reality, we are witnessing the emerging development of a powerful technology, Artificial Intelligence, which, like everything new, generates certain fears, and good information can help dispel doubts. In this sense, within the Catalan Artificial Intelligence Strategy [catalonia.ai 2021], the Government of Catalonia launched in 2021 an introductory course on Artificial Intelligence aimed at providing basic training to citizens in general. The course, designed by the UPC research center IDEAI (Intelligent Data Science and Artificial Intelligence research center), is free and can be accessed at https://ciutadanIA.cat.

Like all technologies, AI can present more or less ethical uses, with greater or lesser risk, and the challenge today is to find the delimitation of uses that allows us to take advantage of all its benefits without suffering its harms.

If we look back in history, fire and the knife are technologies that radically changed the course of humanity when they appeared. Both, like AI and so many others, have two faces. Fire allowed us to warm ourselves and survive frosts, and also to cook, but if the necessary precautions are not taken it can cause burns and blazes that end in great natural disasters. The knife allowed new ways of handling food and the shaping of new tools, contributing to the development of civilization, but it is also used to attack people. Proof of this is that all cultures have developed norms penalizing the undesirable uses of these technologies. Yet despite these risks, it occurs to no one to prohibit the manufacture and use of knives to protect us from their dangers. Precisely the same should happen with Artificial Intelligence: it is rather a question of identifying the risks and regulating its uses to allow a development beneficial to all.

Trust, the limits of machines and the power of data

If machine learning makes it possible to optimize a function by applying the computational power, speed and learning capacity of machines to a mass of data, then it is worth evaluating the dimensions in which machines are reliable and the situations in which the mass of data is appropriate.

So the question is: who can we trust with our “mass of data”? Who can we trust to control the machines that work for us?

The Edelman 2022 report indicates that, globally, we are at the lowest point of trust in companies, NGOs, institutions and the media since the series began in the year 2000. The succession of financial crises, institutional corruption, fake news and fake videos, and Covid-19 have installed mistrust as the default sentiment towards institutions. The entire Dutch government was forced to resign on January 8, 2021, when it was shown that the AI-based SyRI system it had been using since 2014 to identify welfare fraud suffered from a bias that singled out migrant families from vulnerable districts, wrongfully prosecuting 26,000 families and forcing them to repay benefits unfairly. Hundreds of families suffered this unjust institutional harassment, which caused depression, financial ruin, stigmatization, suicides or imprisonment by mistake, because nobody critically reviewed the algorithm's recommendations [Lazcoz 2022].

In the digital sphere, Tim Berners-Lee, creator of the WWW, criticizes how his own creation, initially intended as the greatest tool for the democratization of knowledge in history, has degenerated into an instrument of division and inequality through the capture of information and attention and the control of behavior, and he is trying to develop a better alternative.

On the other hand, the development of new technologies has resulted in a concentration of algorithms and data in the hands of a few global supercorporations that influence ever more aspects of our lives, with the risks this entails.

Guaranteeing that algorithms do not incorporate biases by design is one of the biggest challenges we must face. In reality, the design of an algorithm rests on the developer's understanding of the real process being represented computationally, and to acquire this understanding it is necessary to interact with an expert in that process and capture the relevant aspects to be taken into account in the implementation.

In this transmission from the application-domain expert to the computer specialist, implicit knowledge plays nasty tricks. The domain expert is not aware of having it, nor of using it in their reasoning, decisions and actions; they do not realize that they are leaving it out of their description of the world and failing to transmit it to their interlocutor; and the developer is not conscious of making hypotheses (often dangerously simplifying ones) that guide the implementation and can bias the behavior of the algorithm.

Much of this implicit knowledge has a situated cultural component: many social values are valid in a specific society or situation but are not universalizable, and some criteria are activated in humans only in certain exceptional situations, remaining in the unconscious the rest of the time; they therefore cannot be verbalized, much less passed on to the algorithm.

Machines can apply computational power, speed and throughput, but they are not sensitive to context unless given a good formal description of it, they are incapable of handling exceptions unless implemented to take them into account, and they cannot deal with unforeseen events, unlike humans. This means that biased behaviors may appear in algorithms that deal with very complex phenomena.

Such biases are not always intentional. Often we do not analyze carefully enough the scenarios for which the algorithms are built. Defining criteria for the majority, for the general case, is almost always a bad idea, because it is in the exception, in the harm done to the minority, where injustices appear.

Greater use of lateral thinking and a more thorough analysis of possible exception scenarios are needed to reduce bias in the logic of algorithms. Certainly, having diverse teams facilitates this task, since combining different perspectives on the same problem provides complementary visions that naturally reduce biases in the logic, but also biases in the data.

On the other hand, algorithms are fed by data, and we have lost the good habit of using classical sampling theory and design of experiments to guarantee that the data used to train an AI correctly represents the population under study and does not carry biases that corrupt the predictions and recommendations of the resulting systems.
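As a simple illustration of that old habit, here is a minimal sketch of stratified sampling, which keeps the proportions of a sensitive attribute roughly equal between training and test sets. The applicant table, column names and values are all hypothetical:

```python
# Minimal sketch: stratified sampling so a training set approximately
# preserves the proportions of a sensitive attribute. Columns invented.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical applicant records with a sensitive attribute ("gender")
data = pd.DataFrame({
    "income":  [28, 54, 33, 71, 45, 39, 62, 50, 36, 48],
    "gender":  ["F", "M", "F", "M", "F", "M", "M", "F", "F", "M"],
    "granted": [0, 1, 0, 1, 1, 0, 1, 1, 0, 1],
})

# Stratifying on the sensitive attribute keeps its proportions roughly
# the same in both splits, avoiding one obvious source of sampling bias
train, test = train_test_split(
    data, test_size=0.3, stratify=data["gender"], random_state=0
)
print(train["gender"].value_counts(normalize=True))
print(test["gender"].value_counts(normalize=True))
```

This addresses only representativeness of the split, of course; it cannot fix a population that was badly sampled in the first place.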

As Kai-Fu Lee explains in his book “AI Superpowers”, the availability of data is more relevant than the quality of the algorithm. For example, algorithms for playing chess had existed since 1983, but it was not until 1997 that Deep Blue beat Kasparov, just six years after a database of 700,000 games between masters was published in 1991. In a world where the basic algorithms are public, it is the availability of data that determines the advantage.

However, data is no longer reduced to the numbers and measurable magnitudes that traditional statistics has been analyzing for so many years. Voice, images, videos, documents, tweets, opinions, our vital signs or the values of virtual currencies are today extremely valuable sources from which to extract relevant information in all directions, and they constitute a new generation of complex data subject to intensive exploitation by artificial intelligence. Video games, virtual simulations and the much-promised metaverse are all built from data; they are computational metaphors for real or imaginary worlds and, as Beau Cronin states, may be preferable to the real world for those who do not enjoy “reality privilege”. It is shocking that today 50% of young people already consider that they have more life opportunities in the online environment than in the real world.

Decentralized or distributed data architectures and emerging federated data science technologies are part of an intense debate on how to develop data policies, a debate that is not only about the supporting architectures where data is housed, but also about how data is produced and shared, the policies on openness and possession of data, the ownership and business models (commercial or institutional?) and the licenses of use. Some argue that if the production of the data is distributed - Google's "facelift" facial recognition algorithm is based on photographs of 82,000 people - its ownership should also be distributed. In some cases, the courts have even forced the destruction of algorithms generated through deceptive data collection, such as the Kurbo application, which captured data from eight-year-old children without the knowledge of their guardians.

Challenges and proposals

There is no longer any doubt that human activity modifies the conditions of life on the planet and that it has done so rapidly over the last two centuries, to the point of making us aware of the climatic and social emergency in which we live.

Our collective challenge is now life itself: how to generate decent living conditions for the 8 billion people who inhabit the planet, and how to do so without our survival threatening the quantity, quality and diversity of life of other living beings or of future generations.

We cannot do without Artificial Intelligence as a tool to address the complexity, simultaneity and scale of these challenges, but as we have argued before, neither can we let what is technically possible, although not necessarily desirable, guide us in its development.

AI is a technology created by humans and therefore includes many of the characteristics of our species, including the possibility of being wrong. It may acquire prejudices and preferences when trained with biased data, or when its creators define criteria that are not fair or do not consider relevant cases, and this can contribute to increasing social inequalities or injustices. For this reason, it is essential to refine the methodologies for constructing training data, designing algorithms and validating them in order to guarantee a solid and reliable AI, as well as to develop a regulatory and legal framework that supports the sector and gives citizens certainty.

We believe that it is possible to formulate a new framework for the development of Artificial Intelligence at the service of life if we combine four interdependent areas of action: technological, methodological, legal and governance. We detail each of them below.

Technology area: Decentralized technologies

If, as we have seen, the temptation of social or commercial control comes from the centralization of data, then it is in our interest to promote technologies that, like federated learning systems, make it possible to extract knowledge from databases distributed across different locations, distributing the execution of the algorithms to be optimized to each of these sites and then drawing conclusions of general validity after several iterations.
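A minimal sketch of the idea, assuming the classic federated averaging scheme on a linear model with two synthetic sites (plain NumPy; everything here is illustrative, not a production protocol):

```python
# Minimal sketch of federated averaging (FedAvg) on a linear model.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each site holds its own local data; raw records never leave the site
sites = []
for _ in range(2):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    sites.append((X, y))

w = np.zeros(2)  # shared global model
for _round in range(50):
    local_ws = []
    for X, y in sites:
        w_local = w.copy()
        for _ in range(5):  # a few local gradient steps per round
            grad = 2 * X.T @ (X @ w_local - y) / len(y)
            w_local -= 0.05 * grad
        local_ws.append(w_local)
    # Only model parameters are shared and averaged, never the data
    w = np.mean(local_ws, axis=0)

print(w)  # approaches [2, -1] without centralizing any records
```

Only the model parameters travel between the sites and the coordinator; the records themselves stay where they were produced, which is precisely the privacy property that motivates the approach.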

If the massive deployment of the Internet of Things is to help us manage the complexity and simultaneity of these challenges in each local context, we must also drive advances in edge computing that enable an autonomous, situated response at all times, without the need to transfer the data to a centralized system for processing. An example is the article published in Nature on September 15, 2021, describing an algorithm that predicted with 92% reliability the oxygen needs of Covid patients from vital signs and chest X-rays of patients from 20 hospitals, without centralizing all the data in a single database.

The blockchain can also help to accredit and validate the interactions between distributed agents for the generation of value, thus facilitating the development of complex systems oriented towards shared objectives.

Methodological area: Knowledge-based/data-driven hybrid methodologies with expert assistance

Since its birth, AI has gone through several stages. In the initial one, the focus was on expert knowledge and knowledge-based systems; given their limitations, a paradigm shift took place towards machine learning, where the focus was placed on data and inductive learning processes.

After almost forty years of machine learning and data-driven models, we have verified that not everything can be represented through data, and the purely data-driven approach also presents limitations for the ambitious objectives proposed for the use of AI.

It seems reasonable to move to a third paradigm, that of hybrid Artificial Intelligence - combining knowledge-based components with data-based components in a cooperative way - which preserves the benefits of both approaches while mutually mitigating their limitations; a paradigm where the data reaches where the implicit, non-formulable knowledge of the experts does not, and human knowledge provides the context that cannot be represented in the data.
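What such a hybrid can look like in miniature: a data-driven model proposes a decision, and an explicit rule layer written with the domain expert constrains or overrides it. The domain, rules and thresholds below are invented purely for illustration:

```python
# Minimal sketch of a hybrid architecture: a data-driven model proposes,
# and a knowledge-based rule layer constrains or overrides it.
from dataclasses import dataclass

@dataclass
class Case:
    predicted_risk: float   # output of a data-driven model (0..1)
    age: int
    has_legal_guardian: bool

def knowledge_layer(case: Case) -> str:
    """Expert rules encode context the training data cannot represent."""
    if case.age < 18 and not case.has_legal_guardian:
        return "refer_to_human"    # exception the data never covered
    if case.predicted_risk > 0.9:
        return "refer_to_human"    # high stakes: require human validation
    return "accept" if case.predicted_risk < 0.5 else "review"

print(knowledge_layer(Case(predicted_risk=0.3, age=15, has_legal_guardian=False)))
# -> refer_to_human: the rule layer overrides the low-risk prediction
```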

The third fundamental ingredient is intense collaboration with the domain expert throughout the entire process of design, construction, selection of relevant data, data preparation, evaluation, validation and supervision of the AI-based system; especially in the contextualization and interpretation of results and in the supervision of the recommendations given by the system once in production (human agency).

The scheme that includes the "human in the loop" is the only one that guarantees that the data is relevant, that its use is justified and that what is done with it makes sense, in addition to being the only one that guarantees the correct contextualization of the results obtained.

Of course, it will be essential to rely on experts free of ulterior motives, and in this sense it is valuable to have diverse teams of specialists who provide consensual knowledge of the field of application, leaving personal or biased views aside. This, however, carries neither more nor less risk than the absolute need for the data that feeds the system to be of the highest quality, a guarantee that also falls to the same human experts involved in the process.

Legal area: Legal and regulatory framework for data and algorithms

As has happened in other areas, the European Union is leading the development of a legal and regulatory framework governing the use of data and algorithms, respecting the rights of citizens and seeking an adequate balance between the common good, economic competitiveness and social progress.

The general philosophy of said regulation, which states, regions and municipalities are developing in parallel, is to avoid scenarios in which data and algorithms are used inappropriately for commercial or social control of citizens.

In addition to binding legislative initiatives, groups of citizens and professionals of various kinds are also promoting initiatives that aim to guarantee the proper use of data and a controlled application of algorithms.

Thus, for example, Salus.coop, the citizen data cooperative for health research, designed its data use licenses together with citizens through the TRIEM initiative, which presented them with various scenarios in which their data was requested and asked for their acceptance or rejection. Data is not simply “content” that can be shared under Creative Commons licenses, and each scenario detailed (i) who was requesting the data, (ii) what the research was about, (iii) how the research results would be shared and (iv) what risk of re-identification was assumed by those who shared their data.

In parallel, groups of experts, professionals and governments are promoting seals and certifications, such as the PIO ethical seal of the Artificial Intelligence Ethics Observatory of Catalonia (OEIAC) [PIO 2022], which certify the explainability of an algorithm, the fit of its recommendations to the desired objectives, or the representativeness of the training data with respect to the population affected by its application. On December 23, 2022, Barcelona City Council brought into force the Protocol for the definition of working methodologies and protocols for the implementation of algorithmic systems, approved by the Government Commission on December 15 as part of the actions provided for in the Government Measure of the Municipal Strategy on Algorithms and Data for the Ethical Promotion of Artificial Intelligence. This protocol, following European guidelines, establishes, for each step of the life cycle of a City Council ICT service based on artificial intelligence, the studies, controls and strategies to be carried out [Protocol IA-BCN 2022].

In the corporate sphere, a balance must be sought between the level of transparency and openness of the criteria used by an algorithm and the company's own trade secrets or competitive advantage, which guarantee its survival and development in a global market. In this context, it seems socially necessary to make the decision criteria explicit, even though opening the code itself to examination is more difficult.

In the same vein, backrunning policies (inverse execution) should be established, running the algorithms over databases representative of the population they are going to affect in order to objectively validate the absence of bias and the social quality of the resulting recommendations.
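One way to read this in practice is to re-run the decision algorithm over a representative dataset and compare outcome rates across groups; the sketch below audits a hypothetical decision function for a demographic parity gap (the function, the columns and the data are all invented, and this is only one of many possible fairness checks):

```python
# Minimal sketch: re-running a decision algorithm over a representative
# dataset and auditing its outcomes per group (demographic parity gap).
import pandas as pd

def decision_algorithm(row) -> int:
    """Stand-in for the algorithm under audit."""
    return int(row["income"] > 40)

population = pd.DataFrame({
    "income": [28, 54, 33, 71, 45, 39, 62, 50],
    "group":  ["A", "B", "A", "B", "A", "A", "B", "B"],
})

population["outcome"] = population.apply(decision_algorithm, axis=1)
rates = population.groupby("group")["outcome"].mean()
print(rates)
print("demographic parity gap:", rates.max() - rates.min())
```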

Finally, it is essential to inform citizens of those situations in which an AI is making decisions relevant to their lives, as well as to implement the right to know the criteria used in the decision-making and the right to turn to a qualified human interlocutor to attend to their claims. The charters of digital rights already applicable or in the process of being drafted in various spheres, including the Catalan and the Spanish ones, also formulate the right of citizens to refuse to have their data used for undesired purposes, even for personal data already in the possession of the administrations (opt out).

Governance area: Participatory governance

The governance of data, the raw material that feeds Artificial Intelligence, is essential for defining its ultimate goals. As we have observed, corporate or institutional governance of centralized masses of data can jeopardize or limit fundamental citizen rights, or divert the potential of their use into conflict with the general interest or with the socially relevant challenges of each moment and society.

Faced with the legitimate fears of citizens about the misuse of their data at the individual level, a set of new social institutions has emerged around the world that implement the collective governance of data for the common good: the so-called "data trusts".

Data trusts currently adopt a multitude of legal forms: public, private or mixed foundations; data cooperatives or unions; or even DAOs (decentralized autonomous organizations) based on a set of rules that operate on the blockchain.

In all of them, the objective is to maximize the collective potential of data through participatory and transparent governance mechanisms that guarantee its use only for the purposes that citizens, or the institutions that manage data on their behalf, have expressly endorsed, and not for others.
