There is a more philosophical way to frame this question: Can a robot get to know you better, without that knowledge being considered personal information about you?
The problem machine learning poses in the GDPR age is just one of the many afterthoughts that have surfaced since the law went into effect. Digital services that tailor the content they show each user rely on algorithms that feed on behavioral data to “improve” their “feeds.” By design, the sophisticated automation behind these algorithms can manipulate and make use of user data in ways that engineers did not intend, did not predict and, potentially more concerning, cannot inspect. The inability to explain what happens with user data is behind some of the lawsuits in the torrent that ensued minutes after GDPR became enforceable.
Ask a machine learning engineer and, chances are, they cannot imagine how algorithms could be built so that their use of personal data is “auditable.” A few may even scoff at the idea as tantamount to hindering the march of progress. This is not a new sentiment; it tends to surface whenever new regulation intervenes in innovative practices, no matter the industry. But at the end of the day, examples of compliant machine learning continue to appear (Moodle conspicuously included) that help us answer definitively: while a large segment of machine learning practice is likely illegal, machine learning is not by definition non-compliant. There are algorithm development approaches that let the robots get closer to you without compromising your privacy.
GDPR-compliant Machine Learning and the Moodle way
The lengths Moodle has gone to in order to offer GDPR compliance are hard to find in other systems, especially for-profit ones, which suggests both a divide between users’ and corporations’ best interests and the importance of governance and enforcement. Let’s take a look at the ways a company could shift its practices around machine learning algorithms, and how Moodle is approaching them. It is worth mentioning that GDPR-compliant Moodle and the Privacy API are available only for Moodle 3.5, 3.4.3, 3.3.6 and newer.
Auditable algorithms. This should be a fair and straightforward resolution, and it is one Moodle has championed with the creation of the Privacy API. The Privacy API is akin to a special funnel through which all data considered personal must be sourced. As a recent screencast explained, the Privacy API keeps track of every instance of personal data access at a high level of granularity. Furthermore, the API requires that a plugin make retrieval and removal options available before it is allowed access. Failure to follow through would render the plugin, and consequently any Moodle site that has it installed, uncertifiable. In all fairness, this new process exerts a considerable burden on computing resources compared with non-compliant access, and Moodle HQ developers recommend using the API as sparingly as possible.
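To make the funnel idea concrete, here is a minimal Python sketch of an auditable personal-data gateway: every read is logged, and export and deletion are first-class operations a consumer cannot skip. This is a conceptual illustration only; Moodle’s actual Privacy API is a set of PHP interfaces, and the class and method names below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class PersonalDataFunnel:
    """Single gateway for personal data (hypothetical sketch, not Moodle's API):
    every read is logged, and export/delete are mandatory operations."""
    store: dict[str, dict[str, Any]] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def _log(self, action: str, user_id: str, accessor: str) -> None:
        self.audit_log.append({
            "action": action, "user": user_id, "accessor": accessor,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def read(self, user_id: str, accessor: str) -> dict[str, Any]:
        self._log("read", user_id, accessor)   # every access is traceable
        return dict(self.store.get(user_id, {}))

    def delete_user_data(self, user_id: str) -> None:
        self._log("delete", user_id, "subject-request")
        self.store.pop(user_id, None)

    def export_user_data(self, user_id: str) -> dict[str, Any]:
        self._log("export", user_id, "subject-request")
        return dict(self.store.get(user_id, {}))

funnel = PersonalDataFunnel(store={"alice": {"clicks": 42}})
funnel.read("alice", accessor="recommender-model")
funnel.delete_user_data("alice")
assert funnel.export_user_data("alice") == {}  # data gone, every step logged
```

A plugin that bypassed the funnel would leave no audit trail, which is precisely the property certification would check for.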
Encrypt the algorithm. Encrypting learning data should also be an obvious practice, were it not for the fact that, pending replication of recent developments, encryption has been shown to hinder how effectively the machine learns. A less satisfying route would be to allow unauditable machine learning, but ensure that at no point in the process does any individual have access to raw data (nor users to the model). This would make algorithms even more obscure, and potentially near-impossible to test, but it could get the job of protecting data done, and it is relatively easy to implement. If poor practices are too widespread to enforce, authorities might consider a middle-ground solution such as this one. In Moodle, this is highly impractical given the open source nature of the machine learning engine, but individual organizations might be tempted to take this route in their custom developments, which the LMS could not impede, although there are no known cases yet.
The “pothole” conundrum. Suppose you drive down a road day after day, and over time a pothole starts to form, a part of which you are responsible for, however small. Do you own that pothole? In machine learning, an algorithm will inevitably be different after user data roams through it, and it will remain changed after a data removal request. (Or after the storage expiration date, which Moodle now adds automatically, comes to pass.) There is still a debate to be had about how far reasonable erasure requests can go: should a data removal request also demand that the algorithm “unlearn”? Moodle already has some practical answers, if not yet for machine learning: in the case of a data removal request from a teacher, the LMS will still keep the grades and feedback that teacher gave to students. The legal team at Moodle HQ has found this practice to be GDPR compliant.
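The naive answer to “unlearning” is to drop the erased user’s rows and retrain from scratch, which the toy Python sketch below illustrates with a trivial per-course average standing in for the model. The names and data are hypothetical, and the cost of full retraining at real scale is exactly what makes this debate hard.

```python
from collections import defaultdict

def train(events):
    """Toy 'model': average rating per course, learned from user events."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user, course, rating in events:
        totals[course] += rating
        counts[course] += 1
    return {c: totals[c] / counts[c] for c in totals}

events = [("alice", "course-101", 4.0), ("bob", "course-101", 2.0)]
model = train(events)  # alice's data has shaped this model

# Naive "unlearning": remove the user's rows and retrain from scratch.
# Costly, but it guarantees the model no longer reflects the user at all.
events_wo_alice = [e for e in events if e[0] != "alice"]
model_after_erasure = train(events_wo_alice)

assert model["course-101"] == 3.0              # learned with alice
assert model_after_erasure["course-101"] == 2.0  # alice's influence gone
```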
Both the science and practice of machine learning, and GDPR and similar legislation around the world, will change and evolve, in harmony at times and clashing at others. In any case, traceability should win the day. Demand for it will only increase as lawmakers gain understanding. But it is also good data practice, and arguably a more ethical one. In this sense, Moodle deserves credit for being ahead of the debate rather than a reluctant responder. ■