What our analysis includes
For each individual analysis we explore and present:
An overview of the ID system
This includes information about where and when it was developed and by whom. We also include a list of advocates, representatives and funders for each particular system, so decision-makers can have an idea of who the particular solution is being pushed by.
In this overview we also provide information about whether the system is open source or proprietary. Open source systems are characterised by underlying code that is made publicly available. This means anyone can inspect, modify and distribute the code, so it is developed in a decentralised way, relying on peer review and community production. Having an open source approach to the development of an ID system has several advantages:
- increased transparency, as anyone can see how implementation is done in practice,
- the project becomes open to a large community of developers, testers and other contributors who can constantly provide feedback, contribute actively with new features, or fix bugs,
- on top of this, open source solutions are often cheaper, more flexible and offer more longevity than proprietary alternatives because they are developed by communities rather than a single author or company.
Some might argue that making a codebase open source is a source of risk, as sufficiently motivated attackers can take their time to look for vulnerabilities or ways to exploit the code. In reality, keeping the code away from the eyes of the public will not stop this; it only means missing out on public scrutiny and community development.
Of the three foundational ID systems analysed so far, two rely on open source software:
- MOSIP is fully open-source and all its code is publicly available.
- e-Estonia services rely on X-Road – an open source data exchange layer. On top of this, earlier this year the Estonian government decided to make all government software publicly available in this repository.
- Aadhaar relies on proprietary technology that belongs neither to the Indian government nor the UIDAI (Unique Identification Authority of India), which makes the system a “black box” in the sense that one cannot know how operations are de facto being handled.
Insights into the infrastructure makeup
Authentication attributes and personal data need to be appropriately safeguarded. Encryption is the process through which data is encoded so that it remains inaccessible to unauthorised users. On top of storing and dealing with people’s personal information such as name and date of birth, several foundational ID systems across the world increasingly rely on biometric data, which data protection frameworks largely recognise as sensitive, and therefore a special category of personal data.
The use of such sensitive and uniquely identifying personal data should mandate that the tightest security safeguards are in place. Encryption is crucial to keeping data safe from unwanted third parties and to provide users with a reliable authentication process, providing veracity when determining if an entity – user, server, or client app – is who it claims to be.
In any modern software the use of strong encryption should be assumed, and as such we will generally not discuss implementations unless we cannot find any information, or what is being used is wildly and obviously dangerous (such as obsolete encryption protocols or no encryption at all).
Is user data stored in a centralised or decentralised way?
Electronic databases are the preferred option to store, reference, validate and authenticate identity data. These databases can exist as a central repository or as distributed systems depending on the country and the implemented solution.
Physical vs logical (de)centralisation
(De)centralisation can take two forms. The first is physical: distributing the database across several geographically dispersed computers, either through a process known as “sharding”, where different parts of the database are held in different places, or with identically replicated databases spread over several computers. The second is logical: holding logically separate data in separate (preferably geographically dispersed) databases which use some form of API (Application Programming Interface) to interact with each other.
Whilst logical decentralisation is a design decision, physical decentralisation should be expected in any at-scale system to reduce (and attempt to eliminate) single points of failure, as well as potentially speeding up access by “moving” the database being interacted with physically closer to the user.
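The difference between these forms can be illustrated with a small sketch. It is not taken from any of the systems analysed: the site names, register contents and function names are all hypothetical, and hashing the record identifier is only one of several possible ways to assign records to shards.

```python
import hashlib

# Hypothetical data-centre names, used only for illustration.
SITES = ("site-a", "site-b", "site-c")


def shard_site(record_id: str, sites: tuple = SITES) -> str:
    """Sharding: each record lives at exactly one site, chosen by a stable hash."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(sites)
    return sites[index]


def replica_sites(record_id: str, sites: tuple = SITES) -> list:
    """Replication: every site holds an identical copy of every record."""
    return list(sites)


# Logical decentralisation: separate registers hold separate kinds of data.
# Each is reached only through its own lookup function, which stands in
# for an API call between independent databases.
POPULATION_REGISTER = {"id-001": {"name": "A. Example"}}
ADDRESS_REGISTER = {"id-001": {"city": "Exampletown"}}


def population_api(record_id: str) -> dict:
    return POPULATION_REGISTER.get(record_id, {})


def address_api(record_id: str) -> dict:
    return ADDRESS_REGISTER.get(record_id, {})


def combined_view(record_id: str) -> dict:
    """Assemble a view of one person by querying each register separately."""
    return {**population_api(record_id), **address_api(record_id)}
```

With sharding, losing one site makes only the records held there unavailable; with replication, any surviving site can answer any query. In the logical case, no single register holds the whole picture, and a combined view only exists when something queries each register in turn.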
Insights into how de-duplication is handled
Identity de-duplication is a procedure used to attempt to 1:1 match a natural person to a unique “identity” within a system — the ultimate goal being no cases where one person has two “identities”, or where one “identity” is shared by more than one person.
Some systems may attempt de-duplication based purely on demographic data, while others will attempt to compare against biometric identifiers already stored by the database.
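As a rough illustration of the demographic approach, the sketch below groups enrolment records that share a normalised name and date of birth. The record layout, field names and normalisation rules are assumptions made for this example and do not describe how any particular ID system works.

```python
import unicodedata
from collections import defaultdict


def demographic_key(name: str, date_of_birth: str) -> tuple:
    """Build a comparison key that ignores accents, case and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = " ".join(without_accents.casefold().split())
    return cleaned, date_of_birth.strip()


def find_possible_duplicates(records: list) -> list:
    """Return groups of record ids that share the same demographic key."""
    groups = defaultdict(list)
    for record in records:
        key = demographic_key(record["name"], record["date_of_birth"])
        groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical enrolment records.
records = [
    {"id": "r1", "name": "José  Pérez", "date_of_birth": "1990-04-12"},
    {"id": "r2", "name": "jose perez", "date_of_birth": "1990-04-12"},
    {"id": "r3", "name": "Jose Perez", "date_of_birth": "1985-01-30"},
]

print(find_possible_duplicates(records))  # [['r1', 'r2']]
```

A match of this kind only flags candidates for review. Two different people can share a name and date of birth, and the same person can appear under differently spelled names, so demographic matching alone can neither prove nor rule out a duplicate.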
In the particular case of Aadhaar for example, the UIDAI published a report with a proof of concept on biometric de-duplication. In this report, the UIDAI states that the occasional false positive is expected due to reasons such as:
- faulty biometric collection equipment.
- human error in assigning biometrics to the correct “unique” profile.
- one person enrols twice with a different name.
- two people may happen to have the same biometrics.
This proof of concept is based on a sample of 40,000 people where the UIDAI (Unique Identification Authority of India) is aiming for a false positive identification rate of 0.0025%, meaning 2.5 false positives for every 100,000 comparisons. In this context a false positive means a biometric system incorrectly matches an individual to someone else’s biometric identifiers. This may not seem like a lot but when scaling up the sample size to India’s population of 1,380,000,000 it adds up to each person having over 17,000 false positives to resolve when using their biometrics for identification purposes.
In practice this means that, based solely on biometrics, an individual cannot prove that they are one specific person among 1,380,000,000 citizens. Rather, they can only prove that they are one of the 17,000 people within the whole population who match that biometric identifier. This shows that in reality we are dealing with far more than occasional false positives when doing biometric de-duplication, and that it is impossible to claim that all the IDs in UIDAI’s CIDR (Central Identities Data Repository) are biometrically unique. You can find a breakdown of the simple maths used to get to these numbers in our research into Aadhaar.
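The scaling step can be checked with a few lines of arithmetic. This is a minimal sketch under two simplifying assumptions of our own, not the full breakdown referred to above: that the expected number of false positives is the rate multiplied by the number of comparisons, and that identifying one person involves one comparison per enrolled member of the population. The function name is hypothetical.

```python
from fractions import Fraction

# Target false positive identification rate quoted in the text, in percent.
FPIR_PERCENT = Fraction("0.0025")


def expected_false_positives(comparisons: int,
                             rate_percent: Fraction = FPIR_PERCENT) -> Fraction:
    """Expected false matches, assuming rate x number of comparisons."""
    return comparisons * rate_percent / 100


# Sanity check against the text: 2.5 false positives per 100,000 comparisons.
print(float(expected_false_positives(100_000)))        # 2.5

# Assumption: one comparison per person in a population of 1,380,000,000.
print(float(expected_false_positives(1_380_000_000)))  # 34500.0
```

Under these assumptions a single identification attempt against the whole population would be expected to produce 34,500 false matches, which is above the 17,000 quoted in the text. The exact figure depends on the assumptions used in the full breakdown, which are not reproduced here.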
Principles of engagement for each system
The technological choices in the design and deployment of digital ID systems can express a variety of political motives, which impact how societies operate at the deepest level. The design, deployment and governance of large socio-technical systems can establish a different public order and power entrenchment among social and political groups.
With this in mind, besides presenting information on the technical infrastructure makeup of each ID system, we will also delve into the engagement principles or governing principles published by developers, when existent. Such guidelines on external conditions and safeguards present what developers believe should be in place so that the risk of abuse and exploitation in the use of their tool is minimised.
Countries where the particular ID system is deployed or where there’s evidence it will be deployed in the future
Reported examples of abuse in countries where the researched ID system is in use
In our research PI found reported examples of abuse in all the systems we analysed, despite the fact that some of them claim to be ‘safe’, ‘inclusive’ and ‘privacy friendly’.
For instance, Aadhaar in India has had countless reports of abuse over the years including massive data leaks, exclusion from access to benefits and even issues around de-duplication. This list by itself contains almost 40 isolated cases of breaches between February 2017 and May 2018. The data leaked are not limited to Aadhaar numbers and demographic information. In some of the cases sensitive data such as data on pregnancy, people’s religion and caste and even bank details were leaked alongside Aadhaar numbers.
The Indian government has also made enrolment in Aadhaar a mandatory requirement to access a myriad of social protection schemes. This measure creates high barriers to access for many people and has led to cuts in access to food rations, which have been linked to several deaths by starvation across the country.
On top of this, the Indian government and the Unique Identification Authority of India (UIDAI) have ignored privacy concerns, as well as the sample test results of the pilot project, which showed that there could be up to 17,000 false positives each time an Indian citizen engages in an identification process. This systematic failure has led to cases where one person somehow ends up getting two different “unique” Aadhaar numbers.
MOSIP is being deployed in Morocco and there have been concerns regarding exclusion through language. Morocco’s General Directorate of National Security announced a new generation of identity cards in 2020, but according to a draft law the card would only be in Arabic – one of the two official languages of the country – and French – a foreign non-constitutional language, leaving Tamazight – the second official language – out. This goes directly against regulations aimed at gradually including Tamazight in Morocco’s public life and encouraging the usage of Tamazight, alongside Arabic, in administrative documents, including national identity cards.
Regarding countries where e-Estonia or X-Road based ID systems have been implemented there haven’t been any similar reports of abuse to this day.
The information presented about each of the systems was all collected from publicly available sources. As stated above, some of the analysed systems are open source, meaning they offer increased levels of transparency, including public access to the source code being used. This allows anyone to verify technical statements made about the system, which we must stress is an invaluable feature. At the opposite end, there are “black box” systems, for which our research was based on documentation made publicly available by the ID system designer or owner. In these cases it becomes impossible for the public to verify that the system behaves as described in the documentation, and hence the public’s knowledge is trust-based and limited to whatever information is deemed fit for public consumption.