The first step when looking at a technology is to find basic information about it in order to define and categorise it. The Wikipedia page of a given technology is often a good starting point and will help you clarify what the technology involves (for example, for facial recognition). This is particularly useful when no specific technology is mentioned in the partnership or when you are looking at a tender. You might also want to check the company’s marketing materials to get a sense of what they specialise in and the type of product they offer.
Sometimes a partnership will involve more than one technology, sometimes across multiple contracts and/or partnerships: for example, an ID system may require a fingerprint scanner and a database, which may be supplied by different companies.
Getting a broad sense of what you are looking at is a simple but very important step to be able to move forward and identify risks. Your goal is to be able to give a broad but accurate high-level definition of what technology is at stake in the partnership.
Examples of high-level descriptions of a technology:
- Facial recognition system – A system capable of matching faces identified in a given image or video with a dataset of previously identified human faces.
- Ankle tracking bracelet – A physical bracelet attached to someone’s limb capable of recording and transmitting geolocation or proximity with a base tag.
- Unmanned Aerial Vehicle – An autonomous or remotely controlled aerial vehicle capable of taking predefined actions and collecting, processing and transmitting environmental data such as images, temperatures, and sounds.
With this first step done, you will rapidly realise that technologies usually rely on multiple physical and logical elements to function. Breaking the technology down and identifying each layer is therefore the next logical step in understanding how it functions and where the potential points of failure are. For example, a facial recognition system captures, transmits, stores, and processes data. Different elements play a key role in each of these steps.
These layers can be hardware, software, or a combination of both. The collection of technologies behind a simple term such as “a database” might be complex. The more thorough you are in dividing it into blocks the better an understanding you’ll have of what’s at stake and its potential risks.
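The breakdown described above can be sketched as a small data structure. This is only an illustrative way of taking notes, not a prescribed method: the `Stage`, `Block` and `missing_stages` names are hypothetical, and the facial recognition components listed are invented for the example. The four stages mirror the capture, transmit, store and process steps mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    COLLECTION = "collection"
    TRANSMISSION = "transmission"
    STORAGE = "storage"
    PROCESSING = "processing"


@dataclass
class Block:
    name: str                   # what the component is
    stage: Stage                # the step in which it plays a role
    kind: str                   # "hardware", "software" or "both"
    supplier: str = "unknown"   # who provides it, if known


def missing_stages(blocks):
    """Return the stages for which no block has been identified yet."""
    covered = {b.stage for b in blocks}
    return [s for s in Stage if s not in covered]


# Hypothetical, incomplete breakdown of a facial recognition system.
facial_recognition = [
    Block("Street cameras", Stage.COLLECTION, "hardware"),
    Block("Private network link to the server", Stage.TRANSMISSION, "both"),
    Block("Face matching software", Stage.PROCESSING, "software"),
]

print([s.value for s in missing_stages(facial_recognition)])  # ['storage']
```

Listing the blocks this way makes gaps visible: here, nothing has yet been identified about where the images are stored, which is a prompt for further research.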
Using a data-centered approach, the different elements composing the technology will usually fit into one of the following four categories:
i. Data collection/capture system (hardware/software)
Data collection amounts to capturing information. This could consist of a camera taking a picture, a sensor capturing information like temperature, software registering an action such as a click on a button, or a Mobile Phone Extraction device grabbing data from a phone. Data collection systems can be physical devices, such as a satellite equipped with sensors, or virtual, such as an app or a web scraper (a piece of code that crawls the internet to collect data).
Why it matters
Understanding which part of the technology is in charge of data collection enables you to understand what data is collected (images, sound, user-entered data), where it comes from (sensors, user interactions) and under what circumstances it is collected (with or without the person knowing, how often, etc.). This enables you to identify potential issues regarding the legality of the collection or the accuracy of the data collected.
Examples of a data collection system:
- A network of cameras in a city
- A website to register for a public event
- A satellite with a variety of sensors taking photos of a given area
- A fingerprint reading machine at the airport
Potential risks in data collection
Data collected can be incorrect; sensors can be rigged; data can be collected without consent or other legal basis; physical sensors can degrade over time; the logic or set of instructions (for software) can be biased or incorrect; and the device can be vulnerable to attack (overloaded, provided with incorrect information, etc.).
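To illustrate the accuracy risk, here is a minimal sketch of a plausibility check on collected sensor data. The `Reading` type, the sensor names and the temperature range are all assumptions made up for this example; a real deployment would need limits suited to its own sensors and context.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    celsius: float


# Assumed plausible range for an outdoor temperature sensor (illustrative only).
MIN_C, MAX_C = -60.0, 60.0


def flag_implausible(readings, low=MIN_C, high=MAX_C):
    """Return the readings whose value falls outside the plausible range."""
    return [r for r in readings if not (low <= r.celsius <= high)]


readings = [
    Reading("roof-1", 18.5),
    Reading("roof-2", 250.0),  # likely a faulty or tampered sensor
    Reading("roof-3", -3.0),
]

suspect = flag_implausible(readings)
print([r.sensor_id for r in suspect])  # ['roof-2']
```

A check like this only catches values that are obviously wrong. A degraded sensor reporting plausible but inaccurate values would pass, which is why asking whether and how the collection system is validated matters.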
ii. Data transmission system (hardware/software)
Once the data is collected, it might be transmitted to another system, for example a server, for storage or processing. Transmission will usually happen through existing solutions with well-defined protocols, such as the internet protocol suite (TCP/IP) for communication between devices on the same network (like two servers connected to the internet, or a smart camera and a computer connected to a private network). Sometimes, however, the transmission system is itself the innovation at stake (e.g. the New IP proposal made by China at the International Telecommunication Union (ITU), or 5G New Radio, the global standard for the air interface of 5G networks).
Why it matters
Understanding if and how data is transmitted enables you to identify potential security risks (if transmission isn’t secured, for example when using an unsecured Wi-Fi network), existing concerns (if a protocol/network is outdated and has known vulnerabilities, such as 2G) or technical requirements (e.g. the distance at which Bluetooth can reliably transmit data) to better assess suitability in the given context.
Examples of data transmission systems:
Potential risks in data transmission
Technology can be insecure (poor or low level of encryption, known vulnerabilities and so on), data can be degraded/lost in transit, data can be intercepted/tampered with, the system can pose health threats, the system can be interrupted by external factors (Denial of Service attack on a network, destruction of emitter/receiver and more).
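One of the risks listed above, tampering in transit, can be made concrete with a short sketch. It uses a message authentication code (HMAC-SHA256, from Python's standard `hmac` and `hashlib` modules) so that the receiver can detect a modified message. The shared key and the payload are hypothetical, and this is one common technique among several, not a description of how any particular product works.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it would be provisioned securely.
SECRET_KEY = b"example-shared-secret"


def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Compare tags in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(message, key), tag)


payload = b'{"camera": "gate-3", "plate": "AB12 CDE"}'
tag = sign(payload)

print(verify(payload, tag))                              # True
print(verify(payload.replace(b"AB12", b"ZZ99"), tag))    # False: message altered
```

Note that this only protects integrity. The payload is still readable by anyone who intercepts it, so confidentiality requires encryption on top (for example, a TLS-protected connection).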
Note: To learn more about protocols, standards, standardisation bodies, see “A note about technical protocol and standards” at the end of this chapter.
iii. Data storage system (hardware/software)
After capture and transmission of the data, it might be stored somewhere for processing or archiving purposes. Storage systems will usually rely on some form of storage device such as a hard drive, an SD card, a USB stick, often as part of a bigger system if regular access is required (laptop, server…). The variety of software in charge of storing and accessing this data is massive, from database software such as MySQL to blockchain-based systems that offer immutability.
Why it matters
Identifying where and how the data is stored allows you to better understand the security implications (the software/method used might be prone to security vulnerabilities or frequently targeted by attacks, such as an Elasticsearch database or similar bucket product), retention (the system might only allow data to be stored for a certain amount of time or, on the contrary, store data indefinitely, as a blockchain system does), access control (a system with overly loose permissions can allow unauthorised access) and durability (the expected lifetime of an SD card is lower than that of an SSD, for example). Is the chosen data storage system adequate for the purpose it is intended to achieve? Knowing where a storage system is, what it is connected to and who has access to it also gives you the keys to better assess risk.
Examples of data storage systems:
- A hard drive/USB key with a given filesystem (NTFS, exfat, ext4…)
- A SQL database (a software database designed to be accessed using the SQL language)
- A blockchain duplicated across multiple clients
- A non-rewritable CD (CD-R)
- A spreadsheet software such as Microsoft Excel
Potential risks in data storage
Poor permissions management enabling unauthorised access to the data, fallibility of physical storage hardware (e.g. a disk can fail and lose the data it stored), inadequate data retention capabilities (e.g. a blockchain storing data that should be erased), inadequate storage space (e.g. new data can’t be stored because the device is full), poor life expectancy (e.g. choosing database management software that is no longer supported by its manufacturer and does not, or soon will not, receive security updates), etc.
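The retention risk can be illustrated with a minimal sketch of a purge routine on a SQL database, here using Python's built-in `sqlite3` module with an in-memory database. The table name, column name and 30-day retention period are assumptions chosen for the example, not values taken from any real system.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # assumed retention period, for illustration only


def purge_expired(conn, now, retention_days=RETENTION_DAYS):
    """Delete rows collected before the retention cutoff; return how many."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM records WHERE collected_at < ?", (cutoff,)
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, collected_at TEXT)")

# Four hypothetical records collected 1, 10, 45 and 90 days ago.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [((now - timedelta(days=d)).isoformat(),) for d in (1, 10, 45, 90)]
conn.executemany("INSERT INTO records (collected_at) VALUES (?)", rows)

deleted = purge_expired(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(deleted, remaining)  # 2 2
```

A storage system that cannot support an operation like this (an append-only ledger, for instance) cannot honour a retention limit, which is worth checking against the stated purpose of the deployment. Deleting rows from the live database also says nothing about backups or copies held elsewhere.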
iv. Data processing system (software)
Upon capture or after storage, the data can be processed to produce new information. This could be image analysis software identifying the objects visible in a captured image, an algorithm giving the solution to a maths problem or a program predicting temperatures based on previously collected data. Data processing systems can either process data on the fly (without storing data in intermediate steps) or use stored data. These systems are usually software developed using one or more programming languages (Java, Python, Go…) and might function in connection with the data storage system. They can run on a variety of devices, from a server to a smartphone or a single-board microcontroller. Some systems, such as neural network-based Artificial Intelligence, will perform differently depending not only on the data they are processing but also on the data they were trained on. In this case it may be worth looking at the training dataset for the AI system as a separate block within the Data collection/capture category. The better you separate the different components, the better your understanding of what’s at stake.
Why it matters
Data processing systems can produce biased and inaccurate information, either because of the data fed into the system (incomplete, inaccurate, non-representative…) or because of logic flaws (something unaccounted for in the algorithm’s logic). Understanding what the processing is designed to do, what data is processed and what type of information it outputs can allow you to spot potential flaws in the code logic or missing variables, or to assess how appropriate a system is for making a decision.
Examples of data processing systems:
- A facial recognition software processing photos taken by public cameras
- A boat movement detection software using AI and satellite imagery
- An advertising system that infers your personality traits based on data collected online
- A virtual assistant such as Siri/Google/Alexa
Potential risks in data processing
Flaws in the algorithm logic (something unaccounted for or a human mistake rendering the results false), poor manufacturer support (the software/program isn’t supported after a certain time, making future development and fixes complicated or impossible for the buyer), low transparency/accountability due to licensing (proprietary software makes the auditing process complicated or impossible), bias due to the dataset the system was trained on or the data that has been included, security vulnerabilities (unauthorised access, hacks…), etc.
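As a rough illustration of how bias in a processing system might be surfaced, the sketch below compares the false match rate of a matching system across groups. The function name, the group labels and the evaluation results are all made up for the example. It shows one simple measurement, not a complete bias audit.

```python
from collections import defaultdict


def false_match_rate_by_group(results):
    """results: iterable of (group, predicted_match, true_match) tuples.

    Returns, per group, the share of true non-matches that the system
    wrongly reported as matches.
    """
    false_matches = defaultdict(int)
    non_matches = defaultdict(int)
    for group, predicted, actual in results:
        if not actual:
            non_matches[group] += 1
            if predicted:
                false_matches[group] += 1
    return {g: false_matches[g] / n for g, n in non_matches.items()}


# Made-up evaluation results, for illustration only.
results = [
    ("group_a", False, False), ("group_a", False, False),
    ("group_a", True, False), ("group_a", False, False),
    ("group_b", True, False), ("group_b", True, False),
    ("group_b", False, False), ("group_b", False, False),
]

print(false_match_rate_by_group(results))
# {'group_a': 0.25, 'group_b': 0.5}
```

In this invented data, people in `group_b` are wrongly matched twice as often as people in `group_a`. A single overall accuracy figure would hide that gap, which is why asking for results broken down by group is a useful question to put to a supplier.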
v. A note about prioritising blocks of technology
If you’re looking at a specific company or contract, you can use this breakdown and information to focus on the layers where the company is mainly involved. If you are looking at a company specialised in software and data processing (such as Palantir) then you know that this is likely to be the key layer.
That doesn’t mean you should neglect the other layers of the technology deployed. On the contrary: because these elements are not necessarily part of the company’s or public body’s expertise, they might end up being overlooked and poorly managed. For example, the UK stored and ultimately lost COVID-related data in an Excel file, exposing how little the data storage system had been considered, especially compared to the effort put into collecting this data.