Publicly accessible information posted to social networks may be scraped and aggregated by third parties regardless of the social media sites’ terms and conditions or even technical measures taken to prevent data mining, according to the U.S. Court of Appeals for the 9th Circuit.
In an opinion issued Monday, Sept. 9, the federal appellate court affirmed a lower court’s opinion granting a preliminary injunction against professional social networking site LinkedIn that prevents the company from blocking access by automated bots deployed by hiQ, a data aggregator.
The case opens opportunities for companies that collect personal data from sites that do not restrict public access, discourages claims that social media sites have property rights in their users’ data, and limits the scope of the Computer Fraud and Abuse Act as a means for preventing automated bots from scraping publicly visible data.
Although the case is only at the preliminary injunction stage and has yet to go to trial on the merits, the opinion has significant implications for the personal data marketplace.
LinkedIn allows people to create professional profiles, post articles and comments, search for jobs, and connect to others using the site to grow their professional networks. Members may specify which portions of their profiles are visible to the general public and which are limited to their network, as well as switch on a “Do Not Broadcast” option that prevents notification to their network when their profiles are updated.
In addition to prohibiting data scraping or copying in its User Agreement, LinkedIn works to prevent access to its servers by unauthorized automated bots and uses other technical systems to detect non-human activity indicative of scraping and to block suspicious or disfavored IP addresses.
Among the users LinkedIn has terminated for allegedly violating its User Agreement is hiQ Labs, a data analytics company founded in 2012 that scrapes information from LinkedIn users’ public profiles (including name, job title, work history and skills) and sells that information to business clients, such as eBay, Capital One and GoDaddy. HiQ’s analytics are designed to identify employees at risk of being recruited away or identify skills gaps in employers’ workforces so they can offer internal training and mobility.
LinkedIn also offers a product to provide companies with similar insights, using the data of the more than 500 million users of the platform.
In 2017, LinkedIn sent a cease-and-desist letter to hiQ asserting that hiQ’s use of scraping bots violated LinkedIn’s User Agreement and the CFAA, among other laws. HiQ responded by filing a lawsuit seeking a declaration that it was not violating any law and an injunction preventing LinkedIn from blocking its access to users’ data.
The district court granted hiQ’s motion and ordered LinkedIn to remove any technical barriers to hiQ’s access to public profile information. LinkedIn filed an appeal.
Limitations on LinkedIn’s ownership and control of users’ data
The 9th Circuit found that hiQ’s business model depended on access to LinkedIn’s publicly accessible data and rejected LinkedIn’s arguments that hiQ could gather workforce data by other means. It also rejected LinkedIn’s arguments that allowing hiQ to scrape LinkedIn’s site threatened its users’ privacy and put at risk LinkedIn’s goodwill with its members.
Regarding LinkedIn’s economic interests (avoiding competition from third parties that also want to profit from selling its users’ data), the court found that LinkedIn “has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles.” Users, moreover, “quite evidently” intend their profile data to be accessed by others, “including for commercial purposes.”
HiQ’s lawsuit against LinkedIn claimed that, by blocking hiQ’s access to LinkedIn data, LinkedIn had tortiously interfered with hiQ’s contractual relationships with third parties and thereby harmed hiQ’s business. Rather than challenging the facts underlying this claim, LinkedIn instead sought (at this stage of the litigation) to justify its actions on the basis of its “legitimate business interests.” Although this issue will be further explored at trial, the 9th Circuit struggled to find that LinkedIn’s technical practices for blocking hiQ’s scraping activity were “recognized trade practices”; they were not “similar to trade practices heretofore recognized as acceptable justifications for contractual interference.”
Indeed, the court held, “if companies like LinkedIn, whose servers hold vast amounts of public data, are permitted selectively to ban only potential competitors from accessing and using that otherwise public data, the result – complete exclusion of the original innovator in aggregating and analyzing the public information – may well be considered unfair competition under California law.”
Finally, the court rejected LinkedIn’s arguments that it was protecting its members’ data and enforcing its User Agreement, emphasizing again that LinkedIn “has only a non-exclusive license to the data shared on its platform, not an ownership interest.” The court identified LinkedIn’s core business model as providing a platform for professionals to share their information with each other that could continue to exist even if third parties use that information for commercial gain. The fact that LinkedIn had developed its own data analytics tool to generate revenue from its users’ data only served to support the court’s position that LinkedIn didn’t have “its members’ privacy interests in mind.”
CFAA not applicable, enacted to prevent “hacking”
The CFAA forbids access to protected computers “without authorization” or in a manner that exceeds authorization. The 9th Circuit held that LinkedIn’s servers were “protected computers.” Key to deciding this case was the court’s interpretation of the term “authorization.” In short, the 9th Circuit was not willing to find that hiQ’s scraping bots were unauthorized or that they exceeded authorized access.
LinkedIn attempted to prevent hiQ’s automated systems from gaining access to users’ personal data by deploying blocking technology and by communicating, at least through a cease-and-desist letter, that it disapproved of hiQ’s practices.
The court focused on the public accessibility of LinkedIn’s site in general. Because LinkedIn’s site is accessible to anyone who visits it, it is by default freely accessible; everyone is authorized, the court noted. Thus, denial of access is a ban, not a withdrawal of authorization. By contrast, the court found that the phrase “access ... without authorization” in the CFAA applies more appropriately when permission is typically required of everyone, that is, when access is generally restricted only to those “specially recognized or admitted.”
To justify its reasoning, the court opined that the CFAA is a “computer hacking” law, citing legislative history that references the prohibited conduct of “breaking and entering.” Accordingly, the court found that informing someone via contract (or notice) that their conduct on the site isn’t welcome or permissible does not, should they access the site anyway, constitute conduct “without authorization.” It even cited a law review article by professor Orin Kerr stating that an “authentication requirement, such as a password gate, is needed to create the necessary barrier that divides open spaces from closed spaces on the web” to support the notion that authorization for data scraping is only required when sites are password protected or otherwise not visible to the public.
Finally, the court interpreted the CFAA to divide the information universe into three categories:
- Information for which access is open to the general public and permission is not required.
- Information for which authorization is required and has been given.
- Information for which authorization is required and has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed).
Because LinkedIn’s public user profile information was found to fit into the first category, the court found that the “breaking and entering” analogy did not apply.
Importantly, the court distinguished the Facebook, Inc. v. Power Ventures, Inc. case, in which Facebook had successfully prevented a social networking aggregator from accessing Facebook users’ data, on the grounds that Facebook had “tried to limit and control access to its website” and required “its users to register with a unique username and password.” Here, the court found, hiQ was scraping data from LinkedIn that “was available to anyone with a web browser.”
This case has significant implications for privacy. It sets a precedent that data entered by users to a social media website does not belong to (but rather is merely licensed to) the site owner. It undermines the significance of user agreements to set the terms for non-users who might, in contradiction to the agreements’ terms, collect and use data made available on those sites. It also narrows the definition of “authorization” in the context of websites that collect and host personal data to those sites that require usernames and passwords, and it increases the responsibility of such websites to inform users of the benefits of privacy settings.
The case now returns to the district court for a trial on the merits, provided there is no settlement. But lawyers, privacy professionals and privacy scholars will no doubt be poring over the implications of this case, including the novel interpretation of the CFAA, for months and years to come.