Legality of data scraping in India

What is data scraping?

Data scraping (or web scraping) is a method in which software, often called a ‘bot’, extracts data or information from a website and converts it into a readable output format. The process is generally automated.
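To make the definition concrete, the following is a minimal sketch of the extraction step, using only Python's standard library. The HTML snippet, the "listing" class name and the function names are hypothetical and chosen for illustration; a real bot would first download the page, which is the step that raises the legal questions discussed below.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
SAMPLE_HTML = """
<ul>
  <li class="listing">Used bicycle</li>
  <li class="listing">Office chair</li>
  <li class="ad">Sponsored</li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collects the text of every <li class="listing"> element."""

    def __init__(self):
        super().__init__()
        self._in_listing = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "li" and ("class", "listing") in attrs:
            self._in_listing = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_listing = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a listing element.
        if self._in_listing and data.strip():
            self.titles.append(data.strip())


def extract_listings(html):
    """Turn raw HTML into a plain list of listing titles."""
    parser = ListingParser()
    parser.feed(html)
    return parser.titles


if __name__ == "__main__":
    print(extract_listings(SAMPLE_HTML))
```

The output is a simple list of titles, which is the kind of "readable output format" that a scraper typically produces before storing or republishing the data.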

What are the potential legal issues with data scraping?

1. Possibility of infringement of IP rights: In the course of data scraping, the automated tool may pick up information that is protected by trademark or copyright. In a case before the Delhi High Court, OLX successfully obtained a permanent restraining order against a company to prevent it from using automated or manual means to scrape any data, including commercial data, from OLX’s website.[1] The company had lifted listings, photographs and other information from OLX’s website and posted them on its own website. OLX contended before the Court that all this information qualifies as a ‘proprietary database’ of OLX, built through tremendous amounts of skill, labour and creativity. It said that such a database qualifies as an ‘original literary work’ and hence is entitled to protection under copyright law.[2] The Court ruled in favour of OLX. Similar reasoning can apply to trademark infringement.

However, it is important to note that the data scraping was illegal here because the company was posting the information collected from OLX’s website on its own website. There would have been no copyright infringement if the company had used the data for its own private use.[3]

2. Violation of terms of use of a website: It is general practice for most websites to include a clause in their terms of use that disallows scraping of data. Some examples:

a. LinkedIn- “You agree that you will not…(D)evelop, support or use software, devices, scripts, robots or any other means or processes (including crawlers, browser plugins and add-ons or any other technology) to scrape the Services or otherwise copy profiles and other data from the Services.”[4]

b. Facebook- “You will not engage in Automated Data Collection without Facebook’s express written permission.”[5]

Additionally, most websites contain a file known as ‘robots.txt’, a plain-text file that identifies the portions of the website that crawlers can and cannot access.[6] Even if the terms of use of a website prohibit data scraping, a crawler that the robots.txt file allows is technically permitted to access the listed portions of the site. Hence, it is good practice to refer to the robots.txt file in the terms of use itself, to prevent any conflict. For example, Twitter’s clause states that “crawling the Services is permissible if done in accordance with the provisions of the robots.txt file…however, scraping the Services without the prior consent of Twitter is expressly prohibited”.[7] This provides clarity to the entity carrying out data scraping. Another reason to mention the robots.txt file in the terms of use is that the file by itself cannot enforce crawler behaviour on a website. It is up to the crawler to obey the directives in the file.[8]
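As an illustration of how a well-behaved crawler might consult robots.txt before fetching a page, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the user-agent names ("ExampleBot", "BadBot") and the example.com URLs are all hypothetical. The sketch only shows what the file technically permits; it says nothing about whether the website's terms of use allow scraping.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it from
# the site's /robots.txt path instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/listings"),
        ("ExampleBot", "https://example.com/private/data"),
        ("BadBot", "https://example.com/listings"),
    ]:
        print(agent, url, may_fetch(parser, agent, url))
```

Nothing in this check is enforced by the website. A crawler that skips the call to can_fetch can still request the page, which is why the terms of use remain the operative legal restriction.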

Some jurisdictions have looked at the question of whether scraping restrictions imposed through terms of use are legally valid. The Court of Justice of the European Union has held that an entity is legally entitled to impose contractual restrictions on the use of its database by third parties.[9] In that case, a travel aggregator platform was automatically extracting flight data from a private airline’s website.

3. Position in India: Other than a few cases dealing with IPR infringement, Indian courts have not expressly ruled on the legality of web scraping. However, since all common forms of electronic contracts are enforceable in India,[10] violating terms of use that prohibit data scraping will be a violation of contract law. It may also violate the Information Technology Act, 2000, which penalizes unauthorized access to a computer resource or extracting data from a computer resource without the owner’s permission.[11]

However, a US appeals court interpreted a similar provision in the United States’ Computer Fraud and Abuse Act (“CFAA”) differently.[12] The matter concerned a suit filed by a data analytics company called ‘hiQ’, which scraped data from public LinkedIn profiles, such as name, job title, work history and skills. Among other contentions, LinkedIn argued that hiQ’s actions violated the CFAA because hiQ continued to intentionally access LinkedIn’s servers ‘without authorization’ and obtained information from them. The Court disagreed with LinkedIn’s arguments and allowed hiQ to continue its data scraping activities. It held that the prohibition on unauthorized access applies only to private information, access to which is restricted through a password or other technical barriers. Since hiQ was only using publicly available information on LinkedIn, it did not violate the CFAA.

While Indian courts have never examined the relevant provisions of the IT Act in this context, it can be argued that the penalty under the IT Act does not apply to scraping of publicly available information. Under the rules governing sensitive personal data or information (“SPDI”) under the IT Act, information that is freely available or accessible in the public domain is excluded from the definition of SPDI.[13] Even the Personal Data Protection Bill, 2019 allows processing of publicly available data without the consent of the data principal.[14]

Authored by Arpit Gupta, Senior Associate, with inputs from Aman Taneja, Senior Associate and Nehaa Chaudhari, Partner.

For more on this topic, please reach out to us at contact@ikigailaw.com

[1] OLX BV and Ors. v. Padawan Ltd., Delhi HC order dated 15 December 2016, http://delhihighcourt.nic.in/dhcqrydisp_o.asp?pn=245500&yr=2016.

[2] OLX BV and Ors. v. Padawan Ltd., Delhi HC order dated 31 March 2016, http://delhihighcourt.nic.in/dhcqrydisp_o.asp?pn=71402&yr=2016.

[3] This is one among the many exceptions to copyright infringement given in Section 52 of the Copyright Act, 1957.

[4] User Agreement, LinkedIn, https://www.linkedin.com/legal/user-agreement.

[5] Automated Data Collection Terms, Facebook, https://www.facebook.com/apps/site_scraping_tos_terms.php.

[6] https://www.tutorialspoint.com/python_web_scraping/legality_of_python_web_scraping.htm.

[7] Twitter Terms of Service, https://twitter.com/en/tos.

[8] Understand the limitations of robots.txt, About robots.txt, https://support.google.com/webmasters/answer/6062608?hl=en.

[9] Ryanair Ltd. v. P.R. Aviation BV, Court of Justice of the European Union, 15 January 2015, https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:62014CJ0030&from=EN.

[10] Sections 4 and 10A of the IT Act grant legal recognition to electronic contracts.

[11] Section 43 of the IT Act.

[12] hiQ Labs Inc. v. LinkedIn Corporation, US Court of Appeals for the Ninth Circuit, 09 September 2019, https://cases.justia.com/federal/appellate-courts/ca9/17-16783/17-16783-2019-09-09.pdf?ts=1568048483. Also see this EFF article- https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data.

[13] Proviso to rule 3, The Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011.

[14] Clause 14(2)(g), PDP Bill.
