The legality of crawler technology is in dispute, and illegal data crawling urgently needs regulation

Ensuring data security requires urgent regulation of illegal data crawling

Core reading

In the era of big data, more and more market entities have invested heavily in collecting, sorting and mining information. If web crawlers are allowed to use at will the data resources that others have obtained through huge investment, this will not encourage commercial investment, industrial innovation or honest operation. It may even directly violate the wishes and the right to know of the users who are the source of the data, and will ultimately damage the mechanism of healthy competition.

□ Reporter Zhang Wei

□ Xing Guohan, trainee reporter of Legal Network

With rapid social and economic development, the value of data has become increasingly prominent, and data has become an essential element of enterprise technological innovation. However, when enterprises obtain data by technical means, whether their use of data scraping technology is reasonable and legal is a question that deserves careful thought.

In recent years, “crawling data” with web crawlers has become a buzzword, and related court cases keep appearing. According to incomplete statistics, there have been more than ten court cases involving web crawlers in recent years, both civil and criminal, and the trend is intensifying.

At the Yangtze River Delta Data Compliance Forum (Third Session) and the Legal Regulation Seminar on Data Crawlers, held recently in Shanghai, Chen Chaoran, deputy director of the Research Office of the Shanghai People’s Procuratorate, revealed that the procuratorial organs are actively promoting pilot work on corporate compliance reform, with data compliance as a focus. “Currently, cases of crawling data are very common. When network platforms or individuals use technical means to grab data from other platforms, whether this behavior is legal, who holds the rights to the platform data and who may use it are all questions worthy of in-depth study.”

Guo Bing, deputy dean of the Hangzhou Yangtze River Delta Big Data Research Institute, believes that data crawlers, as a neutral technology, have been widely used in the Internet industry. It should be noted, however, that if crawler technology is improperly applied, it will damage the legitimate rights and interests of competitors and may even amount to a violation of law or a crime, with a very large negative impact on the healthy development of the industry.

Crawling data may constitute infringement

From a technical point of view, crawlers are programs that simulate the behavior of a person surfing the Internet or browsing web pages and apps, which lets them efficiently collect from the Internet the information their makers need.

Liu Yuchen, head of digitalization at L’Oréal China, said that most websites refuse crawler access, for reasons that include both commercial interests and the safety of their own operations. Besides the risk that crawlers will collect data the website does not want collected, website operators often worry that crawlers will interfere with the normal operation of the website.

Non-compliant crawlers visit the target site automatically, continuously and frequently, sending server load soaring and placing an “unbearable” burden on the server. Websites with little experience in dealing with this, especially small and medium-sized ones, may find that the site will not open, that pages load extremely slowly, or even that the site is brought down altogether.
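To make the load problem concrete, the following is a minimal Python sketch of one courtesy measure a well-behaved crawler might take: enforcing a minimum interval between requests to the same host. The class name `HostThrottle`, its parameters and the two-second interval are assumptions chosen for illustration; the article does not describe any particular implementation.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum interval between requests to the same host.

    Hypothetical helper for illustration only; the clock and sleep
    functions are injectable so the behavior can be checked without waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time at which the last request was released

    def wait(self, url):
        """Pause if needed before requesting url; return the seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            # Time still owed before the next request to this host is polite.
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last_request[host] = now + delay
        return delay
```

A crawler would call `throttle.wait(url)` before each request, for example with `throttle = HostThrottle(min_interval=2.0)`. Requests to different hosts are tracked separately, so slowing down for one site does not delay visits to another.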

Zhang Zhe, litigation director of Sina Group, said that technology itself, whether crawlers or anything else, is neutral, but the application of crawler technology is not, because every application carries the purpose of its user. What should be evaluated in such cases is not the principle of the technology but what the technology is used for and whether the means is justified.

When it comes to web crawlers, the robots protocol is an unavoidable topic. The full name of the robots protocol (also known as the crawler protocol) is the “web crawler exclusion standard”. A website uses the robots protocol to tell search engines explicitly which pages may be crawled and which may not. The industry also calls this protocol the “gentleman’s agreement” of the search field.

Liu Yuchen said that when a web crawler visits a website, the robots protocol is like a sign posted at the door of a room, telling outsiders who may come in and who may not. However, it is only a gentleman’s agreement: it serves as a notice, not as a technical defense.
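The “sign at the door” can be illustrated with Python’s standard `urllib.robotparser` module, which a compliant crawler might use to check a site’s rules before fetching a page. This is a minimal sketch: the robots.txt content, the `ExampleBot` user agent and the example.com URLs are made up for illustration, and nothing in the check stops a crawler that simply ignores the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (not taken from the article).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for url in (
    "https://example.com/public/page.html",
    "https://example.com/private/data.json",
):
    verdict = "allowed" if is_allowed(parser, "ExampleBot", url) else "disallowed"
    print(url, "->", verdict)
```

The parser only reports what the site has asked for. Honoring the result is left entirely to the crawler, which is why the protocol works as a notice rather than a defense.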

In practice, malicious crawlers do not comply with a website’s robots protocol and may crawl data that should not be crawled. This is not an isolated phenomenon. Zeng Xiang, head of legal affairs at Xiaohongshu, said that malicious crawling occurs frequently on content platforms and e-commerce platforms. On content platforms, what is crawled is mostly videos, pictures, text, influencer interaction data, user behavior and the like; in e-commerce, it is mostly merchant information and product information.

“Content platforms generally stipulate that the intellectual property rights in the relevant content belong to the publisher, or to the publisher and the platform jointly. These crawlers have neither signed an agreement nor obtained user authorization, and are suspected of infringing on the rights of intellectual property owners,” Zeng Xiang said.

Website rights may need to be clarified

This raises the questions of who owns the data and whether it can be opened to others.

Xu Hongtao, a judge in the Intellectual Property Division of Shanghai Pudong District People’s Court, believes that data is the core competitive resource of the content industry, and that the data collected and analyzed by content platforms often has extremely high economic value.

“If content platform operators are required to open up their core competitive resources to competitors without limit, it will not only run counter to the essence of the spirit of ‘interconnection’, but will also be detrimental to the continued renewal of high-quality content and the sustainable development of the Internet industry,” Xu Hongtao said.

Behind the frequent cases of malicious crawling lie the rising value of data and increasingly fierce market competition centered on data.

Gao Fuping, a professor at East China University of Political Science and Law, said that in the era of big data, the value of data has once again been highlighted. Crawler technology has now moved from crawling web pages to crawling the underlying data, and the data crawler problem will become more and more serious.

In the era of big data, more and more market entities have invested huge sums in collecting, sorting and mining information. Industry insiders are concerned that if web crawlers are allowed to use or exploit at will the data resources that others have obtained through huge investment, this will not encourage business investment, industrial innovation or honest operation. It may even directly violate the wishes and the right to know of the users who are the source of the data, and will ultimately damage the mechanism of healthy competition.

Gao Fuping believes that if a website has lawfully accumulated data resources, those resources should be regarded as the website’s assets. “Allowing data producers and controllers to open data for commercial purposes is beneficial. Through licensing, exchange, transactions and the like, more people can enjoy data services. I look forward to a future in which the rights of all legitimate data producers to control and use their data are confirmed.”

Orderly circulation is equally important

At present, although websites can adopt strategies or technical means to prevent crawlers from crawling their data, crawlers also have a growing range of technical means for countering such anti-crawling strategies.

Liu Yuchen said that anti-crawling and crawling technologies keep iterating against each other. Technically speaking, no website or app is impossible to crawl; the only questions are whether someone is willing to crawl it and how difficult that is.
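One widely used anti-crawling measure on the website side is to cap how many requests a single client may make within a time window. The Python sketch below shows a sliding-window limiter under stated assumptions: the class name `SlidingWindowLimiter`, the use of a client identifier such as an IP address, and the limits shown are all hypothetical and are not drawn from the article. As the remarks above suggest, a determined crawler can work around such a limit, for example by spreading requests over many addresses.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds.

    Hypothetical helper for illustration; the clock is injectable so the
    logic can be exercised without real waiting.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id):
        """Return True and record the request if the client is under its limit."""
        now = self._clock()
        hits = self._hits[client_id]
        # Discard timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the server would reject or delay this request
        hits.append(now)
        return True
```

A server would call `limiter.allow(client_id)` for each incoming request and refuse or slow down the request when it returns `False`. Rejected requests are not recorded, so a blocked client regains access once its earlier requests age out of the window.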

It is understood that, when defending themselves, makers of malicious web crawlers often tie the robots protocol’s restrictions on crawling to the issue of data circulation. Xu Hongtao believes that in the context of “interconnection”, “order” and “circulation” are equally important and both indispensable. Behaviors that hinder fair competition and endanger user data security under the guise of “interconnection” must be excluded.

“In assessing the legitimacy of non-search-engine crawlers, it is necessary to consider whether they adequately safeguard the security of user data. User data, including identity data, behavior data and so on, is not only a competitive resource for operators but also carries the attributes of users’ personal privacy, and the aggregate of such data bears even more on the public interest of society,” Xu Hongtao said.

It is understood that laws and regulations on data security have been steadily improved in recent years. As the basic law on data security, the Data Security Law shoulders the important task of establishing the core institutional framework for data security in China. In addition, there is the Cryptography Law passed in 2019, and the Ministry of Industry and Information Technology plans to issue the “Industry and Information Field Data Security Management Measures (Trial)”. Some places, such as Shenzhen and Shanghai, are also exploring regulations on data management.
