
What is web scraping and what are its ethical implications?

For some years now, one Digital Marketing technique has been steadily gaining ground, and many people defend it as indispensable for truly effective work. We’re talking about web scraping.

If you’ve come this far, you probably want to know more about it: whether it lives up to the hype, and how this practice can help improve your results.

We will cover all of this, but we will also raise some reservations about its use and explain why.

So, if the subject interests you, read the article we have prepared on it carefully!

What is web scraping?

This is one of the English terms that have taken a prominent place when it comes to gathering information to support Digital Marketing work.

The verb to scrape means to remove something from a surface by rubbing it with an edge. A scraper, in turn, is the name given to spatulas and similar tools used to scrape something off and leave nothing behind. Adding web to scraping gives the idea of scraping the Internet and getting everything possible out of it.

Since the Internet (or the Web, if you prefer) is made up of data, web scraping means collecting, in an automated way, as much of the data available on the World Wide Web as possible.

Once they have this data, whoever performed the “scraping” can use it for a wide range of purposes: tabulating it, organizing it and turning it into information.

Although the name is now used mostly in the context of Digital Marketing, the practice itself, in a more rudimentary form, has existed since the first websites, when the first Internet robots were built to assemble e-mail databases for sending spam.

In the past, the contact pages of many websites did not have a contact form. Instead, they listed e-mail addresses for different matters, such as sales, purchases and customer service.

It was easy for a bot to crawl the web, recognize this type of data and build a database of millions of e-mail addresses, which were later sold and used to send spam.

So web scraping is not exactly new; it has simply acquired a name and a purpose that look more “noble” and that certain interests find easier to justify.

How is web scraping done?

In essence, the principle is the same as that of the old e-mail harvesting tools; it has just become more sophisticated and powerful.

Today there are numerous web scraping tools, and the leading ones promise (and deliver) large volumes of highly varied data, with a high degree of customization over what is collected.

It is easy to find videos showing how to configure them and, within minutes, obtain a complete database of a large e-commerce site’s product list, prices, features and even stock quantities, almost identical to the one held by the site’s owner!

What the scraping robot does is basically reproduce what a human user does: it requests each page, just as a browser would, and extracts the relevant fields from the page’s HTML. But whereas a person could take hours or even days to view and copy all the data from that hypothetical e-commerce site, the robot’s process is automated, far faster and more reliable in terms of the accuracy of the copied data.
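A minimal Python sketch of the extraction step follows. It assumes the page HTML has already been downloaded and uses a hypothetical markup (a `<span class="name">` and a `<span class="price">` inside each product `<li>`); real sites use their own structure, which a scraper must be adapted to. Only the standard library’s `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would first download this HTML over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Blue mug</span><span class="price">12.90</span></li>
  <li class="product"><span class="name">Red mug</span><span class="price">14.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> / <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # "name" or "price" while inside a matching <span>
        self._current = {}   # fields gathered for the product being read

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            css_class = dict(attrs).get("class")
            if css_class in ("name", "price"):
                self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li":
            # Keep the product only if both fields were found.
            if "name" in self._current and "price" in self._current:
                self.products.append(
                    (self._current["name"], float(self._current["price"]))
                )
            self._current = {}  # start fresh for the next product

    def handle_data(self, data):
        # Text outside the spans we care about (e.g. whitespace) is ignored.
        if self._field:
            self._current[self._field] = data.strip()


parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.products)  # [('Blue mug', 12.9), ('Red mug', 14.5)]
```

Repeating this over every product page of a site is, in essence, how a tool ends up with a copy of a store’s catalogue.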

Services of this kind are multiplying, and finding companies and tools that offer web scraping is very easy.

The possibilities are many and cover different types of sites, such as social networks, where user data set as public can be extracted and compiled, or even private or restricted data, if a system flaw exists and is found.

There is also plenty of material, in the form of tutorials and tips on websites and forums, teaching how to build a web scraping tool in Python, which has proved a good choice for this purpose. Sometimes this material even includes tricks for getting around weak security on some sites to obtain data that is not public.

What is the importance of web scraping for Digital Marketing?

Once you understand what it is, the appeal is easy to see: with a broad body of information, decisions rest on a solid foundation instead of guesswork, which is essential for effective campaigns.

Since web scraping tools became popular, professionals in the field have had access to information that previously could only be obtained through extensive and often expensive research.

Take e-commerce as an example, where this is already a reality: a company in the segment can check the prices its competitors charge, their delivery times, their shipping costs and even the quantities they have in stock, and from there see how it compares with them on each of these points.
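Once the data has been collected, the comparison itself is simple arithmetic. The sketch below assumes competitor prices have already been scraped into a dictionary; the store names and values are invented for illustration, and only prices are compared for brevity (delivery times, shipping costs or stock levels could be handled the same way).

```python
from statistics import mean

# Hypothetical competitor prices, as a scraper might have collected them.
competitor_prices = {
    "Store A": 49.90,
    "Store B": 45.00,
    "Store C": 52.30,
}
our_price = 47.50


def price_position(ours, competitors):
    """Summarize how our price compares with the competitors' prices."""
    avg = mean(competitors.values())
    cheaper = sorted(name for name, price in competitors.items() if price < ours)
    return {
        "average": round(avg, 2),
        # Negative means we are below the market average.
        "difference_from_average": round(ours - avg, 2),
        "cheaper_competitors": cheaper,
    }


print(price_position(our_price, competitor_prices))
```

Here the store would learn that it sits below the average competitor price, with one rival undercutting it.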

It is also possible to discover the most discussed topics and trends on a given social network, movements in a particular economic sector, the behavior of specific consumer audiences and whatever else you can imagine, as long as you know which sites the data should come from.

But, as you may already have sensed, there is a “dark side of the Force” to it.

Just as before it had the name we use today, web scraping can be, and often is, used for questionable purposes, to say the least.

Data leaks are often the work of tools that do exactly the same thing: they scan the network for data, particularly data that should not be public but is exposed because of some flaw.

The only difference, and a subtle one, may lie in who is behind the tool: a hacker or a Marketing professional.

Situations like this lead us to reflect on the practice and to consider it within some real and worrying scenarios.

Is web scraping illegal?

Depending on how it is done, that is, what data is scraped and for what purpose, yes, web scraping can be considered illegal.

Suppose the data obtained is personal data, which, under the LGPD (Brazil’s General Personal Data Protection Law), may only be processed in very clear and specific circumstances. Will the users the data refers to know for what purposes it will be used? Can they request its anonymization? Do they even know that their data is in the possession of a person or company other than the one to which they originally provided it?

In this case, even if the data is public and was collected from a social network, the user it belongs to consented to provide it only within the scope of that network and for the uses set out in its terms of service and privacy policy.

They did not give explicit consent for its use outside the network, or by third parties.

Even when the data is not personal and therefore not covered by the LGPD, there are moral, ethical, professional and marketing implications that need to be considered:

  • Digital heritage – depending on what data is collected and for what purpose, collecting data from other sites can mean appropriating digital assets built with the effort and resources of third parties. This is especially true when you profit from a practice that would not be possible without web scraping; depending on the circumstances, it may amount to intellectual property infringement;
  • Local legislation – Brazil still has very little legislation on the digital environment; however, depending on the website’s country of origin, scraping it may infringe the local laws that protect it;
  • Terms of service – in addition to the privacy policy that every website handling third-party data must have, many sites have terms of service that define how the information they contain, and the associated services, may be used. Circumventing or ignoring those terms may amount to an offense under the law;
  • Robots.txt file – beyond the privacy policy and terms of service, convention, professional ethics, courtesy and respect all dictate that a web scraping bot should follow the robots.txt file of each site it scans, the plain-text file at the root of a site that tells bots which paths they should not access;
  • Robot name – as a direct consequence of the previous item, it is good practice for the scraper to identify itself with a known name (sent in the User-Agent header of its requests), just as the most popular bots, such as Googlebot or Bingbot, do;
  • Performance – depending on the target site, its technical characteristics and the volume of data being “scraped”, web scraping can behave like an attack or intrusion attempt, and may even harm the site by degrading its performance;
  • Copyright – just as with personal data and sensitive data that may be classified as intellectual property, some content is protected by copyright law and likewise cannot be collected or used without the proper permission.
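The robots.txt, robot-name and performance points above can be combined into a “polite” scraper. The following is a minimal Python sketch using the standard library’s `urllib.robotparser`. The bot name `ExampleResearchBot`, its info URL, the robots.txt rules and the shop URLs are all hypothetical, and the robots.txt is parsed from a string so the example runs offline; against a real site you would call `set_url()` and `read()` instead.

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical bot identity: a stable name plus a URL explaining what the bot does.
USER_AGENT = "ExampleResearchBot/1.0 (+https://example.com/bot-info)"

# Hypothetical robots.txt content for the target site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS_TXT.splitlines())


def polite_plan(urls, default_delay=1.0):
    """Keep only URLs robots.txt allows for our bot and choose a delay between requests."""
    allowed = [u for u in urls if rp.can_fetch(USER_AGENT, u)]
    # Honor the site's Crawl-delay if it declares one; otherwise use our own default.
    delay = rp.crawl_delay(USER_AGENT) or default_delay
    return allowed, delay


def fetch_all(urls, delay):
    """Download each URL, identifying the bot and pausing between requests."""
    pages = []
    for url in urls:
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=10) as resp:
            pages.append(resp.read())
        time.sleep(delay)  # spread the load so the target site is not overwhelmed
    return pages


urls = [
    "https://shop.example.com/products/1",
    "https://shop.example.com/checkout/cart",
    "https://shop.example.com/account/orders",
]
allowed, delay = polite_plan(urls)
print(allowed, delay)
# fetch_all(allowed, delay) would then perform the actual network requests.
```

With these rules only the product page is kept, and the pause between requests is the 5 seconds the site asks for; when a site declares no `Crawl-delay`, the sketch falls back to one second, an arbitrary choice.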

It is important to stress that our goal is not to condemn web scraping, but to start a discussion so that it is carried out responsibly and professionally: not as yet another harmful practice that jeopardizes the future of the Internet but, on the contrary, as a way of putting technology at the service of the common good.

The owner of the website from which the data was collected cannot be held responsible for failing to adopt the security measures needed to prevent its exposure. Arguing otherwise is like taking someone’s car and claiming you were entitled to it because the owner left it unlocked with the key in the ignition.

As users, it is also up to us to be careful and to think about the information we make available on the web, for example by not keeping public profiles on social networks and by being fully aware of Internet privacy issues and their implications.

We also need to recognize that reading the privacy policies and terms of service of the many sites we visit is a habit most people lack, underestimating its importance, and that this behavior needs to change.

Conclusion

Web scraping has played an important role in Digital Marketing work, but its indiscriminate application can have legal and ethical implications.
