Intelligent Issue 23 - Page 36


Simply put, public web scraping gives you the same view of the Internet as the typical user. To ensure that your data collection is done ethically, there are clear hazards to avoid and important standards you must meet. These standards are an absolute requirement that all operators must follow, without exception. They are neither optional nor a ‘good to have’ addition to your company policy.
Myth #4: Information sources are mostly private
Totally false: the majority of web-based data is public. Internet growth statistics from Statista show that 4.66 billion people were using the Internet as of January 2021.
That’s close to 60% of the world’s population. Considering that most of the world’s data has been generated within the last two years alone, it is estimated that close to 70% of the data being generated is public (of which humans are responsible for close to 60%). Although these statistics only give us a rough indication, the trend is clear to see.
When it comes to web scraping, providers can only gather information that is open to the public. Put simply, that means anything that you or I could access using a standard browser on the Internet without logging in. If you have to log in, the data is off limits. It is as simple as that.
Myth #5: Online data collection is only carried out by "dodgy" businesses
Wrong! Companies of all sizes, from Fortune 500 firms to start-ups and SMEs, gather and utilise public web data to inform their decision-making. The only difference is in the type of data they require and how frequently they need it. In today’s real-time economy, companies can’t thrive without being able to see the full market reality, and to do that they need access to the largest data source available. When our reality is mostly led by digital innovation, it is no surprise that public web data has become the ‘no-brainer’ solution.
As the CTO of a market leader in the data collection domain, you might think it is a given that I am fighting this corner. However, for this industry to succeed, we must be our own harshest critics and ensure that we, and others looking to collect data, aren’t tempted to engage in illegal or unethical activities in the absence of strict regulations.
With any emerging technology, especially within the data space, there is always going to be scrutiny of its purpose and legality. However, there is a greater good at stake: allowing businesses to prosper from the latest publicly available online insights. When analysing data collection, it’s important to understand what is being collected and how it is being processed.
With so many leading brands dependent on data insights, this will become a fast-growth industry, and it’s up to everyone in this community to promote legal and ethical compliance. If anything, it’s our moral duty to do so.
Intelligent SME.tech