In several instances, data is collected for arguably beneficial purposes. Most speech recognition systems today employ NLP (Natural Language Processing) techniques, which require data to improve their accuracy. Such systems utilise snippets of voice samples to improve, which benefits both the individual user and all other users. Google’s Location service uses crowd-sourced data to improve location detection, as well as product-specific features such as traffic analysis in Maps.
The issue lies in how the data is collected. Generally, the user is not aware that this data is being collected and used, and the repercussions are often not understood until it’s too late. Web profiling can be used to build an eerily complete profile without any indication to the user. Social media is one large nexus of data mining, where the amount of information collected about users is an important metric for success. This information is not always strictly guarded; after all, it is easily monetised for advertising and related fields. Thus, it becomes feasible to glean a lot of information about a person with little to no effort.
Digitising information in fields such as medicine, finance, and legal or governmental records opens up various avenues for misusing such data. Until up-to-date security specifications are implemented (which is not always the case for government agencies), these systems remain vulnerable to exploitation, leading to instances of data theft and subsequent misuse. Strong laws regulating data collection and storage are needed to prevent excessive personal monitoring. Consumer awareness, the implementation of strong security policies, and the tightening of social media security norms are important steps towards ensuring a non-invasive, non-intrusive digital future.