Australia's Ultimate Marketing Technology Almanac Oct 2015 | Page 10
tools out there to make pulling data from various
types of APIs into a database a simple and
low-tech process.
To make use of the data, you need analysis software
that allows, at the very least, some advanced
statistical analysis as well as data coding or
manipulation. Ideally, text mining applications can
take a lot of the legwork out of social analysis,
but as mentioned previously, a top-class analyst can
easily derive a lot of insight without ‘point and
click’ tools (and is likely to prefer to build their
own solution anyway).
But it pays to be cautious of analysis tools that
claim to do full-text sentiment analysis without a
technical user or a training/classification process.
In reality, this is a fairly complex and subtle
analysis and should always, in the first instance,
be done with human guidance.
A word of caution
Like all data, social data comes with its own
compromises and limitations. A 2013 study from
Princeton University’s Center for Information
Technology Policy (CITP) called “Big Data: Pitfalls,
Methods and Concepts for an Emergent Field,” by
author Zeynep Tufekci, covered this in detail.
Tufekci, a University of North Carolina professor
and CITP fellow, told Which-50 at the time that
too many researchers treated Twitter as a “model
organism” – something akin to the fruit fly in
biology. (You can read more in MIT Sloan Management
Review’s coverage of the paper.)
At the time the paper was released it generated
some heat in the social research community.
Tufekci said, “Twitter and all platforms have
specific affordances – behaviours they reward and
behaviours they discourage at the level of
infrastructure – as well as site-specific social
norms that emerge over time.”
She argued that an over-reliance on Twitter as a
“big data” source could lead to mistakes where
researchers overgeneralised from the data.
“For example, many studies of influence use Twitter
as an example and conflate ‘retweeting’ with
influencing. A user, however, may retweet for a
variety of reasons besides being influenced,
including to make fun of or disagree with a tweet.
“Twitter’s specific affordances that make retweeting
easy, as well as social norms (i.e. retweeting as a
common behaviour), can lead people to overestimate
the level of influence on social networks.
“It is a bit like studying fruit flies and then generalising to larger creatures – you cannot. Fruit flies
were chosen (as model organisms) because they
were small and fit easily into the laboratory.”
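To make the retweet caveat concrete, here is a minimal Python sketch. It assumes a small sample of retweets that a human analyst has already hand-coded by stance. The sample data, the stance labels and the function names are all hypothetical illustrations, not taken from Tufekci's paper or from any Twitter API.

```python
from collections import Counter

# Hypothetical hand-coded sample: (original_author, stance_of_retweeter).
# The stance labels are assumed to come from a human coder, not from Twitter.
retweets = [
    ("alice", "endorse"),
    ("alice", "mock"),
    ("alice", "disagree"),
    ("alice", "mock"),
    ("bob", "endorse"),
    ("bob", "endorse"),
]


def retweet_counts(records):
    """Raw retweets per author: the naive 'influence' proxy."""
    return Counter(author for author, _ in records)


def endorsing_counts(records):
    """Only the retweets a human coder labelled as endorsements."""
    return Counter(author for author, stance in records if stance == "endorse")


raw = retweet_counts(retweets)
endorsed = endorsing_counts(retweets)

for author in sorted(raw):
    print(author, "raw:", raw[author], "endorsing:", endorsed[author])
```

In this made-up sample the account with the most retweets is not the one with the most endorsing retweets, which is the kind of gap a raw retweet count hides.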
As was also reported at the time, Tufekci said the
reason Twitter is used disproportionately in
large-scale big data research, especially projects
involving millions or billions of data points, was not
always related to its efficacy as a source of data for
accurate analysis. Instead, she argued, it was more
about the availability of Twitter data, the
availability and popularity of tools, and ease of
analysis.
She noted that while Facebook was (and remains)
the largest social media platform, there is less
truly public data on Facebook “and thus Facebook
is less accessible by scraping or via Facebook’s
API as many more Facebook users (estimated to
be more than 50 per cent) have taken their profiles
private compared with Twitter users (estimated to
be less than 10 per cent).”
She also argued that the Twitter stream was relatively easy to access through widely available and
popular methods (the Twitter Firehose, the spritzer,
white-listed accounts, etc) comp