Australia's Ultimate Marketing Technology Almanac Oct 2015 | Page 10

tools out there to make pulling data from various types of APIs into a database a simple, low-tech process.

To make use of the data, you need analysis software that allows, at the very least, some advanced statistical analysis as well as data codifying or manipulation. Ideally, text mining applications can take a lot of the legwork out of social analysis but, as mentioned previously, a top-class analyst can easily derive a lot of insight without ‘point and click’ tools (and is likely to prefer to build their own solution anyway).

But it pays to be cautious of analysis tools that claim to do full-text sentiment analysis without a technical user or a training/classification process. In reality this is a fairly complex and subtle analysis and should always, in the first instance, be done with human guidance.

A word of caution

Like all data, social data comes with its own compromises and limitations. A 2013 study from Princeton University’s Center for Information Technology Policy (CITP), “Big Data: Pitfalls, Methods and Concepts for an Emergent Field” by Zeynep Tufekci, covered this in detail. Tufekci, a University of North Carolina professor and CITP fellow, told Which-50 at the time that too many researchers treated Twitter as a “model organism”, something akin to the fruit fly in biology. (You can read more in MIT Sloan Management Review’s coverage of the paper.) When the paper was released it generated some heat in the social research community.

She argued that an over-reliance on Twitter as a “big data” source could lead to mistakes where researchers over-generalised from the data.

“For example, many studies of influence use Twitter as an example and conflate ‘retweeting’ with influencing. A user, however, may retweet for a variety of reasons besides being influenced, including to make fun of or disagree with a tweet.”
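To make the retweet caveat concrete, here is a minimal Python sketch. It assumes a small sample in which a human coder has tagged each retweet with an intent; the sample data, the intent labels and the `influence_summary` helper are all hypothetical illustrations and are not drawn from Tufekci's paper or from any Twitter API.

```python
from collections import Counter

# Hypothetical, hand-labelled sample: each retweet of one account's posts
# carries an intent assigned by a human coder. Illustrative values only.
retweets = [
    {"user": "a", "intent": "endorse"},
    {"user": "b", "intent": "mock"},
    {"user": "c", "intent": "endorse"},
    {"user": "d", "intent": "disagree"},
    {"user": "e", "intent": "endorse"},
    {"user": "f", "intent": "mock"},
]


def influence_summary(rows):
    """Compare the raw retweet count with the retweets coded as endorsements."""
    counts = Counter(row["intent"] for row in rows)
    total = len(rows)
    endorsed = counts["endorse"]
    return {
        "raw_retweets": total,
        "endorsing_retweets": endorsed,
        # Share of retweets that plausibly signal influence; 0.0 for an empty sample.
        "endorsing_share": endorsed / total if total else 0.0,
    }


summary = influence_summary(retweets)
print(summary)
```

In this made-up sample, treating every retweet as a sign of influence would double the figure that the human-coded endorsements support, which is the kind of over-generalisation the paper warns about.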
Tufekci said: “Twitter and all platforms have specific affordances – behaviours they reward and behaviours they discourage at the level of infrastructure – as well as site-specific social norms that emerge over time.”

“Twitter’s specific affordances that make retweeting easy, as well as social norms (i.e. retweeting as a common behaviour), can lead people to overestimate the level of influence on social networks.

“It is a bit like studying fruit flies and then generalising to larger creatures – you cannot. Fruit flies were chosen (as model organisms) because they were small and fit easily into the laboratory.”

As was also reported at the time, Tufekci said the reason Twitter is used disproportionately in large-scale big data research, especially projects involving millions or billions of data points, was not always related to its efficacy as a source of data for accurate analysis. Instead, she argued, it was more about the availability of Twitter data, the availability and popularity of tools, and ease of analysis.

She noted that while Facebook was (and remains) the largest social media platform, there is less truly public data on Facebook “and thus Facebook is less accessible by scraping or via Facebook’s API as many more Facebook users (estimated to be more than 50 per cent) have taken their profiles private compared with Twitter users (estimated to be less than 10 per cent).”

She also argued that the Twitter stream was relatively easy to access through widely available and popular methods (the Twitter Firehose, the spritzer, white-listed accounts, etc) comp