By: Evan Jones
I’m sure many of you reading this use social media in one way or another. Given its importance in our society, it can sometimes feel impossible to avoid, especially as digital connection becomes an important way to open new lines of communication. Because most people now use social media in daily life, its data is analyzed for many reasons, including to understand which content users respond to best. Social media analytics (the process of collecting and analyzing data from social networks like Twitter, Facebook, or Instagram) can be used to evaluate marketing strategies, assess the development of social media or business channels, analyze trends, better understand a client, or gain a competitive edge.
Within the social media analytics process, there are four distinct steps: data discovery, data collection, data preparation, and data analysis. While a substantial amount of literature explains the possible problems or obstacles in different methods of data analysis, Stefan Stieglitz and his co-authors point out in their journal article “Social Media Analytics – Challenges in Topic Discovery, Data Collection, and Data Preparation” that “there hardly exists research on the stages of data discovery, collection, and preparation” (Stieglitz, 2018). Because social media has grown rapidly as a means of communication over the past decade, new ways to use social media platforms and new patterns of communication have emerged as well, meaning more ways to use, and more of a need for, social media analytics.
One example that Stieglitz uses in the article is how two researchers from Cornell University, Scott Golder and Michael Walton Macy, analyzed Twitter data to study how people’s moods change over the course of a day, across the week, and through the seasons. Projects like Golder and Macy’s are typical of how social media data is analyzed. Stieglitz notes that the existing research papers regarding or using social media analytics are isolated case studies that collect a large dataset during a specific time frame on a specific subject and analyze it quantitatively (Stieglitz, 2018). The steps needed to get data and turn it into useful information usually follow this pattern, which is why the purpose of the social media analytics field is to “combine, extend, and adapt methods” for analyzing social media data.
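To make the idea of a quantitative case study a little more concrete, here is a minimal Python sketch of grouping posts by hour of the day and averaging a mood score. It is only an illustration, not the method Golder and Macy used: the sample posts, the mood scores, and the function name `mean_mood_by_hour` are all hypothetical, and a real study would need a far larger dataset and a proper way of scoring mood.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical toy data: (ISO timestamp, mood score between -1 and 1).
# In practice the scores would come from some sentiment-scoring step.
posts = [
    ("2021-03-01T08:15:00", 0.6),
    ("2021-03-01T08:45:00", 0.4),
    ("2021-03-01T22:10:00", -0.2),
    ("2021-03-02T08:05:00", 0.5),
    ("2021-03-02T22:30:00", -0.4),
]


def mean_mood_by_hour(posts):
    """Return a dict mapping hour of day (0-23) to the average mood score."""
    buckets = defaultdict(list)
    for timestamp, score in posts:
        hour = datetime.fromisoformat(timestamp).hour
        buckets[hour].append(score)
    # Sort by hour so the result reads in chronological order.
    return {hour: mean(scores) for hour, scores in sorted(buckets.items())}


hourly_mood = mean_mood_by_hour(posts)
for hour, mood in hourly_mood.items():
    print(f"{hour:02d}:00  average mood {mood:+.2f}")
```

The same grouping could be done by weekday or by month to look at the longer time spans mentioned above.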

Despite all of this, general models and approaches for analyzing social media data still come up short. To address this, Sinan Aral, Chrysanthos Dellarocas, and David Godes developed a basic framework for organizing social media research and “to stimulate innovative investigations of the relationship between social media and business transformation” (Aral, 2013). Other frameworks were created alongside it to fill gaps in the general models and approaches. Wietske van Osch, Professor of Economics at the University of Amsterdam, and Constantinos K. Coursaris, Professor at Michigan State University, proposed a framework and a research agenda limited to organizational social media. While both frameworks were created to classify different areas of social media analytics research, they don’t actually show readers and researchers how to collect the social media data to be analyzed, which is where, as Stieglitz points out, the challenges in this process arise.
So the real question is: what challenges do we face during the data discovery, data collection, and data preparation stages of the social media analytics process? Drawing on literature about “big data” (meaning data sets that are larger, more complex, and more varied), researchers argue that social media data and big data share many characteristics. What sets big data apart from traditional analytics, however, are four main factors (the four “V’s”) identified by IBM data scientists:
- Volume, the storage space required for the data
- Velocity, the speed at which the data is created
- Variety, the different forms the data can take
- Veracity, the uncertainty about the quality of the data
A fifth factor, value, is sometimes added; it refers to “financial benefits generated by big data for an organization” (Stieglitz, 2018). The four V’s correspond only to the technical difficulties that arise immediately when collecting data, so comparing social media data to big data is not the only, or the best, way to frame these challenges.
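As a rough illustration of how the four V’s might be looked at on an actual collection of posts, here is a small Python sketch. Everything in it is an assumption made for the example: the sample posts, the field names (`created_at`, `type`, `body`), the function name `describe_four_vs`, and especially the stand-in measures, such as using the share of posts with an empty body as a crude proxy for veracity.

```python
import json
from datetime import datetime

# Hypothetical toy posts; the field names are assumptions for this example.
posts = [
    {"created_at": "2021-03-01T08:00:00", "type": "text", "body": "Good morning"},
    {"created_at": "2021-03-01T08:01:00", "type": "image", "body": ""},
    {"created_at": "2021-03-01T08:02:00", "type": "text", "body": "Coffee time"},
    {"created_at": "2021-03-01T08:04:00", "type": "video", "body": None},
]


def describe_four_vs(posts):
    """Summarize a non-empty list of posts with one rough measure per 'V'."""
    times = [datetime.fromisoformat(p["created_at"]) for p in posts]
    span_minutes = (max(times) - min(times)).total_seconds() / 60
    return {
        # Volume: size of the data when serialized as JSON, in bytes.
        "volume_bytes": len(json.dumps(posts).encode("utf-8")),
        # Velocity: posts created per minute over the observed time span
        # (None if all posts share the same timestamp).
        "velocity_posts_per_minute": len(posts) / span_minutes if span_minutes else None,
        # Variety: number of distinct content types.
        "variety_content_types": len({p["type"] for p in posts}),
        # Veracity (crude proxy): share of posts with a missing or empty body.
        "veracity_share_empty_body": sum(1 for p in posts if not p["body"]) / len(posts),
    }


summary = describe_four_vs(posts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Measures like these only describe the technical side of the data, which fits the point above that the four V’s do not cover every challenge in working with social media data.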

Consumers and businesses alike keep finding ways, both new and existing, to collect meaningful information from social media, but they also face new and existing challenges in collecting this type of data. Because social media analytics is still a fairly new area of research, and because social media keeps growing, many questions have not yet been fully answered, and researchers are constantly looking for new, improved, and simpler ways to collect social media data.
References
Aral, Sinan, et al. “Introduction to the Special Issue—Social Media and Business Transformation: A Framework for Research.” Information Systems Research, vol. 24, no. 1, 2013, pp. 3–13, doi:10.1287/isre.1120.0470.
DSX Hub. “The Four V’s of Big DATA Explained by IBM.” DSX Hub, 15 Oct. 2020, http://www.dsxhub.org/infographie-the-four-vs-of-big-data-explained-by-ibm/.
Stieglitz, Stefan, et al. “Social Media Analytics – Challenges in Topic Discovery, Data Collection, and Data Preparation.” International Journal of Information Management, vol. 39, 2018, pp. 156–168, doi:10.1016/j.ijinfomgt.2017.12.002.