Mixed Methods Data Science: Qualitative Sensibilities

Timmy Chan
Feb 26, 2022

The research and design cycle always begins with general questions about a topic and a literature review before any further action. Self-directed learning can be treated as a type of research, so I was driven by habit to begin with a survey of existing literature. When beginning this article, I set out to search for scholarly discourse on what Data Scientists should learn. This led to a collection of references to good textbooks, drawn from Brown and Case Western Universities’ curricula.

Literature Review: Data Science as a Discipline

Figure 1. A comparison of multidisciplinary, interdisciplinary, and transdisciplinary approaches to innovation (McPhee & Bliemel, 2018)

First, we should operationally define Data Science as beyond interdisciplinary. At the forefront of research and learning, such as the TRIPODs initiative, data science is defined as transdisciplinary (Evans, 2014; Mahoney, 2019) and as coming from the world of Big Data.

Here we borrow an operational definition of Big Data: “In summary, scholars believe that the defining characteristics of big data are the four Vs: volume, variety, velocity, and veracity.” (Tang & Sae-Lim, 2016)

In particular, let’s focus on the words variety (potential for data arising from social contexts) and veracity (interpretation of datasets using particular framings — how does one ask better questions) in this article.

Ethics and Data Science: Current Research Discourse

At first glance in the newsfeeds on Data Science, the field may seem to be solely quantitative and technical in nature, especially since the theoretical side was inspired by a synthesis of questions and approaches from computer science, statistics, and mathematics.

What happens when Data comes with Human Context, Perspective and Bias?

Consider the operational definition of Big Data, especially “variety”. Data may come from existing discourse, which carries with it contexts and unavoidable biases. How do we approach data that is from a social context without drawing on the considerations of qualitative social science research? See this article for another exploration.

“Where data scientists, who view themselves simply as socially disembodied, quantitative analysts, engineers, or code-churners go wrong is that they are insufficiently attentive to the commitments and values that undergird the integrity of their knowledge practices and the ethical permissibility of the projects, enterprises, and use-contexts in which they involve themselves.” (Leslie, 2021)

The easiest scenario is the natural sciences: there is no subjectivity in the more “objective” measurements, such as the mass of a bundle of steel rods or the density of nitrogen at STP. Once the physical quantity to measure and the method of measurement are decided, the expectation is that the data will be the same regardless of who does the measuring. However, this powerful “objectivity” assumption does NOT transfer from the natural to the social sciences.

Mixed Methods: Pragmatic Approach to Ethical Concerns

The current ethical debates in mass media can be traced to the lack of discussion about qualitative research training and its importance in Data Science. I recall one of my mentors in the Disabilities Studies department reminding us in a qualitative research methods course that the researcher is a tool through which data is interpreted. Each data set, because it is collected by a human on human subjects, is a limited snapshot of a particular phenomenon, and data can carry bias, beginning with the question and selection process.

From the questions one chooses to ask, to how we choose to code, analyze, generate conclusions and conjecture theories from data, all research actions are influenced by our positionality as humans first.

In scholarly literature outside of pure STEM, scholars had shown by the late 20th century that

“supposedly neutral or impartial norms have built-in biases that limit their putatively universal character with respect to race, gender, and disability” (Bohman et al., 2021; Mills, 1997; Minow, 1991; Young, 2002)

This discourse parallels the ethical questions at the forefront of Data Science research today. Conversations on topics of ethics and social impact are beginning to take shape. Kitchin (2014) contended that “a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.” This clearly hints at the importance of qualitative sensibilities in knowledge generation in the field of data science. Seven years later, Sabina Leonelli (2021) wrote a rallying cry for data scientists, experts, and educators to

“abandon the myth of neutrality that is attached to a purely technocratic understanding of what data science is as a field — a view that depicts data science as the blind churning of numbers and code, devoid of commitments or values except for the aspiration toward increasingly automated reasoning.”

As Data Science continues to empower organizations with predictive models,

“data revolution has also enabled applications that violate expectations of consent, compromise public discourse, perpetuate discredited social theories, sow confusion among decision makers, and adversely impact minority populations.”

Diversity in interpretations of data begins with who gets to ask the questions, what questions are asked, who gets to do the research, and how data is gathered, from whom, and in what historical, social, and economic context. What of power dynamics? How do all of these influence veracity? These questions require a modern data scientist to consider a post-modernist or pragmatic epistemological approach to research, borrowing from the social sciences. To quote a recent article from the Harvard Data Science Review:

“quantitative and qualitative approaches should be seen as complementary, mutually reinforcing, and co-constitutive of data science when applied to the production of social knowledge.” (Tanweer et al., 2021)

The value of a pragmatic use of mixed methods approaches is not new, and it is prominent in many transdisciplinary fields such as Learning Sciences, Health Sciences, and Medical Research (Regnault et al., 2018; Warfa, 2016). When one considers data that come from discourse between humans (such as social media metrics and post content), we hit a limitation on “objectivity” because contradictory interpretations and perspectives coexist. This makes it necessary to examine some practices from the qualitative training path.
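To make this limitation concrete, here is a minimal sketch in Python. It uses invented labels, not real data: two hypothetical readers code the same five posts, and the code reports where their interpretations diverge. The names `reader_a`, `reader_b`, and `compare_codings` are illustrative assumptions and do not come from any of the cited works.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: two readers assign a code to the same five posts.
# These labels are invented for illustration only.
reader_a: Dict[str, str] = {
    "post_1": "support",
    "post_2": "sarcasm",
    "post_3": "complaint",
    "post_4": "support",
    "post_5": "complaint",
}
reader_b: Dict[str, str] = {
    "post_1": "support",
    "post_2": "support",
    "post_3": "complaint",
    "post_4": "sarcasm",
    "post_5": "complaint",
}


def compare_codings(a: Dict[str, str], b: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return the share of items coded identically and the items in dispute."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two codings have no items in common")
    disputed = [item for item in shared if a[item] != b[item]]
    agreement = (len(shared) - len(disputed)) / len(shared)
    return agreement, disputed


agreement, disputed = compare_codings(reader_a, reader_b)
print(f"agreement: {agreement:.0%}")
print("disputed items:", disputed)
```

A disagreement here is not necessarily an error to be averaged away. It marks a place where two interpretations coexist and where a qualitative conversation about the data is needed.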

Qualitative Sensibilities for Data Science

So much of Data Science continues to impact social life, and the data is coming from a social context; utilizing qualitative sensibilities in understanding data is key to the ethical questions.

  • Interpretivism: “theories about how we can gain knowledge of the world which loosely rely on interpreting or understanding the meanings that humans attach to their actions.” (O’Reilly, 2009)
  • Abductive reasoning: “A mode of inference that updates and builds upon preexisting assumptions based on new observations in order to generate a novel explanation for a phenomenon,” in contrast to inductive reasoning like experiments or purely deductive reasoning like proofs.
  • Reflexivity: “Examination of one’s own beliefs, judgments and practices during the research process and how these may have influenced the research. If positionality refers to what we know and believe then reflexivity is about what we do with this knowledge. Reflexivity involves questioning one’s own taken for granted assumptions.” (Reflexivity, n.d.)
  • Member Checking: Confirming the analysis and conclusions with the participants. Directly seeking feedback from the community can help ensure trustworthiness in the data science work.

Assessing Storytelling

Borrowing from Qualitative Research traditions, data science projects and research quality can be evaluated using four additional rubrics for trustworthiness (Elo et al., 2014):

  1. Credibility: does the data scientist present an interpretation of an experience in such a way that people sharing that experience immediately recognize it?
  2. Transferability: The ability to transfer findings from one group to another (similar to generalizability in STEM).
  3. Dependability: Can another researcher follow your decision trail? (Example: Why did you choose a particular coding scheme for a discourse?)
  4. Confirmability: Achieved once (1), (2) and (3) are established.
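As one possible way to support dependability in a data science project, the sketch below keeps a simple decision trail in Python. It is an illustration under stated assumptions, not a method taken from Elo et al. (2014): the `Decision` and `DecisionTrail` classes and the example entry are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Decision:
    """One entry in a hypothetical analysis decision trail."""
    made_on: date
    choice: str       # what was decided
    rationale: str    # why, in the analyst's own words
    alternatives: List[str] = field(default_factory=list)


@dataclass
class DecisionTrail:
    """An ordered log another researcher could read to follow the analysis."""
    entries: List[Decision] = field(default_factory=list)

    def record(
        self,
        choice: str,
        rationale: str,
        alternatives: Optional[List[str]] = None,
        made_on: Optional[date] = None,
    ) -> Decision:
        # Default to today's date when the caller does not supply one.
        entry = Decision(made_on or date.today(), choice, rationale,
                         list(alternatives or []))
        self.entries.append(entry)
        return entry

    def report(self) -> str:
        lines = []
        for i, e in enumerate(self.entries, start=1):
            lines.append(f"{i}. [{e.made_on.isoformat()}] {e.choice}")
            lines.append(f"   why: {e.rationale}")
            if e.alternatives:
                lines.append(f"   considered: {', '.join(e.alternatives)}")
        return "\n".join(lines)


# Invented example entry, for illustration only.
trail = DecisionTrail()
trail.record(
    choice="Code forum posts by stance rather than by topic",
    rationale="The research question is about disagreement, not subject matter",
    alternatives=["topic-based codes", "sentiment scores only"],
    made_on=date(2022, 2, 26),
)
print(trail.report())
```

Recording the rationale and the alternatives that were considered, not just the final choice, is what lets a later reader answer the "why this coding scheme?" question.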

Next Steps

While I originally set out to write on a learning trajectory to bridge pure math to data science, the literature review led me to clarify ethical concerns in the field and to propose some starting points: expanding beyond standard STEM academic training to include in-depth discussions of qualitative lenses on data that arise from human interactions.

The next article will cover the common core constructs found in data science curricula.

Hindsight: My time spent in Learning Sciences opened my eyes to mixed-methods approaches. Seeing rigorous social sciences research made explicit the limitations of purely quantitative research methods, as well as the contrast between inductive, deductive, and abductive reasoning in various fields.

Timmy Chan is a mathematician actively seeking a data scientist role.

Works Cited

Al Sarkhi, A., & Talburt, J. (2019). The Journal of Computing Sciences in Colleges Papers of the 17th Annual CCSC Mid-South Conference. https://doi.org/10.13140/RG.2.2.29810.12481

Bohman, J., Flynn, J., & Celikates, R. (2021). Critical Theory. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2021). Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/spr2021/entries/critical-theory/

De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., Bryant, L., Cheng, L. Z., Francis, A., Gould, R., Kim, A. Y., Kretchmar, M., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R. J., Sondjaja, M., … Ye, P. (2017). Curriculum Guidelines for Undergraduate Programs in Data Science. Annual Review of Statistics and Its Application, 4(1), 15–30. https://doi.org/10.1146/annurev-statistics-060116-053930

Demchenko, Y., Belloum, A., Los, W., Wiktorski, T., Manieri, A., Brocks, H., Becker, J., Heutelbeck, D., Hemmje, M., & Brewer, S. (2016). EDISON Data Science Framework: A Foundation for Building Data Science Profession for Research and Industry. 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 620–626. https://doi.org/10.1109/CloudCom.2016.0107

Elo, S., Kääriäinen, M., Kanste, O., Pölkki, T., Utriainen, K., & Kyngäs, H. (2014). Qualitative Content Analysis: A Focus on Trustworthiness. SAGE Open, 4(1), 2158244014522633. https://doi.org/10.1177/2158244014522633

Evans, J. (2014, July 29). What is Transdisciplinarity? — Purdue Polytechnic Institute. https://polytechnic.purdue.edu/blog/what-transdisciplinarity

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 205395171452848. https://doi.org/10.1177/2053951714528481

Leonelli, S. (2021). Data Science in Times of Pan(dem)ic. Harvard Data Science Review. https://doi.org/10.1162/99608f92.fbb1bdd6

Leslie, D. (2021). The Arc of the Data Scientific Universe. Harvard Data Science Review. https://doi.org/10.1162/99608f92.938a18d7

Leung, L. (2015). Validity, reliability, and generalizability in qualitative research. Journal of Family Medicine and Primary Care, 4(3), 324. https://doi.org/10.4103/2249-4863.161306

Mahoney, M. W. (2019). The Difficulties of Addressing Interdisciplinary Challenges at the Foundations of Data Science. ArXiv:1909.03033 [Cs]. http://arxiv.org/abs/1909.03033

McPhee, C., & Bliemel, M. (2018). Editorial: Transdisciplinary Innovation. Technology Innovation Management Review, 8(8), 5.

Meng, X.-L. (2021). Data Science: A Happy Marriage of Quantitative and Qualitative Thinking? Harvard Data Science Review. https://doi.org/10.1162/99608f92.cee621a9

Mills, C. W. (1997). The racial contract. Cornell University Press.

Minow, M. (1991). Making All the Difference. Cornell University Press. https://www.cornellpress.cornell.edu/book/9780801499777/making-all-the-difference/

Nolan, D., & Stoudt, S. (2021). The Promise of Portfolios: Training Modern Data Scientists. Harvard Data Science Review. https://doi.org/10.1162/99608f92.3c097160

O’Reilly, K. (2009). Interpretivism. In Key Concepts in Ethnography (pp. 119–124). SAGE Publications Ltd. https://doi.org/10.4135/9781446268308

Reflexivity. (n.d.). Retrieved February 25, 2022, from https://warwick.ac.uk/fac/soc/ces/research/current/socialtheory/maps/reflexivity/

Regnault, A., Willgoss, T., & Barbic, S. (2018). Towards the use of mixed methods inquiry as best practice in health outcomes research. Journal of Patient-Reported Outcomes, 2(1), 19. https://doi.org/10.1186/s41687-018-0043-8

Salloum, M., Jeske, D., Ma, W., Papalexakis, V., Shelton, C., Tsotras, V., & Zhou, S. (2021). Developing an Interdisciplinary Data Science Program. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, 509–515. https://doi.org/10.1145/3408877.3432454

Tang, R., & Sae-Lim, W. (2016). Data science programs in U.S. higher education: An exploratory content analysis of program description, curriculum structure, and course focus. Education for Information, 32(3), 269–290. https://doi.org/10.3233/EFI-160977

Tanweer, A., Gade, E. K., Krafft, P. M., & Dreier, S. K. (2021). Why the Data Revolution Needs Qualitative Thinking. Harvard Data Science Review. https://doi.org/10.1162/99608f92.eee0b0da

Warfa, A.-R. M. (2016). Mixed-Methods Design in Biology Education Research: Approach and Uses. CBE — Life Sciences Education, 15(4), rm5. https://doi.org/10.1187/cbe.16-01-0022

Young, I. M. (2002). Inclusion and Democracy. Oxford University Press.

Timmy Chan

Professional Software Engineer, Master Mathematician interested in learning and implementing multidisciplinary approaches to complex questions