From Pure Math to Data Science
Moving from generating pure theory with deductive logic to answering real-world questions by gathering and making sense of empirical evidence is a journey. Here is one perspective from a professional scholar pivoting to industry, from math to data science.
Origins: Academic Researcher
When I began graduate school to study mathematics, like many others who were motivated by the pursuit of knowledge, my initial long-term goal was to generate foundational research as a tenured faculty member at a public university. I was driven by an insatiable curiosity and an innate desire to contribute to community and society through my work. Academia, a place for learning and mentorship, seemed a natural fit.
To me, scholarship and fellowship applications were practice for grant writing; graduate teaching assistantships and mentoring were scaffolding for eventually becoming an advisor; writing proofs and rigorous arguments was practice for generating high-quality research; and designing instruments to gather data, creating visuals, and building presentation narratives backed by meticulous literature review were all part of the same preparation. The list goes on. Every action as a professional scholar was directed toward the ultimate goal of a tenure-track faculty position. Fueled by my lifelong love of proof and refutation, I stayed in academia far beyond a bachelor’s degree to pursue this dream of eventually taking on a permanent role in academic research.
Given that I am a first-generation immigrant from a working-class family, completing the ascent to the pinnacle of the ivory tower was far from certain.
When it came time for me to choose my thesis topic, I asked the math department to give me a project that would involve programming. "Just in case," I noted to myself. Luckily, my advisor gave me the opportunity to test a conjecture from cutting-edge math research using Python. While I was working on my thesis, some of my peers mused that I was doing computer science on top of my proof. Through this work, I learned how to use GitHub and automated much of the data visualization using Python and LaTeX.
After three years of teaching, presenting, coding, and proof-based math research (which I genuinely and thoroughly enjoyed), I defended my thesis and earned my master’s degree. At this point, I decided to apply my mathematical mastery towards academic research in adjacent fields.
Hindsight: these contingency plans were my first step towards building and demonstrating hard skills relevant to data science.
Post-Graduate: the Climb Continues
By a stroke of luck, I received a scholarship that also funded applications to PhD programs. With nothing to lose, I let the universe decide my path by applying to fifteen programs in five distinct fields where I was interested in using mathematics. I accepted a University Fellowship to conduct research in Learning Sciences, where I focused on learning mathematics in virtual learning environments (VR/AR) combined with applied universal design for learning.
During the initial year, I explored different epistemologies and contrasted the axiological claims and ontological arguments of various disciplines. Pondering the philosophical underpinnings of mathematics and other fields led me to a more precise articulation of what is unique about mathematical disciplinary literacy and discourse.
"[T]he methods of investigation of mathematics differ markedly from the methods of investigation in the natural sciences. Whereas the latter acquire general knowledge using inductive methods, mathematical knowledge appears to be acquired in a different way: by deduction from basic principles." (Horsten, 2022)
Mathematics is an awesome art: beginning with definitions and axioms (fundamental assumptions, such as that a line can be drawn from any point to any other point), one can prove irrefutable theorems with the rules of deductive logic. Such absolute certainty is beautiful, and found nowhere else in academia. The creativity involved in weaving these arguments together is arguably a form of poetry with the strictest ruleset. Other sciences rely instead on inductive logic and evidence-based claims, which allow for real-world imperfections and approximations. This contrast highlights the epistemological leap one must make when pivoting from pure math research into any other academic field.
Hindsight: by connecting with scholars in different fields, I was able to explore mixed methods research and expand my toolset to include understanding and analyzing qualitative data. Clearly articulating the contrast between math and other types of research is helpful when reading work from other fields.
Soul Searching
2020-2021 was hard for everyone. The pandemic altered the lives and trajectories of many, including mine.
The shift in circumstances led to a re-evaluation of my unquestioned commitment to an endless climb towards academic excellence and beyond. I longed to contribute to society and community, and had decided to do so through educating myself and students along the way. The climb towards the apex of the ivory tower required a spartan lifestyle of me. While academic research is fascinating, I needed funding to operate effectively as a human living and participating in society. Even with tuition waivers and a stipend, the funding from the student perspective works out to approximately minimum wage if one is living in a large city.
As the MIT Graduate Admissions blog honestly notes:
“Summing up, you would have to go through nine years (four years of postdoc + five years of AP) of job insecurity and low salary after graduate school. Assuming you graduate at the age of 28, you would be close to 40 when finally getting a stable job.”
Chief among the pressures of academia, besides the long grind, besides the barriers to access, besides the systemic issues around mental health in graduate school (Evans et al., 2018), is this: if I stayed any longer, I could not help my family, friends and community beyond volunteering physical labor and time.
Staying for a terminal degree would mean delaying the financial security I needed to do so. Undoubtedly, the desire for a sense of security and the immediate pressures of the pandemic shifted my priorities towards seeking other opportunities and forging a different path before considering a return in the future.
Reorientation and Refocus: Math to Data
In the fall of 2021, I decided to take a leave of absence from the program to reorient myself and reflect. After spending much time introspecting and getting some much-needed therapy, I identified the reasons I find academic research enjoyable, in search of a suitable role:
- An environment to learn by examining complex problems and designing effective and efficient ways to gather data to address questions, with rigor and trustworthiness as the priority.
- A chance to contribute to a community of practice (Lave and Wenger, 1991) and to learn from people from a broad range of disciplines.
- Opportunities to utilize mathematics in problem solving.
Why does Data Science suit Math Graduate Students?
Mastering Data Science is near to impossible. […] Being a mixture of many fields, Data Science stems from Statistics, Computer Science and Mathematics. It is far from possible to master each field and be equivalently expert in all of them. […] Therefore, it is an ever-changing, dynamic field that requires the person to keep learning the various avenues of Data Science. — Rinu Gour
- Mathematicians are driven to learn forever. Expositions on data science like the one above are exciting for academic researchers: knowing that there will always be something new to learn is enticing, since the allure of academia is often the pursuit of knowledge.
- Academics like to research, share findings, and teach. A quick Google search for “data science writing” shows the importance of writing in this field, with data scientists publishing and learning from one another, just like an academic community. (One could even argue that this public space does not have as much gatekeeping!)
- Data Science is applied math with much computer science.
If we are trained to write proofs about the theory underpinning data science, then we have already done the work needed to read the proofs and theorems that justify its algorithms. This gives pure mathematicians an advantage when acquiring new computer science knowledge.
Transferable Skills of Academic Mathematicians
Here is what pure math academic researchers likely already know from common parts of our training that overlaps with what data scientists do (see this great article by Sophia Yang on Towards Data Science). For these items, we mostly just need to show that we can apply the theory in projects:
- Knowledge of advanced statistical techniques and concepts (regression, properties of distributions, statistical tests and proper usage, etc.)
- Experience using statistical computer languages (R, Python, SQL, etc.) to manipulate data and draw insights from large data sets. Most of us have some training in coding and experience with specialized packages like SageMath, NumPy, etc. Of course, this varies among mathematicians (some folks only do proofs with pencil and paper!).
- Excellent written and verbal communication skills for coordinating across teams. Graduate studies in math often require that we work in teams, and many math departments require their graduate students to teach, which means rigorously evaluating and supervising work, giving structured, timely feedback, and more.
- A drive to learn and master new technologies and techniques. As graduate students, we demonstrate that we are capable of mastering new constructs and generating research independently. A major part of this requires that we independently learn and appropriately use new technologies, be they abstract constructs like theoretical frameworks or hard skills like Python.
- Develop custom data models and algorithms to apply to data sets. Understanding statistics and probability theory (combinatorics, and measure theory from analysis) gives us a significant speed boost when reading the texts, since the mathematics here relates directly to analysis and topology, with some higher-dimensional linear algebra, all of which are core to the training of a mathematician.
- Data visualization: the presentation of data in a pictorial or graphical format, where design and simplicity are an art. Building this craftsmanship requires patience and practice; however, the challenging theoretical aspect is already in our training. The rest is practice.
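As a small illustration of applying theory in a project, here is a minimal sketch of simple linear regression written directly from the closed-form least-squares solution, using only the Python standard library. This is my own illustrative example: the function names `fit_line` and `r_squared` are hypothetical, and the data (hours studied versus test scores) are made up purely for demonstration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y ≈ slope * x + intercept (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means every x is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Sum of cross deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Closed form obtained by setting the gradient of the squared error to zero.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative data: hours studied and test scores (not real observations).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
fit_quality = r_squared(hours, scores, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={fit_quality:.3f}")
```

In practice one would likely reach for a library such as NumPy or scikit-learn; writing the formula out by hand is only meant to show that the underlying mathematics is already familiar territory.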
Next Steps: Forever Learning and Applying Math
These are items mathematicians will likely need to define and practice (or demonstrate mastery of) outside of a math degree:
- Mine and analyze data from company databases to drive optimization and improvement of product development, marketing techniques and business strategies. This type of experience requires internships or projects during graduate school specifically dedicated to these topics.
- Knowledge of a variety of machine learning techniques (clustering, decision tree learning, artificial neural networks, etc.) and their real-world advantages/drawbacks. The branches of artificial intelligence based on mathematical algorithms and automation require practice. For recently trained mathematicians, the theories underpinning the algorithms are theorems in mathematical optimization, which is often offered as a topics course in graduate programs and intersects neatly with linear algebra, analysis, and combinatorics.
In order to achieve this step, the consensus among writers and practitioners is to capture the learning process by doing data science projects on large public data sets, and to create a portfolio to demonstrate this growth in the form of a blog, which can also serve as a resource for others.
Academic pure mathematics researchers have the knowledge, in theory.
The key to pivoting from pure math to data science, then, must be to demonstrate that our broad theory-based knowledge can be useful in data science projects.
Timmy Chan is a mathematician actively seeking a data scientist role.
Works Cited
Evans, T. M., Bira, L., Gastelum, J. B., Weiss, L. T., & Vanderford, N. L. (2018). Evidence for a mental health crisis in graduate education. Nature Biotechnology, 36(3), 282–284. https://doi.org/10.1038/nbt.4089
Horsten, L. (2022). Philosophy of Mathematics. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2022). Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/spr2022/entries/philosophy-mathematics/
Lave, J., & Wenger, E. (1991). Legitimate Peripheral Participation. In Situated Learning: Legitimate Peripheral Participation (pp. 12–43). Cambridge University Press; Cambridge Core. https://doi.org/10.1017/CBO9780511815355
To stay in academia or not, that is the question | MIT Graduate Admissions. (n.d.). Retrieved February 15, 2022, from https://gradadmissions.mit.edu/blog/stay-academia-or-not-question
What is a community of practice? (n.d.). Community of Practice. Retrieved February 15, 2022, from https://www.communityofpractice.ca/background/what-is-a-community-of-practice/
Woolston, C. (2018). Feeling overwhelmed by academia? You are not alone. Nature, 557(7703), 129–131. https://doi.org/10.1038/d41586-018-04998-1