Some ways my education + experience colored my approach to data science

When I applied for Chemical Engineering as my major before freshman year, I was quite desperate. In high school, I had immense difficulty finding my “niche”, which didn’t suddenly stop senior year, so ChemE seemed like the perfect combo of my strengths. I was good at math, loved chemistry, and tolerated physics.

Also, it sounded cool at the time.

I came into my first semester with minimal familiarity with the discipline besides the coursework, so progressing through the rigorous curriculum did, in fact, inform me, which was painful at times, but great news for the return on my student loan “investment”. Through picking up my Computer Science minor, opting in to summer research opportunities instead of co-ops, and later starting my first job after graduating, I developed a sharper sense of how that generalized undergrad knowledge applied to me, how it could be specialized, and how I wanted it to inform my impact in projects, which turned out to be through data science.

The curriculum gave me the necessary background

This is the most clear-cut parallel - the classes that taught what would become the basic concepts of data science taught me the basic concepts of data science. My Computer Science classes gave me the classic undergrad “learning to code” starter pack of data structures and algorithms + object-oriented programming, and some required ChemE classes like Calculus and Statistics gave me the math background needed for machine learning.

But my ChemE classes, particularly the (alleged) weed-out engineering computing classes (HTML, CSS, Linux, MATLAB + C++), introduced me to solving problems and completing analyses with code. I had dabbled in R and HTML/CSS in high school, but this was my first exposure to using code as a tool, and to the demands of knowing a resource well enough to bend it to your will and solve problems. I loved the class, but it didn’t love me back at first. It was a firm love/hate relationship that I indulged by later becoming a teaching assistant for that class during my last few semesters of undergrad. The best way to slay your enemies is to become friends.

In industry, data science was a means to an end, not the focus

The research I did in undergrad, the basic data analysis it entailed, and the aforementioned classes started to come together in my first position out of school. During recruitment, I was briefed on the prospect of working with color imaging data (which, spoiler, I only managed in my next role), but my job turned out to be actually making paint. Most impressive bait and switch of the century. It was made emphatically clear that I was supposed to mix up the formulas that someone else made, but there was so much to be done with our data! So that’s when my coding side projects started.

The company had a scientific coding group centered around how chemists and formulators in other labs were solving problems in their projects with computational solutions in Python. As a new hire, the monthly meetings of sharing code, project presentations, and workshops were my first exposure to the subject matter (coatings formulation and chemistry) and to coding being used directly for research and product discovery. It was also a “forced” deep-dive into Python, which I had despised due to the indentation and lack of Java-like “punctuation” - had to get over it. Even the transition from research in academia to industry was a learning curve in terms of experimental priorities and communication.

My perspective on coding morphed from undergrad to industry, from “fun and quizzical” to “useful and purposeful” (but still fun). New packages were fun to learn (Anaconda, less so), and exposure to new tools led to me thinking “where can I apply this?” So when not at the bench mixing paint, I worked on side projects using new and historical experiment data, like displaying run results in Matplotlib instead of Excel, quantifying sample corrosion with OpenCV instead of measuring by hand, and automating repetitive data entry with PyAutoGUI. Not much resulted by way of widespread adoption, but through this practice, I adjusted to the idea of data science as a tool for operational impact, not the main event.

Problems are everywhere, so products are, too

One of the most profound things I absorbed in my ChemE curriculum came from my product design classes, where I learned that the purpose of invention is to fill “gaps”, a “gap” being the distance between what people have and what people need. Consumers are “people”, but engineers and colleagues are “people” too (no matter what the others say)! The people that you collaborate with can also benefit from the hodge-podge solutions you make when they’re “product-ized” with clean code + clear documentation. I can try offering my hodge-podge solutions to people, sure, but people experience and circumvent gaps all the time. If my solution is not easier than circumventing the “gap”, it won’t be used, period. Not their fault, it’s just not the right solution! If you’re able to offer them a solution that is easier than circumventing, then voilà, you’re adding value. So there’s always an opportunity for your product to solve problems other than the one in front of you.

This section started small and ended up extensive, as all the best things do, so maybe I’ll dive deeper in a later post.

Wrap it up

As a young adult in university, my job was to pursue the things I was interested in, but figuring out how to make them come together in a career path worried me at first. Data analysis was the first thing that allowed me to combine my engineering degree, CS minor, and research experiences seamlessly and with purpose + impact. So not only did I find a nice niche, but it also enables my habit of pursuing new interests regardless of relevance or cool factor. I do it for the storyline!