Karthik Guruswamy

Data Scientist - Trial/Error and the Culture of Failing Fast

Blog Post created by Karthik Guruswamy Moderator on Sep 23, 2017

Data Science can be an adventure in every possible way - just ask any employee who has been tasked with solving data science problems. Did you know that the whole zen of data science thrives on Trial/Error AND a culture of failing fast?

If you talk to people in a company's analytics department, you will find that they approach business problems in one of two ways:

  1. I will try stuff that I know will probably produce a visual. I've done it many times before, so it should work. I know I could try this new stuff, but I'm not sure what will come out of it, so I'll skip this crazy idea. Just by looking at the data I can tell the visual insight will suck or I will fail miserably.
  2. I don't know how the visual will look when I try this. I'm willing to go for it anyway! I want to see what happens and don't want to guess. The data doesn't look that interesting to me at the outset, but I'm willing to create a visual just for the fun of it, even if it comes out boring. What if it's something useful?

Approach #2 is what makes up the Trial/Error and Fail Fast culture. Imagine putting a super fast iterative tool (Teradata Aster Analytics or Apache Spark) in the hands of a person who practices #2 above!

A Trial/Error and Fail Fast culture doesn't mean data scientists are unwilling to try time-tested methods. It only means they are willing to take a lot of 'quickfire' risks for better results!

With just a bit of luck, it's off to the next iteration of failing fast, and we keep building!

A bit more on Trial/Error and Fail Fast. What exactly is it?

The Trial/Error and Fail Fast approach means trying things with very few expectations about the outcome. Of course, the business outcome should be the eventual goal. We are willing to try something quickly and expect not to be rewarded immediately. It also means not giving up just because we failed to get an interesting outcome the first time. 9 out of 10 times we are fumbling, but we are willing to keep going until we get lucky once, and that one attempt often proves to be the most valuable and actually works. Most successful data scientists will choose a fail fast tool for their trial and error. The more we allow ourselves to fail quickly, the sooner we are going to stumble onto something incredibly useful.


Causality Detection vs Quantitative Modeling

From a 10,000-foot point of view, most data science problems have two aspects:

  • Causality Detection - find the root cause of the problem or situation.
  • Quantitative Modeling - try to predict the outcome of a situation after learning from a bunch of data. You don't need to know the cause of the problem for prediction; you just model with different variables. Done correctly, the algorithms take care of mapping the inputs to the outcome and will handle the prediction automatically.

Both of the above require a bit of creativity. Causality Detection is probably the harder of the two, perhaps 100 times harder, as it requires a lot of domain knowledge and some cleverness. It's great to know that I can predict a part failure 8 out of 10 times, but knowing why and getting to the root cause is a completely different animal. You can get away with not being a domain expert in Quantitative Modeling. With Causality Detection, only a domain expert can say definitively that A leads to B.

Applying the Trial/Error and Fail Fast approach to Quantitative Modeling means we are trying different algorithms, model parameters, features in the data, and new sources iteratively until we reach our accuracy goal *REALLY QUICKLY*. There is a systematic method to some of the techniques now, but it still requires TRYING many things before something works.
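To make the idea concrete, here is a minimal sketch in Python of what such a loop might look like. Everything in it is an assumption made for illustration: the toy data, the trivial one-feature "model", and names such as make_toy_data, threshold_model, and fail_fast_search are hypothetical and do not come from any particular tool. Nothing is trained here, so there is no train/test split; in real work you would score each attempt on held-out data.

```python
import random

def make_toy_data(n=200, seed=0):
    """Synthetic rows (x0, x1); the label is 1 when x1 > 0.5, and x0 is pure noise."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random()) for _ in range(n)]
    labels = [1 if x1 > 0.5 else 0 for _, x1 in rows]
    return rows, labels

def threshold_model(feature, cutoff):
    """A deliberately trivial 'model': predict 1 when one feature exceeds a cutoff."""
    return lambda row: 1 if row[feature] > cutoff else 0

def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)

def fail_fast_search(candidates, rows, labels, goal):
    """Try candidates in order and stop at the first one that meets the goal.

    If none meets the goal, fall back to the best attempt seen so far.
    """
    history = []
    for name, model in candidates:
        score = accuracy(model, rows, labels)
        history.append((name, score))
        if score >= goal:
            return name, score, history
    best_name, best_score = max(history, key=lambda item: item[1])
    return best_name, best_score, history

rows, labels = make_toy_data()

# Each candidate is one quick "trial": a feature choice plus a parameter value.
candidates = [
    (f"feature={f}, cutoff={c}", threshold_model(f, c))
    for f in (0, 1)
    for c in (0.25, 0.5, 0.75)
]

best_name, best_score, history = fail_fast_search(candidates, rows, labels, goal=0.95)

for name, score in history:
    print(f"{name}: accuracy={score:.2f}")
print("kept:", best_name)
```

Most of the attempts in the history are failures, and that is the point: each one is cheap, and the loop stops as soon as one attempt reaches the accuracy goal.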

Causality Detection, as mentioned earlier, is a bit different. We can try and fail fast on a few A/B testing approaches, but it requires navigating multiple steps, with each step taken ever so carefully and surely. Causality Detection is about eliminating uncertainty as we get really close to the root cause.

Working in an uncertain environment

In unfamiliar situations or problems, most folks want a cookie cutter approach - unfortunately, data science brings a lot of uncertainty to the table. Even with techniques like Deep Learning, which work out of the box with a random initial configuration, getting to the next level often turns out to be challenging and tricky. As architectures become more complex, the science often depends on trial/error as an art form, one that relies on the creative data scientist's efforts in addition to best practices developed over time.

Hope you enjoyed the blog post.
