Just some thoughts that one day might blossom into posts.
Potential Track / Tag: Upskilling as a data scientist. A series of posts that explains key concepts / techniques. Become THE definitive guide to data science practice.
Data Science as Scorecard
Data Science is about giving people a scorecard. Humans are motivated when they have indicators of progress. By building the tools that let people measure the progress of their ideas, DS supercharges the iteration velocity of companies.
The role of data science is to abstract away the complexities of how users respond by building a platform that has clear indicators of progress for people building product.
Previously, this mantle was held by analytics, but the problem is that analytics is observational and retrospective.
We replace it with an approach that is experimental and prospective.
Double-loop learning. Metacognition for companies. Building mental models of the user.
Low Uptake Experiments
We would like to release a feature that is “opt-in”. That is, we want to offer users an option to use a feature. How can we do this as an A/B test?
The concrete example is reader mode, where users are given the option but only a small percentage end up using it. How do we estimate the treatment effect when there is no clean counterfactual? The control users who would have opted in are essentially blended into the rest of the control group, since we don’t know who would’ve taken the choice.
Identifiability, Exchangeability, Ignorability.
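One common framing, offered as a sketch rather than the answer: randomize the offer, compare everyone offered against everyone not offered (intent-to-treat), then divide by the uptake rate to recover the effect on the users who actually opt in. This assumes the offer itself changes nothing for users who ignore it, and that control users cannot reach the feature. The function name and the toy numbers below are hypothetical.

```python
from statistics import mean


def opt_in_effect(control_outcomes, offered_outcomes, offered_used_feature):
    """Estimate effects when only some offered users opt in.

    Assumptions (mine, not from the notes): the offer is randomized, control
    users cannot get the feature, and the offer has no effect on users who
    decline it.
    """
    # Intent-to-treat: effect of *offering* the feature, diluted by non-users.
    itt = mean(offered_outcomes) - mean(control_outcomes)
    # Share of offered users who actually turned the feature on.
    uptake = mean([1.0 if used else 0.0 for used in offered_used_feature])
    if uptake == 0:
        raise ValueError("no one opted in; the opt-in effect is undefined")
    # Effect among users who opt in: undo the dilution.
    effect_on_opt_ins = itt / uptake
    return itt, uptake, effect_on_opt_ins


# Toy data: 10 control users, 10 offered users, 2 of whom used the feature.
control = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0, 10.0, 12.0, 11.0, 11.0]
offered = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0, 10.0, 12.0, 16.0, 16.0]
used = [False] * 8 + [True] * 2

itt, uptake, effect = opt_in_effect(control, offered, used)
print(f"ITT={itt:.2f}, uptake={uptake:.0%}, effect on opt-ins={effect:.2f}")
```

With low uptake, the intent-to-treat effect is small even when the effect on opt-ins is large, which is why these experiments need far more traffic than the headline feature effect suggests.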
Define these. So annoying.
Observational Methods for Data Science
A/B testing, and experimentation more broadly, has been well fleshed out by various companies (see Kohavi). However, the sciences have another tool that they use: observational studies! If we can unlock observational studies, that would be the next step change for data science.
My Data Science Journey
Daniel Vassallo style post. Read his Twitter course first. Just write about how I got into data science, and my lessons along the way.
Creative Management, and how CEOs are failing Software Engineers
Data Scientists At Work / ESI style interview.
Series of questions on what DS work is really like at companies.
Ideas In the Ether
Richard Hamming expresses this idea as how great scientists always keep the problems at the edge of their minds. Dan McKinley uses this idea to explain Pinterest’s growth.
At companies, it seems like there is an energy barrier that a feature has to clear. If it’s too high, the feature doesn’t happen. You can either work hard to get over specific barriers, or you could work to lower the barrier across the board so more ideas make it through.
What does it mean when someone says “2x more likely”? Well, it can’t mean multiplying the probability by 2, because if you have a 50% chance of winning rock paper scissors, an advantage of 2x doesn’t mean you’ll win 100% of the time!
At 50%, you would win 50 out of 100 games, or one game for every game you lost. Read “2x” as doubling those odds: you’d be winning 2 games for every 1 game you lost. That’s a probability of 2/3, or about 67%.
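Reading “2x more likely” as doubling the odds rather than the probability can be sketched in a few lines. The helper name `scale_odds` is made up for illustration, and this is only one interpretation of the phrase.

```python
def scale_odds(probability, factor):
    """Multiply the odds of an event by `factor` and return the new probability."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    # Odds are wins per loss: p / (1 - p).
    odds = probability / (1 - probability)
    new_odds = odds * factor
    # Convert odds back to a probability: odds / (1 + odds).
    return new_odds / (1 + new_odds)


print(round(scale_odds(0.5, 2), 3))  # 0.667
```

The nice property is that the result can never exceed 1, no matter how large the factor or the starting probability.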
How do you resolve “Absence of evidence is not evidence of absence”?
False negative rate! If you have sufficient power, then failing to detect an effect is itself evidence of no effect, and the power tells you how strong that evidence is!
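One way to make the power argument concrete, as a rough sketch: treat the test as a detector and apply Bayes' rule to a miss. Here "effect" means an effect of the size the power calculation assumed, and the 50/50 prior and the function name are placeholders chosen for illustration.

```python
def null_result_evidence(power, alpha=0.05, prior_effect=0.5):
    """How far a non-significant result should shift belief toward 'no effect'.

    Assumes `power` was computed for the effect size you care about, and that
    `prior_effect` is your belief in that effect before running the test.
    """
    if not 0 < power < 1:
        raise ValueError("power must be strictly between 0 and 1")
    false_negative_rate = 1 - power  # P(no detection | effect)
    true_negative_rate = 1 - alpha   # P(no detection | no effect)
    # How much more likely a miss is under 'no effect' than under 'effect'.
    likelihood_ratio = true_negative_rate / false_negative_rate
    # Bayes' rule: probability the effect exists despite the miss.
    miss_with_effect = prior_effect * false_negative_rate
    miss_without_effect = (1 - prior_effect) * true_negative_rate
    posterior_effect = miss_with_effect / (miss_with_effect + miss_without_effect)
    return likelihood_ratio, posterior_effect


for power in (0.2, 0.8, 0.99):
    lr, post = null_result_evidence(power)
    print(f"power={power:.2f}: likelihood ratio={lr:.2f}, P(effect | miss)={post:.3f}")
```

At low power the likelihood ratio sits close to 1, so a miss tells you almost nothing, which is the case the slogan is warning about.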
OK, What the heck is ANOVA?
Just figure out what it is. Shows up everywhere. Maybe show the equivalence to linear models.
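A minimal sketch of the equivalence, in pure Python on made-up numbers: one-way ANOVA's F statistic is the same number you get from regressing the outcome on group dummies and testing whether all the dummy coefficients are zero. The function names and the toy data are assumptions for illustration.

```python
from statistics import mean


def anova_f(groups):
    """Classic one-way ANOVA F from between/within sums of squares."""
    values = [y for g in groups for y in g]
    grand, k, n = mean(values), len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regression_f(groups):
    """F test that all dummy coefficients are zero in y ~ 1 + group dummies."""
    k = len(groups)
    rows, ys = [], []
    for gi, g in enumerate(groups):
        for y in g:
            # Intercept plus a 0/1 dummy for every group except the first.
            rows.append([1.0] + [1.0 if gi == j else 0.0 for j in range(1, k)])
            ys.append(y)
    n = len(ys)
    # Ordinary least squares via the normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    rss_full = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    y_bar = mean(ys)
    rss_null = sum((y - y_bar) ** 2 for y in ys)  # intercept-only model
    return ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))


groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
print(anova_f(groups), regression_f(groups))
```

Both numbers should agree up to floating-point error (about 19 for this toy data), because the fitted values of the dummy regression are just the group means.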
Tour through a field
People say to start with review articles. Eh. Instead, let’s just go through the top cited papers in a field.