Posts on Louie Dinh

https://louiedinh.com/posts/
Recent content in Posts on Louie Dinh

Survival Analysis Part 6: Time Dependent Variables
https://louiedinh.com/2021/survival-analysis-pt-6/
Fri, 02 Jul 2021 00:00:00 +0000
In our previous discussion, we always assumed that covariates have a constant (proportional) effect over time. These models are easy to interpret. However, we would also like to capture effects that change over time. We call these time-dependent variables. When a Cox model contains time-dependent predictors, we call it an extended Cox model. Once we add time-dependent variables, the model no longer satisfies the PH assumption. For example, the variable \(RACE * TIME\) is time-dependent.

Survival Analysis Part 5: Stratified Cox
https://louiedinh.com/2021/survival-analysis-pt-5/
Wed, 23 Jun 2021 00:00:00 +0000
The Stratified Cox (SC) model allows us to control for variables that do not satisfy the proportional hazards assumption. Instead of including them as covariates, we estimate a separate baseline hazard per group, and then estimate the parameters jointly. \[ h(t, X) = h_{0g}(t) \exp(\sum_i^p \beta_i X_i) \] Notice the new g subscript, which indicates the stratification of units by some set of covariates. In our previous example with leukemia patients, sex does not satisfy the PH assumption.

Survival Analysis Part 4: Validating Cox PH Model
https://louiedinh.com/2021/survival-analysis-pt-4/
Fri, 28 May 2021 00:00:00 +0000
There are three ways to check the PH assumption: graphical plots, goodness-of-fit tests, and time-dependent variables. We will discuss the first two methods in this post, and leave the last technique for a future post. Recall that the Cox PH model assumes that the hazard ratio between two individuals is constant over time.
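Written out for two individuals with covariate vectors \(X\) and \(X^*\) (notation assumed here for illustration, using the Cox PH form \(h(t, X) = h_0(t) e^{\sum_i^p \beta_i X_i}\)), the ratio is:
\[ \frac{h(t, X^*)}{h(t, X)} = \frac{h_0(t) e^{\sum_i^p \beta_i X_i^*}}{h_0(t) e^{\sum_i^p \beta_i X_i}} = e^{\sum_i^p \beta_i (X_i^* - X_i)} \]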
This is easily seen from the model specification since \(h_0(t)\), the time-dependent portion, cancels out.

Survival Analysis Part 3: Cox Hazard Model
https://louiedinh.com/2021/survival-analysis-pt-3/
Tue, 25 May 2021 00:00:00 +0000
There are three statistical objectives for a survival model: test for significance of the treatment variable, obtain a point estimate of the effect, and obtain a confidence interval for the effect. The Cox proportional hazards (PH) model captures how the hazard function is modified by covariates. \[ h(t, X) = h_0(t) e^{\sum_i^p \beta_i X_i} \] We exponentiate a linear function of X to ensure that the entire hazard function stays non-negative.

Survival Analysis Part 2: Estimating the Survivor Function
https://louiedinh.com/2021/survival-analysis-pt-2/
Sat, 24 Apr 2021 00:00:00 +0000
In the previous section we discussed four functions used to describe failure times: density, distribution, survivor and hazard. In this section we will talk about how to estimate the survivor function, S(t), with the Kaplan-Meier method. After we obtain S(t), we will see how the log-rank test, a variant of the chi-square test, can be used to compare two different survivor functions.
Estimating the survivor function
Continuing with our example from last time, let’s revisit the failure times of our box of microchips.

Survival Analysis Part 1: Describing Survival Times
https://louiedinh.com/2021/survival-analysis-pt-1/
Wed, 10 Mar 2021 00:00:00 +0000
Why Survival Analysis
Retention is now widely appreciated as one of the most important metrics for growing a product’s user base. This fact, however, takes time to appreciate. We often ignore retention because it is a pretty boring metric. There are no big spikes or dips.
Instead, we look at things like new user acquisition, with its big spikes from good press or heavy marketing spend. But without good retention, newly acquired users will soon leave for greener pastures.

Paper: Pseudoreplication and design of ecological field experiments
https://louiedinh.com/2020/pseudoreplication-paper/
Tue, 25 Aug 2020 00:00:00 +0000
In 1984, Hurlbert raised the alarm on a common statistical error in his paper “Pseudoreplication and design of ecological field experiments”. His aim was to draw attention to how the assumption of independence in various hypothesis tests is often violated. He called the error pseudoreplication. The paper describes how this error voids a test’s false positive rate, and throws the conclusions of an experiment into question. Hurlbert also offers much statistical wisdom to experimenters.

The Rise of Data Science
https://louiedinh.com/2020/rise-of-data-science/
Sat, 01 Aug 2020 00:00:00 +0000
The trajectory of data science has been stunning. As a profession it went from nothing to “the sexiest job of the 21st century” within a decade. Computers have been used to analyze data since their inception. So why did it take more than half a century for such a field to emerge? In this post, I want to explore the tensions that had built up before that fateful day when DJ Patil and Jeff Hammerbacher slapped down the name that took off.

Best Of The Internet
https://louiedinh.com/2020/best-of-the-internet/
Wed, 22 Jul 2020 00:00:00 +0000
Just a collection of my favourite things on the internet. The only rule is that I must have read the post, and considered it a timeless piece.
Understanding why SVD is magic, in one twitter thread: https://twitter.com/WomenInStat/status/1285610321747611653
Data Driven Products Now: https://mcfunley.com/data-driven-products-now
Real Time Analytics Destroys You: https://mcfunley.com/whom-the-gods-would-destroy-they-first-give-real-time-analytics
Essays I Think About a lot: https://www.benkuhn.net/progessays/
Optimizing MySQL: https://www.xaprb.com/blog/2006/05/02/how-to-write-efficient-archiving-and-purging-jobs-in-sql/

Impact as a Data Scientist
https://louiedinh.com/2019/impact-as-a-data-scientist/
Fri, 01 Mar 2019 00:00:00 +0000
Vertical versus Horizontal Impact
There are two ways that you can make an impact as a data scientist. There are three main functions: exploratory analysis and metrics definition (the output answers “what is worth looking at and how”), confirmatory analysis based on experiments or causal inference on observational data (the output is “we should do X and Y will result”), and predictive work, which is a bridge between these two things. If you predict doing X leads to Y, then we should try doing X.

Become a 10x Networker - Start Writing
https://louiedinh.com/2013/become-a-10x-networker-start-writing/
Sun, 04 Aug 2013 00:00:00 +0000
I hate the word networking. I imagine it means some guy in an expensive suit slinging business cards left and right. Let’s call it connecting instead. I have a confession to make; I suck at connecting. Wait! Let me be more specific. I suck at in-person connecting. Put me in a room full of business majors and I will awkwardly slink into the corner nursing my apple juice. What do you expect?

Just In Case versus Just In Time Learning
https://louiedinh.com/2013/just-in-case-versus-just-in-time-learning/
Sat, 13 Apr 2013 00:00:00 +0000
“Self-education is, I firmly believe, the only kind of education there is.” (Isaac Asimov)
Every human is born with a super power: the ability to learn. Everything you know is learned. Like all important endeavours, a lifetime of learning merits some planning. We will classify the types of learning and then explore the effective use of your super power. In very broad strokes, learning can be split into two categories: Just-In-Time and Just-In-Case.

Degrees in Studying
https://louiedinh.com/2012/degrees-in-studying/
Thu, 02 Aug 2012 00:00:00 +0000
The first two decades of our lives are spent studying. We study numbers and letters, then algebra and history and finally calculus and literature. In the end, a stern academic hands us a degree. That degree could say art history or literature or economics or zoology. It’s all the same. We are now experts at studying. I got my degree two months ago. It reads “Bachelors of Science in Computer Science”, not that it matters.