I was getting into GWAS methods for estimating heritability, and it turns out there are two major statistical practices used for this, GCTA and LDSC. The older one is GCTA, so naturally I started reading about it first.
This set me down a rabbit hole of confusion. Whereas LDSC and everything else I’ve seen uses standard OLS, GCTA does not. GCTA uses “BLUP.” Why not just normal OLS? To understand this, we have to understand what BLUP is.
What is BLUP?
BLUP is an acronym that stands for "Best Linear Unbiased Predictor." Spoiler! It's neither best nor unbiased. While it is a linear model, estimating it will rip you from the realm of linear algebra into nonlinear optimization methods. And if it is not unbiased, in what sense is it a predictor?
It gets its name from some mid-20th century statistics journal articles, which are not as rigorous as you might think. The most cited BLUP article is an overview from the end of the century. It is pro-BLUP, but mysteriously refers to uncited “criticism.” Near the end of the article, we get,
So even its defenders must admit that BLUP's naming is scammy and that it is not actually unbiased.
Also pay attention to the quotation from its founder, Henderson, in the first quote image. If it sounds bizarre, that's because it is. I could hardly believe what I was reading the first time I saw it. Did other statisticians eat this up? It's pure nonsense: there is no difference between a "realized" and an "unrealized" random variable. All of your data is "realized," and you use it to predict future "unrealized" observations, in BLUP just as in OLS.
What BLUP is really doing is assuming that the effects are normally distributed. The way in which BLUPists talk about BLUP is extremely obscure for something so simple — this is likely because stating it simply reveals that their method has no place in inferential statistics.
Let me get more specific. OLS takes the data, which can be written as a matrix X, and finds the vector B that best predicts the observed predictee variable y. BLUP takes the data X and finds the vector B that does something in between following a normal distribution and giving the best prediction of the predictee variable. Described plainly, BLUP does not give the best predictor; OLS does. BLUP is penalized regression.
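To see what the penalty does, here is a minimal numpy sketch. The data and the penalty weight `lam` are invented for illustration; the point is only that the "Gaussian effects" assumption cashes out as a ridge penalty on the loss function:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + rng.standard_normal(n)

# OLS: minimizes ||y - X b||^2  ->  b = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge / BLUP-style shrinkage: minimizes ||y - X b||^2 + lam * ||b||^2
#   ->  b = (X'X + lam I)^{-1} X'y
# The penalty is what the Gaussian assumption on the effects buys you;
# in the BLUP formulation, lam corresponds to sigma_e^2 / sigma_b^2.
lam = 10.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The penalized estimate is shrunk toward zero relative to OLS.
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))  # True
```

Setting `lam = 0` recovers OLS exactly, which is why the two methods coincide only when you assume the effects' variance is infinite.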
But they don’t describe it this way, likely for two reasons. The first is the age of the literature on BLUP. The second is that in the 21st century, loss function penalties are rightfully associated with machine learning, not inferential statistics.
The goals of machine learning and inferential statistics are different. The former is about maximizing validation accuracy; the latter is about understanding natural causal processes.
It would be fair enough if BLUP were honestly described as an early machine learning technique, which is inappropriate for scientific inference. But it isn’t. Instead it’s described as a method for finding “random effects”. This term is another massive form of confusion.
This is the same paper as before, the most famous paper on BLUP, the one everyone is directed to. It basically equates BLUP with random effects. Coming from typical OLS, I was aware of a fixed effect as a dummy variable representing group membership. For example, if you analyze GDP and democraticness by country, you might use race fixed effects to control for race by adding a new column for each major racial group, where each country gets a 1 or a 0. The rest of the betas will then be the "within race" effects; this overcomes Simpson's paradox.
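The dummy-variable mechanics are easy to demonstrate. Below is a small simulation (all numbers invented) where pooled OLS gets the sign of the slope wrong because the groups differ, while adding a group-membership dummy recovers the within-group effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Two groups with different intercepts: group 1 has higher x and lower y,
# so the pooled slope is negative even though the within-group slope is +1
# (Simpson's paradox).
g = np.repeat([0, 1], n // 2)
x = rng.standard_normal(n) + 3 * g
y = 1.0 * x - 6.0 * g + 0.5 * rng.standard_normal(n)

# Pooled regression on [1, x]:
X_pooled = np.column_stack([np.ones(n), x])
slope_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# Fixed-effects regression: append a dummy column for group membership.
X_fe = np.column_stack([np.ones(n), x, g])
slope_within = np.linalg.lstsq(X_fe, y, rcond=None)[0][1]

print(slope_pooled, slope_within)  # pooled slope is negative; within-group slope is ~1
```

No distributional assumption on the group effects is needed here: the dummy coefficient is estimated like any other OLS parameter.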
If a fixed effect is a dummy variable, what is a random effect? It’s probably not a normal effect. And it’s not a binary dummy variable, that’s a fixed effect. The paper says:
Fixed effects are parameters while random effects are random variables? A distinction without a difference at first sight. What he is hiding is that random effects are assumed to follow a distribution, which is weird. Why would OLS parameters be normally distributed?
Ok, so the paper doesn’t define random effects well.
So we go to Wikipedia, and it's even worse. It's a variance components model? That makes no sense; OLS can give variance components. It's a hierarchical linear model? "Hierarchical" sounds subjective. Where is that in the BLUP paper? OLS dummy variables handle hierarchical groupings.
But I remember that fixed and random effects are used in meta-analysis models.
Oh wait, in meta-analysis, "fixed effect" means no variation in the effect (one common effect across studies), and "random effects" means something like what I knew as fixed effects, i.e. dummy variables for groups of studies. Why is the terminology different? I got my definition of fixed effect from an econometrics book. Do they talk about random effects too?
Here a fixed effect is an OLS dummy variable, and random effects are estimated with generalized least squares, which is not BLUP, although it seems similarly scammy, if not quite as bad. But that's for another time.
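For reference, GLS is just OLS reweighted by the error covariance; it does not penalize the coefficients. A minimal sketch, using an illustrative AR(1)-style error covariance V (this particular V is my invention for the example, not what any econometrics package or GCTA assumes):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([1.0, 2.0])

# Correlated errors with an AR(1)-style covariance: V[i, j] = rho^|i-j|.
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(V)
y = X @ beta_true + L @ rng.standard_normal(n)

# GLS: b = (X' V^{-1} X)^{-1} X' V^{-1} y
# Still a linear, unpenalized estimator of the coefficients,
# unlike BLUP's shrinkage.
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_gls)  # should be close to the true [1, 2]
```

The reweighting changes the efficiency of the estimate, not its target: with V known, GLS remains unbiased for the coefficients.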
I need a textbook on BLUP, something like an econometrics textbook on OLS. Where is all the knowledge about BLUP? Maybe I can find a citation. Let's go to the Wikipedia article on BLUP and see if we can find something.
Even Wikipedia calls Henderson's terminology "strange." We see that he had a student named Shayle Searle. Searle is heavily cited in the Robinson article ("That BLUP Is a Good Thing") and is usually recommended on internet forums when you ask about BLUP. He wrote several books, but the one that is most often recommended is called Variance Components.
I don’t find this table of contents economical — there are too many pages devoted to too little. They are clearly thinking wrongly about estimating linear effects. There is no distinction between balanced and unbalanced data in OLS. You’re telling me these people made “distinguished” careers on thinking about linear models wrongly?
We get another strange definition of random vs. fixed effects. It is never deeply explained, and after getting into the math elsewhere, I can say it is total nonsense. It's closest to defining random effects as "whatever you estimate with BLUP," since BLUP can be derived from the MLE under the assumption that the effects are Gaussian. However, even if the effects are Gaussian, BLUP is still biased and does not correctly estimate them; OLS does, even when they are Gaussian. Searle is either being misleading or does not fully understand fixed vs. random effects. This is at the end of his prolific career, too.
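The bias claim can be checked by simulation. Hold one realized set of effects fixed (think of it as a single draw from a Gaussian prior), repeat the noise many times, and average the estimates; all numbers here are invented for illustration. OLS centers on the realized effects, while the ridge/BLUP-style estimator is systematically shrunk away from them:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam, reps = 50, 3, 25.0, 2000
X = rng.standard_normal((n, p))
beta = np.array([2.0, -1.0, 0.5])  # one realized draw of the effects, held fixed

ols_mean = np.zeros(p)
blup_mean = np.zeros(p)
for _ in range(reps):
    y = X @ beta + rng.standard_normal(n)
    ols_mean += np.linalg.solve(X.T @ X, X.T @ y) / reps
    blup_mean += np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y) / reps

# OLS averages out to the true realized effects; the shrinkage
# estimator does not -- it is pulled toward zero regardless of noise.
print(np.abs(ols_mean - beta).max())   # small (Monte Carlo error only)
print(np.abs(blup_mean - beta).max())  # a persistent gap from the shrinkage
```

The shrinkage bias is roughly `lam / (d + lam)` of each effect (where `d` is the corresponding eigenvalue of X'X), so it does not vanish as you add replicates, only as `lam` goes to zero.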
What do we take away from this? We also see on Wikipedia that Searle introduced matrix algebra to statistics. Boy, does he love his matrix algebra. In fact, the rest of the book looks like this:
I found this not worth reading. Not only is the notation trash (it isn't used elsewhere even in the same niche; see, for example, the best book on BLUP, Mixed Models: Theory and Applications by Demidenko), but the organization and verbal explanations are also poor. I can't say for certain whether Searle knows how to estimate BLUP and how it differs from OLS, or whether he only knows how to spin matrices around in Plato space. I'm pretty sure he doesn't understand statistical concepts like estimation bias and mean squared error. If he does, well, Hanlon's razor fails in this case.
What is going on here?
I suspect what is happening here is basically the same thing as p-hacking in social science; it is as close as you can get to that in a hard field like statistics. Henderson and Searle operated right at the limits of rigor in statistics and managed to create a mostly closed-off school of thought whose members mutually reinforced each other's reputations. I say "reinforced" because this school of thought is mostly dead. The econometrics, OLS-centered view is rightfully on the rise, and the ANOVA/BLUP school is mostly forgotten. I would never have read any of this if it weren't for GCTA-GREML, and that method has not gone without criticism. It has mostly been phased out in favor of the OLS-based LDSC method, and genomics is being theorized under the appropriate OLS causal-mechanisms framework.
This is why the jargon is so dense coming from a more general statistics and econometrics background. This is why some of the jargon is cult-like and nonsensical. This is why the methods aren’t as advertised. This is why it’s called Best Linear Unbiased Predictor even though it’s neither best nor unbiased.
BLUP is neither Best nor Unbiased
I wrote a paper, now up on PsyArXiv, analyzing BLUP in the case of uncorrelated data. I showed that it is neither best nor unbiased. Consequently, I strongly discourage any use of BLUP. I don't think it's appropriate for inferential science.
Why did people gravitate toward it in the first place? Basically, to p-hack with bad data. It allows you to get some kind of estimate when you have many more columns in your data matrix than rows, which OLS can't handle. But this means you need more data, not that you can cheap out with BLUP or similar methods.
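The p > n point is mechanical. With more columns than rows, X'X is singular and the OLS normal equations have no unique solution; add any positive penalty and the system becomes invertible, so the shrinkage estimator happily returns an answer. A toy example with invented dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 100  # far more predictors than observations
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# X'X is p x p but has rank at most n, so it is singular:
# the OLS normal equations cannot be solved uniquely.
rank = np.linalg.matrix_rank(X.T @ X)
print(rank)  # 20, not 100

# Adding lam * I makes the matrix invertible, so ridge/BLUP
# returns *an* estimate -- but invertibility is not identifiability.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge.shape)  # (100,)
```

Infinitely many coefficient vectors fit these 20 observations equally well; the penalty merely picks the smallest-norm one, which is a computational convenience, not evidence about the effects.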
In CS there is a phrase, GIGO: garbage in, garbage out. BLUP is a trash compactor.