Talking to the dead study: does it hold up?
A little while ago, I was listening to The Skeptics Guide to the Universe podcast, when Steven Novella mentioned that he’d been on the Skeptiko podcast debating Near Death Experience research with the host, Alex Tsakiris. I subscribed to Skeptiko to hear the debate. My initial reaction was that Alex was trying to honestly evaluate the evidence. However, the way he was interpreting it was unsatisfactory to my skeptical mind. Thus, I decided to listen to a couple of other episodes to see if my initial interpretation was correct.
In the next episode, Alex had as guests the hosts of a skeptical podcast I wasn’t aware of, called Righteous Indignation, and one of the main things the four of them spent a lot of time discussing was a study about mediums and communication with the dead. The study is titled “Anomalous Information Reception by Research Mediums Demonstrated Using a Novel Triple-Blind Protocol” by Julie Beischel and Gary E. Schwartz. I have sent Alex an e-mail to ensure that this is in fact the study in question. He has replied confirming that this is indeed the study they were discussing in that show.
Alex took exception to the skeptics’ comment that the study’s methodology was questionable. After reading the study myself I find myself agreeing, not surprisingly, with the skeptics. This study has glaring issues, and leaves too many important pieces of information out. I tried to reach out via e-mail to the study’s author, Julie Beischel, to ask her a few questions, but the e-mail address listed on the study came back with an error message. Unfortunately, those questions will remain unanswered.
So without further ado let me get into the meat of things.
The study’s purpose was to investigate the “anomalous reception of information about deceased individuals by research mediums under experimental conditions that eliminate conventional explanations.” In other words, the authors wanted to set up conditions which made it impossible for the mediums to get information in any way besides “anomalous reception”, a.k.a. psychically, and then figure out the success rate.
8 students were selected, 4 with a deceased parent, 4 a deceased peer. Each student was paired with a student from the other group, thus each pair of 2 students had one deceased parent and one deceased peer, both deceased individuals of the same gender, resulting in 4 pairs of “sitters”. An unrelated third person, who had no knowledge of the sitters or the dead people served as a “proxy sitter”. In other words, the proxy sitter was given the names of the 2 dead people, which he/she then relayed to the medium over the phone. The medium, working solely with the first name of the dead person would then go on to produce a reading for the pair (2 readings per medium one for the dead parent, one for the dead peer). Each pair of sitters received readings from 2 separate mediums.
So to summarize, 8 students organized in 4 pairs. 8 mediums. Each pair got readings from 2 mediums. We have 16 readings altogether. Next comes the scoring.
I’m not going to spend much time on the technicalities of the scoring process. For purposes of the summary it suffices to say that each student was presented with the readings for the pair and asked to choose the one that better fit their deceased person. So if you were the student with the dead parent, you’d get two readings: the one meant for your dead parent and the one meant for the dead peer of the other student in your pair. You would not know which was which and had to pick the one that best fit your dead parent. After doing this, 13 out of the 16 readings were correctly identified.
The authors concluded with strong words:
The present findings provide evidence for anomalous information reception but do not directly address what parapsychological mechanisms are involved in that reception. In and of themselves, the data cannot distinguish among hypotheses such as (a) survival of consciousness (the continued existence, separate from the body, of an individual’s consciousness or personality after physical death) and (b) mind reading (ESP or telepathy) or super-psi (retrieval of information via a generalized psychic information channel or physical quantum field, also called super-ESP).
So what is the verdict here? Does this study really provide convincing evidence for anomalous reception?
Basic Criteria for evaluating a scientific paper
Before we start analyzing how well, or not, this study followed basic methodological principles, it is important, I think, to review the basic characteristics that we expect to see in a well designed and run scientific study, and they are:
- No fraud – This one is pretty obvious; the very first requirement is that there was no fraud perpetrated by the authors, no hiding of data, no making up data and that sort of stuff.
- Statistical competency – We would also expect the authors to have done their statistics properly, that the correct analytical techniques were used and such.
- Sample Size – This refers to the number of people drafted to participate in a study. For any given confidence level and confidence interval, a minimum sample size (referred to as n in statistics) is necessary. The smaller the sample, the less reliable the results of the study are. The required sample size depends on the total population for which we’re trying to come to a conclusion, the confidence level and the confidence interval. For a quick calculator and a quick refresher of what these terms mean, you can check out this website.
- Randomization – Randomizing the test subjects is important because it helps to reduce the effect of bias in the study results.
- Control Group – Very important to weed out perceived, but not real, effects/benefits from whatever is being studied. Thus, when testing a drug, there will be one group of test subjects receiving the medicine being studied, and another group, separate and distinct from the first, receiving a sugar pill. Neither knows what they’re being administered. The results from the control group are compared with the results from the medicine group to see if there is a real effect, beyond placebo.
- Blinding – Single/Double/Triple. Blinding comes in many flavors. The gold standard is double-blinding, when neither the test subject, nor the person administering the thing being tested know what they are dealing with. Triple blinding is also possible, when the people doing the statistical analysis of the raw data are not told which one they’re analyzing. So for example, in the drug scenario double blinding means the test subject does not know if he’s getting the medicine or the sugar pill, the person handing out the pills does not know if she’s handing out the medicine or the sugar pill. In the triple blinding case, the statistician would not be told “here is the data for the medicine and here is the data for the sugar pill”. Instead, she’ll be told “here is data set A and here is data set B”.
These are the core, basic requirements of a properly designed scientific study. Now going back to the study at hand, the skeptics claimed that the methodology, a fancy way of saying the design, of the study was inappropriate, “highly dubious” I believe were the exact terms, if my memory serves correctly. Let us go through the list and see if that is indeed the case, or if Alex was right that this study has very good design. Only one of them can be right, so let us try to find out who is.
I will skip over the first two criteria, fraud and statistical competency, and give the study authors a “Pass” on both. I am not aware of any evidence of fraud, so unless such evidence comes to light I am inclined to believe none was present. And because I am not an expert in statistics, I cannot scrutinize the statistical methods and results, so I am willing to give this study the benefit of the doubt in that regard as well. Let’s look at the other criteria, those that any lay person can evaluate for themselves.
#1 Sample Size – Was the sample size appropriate? Well what is the sample size in this study? Is it the number of students recruited? The number of mediums? Well, given that what is being studied here is not the effect of the reading on the sitter, but the effectiveness of the medium to give a correct reading, I would suggest that the sample here would be the total number of readings performed, thus n=16. Is this sample size appropriate? No, not to enable us to reach any conclusions whatsoever. Even if everything else is done perfectly, and all the other criteria were followed to the letter, a sample size of 16, at best, indicates that a larger sample is needed. No conclusions can be drawn from 16 data points.
You do not have to take just my word for it. Let us refer to the calculator I linked to before. How can we apply it to this case? Simple: the study concludes that 13/16 readings were picked up correctly, therefore that is strong evidence for psychic powers, or anomalous reception of information. The unstated premise is that those 13 readings must have been on target. So we can look at the number of readings. According to this study, 13 out of 16 medium readings were correct, which would be impressive. However, let us think for a moment: how many such readings take place, in the US alone in any given year? I would venture a guess of something in the hundreds of thousands. Let us say for argument’s sake that we have a population of only 100,000 readings.
Now we ask the question, what number of readings do we have to study in order for the sample size to be appropriate? That depends on the desired confidence level and interval. No study I’ve ever read has had a confidence level of less than 95%, and if I am not mistaken, this study is using a 99.9% confidence level, but for argument’s sake we’re going to use the lower level of 95%, which will require a smaller sample size. The interval is the + or – that usually follows poll results. I’ve usually seen a few digits, so let us go with 5. Please type all this information in the calculator:
- Confidence Level – 95%
- Confidence Interval – 5
- Population – 100,000
The result? 383. In other words, you’d need to look at 383 readings to be 95% sure that the result is within 5% of the true value. All of a sudden 16 looks really, really tiny, doesn’t it? Strike One!
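If you would rather check the arithmetic than trust the calculator, here is a minimal Python sketch of the formula that calculators of this kind typically use: the standard sample size formula for a proportion, with a finite population correction. I am assuming the worst-case proportion of 50%, which is the usual default, and simple random sampling. The function name `required_sample_size` is my own invention and is not taken from the website.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence_level=0.95, margin=0.05, proportion=0.5):
    """Minimum sample size for estimating a proportion in a finite population.

    Assumptions (mine): simple random sampling, worst-case proportion of 0.5.
    """
    # Two-sided z-score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size needed for an infinitely large population.
    n_infinite = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # The finite population correction shrinks that number slightly.
    n_finite = n_infinite / (1 + (n_infinite - 1) / population)
    return math.ceil(n_finite)


# Confidence level 95%, interval of 5 points, population of 100,000 readings.
print(required_sample_size(100_000))  # 383
```

Changing the population to a million or ten million barely moves the answer, which is why the exact number of readings performed each year does not matter much for this argument.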
#2 Randomization – Were the test subjects chosen at random? No, neither the sitters nor the mediums were chosen at random from their respective populations. While I do see why that would be so with the mediums, you want to test the best of the best after all if you want to sort this thing out and you don’t want the charlatans in the medium population to dilute the effect, I do not understand why this simple requirement was not followed when it came to the sitters. The authors had a pool of 1,600 students to choose from, more than enough to get a nice, random sample out of. Instead the sitters were selected based on answers “yes” or “unsure” to questions about his/her belief in the afterlife and mediums. Furthermore, the final 8 were chosen based on their answers and based on the desired pairing, in order to optimize “the ability of blinded raters to differentiate between two gender-matched readings during scoring”.
What does all this mean? Well, simply put it means that the authors hand-picked who they wanted to be a sitter based on the survey questions, and even went so far as making sure that the paired deceased were as different from each other as possible. That basically takes randomization and throws it out the window, no questions asked.
So what exactly were these survey questions the volunteers had to answer? What were the answers of the final 8? We do not know, and unfortunately Dr. Beischel’s e-mail did not work so I could not ask these questions. But these are crucial pieces of information to have. What if all 8 had answered “Yes” to the question “do you believe in an afterlife” or “do you believe in mediums and their ability to contact the dead”? Wouldn’t you think that would severely bias the way they look at the readings? Strike Two!
#3 Control Group – This was a sticking point between the skeptics and Alex in the podcast. Alex kept insisting that there was a control, that the fact that each person got their intended reading and another reading constituted a control. However, he’s missing the main point about controls: it is supposed to be a control group, separate and distinct from the “treatment” group. The magnitude of the placebo effect, random chance etc. cannot be gauged by having the same test subject choose between treatment A and the placebo. That’s just not how science works, and if we are pretending to be running a scientific experiment we must play by the rules of science. You cannot make up a new definition for “control”; that’d be having your cake and eating it too!
So what would the control have looked like in this experiment? Sticking with the way this experiment was run, the control group would be a second group of 8 students, identical to the first 8 who would be getting the same readings but not from a “medium” but a mentalist that can produce such readings without claiming paranormal powers. Then you would run the exact same experiment and tally the results. If there is a statistically significant difference between the first group of 8 students and the control group of another 8 students, then one may reasonably say that more study is needed. This study as run, lacked a control group. Strike Three!
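To make “statistically significant difference” a little more concrete, here is a rough Python sketch of one way such a comparison could be run: a one-sided Fisher’s exact test on hits and misses in the two groups. This is my own illustration, not something the authors did. The 13 hits out of 16 come from the study, but the control group counts below are entirely HYPOTHETICAL, since no control group was ever run, and the function name `fisher_one_sided` is mine.

```python
from math import comb


def fisher_one_sided(medium_hits, medium_misses, control_hits, control_misses):
    """One-sided Fisher's exact test.

    Returns the probability that the medium group would score at least this
    many hits if both groups actually shared the same underlying hit rate.
    """
    group_size = medium_hits + medium_misses
    total = group_size + control_hits + control_misses
    total_hits = medium_hits + control_hits
    # Largest number of hits the medium group could have, given the totals.
    max_hits = min(group_size, total_hits)
    favourable = sum(
        comb(total_hits, k) * comb(total - total_hits, group_size - k)
        for k in range(medium_hits, max_hits + 1)
    )
    return favourable / comb(total, group_size)


# 13 of 16 is the study's result; every control count here is HYPOTHETICAL.
for control_hits in (8, 10, 13):
    p = fisher_one_sided(13, 3, control_hits, 16 - control_hits)
    print(f"control {control_hits}/16 -> p = {p:.3f}")
```

The point of the sketch is only that the 13 out of 16 means something once there is a second number to set it against. If mentalists scored about as well as the mediums, the p-value would be large and there would be nothing left to explain.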
#4 Blinding – Is this really a triple blinded study as the authors proclaim? Well remember triple blinding means that the participants are blinded (meaning they don’t know if they are getting the real or the control treatment), the person handing out the treatment does not know what they’re giving out, and the statisticians analyzing the results do not know what they’re analyzing. This study fails on all three counts.
First the test subjects were not blinded, simply put because there was no control group. Every student knew they were getting a “real” reading indeed. You cannot have participant blinding without a control group, and having the test subject choose between a fake and a real reading does not constitute blinding, especially when the readings are set up to be as different as possible. That’s a basic fact and anyone who has a problem with that is not understanding control & blinding as they are used in science.
Second, the mediums were not blinded. In order to effectively blind the mediums they should not have known if the name they were given was indeed that of a dead person or that of a living one. Not only did the mediums work in complete confidence that they were working only with dead people, but they also knew the gender and approximate age of the dead people they were supposed to give a reading for. That is not blinding; it is the opposite of blinding. The medium goes in knowing three pieces of information: the person is indeed dead (so no chance of giving a reading for a living person), the person’s gender (gleaned from the name) and the person’s age (roughly late teens to early twenties for the dead peer, and late 40s and higher for the dead parent). That is a lot of information for someone skilled at the guessing game. The way the experiment is set up betrays one important thing: the authors went into this study already assuming the mediums can indeed talk to the dead, so they didn’t even bother to control for the possibility of fraud, or guessing.
Thirdly, there is no indication in the paper about the blinding of the statisticians and the other persons involved in interpreting the data. The authors refer to the proxy sitters as their triple blinding but that is not what triple blinding means. As a matter of fact, the presence of the proxy sitters is completely baffling. They do not need to be there, they add nothing to the overall methodology, and it seems their sole purpose was to pass on a name to the medium, which could have easily been done otherwise. Anyone who knows anything about triple blinding can easily confirm this is not triple blinding.
So the test subjects and the mediums were not properly blinded, and it appears the statisticians weren’t either. Strike Four!
Other problems with the study
Besides the methodological problems described above here are more problems that need to be worked out before we can have any reliance on the results of this study.
- There is no mention in here of how accurately the mediums’ readings matched the descriptions of the deceased that the students gave in the beginning. Were there any specific pieces of information provided (such as the deceased’s birth date, death date, death place, mode of death, social security # etc., something that is specific to the person being “read”)?
- The participants were forced to choose one of the two readings provided. They were not asked to pick only if the reading very closely applied to their deceased person; they were forced to choose one of the two. When you combine this 50-50 pure chance with the fact that the students were hand-picked to participate (possibly having been chosen for their propensity to believe) and the fact that the two readings would have been fairly different (the medium knew approximate ages), that can easily explain 13 out of 16 hits. The fact that we lack a control group makes that number almost useless as we have nothing to compare it to.
- When the students were chosen from the pool of 1,600 it was done so in order to “optimize testing conditions…based on answers “yes” or “unsure” to survey questions about his/her beliefs”, yet no explanation is given of exactly what this optimization process involved.
- When the dead people were paired it was done so as to “optimize differences” across various characteristics. Again no description is given. When they say it was optimized for age, does that mean to decrease or increase the age difference in the pair? The answer is unknown.
- The second part of the reading was the Life Questions in which the medium was to answer 4 specific questions about the dead person. The results on the accuracy of these answers are not available.
- Each medium reading was transcribed and turned into a numbered list of individual items. It is unknown how specific the items included in the list by the experimenter were. In other words, did it say “Bob died in a motorcycle crash on I-95” or did it say “Bob died peacefully”? Those kinds of things always matter in a study of this sort.
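The forced-choice point in the list above can be put into numbers with a few lines of Python. This is only a sketch under assumptions of my own: I treat the 16 picks as independent, each with the same chance of being right, which the actual design does not guarantee. A pure coin flip (50%) by itself produces 13 or more hits only about one time in a hundred, so chance alone is not the whole story. But the result is very sensitive to the per-reading hit rate. If age cues and hand-picked, well-differentiated pairs pushed that rate to something like 70% (a HYPOTHETICAL figure, not one from the paper), 13 or more hits would turn up roughly one time in four with no psychic ability required. The function name `chance_of_at_least` is mine.

```python
from math import comb


def chance_of_at_least(hits, readings, hit_rate):
    """Probability of `hits` or more correct picks out of `readings`,
    assuming independent picks that each succeed with probability `hit_rate`."""
    return sum(
        comb(readings, k) * hit_rate ** k * (1 - hit_rate) ** (readings - k)
        for k in range(hits, readings + 1)
    )


# 0.5 is a pure coin flip; 0.6 and 0.7 are HYPOTHETICAL hit rates meant to
# stand in for sitters who can tell a "parent" reading from a "peer" reading.
for hit_rate in (0.5, 0.6, 0.7):
    p = chance_of_at_least(13, 16, hit_rate)
    print(f"hit rate {hit_rate:.0%}: P(13 or more of 16) = {p:.3f}")
```

This is exactly why the missing control group matters: without it, nobody knows what the real baseline hit rate was for this particular setup.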
We could get into more detail about other open questions that remain and have not been properly addressed. In the Results the authors promise more details in a future manuscript, but I haven’t been able to find it, and as stated previously my attempt to contact the lead author was futile.
So what can we take from this study? How reliable is it? Unfortunately for the talking-to-the-dead enthusiasts, this study is worthless scientifically. It had a ridiculously small sample size, it lacked a control group, and it had no randomization or proper blinding, not in a scientific sense that is. There are many other unanswered questions, and crucial information is missing that could shed some light on the results. The authors forced the subjects to pick one of the two answers, which alone gives a 50-50 chance and which, when coupled with the other points I raised earlier, more than explains the results observed. And more importantly, nothing was reported on the accuracy of the mediums’ readings: how specific their readings were, especially in the Life Questions sections, and how well they matched the subjects’ descriptions on specific items.
Would it not have been easier to ask the students to provide ten pieces of information specific to the dead person, ask the medium to do their reading, covering the 10 specific pieces of information, and then ask a third party to analyze how well the mediums’ answers match those specific pieces of information, as opposed to relying on forced choice between two options? I think so. Why wasn’t it done? I’d rather not speculate.