6. An exchange with Jim Burridge, co-author of Hirst et al., with a comment by J. Benveniste
1. Milgrom P. "Thanks for the memory", The Guardian, March 15, 2001, also available at http://www.guardian.co.uk/Archive/Article/0,4273,4152521,00.html.htm
2. Belon P. et al. Inflammation Research 48, 17 (1999).
3. Davenas E. et al. Nature 333, 816 (1988).
4. Hirst S. et al. Nature 366, 525 (1993).
To: john.foreman@ucl.ac.uk; f.l.pearce@ucl.ac.uk; n.hayes@ucl.ac.uk;
j.burridge-1@plymouth.ac.uk
Subject: I: Evidence of bias?
Date: 17 July 2001 07:06
I enclose a letter to Nature concerning your work. Comments are welcome.
Regards,
Italo Vecchi
----- Original Message -----
From: I.Vecchi
To: corres@nature.com
Sent: Saturday, July 14, 2001 3:19 PM
Subject: Evidence of bias?
SIR - The recently renewed interest in experiments [1] on high dilutions suggests that a reappraisal of the available evidence may be appropriate. The current interest centers mainly on the work of Belon et al. [2] and on the controversial Davenas et al. [3], but a close inspection of the unsuccessful attempt to detect a difference between control treatments and highly diluted probes described in Hirst et al. [4] reveals interesting features, whose significance appears to have been neglected. The combined p-value of 0.0027 in Table 2 of Hirst et al. sums up the probability of obtaining experimental results such as those actually obtained, under the assumption that the null hypothesis holds, i.e. that "the treatment applied to the cells produces a response which is not different from the response in the absence of treatment" [4]. A p-value of 0.0027 constitutes, by any standard, very strong evidence that the null hypothesis should be rejected.
Data from Hirst et al. (Table 2)

Treatment                    Fisher combined p-value
Succussed high dilution      0.0027
Unsuccussed high dilution    0.086
Control                      0.85
Moreover, one may question whether the dismissal of the results above the significance threshold in Table 1 of Hirst et al. is justified, especially considering that the Bonferroni procedure used is extremely conservative. While it is certainly true that "for any given set of random numbers it is expected that there will be occasions when the examination of randomness by statistical tests fails" [4], the purpose of any statistical analysis is precisely to quantify the likelihood of such occurrences. From a methodological point of view, the dismissal of data above the level of statistical significance as "chance results" [4] may be construed as an instance of subjective bias.
Regards,
Italo Vecchi
From: "Jim Burridge" <
To: "I.Vecchi"
CC: <john.foreman@ucl.ac.uk>
Subject: Re: Evidence of bias?
Date: Fri, 20 Jul 2001 13:25:16 +0100
Italo,
I have read your comments with interest and have re-read the work we reported in Hirst et al. I think you have misread the paper. The summary at the start of Hirst et al. clearly says "Our results contain a source of variation for which we cannot account", and that point is discussed at length on page 527 in connection with the results shown in Table 2.
Furthermore, the footnote to Table 2 makes clear that it is only the high dilution mean cell counts that are being analysed in that table - i.e. it is only the high dilution "treatments" that are being compared. We stand by our original statement that "some unidentified part of our experimental procedure might account for them" (i.e. the effects noted in Table 2).
The claim that we "overlooked" or "dismissed" these results is quite misplaced.
Regards,
Jim Burridge.
From: I.Vecchi
To: jim.burridge@btinternet.com
Cc: john.foreman@ucl.ac.uk
Subject: Re: Evidence of bias?
Date: 22 July 2001 21:23
Jim,
Thank you for your reply and for your willingness to discuss the issue.
In your paper you are testing a null hypothesis.
Both in Table 1 and in Table 2 you exhibit very strong evidence against the null hypothesis. Then, instead of rejecting the null hypothesis, you claim that your findings, which make the null hypothesis untenable, depend on an unknown source of variation in the highly diluted treatments. Yours is a very rare example of a paper attributing its results to unidentified flaws in its own experimental procedure.
Let me ask you a straight question. Do you agree that Hirst et al. provides strong evidence that "the treatment applied to the cells produces a response which IS different from the response in the absence of treatment"?
Regards,
Italo
From : "Jim Burridge" <jim.burridge@btinternet.com>
To : I.Vecchi
CC : <john.foreman@ucl.ac.uk>
Subject : Re: Evidence of bias?
Date : Wed, 25 Jul 2001 15:38:59 +0100
Italo,
The problem is that, within each session, the high dilution treatments are confounded with triplicates. Therefore it is impossible to tell whether the detected effects are a result of the triplicates (by being prepared at the
same time for example) or the treatments. The simplest interpretation of the results is that some feature of the experimental procedure caused each triple to differ in some way from other triples. This is probably an example
of the "plot", "block" or "batch" effect which is a standard feature of most experimental observations and is discussed at length in the experimental design literature. Depending on context, such effects can be large or small.
In agricultural studies, for example, the effects can be very large indeed. In industry it is sufficiently important for paint and wallpaper manufacturers to advise customers to use materials with the same batch number when doing a particular job. In the more precise chemical and physical sciences such effects can be small. The present study probably
falls somewhere in between. We think, but cannot prove, that the results we reported are the result of such an effect. The fact that the "effects" do not seem to be consistent across sessions points to this interpretation.
I assume your direct question refers to the comparison of the high dilution results with the controls without treatment. This is the subject of Table 1, which fails to provide evidence of a difference.
Regards,
Jim.
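Burridge's confounding worry can be made concrete with a small simulation: if every triple of tubes shares a batch offset and treatment is assigned per triple, a test that treats individual tubes as independent will flag spurious "treatment effects". A minimal sketch, with all numbers invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_triples, n_sim, false_positives = 8, 2000, 0

for _ in range(n_sim):
    # Each triple shares a random batch offset; there is NO true
    # treatment effect anywhere in these data.
    batch = rng.normal(0.0, 1.0, size=n_triples)
    counts = batch[:, None] + rng.normal(0.0, 0.3, size=(n_triples, 3))
    treated = counts[: n_triples // 2].ravel()   # treatment assigned per triple
    control = counts[n_triples // 2 :].ravel()
    # Naive t-test that pretends the 3 tubes of a triple are independent:
    if stats.ttest_ind(treated, control).pvalue < 0.05:
        false_positives += 1

print(f"false-positive rate: {false_positives / n_sim:.2f}")  # far above 0.05
```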
Jim,
Thank you again for your reply.
Your argument is irrelevant. The point is not that the triplicates differ from one another, but that the basophil counts for high dilution triplicates consistently differ from those for the control, as revealed by the fact that their p-values are too low. What is being revealed is a non-random effect; otherwise the p-values for the high dilution triplicates would be uniformly scattered.
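Vecchi's premise here is the standard fact that p-values are uniformly distributed on (0, 1) when the null hypothesis is true, so a cluster of conspicuously small p-values is itself evidence of a real effect. A quick simulation illustrating the uniform behaviour, under the assumption of independent two-sample t-tests on pure noise:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# 1000 two-sample t-tests on pure noise, i.e. the null is true every time:
p_values = [
    stats.ttest_ind(rng.normal(size=10), rng.normal(size=10)).pvalue
    for _ in range(1000)
]
# Under the null the p-values should be indistinguishable from Uniform(0, 1):
print(stats.kstest(p_values, "uniform"))
```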
<<I assume your direct question refers to the comparison of the high dilution results with the controls without treatment. This is the subject of Table 1 which fails to provide evidence of a difference.>>
In Table 1 you use an ultra-conservative Bonferroni procedure, which nevertheless yields clear evidence that the null hypothesis should be rejected for succussed anti-IgE in the higher dilution range. You dismiss such evidence as a "chance result".
A look at Fig. 3a should make you think. Actually, in Table 1 you are not testing the null hypothesis A that "the treatment applied to the cells produces a response which is not different from the response in the absence of treatment", but another null hypothesis B, that "the treatment applied to the cells produces a MEAN response which is not different from the MEAN response in the absence of treatment". What is happening is that succussed anti-IgE (and to a lesser extent unsuccussed anti-IgE) strongly enhances the variation in basophil counts, while affecting the mean counts only moderately. The variations cancel out when you take the average, so that the data in Table 1 capture only the lesser effect, which nevertheless remains significant for highly diluted anti-IgE.
In Table 2, on the other hand, you are testing the null hypothesis A, which the data clearly show to be untenable.
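The distinction between the two null hypotheses can be illustrated with simulated data: a treatment that inflates the spread of the counts while barely moving their mean will typically pass a t-test of means yet fail a test of dispersion such as Levene's. A sketch, with parameters chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
control = rng.normal(loc=100.0, scale=5.0, size=30)    # stable counts
treated = rng.normal(loc=101.0, scale=15.0, size=30)   # similar mean, 3x spread

# Test of means (null hypothesis B): typically NOT significant here.
print("t-test p:", stats.ttest_ind(treated, control, equal_var=False).pvalue)
# Test of dispersion: typically highly significant.
print("Levene p:", stats.levene(treated, control).pvalue)
```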
Regards,
Italo Vecchi
From: I.Vecchi
To: jim.burridge@btinternet.com
CC: john.foreman@ucl.ac.uk; f.l.pearce@ucl.ac.uk; n.hayes@ucl.ac.uk;
j.burridge-1@plymouth.ac.uk
Subject: I: Evidence of bias?
Date: 7 September 2001
Hi Jim,
I asked a friend, who works as a biostatistician at a pharmaceutical product development company, to have a look at your paper. She prefers to stay anonymous, but here is her slightly edited response, which I find quite interesting.
<<I found it. I read it. And did not like it.
The triplicates were done specifically to reduce the variability. If this variability is thought to be a problem, the difference between the treatments within triplicates could be tested: succussed IgE versus succussed buffer, for example. According to Fig. 2 this would be highly significant (in the two highest ranges of dilutions all means of succussed IgE are higher than those of buffer, so it is unlikely to be insignificant no matter how high the variability is; unfortunately there is no individual data there).
As to Table 1, I do not see the reason to perform an individual test for each dilution separately (I think the t-test was for zero difference between the mean cell count for a specific treatment and dilution versus the mean cell count of controls, is that true? From your response it seems that you understand the t-tests as testing differences between different dilutions, within the same range, on the same treatment), thus increasing the number of tests and decreasing the degrees of freedom. Of course, there is little power to detect anything with such a procedure. Also, because of the high variability of the assay, a control was performed every time. The reason to do this is specifically to account for session variation. To do so, one should not compare the differences of the means; rather, the mean of the differences (treatment cell count minus control cell count from the same session) should be compared with zero. I can't get through the second part of Table 1 (F-tests) to understand what was tested and how.>>
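The paired analysis the statistician describes, testing the mean of the per-session differences against zero, can be sketched as follows; the session effects and the +2 treatment effect are invented for illustration. Her side remark about Fig. 2 also suggests an even cruder check: if the treatment mean exceeds the control mean in every one of n sessions, a sign test alone gives p = 2 * 0.5^n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sessions = 12
session = rng.normal(0.0, 10.0, size=n_sessions)   # large session-to-session shifts
control = 100.0 + session + rng.normal(0.0, 1.0, size=n_sessions)
treated = 102.0 + session + rng.normal(0.0, 1.0, size=n_sessions)  # true +2 effect

# Comparing group means: the session variation swamps the effect.
print("unpaired p:", stats.ttest_ind(treated, control).pvalue)
# Mean of per-session differences against zero: the effect is recovered.
print("paired p:  ", stats.ttest_rel(treated, control).pvalue)
```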
I think that the content of our exchange is quite significant and I would like to make it available, as pasted below, to other people interested in this issue, also by posting it at my site http://www.weirdtech.com/sci/memoryofwater.html. I am sure that you, as any good scientist, are happy to have your arguments open to public scrutiny.
Thank you again for your replies.
Italo Vecchi
From: Jacques Benveniste <jbenveniste@digibio.com>
To: "I.Vecchi" <vecchi@weirdtech.com>
CC: <bdj10@cam.ac.uk>, <truzzi@toast.net>, dguillonnet@digibio.com
Subject: Re: Exchange with Jim Burridge, co-author of Hirst&al.
Date: Mon, 10 Sep 2001 20:47:19 +0200
[…] If I had to participate in the exchange I would add:
1) If I recall correctly (but I could go back to the documents if necessary), I spotted 14 changes in the methods used by Hirst et al., 4 of them being major and capable of drastically reducing the sensitivity of the basophil reaction.
2) They never showed any raw data, while accusing me of not showing them altogether, which is false, as can be seen in our Nature paper. When Pr Spira, head of an INSERM statistics team, asked them to send the data, they refused, on the grounds that he was not competent.
3) I never compared succussed and unsuccussed anti-IgE, which is irrelevant to the goal of the research, which is not to explore the necessity of the succussion. This is another example of how they did not replicate my work faithfully. If they had asked me, I would have told them that even with no succussion there is a diffusion of the activity. Since they were amateurs in the high dilution process, anything could have happened. The basic experimental layout was and still is to compare the highly diluted/succussed vehicle with the highly diluted/succussed agonist, a standard procedure in pharmacology.
I have no comments to add on top of yours on the statistical aspect. I notice that a professional statistician couldn't understand some of it, which reinforces my opinion that giving no raw data but only obscure and "conservative" statistics is a clear example of smoke-screen tactics. Readers have only read the title.
[…]
JB
From: "Jim Burridge" To: "I.Vecchi" CC: Subject: Re: for the record
Date: Wed, 12 Sep 2001 22:07:51 +0100
Hi Italo,
Sorry about the prolonged silence and thanks for your messages - I've been on holiday and busy with other matters. I shall reply to your earlier comments soon (when I've retrieved them from my old computer - new technology does have its limitations!). In the meantime I'm quite happy for our, rather inconclusive, exchange to be posted on a public website - although I doubt many will find it illuminating for reasons I will explain later.
Best wishes,
Jim.
From: "Jim Burridge" <jim.burridge@btinternet.com> To: <vecchi@isthar.com> CC: <john.foreman@ucl.ac.uk> Subject: Re: for the record Date: Sun, 16 Sep 2001 23:14:06 +0100 Hi Italo, I think you are clutching at straws. I also think you and your anonymous statistical friend should study carefully my original report written 9 years ago ("A Repeat of the 'Benveniste' Experiment: Statistical Analysis", Research Report No. 100, Department of Statistical Science, University College London, England, March 1992). That report gives rather more details of the conduct of the experiment reported in Hirst et al and considerably more discussion of the statistical issues involved. The basic problem with this type of experiment is the variability of both the cell counts and the underlying biological material so any analysis is bound to have a significant statistical component. I look forward to receiving from you a reference to the detailed statistical report on which the original paper by Davenas et al is presumably based - and a copy of the raw source data would be nice too. It may well be true, as the other "JB" says, that I "couldn't understand some of it". However, he does not say what he means by "it" - does he mean his paper, his results, his statistical analysis, or your comments in this recent exchange? I cannot tell. Perhaps it doesn't matter. Incidentally, I ought to correct the emphasis of something I said in one of my earlier replies to you (e-mail dated 25 July 2001). When I wrote that I had forgotten the details of some of my discussions with the experimenters (I have since re-read my original report and refreshed my memory). I refer to the "batch" effect that I said was confounded with the high dilution treatments (incidentally, I am at a total loss as to why you think that is irrelevant - especially since some of your later comments make exactly the point that I was making!). The batch effect is a possible explanation for the results we reported - but the cause (if it is not the treatments!) is obscure for the reasons stated in my original report. Another possibility is that the results are a result of using an inappropriate statistical test - again this possibility is discussed in my original report. Perhaps you could read that report and suggest a more appropriate statistical test? There is, of course, the possibility that the results are just a chance result ....... Jim Burridge. From: I.Vecchi To: jim.burridge@btinternet.com Sent: Monday, September 17, 2001 2:02 PM Subject: Re: for the record Hi Jim, Thank you for your reply and for your suggestions. < I look forward to receiving from you a reference to the detailed statistical report on which the original paper by Davenas et al is presumably based - and a copy of the raw source data would be nice too. > The data-analysis in Davenas et al. is indeed , to put it mildly, confusing. However the effect that they claim to have detected is remarkably similar to the moderate increase in average degranulation and the strong increase in variation that you respectively dismiss as a "chance result" and attribute to an "unknown variation source". <I am at a total loss as to why you think that is irrelevant - especially since some of your later comments make exactly the point that I was making!). […] The batch effect is a possible explanation for the results we reported - but the cause (if it is not the treatments!) is obscure for the reasons stated in my original report. 
Another possibility is that the results are a result of using an inappropriate statistical test - again, this possibility is discussed in my original report.>>
The origin of your results may well be a batch effect or a statistical fluke. It may also be a miracle by the Virgin Mary. You "think, but cannot prove" that it is a batch effect. What you, I, Dr. Benveniste, or the pope "think, but cannot prove" is scientifically irrelevant. Conjectures that cannot be verified or refuted are scientifically irrelevant.
As I wrote previously, you are testing a null hypothesis. If your data are incompatible with it, you should either reject the null hypothesis or, if you realise that your method may be flawed, modify your experiment, as my statistician friend suggests, so as to eliminate the ambiguity between a batch effect, a statistical fluke or whatever and a violation of the null hypothesis. If such ambiguity cannot be removed, i.e. if your "batch effect" pops up whenever the null hypothesis is being tested, then you are just calling Benveniste's "high dilution effect" by another name. Essentially you are saying: "I find results which are incompatible with the null hypothesis, so they are either chance results (as in Table 1) or (Table 2) there must be a variation source compatible with the null hypothesis somewhere." This is not a scientific way to analyse data. Although I am fully aware that criticising an experiment is far easier than conducting one, I believe that the above points are worth making.
<<Perhaps you could read that report and suggest a more appropriate statistical test? There is, of course, the possibility that the results are just a chance result>>
I am extremely interested in reading your report. Perhaps you could send me a copy? My address is:
Italo Vecchi
XXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXX
Italy
If you cannot send me a copy, I hope I can get one through the library system. In any case I will write you again after examining it.
Regards,
Italo Vecchi
From: "Jim Burridge" <jim.burridge@btinternet.com>
To: "I.Vecchi" <vecchi@isthar.com>
CC: "Prof John Foreman" <john.foreman@ucl.ac.uk>
Subject: Re: for the record
Date: Mon, 17 Sep 2001 22:44:22 +0100
Thanks for your address - I'll send you a photocopy of my report as soon as I can. I'll comment on your thoughts about science and statistics separately, but I fear we will end up having to agree to differ.
Jim.
To: jim.burridge@btinternet.com
Cc: john.foreman@ucl.ac.uk
Sent: Wednesday, September 26, 2001 9:49 PM
Subject: report
Hi Jim,
Thank you again for sending me your report, which I received on Monday. Here are my comments after a first reading.
First of all, I noticed that your report contains no raw data. It is unfortunate that, arguably with one exception, none of the papers published on this issue contains, to my knowledge, clearly presented and complete raw experimental data. The possible exception is Benveniste et al. ("L'agitation de solutions hautement diluees ...", C.R. Acad. Sci. Paris, t. 312, Serie II, 461-466, 1991), which was co-authored, perhaps crucially, by Spira. The lack of clearly presented and complete data is, obviously and by any standard, a very serious flaw for any scientific work. I may add that the fact that you, in a previous message, apparently invoke the example of Davenas et al. as a reason for the unavailability of your raw data is somewhat perplexing.
In a previous message you kindly solicited suggestions on a more appropriate statistical test.
I will forward to you any further input from my statistician friend, but in the meantime you may well consider her previously posted comments.
As I have stated previously, I do not understand on what basis you attribute the results in Table 2 to a batch effect, unless "batch effect" is just a more palatable name than "high dilution effect".
Let me add a further remark on Benveniste et al. (see above), whose procedure has at least the merit of simplicity and may provide a good starting point for further discussion. In Benveniste et al. "experiments that could not yield valid data" are discarded "according to predefined exclusion criteria", something that you do not do in Hirst et al., and that may explain why the effect that you dismiss as a "chance result" is possibly weaker than what they observe. On the other hand, the "predefined exclusion criteria" are, in my opinion, a very delicate point, where bias may slip in one way or another. Actually, one may argue that the published data in Benveniste et al. should not be considered complete, since they do not include the experiments that were discarded according to the "predefined exclusion criteria".
In your report, after some cautionary words about missing features of your experimental design (i.e. the fact that the <<linking [of the tubes] was not recorded>>), which <<might have helped us interpret some of the findings described there>>, i.e. the results conflicting with the null hypothesis, you write that <<However the main aim of the experiment is to show that the results do in fact behave as expected!>>. Well, this may be your aim, and your honesty in stating it clearly is commendable, but I doubt that it should be the purpose of a scientific experiment. I suggest that a more appropriate aim would be "to verify whether the results are compatible with the null hypothesis being tested". The purpose of an unbiased experiment is to obtain and weigh evidence, not to fulfil the experimenter's expectations. It is regrettable that your remarkably honest statement of purpose did not make it into the published version (i.e. Hirst et al.), since showing "that the results do in fact behave as expected!" appears indeed to be your goal.
As my statistician friend points out, if you split up and multiply the tests so as to reduce the degrees of freedom, there will be little chance of detecting any significant effect. The likelihood of a type II error, i.e. of wrongly accepting a false null hypothesis, is not even considered in your discussion. Finally, you literally turn the purpose of any statistical analysis on its head by dismissing the statistically significant (and Bonferroni adjusted!) effect that shows up in Table 1 as a "chance result".
In your discussion of the results of Table 2, where the number of degrees of freedom is higher and the anomalies are simply too strong (and perhaps unexpected) to be ignored, there are some interesting statements which I could not find in the main paper: <<one interpretation [of the results] is that there are, after all, differences between the treatments>>, i.e. that, after all, Benveniste's main claim is correct, and that <<further work needs to be done>>. If further work has been done, I would be grateful for any indication concerning it, since, after reading your report, I still believe that Hirst et al. provides significant, while by no means conclusive, evidence that high dilution effects are real.
Your report, while not differing substantially from the published version, is a precious contribution to this discussion and, in my opinion, a more honest piece of work than Hirst et al. I will make it available to some interested people, e.g. my statistician friend. I will also consider sending it to anyone requesting it by mail at my address above.
Regards,
tito
From: "I.Vecchi"
To: jim.burridge@btinternet.com
Subject: "new" data on Benveniste's stuff
Hi Jim,
I thought you might find this interesting. The following diagram, which is based on the data in the two tables available at http://www.weirdtech.com/sci/hirstdata.html, may help visualize the core issue. The data in the two tables correspond to the y-coordinate (in tenths of a millimeter) of the points in Fig. 3a and Fig. 3c in Hirst et al., where, as stated therein, each point is the mean of the triplicate determinations in a single experiment. The points were measured by me using a ruler. The accuracy of my measurements (or lack thereof) can be verified by anyone with some goodwill and a ruler. My measurement endeavour was triggered by the adamant refusal of Hirst et al. to make their raw data available to public scrutiny. A millimeter corresponds approximately to 0.41 percentage points, hence the formula: mean average degranulation = sum(height of single points)/(5*4.1) and sum(height of single points)/(3*4.1) for high dilutions and succussed buffer respectively.
The reader can decide whether the data, as plotted herewith, provide supporting evidence for Jacques Benveniste's claims on "waves caused by extreme dilution". The most unexpected feature of the plot is the apparent periodicity in basophil degranulation in the succussed buffer. Such an effect may well be an optical fluke or whatever. If the effect is real, however, then periodicity may be an intrinsic property of basophil degranulation, while highly diluted treatments increase variation and average degranulation. The time structure of the measurements (i.e. basophil counts), which has never been considered in the experimental setting, may be crucial: basophils may always subsist as an oscillating superposition between a degranulating and a non-degranulating state, along the lines proposed in my high-dilutions quantum model (i.e. me, see http://www.weirdtech.com/sci/feynman.html). Highly diluted treatments may just boost the amplitude of the degranulating state, as revealed by increased variation and mean.
Regards,
tito
From: "Jim Burridge" <jim.burridge@btinternet.com>
To: <vecchi@isthar.com>
CC: "Prof John Foreman" <john.foreman@ucl.ac.uk>
Subject: Re: "new" data on Benveniste's stuff
Date: Mon, 26 Nov 2001 22:11:43 -0000
Hello again Tito!
I suggest you ask yourself what you would expect random data to look like. If you can't answer that question, I suggest you take the next logical step and use your favourite stats package (or even Microsoft Excel will do) to produce 24 random standard normal numbers and plot them against the order in which they were produced (not in order of magnitude, of course!). Ask yourself whether the plot looks periodic or not. Of course you might, just by chance, produce an increasing sequence...! Try it a few, or even several times, and you might appreciate the difficulty.
Best wishes,
Jim
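Burridge's closing exercise is easy to reproduce. The sketch below draws 24 standard normal numbers several times and scores each series by the share of spectral power at its dominant frequency, a crude proxy for "looks periodic"; pure noise routinely produces eye-catching peaks, which is exactly the apparent periodicity he warns about.

```python
import numpy as np

rng = np.random.default_rng(4)
for trial in range(5):
    x = rng.standard_normal(24)                # one simulated series, in draw order
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    share = power[1:].max() / power[1:].sum()  # fraction of power at top frequency
    print(f"trial {trial}: dominant-frequency share = {share:.2f}")
```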
From: "I.Vecchi"
To: "jim.burridge@btinternet.com"
Subject: "new" data on Benveniste's stuff