Thank you very much for pointing out this problem. I have gone over the code and can confirm that it is incorrect.
I now propose the following calculation:
For each sample s and each taxon t, let t(s) be the proportion of taxon t in s.
Then, let L(t) be the list of all values t(s) for a given taxon t, across all samples s.
The z-score z(s, t) for any pair of sample s and taxon t is then the z-score of t(s) calculated using L(t), i.e. (t(s) - mean of L(t)) / standard deviation of L(t).
Do you agree with this way of calculating the z-score?
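A minimal sketch of the calculation described above, under a few assumptions not stated in the thread: the raw data is a dictionary of counts keyed by sample and then by taxon, a taxon missing from a sample counts as proportion 0, the population standard deviation (`statistics.pstdev`) is used, and a taxon with identical proportions in every sample gets z = 0 instead of a division by zero. The function names are hypothetical.

```python
from statistics import mean, pstdev


def proportions(counts):
    """Turn raw counts {sample: {taxon: count}} into t(s): {sample: {taxon: proportion}}."""
    props = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        props[sample] = {taxon: c / total for taxon, c in taxa.items()}
    return props


def zscores_by_taxon(props):
    """z(s, t): z-score of t(s) relative to L(t), the values of taxon t across all samples."""
    taxa = sorted({t for row in props.values() for t in row})
    samples = list(props)
    z = {s: {} for s in samples}
    for t in taxa:
        # L(t): proportion of taxon t in every sample (assumed 0.0 where t is absent)
        values = [props[s].get(t, 0.0) for s in samples]
        mu, sigma = mean(values), pstdev(values)
        for s, x in zip(samples, values):
            # Assumption: a constant taxon (sigma == 0) gets z = 0 rather than an error
            z[s][t] = (x - mu) / sigma if sigma > 0 else 0.0
    return z


counts = {"s1": {"A": 10, "B": 30}, "s2": {"A": 20, "B": 20}, "s3": {"A": 30, "B": 10}}
z = zscores_by_taxon(proportions(counts))
```

Using `statistics.stdev` (sample standard deviation, n - 1 denominator) instead of `pstdev` would scale every z-score by the same factor for a given taxon; which one is appropriate is a choice to agree on.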
This is what I then get for the dataset that you posted:
Hi Daniel,
I am not strong enough in statistics to validate the calculation method you propose. However, it seems to me that two options are possible: calculating the z-score by sample (pop) or by taxon, depending on what we want to test.
In the second case, for each row (taxon), we therefore need to calculate:
(deviation of each cell from the mean of that taxon's row) / standard deviation of that row.
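A minimal sketch of the two options, assuming the data is a taxon × sample table (one row per taxon, one column per sample/pop) stored as a list of lists. The per-taxon option appears to match the per-sample-list calculation proposed earlier in the thread; the only difference between the two options is the direction in which the same standardization is applied. As in any such sketch, the population standard deviation and the "z = 0 when the values are constant" rule are assumptions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list: (x - mean) / standard deviation; all zeros if the values are constant."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in values]


def zscore_by_taxon(table):
    """Option 2: standardize each row, i.e. one taxon across all samples."""
    return [zscore(row) for row in table]


def zscore_by_sample(table):
    """Option 1: standardize each column, i.e. all taxa within one sample, keeping the taxon x sample layout."""
    columns = [zscore(list(col)) for col in zip(*table)]
    return [list(row) for row in zip(*columns)]


# Rows: taxa A and B; columns: samples s1, s2, s3 (proportions)
table = [[0.25, 0.5, 0.75],
         [0.75, 0.5, 0.25]]
by_taxon = zscore_by_taxon(table)
by_sample = zscore_by_sample(table)
```

On this toy table the two options give different answers (roughly ±1.22 per taxon versus exactly ±1 or 0 per sample), which illustrates why the choice should follow from what we want to test.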
To be validated …
Best
Cédric