问题
一直以来用Eigensoft的smartpca来做群体遗传的PCA分析很顺畅,结果也比较靠谱。
但今天报错如下:
$ ~/miniconda3/bin/smartpca -p smartpca.par
parameter file: smartpca.par
### THE INPUT PARAMETERS
##PARAMETER NAME: VALUE
genotypename: plink.ped
snpname: plink.pedsnp
indivname: plink.pedind
evecoutname: pca.vec
evaloutname: pca.val
numoutlieriter: 0
numchrom: 1000000
## smartpca version: 16000
norm used
warning (mapfile): bad chrom: 100 100:1816 0 1816
warning (mapfile): bad chrom: 101 101:1388 0 1388
warning (mapfile): bad chrom: 101 101:1922 0 1922
warning (mapfile): bad chrom: 102 102:1286 0 1286
warning (mapfile): bad chrom: 103 103:867 0 867
warning (mapfile): bad chrom: 104 104:149 0 149
warning (mapfile): bad chrom: 105 105:1532 0 1532
warning (mapfile): bad chrom: 106 106:1201 0 1201
warning (mapfile): bad chrom: 107 107:1113 0 1113
warning (mapfile): bad chrom: 108 108:255 0 255
Segmentation fault
这个原因有可能是染色体号为0导致。smartpca中 ,0意味着染色体编号信息缺失。
检查我的map文件中第一列(染色体号),从1开始,并没有为0。以前用带chr或scaffold开头的染色体数据做过,也没有报错。
解决
在Google group上找到了原因。
I have got Smartpca within EIGENSOFT (6.0.1) to work without converting with convertf - it will take map/ped directly. I have madified the output map/ped that stacks outputs.
EIGENSOFT and PLINK don't with thousands of chromosomes/contigs well - so I would suggest removing that info from the map file - replace the first column with all '1' for example. I do have some chromosome info so I have chromosomes 1-37 for assigned loci and I used for '40' for unassigned loci. I dont think smartpca likes a zero in the frist column of the map file.
example map file:
[https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.map](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.map)
ped file - I have the phenotype (col 6) set to missing (-9) and smartpca complains about it - but it works.
example ped file:
[https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.ped](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.ped)
example parfile:
[https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile](https://github.com/rwaples/chum_populations/blob/master/results/batch_4/EIGENSOFT/complete.codom.subsample.parfile)
-Ryan
可以看到smartpca并不支持上千条的scaffold/contig(查看了下我的数据,有3000多contigs),而在做PCA分析时,染色体号并不影响最终结果。因此可将很碎的contig统一一个染色体号。
sed 's/contig[0-9]*/20/g' map.vcf
最终得到所有材料PCA结果。