Hi again,
I tried running MeShClust on 500 simulated sequences, each ~900 nucleotides long. Most of the sequences are highly similar (edit distances of 1-20 bp), but a small portion might have a high error rate, roughly PacBio's error rate of 10-15%. This is supposed to mimic PacBio Iso-Seq data. Any idea how I should run MeShClust on such a dataset? Is it suitable for such sequences?
Thanks for your help!
[ksahlin@desmond bin]$ ./meshclust /nfs/brubeck.bx.psu.edu/scratch6/ksahlin/IsoCon_paper_n_10000/pacbio_reads/MEMBER_EXPERIMENT/TSPY13P_8_exponential_0.0001_500_1.fa --output ~/tmp/MESHCLUST/TSPY.clstr
avg length: 915
Recommended K: 4
Reading in sequences [=================================================] 100 %
Using 8 bit histograms
Counting 4-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:10;exons:1,2,3,4,5,6:copy14_read_170_error_rate_0.010857763300760043_total_errors_10
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:10;exons:1,2,3,4,5,6:copy5_read_242_error_rate_0.001092896174863388_total_errors_1
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:8;exons:1,2,3,4,5,6:copy34_read_418_error_rate_0.003278688524590164_total_errors_3
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:8;exons:1,2,3,4,5,6:copy65_read_267_error_rate_0.002185792349726776_total_errors_2
Alignment [============================================================] 100 %
positive=0 negative=986
Identity value does not match sampled data: Too many sequences below identity
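
For context on the final message, here is a minimal Python sketch of how pairwise identity could be estimated for reads like these. It is not MeShClust's own method: `edit_distance`, `identity`, and `expected_identity` are hypothetical helpers, identity is assumed to be 1 minus the edit distance divided by the longer length, and the expected value assumes independent errors in the two reads.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed definition: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def expected_identity(err_a: float, err_b: float) -> float:
    """Rough estimate: a position matches only if it is error-free in both reads."""
    return (1.0 - err_a) * (1.0 - err_b)


if __name__ == "__main__":
    # Two reads at the upper end of the 10-15% error range mentioned above.
    print(f"{expected_identity(0.15, 0.15):.4f}")
    # One error-free read against a read with 10 errors in ~915 nt.
    print(f"{1.0 - 10 / 915:.4f}")
```

Under these assumptions, two reads that each carry 10-15% errors would be expected to share only about 72-81% identity, well below the 98-99% typical of the low-error pairs in the log.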