Too many sequences below identity #3

@ksahlin

Hi again,

I tried running MeShClust on 500 simulated sequences, all ~900 nucleotides long, most of them highly similar (edit distances of 1-20 bp). A small portion of these sequences might have a high error rate, roughly PacBio's error rate of 10-15%. This is supposed to mimic PacBio Iso-Seq data. Any idea how I should run MeShClust on such a dataset? Is it suitable for such sequences?

Thanks for your help!

[ksahlin@desmond bin]$ ./meshclust /nfs/brubeck.bx.psu.edu/scratch6/ksahlin/IsoCon_paper_n_10000/pacbio_reads/MEMBER_EXPERIMENT/TSPY13P_8_exponential_0.0001_500_1.fa --output ~/tmp/MESHCLUST/TSPY.clstr
avg length: 915
Recommended K: 4
Reading in sequences [=================================================] 100 %
Using 8 bit histograms
Counting 4-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:10;exons:1,2,3,4,5,6:copy14_read_170_error_rate_0.010857763300760043_total_errors_10
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:10;exons:1,2,3,4,5,6:copy5_read_242_error_rate_0.001092896174863388_total_errors_1
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:8;exons:1,2,3,4,5,6:copy34_read_418_error_rate_0.003278688524590164_total_errors_3
Before Pair: >TSPY13P;member:10;exons:1,2,3,4,5,6:copy10_read_36_error_rate_0.0_total_errors_0, >TSPY13P;member:8;exons:1,2,3,4,5,6:copy65_read_267_error_rate_0.002185792349726776_total_errors_2
Alignment [============================================================] 100 %
positive=0 negative=986
Identity value does not match sampled data: Too many sequences below identity
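
As a rough sanity check on the pairs listed in the log, the simulated error rate embedded in each read header can be turned into an approximate identity. The sketch below is only an illustration: the helper name `approx_identity` is hypothetical, and treating identity as `1 - error_rate` is an assumption that ignores differences between the underlying copies and is not how MeShClust itself scores pairs.

```python
import re

# Headers of the second read in each "Before Pair" line of the log above.
# The first read in every pair (read_36) is error-free.
HEADERS = [
    "TSPY13P;member:10;exons:1,2,3,4,5,6:copy14_read_170_error_rate_0.010857763300760043_total_errors_10",
    "TSPY13P;member:10;exons:1,2,3,4,5,6:copy5_read_242_error_rate_0.001092896174863388_total_errors_1",
    "TSPY13P;member:8;exons:1,2,3,4,5,6:copy34_read_418_error_rate_0.003278688524590164_total_errors_3",
    "TSPY13P;member:8;exons:1,2,3,4,5,6:copy65_read_267_error_rate_0.002185792349726776_total_errors_2",
]

# Pulls the simulated error rate and error count out of a header.
PATTERN = re.compile(r"error_rate_([0-9.eE+-]+)_total_errors_(\d+)")


def approx_identity(header):
    """Approximate identity to the error-free template as 1 - error rate.

    Assumption: both reads come from the same template, so this is an
    upper-bound style estimate, not an alignment-based identity.
    """
    match = PATTERN.search(header)
    if match is None:
        raise ValueError(f"no error_rate field in header: {header!r}")
    return 1.0 - float(match.group(1))


if __name__ == "__main__":
    for header in HEADERS:
        errors = int(PATTERN.search(header).group(2))
        print(f"errors={errors:>2}  approx identity={approx_identity(header):.4f}")
```

Under this approximation the four listed reads all sit at about 99% identity or higher to their templates, while a read with a 10-15% error rate would sit at roughly 85-90%.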
