seqwalk.filtering

Module Contents

Functions

rc(seq)

reverse complement of DNA sequence

filter_rc_3letter(library, k)

filter library to be RC free

rc_hash_filtering(library, k)

filter any library to be RC free, using a simple hash approach

filter_gc(library, gc_min, gc_max)

filters library for sequences that have desired GC content

filter_pattern(library, pattern)

filters library to remove specific patterns

seqwalk.filtering.rc(seq)

reverse complement of DNA sequence

Parameters:

seq – string with letters in {A, C, G, T}

Returns:

string corresponding to reverse complement
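As an illustration of what a reverse complement is, the following is a minimal standalone sketch. It is not the seqwalk source: the helper name `reverse_complement` and the translation-table approach are assumptions made for this example, and it assumes input restricted to uppercase {A, C, G, T}.

```python
# Hypothetical helper for illustration only; not seqwalk's implementation.
# Maps each base to its Watson-Crick complement: A<->T, C<->G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACG"))  # CGTT
```

Applying the operation twice returns the original sequence, which is a quick sanity check for any implementation.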

seqwalk.filtering.filter_rc_3letter(library, k)

filter library to be RC free (Supplementary note X)

Parameters:
  • library – list of sequences

  • k – SSM k value

Returns:

filtered_library

list of sequences without reverse complementary k-mers

Return type:

list of strings

seqwalk.filtering.rc_hash_filtering(library, k)

filter any library to be RC free, using a simple hash approach; could be slow for large libraries

Parameters:
  • library – list of sequences

  • k – SSM k value

Returns:

filtered_library

list of sequences without reverse complementary k-mers

Return type:

list of strings
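One way a hash-based RC filter could work is sketched below. This is a guess at the general technique, not the seqwalk source: the function names, the greedy keep-first order, and the rule that a sequence is also rejected when it contains a k-mer together with that k-mer's own reverse complement are all assumptions made for this example.

```python
# Hypothetical sketch of hash-set RC filtering; not seqwalk's implementation.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def rc_free_greedy(library, k):
    """Greedily keep sequences whose k-mers have no reverse complement
    among the k-mers of sequences already kept (or within themselves)."""
    seen = set()   # k-mers of all sequences kept so far
    kept = []
    for seq in library:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        rc_kmers = {reverse_complement(km) for km in kmers}
        # Assumed rule: reject on any RC clash with earlier or own k-mers.
        if any(r in seen or r in kmers for r in rc_kmers):
            continue
        kept.append(seq)
        seen |= kmers
    return kept


library = ["ACCTA", "TAGGT", "AAACC"]
print(rc_free_greedy(library, 3))  # ['ACCTA', 'AAACC']
```

Here "TAGGT" is dropped because it is the reverse complement of "ACCTA", so all of its 3-mers clash with 3-mers already stored. The set holds every k-mer of every kept sequence, so memory and time grow with library size, which is consistent with the note that the approach could be slow for large libraries.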

seqwalk.filtering.filter_gc(library, gc_min, gc_max)

filters library for sequences that have desired GC content

Parameters:
  • library – list of sequences in string representation

  • gc_min – minimum number of GC bases (int)

  • gc_max – maximum number of GC bases (int)

Returns:

filtered_library

list of sequences in string representation

Return type:

list of strings
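The GC filter described above can be illustrated with a short standalone sketch. It is not the seqwalk source: the names `gc_count` and `keep_by_gc` are hypothetical, and treating both bounds as inclusive is an assumption, since the documentation does not say whether the limits are inclusive.

```python
# Hypothetical sketch for illustration only; not seqwalk's implementation.
def gc_count(seq):
    """Number of G and C bases in an uppercase DNA string."""
    return seq.count("G") + seq.count("C")


def keep_by_gc(library, gc_min, gc_max):
    """Keep sequences whose GC count lies in [gc_min, gc_max].

    Assumption: both bounds are inclusive.
    """
    return [seq for seq in library if gc_min <= gc_count(seq) <= gc_max]


library = ["AATT", "ACGT", "GGCC"]   # GC counts: 0, 2, 4
print(keep_by_gc(library, 1, 3))     # ['ACGT']
```

Note that the bounds are base counts rather than fractions, matching the `int` type given for `gc_min` and `gc_max`.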

seqwalk.filtering.filter_pattern(library, pattern)

filters library to remove specific patterns

Parameters:
  • library – list of sequences in string representation

  • pattern – sequence pattern to be prevented

Returns:

filtered_library

list of sequences in string representation

Return type:

list of strings
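A pattern filter of this kind might look like the sketch below. It is not the seqwalk source: the name `drop_containing` is hypothetical, and the example assumes that `pattern` is a literal substring and that any sequence containing it is removed whole. The documentation does not specify whether patterns are literal strings or something richer.

```python
# Hypothetical sketch for illustration only; not seqwalk's implementation.
def drop_containing(library, pattern):
    """Return the sequences that do not contain `pattern` as a substring.

    Assumption: `pattern` is matched literally, e.g. "GGGG".
    """
    return [seq for seq in library if pattern not in seq]


library = ["ACGGGGT", "ACGTACG", "GGGGAAA"]
print(drop_containing(library, "GGGG"))  # ['ACGTACG']
```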