Sequence Module
This module represents biological sequences, which can be protein, RNA, or DNA. The functionality available depends on the type of sequence.
Sequence
The main class for working with individual sequences, providing methods for sequence analysis, mutation, alignment and searching.
Basic Usage
Proteins
from benchmate.sequence.sequence import Sequence
# Create a sequence object
seq = Sequence(name="my_sequence", sequence="MKLLPRGPAAAAAAVLLLLSLLLLPQVQA",
seq_type="protein", features={"some":"features"})
# Perform a BLAST search via the NCBI API (local BLAST coming soon)
seq.blast("blastp", "NP")
seq.subseq(start=10, end=100)
# Introduce mutations
seq.mutate(
position=3, # 0-based position
to="A", # Amino acid to mutate to
)
seq.insert(0, "MTMTMT")
seq.delete(10, 5)  # delete 5 amino acids starting at position 10
# Search (exact match only)
seq.find("MKLL")
# k-mer counts (works on all sequence types)
seq.kmer_counts(5, normalize=True)
seq.aa_composition()
seq.molecular_weight()
seq.isoelectric_point()
seq.hydropathy_profile(window=9)  # rolling window
seq.to_fasta("my.fa")
# Or load from a FASTA file
Sequence.from_fasta("my.fa")
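To make the k-mer counting step above concrete, here is a minimal pure-Python sketch of what a normalized k-mer count could compute. It does not use benchmate: the helper name `kmer_frequencies` is hypothetical, and normalizing by the total number of k-mers is an assumption about what `normalize=True` means, not a description of the library's implementation.

```python
from collections import Counter


def kmer_frequencies(sequence, k, normalize=False):
    """Count overlapping k-mers in a sequence (hypothetical helper).

    With normalize=True, each count is divided by the total number
    of k-mers so that the values sum to 1 (an assumed convention).
    """
    if k <= 0 or k > len(sequence):
        return {}
    # Slide a window of length k over the sequence, one position at a time.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    if not normalize:
        return dict(counts)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    print(kmer_frequencies("MKLLPRGPAAAAAAVLLLLSLLLLPQVQA", 2, normalize=True))
```

Because the window slides by one position, the k-mers overlap, so a sequence of length n contributes n - k + 1 k-mers.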
For DNA/RNA
seq = Sequence(name="my_other_seq", sequence="ATATATAGACACAGTAGACAGTA", seq_type="RNA")
# Calculate secondary structure (for RNA)
seq.vienna(temperature=37)
seq.reverse_complement()
seq.translate(to_stop=False)  # don't stop at the first stop codon
seq.gc_content(window=None)  # or pass a window size for a rolling window
seq.gc_skew(window=None)  # same as above
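As an illustration of the GC statistics above, the following sketch computes GC content and GC skew either over the whole sequence or over a rolling window. It is independent of benchmate: the function names `gc_fraction` and `gc_skew_value` are hypothetical, and the step size of 1 and the return value of 0.0 for windows without G or C are assumptions. GC skew is taken as the standard (G - C) / (G + C).

```python
def _chunks(sequence, window):
    """Return the whole sequence, or every window of the given size (step 1)."""
    if window is None:
        return [sequence]
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def _gc(chunk):
    """Fraction of G and C bases in a chunk; 0.0 for an empty chunk."""
    if not chunk:
        return 0.0
    return (chunk.count("G") + chunk.count("C")) / len(chunk)


def _skew(chunk):
    """(G - C) / (G + C); 0.0 when the chunk has no G or C (assumed convention)."""
    g, c = chunk.count("G"), chunk.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def gc_fraction(sequence, window=None):
    """Single value when window is None, otherwise a list with one value per window."""
    values = [_gc(chunk) for chunk in _chunks(sequence.upper(), window)]
    return values[0] if window is None else values


def gc_skew_value(sequence, window=None):
    """Single value when window is None, otherwise a list with one value per window."""
    values = [_skew(chunk) for chunk in _chunks(sequence.upper(), window)]
    return values[0] if window is None else values


if __name__ == "__main__":
    dna = "ATATATAGACACAGTAGACAGTA"
    print(gc_fraction(dna))
    print(gc_skew_value(dna, window=10))
```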
SequenceList
You can also work with a list of sequences. Loading a multi-FASTA file returns one automatically. The only catches are that you cannot mix sequence types and you cannot nest lists of sequences.
In addition to all the list methods and all the sequence methods, you can also perform multiple sequence alignment (MSA) via Clustal Omega:
from benchmate.sequence.sequence import Sequence, SequenceList
# Loading a multi-FASTA file returns a SequenceList
seqs = Sequence.from_fasta("my.fa")
seqs.ClustalOmega()
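The two restrictions on sequence lists described above (a single sequence type, no nesting) can be pictured with a small stand-alone sketch. This is not how `SequenceList` is implemented: `Record` and `HomogeneousList` are hypothetical names, and only `append` is guarded here to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a sequence object."""
    name: str
    sequence: str
    seq_type: str


class HomogeneousList(list):
    """A list that rejects nested lists and mixed sequence types.

    Only construction and append() are checked in this sketch; a full
    implementation would also guard extend(), insert(), and item assignment.
    """

    def __init__(self, items=()):
        super().__init__()
        for item in items:
            self.append(item)

    def append(self, item):
        if isinstance(item, (list, tuple)):
            raise TypeError("nested lists of sequences are not allowed")
        if not isinstance(item, Record):
            raise TypeError("only Record objects can be added")
        # Every new item must match the type of the first item already stored.
        if self and item.seq_type != self[0].seq_type:
            raise ValueError("cannot mix sequence types in one list")
        super().append(item)


if __name__ == "__main__":
    seqs = HomogeneousList([Record("a", "MKLL", "protein")])
    seqs.append(Record("b", "MKVV", "protein"))
    print(len(seqs))
```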