sequence
NoSequenceError Objects
class NoSequenceError(Exception)
Exception raised when there is no sequence in the file.
Sequence Objects
class Sequence()
A biological sequence with associated metadata and utility methods.
blast
def blast(program, database, threshold=10, hitlist_size=50)
Run BLAST on this sequence through the NCBI BLAST API. It is not yet clear whether local BLAST support is needed.
Arguments:
program: which BLAST program to use
database: which database to use
threshold: E-value threshold
hitlist_size: how many hits to return
write: whether to write the output file
Returns:
the alignment DataFrame, a Bio.SeqIO file connection, or both
vienna
def vienna(temperature=37, *args)
Predict RNA secondary structure using ViennaRNA RNAfold via Biotite.
Arguments:
temperature: temperature to use for folding
args: additional arguments passed to ViennaRNA, as a string or a list of strings
Returns:
structure in dot-bracket notation, free energy in kcal/mol, list of base pairs
subseq
def subseq(start, end, keep_features=True)
Return subsequence [start:end) (0-based, half-open).
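The half-open convention is the same one Python slicing uses, so the coordinates can be sanity-checked on a plain string. The sketch below is not the library's code: `half_open_slice` is a hypothetical helper, and the bounds check is an assumption about reasonable behavior rather than something the documentation states.

```python
def half_open_slice(seq: str, start: int, end: int) -> str:
    """Return seq[start:end): start is included, end is excluded (0-based)."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("expected 0 <= start <= end <= len(seq)")
    return seq[start:end]


dna = "ATGCCGTA"
first_codon = half_open_slice(dna, 0, 3)  # covers positions 0, 1 and 2
rest = half_open_slice(dna, 3, len(dna))  # starts exactly where the first slice ended
```

Because the end index is excluded, adjacent ranges such as [0:3) and [3:8) tile the sequence without overlap.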
find
def find(subseq: str)
Returns all start indices (0-based) where subseq occurs (allowing overlaps). Case-insensitive match.
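An overlapping, case-insensitive search of this kind might be sketched as follows. This is an illustration on plain strings, not the library's implementation; `find_all` is a hypothetical name.

```python
from typing import List


def find_all(seq: str, subseq: str) -> List[int]:
    """Return every 0-based start index of subseq in seq, overlaps included."""
    seq_u, sub_u = seq.upper(), subseq.upper()
    if not sub_u:
        return []
    hits = []
    pos = seq_u.find(sub_u)
    while pos != -1:
        hits.append(pos)
        # Advance by one character (not by len(sub_u)) so overlapping hits are found.
        pos = seq_u.find(sub_u, pos + 1)
    return hits


hits = find_all("aTATAta", "ATA")
```

Advancing by a single position after each hit is what distinguishes this from `str.count` or `re.findall`, both of which skip overlapping matches.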
reverse_complement
def reverse_complement(keep_features=True)
Reverse-complement the sequence. Only works for DNA and RNA.
Arguments:
keep_features: keep the original features
Returns:
another Sequence instance
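The underlying operation can be sketched on plain strings. This is a minimal illustration, assuming only the four unambiguous bases; how the library handles ambiguity codes or features is not documented here, and the `rna` flag is a hypothetical parameter.

```python
# Translation tables for the four unambiguous bases (upper and lower case).
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Complement each base, then reverse the result."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


dna_rc = reverse_complement("ATGC")
rna_rc = reverse_complement("AUGC", rna=True)
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.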
translate
def translate(table=1, keep_features=True, to_stop=False)
Translate nucleic acids to protein. Uses the Biopython codon table if available; otherwise
only the standard table (1) is supported, for unambiguous triplets; ambiguous codons → ‘X’.
Arguments:
keep_features: keep existing features
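The fallback path (standard table only, ambiguous codons mapped to ‘X’) might look roughly like the sketch below. It is not the library's code. Several choices here are assumptions: stop codons are rendered as `*`, a trailing incomplete codon is dropped, and RNA input is handled by mapping U to T.

```python
from itertools import product

# Standard genetic code (table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_standard(seq: str, to_stop: bool = False) -> str:
    """Translate with the standard table; unknown or ambiguous codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    protein = []
    # Walk complete codons only; a trailing partial codon is ignored (assumption).
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = STANDARD_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


full = translate_standard("ATGGCCTAA")
truncated = translate_standard("ATGGCCTAA", to_stop=True)
ambiguous = translate_standard("ATGNNNTAA")
```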
gc_content
def gc_content(window=None)
GC fraction overall, or rolling mean over window (DNA/RNA).
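The two modes can be sketched as below. This is an illustration, not the library's code: it returns a plain list rather than an array, and it assumes the rolling variant only evaluates windows that fit entirely inside the sequence. How the library treats the sequence edges is not stated.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G and C over the whole sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def rolling_gc(seq: str, window: int) -> List[float]:
    """GC fraction of every full window, moving one position at a time."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [gc_fraction(seq[i:i + window]) for i in range(len(seq) - window + 1)]


overall = gc_fraction("ATGC")
profile = rolling_gc("GGAT", 2)
```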
gc_skew
def gc_skew(window) -> np.ndarray
GC skew = (G - C) / (G + C) in sliding windows.
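A direct reading of the formula is sketched below. The `step` parameter is hypothetical (the documented signature only takes `window`), and returning 0.0 for a window with no G or C is an assumption made to avoid division by zero. The sketch returns a list rather than an `np.ndarray`.

```python
from typing import List


def gc_skew_windows(seq: str, window: int, step: int = 1) -> List[float]:
    """Compute (G - C) / (G + C) for each full window of the sequence."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window without G or C has undefined skew; report 0.0 (assumption).
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


skews = gc_skew_windows("GGCA", 2)
```

Values range from 1.0 (only G) to -1.0 (only C).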
kmer_counts
def kmer_counts(k: int, normalize: bool = True) -> Dict[str, float]
Counts (or frequencies) of k-mers (case-insensitive).
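A minimal sketch of case-insensitive k-mer counting follows. It is an illustration rather than the library's implementation; in particular, it only reports k-mers that actually occur, which may differ from what the library returns.

```python
from collections import Counter
from typing import Dict


def kmer_counts(seq: str, k: int, normalize: bool = True) -> Dict[str, float]:
    """Count overlapping k-mers; return frequencies when normalize is True."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if normalize and total:
        return {kmer: n / total for kmer, n in counts.items()}
    return {kmer: float(n) for kmer, n in counts.items()}


raw = kmer_counts("acgac", 2, normalize=False)
freq = kmer_counts("acgac", 2)
```

A sequence of length L contains L - k + 1 overlapping k-mers, which is the denominator used for normalization.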
aa_composition
def aa_composition() -> Dict[str, float]
Fractional composition over the 20 canonical amino acids (others grouped as ‘X’).
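The grouping rule can be sketched as follows. This is not the library's code; including zero-valued keys for residues that do not occur is an assumption.

```python
from typing import Dict

CANONICAL = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def aa_composition(seq: str) -> Dict[str, float]:
    """Fraction of each canonical residue; anything else is counted under 'X'."""
    seq = seq.upper()
    counts = {aa: 0 for aa in CANONICAL}
    counts["X"] = 0
    for residue in seq:
        counts[residue if residue in counts else "X"] += 1
    n = len(seq)
    return {aa: (c / n if n else 0.0) for aa, c in counts.items()}


comp = aa_composition("AAGB")  # 'B' is not canonical, so it lands in 'X'
```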
molecular_weight
def molecular_weight() -> float
Approximate molecular weight in Daltons (average mass, subtract water for peptide bonds).
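The "subtract water for peptide bonds" step can be illustrated with a deliberately tiny mass table. The sketch below is not the library's code and lists only three residues; the masses are average masses of the free amino acids computed from their molecular formulas, and the library's own table may use slightly different values.

```python
# Average masses (Da) of free amino acids; only three residues for brevity.
FREE_AA_MASS = {"G": 75.067, "A": 89.094, "S": 105.093}
WATER_MASS = 18.015  # one water is released per peptide bond


def peptide_mass(seq: str) -> float:
    """Sum free amino acid masses, then subtract one water per peptide bond."""
    seq = seq.upper()
    if not seq:
        return 0.0
    total = sum(FREE_AA_MASS[residue] for residue in seq)
    return total - (len(seq) - 1) * WATER_MASS


gly_ala = peptide_mass("GA")  # one peptide bond, so one water is subtracted
```

A chain of n residues has n - 1 peptide bonds, so a single residue keeps its full free-amino-acid mass.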
isoelectric_point
def isoelectric_point() -> float
Estimate pI using Henderson–Hasselbalch with a bisection search. Uses an EMBOSS-like pKa set.
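The approach can be sketched as a net-charge function plus a bisection over pH. This is an illustration under stated assumptions, not the library's implementation: the pKa values below are an assumed EMBOSS-style set and should be checked against the table the library actually uses, and the tolerance is arbitrary.

```python
# Assumed EMBOSS-style pKa values (verify against the library's own table).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # deprotonated C-terminus
    for residue in seq.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


pi_value = isoelectric_point("ACDEK")
```

Bisection works here because the net charge is a strictly decreasing function of pH, positive at pH 0 and negative at pH 14, so it crosses zero exactly once.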
hydropathy_profile
def hydropathy_profile(window=9, scale="KyteDoolittle") -> np.ndarray
Sliding-window hydropathy. Only ‘KyteDoolittle’ is supported. Returns an array of length L - window + 1 (centered windows).
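The window arithmetic can be sketched with the published Kyte-Doolittle scale. This is not the library's code: it assumes each window is summarized by its mean (the documentation does not say whether a mean or a sum is used) and returns a plain list instead of an `np.ndarray`.

```python
from typing import List

# Kyte-Doolittle hydropathy index for the 20 canonical amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean hydropathy of every full window; the result has L - window + 1 entries."""
    values = [KYTE_DOOLITTLE[residue] for residue in seq.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(seq)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = hydropathy_profile("IIIRRR", window=3)
```

Entry i describes residues i through i + window - 1, so it is naturally associated with the residue at the center of that window.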
mutate
def mutate(position, to, new_name=None, keep_features=True)
Mutate a specific position to something else. Use with caution: validity is not checked, so arbitrary characters can be inserted.
insert
def insert(position, segment, keep_features=True)
Insert segment at position (0-based index before insertion).
delete
def delete(start, end, keep_features=True) -> "Sequence"
Delete [start:end) (0-based, half-open).
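The coordinate conventions of insert and delete can be checked on plain strings. The helpers below are hypothetical and ignore feature handling (`keep_features`) entirely; they only illustrate which positions are affected.

```python
def insert_segment(seq: str, position: int, segment: str) -> str:
    """Place segment so that it starts at the given 0-based index."""
    return seq[:position] + segment + seq[position:]


def delete_range(seq: str, start: int, end: int) -> str:
    """Remove positions start .. end - 1 (0-based, half-open)."""
    return seq[:start] + seq[end:]


edited = insert_segment("ATGC", 2, "NN")  # "NN" now occupies indices 2 and 3
restored = delete_range(edited, 2, 4)     # removes exactly the inserted span
```

Deleting [position : position + len(segment)) undoes an insertion at `position`, which is a useful round-trip check.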
from_fasta
@classmethod
def from_fasta(cls, file_path, seq_type)
Read one or many sequences from a FASTA file. If the file contains multiple sequences, a SequenceList is returned.
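The "one or many" return behavior can be sketched with a small standalone parser. This is an illustration only: records are plain `(name, sequence)` tuples rather than Sequence objects, `NoSequenceFound` is a local stand-in for the library's NoSequenceError, and raising it for empty input is an assumption about how from_fasta behaves.

```python
import io
from typing import List, TextIO, Tuple, Union

Record = Tuple[str, str]  # (header without '>', sequence)


class NoSequenceFound(Exception):
    """Local stand-in for the library's NoSequenceError."""


def parse_fasta(handle: TextIO) -> List[Record]:
    """Collect (name, sequence) pairs, joining wrapped sequence lines."""
    records: List[Record] = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def one_or_many(handle: TextIO) -> Union[Record, List[Record]]:
    """Return a single record for one entry, a list for several."""
    records = parse_fasta(handle)
    if not records:
        raise NoSequenceFound("no sequence in FASTA input")
    return records[0] if len(records) == 1 else records


single = one_or_many(io.StringIO(">seq1\nATGC\nGGTA\n"))
many = one_or_many(io.StringIO(">a\nATG\n>b\nCCC\n"))
```

A return type that depends on the file contents means callers need to check what they received; a reader that always returns a list avoids that check.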
to_fasta
def to_fasta(file_path: str) -> None
Write this sequence to a FASTA file.
SequenceList Objects
class SequenceList(list)
A list of Sequence objects with utility methods. Standard list methods are inherited from list.
__init__
def __init__(sequences, type="protein")
Initialize with a list of Sequence objects; every element must be a Sequence instance.
ClustalOmega
def ClustalOmega(*args)
Perform multiple sequence alignment using Clustal Omega via Biotite. args are passed to ClustalOmegaApp.
Arguments:
args: list of Clustal Omega arguments
Returns:
a tuple of gapped_sequences, alignment_matrix and guide_tree
from_fasta
@classmethod
def from_fasta(cls, file_path, seq_type)
Read one or many sequences from a FASTA file. Unlike Sequence.from_fasta, this always returns a SequenceList.
to_fasta
def to_fasta(file_path: str) -> None
Write all sequences to a FASTA file.
utils
blast_search
def blast_search(program,
database,
sequence,
expect_threshold=10.0,
hitlist_size=50)
Perform a BLAST search via the NCBI API. This is not local BLAST.
Arguments:
program: which program to use: blastp, blastn, blastx, tblastn, tblastx, psiblast
database:
sequence: Sequence instance
expect_threshold: E-value threshold
hitlist_size: max number of hits
Returns:
a dataframe
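One way to issue such a remote query is Biopython's `NCBIWWW.qblast`, whose `expect` and `hitlist_size` defaults match the ones above. Whether blast_search actually uses it is an assumption. The sketch below is a hypothetical wrapper (`qblast_handle` is not part of this library). It takes the sequence as a plain string rather than a Sequence instance and returns the raw result handle rather than a dataframe. It assumes a Biopython version that still ships `Bio.Blast.NCBIWWW`, and it needs network access when called.

```python
def qblast_handle(program: str, database: str, sequence: str,
                  expect_threshold: float = 10.0, hitlist_size: int = 50):
    """Submit a query to NCBI's web BLAST and return the result handle (XML)."""
    # Imported lazily so this module loads even when Biopython is not installed.
    from Bio.Blast import NCBIWWW

    return NCBIWWW.qblast(
        program,
        database,
        sequence,
        expect=expect_threshold,
        hitlist_size=hitlist_size,
    )
```

A call such as `qblast_handle("blastp", "nr", "MKTAYIAKQR")` would contact the NCBI servers and can take a while; the returned handle can then be parsed into tabular form.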