sequence

NoSequenceError Objects

class NoSequenceError(Exception)

Exception raised when there is no sequence in the file.

Sequence Objects

class Sequence()

A biological sequence with associated metadata and utility methods.

blast

def blast(program, database, threshold=10, hitlist_size=50)

Run BLAST on this sequence through the NCBI BLAST web API. Local BLAST is not used; whether local BLAST support is needed is still an open question.

Arguments:

  • program: which BLAST program to use
  • database: which database to search
  • threshold: E-value threshold
  • hitlist_size: maximum number of hits to return
  • write: whether to write the output to a file (not present in the signature above)

Returns:

The alignment DataFrame, a Bio.SeqIO file handle, or both.

vienna

def vienna(temperature=37, *args)

Predict RNA secondary structure using ViennaRNA RNAfold via Biotite.

Arguments:

  • temperature: what temperature to use for folding
  • args: additional arguments passed to ViennaRNA, given as a string or a list of strings

Returns:

The structure in dot-bracket notation, the free energy in kcal/mol, and a list of base pairs.

subseq

def subseq(start, end, keep_features=True)

Return subsequence [start:end) (0-based, half-open).

find

def find(subseq: str)

Returns all start indices (0-based) where subseq occurs (allowing overlaps). Case-insensitive match.
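A minimal sketch of the overlapping, case-insensitive search described above. `find_all` is a hypothetical standalone helper operating on plain strings, not the library's implementation.

```python
from typing import List


def find_all(sequence: str, subseq: str) -> List[int]:
    """Return every 0-based start index of subseq in sequence, overlaps allowed."""
    haystack, needle = sequence.upper(), subseq.upper()
    if not needle:
        return []
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        # Advance by one position (not by len(needle)) so overlapping hits are kept.
        start = haystack.find(needle, start + 1)
    return hits
```

Stepping by one position is what allows overlapping matches; stepping by the length of the query would report only non-overlapping ones.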

reverse_complement

def reverse_complement(keep_features=True)

Reverse-complement the sequence. Only works for DNA and RNA.

Arguments:

  • keep_features: keep the original features

Returns:

another Sequence instance
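As an illustration of the operation itself, the sketch below reverse-complements a plain string. The `seq_type` values `"dna"` and `"rna"` and the handling of unambiguous bases only are assumptions; feature handling (`keep_features`) is not modelled.

```python
# Complement tables for unambiguous bases only (assumption: IUPAC ambiguity
# codes are left unchanged by str.translate).
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(sequence: str, seq_type: str = "dna") -> str:
    """Complement every base, then reverse the result."""
    if seq_type == "dna":
        table = DNA_COMPLEMENT
    elif seq_type == "rna":
        table = RNA_COMPLEMENT
    else:
        raise ValueError("reverse complement is only defined for DNA and RNA")
    return sequence.translate(table)[::-1]
```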

translate

def translate(table=1, keep_features=True, to_stop=False)

Translate nucleic acids to protein. Uses the Biopython codon table if available; otherwise supports only the standard table (1) for unambiguous triplets; ambiguous codons → ‘X’.

Arguments:

  • keep_features: keep existing features
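The fallback behaviour described above (standard table only, ambiguous codons mapped to ‘X’) could look roughly like the following. This is a sketch on plain strings; how the library treats trailing partial codons and `to_stop` is assumed here, not confirmed.

```python
from itertools import product

# NCBI standard genetic code (table 1), listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str, to_stop: bool = False) -> str:
    """Translate DNA or RNA with the standard table; unknown codons become 'X'."""
    seq = sequence.upper().replace("U", "T")
    protein = []
    # Assumption: a trailing partial codon (1 or 2 bases) is ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```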

gc_content

def gc_content(window=None)

GC fraction overall, or rolling mean over window (DNA/RNA).
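A sketch of both modes on a plain string. It assumes the rolling mean uses a step of one and returns one value per full window, which the description above does not specify.

```python
from typing import List, Optional, Union


def gc_content(sequence: str, window: Optional[int] = None) -> Union[float, List[float]]:
    """Overall GC fraction, or the GC fraction of every full window of the given size."""
    is_gc = [1 if base in ("G", "C") else 0 for base in sequence.upper()]
    if window is None:
        return sum(is_gc) / len(is_gc) if is_gc else 0.0
    if not 0 < window <= len(is_gc):
        raise ValueError("window must be between 1 and the sequence length")
    # Keep a running count so each step only adds one base and drops one base.
    current = sum(is_gc[:window])
    fractions = [current / window]
    for i in range(window, len(is_gc)):
        current += is_gc[i] - is_gc[i - window]
        fractions.append(current / window)
    return fractions
```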

gc_skew

def gc_skew(window) -> np.ndarray

GC skew = (G - C) / (G + C) in sliding windows.
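The formula above can be sketched with NumPy as follows. The step size of one and the choice to report 0 for windows containing no G or C are assumptions made for this example.

```python
import numpy as np


def gc_skew(sequence: str, window: int) -> np.ndarray:
    """(G - C) / (G + C) for every full window, sliding one base at a time."""
    if not 0 < window <= len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    bases = np.array(list(sequence.upper()))
    kernel = np.ones(window)
    # Convolving a 0/1 indicator with a box kernel gives per-window counts.
    g_counts = np.convolve((bases == "G").astype(float), kernel, mode="valid")
    c_counts = np.convolve((bases == "C").astype(float), kernel, mode="valid")
    total = g_counts + c_counts
    skew = np.zeros_like(total)
    # Assumption: windows without any G or C get a skew of 0 instead of NaN.
    np.divide(g_counts - c_counts, total, out=skew, where=total > 0)
    return skew
```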

kmer_counts

def kmer_counts(k: int, normalize: bool = True) -> Dict[str, float]

Counts (or frequencies) of k-mers (case-insensitive).
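A minimal sketch of k-mer counting on a plain string, using overlapping windows. Returning counts as floats when `normalize` is False is an assumption made to match the `Dict[str, float]` annotation.

```python
from collections import Counter
from typing import Dict


def kmer_counts(sequence: str, k: int, normalize: bool = True) -> Dict[str, float]:
    """Count overlapping k-mers, optionally converting counts to frequencies."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not normalize:
        return {kmer: float(n) for kmer, n in counts.items()}
    total = sum(counts.values())
    # If the sequence is shorter than k, counts is empty and so is the result.
    return {kmer: n / total for kmer, n in counts.items()}
```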

aa_composition

def aa_composition() -> Dict[str, float]

Fractional composition over the 20 canonical amino acids (others grouped as ‘X’).
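For illustration, the grouping of non-canonical residues under ‘X’ might be implemented like this. Whether the library also reports zero-valued entries is an assumption.

```python
from typing import Dict

CANONICAL_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> Dict[str, float]:
    """Fraction of each canonical residue; anything else is counted as 'X'."""
    counts = {aa: 0 for aa in CANONICAL_AMINO_ACIDS}
    counts["X"] = 0
    seq = sequence.upper()
    for residue in seq:
        counts[residue if residue in counts else "X"] += 1
    length = len(seq)
    if length == 0:
        return {aa: 0.0 for aa in counts}
    return {aa: n / length for aa, n in counts.items()}
```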

molecular_weight

def molecular_weight() -> float

Approximate molecular weight in Daltons (average mass, subtract water for peptide bonds).
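One way to read "average mass, subtract water for peptide bonds" is to sum the average masses of the free amino acids and subtract one water per peptide bond. The sketch below does that; the mass table holds commonly tabulated average values rounded to two decimals and may differ slightly from the table the library uses.

```python
# Average masses of the free amino acids in Daltons (rounded textbook values;
# an assumption, not taken from the library).
FREE_AA_MASS = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER_MASS = 18.015


def molecular_weight(sequence: str) -> float:
    """Sum of free residue masses minus one water for each peptide bond."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    try:
        total = sum(FREE_AA_MASS[aa] for aa in seq)
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    # A chain of n residues has n - 1 peptide bonds.
    return total - (len(seq) - 1) * WATER_MASS
```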

isoelectric_point

def isoelectric_point() -> float

Estimate pI using Henderson–Hasselbalch with a bisection search. Uses an EMBOSS-like pKa set.
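The approach described above can be sketched as a net-charge function built from Henderson-Hasselbalch terms plus a bisection over pH 0 to 14. The pKa values below are the ones commonly quoted for EMBOSS; treat them, and the tolerance, as assumptions, not the library's exact settings.

```python
# EMBOSS-like pKa values (assumed for this sketch).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of the peptide at the given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # deprotonated C-terminus
    for aa, pka in PKA_POSITIVE.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in PKA_NEGATIVE.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH, so a positive charge
        # means the pI lies above mid.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Bisection works here because the net charge is a monotonically decreasing function of pH, so there is exactly one zero crossing in the search interval.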

hydropathy_profile

def hydropathy_profile(window=9, scale="KyteDoolittle") -> np.ndarray

Sliding-window hydropathy. Only ‘KyteDoolittle’ is supported. Returns an array of length L - window + 1 (centered windows).
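A sketch of the sliding-window average on a plain string, using the Kyte-Doolittle scale and `np.convolve` in "valid" mode so the result has length L - window + 1. The handling of non-canonical residues (a `KeyError` here) is an assumption.

```python
import numpy as np

# Kyte-Doolittle hydropathy index for the 20 canonical amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> np.ndarray:
    """Mean Kyte-Doolittle value of every full window of the given size."""
    # Non-canonical residues raise KeyError in this sketch.
    values = np.array([KYTE_DOOLITTLE[aa] for aa in sequence.upper()])
    if not 0 < window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    # A box kernel scaled by 1/window turns the convolution into a moving average.
    return np.convolve(values, np.ones(window) / window, mode="valid")
```

Entry i of the result covers residues i through i + window - 1, so it corresponds to the residue at the centre of that window, i + window // 2.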

mutate

def mutate(position, to, new_name=None, keep_features=True)

Mutate a specific position to something else. Use with caution: the replacement is not checked for validity, so arbitrary characters can be inserted.

insert

def insert(position, segment, keep_features=True)

Insert segment at position (0-based index before insertion).

delete

def delete(start, end, keep_features=True) -> "Sequence"

Delete [start:end) (0-based, half-open).
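The index conventions of the three editing methods above (mutate, insert, delete) map directly onto Python slicing. The sketch below works on plain strings and assumes `mutate` replaces exactly one position; names, features and validation are not modelled.

```python
def mutate(sequence: str, position: int, to: str) -> str:
    """Replace the residue at a 0-based position; `to` is not validated."""
    return sequence[:position] + to + sequence[position + 1:]


def insert(sequence: str, position: int, segment: str) -> str:
    """Insert segment so that it starts at the given 0-based position."""
    return sequence[:position] + segment + sequence[position:]


def delete(sequence: str, start: int, end: int) -> str:
    """Remove the half-open interval [start:end)."""
    return sequence[:start] + sequence[end:]
```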

from_fasta

@classmethod
def from_fasta(cls, file_path, seq_type)

Read one or many sequences from a FASTA file. If the file contains multiple sequences, a SequenceList is returned.
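The "one or many" return behaviour can be sketched with a small FASTA parser. Here records are plain `(name, sequence)` tuples and `NoSequenceError` is a local stand-in for the exception documented above; that `from_fasta` raises it for an empty file is an assumption.

```python
import io
from typing import List, TextIO, Tuple, Union

Record = Tuple[str, str]


class NoSequenceError(Exception):
    """Stand-in for the documented exception."""


def read_fasta(handle: TextIO) -> List[Record]:
    """Parse FASTA text into (header, sequence) tuples."""
    records: List[Record] = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def one_or_many(records: List[Record]) -> Union[Record, List[Record]]:
    """Return a single record if there is exactly one, otherwise the whole list."""
    if not records:
        raise NoSequenceError("no sequence in the file")
    return records[0] if len(records) == 1 else records


example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGTT\n")
print(one_or_many(read_fasta(example)))
```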

to_fasta

def to_fasta(file_path: str) -> None

Write this sequence to a FASTA file.

SequenceList Objects

class SequenceList(list)

A list of Sequence objects with utility methods. Standard list methods are inherited from list.

__init__

def __init__(sequences, type="protein")

Initialize with a list of Sequence objects; every element must be a Sequence instance.
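A list subclass that enforces this constraint might look like the sketch below. The `Sequence` class here is a bare stand-in, and raising `TypeError` for invalid elements is an assumption about the library's behaviour.

```python
class Sequence:
    """Minimal stand-in for the documented Sequence class."""

    def __init__(self, name: str, sequence: str):
        self.name = name
        self.sequence = sequence


class SequenceList(list):
    """List of Sequence objects that validates its elements on construction."""

    def __init__(self, sequences, type="protein"):
        sequences = list(sequences)
        invalid = [s for s in sequences if not isinstance(s, Sequence)]
        if invalid:
            # Assumption: the exception type used by the library is not documented.
            raise TypeError("all elements must be Sequence instances")
        super().__init__(sequences)
        self.type = type
```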

ClustalOmega

def ClustalOmega(*args)

Perform multiple sequence alignment using Clustal Omega via Biotite. args are passed to ClustalOmegaApp.

Arguments:

  • args: list of clustal omega arguments

Returns:

A tuple of (gapped_sequences, alignment_matrix, guide_tree).

from_fasta

@classmethod
def from_fasta(cls, file_path, seq_type)

Read one or many sequences from a FASTA file. Unlike Sequence.from_fasta, this always returns a SequenceList.

to_fasta

def to_fasta(file_path: str) -> None

Write all sequences to a FASTA file.

utils

def blast_search(program,
                 database,
                 sequence,
                 expect_threshold=10.0,
                 hitlist_size=50)

Perform a BLAST search via the NCBI API. This is not local BLAST.

Arguments:

  • program: which program to use: blastp, blastn, blastx, tblastn, tblastx, or psiblast
  • database: which database to search
  • sequence: Sequence instance to use as the query
  • expect_threshold: E-value threshold
  • hitlist_size: maximum number of hits to return

Returns:

a dataframe