mmseqs
MMSeqs Objects
class MMSeqs()
MMseqs2 wrapper with support for:
- Always creating the query DB via createdb
- Single and paired alignment (pairaln)
- GPU search against a padded DB
- Flexible extra args
create_database
def create_database(
fasta_path: str,
db_path: str,
gpu_padded: bool = False,
extra_args: Optional[Union[List[str], Dict[str, str]]] = None) -> str
Create a target database from a FASTA file, optionally padded for GPU search.
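The signature accepts `extra_args` as either a list or a dict, but how a dict becomes command-line flags is not stated. The sketch below assumes dict keys are flags written exactly as on the command line and values are their arguments; `normalize_args` and `create_database_cmds` are hypothetical names, not this module's API. For the padded case it follows MMseqs2's documented GPU route (`createdb`, then `makepaddedseqdb`); the intermediate `_raw` database name is invented for illustration.

```python
import subprocess
from typing import Dict, List, Optional, Union

ExtraArgs = Optional[Union[List[str], Dict[str, str]]]


def normalize_args(extra: ExtraArgs) -> List[str]:
    """Flatten extra_args into an argv list (hypothetical helper).

    A list is passed through unchanged; a dict is assumed to map a flag,
    written as on the command line (e.g. "--threads"), to its value.
    """
    if extra is None:
        return []
    if isinstance(extra, dict):
        argv: List[str] = []
        for flag, value in extra.items():
            argv.extend([flag, str(value)])
        return argv
    return list(extra)


def create_database_cmds(fasta_path: str, db_path: str,
                         gpu_padded: bool = False,
                         extra_args: ExtraArgs = None,
                         mmseqs_bin: str = "mmseqs") -> List[List[str]]:
    """Build (but do not run) the MMseqs2 commands for a target DB."""
    if not gpu_padded:
        return [[mmseqs_bin, "createdb", fasta_path, db_path, *normalize_args(extra_args)]]
    raw_db = db_path + "_raw"  # hypothetical name for the unpadded intermediate DB
    return [
        [mmseqs_bin, "createdb", fasta_path, raw_db, *normalize_args(extra_args)],
        [mmseqs_bin, "makepaddedseqdb", raw_db, db_path],  # padded copy for --gpu 1
    ]


def run_all(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Building the argv lists separately from running them keeps command construction testable without MMseqs2 installed.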
pad_db
def pad_db(old_db, new_db, **kwargs)
Create a padded database from an existing one.
Arguments:
old_db- Path of the database to pad
new_db- Path of the new padded database
Returns:
The path of the new database if padding succeeds
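A minimal sketch of what `pad_db` might do, assuming it wraps `mmseqs makepaddedseqdb` and treats success as "the new database's `.dbtype` file exists" (every MMseqs2 database has one). The `runner` parameter is a hypothetical hook so the logic can be exercised without MMseqs2; how the real method uses `**kwargs` is not documented, so they are left out.

```python
import os
import subprocess
from typing import Callable, List, Optional


def default_runner(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


def pad_db(old_db: str, new_db: str, mmseqs_bin: str = "mmseqs",
           runner: Callable[[List[str]], None] = default_runner) -> Optional[str]:
    """Pad old_db into new_db; return new_db on success, else None (sketch)."""
    if not os.path.exists(old_db + ".dbtype"):
        raise FileNotFoundError(f"no MMseqs2 database at {old_db}")
    runner([mmseqs_bin, "makepaddedseqdb", old_db, new_db])
    # Confirm the padded DB was actually written before reporting success.
    return new_db if os.path.exists(new_db + ".dbtype") else None
```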
search
def search(query: Union[str, List[str]],
target_db: str,
output_a3m: str,
output_tsv: str,
use_gpu: bool = False,
sensitivity: float = 5.7,
max_seqs: int = 1000,
evalue: float = 1e-3,
extra_search_args: Optional[Union[List[str], Dict[str,
str]]] = None,
extra_result2msa_args: Optional[Union[List[str], Dict[str,
str]]] = None,
tmp_dir: Optional[str] = None)
Full pipeline: query → search/pairaln → A3M + TSV
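The one-line pipeline description can be unpacked into concrete MMseqs2 steps. Below is a hypothetical sketch for a single query FASTA, using real MMseqs2 modules (`createdb`, `search`, `result2msa`, `convertalis`) and flags (`-s`, `--max-seqs`, `-e`, `--gpu`). The intermediate names under `tmp_dir` are invented, the paired `pairaln` branch is not shown, and whatever format options the wrapper passes to `result2msa` to obtain a single A3M file are not documented here, so none are assumed.

```python
import os
from typing import List


def search_cmds(query_fasta: str, target_db: str, output_a3m: str,
                output_tsv: str, tmp_dir: str, use_gpu: bool = False,
                sensitivity: float = 5.7, max_seqs: int = 1000,
                evalue: float = 1e-3, mmseqs_bin: str = "mmseqs") -> List[List[str]]:
    """Build the single-query MMseqs2 command sequence (hypothetical sketch)."""
    query_db = os.path.join(tmp_dir, "query_db")  # invented intermediate names
    result_db = os.path.join(tmp_dir, "result_db")
    search_tmp = os.path.join(tmp_dir, "search_tmp")
    search = [mmseqs_bin, "search", query_db, target_db, result_db, search_tmp,
              "-s", str(sensitivity), "--max-seqs", str(max_seqs), "-e", str(evalue)]
    if use_gpu:
        search += ["--gpu", "1"]  # target_db must be a padded database
    return [
        [mmseqs_bin, "createdb", query_fasta, query_db],  # query DB is always created
        search,
        [mmseqs_bin, "result2msa", query_db, target_db, result_db, output_a3m],
        [mmseqs_bin, "convertalis", query_db, target_db, result_db, output_tsv],
    ]
```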
blast
Blast Objects
class Blast()
__init__
def __init__(path=None, dbtype="n")
Initialize a Blast instance.
Arguments:
path(str): path of the executable; if None, $PATH is searched
db(str): path and name of the BLAST database, if it exists; otherwise it can be created with create_db
dbtype(str): type of the database, n for nucleotide or p for protein
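With `path=None` the executable is looked up on `$PATH`. A minimal standard-library sketch of that lookup, assuming `path` names a single binary (the docstring does not say whether it may instead be the BLAST+ `bin` directory); `resolve_executable` is a hypothetical helper.

```python
import os
import shutil
from typing import Optional


def resolve_executable(path: Optional[str], name: str) -> str:
    """Return `path` if it is an executable file, else search $PATH for `name`."""
    if path is not None:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
        raise FileNotFoundError(f"not an executable file: {path}")
    found = shutil.which(name)  # None when nothing on $PATH matches
    if found is None:
        raise FileNotFoundError(f"{name} not found on $PATH")
    return found
```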
create_db
def create_db(fasta,
output_path,
dbname,
dbtype="n",
overwrite=True,
arg_dict=None)
Create a BLAST database and store its path in self.db.
Arguments:
fasta(str): path of the input file; only FASTA is implemented
output_path(str): output directory for the database; this is different from the database name
dbname(str): database name, so self.db will be output_path/dbname
dbtype(str): database type, n for nucleotide or p for protein
overwrite(bool): if self.db is already set, overwrite it; this only edits the instance attribute and does not touch the existing database
arg_dict(dict): a dictionary of extra arguments; if left empty, default values are used (see the BLAST documentation)
Returns:
None: the new database path is stored in self.db once the database is created
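The `n`/`p` codes map naturally onto `makeblastdb -dbtype nucl|prot`, and `self.db` is `output_path/dbname`. The sketch builds that command. The assumption that `arg_dict` keys are BLAST+ option names without the leading `-` (with `None` marking a bare flag) is illustrative, not taken from the source, and `makeblastdb_cmd` is a hypothetical name.

```python
import os
from typing import Dict, List, Optional

DBTYPES = {"n": "nucl", "p": "prot"}  # makeblastdb expects "nucl" or "prot"


def makeblastdb_cmd(fasta: str, output_path: str, dbname: str, dbtype: str = "n",
                    arg_dict: Optional[Dict[str, Optional[str]]] = None) -> List[str]:
    if dbtype not in DBTYPES:
        raise ValueError(f"dbtype must be 'n' or 'p', got {dbtype!r}")
    db = os.path.join(output_path, dbname)  # the value self.db would hold
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", DBTYPES[dbtype], "-out", db]
    for key, value in (arg_dict or {}).items():
        cmd.append("-" + key)  # assumption: keys are option names without "-"
        if value is not None:  # None marks a bare flag such as -parse_seqids
            cmd.append(str(value))
    return cmd
```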
search
def search(seq,
db,
output_type="tabular",
exec="blastn",
arg_dict=None,
cols=None)
Search an existing BLAST database with a Sequence instance.
Arguments:
seq: a benchmate.sequence.sequence.Sequence instance
db: the path and name of the database
output_type: tabular or json
exec: the BLAST program used for the search (e.g. blastn); depends on the type of sequence being searched
arg_dict: additional arguments to pass to BLAST
cols: which columns to return when returning a table
Returns:
pd.DataFrame or dict, depending on output_type
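For tabular output, BLAST+ is run with `-outfmt 6`, optionally followed by a space-separated column list, which is presumably where `cols` ends up. This stdlib sketch (hypothetical helpers `outfmt_arg` and `parse_tabular`) builds that option value and parses the tab-separated output into row dicts; the 12 names in `STD_COLS` are BLAST+'s default `std` fields, and `pd.DataFrame(rows)` would turn the result into a table.

```python
import csv
import io
from typing import Dict, List, Optional

# BLAST+ default ("std") columns for -outfmt 6
STD_COLS = ["qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
            "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def outfmt_arg(cols: Optional[List[str]] = None) -> str:
    """Value for -outfmt: tabular (6), optionally with custom columns."""
    return "6" if not cols else "6 " + " ".join(cols)


def parse_tabular(text: str, cols: Optional[List[str]] = None) -> List[Dict[str, str]]:
    """Parse tab-separated BLAST output into one dict per hit (values stay strings)."""
    names = cols or STD_COLS
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if fields:  # skip blank lines
            rows.append(dict(zip(names, fields)))
    return rows
```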
utils
SinglePassFastaIndex Objects
class SinglePassFastaIndex()
A tiny class for accessing MSA A3M files. These files look like FASTA but are not really FASTA, so tools that expect FASTA have trouble with them. This is not necessarily a faster solution, just a working one.
__init__
def __init__(fasta_path, delim="_")
Constructor. Builds an index of the entries. Sometimes several entries share the same name but carry additional fields next to it; the combination of these fields makes each entry unique.
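The idea of a single-pass index can be sketched as: read the file once, remember the byte span of each entry, and `seek` to it on demand. In this hypothetical `A3MIndex`, the key is all header fields joined with `delim` (one possible reading of "a combination of these create a unique entry"); lines before the first `>` header are skipped, and lowercase A3M insertion columns are returned untouched.

```python
from typing import Dict, Optional, Tuple


class A3MIndex:
    """Hypothetical single-pass index of an A3M (FASTA-like) file by byte offset."""

    def __init__(self, path: str, delim: str = "_"):
        self.path = path
        self.offsets: Dict[str, Tuple[int, int]] = {}  # key -> (start, end) byte span
        key: Optional[str] = None
        start = pos = 0
        with open(path, "rb") as fh:
            for line in fh:
                if line.startswith(b">"):
                    self._add(key, start, pos)  # close the previous entry
                    # Assumption: join all header fields with `delim` so entries that
                    # share a name but differ in trailing fields get distinct keys.
                    key = delim.join(line[1:].decode().split())
                    start = pos
                pos += len(line)
        self._add(key, start, pos)  # close the final entry

    def _add(self, key: Optional[str], start: int, end: int) -> None:
        if key is None:  # nothing seen yet (e.g. leading non-header lines)
            return
        if key in self.offsets:
            raise ValueError(f"duplicate entry: {key}")
        self.offsets[key] = (start, end)

    def get(self, key: str) -> Tuple[str, str]:
        """Return (header, sequence); multi-line sequences are joined."""
        start, end = self.offsets[key]
        with open(self.path, "rb") as fh:
            fh.seek(start)
            lines = fh.read(end - start).decode().splitlines()
        return lines[0][1:], "".join(lines[1:])
```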
foldseek
FoldSeek Objects
class FoldSeek()
A Python wrapper for FoldSeek with support for:
- Querying PDB structures (single or directory) against a database → A3M + TSV output
- Creating FoldSeek databases (standard or GPU-padded)
- GPU acceleration (if DB supports it)
- Flexible extra arguments
__init__
def __init__(foldseek_bin: str = "foldseek")
Initialize the wrapper.
Arguments:
foldseek_bin- Path to the FoldSeek executable (default: assumes it is on PATH)
create_database
def create_database(pdb_dir: str,
db_path: str,
gpu_padded: bool = False,
extra_args: Optional[Union[List[str], Dict[str,
str]]] = None,
tmp_dir: Optional[str] = None) -> str
Create a FoldSeek database from a directory of PDB/CIF files.
Arguments:
pdb_dir- Directory containing .pdb, .cif, .pdb.gz, .cif.gz files
db_path- Output database prefix (without extension)
gpu_padded- If True, create a padded database for GPU
extra_args- Additional arguments as list or dict
tmp_dir- Temporary directory (if None, system temp is used)
Returns:
Path to created database
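Foldseek's `createdb` can take the directory itself, so a wrapper mainly needs to confirm there is something to index. A hypothetical pre-flight helper that lists the file types named above; matching case-insensitively and scanning only the top level of `pdb_dir` are assumptions, not documented behavior.

```python
import os
from typing import List

STRUCT_SUFFIXES = (".pdb", ".cif", ".pdb.gz", ".cif.gz")


def find_structures(pdb_dir: str) -> List[str]:
    """Sorted paths of structure files directly inside pdb_dir (case-insensitive)."""
    if not os.path.isdir(pdb_dir):
        raise NotADirectoryError(pdb_dir)
    return sorted(
        os.path.join(pdb_dir, name)
        for name in os.listdir(pdb_dir)
        if name.lower().endswith(STRUCT_SUFFIXES)
        and os.path.isfile(os.path.join(pdb_dir, name))
    )
```

An empty result is a cheap signal to stop before invoking Foldseek at all.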
pad_db
def pad_db(old_db, new_db, **kwargs)
Create a padded database from an existing one.
Arguments:
old_db- Path of the database to pad
new_db- Path of the new padded database
Returns:
The path of the new database if padding succeeds
search
def search(query_pdb: str,
target_db: str,
output_a3m: str,
output_tsv: str,
use_gpu: bool = False,
sensitivity: float = 7.5,
max_accept: int = 100000,
evalue: float = 1e-3,
extra_search_args: Optional[Union[List[str], Dict[str,
str]]] = None,
extra_result2msa_args: Optional[Union[List[str], Dict[str,
str]]] = None,
tmp_dir: Optional[str] = None)
Run FoldSeek search and generate A3M + TSV from a PDB query.
Arguments:
query_pdb- Path to query PDB/CIF file
target_db- FoldSeek database to search against
output_a3m- Output A3M file path
output_tsv- Output TSV file path
use_gpu- Enable GPU (FoldSeek will error if the DB is not padded or no GPU is available)
sensitivity- Search sensitivity (higher = slower, more sensitive)
max_accept- Maximum number of alignments to accept
evalue- E-value threshold
extra_search_args- Extra args for search
extra_result2msa_args- Extra args for result2msa
tmp_dir- Custom temporary directory
Notes:
GPU errors are caught and reported (FoldSeek handles compatibility).
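The note says GPU errors are caught and reported rather than pre-checked. One way to do that, sketched with a hypothetical `run_step` helper: run each Foldseek step, and on a non-zero exit hand back Foldseek's own stderr, so messages about unpadded databases or missing GPUs reach the caller unchanged.

```python
import subprocess
from typing import List, Optional


def run_step(cmd: List[str]) -> Optional[str]:
    """Run one step; return None on success or an error message on failure."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        return f"executable not found: {cmd[0]}"
    except subprocess.CalledProcessError as err:
        # The tool decides GPU/DB compatibility; pass its stderr through as-is,
        # falling back to the exit status when stderr is empty.
        return (err.stderr or "").strip() or f"{cmd[0]} exited with status {err.returncode}"
    return None
```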