Detailed documentation for epiNB
- class epinb.NBMulti(n_pan=10, n_spec=10, n_jobs=None, **kwargs)
Bases: object
- fit(X, y)
- predict(X, return_series=False)
- predict_log_odds(X, return_df=False)
- predict_log_proba(X, return_df=False)
- class epinb.NBScore(n_pan: int = 10, n_spec: int = 10, pan_feature_candidates: Optional[List] = None, *, smoothing_strength_1: float = 0.0, smoothing_strength_2: float = 0.0)
Bases: object
Create an epiNB model.
- Parameters
n_pan – Number of pan-allelic 2nd-order motifs used in prediction. Default/recommended: 10.
n_spec – Number of allele-specific 2nd-order motifs used in prediction. Default/recommended: 10.
pan_feature_candidates – Customize pan-allelic 2nd-order motifs. Default/recommended: [(1, -1), (0, 1), (4, -4), (1, 2), (0, -1), (-2, -1), (2, 4), (-3, -1), (2, -1), (1, 3), (0, 2)]
smoothing_strength_1 – Strength of BLOSUM62-based smoothing for 1st-order motifs. Default/recommended: 0.
smoothing_strength_2 – Strength of BLOSUM62-based smoothing for 2nd-order motifs. Default/recommended: 0.
- counter_1(X: ndarray) → ndarray
Counter for 1st-order motifs.
- Parameters
X – The peptide matrix
- Returns
A matrix representing the frequency of each AA at each position
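To illustrate what counter_1 computes, here is a minimal NumPy sketch. It assumes a hypothetical encoding: equal-length peptides stored as integers 0 to 19 in the order of the AA string below, with the result laid out as (20, length). epiNB's actual encoding, orientation, and handling of mixed peptide lengths may differ.

```python
import numpy as np

# Hypothetical alphabet order; epiNB's internal encoding may differ.
AA = "ACDEFGHIKLMNPQRSTVWY"

def encode(peptides):
    """Hypothetical helper: encode equal-length peptides as an int matrix."""
    index = {aa: i for i, aa in enumerate(AA)}
    return np.array([[index[aa] for aa in p] for p in peptides], dtype=int)

def count_first_order(X):
    """Return a (20, L) matrix: frequency of each AA (row) at each position (column).

    The (20, L) orientation is an assumption made for this sketch.
    """
    n, length = X.shape
    freq = np.zeros((len(AA), length))
    for pos in range(length):
        freq[:, pos] = np.bincount(X[:, pos], minlength=len(AA)) / n
    return freq

X = encode(["SIINFEKL", "SIINFEKV"])
freq = count_first_order(X)
print(freq[AA.index("L"), 7])  # 0.5: half of the peptides end in L
```

Each column sums to 1, so the matrix can be read directly as per-position AA frequencies.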
- counter_2(X: ndarray, features: List[Tuple[int, int]]) → ndarray
Counter for 2nd-order motifs.
- Parameters
X – The peptide matrix
features – The requested 2nd-order motifs
- Returns
A matrix representing the frequency of each AA combination (400 in total) at each 2nd-order motif.
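The 400 combinations are the 20 × 20 ordered AA pairs. The sketch below shows one way to count them for a list of (i, j) position pairs. The integer encoding, the row layout a * 20 + b, and the use of NumPy's negative column indexing for negative positions are assumptions for illustration, not epiNB's documented internals.

```python
import numpy as np

# Hypothetical alphabet order; epiNB's internal encoding may differ.
AA = "ACDEFGHIKLMNPQRSTVWY"
K = len(AA)

def encode(peptides):
    """Hypothetical helper: encode equal-length peptides as an int matrix."""
    index = {aa: i for i, aa in enumerate(AA)}
    return np.array([[index[aa] for aa in p] for p in peptides], dtype=int)

def count_second_order(X, features):
    """Return a (400, n_features) matrix of AA-pair frequencies.

    Row a * 20 + b holds the frequency of AA a at position i together with
    AA b at position j, for each requested (i, j). Negative positions use
    NumPy's negative column indexing (counted from the C-terminal end).
    """
    n = X.shape[0]
    freq = np.zeros((K * K, len(features)))
    for col, (i, j) in enumerate(features):
        pair_code = X[:, i] * K + X[:, j]
        freq[:, col] = np.bincount(pair_code, minlength=K * K) / n
    return freq

X = encode(["SIINFEKL", "SIINFEKV", "AIINFEKL"])
F = count_second_order(X, [(1, -1), (0, 1)])
il = AA.index("I") * K + AA.index("L")
print(F[il, 0])  # 2 of 3 peptides have I at position 1 and L at the last position
```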
- fit(X: Iterable[str], min_len: int = 8, max_len: int = 11)
Fit the model.
- Parameters
X – Peptides
min_len – Minimum peptide length to include. Shorter peptides are discarded.
max_len – Maximum peptide length to include. Longer peptides are discarded.
- Returns
Fitted model (self)
- fit_details_1(what='freq')
Show the frequency of each AA at each position (the 1st-order motifs, e.g. as input for a logo plot).
- Parameters
what – “freq” for frequency, or “log_odds” for log odds
- Returns
the requested details
- fit_details_2(what='freq', topk=None)
Show the frequency of AA combinations at the 2nd-order motifs (e.g. as input for a logo plot).
- Parameters
what – “freq” for frequency, “log_odds” for log odds, or “surplus” for P(ab) - P(a)P(b).
topk – If unspecified, return the matrix directly. If specified, sort the values and return the AA combinations and their values as two data frames, keeping only the first topk values. Specify 400 to get all values.
- Returns
the requested details
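The "surplus" option measures how much more often a pair of AAs co-occurs than independence would predict. The sketch below computes P(ab) - P(a)P(b) for one pair of positions and then picks the top-k combinations. It uses a hypothetical integer encoding of equal-length peptides (defined in the sketch) and returns a list and an array as a plain stand-in for the two data frames that fit_details_2 returns.

```python
import numpy as np

# Hypothetical alphabet order and encoding; epiNB's internals may differ.
AA = "ACDEFGHIKLMNPQRSTVWY"
K = len(AA)

def encode(peptides):
    """Hypothetical helper: encode equal-length peptides as an int matrix."""
    index = {aa: i for i, aa in enumerate(AA)}
    return np.array([[index[aa] for aa in p] for p in peptides], dtype=int)

def surplus(X, i, j):
    """20x20 matrix of P(ab) - P(a)P(b) for AA a at position i and AA b at position j."""
    n = X.shape[0]
    p_ab = np.bincount(X[:, i] * K + X[:, j], minlength=K * K).reshape(K, K) / n
    p_a = np.bincount(X[:, i], minlength=K) / n
    p_b = np.bincount(X[:, j], minlength=K) / n
    return p_ab - np.outer(p_a, p_b)

def top_k(mat, topk):
    """Return the topk AA combinations and their values, largest first."""
    flat = mat.ravel()
    order = np.argsort(flat)[::-1][:topk]
    rows, cols = np.unravel_index(order, mat.shape)
    pairs = [AA[r] + AA[c] for r, c in zip(rows, cols)]
    return pairs, flat[order]

# Positions 0 and 1 are perfectly coupled: A always pairs with K, C with L.
X = encode(["AK", "AK", "CL", "CL"])
S = surplus(X, 0, 1)
pairs, values = top_k(S, 2)
print(pairs, values)  # AK and CL, each with a surplus of 0.25
```

Because both P(ab) and P(a)P(b) sum to 1 over all 400 combinations, the surplus matrix always sums to 0: positive entries mark enriched pairs and negative entries mark depleted ones.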
- predict_details(X: Iterable[str], *, log_prior=6.906754778648553)
Show prediction details for a list of peptides.
- Parameters
X – input peptides.
log_prior – log(Neg/Pos) as the prior for converting odds to probability.
- Returns
a data frame containing prediction details
- predict_log_odds(X: Iterable[str])
Predict log odds for a list of peptides. This is the recommended measure for ranking peptides because it minimizes numerical issues.
- Parameters
X – Input peptides
- Returns
log odds
- predict_log_proba(X: Iterable[str], *, log_prior: float = 6.906754778648553)
Predict log probability for a list of peptides. This measure is not recommended for ranking: because of numerical issues, highly ranked peptides may have indistinguishable log probabilities. Use log odds for ranking instead.
- Parameters
X – input peptides.
log_prior – log(Neg/Pos) as the prior for converting odds to probability.
- Returns
log probabilities
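The conversion from log odds to log probability is not spelled out above. Assuming the log odds are a log likelihood ratio and log_prior = log(Neg/Pos), Bayes' rule gives P = 1 / (1 + exp(log_prior - log_odds)). The sketch below computes log P with numpy.logaddexp for numerical stability. The helper name is hypothetical, and this is an illustration, not epiNB's code. The default log_prior above equals log(999) up to floating-point rounding, which matches the prior=999.0 default of predict_proba.

```python
import math
import numpy as np

def log_proba_from_log_odds(log_odds, log_prior=math.log(999)):
    """Hypothetical helper: log P(positive) from log odds and a log(Neg/Pos) prior.

    P = 1 / (1 + exp(log_prior - log_odds)), hence
    log P = -log(1 + exp(log_prior - log_odds)) = -logaddexp(0, log_prior - log_odds).
    """
    return -np.logaddexp(0.0, log_prior - np.asarray(log_odds, dtype=float))

# When the log odds exactly offset the prior, P = 0.5.
print(log_proba_from_log_odds(math.log(999)))  # approximately -log(2)
```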
- predict_proba(X: Iterable[str], *, prior: float = 999.0) → DataFrame
Predict (linear-scale) probability for a list of peptides. Using this measure to rank peptides is discouraged because many peptides will have identical probabilities. Use log odds for ranking instead.
- Parameters
X – input peptides.
prior – Neg:Pos ratio as the prior for converting odds to probability.
- Returns
probabilities
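The saturation problem can be reproduced in a few lines. The sketch assumes the hypothetical conversion P = 1 / (1 + prior * exp(-log_odds)) with prior = Neg:Pos. Two peptides whose log odds differ by 10 both round to exactly 1.0 in double precision, while their log odds still rank them.

```python
import math
import numpy as np

def proba_from_log_odds(log_odds, prior=999.0):
    """Hypothetical helper: linear-scale P(positive) given a Neg:Pos prior ratio."""
    return 1.0 / (1.0 + prior * np.exp(-np.asarray(log_odds, dtype=float)))

log_odds = np.array([50.0, 60.0])
p = proba_from_log_odds(log_odds)
print(p)  # [1. 1.]: both saturate, so probability cannot rank them
print(np.argsort(-log_odds))  # [1 0]: log odds still separate them
print(proba_from_log_odds(math.log(999)))  # approximately 0.5: log odds offset the prior
```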
- seq2matrix(peptides: Iterable[str], no_warning: bool = False, return_ind: bool = False, min_len: int = 0, max_len: int = 100) → Tuple[Optional[List[int]], ndarray]
Helper function to convert sequences to a matrix.
- Parameters
peptides – Peptides to be processed.
no_warning – If True, do not warn when ignoring a peptide that contains unknown AAs (e.g. X).
return_ind – If True, also return the indices of the kept peptides in the input. This helps align the results with the input even when some inputs are filtered out.
min_len – Minimum peptide length to include. Shorter peptides are discarded.
max_len – Maximum peptide length to include. Longer peptides are discarded.
- Returns
The matrix (as a NumPy array), or the indices and the matrix when return_ind is True.
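The following sketch shows the filtering and index alignment described for seq2matrix. The inclusive length bounds, the 20-letter alphabet, and the helper names filter_peptides and align are assumptions for illustration. The sketch returns the kept peptides as strings rather than epiNB's matrix. The point is the alignment step: scores computed for the kept peptides are scattered back to input order, with NaN for inputs that were filtered out.

```python
import numpy as np

# Hypothetical alphabet: the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def filter_peptides(peptides, min_len=0, max_len=100):
    """Hypothetical helper: keep peptides with only standard AAs and an allowed length.

    Length bounds are treated as inclusive here (an assumption).
    Returns the indices of the kept peptides in the input, and the kept peptides.
    """
    kept_ind, kept = [], []
    for k, pep in enumerate(peptides):
        if min_len <= len(pep) <= max_len and set(pep) <= STANDARD_AA:
            kept_ind.append(k)
            kept.append(pep)
    return kept_ind, kept

def align(n_input, kept_ind, scores):
    """Scatter per-peptide scores back to input order; filtered-out inputs get NaN."""
    out = np.full(n_input, np.nan)
    out[kept_ind] = scores
    return out

peptides = ["SIINFEKL", "SIINXEKL", "SIIN"]  # X is unknown; "SIIN" is too short
ind, kept = filter_peptides(peptides, min_len=8, max_len=11)
aligned = align(len(peptides), ind, [1.5])  # 1.5 is a made-up score for the kept peptide
print(ind, aligned)  # [0] [1.5 nan nan]
```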
- spec_feature_selection(X: ndarray, n_spec: int) → list[tuple[int, int]]
Select allele-specific 2nd-order motifs.
- Parameters
X – The peptide matrix.
n_spec – Number of motifs.
- Returns
A list of motifs
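The criterion that spec_feature_selection uses is not documented here. To illustrate the general idea of choosing n_spec position pairs from a peptide matrix, the sketch below substitutes a different, plainly named criterion: it ranks every pair of positions by empirical mutual information and keeps the top n_spec. This is not epiNB's method. The helper names are hypothetical, and positions are non-negative column indices, unlike the signed offsets in the pan-allelic defaults.

```python
import itertools
import numpy as np

def mutual_information(a, b, k=20):
    """Empirical mutual information (in nats) between two integer-coded columns."""
    n = len(a)
    p_ab = np.bincount(a * k + b, minlength=k * k).reshape(k, k) / n
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    expected = np.outer(p_a, p_b)
    nz = p_ab > 0  # skip empty cells, where p * log(p / q) contributes 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / expected[nz])))

def select_pairs(X, n_spec):
    """Hypothetical selector: rank position pairs (i, j), i < j, by mutual information."""
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    scores = [mutual_information(X[:, i], X[:, j]) for i, j in pairs]
    order = np.argsort(scores)[::-1][:n_spec]
    return [pairs[t] for t in order]

# Toy data: columns 0 and 2 vary together; column 1 is constant.
X = np.array([[0, 5, 0], [1, 5, 1], [0, 5, 0], [1, 5, 1]])
print(select_pairs(X, 1))  # [(0, 2)]
```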