DataStructures - Alignment

Run Module

Run

class msproteomicstoolslib.data_structures.Run.Run(header, header_dict, runid, orig_input_filename=None, filename=None, aligned_filename=None)

A run contains references to identified precursor groups and precursors.

The run stores a reference to the precursor groups (heavy/light pairs) identified in the run. It has a unique id and stores the headers from the CSV file.

A run has the following attributes:
  • an identifier that is unique to this run
  • a filename where it originally came from
  • a dictionary of precursor groups, accessible through the following functions:
      - getPrecursorGroup
      - hasPrecursor
      - getPrecursor
      - addPrecursor
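
The bookkeeping described above can be pictured with a small stand-in class. This is a minimal sketch under stated assumptions, not the library's implementation: SketchRun, SketchPrecursor, the nested-dict storage and the example ids are all hypothetical; only the four accessor names are taken from the list above.

```python
class SketchPrecursor:
    """Hypothetical precursor stand-in; it only knows its transition group id."""

    def __init__(self, trgr_id):
        self._id = trgr_id

    def get_id(self):
        return self._id


class SketchRun:
    """Hypothetical stand-in for a run: a unique id, a filename, precursor groups."""

    def __init__(self, runid, orig_input_filename=None):
        self.runid = runid
        self.orig_input_filename = orig_input_filename
        # Assumption: {peptide_group_label: {trgr_id: precursor}}
        self._groups = {}

    def addPrecursor(self, precursor, peptide_group_label):
        group = self._groups.setdefault(peptide_group_label, {})
        group[precursor.get_id()] = precursor

    def hasPrecursor(self, peptide_group_label, trgr_id):
        return trgr_id in self._groups.get(peptide_group_label, {})

    def getPrecursor(self, peptide_group_label, trgr_id):
        # Returns None when the label or the id is unknown
        return self._groups.get(peptide_group_label, {}).get(trgr_id)

    def getPrecursorGroup(self, curr_id):
        return self._groups[curr_id]

    def __iter__(self):
        # Iterate over all precursor groups stored in this run
        return iter(self._groups.values())


run = SketchRun("run_0", "sample.tsv")
run.addPrecursor(SketchPrecursor("tg_light"), "PEPTIDEK_2")
run.addPrecursor(SketchPrecursor("tg_heavy"), "PEPTIDEK_2")
```

Both precursors land in the same group because they share a peptide group label, which mirrors the heavy/light pairing described above.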
__iter__()

Iterate through all precursor groups identified in this run.

addPrecursor(precursor, peptide_group_label)
getPrecursor(peptide_group_label, trgr_id)

Return the precursor corresponding to the given peptide group label and transition group id.

getPrecursorGroup(curr_id)
get_aligned_filename()
get_best_peaks()
get_best_peaks_with_cutoff(cutoff)
get_id()
get_openswath_filename()
hasPrecursor(peptide_group_label, trgr_id)

PrecursorGroup Module

PrecursorGroup

class msproteomicstoolslib.data_structures.PrecursorGroup.PrecursorGroup(peptide_group_label, run)

A set of precursors that are isotopically modified versions of each other.

A collection of precursors that are isotopically modified versions of the same underlying peptide sequence. Generally these are heavy/light forms.

addPrecursor(self, precursor)

Add precursor to peptide group

getAllPeakgroups(self)

Generator of all peakgroups attached to the precursors in this group

getAllPrecursors(self)

Return a list of all precursors in this precursor group

getOverallBestPeakgroup(self)

Get the best peakgroup (by fdr score) of all precursors contained in this precursor group
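
As an illustration of picking the best peakgroup across a heavy/light group, here is a minimal sketch. It assumes that a lower FDR score is better and uses a hypothetical PeakGroup record and a plain dict in place of the library's objects.

```python
from collections import namedtuple

# Hypothetical minimal peakgroup record; the field names are assumptions.
PeakGroup = namedtuple("PeakGroup", ["feature_id", "fdr_score", "rt"])


def overall_best_peakgroup(precursors):
    """Return the peakgroup with the lowest FDR score across all precursors.

    `precursors` is a dict {precursor_id: [PeakGroup, ...]}.
    Assumption: a lower FDR score means a better peakgroup.
    Returns None when no precursor holds any peakgroup.
    """
    all_pgs = [pg for pgs in precursors.values() for pg in pgs]
    if not all_pgs:
        return None
    return min(all_pgs, key=lambda pg: pg.fdr_score)


group = {
    "light": [PeakGroup("f1", 0.02, 1200.0), PeakGroup("f2", 0.30, 1350.0)],
    "heavy": [PeakGroup("f3", 0.005, 1201.5)],
}
best = overall_best_peakgroup(group)
```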

getPeptideGroupLabel(self)

Get peptide group label

getPrecursor(self, curr_id)

Get the precursor for the given transition group id

Precursor Module

PrecursorBase

class msproteomicstoolslib.data_structures.Precursor.PrecursorBase(this_id, run)

Bases: object

find_closest_in_iRT(delta_assay_rt)
get_all_peakgroups()
get_best_peakgroup()
get_decoy()
get_id()
get_selected_peakgroup()
select_pg(this_id)
set_decoy(decoy)
unselect_pg(id)

GeneralPrecursor

class msproteomicstoolslib.data_structures.Precursor.GeneralPrecursor(this_id, run)

Bases: msproteomicstoolslib.data_structures.Precursor.PrecursorBase

A set of peakgroups that belong to the same precursor in a single run.

== Implementation details ==

This is a plain implementation where all peakgroup objects are stored in a simple list. This is not very efficient, since many objects need to be created, which takes a lot of memory in Python.

add_peakgroup(peakgroup)
append(transitiongroup)
find_closest_in_iRT(delta_assay_rt)
get_all_peakgroups()
get_best_peakgroup()

Return the best peakgroup according to fdr score

get_run_id()
get_selected_peakgroup()
id
peakgroups
precursor_group
protein_name
run
sequence

Precursor

class msproteomicstoolslib.data_structures.Precursor.Precursor(this_id, run)

Bases: msproteomicstoolslib.data_structures.Precursor.PrecursorBase

A set of peakgroups that belong to the same precursor in a single run.

Each precursor has a back-reference to the precursor group (heavy/light pair) and to the run it belongs to, and it stores its amino acid sequence. Furthermore, a unique id for the precursor and the protein name are stored.

A precursor can return its best transition group, the selected peakgroup, or the transition group that is closest to a given iRT time. Its id is the transition_group_id (i.e. the id of the chromatogram).
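
The "closest to a given iRT time" lookup can be sketched as a nearest-neighbour search over normalized retention times. This is an illustration only: the function name, the tuple layout (feature id, FDR score, normalized RT) and the reading of the argument as a target time are assumptions, not the behaviour of find_closest_in_iRT itself.

```python
def closest_in_rt(peakgroups, target_rt):
    """Return the peakgroup tuple whose normalized RT is nearest to target_rt.

    Each peakgroup is assumed to be (feature_id, fdr_score, normalized_rt).
    Returns None for an empty list.
    """
    if not peakgroups:
        return None
    return min(peakgroups, key=lambda pg: abs(pg[2] - target_rt))


peakgroups = [("f1", 0.01, 35.2), ("f2", 0.20, 48.9), ("f3", 0.05, 51.3)]
nearest = closest_in_rt(peakgroups, 50.0)
```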

The “selected” peakgroup is represented by the peakgroup that belongs to cluster number 1 (cluster_id == 1) which in this case is “special”.

== Implementation details ==

For memory reasons, we store all information about the peakgroup in a tuple (immutable). This tuple contains a unique feature id, a score and a retention time. Additionally, we store the cluster to which the peakgroup belongs (if the user sets this).

A precursor has the following attributes:
  • an identifier that is unique among all other precursors
  • a set of peakgroups
  • a back-reference to the run it belongs to
add_peakgroup_tpl(pg_tuple, tpl_id, cluster_id=-1)

Adds a peakgroup to this precursor.

The peakgroup should be a tuple of length 4 with the following components:
  1. id
  2. quality score (FDR)
  3. retention time (normalized)
  4. intensity

An optional fifth component, d_score, may follow.
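
To make the tuple-based storage concrete, the following minimal sketch keeps peakgroups as tuples of (id, FDR score, normalized retention time, intensity) next to a parallel list of cluster ids, and treats cluster 1 as the selected one. TuplePrecursorSketch and its method names are hypothetical and do not reproduce the signature or checks of add_peakgroup_tpl.

```python
class TuplePrecursorSketch:
    """Hypothetical stand-in that stores peakgroups as tuples, not objects."""

    def __init__(self, this_id):
        self.id = this_id
        self.peakgroups = []   # tuples: (id, fdr_score, normalized_rt, intensity)
        self.cluster_ids = []  # one cluster id per tuple, in the same order

    def add_tuple(self, pg_tuple, cluster_id=-1):
        if len(pg_tuple) < 4:
            raise ValueError("expected (id, fdr_score, normalized_rt, intensity)")
        self.peakgroups.append(tuple(pg_tuple))
        self.cluster_ids.append(cluster_id)

    def select(self, feature_id):
        # Selecting a peakgroup means assigning it to cluster 1
        for i, pg in enumerate(self.peakgroups):
            if pg[0] == feature_id:
                self.cluster_ids[i] = 1

    def selected(self):
        return [pg for pg, cl in zip(self.peakgroups, self.cluster_ids) if cl == 1]


prec = TuplePrecursorSketch("tg_1")
prec.add_tuple(("f1", 0.01, 35.2, 1.5e5))
prec.add_tuple(("f2", 0.20, 48.9, 3.0e4))
prec.select("f1")
```

Keeping the cluster ids in a separate list lets the peakgroup tuples stay immutable while the selection changes.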

cluster_ids_
find_closest_in_iRT(delta_assay_rt)
getAllPeakgroups()
getClusteredPeakgroups()
getPrecursorGroup()
get_all_peakgroups()
get_best_peakgroup()
get_id()
get_run_id()
get_selected_peakgroup()
id
peakgroups_
precursor_group
protein_name
run
select_pg(this_id)
sequence
setClusterID(this_id, cl_id)
unselect_all()
unselect_pg(this_id)

PeakGroup Module

PeakGroupBase

class msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase

Bases: object

cluster_id_
fdr_score
get_cluster_id()
get_fdr_score()
get_feature_id()
get_intensity()
get_normalized_retentiontime()
get_value(value)
id_
intensity_
is_selected()
normalized_retentiontime
select_this_peakgroup()
set_fdr_score(fdr_score)
set_feature_id(id_)
set_intensity(intensity)
set_normalized_retentiontime(normalized_retentiontime)
set_value(key, value)

MinimalPeakGroup

class msproteomicstoolslib.data_structures.PeakGroup.MinimalPeakGroup(unique_id, fdr_score, assay_rt, selected, cluster_id, peptide, intensity=None, dscore=None)

Bases: msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase

A single peakgroup that is defined by a retention time in a chromatogram of multiple transitions. Additionally it has an fdr_score and it has an aligned RT (e.g. retention time in normalized space). A peakgroup can be selected for quantification or not (this is stored as having cluster_id == 1).

Note that for performance reasons, the peakgroups are created on-the-fly and not stored as objects but rather as tuples in “Peptide”.

Each peak group has a unique id, a score (fdr score usually), a retention time as well as a back-reference to the precursor that generated the peakgroup. In this case, the peak group can also be assigned a cluster id (where the cluster 1 is special as the one we will use for quantification).

get_cluster_id()
get_dscore()
print_out()
select_this_peakgroup()
setClusterID(id_)
set_fdr_score(fdr_score)
set_feature_id(id_)
set_intensity(intensity)
set_normalized_retentiontime(normalized_retentiontime)

GuiPeakGroup

class msproteomicstoolslib.data_structures.PeakGroup.GuiPeakGroup(fdr_score, intensity, leftWidth, rightWidth, peptide)

Bases: msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase

A single peakgroup that is defined by a retention time in a chromatogram of multiple transitions.

get_value(value)

GeneralPeakGroup

class msproteomicstoolslib.data_structures.PeakGroup.GeneralPeakGroup(row, run, peptide)

Bases: msproteomicstoolslib.data_structures.PeakGroup.PeakGroupBase

get_dscore()
get_value(value)
peptide
print_out()
row
run
setClusterID(clid)
set_value(key, value)

DataStructures - Basic

Aminoacides Module

Aminoacid

class msproteomicstoolslib.data_structures.aminoacides.Aminoacid(name, code, code3, composition)

Class to hold information about a single Amino Acid (AA)

code = None

One letter code

code3 = None

Three letter code

composition = None

Elemental composition

elementsLib = None

Library of elements

name = None

Full name of the AA

Aminoacides

class msproteomicstoolslib.data_structures.aminoacides.Aminoacides
addAminoacid(aminoacid)
getAminoacid(code)
initAminoacides()

Modifications Module

Modification

class msproteomicstoolslib.data_structures.modifications.Modification(aminoacid, tpp_Mod, unimodAccession, peakViewAccession, is_labeling, composition)

A modification on an Aminoacid

codes = ['TPP', 'unimod', 'ProteinPilot']

Available modification formats

getcode(code)

Modifications

class msproteomicstoolslib.data_structures.modifications.Modifications

A collection of modifications

appendModification(modification)
is_bool(expression)
printModifications()
readModificationsFile(modificationsfile)

It reads a TSV file with additional modifications. Modifications will be appended to the default modifications of this class.

TSV file headers and an example row:

modified-AA  TPP-nomenclature  Unimod-Accession  ProteinPilot-nomenclature  is_a_labeling  composition-dictionary
S            S[167]            21                [Pho]                      False          {'H' : 1, 'O' : 3, 'P' : 1}
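
A file in this layout could be parsed along the following lines. This is a sketch using only the Python standard library, not the code behind readModificationsFile; it assumes tab-separated columns, straight ASCII quotes in the composition dictionary, and hypothetical output key names.

```python
import ast
import csv
import io

# The header and example row from the description above, tab-separated.
EXAMPLE_TSV = (
    "modified-AA\tTPP-nomenclature\tUnimod-Accession\t"
    "ProteinPilot-nomenclature\tis_a_labeling\tcomposition-dictionary\n"
    "S\tS[167]\t21\t[Pho]\tFalse\t{'H' : 1,'O' : 3, 'P' : 1}\n"
)


def read_modifications(handle):
    """Parse modification rows from an open text handle into plain dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "aminoacid": row["modified-AA"],
            "tpp_mod": row["TPP-nomenclature"],
            "unimod_accession": int(row["Unimod-Accession"]),
            "proteinpilot": row["ProteinPilot-nomenclature"],
            "is_labeling": row["is_a_labeling"].strip().lower() == "true",
            # literal_eval only accepts Python literals, so it is safe on file input
            "composition": ast.literal_eval(row["composition-dictionary"]),
        })
    return rows


mods = read_modifications(io.StringIO(EXAMPLE_TSV))
```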

translateModificationsFromSequence(sequence, code, aaLib=None)

Returns a Peptide object, given a sequence with modifications in any of the available codes. The code (TPP, Unimod,...) to be translated must be given.

Peak Module

Peak

class msproteomicstoolslib.data_structures.peak.Peak(str=None, spectraST=False)

Represents one peak of a spectrum.

init_with_self(peak)
initialize(peak, intensity, peak_annotation, statistics)
parse_str(peak)
to_write_string()

Peptide Module

Peptide

class msproteomicstoolslib.data_structures.peptide.Peptide(sequence, modifications={}, protein='', aminoacidLib=None)
addSpectrum(spectrum)

Deprecated definition

all_ions(ionseries=None, frg_z_list=[1, 2], fragmentlossgains=[0], mass_limits=None, label='')

Returns all the fragment ions of the peptide as a tuple of two objects: (annotated, ionmasses_only).
  • annotated is a list of tuples of the form (ion_type, ion_number, ion_charge, lossgain, fragment_mz)
  • ionmasses_only is a list of fragment masses

When ionseries is not provided, all existing ion series (see Peptide.iontypes) are calculated. When frg_z_list is not provided, fragment ion charge states +1 and +2 are used.
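
The shape of the return value can be illustrated with a simplified fragment calculator. This sketch is not the library's all_ions: it assumes unmodified peptides, handles only b and y ions without losses or gains, and carries monoisotopic residue masses only for the residues in the example. The function name fragment_ions is hypothetical.

```python
# Monoisotopic residue masses (Da), limited to the residues of the example.
MONO = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
        "I": 113.08406, "D": 115.02694}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(sequence, ionseries=("b", "y"), frg_z_list=(1, 2)):
    """Return (annotated, ionmasses_only) for the b and y ions of a peptide.

    annotated holds (ion_type, ion_number, ion_charge, lossgain, fragment_mz);
    lossgain is always 0 in this sketch.
    """
    annotated = []
    n = len(sequence)
    for ion_type in ionseries:
        for ion_number in range(1, n):
            if ion_type == "b":
                # N-terminal fragment: sum of the first ion_number residues
                neutral = sum(MONO[aa] for aa in sequence[:ion_number])
            elif ion_type == "y":
                # C-terminal fragment: last ion_number residues plus water
                neutral = sum(MONO[aa] for aa in sequence[n - ion_number:]) + WATER
            else:
                raise ValueError("unsupported ion type: %r" % ion_type)
            for z in frg_z_list:
                mz = (neutral + z * PROTON) / z
                annotated.append((ion_type, ion_number, z, 0, mz))
    ionmasses_only = [entry[-1] for entry in annotated]
    return annotated, ionmasses_only


annotated, ionmasses_only = fragment_ions("PEPTIDE")
```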

calIsoforms(switchingModification, modLibrary)

This returns the full list of peptide species of the same peptide family (isobaric, same composition, different modification site). The list is given as a list of Peptide objects. switchingModification must be given as a Modification object.

cal_UIS(otherPeptidesList, UISorder=2, ionseries=None, fragmentlossgains=[0], precision=1e-08, frg_z_list=[1, 2], mass_limits=None)

It calculates the unique ion signatures (UIS) of a given peptide relative to a given list of other peptides. It returns a tuple of two objects: all_UIS and all_UIS_annotated. all_UIS contains only a mass list.

comparePeptideFragments(otherPeptidesList, ionseries=None, fragmentlossgains=[0], precision=1e-08, frg_z_list=[1, 2])

This returns a tuple of lists: (CommonFragments, differentialFragments). The differential fragments are the fragments of this peptide (self) whose masses are not shared with any of the peptides in otherPeptidesList. otherPeptidesList must be a list of Peptide objects. The fragments are reported as tuples: (ionseries, ion_number, ion_charge, fragmentlossgain, mass).
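
The split into common and differential fragments can be sketched as a tolerance-based comparison of fragment masses. This is an illustration under assumptions: compare_fragments is a hypothetical function, it takes a flat list of the other peptides' fragments rather than Peptide objects, and it compares masses by absolute difference.

```python
def compare_fragments(own, others, precision=1e-8):
    """Partition `own` fragments into (common, differential).

    Fragments are tuples whose last element is the mass. A fragment counts
    as common when any fragment in `others` lies within `precision` of it.
    """
    common, differential = [], []
    for frag in own:
        mass = frag[-1]
        shared = any(abs(mass - other[-1]) <= precision for other in others)
        (common if shared else differential).append(frag)
    return common, differential


own = [("b", 2, 1, 0, 227.1026), ("y", 1, 1, 0, 148.0604)]
others = [("b", 2, 1, 0, 227.1026), ("y", 1, 1, 0, 147.1128)]
common, differential = compare_fragments(own, others)
```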

fragmentSequence(ion_type, frg_number)
getDeltaMassFromSequence(sequence)
getMZ(charge, label='')
getMZfragment(ion_type, ion_number, ion_charge, label='', fragmentlossgain=0.0)
getSequenceWithMods(code)
get_decoy_Q3(frg_serie, frg_nr, frg_z, blackList=[], max_tries=1000)
pseudoreverse(sequence='None')
shuffle_sequence()

Residues Module

Residues

class msproteomicstoolslib.data_structures.Residues.Residues(type='mono')

A class that contains information about elements, amino acids and modifications. It mainly stores their masses, but also chemical formulas.

The most commonly used properties are:
  • Residues.average_elments : average element weights
  • Residues.monoisotopic_elments : monoisotopic element weights
  • Residues.aa_codes : Three and One letter amino acid codes
  • Residues.aa_names : English names of the amino acids
  • Residues.aa_sum_formulas_text : Chemical formulas of all amino acids
  • Residues.aa_sum_formulas: Chemical formulas of all amino acids as hash
  • Residues.mass_xxx: monoisotopic masses of different compounds (NH3, H2O, CO, HPO4 etc)
  • Residues.average_data: average weight of amino acids
  • Residues.monoisotopic_data: monoisotopic weight of amino acids
  • Residues.monoisotopic_mod: monoisotopic modification data
  • Residues.mod_mapping: mapping of + notation to absolute weight notation (K[+8] to K[136])
  • Residues.Hydropathy: Hydropathy of amino acids (gravy scores)
  • TODO hydrophobicity of amino acids
  • TODO basicity of amino acids
  • TODO helicity of amino acids
  • Residues.pI: pI of amino acids
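
The mapping from + notation to absolute weight notation (K[+8] to K[136]) can be sketched as follows. This assumes the absolute value is the monoisotopic residue mass plus the shift, rounded to the nearest integer; delta_to_absolute and the two-entry mass table are hypothetical and do not reproduce Residues.mod_mapping.

```python
import re

# Monoisotopic residue masses (Da) for the two residues used in the example.
MONO = {"K": 128.09496, "R": 156.10111}


def delta_to_absolute(mod):
    """Convert e.g. 'K[+8]' to 'K[136]' (rounded residue mass plus shift)."""
    match = re.fullmatch(r"([A-Z])\[([+-]\d+)\]", mod)
    if match is None:
        raise ValueError("not in + notation: %r" % mod)
    aa, delta = match.group(1), int(match.group(2))
    return "%s[%d]" % (aa, round(MONO[aa] + delta))


heavy_lysine = delta_to_absolute("K[+8]")
heavy_arginine = delta_to_absolute("R[+10]")
```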

DDB Module

DDB

Abstraction layer to the 2DDB software framework.