bilalsm committed on
Commit 603e7be · verified · 1 Parent(s): 595350b

Update README.md

Files changed (1)
  1. README.md +34 -9
README.md CHANGED
@@ -10,25 +10,47 @@ tags:
  pipeline_tag: feature-extraction
  ---

- # Specollate Model

- ## Model Description

- SpeCollate is the first Deep Learning-based peptide-spectrum similarity network. It allows searching a peptide database by generating embeddings for both mass spectra and database peptides. K-nearest neighbor search is performed on a GPU in the embedding space to find the k (usually k=5) nearest peptide for each spectrum.

  ## Architecture
- SpeCollate network consists of two branch, i.e., Spectrum Sub-Network (SSN) and Peptide Sub-Network (PSN). SSN processes spectra and generates spectral embeddings while PSN processes peptide sequences and generates peptides embeddings. Both types of embeddings are generated in real space of dimension 256. The network architecture is shown in Fig 1 below.

- ## Model Details

- The Specollate model:
- 1. Encodes mass spectra into 512-dimensional embeddings
- 2. Encodes peptide sequences into matching embedding space
- 3. Enables fast cosine similarity search for PSM identification

  ## Citation

@@ -40,3 +62,6 @@ This model and associated code are released under the CC-BY-NC-ND 4.0 license an
  ## Links

  - **GitHub:** https://github.com/pcdslab/SpeCollate
 
 
  pipeline_tag: feature-extraction
  ---

+ # SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions

+ [GitHub](https://github.com/pcdslab/SpeCollate) | [Cite](#citation)

+ ## Abstract
+
+ Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. 
SpeCollate is available at https://deepspecs.github.io/.
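
The online hardest negative mining mentioned in the abstract can be illustrated with a minimal sketch. It assumes that spectrum `i` and peptide `i` in a batch form the positive pair, and it uses a plain triplet-style margin loss as a stand-in: this is not the SNAP-loss (which this README does not define) and it does not build full sextuplets. The names `hardest_negative` and `margin_loss`, the margin value, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_negative(anchor_idx, spectra, peptides):
    """Return the index of the closest non-matching peptide for one spectrum.

    Assumption: spectra[i] and peptides[i] are the positive pair, so every
    peptides[j] with j != i is a negative candidate within the batch.
    """
    anchor = spectra[anchor_idx]
    candidates = [j for j in range(len(peptides)) if j != anchor_idx]
    return min(candidates, key=lambda j: euclidean(anchor, peptides[j]))


def margin_loss(anchor_idx, spectra, peptides, margin=0.5):
    """Triplet-style hinge loss on the hardest in-batch negative.

    Stand-in loss for illustration only; it is NOT the SNAP-loss.
    """
    anchor = spectra[anchor_idx]
    pos_dist = euclidean(anchor, peptides[anchor_idx])
    neg_idx = hardest_negative(anchor_idx, spectra, peptides)
    neg_dist = euclidean(anchor, peptides[neg_idx])
    return max(0.0, pos_dist - neg_dist + margin)


# Toy batch of 2-D embeddings (hypothetical values).
spectra = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
peptides = [[0.1, 0.0], [1.0, 0.9], [0.4, 0.0]]

neg_for_first = hardest_negative(0, spectra, peptides)
loss_for_first = margin_loss(0, spectra, peptides)
```

Choosing the closest negative in each batch focuses the loss on the examples the network currently confuses most, which is the general motivation for mining hard negatives online rather than sampling them at random.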

  ## Architecture
+ The SpeCollate network consists of two branches: the Spectrum Sub-Network (SSN) and the Peptide Sub-Network (PSN). The SSN processes spectra and generates spectral embeddings, while the PSN processes peptide sequences and generates peptide embeddings. Both types of embeddings are real-valued vectors of dimension 256.
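
To make the idea of a shared 256-dimensional embedding space concrete, here is a minimal sketch of a brute-force nearest-neighbor lookup by Euclidean distance. Random vectors stand in for real SSN and PSN outputs, and the function `top_k_peptides`, the database size, and `k=5` are illustrative assumptions. The `specollate` package may implement its search differently.

```python
import heapq
import math
import random

EMBED_DIM = 256  # embedding dimension stated in the Architecture section


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_peptides(spectrum_emb, peptide_embs, k=5):
    """Return (distance, index) pairs for the k nearest peptides, closest first."""
    distances = ((euclidean(spectrum_emb, p), i) for i, p in enumerate(peptide_embs))
    return heapq.nsmallest(k, distances)


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Hypothetical peptide embeddings: 100 random vectors in the shared space.
peptide_embs = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(100)]

# Hypothetical spectrum embedding: peptide 42 plus a small perturbation.
query = [x + rng.gauss(0, 0.01) for x in peptide_embs[42]]

matches = top_k_peptides(query, peptide_embs, k=5)
best_distance, best_index = matches[0]
```

Because both modalities live in the same space, matching a spectrum to candidate peptides reduces to a distance computation, with no theoretical spectrum generated for each peptide.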
+
+ Install:
+
+ ```bash
+ pip install specollate
+ ```
+
+ Quick usage:
+
+ ```python
+ from specollate import SpeCollateSearch
+
+ searcher = SpeCollateSearch(device='cuda')  # or 'cpu'
+ searcher.search(
+     mgf_dir='path/to/mgf_dir',
+     peptide_db='path/to/peptide_db',
+     output_dir='./specollate_output'
+ )
+ ```
+
+ Sample search:
+
+ ```python
+ from specollate import SpeCollateSearch
+
+ searcher = SpeCollateSearch(device='cuda')  # or 'cpu'
+ searcher.search_with_sample_data(output_dir='./specollate_output')
+ ```

  ## Citation


  ## Links

  - **GitHub:** https://github.com/pcdslab/SpeCollate
+
+ ## Contact
+ For any additional questions or comments, contact Fahad Saeed (fsaeed@fiu.edu).