Construction of Secondary Metabolism Gene Cluster (SMGC) families was achieved in a multi-step process. The initial step was the identification of SMGCs, which was done using an in-house wrapper for the SMURF algorithm [REF]. The protein products of the resulting SMGCs were then compared to each other by alignment with BLASTp (BLAST+ suite version 2.2.27, e-value <= 10^-10). Subsequently, a score based on BLASTp identity and the number of shared proteins was calculated to quantify the similarity between each pair of gene clusters.
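The scoring step can be illustrated with a minimal Python sketch. The exact formula is not given here, so everything below is an assumption made for illustration: the BLASTp results are assumed to be in the standard 12-column tabular format (-outfmt 6), and the hypothetical function cluster_similarity multiplies the mean best-hit identity by the fraction of proteins with a hit. Only the e-value cutoff of 10^-10 comes from the description above.

```python
def parse_blast_tabular(lines, max_evalue=1e-10):
    """Collect percent identity per (query, subject) pair from BLAST tabular lines.

    Assumes the standard 12-column tabular layout: identity is column 3,
    e-value is column 11. Hits above max_evalue and self-hits are skipped.
    """
    hits = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments and malformed lines
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if query == subject or evalue > max_evalue:
            continue
        key = (query, subject)
        # keep the best identity if a pair is reported more than once
        hits[key] = max(identity, hits.get(key, 0.0))
    return hits


def cluster_similarity(hits, cluster_a, cluster_b):
    """Hypothetical score in [0, 1], NOT the published formula.

    mean best-hit identity of proteins in cluster_a against cluster_b,
    multiplied by the fraction of proteins that have such a hit.
    """
    best = {}
    for protein in cluster_a:
        identities = [hits[(protein, other)] for other in cluster_b
                      if (protein, other) in hits]
        if identities:
            best[protein] = max(identities)
    if not best:
        return 0.0
    mean_identity = sum(best.values()) / len(best) / 100.0
    shared_fraction = len(best) / max(len(cluster_a), len(cluster_b))
    return mean_identity * shared_fraction


# Toy input: the second hit fails the e-value cutoff and is dropped.
blast_lines = [
    "a1\tb1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "a2\tb2\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-5\t40",
    "a3\tb2\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-20\t90",
]
hits = parse_blast_tabular(blast_lines)
score = cluster_similarity(hits, ["a1", "a2", "a3"], ["b1", "b2"])
```

The score as written is directional (cluster_a against cluster_b). Averaging the two directions would be one way to obtain a symmetric edge weight for the network.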
Using these scores, we created a weighted network of SMGCs and used a random walk community detection algorithm (R version 3.3.2, igraph 1.0.1, cutoff 1 step) to determine SMGC families. Finally, we ran a second round of random walk clustering on those communities that contained more members than there were species in the analysis.
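The two-round logic can be sketched as follows. This is an illustration only: the hypothetical function refine_oversized takes the community detection step as a pluggable callable, and the demo uses thresholded connected components as a simple stand-in. It is not the random walk algorithm from igraph that was used in the analysis, and the node names, weights and thresholds are made up.

```python
def threshold_components(nodes, edges, min_weight):
    """Stand-in community detection (NOT random walk clustering).

    Returns the connected components of the network after dropping
    edges whose weight is below min_weight.
    """
    adjacency = {node: set() for node in nodes}
    for (a, b), weight in edges.items():
        # edges leaving the current node set are ignored (subnetwork view)
        if weight >= min_weight and a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, communities = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(members)
    return communities


def refine_oversized(nodes, edges, n_species, detect_first, detect_second):
    """Run a second detection round on communities with more members than species."""
    families = []
    for community in detect_first(nodes, edges):
        if len(community) > n_species:
            families.extend(detect_second(sorted(community), edges))
        else:
            families.append(set(community))
    return families


# Toy network: five gene clusters, three species (all values are made up).
nodes = ["c1", "c2", "c3", "c4", "c5"]
edges = {("c1", "c2"): 0.9, ("c2", "c3"): 0.9,
         ("c3", "c4"): 0.3, ("c4", "c5"): 0.9}
families = refine_oversized(
    nodes, edges, n_species=3,
    detect_first=lambda n, e: threshold_components(n, e, 0.2),
    detect_second=lambda n, e: threshold_components(n, e, 0.5),
)
```

In the toy network the first round yields a single community of five clusters, which exceeds the three species and is therefore split again in the second round. A random walk method could be passed in as detect_first and detect_second in place of the stand-in.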
- Select an organism and a JGI protein ID and view the clusters and neighbor clusters that protein is found in. Note that you must know the protein identifier to use this app.
- Paste in your own sequence, find its most similar match in the aspmine data set, and view the clusters that match is found in.
Note: when the information or warning text in the app goes pale, the app is rendering the tables and images. Please be patient.