protein sequence analysis
Projects with this topic
Small C programs I wrote for the paper "Le Novère N., Corringer P.J. and Changeux J.P. Improved secondary structure predictions for a nicotinic receptor subunit. Incorporation of solvent accessibility and experimental data into a 2D representation. Biophysical Journal (1999), 76: 2329-2345."
DPCfam Workstation version. Runs on Linux-based systems; developed and tested on Ubuntu 18. DPCfamW uses the moodycamel::ConcurrentQueue library ( https://github.com/cameron314/concurrentqueue ), which is freely available under the Simplified BSD license provided it is cited. This version replicates the pipeline used to analyse UniRef50 (v. 2017_07) in Unsupervised protein family classification by Density Peak clustering, Russo ET, 2020, PhD Thesis ( http://hdl.handle.net/20.500.11767/116345 ), but is intended for smaller datasets. The largest dataset we have analysed with it is the TESTproteins_cd50.fasta dataset provided in this package. Because of memory constraints, we do not guarantee that larger datasets can be analysed with this version.
This repository contains the implementation of the DPC-based algorithm described in Russo, E.T., Laio, A. & Punta, M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics 22, 121 (2021). https://doi.org/10.1186/s12859-021-04013-x. Note that the implementation was written to analyse, on a conventional workstation (8 GB RAM, 4-8 cores), query datasets of up to 5000 proteins, such as those analysed in the reference paper.