Subnetwork analysis on proteomics data

Background: I have proteomics data from seven samples (p-value / log fold-change score), and I want to analyze it with network (interactome) analyses.
Question: I would like to build an interactome of all the proteins in the data and map onto it the proteins that have a significant p-value (compared to control).
After that, I would like to extract subnetwork(s) and add pathway enrichment to them.
Request: Please suggest online or standalone tools (or algorithms) that fit my requirements.
Thanks!

To build network graphs that represent your protein-protein interactions, I recommend taking a look at the networkx library. You pass in nodes (the proteins of interest) and edges (the interactions) to generate a graph. It can also extract subgraphs of that graph, which you can treat as subnetworks.
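A minimal sketch of that idea, assuming the interactions are available as protein pairs and each protein has a p-value. The protein names, p-values, and the 0.05 threshold below are made up for illustration; only the networkx calls come from the library.

```python
import networkx as nx

# Hypothetical interaction pairs and p-values, for illustration only.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
p_values = {"P1": 0.001, "P2": 0.03, "P3": 0.20, "P4": 0.04, "P5": 0.01, "P6": 0.002}
ALPHA = 0.05  # assumed significance threshold

# Build the full interactome and attach each p-value as a node attribute.
interactome = nx.Graph()
interactome.add_edges_from(interactions)
nx.set_node_attributes(interactome, p_values, name="p_value")

# Keep only the significant proteins; subgraph() gives the induced subgraph.
significant = [n for n, p in p_values.items() if p < ALPHA]
sig_graph = interactome.subgraph(significant)

# Each connected component of the induced subgraph is one candidate subnetwork.
subnetworks = sorted(sorted(c) for c in nx.connected_components(sig_graph))
print(subnetworks)
```

Note that a significant protein whose only neighbours are non-significant ends up as a single-node subnetwork here. Whether to keep such nodes, or to include non-significant connectors, is a design choice the sketch does not settle.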

Related

How to create a NetworkX graph from a Geography Markup Language file?

I wish to create a road network graph in NetworkX/OSMnx from the Ordnance Survey's (OS) Open Roads dataset, which I have downloaded as a Geography Markup Language (GML) file. After an embarrassingly long time, and thanks to this answer, I realised that this GML file format is not the same as the Graph Modelling Language that NetworkX/OSMnx accept and have a built-in function for.
These file formats are completely new to me, so I wanted to ask whether there is any way to load the OS Open Roads data, which is in GML format, into NetworkX/OSMnx so I can perform some network analysis on it. Ideally, this would be done in Python.
Alternatively, I have managed to use OSMnx directly to create a road network from OpenStreetMap data, but I wanted to see whether the OS Open Roads data is a bit more complete.

OSMnx is designed to work with OpenStreetMap data. If you can massage your input data into an OSM-like format, it may be possible to load them with the graph_from_gdfs function. You will need one layer of nodes and one layer of edges. Then the steps would look something like:
Use ogr2ogr to convert your GML node and edge files to GeoPackage layers
Load your node and edge GeoPackage layers with GeoPandas as GeoDataFrames
Ensure these GeoDataFrames have the required index and columns
Use OSMnx's graph_from_gdfs function to convert the GeoDataFrames to a NetworkX MultiDiGraph
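The "required index and columns" step can be sketched as follows. This assumes, from my reading of the OSMnx documentation, that the nodes need a unique index named osmid plus x and y columns, and that the edges need a (u, v, key) MultiIndex. Plain pandas DataFrames stand in for the GeoDataFrames here, and the source column names (node_id, start, end) are hypothetical.

```python
import pandas as pd

# Hypothetical node and edge tables standing in for the GeoDataFrames
# loaded from the GeoPackage layers (geometry columns omitted here).
nodes = pd.DataFrame(
    {"node_id": [1, 2, 3], "x": [-1.50, -1.49, -1.48], "y": [53.80, 53.81, 53.82]}
)
edges = pd.DataFrame(
    {"start": [1, 2, 2], "end": [2, 3, 3], "length": [120.0, 95.5, 140.2]}
)

# Nodes: a unique index named "osmid", with x and y kept as columns.
nodes = nodes.rename(columns={"node_id": "osmid"}).set_index("osmid")

# Edges: a (u, v, key) MultiIndex, where key numbers parallel edges
# between the same pair of nodes as 0, 1, 2, ...
edges = edges.rename(columns={"start": "u", "end": "v"})
edges["key"] = edges.groupby(["u", "v"]).cumcount()
edges = edges.set_index(["u", "v", "key"])

print(nodes.index.name, list(edges.index))
```

With real GeoDataFrames prepared this way, the last step would be to pass them to graph_from_gdfs as described above.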

Machine learning - generate new data from current dataset

I have created a dataset from some sensor measurements and labels, and ran some classification on it with good results. However, since the amount of data in my dataset is relatively small (1400 examples), I want to generate more data based on it. Each row of my dataset consists of 32 numeric values and a label.
What would be the best approach to generate more data from the existing dataset? So far I have looked at Generative Adversarial Networks and autoencoders, but I don't think these methods are suitable in my case.
Until now I have worked in Scikit-learn but I could use other libraries as well.

The keyword here is data augmentation. You take your available data and modify it slightly to generate additional data that differs a little from the source data.
Please take a look at this link. The author uses data augmentation to rotate and flip a cat image, generating 6 additional images with different perspectives from a single source image.
If you transfer this idea to your sensor data, you can add some kind of random noise to your data to enlarge the dataset. You can find a simple example of data augmentation for time series data here.
Another approach is to window the data and move the window by a small step, so that the data in each window is slightly different.
The people at the statistics Stack Exchange have written something about this. Please check this for additional information.
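Both ideas (adding noise and sliding a window) can be sketched with NumPy. The function names, the number of copies, and the noise scale of 5% of each feature's standard deviation are my own assumptions, not values from the linked articles.

```python
import numpy as np

def jitter(X, y, copies=2, noise_scale=0.05, seed=0):
    """Return X and y extended with noisy copies of every row.

    Noise is Gaussian, scaled per feature to noise_scale times that
    feature's standard deviation. Labels are copied unchanged.
    """
    rng = np.random.default_rng(seed)
    sigma = noise_scale * X.std(axis=0, keepdims=True)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        X_parts.append(X + rng.normal(size=X.shape) * sigma)
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)

def sliding_windows(signal, width, step):
    """Cut a 1-D signal into overlapping windows of the given width."""
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

# Example with the shape from the question: rows of 32 values plus a label.
X = np.random.default_rng(1).normal(size=(1400, 32))
y = np.arange(1400) % 2
X_aug, y_aug = jitter(X, y)
print(X_aug.shape, y_aug.shape)
```

Augment only the training split, after separating the test data, so that noisy copies of test rows do not leak into training.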

Incomplete feed dictionary for graph consisting of multiple separate parts?

I read somewhere around here that running multiple TensorFlow graphs in a single process is considered bad practice. Therefore, I now have a single graph which consists of multiple separate "sub-graphs" of the same structure. Their purpose is to generate specific models that describe production tolerances of multiple sensors of the same type. The tolerances are different for each sensor.
I'm trying to use TF to optimize a loss function in order to come up with a numerical description (i.e. a tensor) of that production tolerance for each sensor separately.
In order to achieve that and avoid having to deal with multiple graphs (i.e. avoid bad practice), I built a graph that contains a distinct sub-graph for each sensor.
The problem is that I only get data from a single sensor at a time. So, I cannot build a feed_dict that has all placeholders for all sub-graphs filled with numbers (all zeros wouldn't make sense).
TF now complains about missing values for certain placeholders, namely those of the other sensors that I don't have yet. So basically I would like to calculate a sub-graph without feeding the other sub-graphs.
Is that at all possible and, if yes, what will I have to do in order to hand an incomplete feed_dict to the graph?
If it's not possible to train only parts of a graph, even if they have no connection to other parts, what's the royal road to create models with the same structure but different weights that can get trained separately but don't use multiple graphs?

Analyse audio files with Python

I currently have a photodiode connected to my PC and capture its signal with Audacity.
I want to improve this by using an old RPi 1 as a dedicated test station. As a result, the shutter speed should appear on the console. I would prefer a Python solution for acquiring the signal and analysing it.
Can anyone give me some suggestions? I played around with oct2py, but I don't really understand how to calculate the time between the two peaks of the signal.

I have no expertise in sound analysis with Python; this is what I found through some internet research, since the topic interests me.
pyAudioAnalysis, for the purpose its name suggests
You can use pyAudioAnalysis, developed by Theodoros Giannakopoulos.
For your purpose, the function mtFileClassification() from audioSegmentation.py can be a good start. This function:
splits an audio signal into successive mid-term segments and extracts mid-term feature statistics from each of these segments, using mtFeatureExtraction() from audioFeatureExtraction.py
classifies each segment using a pre-trained supervised model
merges successive fixed-size segments that share the same class label into larger segments
visualizes statistics regarding the results of the segmentation-classification process.
For instance
from pyAudioAnalysis import audioSegmentation as aS
# Arguments: input WAV file, trained model path, model type, plot flag, ground-truth .segments file
flagsInd, classesAll, acc, CM = aS.mtFileClassification("data/scottish.wav", "data/svmSM", "svm", True, "data/scottish.segments")
Note that the last argument of this function is a .segments file. This is used as ground truth (if available) in order to estimate the overall performance of the classification-segmentation method. If this file does not exist, the performance measure is not calculated. These files are simple comma-separated files with one segment per line in the format: segment start (seconds), segment end (seconds), segment label. For example:
0.01,9.90,speech
9.90,10.70,silence
10.70,23.50,speech
23.50,184.30,music
184.30,185.10,silence
185.10,200.75,speech
...
If I have understood your question correctly, this is at least the kind of output you want to generate, isn't it? Here, though, I think you have to provide it yourself.
Most of this information is quoted directly from his wiki, which I suggest you read. Don't hesitate to reach out, as I am really interested in this topic.
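For the measurement the question asks about (the time between two peaks), a plain NumPy sketch may already be enough, without any classification. This is not part of pyAudioAnalysis. It assumes the recording is already loaded as a 1-D array with a known sample rate, and that the shutter opening and closing show up as the two strongest spikes. The function name and the minimum gap are hypothetical.

```python
import numpy as np

def time_between_peaks(samples, sample_rate, min_gap_s=0.0005):
    """Return the time in seconds between the two strongest spikes.

    The signal is rectified so positive and negative spikes both count.
    Samples within min_gap_s of the first peak are ignored when looking
    for the second one, so one wide spike is not counted twice.
    """
    envelope = np.abs(np.asarray(samples, dtype=float))
    first = int(np.argmax(envelope))
    gap = max(1, int(min_gap_s * sample_rate))
    masked = envelope.copy()
    masked[max(0, first - gap):first + gap + 1] = 0.0
    second = int(np.argmax(masked))
    return abs(second - first) / sample_rate

# Synthetic test signal: two spikes 480 samples apart at 48 kHz.
rate = 48000
signal = np.zeros(rate)
signal[1000] = 1.0
signal[1480] = -0.9
shutter_time = time_between_peaks(signal, rate)
print(f"{shutter_time * 1000:.2f} ms")
```

On a real recording, noise and ringing after each spike would probably require a larger minimum gap or some smoothing first.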
Other available libraries for audio analysis :

How to convert an activity diagram to a petrinet, synthesize it and then analyse it?

I am doing a project on Petri nets.
I have generated an activity diagram (in the .xmi format) using the UML tool Umbrello. I need to convert it to a Petri net and then synthesize it using the tool Petrify. But in order to convert it to the Petri net, the activity diagram has to be converted into the XML format.
In order to synthesize using petrify, the Petri net has to be converted into .g format, and only afterwards to the .xml format. In short I need to integrate the tools Umbrello, UML2owfn, Petrify and PIPE. How could I integrate these tools using Python?

Conveniently, activity diagrams more or less have the semantics of Petri nets anyway. Here's the deal: you will need to read and parse the activity diagram XML first. There are several good options for this in Python; unless your activity diagrams are massive, you should probably choose one that keeps the whole XML element tree in memory.
Then convert the activity diagram into a bipartite graph. Since an activity diagram can have adjacent activity nodes (bubbles) without transitions (lines), collapse all the adjacent activity nodes into one place in the Petri net.
There are several graph libraries in Python as well, but this is fairly simple, and it may be easier to just represent the graph as lists of places and transitions, plus a list of pairs for the edges.
Once you've got the Petri net graph, just walk it to generate the Petrify input and you should be set. If you really need those intermediate representations, it should be a SMOP (a simple matter of programming) to generate them as well.
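A rough sketch of the parse-and-convert steps, using only the standard library. The XML below is a hypothetical, simplified stand-in: real Umbrello XMI uses different tags and namespaces. The mapping (each activity becomes a place, each flow becomes a transition) is one possible reading of the description above, not the only one. Collapsing adjacent activities and writing Petrify's input format are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. Real Umbrello XMI looks different.
SAMPLE_XML = """
<diagram>
  <activity id="a1" name="Receive order"/>
  <activity id="a2" name="Pack order"/>
  <activity id="a3" name="Ship order"/>
  <flow source="a1" target="a2"/>
  <flow source="a2" target="a3"/>
</diagram>
"""

def activity_xml_to_net(xml_text):
    """Return (places, transitions, arcs) for the simplified XML above."""
    root = ET.fromstring(xml_text)  # the whole element tree is held in memory
    places = [node.get("id") for node in root.iter("activity")]
    transitions, arcs = [], []
    for i, flow in enumerate(root.iter("flow")):
        t = f"t{i}"
        transitions.append(t)
        arcs.append((flow.get("source"), t))  # place -> transition
        arcs.append((t, flow.get("target")))  # transition -> place
    return places, transitions, arcs

def is_bipartite_net(places, transitions, arcs):
    """Check that every arc links a place and a transition, never two of a kind."""
    p, t = set(places), set(transitions)
    return all((a in p and b in t) or (a in t and b in p) for a, b in arcs)

places, transitions, arcs = activity_xml_to_net(SAMPLE_XML)
print(places, transitions, arcs, is_bipartite_net(places, transitions, arcs))
```

Keeping the net as two lists and a list of pairs makes the final walk a simple loop over arcs, which is the reason the answer suggests skipping a graph library here.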
