Create a connected graph of common DBpedia entities


My problem is as follows: say I have 4 entities: Renoir, Newton, Leibniz and Pissaro. I need to create a connected graph of all entities common to them from the DBpedia ontology.
Example: this is a connected graph between Renoir and Pissaro from DBpedia. The nodes in between are the DBpedia schema elements common to both. See image: http://postimg.org/image/6037y9lu1/
We need such a graph connecting all four: Renoir, Newton, Leibniz and Pissaro.
http://postimg.org/image/vud0o1lu1/
How should this be done?
I’m a novice with DBpedia, R, and anything related. Any help is useful.
My objective in doing this is to find transitive connections between entities at a conceptual level.

Have you tried RelFinder (http://www.visualdataweb.org/relfinder/relfinder.php)? It serves precisely this purpose. I attach the graph I obtained when I entered the four entities from your example:
As you can see, if you want to find a connection between them at a conceptual level, you should aim for the "influencedBy"/"influences" relationship.
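RelFinder does this interactively. To do something similar from Python, one rough approach (an assumption in the spirit of RelFinder, not its actual algorithm) is to collect triples around the entities, for example from DBpedia's SPARQL endpoint, treat them as an undirected graph, and keep every node on a shortest path between each pair of entities. The helper names and the triples below are hypothetical placeholders, not real DBpedia data.

```python
from collections import deque
from itertools import combinations


def build_adjacency(triples):
    """Undirected adjacency sets from (subject, predicate, object) triples.

    Edges are followed in both directions, since a link such as
    influencedBy/influences connects two entities either way."""
    adj = {}
    for s, _p, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def connecting_subgraph(triples, entities):
    """Nodes on one shortest path between each pair of entities, plus the triples among them."""
    adj = build_adjacency(triples)
    nodes = set()
    for a, b in combinations(entities, 2):
        path = shortest_path(adj, a, b)
        if path:
            nodes.update(path)
    edges = {t for t in triples if t[0] in nodes and t[2] in nodes}
    return nodes, edges


# Hypothetical placeholder triples (not fetched from DBpedia).
triples = [
    ("A", "influencedBy", "X"),
    ("B", "influencedBy", "X"),
    ("Y", "influencedBy", "X"),
    ("C", "influencedBy", "Y"),
    ("D", "type", "Z"),
]
nodes, edges = connecting_subgraph(triples, ["A", "B", "C"])
print(sorted(nodes))  # ['A', 'B', 'C', 'X', 'Y']
```

Unrelated triples (here `D type Z`) drop out, which is what keeps the resulting graph small enough to read.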

Related

How can i create a graph/tree programmatically to generate test data

I need to generate data for performance testing of an application whose data has a lot of relations between entities. Here is an example:
DivA
DivA[Payroll,HR,IT]
Payroll[Location,Classification,files]
HR[Location,Training,Compliance]
IT[Clearance,Experience,Compliance]
Location[City,Country]
Classification[ExemptionType,Expiry date]
....
From the above "schema",
I need to generate data using the following algorithm:
Create the parent entity (Ex: Consumer Electronics Division)
Populate all children (Ex: Consumer Electronics Division [Payroll,HR,IT])
Check if the children have more children (Ex: Consumer Electronics Division [Payroll[Location,Classification,files],HR [Location,Training,Compliance],IT[Clearance,Experience,Compliance]])
....
Keep going until you don't find any more children.
Is there any algorithm/Data structure that helps to create data like this easily?
Thank you!

Diagram
You can find many graph algorithms, but if you want to do conceptual research on the subject, you are free to choose and develop the terminology and algorithms.
As an answer, or at least an idea, I would like to point you to the graph above.
The direction of the arrows, whether they turn backwards, and many other details will determine the graph algorithm.
If you can sketch the problem in a drawing (it does not have to be literal) showing the goal you want to achieve, it may be possible to improve the answer.
PS. Lack of reputation prevents me from posting the image; I can only link the graph.
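The algorithm in the question is a depth-first recursive expansion of a tree-shaped schema. A minimal sketch, assuming the schema is stored as a dictionary from entity type to child types (the `SCHEMA` dictionary, the `generate` helper and the id scheme are illustrative, not from the question); real test data would also attach attribute values to each node.

```python
import itertools

# Schema from the question: entity type -> child types (the "...." part is omitted).
SCHEMA = {
    "DivA": ["Payroll", "HR", "IT"],
    "Payroll": ["Location", "Classification", "files"],
    "HR": ["Location", "Training", "Compliance"],
    "IT": ["Clearance", "Experience", "Compliance"],
    "Location": ["City", "Country"],
    "Classification": ["ExemptionType", "Expiry date"],
}


def generate(entity_type, schema, counter=None, ancestors=()):
    """Create an entity, then recursively populate its children until a type
    has no children in the schema (a leaf). Every node gets a unique id."""
    if counter is None:
        counter = itertools.count(1)
    if entity_type in ancestors:  # guard against a schema that refers back to itself
        raise ValueError(f"schema cycle at {entity_type!r}")
    node = {"type": entity_type, "id": next(counter), "children": []}
    for child_type in schema.get(entity_type, []):
        child = generate(child_type, schema, counter, ancestors + (entity_type,))
        node["children"].append(child)
    return node


def count_nodes(node):
    """Total number of generated entities in a tree."""
    return 1 + sum(count_nodes(child) for child in node["children"])


tree = generate("DivA", SCHEMA)
print(count_nodes(tree))  # 19
```

A type that appears under several parents (here `Location`) is generated once per occurrence, which is usually what you want for load-test data; calling `generate` in a loop gives as many divisions as needed.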

Querying the shared nodes in a RDF graph

I have a graph of RDF data that is the result of a SPARQL query in rdflib, but this question applies just as well to any endpoint. The graph looks like the picture below.
I want to find a way to query the nodes that are shared between two clusters. Those are basically the nodes that are:
Subject to two objects
Object to two subjects
Object to a subject, and then subject to another object
I tried Graph.subjects() and Graph.objects() in rdflib, but they seem to return only iterators, so I would have to iterate over the whole graph three times, once for each of the above scenarios, and that would result in a lot of double counting.
I was wondering if anyone has an idea on how to do this in a better way, perhaps within SPARQL to begin with.
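Building the subject-to-objects and object-to-subjects maps in one pass avoids iterating the graph three times, and reporting each node once (with the list of cases it matches) avoids double counting. A sketch, assuming the three cases are judged on distinct neighbours; the function name, labels and toy triples are illustrative. On an endpoint, the first two cases could also be expressed in SPARQL 1.1 with GROUP BY and HAVING (COUNT(DISTINCT ...) >= 2).

```python
from collections import defaultdict


def shared_nodes(triples):
    """Classify nodes shared between clusters in a single pass.

    `triples` is any iterable of (subject, predicate, object); iterating an
    rdflib Graph yields such triples, so a Graph can be passed directly."""
    objects_of = defaultdict(set)   # subject -> distinct objects
    subjects_of = defaultdict(set)  # object  -> distinct subjects
    for s, _p, o in triples:
        objects_of[s].add(o)
        subjects_of[o].add(s)

    shared = {}
    for node in set(objects_of) | set(subjects_of):
        reasons = []
        if len(objects_of.get(node, ())) >= 2:
            reasons.append("subject of two objects")
        if len(subjects_of.get(node, ())) >= 2:
            reasons.append("object of two subjects")
        if node in objects_of and node in subjects_of:
            reasons.append("object, then subject")
        if reasons:
            shared[node] = reasons
    return shared


triples = [("a", "p", "x"), ("a", "p", "y"), ("b", "p", "y"), ("y", "q", "z")]
print(shared_nodes(triples))
```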

Find probability of tags coming together from given data

I just need an algorithm to solve the following problem in an efficient manner.
I have tuples with combinations of tags that usually come together. For example:
(python, django, flask, numpy),
(java, spring),
(mysql, sql, join),
(javascript, angularjs, ajax, deferred)
Now I have two requirements.
I need to form different categories from the given data.
Given a new tag or tuple of tags, I need to find the probability of it coming together with all the other distinct tags in the data.
For example:
Say the new tuple is (nodejs, ajax);
then the probabilities might be:
(nodejs, ajax) - (javascript, angularjs, ajax, deferred) - .60
(nodejs, ajax) - (mysql, sql, join) - .20
(nodejs, ajax) - (java, spring) - .20
etc.
How should I go about solving this?

I would suggest treating this as a graph problem: tags are nodes, and the number of occurrences of, say, (tag1, tag2) is the weight of the edge between the tag1 and tag2 nodes. You could then generate recommended tags using a nearest-neighbour algorithm, or even community detection (finding which tags are always mentioned together).
With a well-constructed graph, enough initial data and some normalisation, I think it would be possible to output, say, the probability of a link between cluster1 = (tag1, tag2) and cluster2 = (tag3, tag4, tag5).
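The graph described above can be built with a counter over tag pairs; ranking a tag's neighbours by edge weight is the simplest form of the nearest-neighbour idea. A minimal sketch: `cooccurrence`, `neighbours` and the fifth data row are hypothetical additions for illustration (the extra row only exists so that one edge has weight 2).

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tuples):
    """Edge weights: how many tuples contain each unordered pair of tags."""
    weights = Counter()
    for tags in tuples:
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return weights


def neighbours(tag, weights):
    """Tags adjacent to `tag`, strongest edge first (ties broken by name)."""
    scores = Counter()
    for (a, b), w in weights.items():
        if a == tag:
            scores[b] += w
        elif b == tag:
            scores[a] += w
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


data = [
    ("python", "django", "flask", "numpy"),
    ("java", "spring"),
    ("mysql", "sql", "join"),
    ("javascript", "angularjs", "ajax", "deferred"),
    ("javascript", "ajax", "jquery"),  # hypothetical extra row, not from the question
]
w = cooccurrence(data)
print(neighbours("ajax", w))
# [('javascript', 2), ('angularjs', 1), ('deferred', 1), ('jquery', 1)]
```

A tag never seen in the data (such as `nodejs`) has no neighbours, so some smoothing or normalisation would be needed before turning these weights into probabilities.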

So, the approach that solved this problem was basically the Apriori algorithm. It provides association rules for a transactional database (treating every row as a transaction).
Below is a link to a very simple tutorial with an implementation.
http://aimotion.blogspot.com/2013/01/machine-learning-and-data-mining.html
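For reference, here is a compact, unoptimised Apriori sketch written independently of the linked tutorial (function names and thresholds are illustrative). It finds frequent itemsets level by level (join, prune, count support) and then derives rules with confidence = support(lhs ∪ rhs) / support(lhs).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # Support = fraction of transactions containing the whole itemset.
        scored = {c: sum(1 for t in transactions if c <= t) / n for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    current = frequent_among({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        current = frequent_among(candidates)
    return frequent


def association_rules(frequent, min_confidence):
    """Rules (lhs, rhs, confidence) derived from the frequent itemsets."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


tags = [
    ("python", "django", "flask", "numpy"),
    ("java", "spring"),
    ("mysql", "sql", "join"),
    ("javascript", "angularjs", "ajax", "deferred"),
]
freq = apriori(tags, min_support=0.25)
rules = association_rules(freq, min_confidence=0.8)
print(freq[frozenset({"java", "spring"})])  # 0.25
```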

Basic questions about nested blockmodel in graph-tool

Very briefly, two or three basic questions about the minimize_nested_blockmodel_dl function in the graph-tool library. Is there a way to figure out which vertex falls into which block? In other words, to extract a list for each block containing the labels of its vertices.
The hierarchical visualization is rather difficult to understand for amateurs in network theory; e.g. are the squares with directed edges that are drawn meant to indicate the main direction of the underlying edges between the two blocks under consideration? The blocks are nicely shown using different colors, but on a very conceptual level, which types of patterns or edge/vertex properties are behind the block categorization of vertices? In other words, when two vertices are in the same block, what can I say about their common properties?

Regarding your first question, it is fairly straightforward: The minimize_nested_blockmodel_dl() function returns a NestedBlockState object:
from graph_tool.all import *
g = collection.data["football"]
state = minimize_nested_blockmodel_dl(g)
You can query the group membership of the nodes by inspecting the first level of the hierarchy:
lstate = state.levels[0]
This is a BlockState object, from which we get the group memberships via the get_blocks() method:
b = lstate.get_blocks()
print(b[30]) # prints the group membership of node 30
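To get one list of vertices per block, the membership map can simply be inverted. A small sketch; `members_by_block` and `compose` are hypothetical helpers, not graph-tool functions. With graph-tool, `b.a` (the array view of the property map returned by `get_blocks()`) can be passed as `blocks`. `compose` assumes that the blocks of one level are the vertices of the next level, which is how the nested model is built; NestedBlockState also appears to offer `project_level()` for this, so check the graph-tool documentation for your version.

```python
from collections import defaultdict


def members_by_block(blocks, labels=None):
    """Invert a vertex -> block assignment into block -> list of vertices.

    blocks[v] is the block of vertex v; labels[v], if given, replaces the
    vertex index in the output (e.g. vertex names)."""
    groups = defaultdict(list)
    for v, block in enumerate(blocks):
        groups[int(block)].append(v if labels is None else labels[v])
    return dict(groups)


def compose(lower, upper):
    """Map each vertex to its block one level higher, assuming the blocks
    at one level are the vertices of the level above."""
    return [upper[block] for block in lower]


print(members_by_block([0, 1, 0, 2]))  # {0: [0, 2], 1: [1], 2: [3]}
```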
Regarding your second question, the stochastic block model assumes that nodes that belong to the same group have the same probability of connecting to the rest of the network. Hence, nodes that get classified in the same group by the function above have similar connectivity patterns. For example, if we look at the fit for the football network:
state.draw(output="football.png")
We see that nodes that belong to the same group tend to have more connections to other nodes of the same group --- a typical example of community structure. However, this is just one of the many possibilities that can be uncovered by the stochastic block model. Other topological patterns include core-periphery organization, bipartiteness, etc.

Graphical Visualization of XML data

I have an XML file that looks like this:
<rebase>
<Organism>
<Name>Aminomonas paucivorans</Name>
<Enzyme>M1.Apa12260I</Enzyme>
<Motif>GGAGNNNNNGGC</Motif>
<Enzyme>M2.Apa12260I</Enzyme>
<Motif>GGAGNNNNNGGC</Motif>
</Organism>
<Organism>
<Name>Bacillus cellulosilyticus</Name>
<Enzyme>M1.BceNI</Enzyme>
<Motif>CCCNNNNNCTC</Motif>
<Enzyme>M2.BceNI</Enzyme>
<Motif>CCCNNNNNCTC</Motif>
</Organism>
</rebase>
I want to visualize this XML data in a graphical format. The connectivity is such that many enzymes can share common motifs, but no organisms can have the same enzymes. I looked at d3.js, but I don't think it has what I'm looking for. I was really excited by the visualization Neo4j seems to provide, but I would need to learn it from scratch. However, I haven't come across any good tutorials for importing or creating a graph in Neo4j from XML datasets. I know that in programming anything is possible, so I wanted to know the possible ways I could import my data (preferably using Python) into a Neo4j database to visualize it.
UPDATE
I tried following this answer (the second answer under this question) and created the 2 CSV files it suggested. However, the query has a lot of syntax errors, such as:
Invalid input 'S': expected 'n/N' (line 6, column 2)
"USING PERIODIC COMMIT"
WITH is required between CREATE and LOAD CSV (line 6, column 1)
"MATCH (o:Organism { name: csvLine.name}),(m:Motif { name: csvLine.motif})"
My Cypher query skills are extremely limited and I couldn't get any imports to work, so fixing the query by myself is proving really difficult. Any help will be greatly appreciated.

There is also a series of posts on how to import XML into Neo4j:
http://supercompiler.wordpress.com/2014/07/22/navigating-xml-graph-using-cypher/
http://supercompiler.wordpress.com/2014/04/06/visualizing-an-xml-as-a-graph-neo4j-101/
First, you should model what your data should look like as a graph: which entities you need for your use cases, and which semantic connections between them.
In general if you can load the data in python, you can use py2neo or neo4jrestclient (see https://neo4j.com/developer/python/) to import your data into your model.

For this I would suggest using Gephi directly. At least a year ago it worked flawlessly; it supports importing XML/CSV data directly, and there is no need to use Neo4j as a pre-processor.
Edit:
Oh, I see now; I thought the connections were already included. In this case, you must create a separate node for each piece of data in the XML: a new node for each enzyme and motif, and also for each organism (with a name property). The enzyme and motif nodes must be unique, i.e. no duplicates. When creating an organism node, connect the organism to its enzyme and motif nodes with relationships. Once this is done, querying/visualizing similar organisms is no problem, since related organisms share at least one enzyme or motif node.
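The model just described (unique nodes, organisms linked to their enzymes and motifs) can be sketched in plain Python before touching Neo4j, which makes it easy to check the counts. A sketch using the standard library's ElementTree; `build_model` and `organisms_sharing_motifs` are hypothetical helpers, and sets play the role that uniqueness (MERGE) plays in Neo4j.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

REBASE_XML = """<rebase>
<Organism>
<Name>Aminomonas paucivorans</Name>
<Enzyme>M1.Apa12260I</Enzyme>
<Motif>GGAGNNNNNGGC</Motif>
<Enzyme>M2.Apa12260I</Enzyme>
<Motif>GGAGNNNNNGGC</Motif>
</Organism>
<Organism>
<Name>Bacillus cellulosilyticus</Name>
<Enzyme>M1.BceNI</Enzyme>
<Motif>CCCNNNNNCTC</Motif>
<Enzyme>M2.BceNI</Enzyme>
<Motif>CCCNNNNNCTC</Motif>
</Organism>
</rebase>"""


def build_model(xml_text):
    """Unique Organism/Enzyme/Motif nodes plus has_enzyme/has_motif relationships.

    Everything is a set, so a repeated motif collapses into one node and one
    relationship per organism."""
    root = ET.fromstring(xml_text)
    model = {key: set() for key in ("Organism", "Enzyme", "Motif", "has_enzyme", "has_motif")}
    for org in root.iter("Organism"):
        name = org.findtext("Name")
        model["Organism"].add(name)
        for enzyme in org.findall("Enzyme"):
            model["Enzyme"].add(enzyme.text)
            model["has_enzyme"].add((name, enzyme.text))
        for motif in org.findall("Motif"):
            model["Motif"].add(motif.text)
            model["has_motif"].add((name, motif.text))
    return model


def organisms_sharing_motifs(model):
    """Motif -> organisms, keeping only motifs linked to two or more organisms."""
    by_motif = defaultdict(set)
    for organism, motif in model["has_motif"]:
        by_motif[motif].add(organism)
    return {m: sorted(orgs) for m, orgs in by_motif.items() if len(orgs) > 1}


model = build_model(REBASE_XML)
print(len(model["Organism"]), len(model["Enzyme"]), len(model["Motif"]))  # 2 4 2
```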
I don't know any smart way to import XML data into Neo4j, but you should have no problem converting it into two CSV files. The format of those CSV files would then be:
First file:
name,enzyme
Aminomonas paucivorans,M1.Apa12260I
Aminomonas paucivorans,M2.Apa12260I
Bacillus cellulosilyticus,M1.BceNI
Bacillus cellulosilyticus,M2.BceNI
Second file (I don't understand why the motif is duplicated, though):
name,motif
Aminomonas paucivorans,GGAGNNNNNGGC
Aminomonas paucivorans,GGAGNNNNNGGC
Bacillus cellulosilyticus,CCCNNNNNCTC
Bacillus cellulosilyticus,CCCNNNNNCTC
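Since the question prefers Python, the conversion can be done with the standard library's `xml.etree.ElementTree` and `csv` modules. A minimal sketch; `xml_to_csv` is a hypothetical helper that writes exactly the two layouts shown above (including the duplicated motif rows).

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, enzyme_out, motif_out):
    """Write the two CSV layouts shown above (name,enzyme and name,motif).

    enzyme_out / motif_out are open text files or io.StringIO objects."""
    root = ET.fromstring(xml_text)
    enzyme_writer = csv.writer(enzyme_out)
    motif_writer = csv.writer(motif_out)
    enzyme_writer.writerow(["name", "enzyme"])
    motif_writer.writerow(["name", "motif"])
    for org in root.iter("Organism"):
        name = org.findtext("Name")
        for enzyme in org.findall("Enzyme"):
            enzyme_writer.writerow([name, enzyme.text])
        for motif in org.findall("Motif"):
            motif_writer.writerow([name, motif.text])


xml_text = """<rebase><Organism><Name>Aminomonas paucivorans</Name>
<Enzyme>M1.Apa12260I</Enzyme><Motif>GGAGNNNNNGGC</Motif>
</Organism></rebase>"""
enzymes, motifs = io.StringIO(), io.StringIO()
xml_to_csv(xml_text, enzymes, motifs)
print(enzymes.getvalue().splitlines())
# ['name,enzyme', 'Aminomonas paucivorans,M1.Apa12260I']
```

For real files, open them with `open("file1.csv", "w", newline="")` and `open("file2.csv", "w", newline="")`, as the `csv` module recommends, and pass those handles instead of the `StringIO` objects.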
Now we do the import, which creates unique nodes and relationships (so the duplicated motif rows above turn into just one relationship; if necessary, it is also possible to have multiple relationships to the same motif node).
Run the two statements as separate queries: USING PERIODIC COMMIT must be the first clause of a query, so pasting both into one query produces errors such as "WITH is required between CREATE and LOAD CSV". MERGE only creates a node or relationship if it does not exist yet, so the Organism, Enzyme and Motif nodes do not have to be created beforehand (a plain MATCH finds nothing in an empty database). With a file:/// URL, the CSV files must by default be in Neo4j's import directory. In newer Neo4j versions, USING PERIODIC COMMIT has been replaced by CALL { ... } IN TRANSACTIONS.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file1.csv" AS csvLine
MERGE (o:Organism { name: csvLine.name })
MERGE (e:Enzyme { name: csvLine.enzyme })
MERGE (o)-[:has_enzyme]->(e);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///file2.csv" AS csvLine
MERGE (o:Organism { name: csvLine.name })
MERGE (m:Motif { name: csvLine.motif })
MERGE (o)-[:has_motif]->(m);
This creates a graph with 2 organism nodes, 4 enzyme nodes and 2 motif nodes. Each organism node then has a relationship to its enzymes and motifs. Once this is done, you can move on to the visualization part described at the beginning.

Develop Reference
