I am stuck on an InvalidSemanticsException error; the following is my code:
import json
from py2neo import neo4j, Node, Relationship, Graph

graph = Graph()
graph.schema.create_uniqueness_constraint("Authors", "auth_name")
graph.schema.create_uniqueness_constraint("Mainstream_News", "id")

with open("example.json") as f:
    for line in f:
        while True:
            try:
                file = json.loads(line)
                break
            except ValueError:
                # Not yet a complete JSON value
                line += next(f)
        # Now creating the node and relationships
        news = graph.merge_one("Mainstream_News", {"id": unicode(file["_id"]["$oid"]), "entry_url": unicode(file["entry_url"]), "title": unicode(file["title"])})
        authors = graph.merge_one("Authors", {"auth_name": unicode(file["auth_name"]), "auth_url": unicode(file["auth_url"]), "auth_eml": unicode(file["auth_eml"])})
        graph.create_unique(Relationship(news, "hasAuthor", authors))
I am trying to connect the news node with the authors node. My JSON file looks like this:
{
    "_id": {
        "$oid": "54933912bf4620870115a2e3"
    },
    "auth_eml": "",
    "auth_url": "",
    "cat": [],
    "auth_name": "Max Bond",
    "out_link": [],
    "entry_url": [
        "http://www.usda.gov/wps/portal/usda/!ut/p/c5/04_SB8K8xLLM9MSSzPy8xBz9CP0os_gAC9-wMJ8QY0MDpxBDA09nXw9DFxcXQ-cAA_1wkA5kFaGuQBXeASbmnu4uBgbe5hB5AxzA0UDfzyM_N1W_IDs7zdFRUREAZXAypA!!/dl3/d3/L2dJQSEvUUt3QS9ZQnZ3LzZfUDhNVlZMVDMxMEJUMTBJQ01IMURERDFDUDA!/?navtype=SU&navid=AGRICULTURE"
    ],
    "out_link_norm": [],
    "title": "United States Department of Agriculture - Agriculture",
    "entry_url_norm": [
        "usda.gov/wps/portal/usda/!ut/p/c5/04_SB8K8xLLM9MSSzPy8xBz9CP0os_gAC9-wMJ8QY0MDpxBDA09nXw9DFxcXQ-cAA_1wkA5kFaGuQBXeASbmnu4uBgbe5hB5AxzA0UDfzyM_N1W_IDs7zdFRUREAZXAypA!!/dl3/d3/L2dJQSEvUUt3QS9ZQnZ3LzZfUDhNVlZMVDMxMEJUMTBJQ01IMURERDFDUDA!/"
    ],
    "ts": 1290945374000,
    "source_url": "",
    "content": "\n<a\nhref=\"/wps/portal/usda/!ut/p/c4/04_SB8K8xLLM9MSSzPy8xBz9CP0os_gAC9-wMJ8QY0MDpxBDA09nXw9DFxcXQ-cAA_2CbEdFAEUOjoE!/?navid=AVIAN_INFLUENZA\">\n<b>Avian Influenza, Bird Flu</b></a> <br />\nThe official U.S. government web site for information on pandemic flu and avian influenza\n\n<strong>Pest Management</strong> <br />\nPest management policy, pesticide screening tool, evaluate pesticide risk, conservation\nbuffers, training modules.\n\n<strong>Weather and Climate</strong> <br />\nU.S. agricultural weather highlights, weekly weather and crop bulletin, major world crop areas\nand climatic profiles.\n"
}
The full exception traceback looks like this:
File "/home/mohan/workspace/test.py", line 20, in <module>
news = graph.merge_one("Mainstream_News", {"id": unicode(file["_id"]["$oid"]), "entry_url": unicode(file["entry_url"]),"title":unicode(file["title"])})
File "/usr/local/lib/python2.7/dist-packages/py2neo/core.py", line 958, in merge_one
for node in self.merge(label, property_key, property_value, limit=1):
File "/usr/local/lib/python2.7/dist-packages/py2neo/core.py", line 946, in merge
response = self.cypher.post(statement, parameters)
File "/usr/local/lib/python2.7/dist-packages/py2neo/cypher/core.py", line 86, in post
return self.resource.post(payload)
File "/usr/local/lib/python2.7/dist-packages/py2neo/core.py", line 331, in post
raise_from(self.error_class(message, **content), error)
File "/usr/local/lib/python2.7/dist-packages/py2neo/util.py", line 235, in raise_from
raise exception
py2neo.error.InvalidSemanticsException: Cannot merge node using null property value for {'title': u'United States Department of Agriculture - Agriculture', 'id': u'54933912bf4620870115a2e3', 'entry_url': u"[u'http://www.usda.gov/wps/portal/usda/!ut/p/c5/04_SB8K8xLLM9MSSzPy8xBz9CP0os_gAC9-wMJ8QY0MDpxBDA09nXw9DFxcXQ-cAA_1wkA5kFaGuQBXeASbmnu4uBgbe5hB5AxzA0UDfzyM_N1W_IDs7zdFRUREAZXAypA!!/dl3/d3/L2dJQSEvUUt3QS9ZQnZ3LzZfUDhNVlZMVDMxMEJUMTBJQ01IMURERDFDUDA!/?navtype=SU&navid=AGRICULTURE']"}
Any suggestions to fix this?
Yeah, I see what's going on here. If you look at the py2neo API and look for the merge_one function, it's defined this way:
merge_one(label, property_key=None, property_value=None)
Match or create a node by label and optional property and
return a single matching node. This method is intended to be
used with a unique constraint and does not fail if more than
one matching node is found.
The way that you're calling it is with a string first (label) and then a dictionary:
news = graph.merge_one("Mainstream_News", {"id": unicode(file["_id"]["$oid"]), "entry_url": unicode(file["entry_url"]),"title":unicode(file["title"])})
Your error message says that py2neo is treating the entire dictionary like a property name, and you haven't provided a property value.
So you're calling this function incorrectly. What you should probably be doing is merge_one only on the basis of the id property, then later adding the extra properties you need to the node that comes back.
You need to convert those merge_one calls into something like this:
news = graph.merge_one("Mainstream_News", "id", unicode(file["_id"]["$oid"]))
Note this doesn't give you the extra properties, those you'd add later.
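Putting that together, the loop body could look something like this (a sketch against the py2neo 2.x API shown in your traceback; note that entry_url is a list in your JSON, so unicode() will store its string repr rather than the URLs themselves):
news = graph.merge_one("Mainstream_News", "id", unicode(file["_id"]["$oid"]))
news["entry_url"] = unicode(file["entry_url"])
news["title"] = unicode(file["title"])
news.push()  # write the extra properties back to the server

authors = graph.merge_one("Authors", "auth_name", unicode(file["auth_name"]))
authors["auth_url"] = unicode(file["auth_url"])
authors["auth_eml"] = unicode(file["auth_eml"])
authors.push()

graph.create_unique(Relationship(news, "hasAuthor", authors))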
I have a Python file that reads data from the WordPress API and passes the values to fields of another API. When the values are simple, I have no problem, but I don't know how to handle this case:
When I read one field from the API, the state values come back as a code instead of the text value. For example, when the text value in WordPress is Barcelona, it returns B, and I need the returned value to be Barcelona.
An example of code with simple field values:
oClienteT["Direcciones"] = []
oClienteT["Telefono"] = oClienteW["billing"]["phone"]
oClienteT["NombreFiscal"] = oClienteW["first_name"] " " oClienteW["last_name"]
oClienteT["Direcciones"].append( {
"Codigo" : oClienteW["id"],
"Nombre" : oClienteW["billing"]["first_name"],
"Apellidos" : oClienteW["billing"]["last_name"],
"Direccion" : oClienteW["billing"]["address_1"],
"Direccion2" : oClienteW["billing"]["address_2"],
"Poblacion" : oClienteW["billing"]["state"],
"Provincia" : oClienteW["billing"]["city"]
})
When the billing city is Madrid and the billing state is Madrid, WordPress returns Madrid and M.
I need it so that when the state code means Madrid, it returns Madrid, and so on.
Make sure to convert to a JSON object before accessing fields (data = json.loads(json_str))
response = { "billing": { "address_1": "C/GUSTAVO ADOLFO BECQUER, 4", "city": "SEVILLA", "state": "SE"}}
print(response["billing"].get("address_1", None))
I've just solved it:
def initProvincias(self):
    self.aProvincias['C'] = 'A Coruña'
    self.aProvincias['VI'] = 'Álava'

def getProvincia(self, sCod):
    if sCod not in self.aProvincias:
        _logger.info("PROVINCIA NO ENCONTRADA " + str(sCod))
        return ""
    return self.aProvincias[sCod]
"Provincia" : self.getProvincia( oClienteW["shipping"]["state"] ),
I need to populate a Neo4j database using a JSON file that contains data about some processes, among them the name of each process, its parents, and its children (if any). Here is part of the JSON as an example:
[
    {
        "process": "IPTV_Subscriptions",
        "parents": ["IPTV_Navigation", "DeviceCertifications-insertion"],
        "childs": ["villa_iptv", "villa_ott", "villa_calicux"]
    },
    {
        "process": "IPTV_Navigation",
        "parents": [],
        "childs": ["IPTV_Subscriptions"]
    },
    {
        "process": "DeviceCertifications-getter",
        "parents": [],
        "childs": ["DeviceCertifications-insertion"]
    },
    {
        "process": "DeviceCertifications-insertion",
        "parents": ["DeviceCertifications-getter"],
        "childs": ["IPTV_Subscriptions"]
    }
]
With the following Python code, I found that I can create a node for each process contained in the JSON in bulk:
import json
from py2neo import Graph
from py2neo.bulk import create_nodes, create_relationships

graph = Graph("bolt://localhost:7687", auth=("yyyy", "xxxx"))

# Opening the JSON file
with open('/app/conf/data.json') as f:
    processes = json.load(f)

data = []
for i in processes:
    data.append([i["process"]])

keys = ["process"]
create_nodes(graph.auto(), data, labels={"process"}, keys=keys)
And checking in neo4j, I see that the nodes are already created.
But now I need to make the relationships. For each process, from the json I know which are the parents and children of that node.
I wanted to take the documentation as an example:
from py2neo import Graph
from py2neo.bulk import create_relationships

g = Graph()
data = [
    (("Alice", "Smith"), {"since": 1999}, "ACME"),
    (("Bob", "Jones"), {"since": 2002}, "Bob Corp"),
    (("Carol", "Singer"), {"since": 1981}, "The Daily Planet"),
]
create_relationships(g.auto(), data, "WORKS_FOR", start_node_key=("Person", "name", "family name"), end_node_key=("Company", "name"))
But it didn't work for me.
Having the parent and child information for each process from the JSON, does anyone have an idea how I can generate the relationships in bulk? Given the JSON example above, the relationship types would be ParentOf and ChildOf, but I have no idea how to generate them from Python.
Below is a script to create the relationships in bulk using py2neo; let me know whether it works for you. One more thing: please label your nodes Process rather than process (note the uppercase P). I use the relationship type :CHILD_OF; if you want :PARENT_OF instead, swap the first and third items of each tuple in data, as in the sketch after the script.
import json
from py2neo import Graph
from py2neo.bulk import create_relationships

graph = Graph("neo4j://localhost:7687", auth=("neo4j", "neo4jay"))

# Opening the JSON file
with open('data2.json') as f:
    processes = json.load(f)

data = []
for i in processes:
    for p in i["parents"]:
        data.append((i["process"], {}, p))

create_relationships(graph.auto(), data, "CHILD_OF", start_node_key=("Process", "process"), end_node_key=("Process", "process"))
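For the :PARENT_OF direction, the same approach with the tuples swapped would look like this (a sketch reusing the processes list loaded above):
# parent first, child second, so each tuple reads (parent)-[:PARENT_OF]->(child)
data = []
for i in processes:
    for p in i["parents"]:
        data.append((p, {}, i["process"]))

create_relationships(graph.auto(), data, "PARENT_OF", start_node_key=("Process", "process"), end_node_key=("Process", "process"))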
Hello everyone, and happy new year! I'm posting for the first time here since I can't find an answer to my problem. I've been searching for two days now and can't figure out what to do, and it's driving me crazy... Here's my problem:
I'm using the Steamspy API to download data for the 100 most-played games on Steam in the last two weeks. I'm able to dump the downloaded data into a .json file ("dumped_data.json" in this case) and to import it back into my Python code (I can print it). But while I can print the whole thing, that isn't really usable, as it's 1900 lines of more-or-less useful data, so I want to be able to print only specific items of the dumped data. In this project, I would like to print only the name, developer, owners, and price of the first 10 games on the list. It's my first Python project, and I can't figure out how to do this...
Here's the Python code:
import steamspypi
import json

# gets data using the steamspy api
data_request = dict()
data_request["request"] = "top100in2weeks"
steam_data = steamspypi.download(data_request)

# dumps the data into a json file
with open("dumped_data.json", "w") as jsonfile:
    json.dump(steam_data, jsonfile, indent=4)

# print values from the dumped file
f = open("dumped_data.json")
data = json.load(f)
print(data)
And here's an example of the first item in the JSON file:
{
    "570": {
        "appid": 570,
        "name": "Dota 2",
        "developer": "Valve",
        "publisher": "Valve",
        "score_rank": "",
        "positive": 1378093,
        "negative": 267290,
        "userscore": 0,
        "owners": "100,000,000 .. 200,000,000",
        "average_forever": 35763,
        "average_2weeks": 1831,
        "median_forever": 1079,
        "median_2weeks": 1020,
        "price": "0",
        "initialprice": "0",
        "discount": "0",
        "ccu": 553462
    },
Thanks in advance to everyone that's willing to help me, it would mean a lot.
The following prints the values you desire from the first 10 games:
for game in list(data.values())[:10]:
    print(game["name"], game["developer"], game["owners"], game["price"])
I have data from a Facebook group feed (24,000-odd records in total). E.g.
{
    "data": [
        {
            "message": "MoneyWise its time to vote for the 2017 winners https://www.moneywise.co.uk/home-finances-survey?",
            "updated_time": "2017-07-27T21:15:52+0000",
            "permalink_url": "https://www.facebook.com/groups/uwpartnersforum/permalink/1745120025791166/",
            "from": {
                "name": "John Oliver",
                "id": "10152744793754666"
            },
            "id": "1452979881671850_1745120025791166"
        },
        {
            "message": "We often think of communicating as figuring out a really good message and leaving it that. But the annoying fact is that unless we pay close attention to how that message is landing on the other person, not much communication will take place - Alan Alda",
            "updated_time": "2017-07-27T21:15:26+0000",
            "permalink_url": "https://www.facebook.com/groups/uwpartnersforum/permalink/1744867295816439/",
            "from": {
                "name": "Adrian Watts",
                "id": "10152461880942242"
            },
            "id": "1452979881671850_1744867295816439"
        }
    ]
}
and I am trying to extract, on the command prompt and into a file, "message", "permalink_url", "updated_time", "name" and "id" (the one inside "from") for posts by a particular person, say "John Oliver". The following Python script works... mostly:
import json

fhand = open('try1.json')
urlData = fhand.read()
jsonData = json.loads(urlData)

fout = open('output1.txt', 'w')
for i in jsonData["data"]:
    if i["from"]["name"] == "John Oliver":
        print(i["message"], end="|")
        print(i["permalink_url"], end="|")
        print(i["updated_time"], end="|")
        print(i["from"]["name"], end="|")
        print(i["from"]["id"], end="\n")
        print()
        fout.write(str(i["message"]) + "|")
        fout.write(str(i["permalink_url"]) + "|")
        fout.write(str(i["updated_time"]) + "|")
        fout.write(str(i["from"]["name"]) + "|")
        fout.write(str(i["from"]["id"]) + "\n")
fout.close()
But I am facing two issues.
Issue 1. If a record has no message, I get a traceback:
Traceback (most recent call last):
  File "facebook_feed.py", line 36, in <module>
    main()
  File "facebook_feed.py", line 25, in main
    print (i["message"], end = "|")
KeyError: 'message'
So, I need some help going through the complete file and extracting all the other details from a record even if it has no message.
Issue 2, and this is a strange one... I have two files, "try1.json" with 500-odd records and "trial1.json" with 24,000-odd records, with exactly the same structure. When I open "try1.json" in the Atom text editor, the smaller file is colour-highlighted, but the bigger "trial1.json" is not. Running the above script on "try1.json", I get the KeyError for "message" (as shown above), but for "trial1.json" I get this:
Traceback (most recent call last):
  File "facebook_feed.py", line 36, in <module>
    main()
  File "facebook_feed.py", line 20, in main
    if i["from"]["name"] == "John Oliver":
KeyError: 'from'
"trial1.json" is a 17 MB file... is that an issue?
If you're not sure if i["message"] exists, don't just blindly access it. Either use dict.get, e.g. i.get('message', 'No message found'), or check if it's there first:
if "message" in i:
print (i["message"], end = "|")
You can do the same kind of thing with i["from"].
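Putting both together, a minimal sketch of the loop (same field names as your sample data; frm is just a local alias for the nested dict):
for i in jsonData["data"]:
    frm = i.get("from", {})  # empty dict if "from" is missing
    if frm.get("name") == "John Oliver":
        print(i.get("message", ""), end="|")
        print(i.get("permalink_url", ""), end="|")
        print(i.get("updated_time", ""), end="|")
        print(frm.get("name", ""), end="|")
        print(frm.get("id", ""))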
Atom isn't highlighting the big file because it's big. But if you can successfully json.loads something, it's valid JSON.
I have a Python-script gathering some metrics and saving them to RethinkDB. I have also written a small Flask-application to display the data on a dashboard.
Now I need to run a query to find all rows in a table newer than 1 hour. This is what I got so far:
tzinfo = pytz.timezone('Europe/Oslo')
start_time = tzinfo.localize(datetime.now() - timedelta(hours=1))

r.table('metrics').filter(lambda m:
    m.during(start_time, r.now())
).run(connection)
When I try to visit the page I get this error message:
ReqlRuntimeError: Not a TIME pseudotype: `{
    "listeners": "6469",
    "time": {
        "$reql_type$": "TIME",
        "epoch_time": 1447581600,
        "timezone": "+01:00"
    }
}` in:
r.table('metrics').filter(lambda var_1:
    var_1.during(r.iso8601('2015-11-18T12:06:20.252415+01:00'), r.now()))
I googled a bit and found this thread, which seems to be a similar problem: https://github.com/rethinkdb/rethinkdb/issues/4827, so I revisited how I add new rows to the database as well, to see if that was the issue:
def _fix_tz(timestamp):
    tzinfo = pytz.timezone('Europe/Oslo')
    dt = datetime.strptime(timestamp[:-10], '%Y-%m-%dT%H:%M:%S')
    return tzinfo.localize(dt)

...

for row in res:
    # ... remove some data, manipulate some other data ...
    r.db('metrics',
        {'time': _fix_tz(row['_time']),
        ...
    ).run(connection)
The '_time' value retrieved by my data-collection script contains some garbage, which I remove before creating a datetime object. As far as I can understand from the RethinkDB documentation, I should be able to insert these directly, and if I use the Data Explorer in RethinkDB's admin panel my rows look like this:
{
    ...
    "time": Sun Oct 25 2015 00:00:00 GMT+02:00
}
Update:
I did another test and created a small script to insert data and then retrieve it:
import rethinkdb as r

conn = r.connect(host='localhost', port=28015, db='test')

r.table('timetests').insert({
    'time': r.now(),
    'message': 'foo!'
}).run(conn)

r.table('timetests').insert({
    'time': r.now(),
    'message': 'bar!'
}).run(conn)

cursor = r.table('timetests').filter(
    lambda t: t.during(r.now() - 3600, r.now())
).run(conn)
I still get the same error message:
$ python timestamps.py
Traceback (most recent call last):
File "timestamps.py", line 21, in <module>
).run(conn)
File "/Users/tsg/.virtualenv/p4-datacollector/lib/python2.7/site-packages/rethinkdb/ast.py", line 118, in run
return c._start(self, **global_optargs)
File "/Users/tsg/.virtualenv/p4-datacollector/lib/python2.7/site-packages/rethinkdb/net.py", line 595, in _start
return self._instance.run_query(q, global_optargs.get('noreply', False))
File "/Users/tsg/.virtualenv/p4-datacollector/lib/python2.7/site-packages/rethinkdb/net.py", line 457, in run_query
raise res.make_error(query)
rethinkdb.errors.ReqlQueryLogicError: Not a TIME pseudotype: `{
"id": "5440a912-c80a-42dd-9d27-7ecd6f7187ad",
"message": "bar!",
"time": {
"$reql_type$": "TIME",
"epoch_time": 1447929586.899,
"timezone": "+00:00"
}
}` in:
r.table('timetests').filter(lambda var_1: var_1.during((r.now() - r.expr(3600)), r.now()))
I finally figured it out. The error is in the lambda expression: you need to call .during() on a specific field. Otherwise the query tries to wrestle the whole row/document into a timestamp.
This code works:
cursor = r.table('timetests').filter(
    lambda t: t['time'].during(r.now() - 3600, r.now())
).run(conn)
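An equivalent formulation (under the same assumption that the timestamp lives in the 'time' field) uses r.row instead of a lambda:
cursor = r.table('timetests').filter(
    r.row['time'].during(r.now() - 3600, r.now())
).run(conn)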