Error: "Cannot find reference to loads in json.py" - python

I am new to Python and am learning how Python works with JSON.
After writing this code in PyCharm, I am getting unresolved reference warnings in several places:
"Import resolves to its containing file".
"Cannot find reference to dumps in json.py".
"Cannot find reference to loads in json.py".
The warnings appear where json is imported and where the loads() and dumps() methods are called.
This is the video I am learning Python from:
https://www.youtube.com/watch?v=9N6a-VLBa2I&list=PL-osiE80TeTt2d9bfVyTiXJA-UTHn6WwU&index=44
Please help me resolve this.
import json

# Decoding a JSON string to Python.
# This is a Python string that happens to be valid JSON as well.
people_string = '''
{
    "people": [
        {
            "name": "Sumedha",
            "phone": "0987654312",
            "City": "Middletown"
        },
        {
            "name": "Ankit",
            "phone": "9999999999",
            "City": "Middletown2"
        },
        {
            "name": "Hemlata",
            "phone": "9865656475",
            "City": "Chandigarh"
        }
    ]
}
'''

# loads() parses the JSON string into Python objects.
data = json.loads(people_string)

for person in data['people']:
    print(person['name'])
    del person['phone']

# dumps() serializes the Python object back to a JSON string.
new_string = json.dumps(data, indent=2, sort_keys=True)

You named your test script json.py, so it is shadowing the built-in json module: import json resolves to your own file instead of the standard library (that is what "Import resolves to its containing file" is warning you about), and your file has no loads or dumps of its own. Name the script something else (e.g. jsontest.py) and it will work.
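If you want to confirm which module is actually being imported, printing the module's file path makes the shadowing obvious. A minimal sketch (run it from the renamed script):

import json

# If this prints a path inside your project instead of the standard
# library, your own file is shadowing the built-in json module.
print(json.__file__)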

For me, the error seemed to appear when both the json and pandas packages were installed in the same Python environment.
A cheap workaround was to keep separate projects for pandas and JSON work, so that the packages are installed in different virtual environments.

Related

NEO4J APOC LOAD JSON FROM EXTERNAL VARIABLE

I'm trying to load a json document into Neo4j but, if possible, I don't want to use a file because, in my case, it's a waste of time.
WHAT I'M DOING NOW:
Python query to Elasticsearch Database
Push data into a .json file
From Neo4j Python Library, run apoc.load.json('file:///file.json')
WHAT I WANT TO DO:
Python query to Elasticsearch Database
From Neo4j Python Library, run apoc.load.json()
Is there any syntax that could help me with that? Thank you
If you already have APOC installed, you can use the APOC Elasticsearch procedures directly, without having to use apoc.load.json.
Here is an example from the documentation:
CALL apoc.es.query("localhost","bank","_doc",null,{
query: { match_all: {} },
sort: [
{ account_number: "asc" }
]
})
YIELD value
UNWIND value.hits.hits AS hit
RETURN hit;
Link to docs: https://neo4j.com/labs/apoc/4.1/overview/apoc.es/
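On the Python side, a minimal sketch of running that Cypher through the official neo4j driver could look like this (the connection details and the bank/_doc index names are placeholders):

from neo4j import GraphDatabase

# Placeholder connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = '''
CALL apoc.es.query("localhost", "bank", "_doc", null, {query: {match_all: {}}})
YIELD value
UNWIND value.hits.hits AS hit
RETURN hit
'''

with driver.session() as session:
    for record in session.run(query):
        print(record["hit"])

driver.close()

This skips the intermediate .json file entirely: Elasticsearch is queried from within Neo4j, so nothing has to be written to disk.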

python trouble de-serializing avro in memory

Currently, I am using requests to grab an Avro file from a database and storing the data in the response's .text attribute. The file is split into a schema part and a data part. How do I combine the schema and data in memory into readable/usable data?
The .text attribute gives me the data as Unicode, with the schema first and the data second. I have been able to use string manipulation to grab the schema part and set it as a schema variable, but I am unsure how to handle the data section. I tried encoding the data to UTF-8 and passing it as raw_bytes in my code, with no luck.
#the request text is too large, so I am shortening it down
r.text = u'Obj\x01\x04\x14avro.codec\x08null\x16avro.schema\u02c6\xfa\x05{"namespace": "namespace", "type": "record", "fields" : [{"type": ["float", "null"], "default": " ", "name": "pvib_z_crest_factor"}],
#repeat for x amount of fields
"name": "Telemetry"}\x00\u201d \xe0B\x1a\u2030=\xc0\u01782\n.\u015e\x049\xaa\x12\xf6\u2030\x02\x00\u0131\u201a];\x02\x02\x02\x00\xed\r>;\x02\x02\x00\x01\x02\x00\x00\x02\x00\x00\x00\x00\x00\x02\x02\x00\x00\x00\x1aC\x00\x00\x00\x02C\x02\x00:\x00#2019-02-27 16:38:39.530263-05:00\x02\x02\x00\xaeGa=\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf8\x04\x02\x00\x00\x00\x00\x00\x00\x00\x02\x02\x02\x02\x00\xac\xc5\'7\x00\x00\x00\xe9B\x02\x00\x00\x00\x00\x00\x00\x0e-r#\x00\x00\x00\x00\x00\x02\x02\x00\xfa\xc0\xf5A\x00\x00\x00\xc0#\x00\x00\x00\x00\x02\x00\x02\xc9\xebB\x00\x00\x00\x00\x00\x00\xaa\ufffd\'\x02\x00\x02\xc9\xebB\x02\x02\x00\x00\x00\x00\x00\x02\x00\ufffd\xc2u=\x02\x00\xfc\x18\xd3>\x02\x02\x00\\\ufffdB>\x02\x02\x001\x08,=\x02\x00\x00\x02\x02\x00\x000oE\x00sh!A\x02\x00\x00\xc0uE\x02\x00\xf6(tA\x00\x00\x00\x00\x00\x00-\xb2\ufffd=\x02\x00\x1c \xd1B\x02\x02\x00#2019-02-27 16:38:39.529977-05:00\x02\x00\x080894\x00\u011f\xa7\xc6=\x00\x00\x02\x02\x02\x02\x02\x02\x00\x00\x00\xe0A\x02\x00\x00\x00\u011eA\x00\x00\x00\xb8A\x00\xc3\xf5\xc0#\x00\xd5x\xe9=\x02\x00\x00\x00q=VA\x02\x00\x00\x000B\x02\x00ZV\xfaE\x02\x02\x02\x02\x00\x00\x00!C\x02\x00\x00\x00#C\x00\x00\x00)C\x00\x00\x02\x00\x00\x00\u20ac?\x00\x00\x02\x02\x02\x02\x02\x00\xf8\x04\x02\x00\x00\x00\x00\x00\x02\x00\x00\x00\u20ac?\x00\x02W\x00ff6A\x00\x00\x00\x00\x00\x02\x00\xcc&\x10L\x00\x00\xf7\x7fG\x02\x02\x02\x00\x00\x00\x00\x00\x02\x02\x02\x00\x00\u20ac\xacC\x02\x02\x02\x00\x1c~%A\x00\x1c \xd1B\x00\x01\x02\x02\x02\x00\xfa\xc0\xf5A\x02\x02\x02\x02\x02\x00\x00\x000B\x00\x00\x00\x00\x00\x00\x00\x00?C\x00\xf4-\x1fE\x00\x00\x00\x00\x00\x00\x00\u0131\x7fG\x00\x00\u015f\x7fG\x00\x00\u0131\x7fG\x00\x00\x00\x0bC\x00#2019-05-31 13:00:25.931949+00:00\x00#2019-05-31 09:00:25.931967-04:00\x00\x00\x00\xe0A\x00h\xe8\u0178:\x00=\n%C\x00\x00\x00\x07C\x02\x00\x00\x00\xe0#\x00\x01\x02\x00\x00\x02\x02\x00\x00\u011e\u2020F\x02\x00\x00\u20acDE\x00\xcd\xcc\xcc=\x00#2019-02-27 16:38:39.529620-05:00\x02\x00\x00\x00\xc8B\x00\x00\x00\x06C\x02\x00\x01\x004\u20ac7:\x00\x00\x000B\x02\x02\x02\x02\x02\x02\x0033CA\x02\x00L7\t>\x02\x02\x00\xae\xc7\xa7B\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x02\x02\x00\x00\x00pB\x00\x00\x00`B\x00\x00\x02\x00\x00\x00...
#continues on, too big to put the rest of (feel free to ask questions to see more)
I expect the in-memory file to be de-serialized into readable data, but I keep getting errors such as "list index out of range" or "cannot access branch index x".
Thank you for reading
EDIT(6/5/19):
I managed to download the avro file using azure storage explorer on another device. From here, I ran the following code:
import avro.schema
from avro.io import DatumReader, DatumWriter
from avro.datafile import DataFileReader, DataFileWriter

# Read every record out of the downloaded Avro file.
avro_file = DataFileReader(open("Destination/to/file.avro", "rb"), DatumReader())
data = [record for record in avro_file]

for i in range(len(data)):
    print(len(data))
    print(data[i])
(Note: the computer I ran this code on uses Python 3.7, but there are no real syntax differences between the two Python versions.)
This code runs smoothly and shows the data in the appropriate places.
However, I cannot simply pass the same data I am receiving from the request as an argument to DataFileReader (stating the obvious, but I am guessing it has something to do with the file being opened with "rb" while request.text is Unicode). Is there any way to modify request.text so I can pass it as an argument to DataFileReader (replacing open(file, "rb"))?
You want content, not text
I also think you'll want to try BytesIO, which should be able to be used like a file object
import io
import requests
from avro.datafile import DataFileReader
from avro.io import DatumReader

r = requests.get("http://example.com/file.avro")

# Wrap the raw bytes in a file-like object that DataFileReader can consume.
inmemoryfile = io.BytesIO(r.content)
reader = DataFileReader(inmemoryfile, DatumReader())
records = list(reader)
reader.close()
(code untested)

How to convert from TSV file to JSON file?

I know this question might be a duplicate, but I just want to understand how you can convert a TSV file to JSON. I've searched everywhere and can't find an explanation I understand.
This is the TSV file that I want to convert to JSON:
title content difficulty
week01 python syntax very easy
week02 python data manipulation easy
week03 python files and requests intermediate
week04 python class and advanced concepts hard
And this is the JSON file that I want as an output.
[
  {
    "title": "week 01",
    "content": "python syntax",
    "difficulty": "very easy"
  },
  {
    "title": "week 02",
    "content": "python data manipulation",
    "difficulty": "easy"
  },
  {
    "title": "week 03",
    "content": "python files and requests",
    "difficulty": "intermediate"
  },
  {
    "title": "week 04",
    "content": "python class and advanced concepts",
    "difficulty": "hard"
  }
]
The built-in modules you need for this are csv and json.
To read tab-separated data with the CSV module, use the delimiter="\t" parameter:
Even more conveniently, the CSV module has a DictReader that automatically reads the first row as column keys, and returns the remaining rows as dictionaries:
import csv
import json

with open('file.txt') as file:
    reader = csv.DictReader(file, delimiter="\t")
    data = list(reader)

json_string = json.dumps(data)
The JSON module can also write directly to a file instead of a string.
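For example, a minimal sketch using json.dump (the file names are placeholders):

import csv
import json

with open('file.txt') as tsv_file:
    rows = list(csv.DictReader(tsv_file, delimiter="\t"))

# json.dump writes straight to a file object instead of returning a string.
with open('output.json', 'w') as json_file:
    json.dump(rows, json_file, indent=4)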
If you are using pandas, you can use the to_json method with orient="records" to obtain the list of entries you want.
my_data_frame.to_json(orient="records")
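A slightly fuller sketch, assuming the TSV is in file.txt and the result should go to output.json:

import pandas as pd

# sep="\t" tells pandas the file is tab-separated.
df = pd.read_csv('file.txt', sep='\t')

# orient="records" produces a list of {column: value} objects.
df.to_json('output.json', orient='records')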

JSON file isn't finished writing to by the time I load it, behave BDD

My program writes to a JSON file and then loads, reads, and POSTs it. The writing part is done by behave (BDD).
# writing to the JSON file is done by behave
data = json.load(open('results.json', 'r'))
r = requests.post(MyAPIEndpoint, json=data)
I'm running into an issue because the writing is not completed before I begin loading. (The file is missing the closing ] after the final {...}.)
HOOK-ERROR in after_all: ValueError: Expecting object: line 2 column 2501 (char 2502)
Is there a way of getting around this, either by changing something in my call to behave's __main__ or by changing how or when I load the JSON file?
I think the problem here has two parts. First, you can wait for the file to finish being written and only read it once it is no longer open; you can do that from your own code, for example as described here:
check if a file is open in Python
Second, to the computer the data is just data: you can see where the error is because you inspected the file yourself, but the program cannot know how many errors there are, where they are, whether the structure is right, or whether all the data is present, unless you write code that checks for that.
If your program produces multiple results, I think the better way is to use temp files: you can freely create one, write to it, check when it is ready and then use it, without worrying about other similar processes touching it.
Another option is to validate the JSON before loading it (Python: validate and format JSON files) and only load it once it is valid, as sketched below.
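A minimal sketch of that validate-before-loading idea (the retry count, delay, and file name are arbitrary):

import json
import time

def load_when_valid(path, retries=10, delay=0.5):
    # Keep retrying until the file parses as complete, valid JSON.
    for _ in range(retries):
        try:
            with open(path) as f:
                return json.load(f)
        except (ValueError, OSError):
            time.sleep(delay)
    raise ValueError("File never became valid JSON: %s" % path)

data = load_when_valid('results.json')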
Hope this helps.
One way to address this problem is to change your file format from being JSON at the top level to newline-delimited JSON (NDJSON), also called line-delimited JSON (LDJSON) or JSON lines (JSONL).
https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON
For example, this JSON file:
{
"widgets": [
{"name": "widget1", "color": "red"},
{"name": "widget2", "color": "green"},
{"name": "widget3", "color": "blue"}
]
}
Would become this NDJSON file:
{"name": "widget1", "color": "red"}
{"name": "widget2", "color": "green"}
{"name": "widget3", "color": "blue"}
It's especially useful in the context of streaming data, which kind of sounds like the use case you have where you might have one process writing to a file continuously while another is reading it.
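On the writing side, a minimal sketch of appending records as NDJSON (the file name and record contents are placeholders):

import json

def append_record(path, record):
    # Each record is a single self-contained JSON line, so a partially
    # written file can still be read line by line.
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')

append_record('widgets.json', {'name': 'widget1', 'color': 'red'})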
You could then read the NDJSON file like so:
import json
from pprint import pprint
with open('widgets.json') as f:
    all_lines = [json.loads(l) for l in f.readlines()]

all_data = {'widgets': all_lines}
pprint(all_data)
Output:
{'widgets': [{'color': 'red', 'name': 'widget1'},
             {'color': 'green', 'name': 'widget2'},
             {'color': 'blue', 'name': 'widget3'}]}

Reading pretty print json files in Apache Spark

I have a lot of JSON files in my S3 bucket and I want to be able to read and query them. The problem is that they are pretty-printed: each JSON file contains one massive dictionary, but it is not on a single line. As per this thread, each dictionary in a JSON file should be on one line, which is a limitation of Apache Spark; my files are not structured that way.
My JSON schema looks like this -
{
    "dataset": [
        {
            "key1": [
                {
                    "range": "range1",
                    "value": 0.0
                },
                {
                    "range": "range2",
                    "value": 0.23
                }
            ]
        }, {..}, {..}
    ],
    "last_refreshed_time": "2016/09/08 15:05:31"
}
Here are my questions -
Can I avoid converting these files to match the schema required by Apache Spark (one dictionary per line in a file) and still be able to read it?
If not, what's the best way to do it in Python? I have a bunch of these files for each day in the bucket. The bucket is partitioned by day.
Is there any other tool better suited to query these files other than Apache Spark? I'm on AWS stack so can try out any other suggested tool with Zeppelin notebook.
You could use sc.wholeTextFiles(). Here is a related post.
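A rough sketch of that approach (assuming an existing SparkContext named sc; the bucket path is a placeholder):

import json

# wholeTextFiles returns (path, file_content) pairs, so each pretty-printed
# file can be parsed as one whole JSON document.
raw = sc.wholeTextFiles("s3://my-bucket/2016-09-08/*.json")
docs = raw.map(lambda path_content: json.loads(path_content[1]))

# Flatten to one element per entry of "dataset".
records = docs.flatMap(lambda doc: doc["dataset"])
print(records.take(2))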
Alternatively, you could reformat your json using a simple function and load the generated file.
import json

def reformat_json(input_path, output_path):
    # Read the pretty-printed JSON, then write one entry per line.
    with open(input_path, 'r') as handle:
        jarr = json.load(handle)
    with open(output_path, 'w') as f:
        for entry in jarr:
            f.write(json.dumps(entry) + "\n")
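You could then point Spark's line-delimited JSON reader at the reformatted output, for example (assuming a SparkSession named spark; the file names are placeholders):

reformat_json("pretty.json", "one_per_line.json")
df = spark.read.json("one_per_line.json")
df.show()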
