ijson parse from list - Python

I have a list in which each item contains JSON data, so I am trying to parse the data using ijson, since the data load will be huge.
This is what I am trying to achieve:
article_data = ...  # variable which contains the list
parser = ijson.parse(article_data)
for id in ijson.items(parser, 'item'):
    if id['article_type'] != "Monthly Briefing" and id['article_type'] != "Conference":
        data_article_id.append(id['article_id'])
        data_article_short_desc.append(id['short_desc'])
        data_article_long_desc.append(id['long_desc'])
This is the error I get:
AttributeError: 'generator' object has no attribute 'read'
I thought of converting the list into a string and then parsing it with ijson, but that fails with the same error.
Any suggestions, please?
data_article_id = []
data_article_short_desc = []
data_article_long_desc = []
for index in article_data:
    parser = ijson.parse(index)
    for id in ijson.items(parser, 'item'):
        if id['article_type'] != "Monthly Briefing" and id['article_type'] != "Conference":
            data_article_id.append(id['article_id'])
            data_article_short_desc.append(id['short_desc'])
            data_article_long_desc.append(id['long_desc'])
Since the data is in a list, I tried this as well, but it gives me the same error:
'generator' object has no attribute 'read'

I am assuming that you have a list of byte-string JSON objects that you want to parse.
ijson.items(JSON, prefix) takes a readable bytes object as input, that is, an opened file or file-like object. Specifically, the input should be a bytes file-like object.
If you are using Python 3, you can use the io module's io.BytesIO to create an in-memory binary stream.
Example
Suppose the input is [b'{"id": "ab"}', b'{"id": "cd"}']:
import io
import ijson

list_json = [b'{"id": "ab"}', b'{"id": "cd"}']
for raw in list_json:
    # wrap each byte string in an in-memory binary stream for ijson
    items = ijson.items(io.BytesIO(raw), "")
    for i in items:
        print(i['id'])
Output:
ab
cd
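Applied back to the question's article_data, a hedged sketch, assuming each list item is one JSON byte string shaped like the articles described in the question:
import io
import ijson

data_article_id = []
data_article_short_desc = []
data_article_long_desc = []

for raw in article_data:  # assumption: list of JSON byte strings
    for article in ijson.items(io.BytesIO(raw), ""):
        if article['article_type'] not in ("Monthly Briefing", "Conference"):
            data_article_id.append(article['article_id'])
            data_article_short_desc.append(article['short_desc'])
            data_article_long_desc.append(article['long_desc'])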

Related

How do I parse a faulty JSON file in Python using the json module?

I have a big JSON file to parse with Python, however it's incomplete (i.e., missing the closing brackets at the end). The file consists of one big JSON object which contains JSON objects inside. Every JSON object inside the outer object is complete; only the finishing brackets are missing.
For example, its structure is like this:
{bigger_json_head:value, another_key:[{small_complete_json1},{small_complete_json2}, ...,{small_complete_json_n},
So the final "]}" is missing. However, each small JSON object forms a single row, so when I print each line of the file, I get each JSON object as a single string.
So I've tried to use:
line_arr = []
with open("file.json", "r", encoding="UTF-8") as f:
    for line in f.readlines():
        line_arr.append(line)
I expected to have a list with each line of the JSON file as an element,
and then I tried this on the result:
for json_line in line_arr:
    try:
        json_str = json.loads(json_line)
        print(json_str)
    except json.decoder.JSONDecodeError:
        continue
I expected this code block to print every JSON string to the console except the first and last. However, it printed nothing and only hit the decode error.
Has anyone solved a similar problem? Please help. Thank you.
If the faulty JSON file is only missing the final "]}", then you can actually fix it before parsing it.
Here is an example to illustrate:
import json

with open("file.json", "r", encoding="UTF-8") as f:
    faulty_json_str = f.read()
fixed_json_str = faulty_json_str + ']}'
json_obj = json.loads(fixed_json_str)
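If you would rather keep the row-by-row approach from the question, here is a hedged sketch, assuming each inner object sits on its own line and that the per-line json.loads failed because the rows end with a trailing comma:
import json

objs = []
with open("file.json", "r", encoding="UTF-8") as f:
    for line in f:
        candidate = line.strip().rstrip(',')  # drop the trailing comma
        try:
            objs.append(json.loads(candidate))
        except json.JSONDecodeError:
            continue  # skip the outer head line, which is not a complete object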

Python: pickle: No code suggestion after extracting string object from pickle file

For example, this is my code:
# extract the object from "lastringa.pickle" and save it
extracted = ""
with open("lastringa.pickle", "rb") as f:
    extracted = pickle.load(f)
Where "lastringa.pickle" contains a string object with some text.
So if I type extracted. before the opening of the file, I get code suggestions, as shown in the picture.
But after the operation extracted = pickle.load(f), typing extracted. no longer gives code suggestions.
Can somebody explain why that is and how to solve it?
Pickle reads and writes objects as binary files. You can confirm this from the open('lastringa.pickle', 'rb') command, where you are using the rb option, i.e. read binary.
Your IDE doesn't know the type of the object that pickle.load is expected to return, so it cannot suggest the string methods (e.g. .split(), .strip()).
On the other hand, in the first photo, your IDE knows that extracted is a string, so it knows what to suggest.
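One workaround is to declare the type yourself, since pickle.load is only typed as returning Any. A minimal sketch; the str annotation is our assumption about what the pickle contains:
import pickle

with open("lastringa.pickle", "rb") as f:
    # assumed annotation: we assert the pickle holds a str so the
    # IDE can offer str completions again
    extracted: str = pickle.load(f)

print(extracted.split())  # str method suggestions now work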

CSV file converted to io.BytesIO object, then streamed to blob storage, gives type error: a bytes-like object is required, not '_io.TextIOWrapper'

I am trying to stream a CSV to Azure Blob Storage. The CSV is generated directly from Python scripts without a local copy. I have the following code, where df is the CSV:
with open(df, 'w') as f:
    stream = io.BytesIO(f)
    stream.seek(0)
    block_blob_service.create_blob_from_stream('flowshop', 'testdata123', stream)
Then I got the error message:
stream = io.BytesIO(f)
TypeError: a bytes-like object is required, not '_io.TextIOWrapper'
I think the problem is an incorrect format. Can you please identify the problem? Thanks.
You opened df for write, then tried to pass the resulting file object as the initializer of io.BytesIO (which is supposed to take actual binary data, e.g. b'1234'). That's the cause of the TypeError; open files (read or write, text or binary) are not bytes or anything similar (bytearray, array.array('B'), mmap.mmap, etc.), so passing them to io.BytesIO makes no sense.
It looks like your goal is to read from df, and you shouldn't need io.BytesIO at all for that. Just change the mode from (text) write, 'w', to binary read, 'rb'. Then pass the resulting file object to your API directly:
with open(df, 'rb') as f:
    block_blob_service.create_blob_from_stream('flowshop', 'testdata123', f)
Update: Apparently df was your actual data, not a file name to open at all. Given that, you should really skip the stream API (which is pointless if the data is already in memory) and just use the bytes based API directly:
block_blob_service.create_blob_from_bytes('flowshop', 'testdata123', df)
or if df is str, not bytes:
block_blob_service.create_blob_from_bytes('flowshop', 'testdata123', df.encode('utf-8'))
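If the goal is still to build the CSV fully in memory, here is a hedged sketch, assuming df is a pandas DataFrame and block_blob_service is the legacy BlockBlobService used above:
# assumption: df is a pandas DataFrame built earlier in the script
# render the CSV as text in memory, encode to bytes, upload directly
csv_bytes = df.to_csv(index=False).encode('utf-8')
block_blob_service.create_blob_from_bytes('flowshop', 'testdata123', csv_bytes)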

Convert a raw tweet string to a JSON object in Python

I'm using twitter's API to download raw tweets so I can play with them. The iterator loop they gave in the example looks something like this (I added an if condition to run the loop n times, not shown here):
iterator = twitter_stream.statuses.sample()
for tweet in iterator:
    print(json.dumps(tweet))
    break
These commands output the entire JSON object in the correct format.
To extract the "text" item from the raw tweet JSON object, I tried using the .get("text") operator on the output of json.dumps(tweet):
txts = []
for tweet in iterator:
    txts.append((json.dumps(tweet)).get("text"))
    break
print(txts)
But I get an error saying "AttributeError: 'str' object has no attribute 'get'".
So I searched around and found a solution where they wrote all the output of json.dumps(tweet) to a file, loaded it back into a variable with json.loads, and used the .get("text") operator on that to extract the text:
fl = open("ipjson.json", "a")
for tweet in iterator:
    fl.write(json.dumps(tweet))
    break
fl.flush()
decode = json.loads(fl)
for item in decode:
    txt = item.get("text")
    txts.append(txt)
print(txts)
But this gives me another error saying "TypeError: the JSON object must be str, not 'TextIOWrapper'"
What am I doing wrong? Is there a better/easier way to extract text from a raw tweet JSON object?
For the first example you don't need JSON; you can just do:
txts = []
for status in statuses:
    txts.append(status.text)
For the second example you're handling the JSON incorrectly. You should instead do:
txts = []
for status in statuses:
    txts.append(json.dumps(status))
with open('ipjson.json', 'w') as fou:
    json.dump(txts, fou)
And to read it back in:
with open('ipjson.json', 'r') as fin:
    txts = json.load(fin)
for txt in txts:
    print(json.loads(txt)['text'])
Please note that when you're writing and reading the JSON file you use dump and load, but with the individual JSON objects you use dumps and loads.
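A minimal sketch of that distinction, using a throwaway object:
import json

obj = {"text": "hello"}

s = json.dumps(obj)        # object -> JSON string
back = json.loads(s)       # JSON string -> object

with open('obj.json', 'w') as f:
    json.dump(obj, f)      # object -> open file
with open('obj.json', 'r') as f:
    back2 = json.load(f)   # open file -> object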
JSON files require recursive scanning (https://stackoverflow.com/a/42855667/3342050) or known locations within the structure. After you get your dicts, lists, and entries, you parse through them for specific values (https://stackoverflow.com/a/42860573/3342050). This is entirely dependent upon what data is returned, because keys will be unique to that structure.
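For illustration, a hedged sketch of such a recursive scan (the find_key helper is ours, not from the linked answers):
def find_key(obj, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(obj, dict):
        if key in obj:
            yield obj[key]
        for value in obj.values():
            yield from find_key(value, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)

# usage: list(find_key(parsed_tweet, 'text'))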

Parse a file containing JSON objects into a database using Python

I have a file containing a JSON object in each row, which means that the whole file is not valid JSON, but each row by itself is.
What I'm trying to do is to iterate through the file, convert each row into a JSON object, and then print its values, since only each row by itself is valid JSON.
The file looks like this:
{json object 1}
{json object 2}
{json object 3}
{json object 4}
Each JSON object looks like this:
{"event":"Session","properties":{"time":1423353612,"duration":33}}
The code I'm trying to run with no success is the following:
import simplejson as json

with open("sessions.json", "r") as f:
    for line in f:
        j = json.JSONEncoder().encode(line)
        print j['event']['time']
        print j['event']['duration']
I'm getting the following error:
TypeError: string indices must be integers, not str
Any ideas why?
Thanks!
You're calling the wrong thing. Converting from a JSON string to a Python object is decoding, not encoding. And in any case, it's better to use the top-level functions in the json module, rather than the underlying classes themselves.
for line in f:
    j = json.loads(line)
Edit
Given the structure you show, j['event'] is the string "Session", and it does not have a sub-property time. It looks like you mean j['properties']['time'].
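Putting both fixes together, a minimal sketch assuming the row format shown in the question:
import json

with open("sessions.json", "r") as f:
    for line in f:
        j = json.loads(line)                # decode each row, not encode
        print(j['properties']['time'])      # 'time' lives under 'properties'
        print(j['properties']['duration'])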
