I'm relatively new to Python and to working with JSON, and I'm having an issue.
I'm trying to add 200 of my tweets to a JSON file. Here is the code that does this:
def process_or_store(tweet):
    with open('baileighsTweets.json', 'a') as f:
        f.write(json.dumps(tweet._json, indent=4))
        f.close()
for tweet in tweepy.Cursor(api.user_timeline).items(200):
    process_or_store(tweet)
This code runs fine and adds my tweets to a JSON file, with each tweet being a JSON object. However, an error occurs with one of the objects in the JSON file:
[Image: picture with the error]
[Image: same code on a different line, no error]
It appears to be a very basic syntax issue, but I'm confused about why it happened. My code adds to the JSON file, I don't do it manually, so I don't understand why I received an 'end of file expected' error, and I don't know how to fix it.
Thanks in advance for your help/suggestions guys!
It would have been good if you had attached the JSON file to the question. I suspect there is a syntax issue such as a missing "," or "}" brace. You can use https://jsonlint.com/ to validate your JSON and find where the real issue is. Hope this helps.
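One likely cause, assuming the file is simply the output of several json.dumps calls written back to back: a JSON document may hold only one top-level value, so a validator reports 'end of file expected' at the point where the second object begins. Here is a minimal sketch of the problem and of one possible fix (collect the objects in a list and serialize them once). The sample dicts stand in for tweet._json and are made up for illustration.

```python
import json

# Hypothetical stand-ins for tweet._json; real tweets have many more fields.
tweets = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

# What appending one json.dumps() result after another produces:
concatenated = "".join(json.dumps(t, indent=4) for t in tweets)
try:
    json.loads(concatenated)
    concatenated_is_valid = True
except json.JSONDecodeError:
    # The parser finds a second top-level object where it expects end of input.
    concatenated_is_valid = False

# One possible fix: gather everything in a list and serialize it once.
as_single_document = json.dumps(tweets, indent=4)
round_trip = json.loads(as_single_document)

print(concatenated_is_valid)
print(len(round_trip))
```

In the loop from the question, this would mean appending each tweet._json to a list and calling json.dump once after the loop, with the file opened in 'w' mode rather than 'a'.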
I am trying to import a CSV file and make a dictionary out of it.
The problem is that every line is wrapped in an extra pair of double quotes, so each line is read as one string.
For example like this:
""First Name", "Last Name", "Date of Birth""
""Alex", "Turner", "08.09.1978""
""Max", "Parker", "16.02.2003""
""Mike", "Johnsen", "04.12.1999""
I tried reading it as UTF-8 because I thought the encoding might be the problem, but that didn't work.
import csv
csv_filename = 'Pläne.csv'
with open(csv_filename, encoding='utf-8') as f:
    plaene_dict = csv.DictReader(f)
I have to delete the double quotes at the beginning and end of every line to get a dictionary out of it.
Has anyone had this problem before? How can I transform the CSV so that it is valid input?
I also have no idea how to work around this and delete those " characters with Python before turning the file into a dict. Does anyone have an idea?
Thanks a lot.
maxmyh
EDIT:
Here is a link to a sample of the file: https://www.transfernow.net/dl/20221109C4Av4YIO
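One way this might be approached, assuming the lines look exactly like the sample above (one extra double quote at each end of the line, nothing else unusual): strip that outer pair in a small generator and hand the cleaned lines to csv.DictReader. The helper name strip_outer_quotes is made up for this sketch, and io.StringIO stands in for the real file.

```python
import csv
import io

def strip_outer_quotes(lines):
    """Yield each line without the one extra quote wrapped around it."""
    for line in lines:
        line = line.rstrip("\r\n")
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            line = line[1:-1]
        yield line

# Sample rows copied from the question; io.StringIO stands in for the open file.
sample = io.StringIO(
    '""First Name", "Last Name", "Date of Birth""\n'
    '""Alex", "Turner", "08.09.1978""\n'
    '""Max", "Parker", "16.02.2003""\n'
)

# skipinitialspace=True lets the reader accept the space after each comma.
reader = csv.DictReader(strip_outer_quotes(sample), skipinitialspace=True)
rows = list(reader)
print(rows[0])
```

With the real file, the open file object f would be passed to strip_outer_quotes in place of sample. If the actual file differs from the sample (for example, doubled inner quotes), the stripping rule would need to be adapted.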
I have come across an error while parsing json with ijson.
Background:
I have a series (approximately 1000) of large files of Twitter data that are compressed in '.bz2' format. I need to get elements from the files into a pd.DataFrame for further analysis. I have identified the keys I need to get. I am cautious about posting Twitter data.
Attempt:
I have managed to decompress the files using bz2.decompress with the following code:
## Code in loop specific for decompressing and parsing -
with open(file, 'rb') as source:
    # Decompress the file
    json_r = bz2.decompress(source.read())
    json_decom = json_r.decode('utf-8') # decompresses one file at a time rather than a stream
# Parse the JSON with ijson
parser = ijson.parse(json_decom)
for prefix, event, value in parser:
    # Print selected items as part of testing
    if prefix=="created_at":
        print(value)
    if prefix=="text":
        print(value)
    if prefix=="user.id_str":
        print(value)
This gives the following error:
IncompleteJSONError: parse error: trailing garbage
estamp_ms":"1609466366680"} {"created_at":"Fri Jan 01 01:59
(right here) ------^
Two things:
1. Is my decompression method correct, and does it give the right type of input for ijson to parse (ijson takes both bytes and str)?
2. Is it a JSON error? If it is, is it possible to develop some kind of error handler to move on to the next file? Any suggestions would be appreciated.
Any assistance would be greatly appreciated.
Thank you, James
To directly answer your two questions:
1. The decompression method is correct in the sense that it yields JSON data that you then feed to ijson. As you point out, ijson works both with str and bytes inputs (although the latter is preferred); if you were giving ijson some non-JSON input you wouldn't see an error showing JSON data in it.
2. This is a very common error that is described in ijson's FAQ. It basically means your JSON document has more than one top-level value, which is not standard JSON, but is supported by ijson by using the multiple_values option (see docs for details).
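To make "more than one top-level value" concrete, here is a minimal standard-library sketch (not ijson) that walks a string of concatenated JSON objects using json.JSONDecoder.raw_decode. The function name iter_json_values and the sample data are made up for illustration. With ijson itself, the equivalent is to pass multiple_values=True to ijson.parse, as mentioned above.

```python
import json

def iter_json_values(text):
    """Yield every top-level JSON value found in text, one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Made-up sample: three objects separated only by whitespace.
sample = '{"timestamp_ms":"1"} {"created_at":"Fri Jan 01"}\n{"text":"hi"}'
values = list(iter_json_values(sample))
print(len(values))
```

Note that this sketch holds the whole text in memory, so it only illustrates the data shape; it is not a substitute for ijson's incremental parsing on large files.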
About the code as a whole: while it's working correctly, it could be improved on: the whole point of using ijson is that you can avoid loading the full JSON contents in memory. The code you posted doesn't use this to its advantage though: it first opens the bz-compressed file, reads it as a whole, decompresses that as a whole, (unnecessarily) decodes that as a whole, and then gives the decoded data as input to ijson. If your input file is small, and the decompressed data is also small you won't see any impact, but if your files are big then you'll definitely start noticing it.
A better approach is to stream the data through all the operations so that everything happens incrementally: decompression and JSON parsing, with no separate decoding step. Something along the lines of:
import bz2
import ijson

with bz2.BZ2File(filename, mode='r') as f:
    for prefix, event, value in ijson.parse(f):
        ...  # handle each event here
As the cherry on the cake, if you want to build a DataFrame from that you can use DataFrame's data argument to build the DataFrame directly with the results from the above. data can be an iterable, so you can, for example, make the code above a generator and use it as data. Again, something along the lines of:
def json_input():
    with bz2.BZ2File(filename, mode='r') as f:
        for prefix, event, value in ijson.parse(f):
            # placeholder: yield your own results here
            yield value
df = pandas.DataFrame(data=json_input())
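As a rough standard-library illustration of the same streaming idea, the sketch below reads a bz2 file line by line and yields one small dict per tweet. It assumes the files hold one JSON object per line, which may not match the real data, and it builds a tiny made-up sample file so it can run on its own. The name iter_rows and the selected keys are only illustrative.

```python
import bz2
import json
import os
import tempfile

def iter_rows(path):
    """Stream one dict per tweet from a bz2 file without loading it whole.

    Assumes one JSON object per line, which may not match the real files.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            tweet = json.loads(line)
            yield {
                "created_at": tweet.get("created_at"),
                "text": tweet.get("text"),
                "user_id": tweet.get("user", {}).get("id_str"),
            }

# Build a tiny made-up sample file so the sketch can run on its own.
sample_lines = [
    json.dumps({"created_at": "Fri Jan 01", "text": "a", "user": {"id_str": "1"}}),
    json.dumps({"created_at": "Sat Jan 02", "text": "b", "user": {"id_str": "2"}}),
]
path = os.path.join(tempfile.mkdtemp(), "sample.json.bz2")
with bz2.open(path, mode="wt", encoding="utf-8") as f:
    f.write("\n".join(sample_lines) + "\n")

rows = list(iter_rows(path))
print(rows)
```

A generator like iter_rows(path) could then be passed as the data argument of pandas.DataFrame, in the way described above.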
I am trying to import a JSON file and convert it into a dataframe to be used for data analysis in Python. Below is my code:
import json
import pandas as pd
with open('Kickstarter_2015-06-12.json','r',encoding = "utf8") as f:
    data5 = json.loads(f.read())
df_nested_list = pd.json_normalize(data5, record_path =['projects'])
However, I got this error when I ran the code:
JSONDecodeError: Expecting ',' delimiter
How can I solve this problem? Any help in this area would be appreciated. Thanks!
I would suggest using a JSON linter to make sure that your JSON is properly formatted; it seems that you're missing a comma somewhere.
It could also happen when " or ' characters are involved in your strings/values, since an unescaped " inside a string ends it early and confuses the parser, but it's hard to tell without seeing the JSON itself. If the JSON shows as valid in the linter and you can find the values that contain special characters, please provide a small example so that you don't have to expose your entire JSON.
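If a linter is not at hand, the exception itself can help locate the problem: json.JSONDecodeError carries msg, lineno, colno, pos and doc attributes. Here is a small sketch using a made-up document with a deliberately missing comma:

```python
import json

# Made-up sample: the comma between "a" and "b" is missing on purpose.
bad_json = '{\n  "projects": [\n    {"a": 1 "b": 2}\n  ]\n}'

try:
    json.loads(bad_json)
    error = None
except json.JSONDecodeError as exc:
    error = exc
    # msg, lineno, colno and pos describe where parsing stopped.
    print(f"{exc.msg} at line {exc.lineno}, column {exc.colno}")
    # Show a little of the text around the failing position.
    start = max(exc.pos - 20, 0)
    print(repr(exc.doc[start:exc.pos + 20]))
```

Printing the surrounding text this way lets you share just the offending fragment instead of the whole file.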
I have a large JSON file to parse with Python, but it is incomplete (i.e., the closing brackets at the end are missing). The file consists of one big JSON object that contains JSON objects inside. Every JSON object inside the outer object is complete; only the final closing brackets are missing.
For example, its structure is like this:
{bigger_json_head:value, another_key:[{small_complete_json1},{small_complete_json2}, ...,{small_complete_json_n},
So the final "]}" is missing. However, each small JSON object occupies a single line, so when I print each line of the file, I get each JSON object as a single string.
So I've tried to use:
line_arr = []
with open("file.json","r",encoding="UTF-8") as f:
    for line in f.readlines():
        line_arr.append(line)
I expected to get a list with each line (one JSON object) as an element.
Then I tried the following:
for json_line in line_arr:
    try:
        json_str = json.loads(json_line)
        print(json_str)
    except json.decoder.JSONDecodeError:
        continue
I expected this code block to print every JSON object to the console, except for the first and last lines. However, it printed nothing and only ran into decode errors.
Has anyone solved a similar problem? Please help. Thank you.
If the faulty JSON file is only missing the final "]}", then you can fix it before parsing it.
Here is some example code to illustrate:
import json

with open("file.json","r",encoding="UTF-8") as f:
    faulty_json_str = f.read()
# Drop trailing whitespace and the dangling comma shown in the question's example
fixed_json_str = faulty_json_str.rstrip().rstrip(',') + ']}'
json_obj = json.loads(fixed_json_str)
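A possible reason the line-by-line attempt in the question printed nothing, assuming every line ends with the comma that separates the array elements: json.loads rejects any text left over after the value, so each line raises JSONDecodeError and is skipped. A minimal sketch that strips the trailing comma before parsing is shown here. The sample lines are made up to match the structure described in the question.

```python
import json

# Made-up lines shaped like the file in the question: an opening line,
# then one complete object per line, each followed by a comma.
lines = [
    '{"bigger_json_head": "value", "another_key": [\n',
    '{"id": 1},\n',
    '{"id": 2},\n',
    '{"id": 3},\n',
]

parsed = []
for line in lines:
    # Remove the newline and the trailing comma before parsing.
    candidate = line.strip().rstrip(",")
    try:
        parsed.append(json.loads(candidate))
    except json.JSONDecodeError:
        # The opening line (and any other fragment) is not valid on its own.
        continue

print(parsed)
```

For a very large file this has the advantage of never holding the whole document in memory, at the cost of ignoring the outer object's own keys.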
I'm trying to test this code out: https://github.com/shanepm/500px-Bot/blob/master/500px.py
I'm completely new to Python, but I have programming skills.
Running on Windows 10.
I get this error message:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The error is reported at pendingFollowList = json.loads(f.read()) and at the rest of the json.loads... calls.
I have tried passing one more argument on the line above:
with open('pendingUsers.txt', 'r', encoding='utf-8') as f:
This did not help.
I have also checked the encoding of the .txt files.
Does anybody know what to do?
Thanks in advance!
Alen
What I would do is use a .json file instead of a .txt file, because you're working with JSON. I've used
import json
with open("filename.json") as f:
    Json_content = json.load(f)
And then if you want to access one of the values in your JSON, you would use Json_content["key_in_json_file"].
all!
It helped out to change the extension.
My main issue was that my input files were blank and not correctly formatted as JSON.
Thanks to all for the contribution.
BR
Alen
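For reference, a blank file is enough to produce "Expecting value: line 1 column 1 (char 0)", because an empty string is not valid JSON. Here is a minimal sketch of one way to guard against that. The helper name load_json_or_default is hypothetical, and the demo writes a blank temporary file rather than touching any real input.

```python
import json
import os
import tempfile

def load_json_or_default(path, default):
    """Return parsed JSON from path, or default if the file is blank."""
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    if not content.strip():
        # json.loads("") raises "Expecting value: line 1 column 1 (char 0)".
        return default
    return json.loads(content)

# Demonstrate with a blank temporary file (the file name is made up).
path = os.path.join(tempfile.mkdtemp(), "pendingUsers.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("")

pending = load_json_or_default(path, default=[])
print(pending)
```

Alternatively, the files could be initialized once with valid JSON such as [] so that the original json.loads calls succeed unchanged.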