error with writelines(), when creating json objects - python

I want to convert a text file into JSON objects. My input text file holds a large number of records (about 4 MB). An error is thrown when I try to write the JSON objects to the output file: "writelines() argument must be a sequence of strings". Here is my input file:
created_at : 03 Ekim 2014 Cuma, 06:36, article : İSTANBUL (CİHAN)- Fethullah Gülen Hocaefendi'nin “421. Nağme: Şamatalarınız Haramîliğinizi Örtemeyecek!..” isimli yeni sohbeti, herkul.org sitesinde yayınlandı. Hocaefendi, "Şamatayla hangi şirretliği kapamak istediğini herkes anlıyor. Silinmez o zihinlerden" ifadelerini kullandı.Sohbetinde Allah Rasûlü (sallallâhu aleyhi ve sellem) Efendimiz’in, “Allahım beni kendi gözümde küçük, insanlar nazarında ise (yüklediğin misyona uygun.
created_at : 06 Ekim 2014 Pazartesi, 11:57, article : KAYSERİ (CİHAN)- Kimse Yok Mu Derneği Kayseri Şubesi, hayırseverlerin bağışlarıyla paket haline getirdiği kurban etlerini ihtiyaç sahiplerine ulaştırdı. Şehirde daha önce derneğe müracaatta bulunan ve tespit edilen aileler için şehrin 4 ayrı noktasında kurban eti dağıtım merkezi oluşturuldu. Kurban etlerini alan aileler ise Kimse Yok Mu ile yüzlerinin güldüğünü ve emeği geçenlere teşekkür ettiklerini söylediler. Geçen yıla göre ise bağış miktarlarının yüzde 50 oranında arttığı bildirildi. Kimse Yok Mu Derneği’nin Kayseri Şubesi’nde Kurban Bayramı nedeniyle hareketlilik yaşanıyor. Dernek, hayırseverlerin bağışladığı kurbanların kesimi yapıldıktan sonra.
Here is my code:
#!/usr/bin/python
import sys, os
import json
inputfile = open('bugun_data_collection_KimseYokmu.txt', 'r')
outputfile = open('bugun_data_collection_json_KimseYokmu.txt', 'w')
#shows how the dictionary looks like
reps = {"created_at": "date","article": "text"}
#reads the input file line by line
for line in inputfile:
    outputfile.writelines((line, json.dumps(reps))
    inputfile.close()
    outputfile.close()
This is the error:
File "", line 11
    inputfile.close()
    ^
SyntaxError: invalid syntax

Notice the error message: "writelines() argument must be a sequence of strings". It is raised because the second element of the tuple passed to writelines() is a dict, not a string as expected. You can use json.dumps(reps) to convert it to a string, like this:
outputfile.writelines((line, json.dumps(reps)))
Besides, you have put the file close operations inside the for loop; this will cause another error when the next iteration reads from or writes to a closed file.
If you want to extract the text from the input file, you can do it like this (exceptions are not handled):
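# reads the input file line by line
outputlines = []
for line in inputfile:
    # the article text follows 'article : '; drop the trailing newline
    text = line.split('article : ')[1].strip()
    # the date sits between 'created_at : ' and ', article : '
    date = line.split('article : ')[0].split('created_at : ')[1].rstrip(', ')
    reps = {"created_at": date, "article": text}
    # writelines() adds no line separators, so append one per JSON object
    outputlines.append(json.dumps(reps) + '\n')
outputfile.writelines(outputlines)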
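For reference, here is a self-contained sketch of the same conversion, written for Python 3. The helper name record_to_json and the two shortened sample lines are made up for illustration, and io.StringIO stands in for the real input and output files. It assumes every line follows the "created_at : ..., article : ..." layout shown in the question.

```python
import io
import json

def record_to_json(line):
    """Turn one 'created_at : ..., article : ...' line into a JSON string."""
    # Split once at the separator between the date part and the article text
    head, text = line.rstrip('\n').split(', article : ', 1)
    date = head.split('created_at : ', 1)[1]
    # ensure_ascii=False keeps the Turkish characters readable in the output
    return json.dumps({"created_at": date, "article": text}, ensure_ascii=False)

# Shortened, made-up sample lines in the same layout as the question's input
inputfile = io.StringIO(
    "created_at : 03 Ekim 2014 Cuma, 06:36, article : İSTANBUL (CİHAN)- first article\n"
    "created_at : 06 Ekim 2014 Pazartesi, 11:57, article : KAYSERİ (CİHAN)- second article\n"
)
outputfile = io.StringIO()

# writelines() accepts any iterable of strings; one JSON object per output line
outputfile.writelines(record_to_json(line) + '\n' for line in inputfile)

json_lines = outputfile.getvalue().splitlines()
```

Writing one object per line means the result can later be read back with json.loads on each line.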

Related

Loading a JSON in python [closed]

I've got a problem loading a JSON in python. I'm working with python 2.7 and I've got a JSON file that I would like to load. I did:
movies = json.load(open(FBO_REF_FILE, 'r'))
But when I display it, I get a dict full of:
{u'id_yeyecine': 42753, u'budget_dollars': u'85', u'classification': u'Tous publics', u'pays': u'US', u'budget_euros': u'0', u'dpw_entrees_fr': 132326, u'realisateurs': u'Brad Peyton, Kevin Lima', u'is_art_et_essai': u'NON', u'distributeur_video': u'Warner hv', u'genre_gfk_1': u'ENFANT', u'genre_gfk_2': u'FILM FAMILLE', u'genre_gfk_3': u'FILM FAMILLE', u'is_3D': u'OUI', u'fid': 16429, u'cum_entrees_pp': 58076, u'titre': u'COMME CHIENS ET CHATS LA REVANCHE DE KITTY GALORE', u'psp_entrees': 963, u'cum_entrees_fr': 348225, u'dps_copies_fr': 453, u'dpj_entrees_pp': 7436, u'visa': 127021, u'dps_entrees_fr': 178908, u'genre': u'Com\xe9die', u'distributeur': u'WARNER BROS.', u'editeur_video': u'Warner bros', u'psp_copies': 15, u'dpw_entrees_pp': 26195, u'id_imdb': None, u'date_sortie_video': u'2010-12-06', u'dps_copies_pp': 39, u'date_sortie': u'2010-08-04', u'dps_entrees_pp': 32913, u'dpj_entrees_fr': 40369, u'ecrivains': u'', u'acteurs': u"Chris O'donnell, Jack McBrayer", u'is_premier_film': u'NON'}
I tried using ast but I got the following error: malformed string. The error I get when using ast is the following:
153 if cursor is None:
154 movies = json.load(open(FBO_REF_FILE, 'r'))
--> 155 movies = ast.literal_eval(movies)
156 for movie in movies:
157 if movies[movie]['id_allocine'] == allocine_id:
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.pyc in literal_eval(node_or_string)
78 return left - right
79 raise ValueError('malformed string')
---> 80 return _convert(node_or_string)
81
82
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.pyc in _convert(node)
77 else:
78 return left - right
---> 79 raise ValueError('malformed string')
80 return _convert(node_or_string)
81
ValueError: malformed string
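With json.load you parse a JSON file into Python's data types. In your case this is a dict, so there is nothing left for ast.literal_eval to parse: it expects a string, which is why it raises "malformed string". The u'...' prefixes only mark Python 2 unicode strings.
With open you only open a file.
If you don't want to parse the JSON file, just do the following: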
content = None
with open(FBO_REF_FILE, 'r') as f:
    content = f.read()
print content # content is a string containing the content of the file
If you want to parse the json file into python's datatypes do the following:
content = None
with open(FBO_REF_FILE, 'r') as f:
    content = json.loads(f.read())
print content # content is a dict containing the parsed json data
print content['id_yeyecine']
print content['budget_dollars']
If you want to pretty print your dictionary (json.dumps only returns the formatted string, so print it):
print json.dumps(movies, sort_keys=True, indent=4)
Or use pprint: https://docs.python.org/2/library/pprint.html
To read from movies, use regular dict methods:
id_yeyecine = movies["id_yeyecine"]
Now id_yeyecine is 42753.
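Here is a small sketch of the points above, written for Python 3 (the thread itself uses Python 2.7). The raw string is a shortened, made-up excerpt of the question's data, and io.StringIO stands in for the opened file.

```python
import io
import json
from pprint import pformat

# Shortened stand-in for the contents of the JSON file
raw = '{"id_yeyecine": 42753, "titre": "COMME CHIENS ET CHATS", "genre": "Com\\u00e9die", "id_imdb": null}'

# json.load already returns Python objects, so no ast.literal_eval is needed
movies = json.load(io.StringIO(raw))

# json.dumps returns a string; print it to see the pretty-printed form
pretty = json.dumps(movies, sort_keys=True, indent=4, ensure_ascii=False)
print(pretty)

# pprint-style formatting of the dict itself
print(pformat(movies))

# Regular dict access
id_yeyecine = movies["id_yeyecine"]
```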

Tokenizing and Removing Stopwords from JSON using nltk

Hi I keep getting this error:
D:\WinPython-32bit-2.7.10.3\python-2.7.10>python TweetTest.py Twitter.json
Traceback (most recent call last):
File "TweetTest.py", line 60, in <module>
tweet = json.loads(line)
File "D:\WinPython-32bit-2.7.10.3\python-2.7.10\lib\json\__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "D:\WinPython-32bit-2.7.10.3\python-2.7.10\lib\json\decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 4488 - line 1 column 99678411 (char 4487 - 99678410)
I have no idea what is wrong. My code is as follows:
import sys
import json
from collections import Counter
import re
from nltk.corpus import stopwords
import string
punctuation = list(string.punctuation)
stop = stopwords.words('english') + punctuation + ['rt', 'via']
emoticons_str = r"""
    (?:
        [:=;] # Eyes
        [oO\-]? # Nose (optional)
        [D\)\]\(\]/\\OpP] # Mouth
    )"""
regex_str = [
    emoticons_str,
    r'<[^>]+>', # HTML tags
    r'(?:#[\w_]+)', # #-mentions
    r"(?:\#+[\w_]+[\w\'_\-]*[\w_]+)", # hash-tags
    r'http[s]?://(?:[a-z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-f][0-9a-f]))+', # URLs
    r'(?:(?:\d+,?)+(?:\.?\d+)?)', # numbers
    r"(?:[a-z][a-z'\-_]+[a-z])", # words with - and '
    r'(?:[\w_]+)', # other words
    r'(?:\S)' # anything else
]
tokens_re = re.compile(r'('+'|'.join(regex_str)+')', re.VERBOSE | re.IGNORECASE)
emoticon_re = re.compile(r'^'+emoticons_str+'$', re.VERBOSE | re.IGNORECASE)
def tokenize(s):
    return tokens_re.findall(s)
def preprocess(s, lowercase=False):
    tokens = tokenize(s)
    if lowercase:
        tokens = [token if emoticon_re.search(token) else token.lower() for token in tokens]
    return tokens
if __name__ == '__main__':
    fname = sys.argv[1]
    with open(fname, 'r') as f:
        count_all = Counter()
        for line in f:
            tweet = json.loads(line)
            tokens = preprocess(tweet['text'])
            count_all.update(tokens)
        print(count_all.most_common(5))
These are the first two objects in my JSON file. I used a tweet stream listener to collect the tweets.
{"created_at":"Wed Apr 06 08:33:55 +0000 2016","id":717631408345333760,"id_str":"717631408345333760","text":"RT #whosharold: Hilary Clinton cannot be president pls she can't even hold her man down what makes ya think she gon hold the office down","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":472387071,"id_str":"472387071","name":"BigGucciK 2x","screen_name":"KaisonThatBoy","location":"Bridgeport, CT","url":null,"description":null,"protected":false,"verified":false,"followers_count":1608,"friends_count":1219,"listed_count":8,"favourites_count":1293,"statuses_count":64337,"created_at":"Mon Jan 23 22:07:27 +0000 2012","utc_offset":-10800,"time_zone":"Atlantic Time (Canada)","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"131516","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme14\/bg.gif","profile_background_tile":true,"profile_link_color":"009999","profile_sidebar_border_color":"EEEEEE","profile_sidebar_fill_color":"EFEFEF","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/709500377104818182\/4vMu066C_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/709500377104818182\/4vMu066C_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/472387071\/1457000395","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Apr 06 03:16:15 +0000 
2016","id":717551464575401984,"id_str":"717551464575401984","text":"Hilary Clinton cannot be president pls she can't even hold her man down what makes ya think she gon hold the office down","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":792436550,"id_str":"792436550","name":"sadboyz","screen_name":"whosharold","location":null,"url":null,"description":"platano maduro no vuelve a verde","protected":false,"verified":false,"followers_count":1285,"friends_count":979,"listed_count":11,"favourites_count":4877,"statuses_count":91425,"created_at":"Thu Aug 30 21:26:30 +0000 2012","utc_offset":-10800,"time_zone":"Atlantic Time (Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/pbs.twimg.com\/profile_background_images\/773304539\/94dbc3d1558da7f1e3d2c6fffcb5d710.jpeg","profile_background_image_url_https":"https:\/\/pbs.twimg.com\/profile_background_images\/773304539\/94dbc3d1558da7f1e3d2c6fffcb5d710.jpeg","profile_background_tile":true,"profile_link_color":"0084B4","profile_sidebar_border_color":"FFFFFF","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/714669878012219392\/9HmilvPG_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/714669878012219392\/9HmilvPG_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/792436550\/1458855437","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retwee
t_count":2,"favorite_count":7,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"whosharold","name":"sadboyz","id":792436550,"id_str":"792436550","indices":[3,14]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1459931635353"}
{"created_at":"Wed Apr 06 08:33:55 +0000 2016","id":717631409742020609,"id_str":"717631409742020609","text":"RT #WisegalGranny: HONY Just Destroyed Donald Trump\u2019s Dream Of Becoming President - https:\/\/t.co\/8GIDVa76bZ Oooo, that's gonna hurt! #Unite\u2026","source":"\u003ca href=\"https:\/\/roundteam.co\" rel=\"nofollow\"\u003eRoundTeam\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2846552432,"id_str":"2846552432","name":"Glenn Silva","screen_name":"GlennSilva76","location":"hawaii","url":null,"description":"Christian, Constitutional Conservative, Pro 1A 2A and RF, It's Time To Unite And Take Our Country Back! #NeverTrump\r\n#UniteWithCruz #CruzCrew #CruzToVictory","protected":false,"verified":false,"followers_count":1981,"friends_count":2408,"listed_count":99,"favourites_count":1819,"statuses_count":38301,"created_at":"Wed Oct 08 07:34:50 +0000 
2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/691834454868889601\/1gkIbY1C_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/691834454868889601\/1gkIbY1C_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2846552432\/1453447926","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Apr 06 08:18:04 +0000 2016","id":717627418454966272,"id_str":"717627418454966272","text":"HONY Just Destroyed Donald Trump\u2019s Dream Of Becoming President - https:\/\/t.co\/8GIDVa76bZ Oooo, that's gonna hurt! 
#UniteWithCruz #NeverTrump","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":4726275950,"id_str":"4726275950","name":"Wisegal1958","screen_name":"WisegalGranny","location":null,"url":null,"description":null,"protected":false,"verified":false,"followers_count":475,"friends_count":290,"listed_count":73,"favourites_count":8976,"statuses_count":10881,"created_at":"Fri Jan 08 02:36:28 +0000 2016","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"2B7BB9","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/715082668770242561\/ohjXvK85_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/715082668770242561\/ohjXvK85_normal.jpg","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":1,"favorite_count":0,"entities":{"hashtags":[{"text":"UniteWithCruz","indices":[114,128]},{"text":"NeverTrump","indices":[129,140]}],"urls":[{"url":"https:\/\/t.co\/8GIDVa76bZ","expanded_url":"http:\/\/www.parhlo.com\/hony-just-destroyed-trumps-dream-of-becoming-president\/?track=twb","display_url":"parhlo.com\/hony-just-dest\u2026","indices":[65,88]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_statu
s":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"UniteWithCruz","indices":[133,140]},{"text":"NeverTrump","indices":[139,140]}],"urls":[{"url":"https:\/\/t.co\/8GIDVa76bZ","expanded_url":"http:\/\/www.parhlo.com\/hony-just-destroyed-trumps-dream-of-becoming-president\/?track=twb","display_url":"parhlo.com\/hony-just-dest\u2026","indices":[84,107]}],"user_mentions":[{"screen_name":"WisegalGranny","name":"Wisegal1958","id":4726275950,"id_str":"4726275950","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1459931635686"}
Please help me. Thank you.
I had the same error once.
Your script parses one JSON object per line read, so the issue might be that the JSON objects in your file are not separated by newlines.
For instance, if your file contains
json_object1
json_object2
then the two objects will be read, whereas if the file contains
json_object1 json_object2
you will get the "Extra data" error.
Solution: add a newline after each JSON object when writing it to the output file.
(related: https://stackoverflow.com/a/21058946/2314737)
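If the file cannot be regenerated with one object per line, one possible workaround (a sketch, not what the answer above proposes) is to walk through the text with json.JSONDecoder.raw_decode, which returns the decoded object together with the index where it stopped. The helper name iter_json_objects and the sample string are made up for illustration. The code is written for Python 3.

```python
import json

def iter_json_objects(text):
    """Yield each JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the decoded object and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Two objects on one line, then a third on the next line
sample = '{"text": "first"} {"text": "second"}\n{"text": "third"}'

# json.loads refuses this input because of the trailing data
try:
    json.loads(sample)
    loads_failed = False
except ValueError:
    loads_failed = True

tweets = list(iter_json_objects(sample))
```

For a file close to 100 MB this reads everything into memory at once, so fixing the writer to emit one object per line is still the simpler route.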

How can i get my files to be opened?

Hi there, I'm working on a function that merges two separate .txt files and outputs a personalized letter. If I include my text directly within the function module, it works perfectly. But when I try to open the files and pass them to the function, I get this
error message:
Traceback (most recent call last):
File "/Users/nathandavis9752/CP104/davi0030_a10/src/q2_function.py", line 25, in <module>
data = cleanData(q2)
File "/Users/nathandavis9752/CP104/davi0030_a10/src/q2_function.py", line 17, in cleanData
return [item.strip().split('\n\n') for item in query.split('--')]
AttributeError: 'file' object has no attribute 'split'
code:
letter = open('letter.txt', 'r')
q2 = open('q2.txt', 'r')
def cleanData(query):
    return [item.strip().split('\n\n') for item in query.split('--')]
def writeLetter(template, variables, replacements):
    # replace ith variable with ith replacement variable
    for i in range(len(variables)):
        template = template.replace(variables[i], replacements[i])
    return template
data = cleanData(q2)
print (data)
variables = ['[fname]', '[lname]', '[street]', '[city]']
letters = [writeLetter(letter, variables, person) for person in data]
for i in letters:
    print (i)
q2.txt file:
Michael
dawn
lock hart ln
Dublin
--
kate
Nan
webster st
king city
--
raj
zakjg
late Road
Toronto
--
dave
porter
Rock Ave
nobleton
letter.txt file:
[fname] [lname]
[street]
[city]
Dear [fname]:
As a fellow citizen of [city], you and all your neighbours
on [street] are invited to a celebration this Saturday at
[city]'s Central Park. Bring beer and food!
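You are trying to split a file object rather than a string; call read() on it first. The same applies to letter, which has to be read into a string before writeLetter can call replace() on it.
def cleanData(query):
    return [item.strip().split('\n\n') for item in query.read().split('--')]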
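A self-contained sketch of the whole flow, written for Python 3. It assumes the fields in q2.txt are separated by blank lines (which is what the split('\n\n') in the question implies). io.StringIO stands in for the two opened files, the sample data is shortened, and the snake_case function names are only illustrative.

```python
import io

def clean_data(text):
    # Records are separated by '--'; fields inside a record by a blank line
    return [item.strip().split('\n\n') for item in text.split('--')]

def write_letter(template, variables, replacements):
    # Replace each placeholder with the matching field of one person
    for variable, replacement in zip(variables, replacements):
        template = template.replace(variable, replacement)
    return template

# Stand-ins for open('q2.txt') and open('letter.txt')
q2 = io.StringIO(
    "Michael\n\ndawn\n\nlock hart ln\n\nDublin\n--\n"
    "kate\n\nNan\n\nwebster st\n\nking city\n"
)
letter = io.StringIO("[fname] [lname]\n[street]\n[city]\nDear [fname]:\n")

# Read both files into strings before calling any str methods on them
data = clean_data(q2.read())
template = letter.read()

variables = ['[fname]', '[lname]', '[street]', '[city]']
letters = [write_letter(template, variables, person) for person in data]
for text in letters:
    print(text)
```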

Python: error loading JSON object

I'm trying to load the following JSON string in python:
{
"Motivo_da_Venda_Perdida":"",
"Data_Visita":"2015-03-17 08:09:55",
"Cliente":{
"Distribuidor1_Modelo":"",
"RG":"",
"Distribuidor1_Marca":"Selecione",
"PlataformaMilho1_Quantidade":"",
"Telefone_Fazenda":"",
"Pulverizador1_Quantidade":"",
"Endereco_Fazenda":"",
"Nome_Fazenda":"",
"Area_Total_Fazenda":"",
"PlataformaMilho1_Marca":"Selecione",
"Trator1_Modelo":"",
"Tipo_Cultura3":"Selecione",
"Tipo_Cultura4":"Selecione",
"Cultura2_Hectares":"",
"Colheitadeira1_Quantidade":"",
"Tipo_Cultura1":"Soja",
"Tipo_Cultura2":"Selecione",
"Plantadeira1_Marca":"Stara",
"Autopropelido1_Modelo":"",
"Email_Fazenda":"",
"Autopropelido1_Marca":"Stara",
"Distribuidor1_Quantidade":"",
"PlataformaMilho1_Modelo":"",
"Trator1_Marca":"Jonh deere",
"Email":"",
"CPF":"46621644000",
"Endereco_Rua":"PAQUINHAS, S/N",
"Caixa_Postal_Fazenda":"",
"Cidade_Fazenda":"",
"Plantadeira1_Quantidade":"",
"Colheitadeira1_Marca":"New holland",
"Data_Nascimento":"2015-02-20",
"Cultura4_Hectares":"",
"Nome_Cliente":"MILTON CASTIONE",
"Cep_Fazenda":"",
"Telefone":"5491290687",
"Cultura3_Hectares":"",
"Trator1_Quantidade":"",
"Cultura1_Hectares":"",
"Autopropelido1_Quantidade":"",
"Pulverizador1_Modelo":"",
"Caixa_Postal":"",
"Estado":"RS",
"Endereco_Numero":"",
"Cidade":"COLORADO",
"Colheitadeira1_Modelo":"",
"Pulverizador1_Marca":"Selecione",
"CEP":"99460000",
"Inscricao_Estadual":"0",
"Plantadeira1_Modelo":"",
"Estado_Fazenda":"RS",
"Bairro":""
},
"Quilometragem":"00",
"Modelo_Pretendido":"Selecione",
"Quantidade_Prevista_Aquisicao":"",
"Id_Revenda":"1",
"Contato":"05491290687",
"Pendencia_Para_Proxima_Visita":"",
"Data_Proxima_Visita":"2015-04-17 08:09:55",
"Valor_de_Venda":"",
"Maquina_Usada":"0",
"Id_Vendedor":"2",
"Propensao_Compra":"Propensao_Compra_Frio",
"Comentarios":"despertar compra",
"Sistema_Compra":"Sistema_Compra_Finame",
"Outro_Produto":"",
"Data_Prevista_Aquisicao":"2015-04-17 08:09:55",
"Objetivo_Visita":"Despertar_Interesse",
"Tipo_Contato":"Telefonico"}
However, I get the following error when I try to load it:
File "python_file.py", line 107, in busca_proxima_mensagem
Visita = json.loads(corpo)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 2 - line 6 column 84 (char 1 - 1020)
but this JSON seems to be valid according to this site: http://jsonformatter.curiousconcept.com/ What am I doing wrong? Why can't I load this string as a JSON object?
I'm trying to load the string from AWS SQS like this:
import json
...
result = fila.get_messages(1, 30, 'SentTimestamp')
for message in result:
    corpo = message.get_body()
    Visita = json.loads(corpo)
OK, so I figured out what is causing me problems: There is a slash as a value of a key
"Endereco_Rua":"PAQUINHAS, S/N",
However, I'm telling Python to filter that out (code below), but it's not working. How can I remove it? I can't do it at the origin that created the data, as I don't have access to the interface the user fills in.
result = fila.get_messages(1, 30, 'SentTimestamp')
for message in result:
    corpo = message.get_body()
    corpo = corpo.replace("/", "") #Filtering slashes
    Visita = json.loads(corpo)
Found a solution! Besides the slash character, this error sometimes also happened with no visible cause. I ended up solving it by adding the following lines to my Python code:
1) At the start of my code, along with other python imports
from boto.sqs.message import RawMessage
2) Changing my SQS queue to use/fetch raw data:
fila = sqs_conn.get_queue(constantes.fila_SQS)
fila.set_message_class(RawMessage)
Hope this helps anyone who is having the same issue.
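As a side note, a forward slash inside a JSON string is legal, so stripping slashes should not be needed once the message body arrives unaltered. The sketch below checks this with the standard library only (Python 3). The message body is a made-up two-key excerpt of the question's data and no SQS call is involved.

```python
import json

# Made-up excerpt of the message body; both values appear in the question
corpo = '{"Endereco_Rua": "PAQUINHAS, S/N", "Estado": "RS"}'

# A plain slash in a string value parses without any filtering
visita = json.loads(corpo)

# JSON also allows the slash to be written escaped as \/ ; it decodes to the same character
escaped = json.loads(r'{"url": "http:\/\/twitter.com"}')
```

This suggests the "Extra data" error came from how the body was fetched or decoded and not from the slash itself, which would fit the RawMessage fix above.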

Displaying better error message than "No JSON object could be decoded"

Python code to load data from some long complicated JSON file:
with open(filename, "r") as f:
    data = json.loads(f.read())
(note: the best code version should be:
with open(filename, "r") as f:
    data = json.load(f)
but both exhibit similar behavior)
For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc), this prints a nice helpful message containing the line and column number where the JSON error was found.
However, for other types of JSON error (including the classic "using comma on the last item in a list", but also other things like capitalising true/false), Python's output is just:
Traceback (most recent call last):
File "myfile.py", line 8, in myfunction
config = json.loads(f.read())
File "c:\python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "c:\python27\lib\json\decoder.py", line 360, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
For that type of ValueError, how do you get Python to tell you where the error is in the JSON file?
I've found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:
json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded
which is not very descriptive. The same operation with simplejson:
simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)
Much better! Likewise for other common errors like capitalizing True.
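For readers on Python 3.5 or later (the thread targets Python 2.7, so this is a different setup from the one in the question): the built-in json module raises json.JSONDecodeError, a ValueError subclass that carries msg, pos, lineno and colno attributes. A minimal sketch, where the helper name locate_json_error and the sample text are made up:

```python
import json

def locate_json_error(text):
    """Return None if `text` is valid JSON, else (message, line, column)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the offending character
        return err.msg, err.lineno, err.colno
    return None

# 'True' is capitalised, which is not valid JSON
bad = '{\n  "a": 1,\n  "b": True\n}'
problem = locate_json_error(bad)
if problem is not None:
    message, line, column = problem
    print("%s at line %d, column %d" % (message, line, column))
```

The exact wording of the message differs between Python versions, so it is safer to rely on the position attributes than on the text.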
You won't be able to get Python to tell you where the JSON is incorrect. You will need to use an online linter.
This will show you the error in the JSON you are trying to decode.
You could try the rson library found here: http://code.google.com/p/rson/ . It is also up on PyPI: https://pypi.python.org/pypi/rson/0.9 so you can use easy_install or pip to get it.
For the example given by tom:
>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'
RSON is designed to be a superset of JSON, so it can parse JSON files. It also has an alternate syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.
As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.
>>> rson.loads('[true,False]')
[True, u'False']
I had a similar problem and it was due to single quotes. The JSON standard (http://json.org) only allows double quotes around strings, so the Python json library accepts only double quotes.
For my particular version of this problem, I went ahead and searched the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:
def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e),
                                                               path))
That way it would print the content of the json file before entering the try-catch, and that way – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn't read the json file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)
Just mentioning this because, while maybe not a good answer to the OP's specific problem, this was a rather quick method in determining the source of a very oppressing bug. And I bet that many people will stumble upon this article who are searching a more verbose solution for a MalformedJsonFileError: No JSON object could be decoded when reading …. So that might help them.
In my case the JSON file is very large, and the built-in json module gives the above error when loading it.
I installed simplejson with sudo pip install simplejson,
and that solved it:
import json
import simplejson
def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f) # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f) # right
        lst_img = j_data['images']['image']
        print lst_img[0]
if __name__ == '__main__':
    test_parse_json()
I had a similar problem. This was my code:
json_file=json.dumps(pyJson)
file = open("list.json",'w')
file.write(json_file)
json_file = open("list.json","r")
json_decoded = json.load(json_file)
print json_decoded
The problem was that I had forgotten to call file.close() before reopening the file for reading, so the data had not yet been flushed to disk. Adding the call fixed the problem.
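A sketch of the same round trip using with blocks, which close (and therefore flush) the file automatically. The data and the temporary path are made up for illustration. Python 3.

```python
import json
import os
import tempfile

# Made-up data standing in for pyJson
py_json = {"items": [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), "list.json")

# The file is flushed and closed when the with block ends
with open(path, "w") as out_file:
    json.dump(py_json, out_file)

# Reopening afterwards sees the complete content
with open(path, "r") as in_file:
    json_decoded = json.load(in_file)

print(json_decoded)
```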
The accepted answer is the easiest way to fix the problem. But in case you are not allowed to install simplejson due to company policy, I propose the solution below for the particular issue of "using comma on the last item in a list":
Create a child class "JSONLintCheck" that inherits from the class "JSONDecoder" and overrides its __init__ method like below:
class JSONLintCheck(JSONDecoder):
    def __init__(self, encoding=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None):
        # pass the caller's arguments through instead of resetting them to the defaults
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook, parse_float=parse_float, parse_int=parse_int, parse_constant=parse_constant, strict=strict, object_pairs_hook=object_pairs_hook)
        # replace the default scanner with the one defined below
        self.scan_once = make_scanner(self)
make_scanner is a new function that is used to replace the 'scan_once' method of the above class. Here is the code for it:
#!/usr/bin/env python
from json import JSONDecoder
from json import decoder
import re

NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))

def py_make_scanner(context):
    parse_object = context.parse_object
    parse_array = context.parse_array
    parse_string = context.parse_string
    match_number = NUMBER_RE.match
    encoding = context.encoding
    strict = context.strict
    parse_float = context.parse_float
    parse_int = context.parse_int
    parse_constant = context.parse_constant
    object_hook = context.object_hook
    object_pairs_hook = context.object_pairs_hook

    def _scan_once(string, idx):
        try:
            nextchar = string[idx]
        except IndexError:
            raise ValueError(decoder.errmsg("Could not get the next character",string,idx))
            #raise StopIteration

        if nextchar == '"':
            return parse_string(string, idx + 1, encoding, strict)
        elif nextchar == '{':
            return parse_object((string, idx + 1), encoding, strict,
                _scan_once, object_hook, object_pairs_hook)
        elif nextchar == '[':
            return parse_array((string, idx + 1), _scan_once)
        elif nextchar == 'n' and string[idx:idx + 4] == 'null':
            return None, idx + 4
        elif nextchar == 't' and string[idx:idx + 4] == 'true':
            return True, idx + 4
        elif nextchar == 'f' and string[idx:idx + 5] == 'false':
            return False, idx + 5

        m = match_number(string, idx)
        if m is not None:
            integer, frac, exp = m.groups()
            if frac or exp:
                res = parse_float(integer + (frac or '') + (exp or ''))
            else:
                res = parse_int(integer)
            return res, m.end()
        elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
            return parse_constant('NaN'), idx + 3
        elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
            return parse_constant('Infinity'), idx + 8
        elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
            return parse_constant('-Infinity'), idx + 9
        else:
            #raise StopIteration # Here is where needs modification
            raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes",string,idx))
    return _scan_once

make_scanner = py_make_scanner
It is best to put the 'make_scanner' function and the new child class in the same file.
Just hit the same issue and in my case the problem was related to BOM (byte order mark) at the beginning of the file.
json.tool refused to process even an empty object (just curly braces) until I removed the UTF BOM.
What I did:
opened my JSON file with vim,
removed the byte order mark (:set nobomb),
saved the file.
This resolved the problem with json.tool. Hope this helps!
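If editing the file is not an option, Python 3 can also skip the BOM while reading by opening the file with the "utf-8-sig" codec. A small sketch follows. The file name and its content are made up, and the thread itself concerns Python 2.7, so treat this as an alternative for newer setups.

```python
import json
import os
import tempfile

# Create a made-up JSON file that starts with a UTF-8 byte order mark
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "wb") as f:
    f.write(b'\xef\xbb\xbf{"debug": true}')

# Decoding as plain UTF-8 keeps the BOM character, which json rejects
with open(path, "r", encoding="utf-8") as f:
    text_with_bom = f.read()
try:
    json.loads(text_with_bom)
    bom_rejected = False
except ValueError:
    bom_rejected = True

# "utf-8-sig" strips a leading BOM if one is present
with open(path, "r", encoding="utf-8-sig") as f:
    config = json.load(f)
```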
When you create the file, don't leave it empty. Write an empty JSON object into it instead:
json.dump({}, file)
You could use cjson, that claims to be up to 250 times faster than pure-python implementations, given that you have "some long complicated JSON file" and you will probably need to run it several times (decoders fail and report the first error they encounter only).
