How to parse twitter feed data using json.reads(line) - python

I've downloaded a big stream of twitter data in JSON format and saved it to a text file. I now want to read it in, line by line, and decode it into a dictionary using json.reads().
My only problem is that it throws an error on the first line, which I assume means the function doesn't think the data is JSON? I have added the line I want to decode at the bottom of this post. When I just print the lines the code works fine, its only the json.reads() function that throws an error.
Here is the code:
def decodeJSON(tweet_data):
for line in tweet_data:
parsedJSON = json.loads(line)
print(parsedJSON) # I just want to print for now to confirm it works.
Here is the error:
File "/Users/cc756/Dropbox/PythonProjects/TwitterAnalysisAssignment/tweet_sentiment.py", line 17, in analyseSentiment
parsedJSON = json.loads(line) File "/Users/cc756/anaconda/envs/tensorflow/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s) File "/Users/cc756/anaconda/envs/tensorflow/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/Users/cc756/anaconda/envs/tensorflow/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Here is the first string:
'b\\'{"delete":{"status":{"id":805444624881811457,"id_str":"805444624881811457","user_id":196129140,"user_id_str":"196129140"},"timestamp_ms":"1500994305560"}}\\''
I feel like it should work, I've been staring at this for an hour with no improvement!

Your strings are in the wrong format. I'm not sure what you need to do to get rid of the 'b\\'(which doesn't really make sense) at the beginning, but manually typing it in to the shell gives me this:
In [119]: json.loads(b'{"delete":{"status":{"id":805444624881811457,"id_str":"80
...: 5444624881811457","user_id":196129140,"user_id_str":"196129140"},"time
...: stamp_ms":"1500994305560"}}')
Out[119]:
{u'delete': {u'status': {u'id': 805444624881811457,
u'id_str': u'805444624881811457',
u'user_id': 196129140,
u'user_id_str': u'196129140'},
u'timestamp_ms': u'1500994305560'}}
Sorry, I'd make a comment, but imagine this post in a comment... :)
I'm not sure what's up with your pasting of the string into the question, but it's following an invalid format for Python, so you may want to correct that.
UPDATE: The issue was that the data was in binary format and just needed to be decoded with data.decode('utf-8')

Related

Python: ValueError: Expecting , delimiter: line 1 when loading json

sample_json_obj = '{"foo":{"IdentityDocuments":[{"bar":"{\"baz\":\"qux\"}"}]}}'
This is syntactically accurate for a JSON string.
However when I try to load into a variable for further processing,
json.loads(sample_json_obj)
I get the following stacktrace
Traceback (most recent call last):
File "test.py", line 5, in <module>
json.loads(sample_json_obj)
File "/usr/local/Cellar/python#2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 40 (char 39)
I read through a few StackOverflow posts trying to fix this and one of the popular ones suggest mutating the string to the following format
sample_json_obj = r'{"foo":{"IdentityDocuments":[{"bar":"{\"baz\":\"qux\"}"}]}}'
which works as expected but my only problem is that since I am getting this JSON object directly from a query and I am not able to add the r in front of it i.e convert it to a raw string although I have tried several other stack posts including trying formatting, none worked.
Can someone please help me?
Would be great if someone could help me without modifying the first line i.e
sample_json_obj = '{"foo":{"IdentityDocuments":[{"bar":"{\"baz\":\"qux\"}"}]}}'

How to convert a string to a list of string? [duplicate]

This question already has answers here:
How to convert string representation of list to a list
(19 answers)
Closed 2 years ago.
I have a string:
s = "['9780310714675', '9780763630614', '9781416925330', '9780310714569']"
when loaded into memory, this becomes:
'[\'9780310714675\', \'9780763630614\', \'9781416925330\', \'9780310714569\']'
I wonder how to convert it to a list of strings:
strList = ['9780310714675', '9780763630614', '9781416925330', '9780310714569']
I have tried to do this using:
strList = json.loads(s)
but then I got the following error:
Traceback (most recent call last):
File "/home/xxx/dev/python/data-processing/MongoDB/query.py", line 101, in <module>
isbnList = marcQuery.stringToStringList(isbnString)
File "/home/xxx/dev/python/data-processing/MongoDB/query.py", line 74, in stringToStringList
strList = json.loads(s)
File "/home/xxx/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/home/xxx/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/xxx/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
Can anyone give me some hints on how to do this?
Just use the eval() function. It will evaluate a string as if it was python expression.
strList = eval(s)
Edit:
Some people argue ast.literal_eval is better. And this is 100% true for production code and any case where you can't trust the source of the input. However when learning python and writing hobby projects, you know you can trust the input because you wrote it.
import ast
def read_value(string):
try:
return ast.literal_eval(string)
except:
print("Unsafe to eval: \"" + string "\"")
I don't believe telling new programmers to always follow proper data safety is helpful while learning. In fact, I would rather they experiment and learn from mistakes on their own.
"I know you don't understand how this works yet, but just copy this magic code block instead because it is safer"
Reference on why they are different: https://stackoverflow.com/a/15197698/5987669

Python json.load(file) error with valid JSON

I have a question concerning an issue I ran into while using the json lib in Python.
I'm tying to read a json file using the json.load(file) command using the following code:
import json
filename= '../Data/exampleFile.json'
histFile= open(filename, 'w+')
print(json.load(histFile))
The JSON file I am trying to read is valid according to some website I found: a screenshot of that validation, because I'm new and still lack the reputation...
The error message I'm getting is the following:
File ".\testLoad.py", line 5, in <module>
print(json.load(histFile))
File "C:\Users\...\Python\Python37\lib\json\__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Users\...\Python\Python37\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\...\Python\Python37\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\...\Python\Python37\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Alright, so I believe it is not the file that is the issue, but the json.load(file) works for me in other cases.
Sadly I was not able to figure this error-message out on my own, so it would be amazing if someone with some more experience dealing with Python-JSON interaction could maybe help me out.
You opened the file for writing:
histFile= open(filename, 'w+')
# ^^^^
The w mode first truncates the file, so the file is empty (it doesn't matter here that the file can also be read from, the + sees to that but the file is truncated nonetheless). See the open() function documentation:
'w': open for writing, truncating the file first)
There is no JSON data in it to parse. This is why the exception tells you that parsing failed at the very start of the file:
Expecting value: line 1 column 1 (char 0)
There is no data in line one, column one.
If you wanted to open a file for both reading and writing without truncating it first, use 'r+' as the file mode.

What am I doing wrong with requests in python: ValueError: Expecting value: line 1 column 1 (char 0)?

I'm not sure even how to ask the question, as it seems it would require quite a lot of code to get into the details. Rather than show the code, I will discuss the behavior when I run.
I am using requests to grab information from an online database. When I run a for loop to go through all of my entries, I get an error like the one below on one of the first 20 entries (usually the first, but not always). The entries in the list are all alike (just different ID numbers). I am using sleep() to ensure that I do not go beyond my rate limit (I have tried increasing sleep to ridiculous wait times, but still get the error). What really surprises me is that it works some, and then gets stuck.... what could cause that?
Also, the code was working before, then I made a large number of edits to other code in the same file, but I didn't think I edited anything related to this.
Traceback (most recent call last):
File "C:/Users/Mark/PycharmProjects/Riot_API_Challenger_Stats/Main.py", line 233, in <module>
main()
File "C:/Users/Mark/PycharmProjects/Riot_API_Challenger_Stats/Main.py", line 212, in main
match_histories=get_match_histories(challenger_Ids+master_Ids)
File "C:/Users/Mark/PycharmProjects/Riot_API_Challenger_Stats/Main.py", line 62, in get_match_histories
match_histories[summoner_Ids[i]]=api.get_match_history_data(summoner_Ids[i])
File "C:\Users\Mark\PycharmProjects\Riot_API_Challenger_Stats\RiotAPI.py", line 52, in get_match_history_data
return self._request(api_url)
File "C:\Users\Mark\PycharmProjects\Riot_API_Challenger_Stats\RiotAPI.py", line 25, in _request
return response.json()
File "C:\Users\Mark\Anaconda3\lib\site-packages\requests\models.py", line 819, in json
return json.loads(self.text, **kwargs)
File "C:\Users\Mark\Anaconda3\lib\json\__init__.py", line 318, in loads
return _default_decoder.decode(s)
File "C:\Users\Mark\Anaconda3\lib\json\decoder.py", line 343, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Mark\Anaconda3\lib\json\decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
Here are lines 10-25 of RiotAPI
def _request(self, api_url, params={}):
args = {'api_key':self.api_key}
for key, value in params.items():
if key not in args:
args[key] = value
#requests.get accesses the URL
response = requests.get(
Consts.URL['base'].format(
proxy=self.region,
region=self.region,
url=api_url
),
params=args
)
print(response.url)
return response.json()
Here is the response:
{"matches":[{"matchId":1878534497,"region":"NA","platformId":"NA1","matchMode":"CLASSIC","matchType":"MATCHED_GAME","matchCreation":1436223958539,"matchDuration":2097,"queueType":"RANKED_SOLO_5x5","mapId":11,"season":"SEASON2015","matchVersion":"5.12.0.348","participants":[{"teamId":200,"spell1Id":4,"spell2Id":7,"championId":15,"highestAchievedSeasonTier":"UNRANKED","timeline":{"creepsPerMinDeltas":{"zeroToTen":5.699999999999999,"tenToTwenty":6.9,"twentyToThirty":7.1},"xpPerMinDeltas":{"zeroToTen":358.5,"tenToTwenty":350.0,"twentyToThirty":364.20000000000005},"goldPerMinDeltas":{"zeroToTen":365.3,"tenToTwenty":337.5,"twentyToThirty":287.5},"csDiffPerMinDeltas":{"zeroToTen":-0.7,"tenToTwenty":-1.7000000000000004,"twentyToThirty":1.0999999999999999},"xpDiffPerMinDeltas":{"zeroToTen":-0.9000000000000057,"tenToTwenty":-114.75,"twentyToThirty":-121.19999999999999},"damageTakenPerMinDeltas":{"zeroToTen":480.5,"tenToTwenty":565.3,"twentyToThirty":1258.6},"damageTakenDiffPerMinDeltas":{"zeroToTen":-147.49999999999994,"tenToTwenty":-134.69999999999996,"twentyToThirty":15.0},"role":"DUO_CARRY","lane":"BOTTOM"},"masteries":[{"masteryId":4112,"rank":4},{"masteryId":4114,"rank":1},{"masteryId":4122,"rank":3},{"masteryId":4124,"rank":1},{"masteryId":4132,"rank":1},{"masteryId":4134,"rank":3},{"masteryId":4142,"rank":2},{"masteryId":4144,"rank":1},{"masteryId":4151,"rank":1},{"masteryId":4152,"rank":3},{"masteryId":4162,"rank":1},{"masteryId":4211,"rank":2},{"masteryId":4212,"rank":2},{"masteryId":4221,"rank":1},{"masteryId":4222,"rank":3},{"masteryId":4232,"rank":1}],"stats":{"winner":false,"champLevel":14,"item0":3031,"item1":0,"item2":3142,"item3":3035,"item4":1053,"item5":3250,"item6":3342,"kills":4,"doubleKills":1,"tripleKills":0,"quadraKills":0,"pentaKills":0,"unrealKills":0,"largestKillingSpree":3,"deaths":12,"assists":5,"totalDamageDealt":184710,"totalDamageDealtToChampions":27477,"totalDamageTaken":30740,"largestCriticalStrike":684,"totalHeal":2952,"minionsKilled":237,"neutralMinionsKilled":1,"neutralMinionsKilledTeamJungle":1,"neutralMinionsKilledEnemyJungle":0,"goldEarned":12074,"goldSpent":12065,"combatPlayerScore":0.....etc.}}]}]}
It'd be helpful if you can include a snippet of your code to see how you've actually used it.
However, from the exception, I think you're trying to do a
return response.json()
at line 25 of C:\Users\Mark\PycharmProjects\Riot_API_Challenger_Stats\RiotAPI.py
but the response is not in JSON format.
You can see the raw response with
print response.text
to see the string version of the response and check that the string is in JSON format.
It would be nice if you could post the actual code. But if you are unable to do so due to confidential reasons we can surmise some information from the stack trace.
You are using some HTTP protocol (SOAP/ResT) API to get a number( or series of numbers) in JSON format. One of these ID numbers has a character that is not expected or the JSON is invalid itself. Try and print the JSON request you receive before you pass it to see which one fails. Then create a unit test and run it try to analyze it with breakpoints.
Could be some sort of hyphenated or foreign character based on the database.

json.JSONDecoder().decode() can not work well

code is simple, but it can not work. I don't know the problem
import json
json_data = '{text: \"tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=\", key: \"MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW\"}'
my_data = json.JSONDecoder().decode(json_data)
print my_data
throw exption behinde:
Traceback (most recent call last):
File "D:\Python27\project\demo\digSeo.py", line 4, in <module>
my_data = json.JSONDecoder().decode(json_data)
File "D:\Python27\lib\json\decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\Python27\lib\json\decoder.py", line 382, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
Your json_data is not valid JSON.
In JSON, property names need to be in double quotes ("). Also, the double quotes terminating the string values don't need to be ecaped since you're already using single quotes (') for the string.
Example:
json_data = '{"text": "tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=", "key": "MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW"}'
The json module in Python standard library can work well, that's what a lot of people are using for their applications.
However these few lines of code that use this module have a small issue. The problem is that your sample data is not a valid JSON. The keys (text and key) should be quoted like this:
json_data = '{"text": \"tl4ZCTPzQD0k|rEuPwudrAfgBD3nxFIsSbb4qMoYWA=\", "key": \"MPm0ZIlk9|ADco64gjkJz2NwLm6SWHvW\"}'

Categories

Resources