Unable to convert json file in dataframe - python
I am building a recommendation engine. This JSON file contains event data, and I want to convert it into a dataframe. I tried the read_json method but it gives an error
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 21573281: character maps to <undefined>
Below is some entries from json:
{"_id":{"$oid":"57a30ce268fd0809ec4d194f"},"session":{"start_timestamp":{"$numberLong":"1470183490481"},"session_id":"def5faa9-20160803-001810481"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470183523054"},"event_type":"OfferViewed","event_timestamp":{"$numberLong":"1470183505399"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37","client_id":"ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"YU","model":"AO5510"},"attributes":{"Category":"120000","CustomerID":"4078","OfferID":"45436"}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1950"},"session":{"start_timestamp":{"$numberLong":"1470183490481"},"session_id":"def5faa9-20160803-001810481"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470183523054"},"event_type":"ContextMenuItemSelected","event_timestamp":{"$numberLong":"1470183500206"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37","client_id":"ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"YU","model":"AO5510"},"attributes":{"MenuItem":"OfferList","CustomerID":"4078"}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1951"},"session":{"start_timestamp":{"$numberLong":"1470183490481"},"session_id":"def5faa9-20160803-001810481"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470183523054"},"event_type":"CategoryPageCategorySelection","event_timestamp":{"$numberLong":"1470183499171"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37","client_id":"ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"YU","model":"AO5510"},"attributes":{"Category":"Recharge","CustomerID":"4078"}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1952"},"session":{"start_timestamp":{"$numberLong":"1470183490481"},"session_id":"def5faa9-20160803-001810481"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470183523054"},"event_type":"_session.start","event_timestamp":{"$numberLong":"1470183490481"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37","client_id":"ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"YU","model":"AO5510"},"attributes":{"CustomerID":"4078"}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1953"},"session":{"start_timestamp":{"$numberLong":"1470181311752"},"session_id":"def5faa9-20160802-234151752","stop_timestamp":{"$numberLong":"1470181484875"}},"metrics":{},"arrival_timestamp":{"$numberLong":"1470183523054"},"event_type":"_session.stop","event_timestamp":{"$numberLong":"1470183490480"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"5","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"2.0.0.0","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:2e26918b-f7b1-471e-9df4-b931509f7d37","client_id":"ee0b61b0-85cf-4b2f-960e-e2aedef5faa9"},"device":{"locale":{"country":"US","code":"en_US","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"YU","model":"AO5510"},"attributes":{}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1954"},"session":{"start_timestamp":{"$numberLong":"1470193238841"},"session_id":"7b606a93-20160803-030038841"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470193295093"},"event_type":"_session.start","event_timestamp":{"$numberLong":"1470193238844"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"2","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"1.0.2","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:e96515c9-5824-4c66-a42f-33cceb78b6e3","client_id":"efed74fd-40d8-41a2-b37e-e85c7b606a93"},"device":{"locale":{"country":"GB","code":"en_GB","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"samsung","model":"SM-J200G"},"attributes":{}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1955"},"session":{"start_timestamp":{"$numberLong":"1470193253960"},"session_id":"7b606a93-20160803-030053960","stop_timestamp":{"$numberLong":"1470193256359"}},"metrics":{},"arrival_timestamp":{"$numberLong":"1470193404776"},"event_type":"_session.stop","event_timestamp":{"$numberLong":"1470193278227"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"2","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"1.0.2","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:e96515c9-5824-4c66-a42f-33cceb78b6e3","client_id":"efed74fd-40d8-41a2-b37e-e85c7b606a93"},"device":{"locale":{"country":"GB","code":"en_GB","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"samsung","model":"SM-J200G"},"attributes":{}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1956"},"session":{"start_timestamp":{"$numberLong":"1470193253960"},"session_id":"7b606a93-20160803-030053960"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470193404776"},"event_type":"_session.start","event_timestamp":{"$numberLong":"1470193253960"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"2","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"1.0.2","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:e96515c9-5824-4c66-a42f-33cceb78b6e3","client_id":"efed74fd-40d8-41a2-b37e-e85c7b606a93"},"device":{"locale":{"country":"GB","code":"en_GB","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"samsung","model":"SM-J200G"},"attributes":{}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1957"},"session":{"start_timestamp":{"$numberLong":"1470193238841"},"session_id":"7b606a93-20160803-030038841","stop_timestamp":{"$numberLong":"1470193244581"}},"metrics":{},"arrival_timestamp":{"$numberLong":"1470193404776"},"event_type":"_session.stop","event_timestamp":{"$numberLong":"1470193253959"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"2","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"1.0.2","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:e96515c9-5824-4c66-a42f-33cceb78b6e3","client_id":"efed74fd-40d8-41a2-b37e-e85c7b606a93"},"device":{"locale":{"country":"GB","code":"en_GB","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"samsung","model":"SM-J200G"},"attributes":{}}
{"_id":{"$oid":"57a30ce268fd0809ec4d1958"},"session":{"start_timestamp":{"$numberLong":"1470193331290"},"session_id":"7b606a93-20160803-030211290"},"metrics":{},"arrival_timestamp":{"$numberLong":"1470193404776"},"event_type":"_session.start","event_timestamp":{"$numberLong":"1470193331291"},"event_version":"3.0","application":{"package_name":"com.think.vito","title":"Vito","version_code":"2","app_id":"7ffa58dab3c646cea642e961ff8a8070","cognito_identity_pool_id":"us-east-1:4d9cf803-0487-44ec-be27-1e160d15df74","version_name":"1.0.2","sdk":{"version":"2.2.2","name":"aws-sdk-android"}},"client":{"cognito_id":"us-east-1:e96515c9-5824-4c66-a42f-33cceb78b6e3","client_id":"efed74fd-40d8-41a2-b37e-e85c7b606a93"},"device":{"locale":{"country":"GB","code":"en_GB","language":"en"},"platform":{"version":"5.1.1","name":"ANDROID"},"make":"samsung","model":"SM-J200G"},"attributes":{}}
Wrong encoding. Explicitly read the file as utf-8, e.g. as follows (edit: this also strips stray line feeds, LF aka \n):
import json

import pandas

# The file is JSON Lines: one standalone JSON object per physical line,
# with no delimiter between objects, so the file as a whole is NOT a
# single valid JSON document.  Concatenating the lines and feeding the
# result to read_json would therefore fail; pandas handles this layout
# natively via lines=True.
#
# Opening with an explicit utf-8 codec avoids the UnicodeDecodeError:
# on Windows the default is 'charmap' (cp1252), which cannot decode
# byte 0x81.
with open(datafilename, encoding="utf8") as f:
    # One dataframe row per JSON object; nested objects (e.g. "session",
    # "device") end up as dict-valued cells.
    df = pandas.read_json(f, lines=True)
You could try using:
import json

# Parse the whole file into a Python object (dict/list) and echo it.
with open('myfile.json') as handle:
    d = json.load(handle)
print(d)
Without more info it's difficult to advise.
As the error says, you have an issue with the encoding. When you read in the file, you need to change the encoding:
# Open with an explicit utf-8 codec: the platform default ('charmap' on
# Windows) cannot decode byte 0x81.  Using a context manager ensures the
# handle is closed even if later processing raises.
with open(filename, encoding="utf8") as file:
    ...  # process the file here; it is closed automatically on exit
Related
Convert individual json objects without delimiter to valid json file which can be processed at once
I have a JSON Lines format text file in which each line contains a valid JSON object. However, these JSON objects are not separated by a delimiter, so the file as a whole is not a valid JSON file. I want to add a comma after each JSON object, so as to make the whole file a valid JSON file, which can be processed at once using json.load(). I have written the following code to add a comma at the end of each line: import json import csv testdata = open('resutdata.csv', 'wb') csvwriter = csv.writer(testdata) with open('data.json') as f: for line in f: csvwriter.writerow([json.loads(line), ',']) testdata.close() However, the csv file obtained wraps each line in quotes and adds a quoted comma at the end. How do I solve my problem?
As you need to convert json lines to json file, you can directly convert it into json file as follows: import json # Contains the output json file resultfile = open('resultdata.json', 'wt') data = [] with open('data.json') as f: for line in f: data.append(json.loads(line)) resultfile.write(json.dumps(data)) resultfile.close()
Error loading json in python line by line?
Here is my json file format, [{ "name": "", "official_name_en": "Channel Islands", "official_name_fr": "Îles Anglo-Normandes", }, and so on...... while loading the above json which is in a file I get this error, json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: here is my python code, import json data = [] with open('file') as f: for line in f: data.append(json.loads(line))
,} is not allowed in JSON (I guess that's the problem according to the data given).
You appear to be processing the entire file one line at a time. Why not simply use .read() to get the entire contents at once, then feed that to json? with open('file') as f: contents = f.read() data = json.loads(contents) Better yet, why not use json.load() to pass the readable directly and let it handle the slurping? with open('file') as f: data = json.load(f)
The problem is in your reading and decoding the file line by line. Any single line in your file (e.g., "[{") is not a valid JSON expression.
Your individual lines are not valid JSON. For instance, the first line '[{' by itself is not a valid JSON. If your entire file is actually valid JSON and you want individual lines, first load the entire JSON and then browse through the python dictionary. import json data = json.loads(open('file').read()) # this should be a list for list_item in data: print(list_item['name'])
Try to read a json file with python
I have a json file that is a synonym dictionary in French (I say French because I had an error message with ascii encoding... due to the accents 'é', etc.). I want to read this file with python to get a synonym when I input a word. Well, I can't even read my file... That's my code: data=[] with open('sortieDES.json', encoding='utf-8') as data_file: data = json.loads(data_file.read()) print(data) So I have a list quite ugly, but my question is: how can I use the file like a dictionary? I want to input data['Académie'] and have the list of synonyms... Here is an example of the json file: {"Académie française":{ "synonymes":["Institut","Quai Conti","les Quarante"] }
You only need to call json.load on the File object (you gave it the name data_file): data=[] with open('sortieDES.json', encoding='utf-8') as data_file: data = json.load(data_file) print(data)
Instead of json.load(line) you have to use json.loads(line) Your s is missing in loads(...)
Delete every non utf-8 symbols from string
I have a big amount of files and a parser. What I have to do is strip all non-utf-8 symbols and put the data in mongodb. Currently I have code like this. with open(fname, "r") as fp: for line in fp: line = line.strip() line = line.decode('utf-8', 'ignore') line = line.encode('utf-8', 'ignore') Somehow I still get an error bson.errors.InvalidStringData: strings in documents must be valid UTF-8: 1/b62010montecassianomcir\xe2\x86\x90ta0\xe2\x86\x90008923304320733/290066010401040101506055soccorin I don't get it. Is there some simple way to do it? UPD: seems like Python and Mongo don't agree about the definition of a valid UTF-8 string.
Try below code line instead of last two lines. Hope it helps: line=line.decode('utf-8','ignore').encode("utf-8")
For python 3, as mentioned in a comment in this thread, you can do: line = bytes(line, 'utf-8').decode('utf-8', 'ignore') The 'ignore' parameter prevents an error from being raised if any characters are unable to be decoded. If your line is already a bytes object (e.g. b'my string') then you just need to decode it with decode('utf-8', 'ignore').
Example to handle no utf-8 characters import string test=u"\n\n\n\n\n\n\n\n\n\n\n\n\n\nHi <<First Name>>\nthis is filler text \xa325 more filler.\nadditilnal filler.\n\nyet more\xa0still more\xa0filler.\n\n\xa0\n\n\n\n\nmore\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nfiller.\x03\n\t\t\t\t\t\t almost there \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nthe end\n\n\n\n\n\n\n\n\n\n\n\n\n" print ''.join(x for x in test if x in string.printable)
with open(fname, "r") as fp:
    for line in fp:
        # Trim the trailing newline and surrounding whitespace first.
        line = line.strip()
        # Python 2 idiom: decode the raw byte string as Windows-1252
        # (every byte 0x00-0xFF maps to some character, so this cannot
        # fail), then re-encode as utf-8 so the result is valid UTF-8.
        # NOTE(review): on Python 3 `str` has no .decode() — this
        # snippet assumes Python 2.
        line = line.decode('cp1252').encode('utf-8')
Valid JSON in text file but python json.loads gives "JSON object could be decoded"
I have a valid JSON (checked using Lint) in a text file. I am loading the json as follows test_data_file = open('jsonfile', "r") json_str = test_data_file.read() the_json = json.loads(json_str) I have verified the json data in file on Lint and it shows it as valid. However the json.loads throws ValueError: No JSON object could be decoded I am a newbie to Python so not sure how to do it the right way. Please help (I assume it has something to do it encoding the string to unicode format from utf-8 as the data in file is retrieved as a string)
I tried with open('jsonfile', 'r') and it works now. Also I did the following on the file json_newfile = open('json_newfile', 'w') json_oldfile = open('json_oldfile', 'r') old_data = json_oldfile.read() json.dump(old_data, json_newfile) and now I am reading the new file.