I am trying to rewrite a JSON file to add missing data values, but I can't seem to get the code to write the data back to the JSON file. Here is the code that fills in the missing data:
import pandas as pd
import json
data_df = pd.read_json("Data_test.json")
# Replace empty strings with NaN
df2 = data_df.mask(data_df == "")
# Fill the NaN values with the value from the row above
df2["Food_cat"] = df2["Food_cat"].ffill()
"Data_test.json" is the file with the list of dictionaries. I am trying to either edit this JSON file or create a new one with the missing data filled in.
I have tried using:
with open('complete_data', 'w') as f:
    json.dump(df2, f)
But it does not seem to work. Is there a way to edit the current data or create a new JSON file with the completed data?
This is the original data; I would like to keep this format.
Try this:
import pandas as pd
import json
data_df = pd.read_json("Data_test.json")
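# Replace empty strings with NaN
df2 = data_df.mask(data_df == "")
# Fill the NaN values with the value from the row above
df2["Food_cat"] = df2["Food_cat"].ffill()
# orient='records' writes a list of dictionaries, matching the original file's format
df2.to_json('path_of_file.json', orient='records', indent=4)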
Tell me if it works.
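A minimal, self-contained sketch of the same fix, using hypothetical sample records (the `Item` column and all values are made up) in place of Data_test.json. It shows that `orient="records"` writes the filled frame back as a list of dictionaries, the same shape as the input:

```python
import io
import json

import pandas as pd

# Hypothetical sample data shaped like Data_test.json:
# a list of dictionaries where some "Food_cat" values are empty strings.
raw = json.dumps([
    {"Food_cat": "Fruit", "Item": "Apple"},
    {"Food_cat": "", "Item": "Banana"},
    {"Food_cat": "Veg", "Item": "Carrot"},
    {"Food_cat": "", "Item": "Pea"},
])

df = pd.read_json(io.StringIO(raw))
# Turn empty strings into NaN so ffill treats them as missing
df = df.mask(df == "")
# Assign the result back instead of relying on inplace=True on a column
df["Food_cat"] = df["Food_cat"].ffill()

# orient="records" serializes to a list of dicts, like the input
out = df.to_json(orient="records")
records = json.loads(out)
print(records)
```

To write a file instead of a string, pass a path as the first argument, e.g. `df.to_json("Data_filled.json", orient="records", indent=4)` (the filename is hypothetical).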
I have downloaded a sample dataset from here that is a series of JSON objects.
{...}
{...}
I need to load them into a pandas DataFrame. I have tried the code below:
import pandas as pd
import json
filename = "sample-S2-records"
df = pd.DataFrame.from_records(map(json.loads, "sample-S2-records"))
But there seems to be a parsing error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
What am I missing?
You can try the pandas.read_json method:
import pandas as pd
data = pd.read_json('/path/to/file.json', lines=True)
print(data)
I have tested it with this file and it works fine.
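The `lines=True` option is meant for JSON Lines input, where each line is one complete JSON object and there is no enclosing list. A small sketch with made-up records, using `io.StringIO` as a stand-in for the downloaded file:

```python
import io

import pandas as pd

# Hypothetical JSON Lines content: one JSON object per line, no surrounding [ ]
jsonl = '{"id": 1, "title": "A"}\n{"id": 2, "title": "B"}\n'

# lines=True parses each line as a separate record (one row per line)
df = pd.read_json(io.StringIO(jsonl), lines=True)
print(df)
```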
The function needs a list of JSON objects, for example:
data = [json_obj_1, json_obj_2, ...]
The file does not contain list syntax; it is just a series of JSON objects. The following would solve the issue:
import pandas as pd
import json
# Load the file content into a variable
with open('../sample-S2-records/sample-S2-records', 'r') as content_file:
    content = content_file.read().strip()
# Split the content on newlines
content = content.split('\n')
# Parse each line (one JSON object per line) and store the objects in a list
json_list = []
for each_line in content:
    if each_line.strip():  # skip blank lines
        json_list.append(json.loads(each_line))
# Build the DataFrame directly from the list of dicts
df = pd.DataFrame(json_list)
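The question's own approach also works once `json.loads` is mapped over the file's lines rather than over the filename string. Mapping over a string feeds `json.loads` one character at a time, and the first character, `s`, is not valid JSON, which is why the error points at line 1 column 1. A sketch, with `io.StringIO` standing in for the opened file and hypothetical field names:

```python
import io
import json

import pandas as pd

# Stand-in for open("sample-S2-records"): a file-like object, one JSON object per line
f = io.StringIO('{"id": "a", "year": 2017}\n{"id": "b", "year": 2018}\n')

# Parse each non-empty line of the file, not each character of the filename
records = [json.loads(line) for line in f if line.strip()]
df = pd.DataFrame.from_records(records)
print(df)
```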
I am using Python 3.6 and trying to load a JSON file (350 MB) into a pandas DataFrame using the code below. However, I get the following error:
data_json_str = "[" + ",".join(data) + "]"
TypeError: sequence item 0: expected str instance, bytes found
How can I fix the error?
import pandas as pd
# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
    data = f.readlines()
# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)
# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"
# now, load it into pandas
data_df = pd.read_json(data_json_str)
From your code, it looks like you're loading a JSON file which has JSON data on each separate line. read_json supports a lines argument for data like this:
data_df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
Note: Remove lines=True if you have a single JSON object instead of individual JSON objects on each line.
If the file holds a single JSON document (not one object per line), you can use the json module to parse it into a Python object and then create a DataFrame from that:
import json
import pandas as pd
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)
If you open the file in binary mode ('rb'), you get bytes, and str.join() only accepts str items. Open it in text mode instead:
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = f.readlines()
As noted in this answer, you can also use pandas directly:
df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)
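If you want to keep the manual `"[" + ",".join(...) + "]"` approach from the question, the bytes lines read in `'rb'` mode have to be decoded to `str` before joining. A sketch with made-up records, where `io.BytesIO` stands in for the binary file and UTF-8 encoding is assumed:

```python
import io

import pandas as pd

# Stand-in for open(path, "rb"): iterating yields bytes lines
raw = io.BytesIO(b'{"name": "apple", "kcal": 52}\n{"name": "egg", "kcal": 155}\n')
lines = [line.rstrip() for line in raw]  # still bytes objects

# ",".join(lines) would raise TypeError, so decode each line (UTF-8 assumed) first
data_json_str = "[" + ",".join(line.decode("utf-8") for line in lines if line) + "]"

# Wrap the literal JSON string in StringIO before handing it to read_json
df = pd.read_json(io.StringIO(data_json_str))
print(df)
```

For a 350 MB file, `lines=True` avoids building the whole joined string by hand.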
If you want to convert it into a list of JSON objects (Python dicts), I think this will do what you want:
import json
data = []
with open('nutrients.json', errors='ignore') as f:
    for line in f:
        data.append(json.loads(line))
print(data[0])
The easiest way to read a JSON file with pandas is:
pd.read_json("sample.json", lines=True, orient='columns')
To deal with nested JSON like this:
[[{"Value1": 1}, {"value2": 2}], [{"value3": 3}, {"value4": 4}], ...]
use plain Python indexing:
value1 = df['column_name'][0][0].get("Value1")
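A minimal sketch of that indexing, assuming a hypothetical DataFrame whose `column_name` cells each hold a list of dicts:

```python
import pandas as pd

# Hypothetical frame: each cell of "column_name" holds a list of dicts
df = pd.DataFrame({
    "column_name": [
        [{"Value1": 1}, {"value2": 2}],
        [{"value3": 3}, {"value4": 4}],
    ]
})

# df["column_name"][0] is the list stored in row 0; [0] picks its first dict
value1 = df["column_name"][0][0].get("Value1")
print(value1)  # 1
```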
Please see the code below:
# Import the pandas library
import pandas as pd
# Set the file location: a URL or a file path of the JSON file
url = 'https://www.something.com/data.json'
# Load the JSON data from the file into a pandas DataFrame
df = pd.read_json(url, orient='columns')
# Print the top 10 rows of the DataFrame (for testing only)
print(df.head(10))
Please review the code and modify it as needed. I have added comments to explain each line. Hope this helps!
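With `orient='columns'`, pandas expects the JSON to be shaped as `{column -> {index -> value}}`. A small offline sketch with a made-up payload (no URL involved) shows what such input looks like:

```python
import io

import pandas as pd

# Hypothetical payload in "columns" orientation: {column -> {index -> value}}
payload = '{"city": {"0": "Oslo", "1": "Lima"}, "pop": {"0": 0.7, "1": 10.0}}'

# Each top-level key becomes a column; the inner keys become the row index
df = pd.read_json(io.StringIO(payload), orient="columns")
print(df.head(10))
```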
CSV file (stack.csv):
PROBLEM_CODE;OWNER_EMAIL;CALENDAR_YEAR;CALENDAR_QUARTER
CONFIG_ASSISTANCE;dalangle#gmail.com;2014;2014Q3
ERROR_MESSAGES;aganju#gmail.com;2014;2014Q3
PASSWORD_RECOV;dalangle#gmail.com;2014;2014Q3
ERROR_MESSAGES;biyma#gmail.com;2014;2014Q3
ERROR_MESSAGES;derrlee#gmail.com;2014;2014Q3
SOFTWARE_FAILURE;dalangle#gmail.com;2014;2014Q3
ERROR_MESSAGES;maariano#gmail.com;2014;2014Q3
SOFTWARE_FAILURE;dalangle#gmail.com;2014;2014Q3
My Code:
import pandas as pd
import csv
data = pd.read_csv('stack.csv', sep='delimiter')
min_indices = (data['OWNER_EMAIL'] == dalangle#gmail.com)
data = data[min_indices]
data.to_csv('isabevdata.csv')
Error:
KeyError: 'OWNER_EMAIL'
I need help with this pandas code. Later, I want to remove some columns from the result (isabevdata.csv) using the petl module and then send the table to the front end for display.
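A likely cause of the `KeyError`: with `sep='delimiter'`, pandas searches for the literal text "delimiter", never finds it, and reads each whole line into a single column named `PROBLEM_CODE;OWNER_EMAIL;CALENDAR_YEAR;CALENDAR_QUARTER`. The file is semicolon-separated, and the email in the filter needs to be a quoted string. A sketch using the first rows of the sample data:

```python
import io

import pandas as pd

# First rows of stack.csv; the file is semicolon-separated
csv_text = """PROBLEM_CODE;OWNER_EMAIL;CALENDAR_YEAR;CALENDAR_QUARTER
CONFIG_ASSISTANCE;dalangle#gmail.com;2014;2014Q3
ERROR_MESSAGES;aganju#gmail.com;2014;2014Q3
PASSWORD_RECOV;dalangle#gmail.com;2014;2014Q3
"""

# sep=";" splits on semicolons, so OWNER_EMAIL becomes its own column
data = pd.read_csv(io.StringIO(csv_text), sep=";")
# Compare against a quoted string literal
data = data[data["OWNER_EMAIL"] == "dalangle#gmail.com"]
print(data)
# data.to_csv("isabevdata.csv", index=False) would then write the filtered rows
```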
I recently discovered the power of pandas (thanks, Wes McKinney!). I have a CSV file that contains the following information:
RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11
Normally, I do not use pandas for this process. I use the csv module to build lists and convert the values with the datetime module. I then loop through each line and run something like the following to get the sorted index of each row:
'"' + ','.join(map(str, sorted(range(len(dates)), key=lambda k: dates[k]))) + '"'
It then returns something like this for each line:
Out[40]: '"1,0,2,3"'
I then add it to the end of each line as a new field in my CSV.
I can read the CSV into pandas and convert the items to a datetime dtype. I am just unsure how to get the sorted index values using pandas, flatten them into a string, and put them into a column. Any help is much appreciated!
You can use numpy.argsort() to get the sort index:
from io import StringIO
import numpy as np
import pandas as pd
txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29
2013-02-16,2013-02-12,2013-01-04,2013-02-11
2013-01-06,2013-02-07,2013-02-25,2013-02-12
2013-01-26,2013-01-28,2013-02-12,2013-01-10
2013-01-26,2013-02-10,2013-01-12,2013-01-30
2013-01-03,2013-01-24,2013-01-19,2013-01-02
2013-01-22,2013-01-13,2013-02-03,2013-02-05
2013-02-06,2013-01-16,2013-02-07,2013-01-11"""
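df = pd.read_csv(StringIO(txt))
# Argsort the underlying array row by row; ISO date strings sort correctly as text.
# kind="stable" keeps the left-most column first when two dates tie.
idx = pd.DataFrame(np.argsort(df.to_numpy(), axis=1, kind="stable"), index=df.index)
buf = StringIO()
idx.to_csv(buf, index=False, header=False)
print(buf.getvalue())
The output: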
1,0,2,3
3,2,1,0
2,1,0,3
2,3,1,0
0,1,3,2
3,0,1,2
2,0,3,1
3,0,2,1
1,0,2,3
3,1,0,2
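For the last part of the question (putting the flattened index into a column), a sketch that converts the columns to real dates first, computes each row's order with `np.argsort`, and joins it into a string. It uses only the first three sample rows, and the column name `SORTED_INDEX` is hypothetical:

```python
from io import StringIO

import numpy as np
import pandas as pd

# First three rows of the sample CSV from the question
txt = """RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
2013-01-24,2013-01-02,2013-01-30,2013-02-03
2013-01-30,2013-01-21,2013-01-13,2013-01-06
2013-01-29,2013-01-28,2013-01-01,2013-01-29"""

df = pd.read_csv(StringIO(txt))
# Convert every column to datetime64 so the comparison is on real dates
dates = df.apply(pd.to_datetime)
# Column positions of each row in ascending date order; "stable" keeps ties left to right
order = np.argsort(dates.to_numpy(), axis=1, kind="stable")
# Flatten each row's positions into a comma-separated string (hypothetical column name)
df["SORTED_INDEX"] = [",".join(map(str, row)) for row in order]

buf = StringIO()
df.to_csv(buf, index=False)  # values containing commas are quoted automatically
print(buf.getvalue())
```

Because the joined values contain commas, `to_csv` wraps them in double quotes, which matches the manually quoted field the question was producing.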