Creating JSON from multiple dataframes in Python

My code works perfectly fine for one dataframe using to_json.
However, now I would like to have a second dataframe in the result.
So I thought creating a dictionary would be the answer.
However, it produces the result below, which is not practical.
Any help please?
I was hoping to produce something a lot prettier without all the "\" escapes.
A simple good example:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
df.to_json(orient='records')
A simple bad example (to_json returns a string, so it ends up escaped inside the dict):
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
{"result_1": df.to_json(orient='records')}
I also tried
jsonify({"result_1": df.to_json(orient='records')})
and
{"result_1": [df.to_json(orient='records')]}

Hi, I think you are on the right track.
My advice is to also use json.loads to decode the JSON and build a list of dictionaries.
As you said, we can create a pandas dataframe and then use df.to_json to convert it.
Then use json.loads on the JSON string and insert the resulting list into a dictionary, e.g.:
import json

data = {}
jsdf = df.to_json(orient="records")
data["result"] = json.loads(jsdf)
Adding elements to the dictionary like this, you will end up with a structure of the form:
{"result1": [{...}], "result2": [{...}]}
PS:
If you want to generate random values for different dataframes, you can use Python's faker library, e.g.:
from faker import Faker
import pandas as pd

faker = Faker()
rows = []
for n in range(5):
    rows.append(list(faker.profile().values()))
df = pd.DataFrame(rows, columns=faker.profile().keys())


How to prevent NaN when using str.lower in Python?

I'm looking to convert a column to lower case. The issue is there are some instances where the string within the column only contains numbers. In my real life case this is due to poor data entry. Instead of having these values converted to NaN, I would like to keep the numeric string as is. What is the best approach to achieving this?
Below is my current code and output
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
df['col'].str.lower()
Current output (the integer row becomes NaN):
0    g5051
1    g5052
2      NaN
3    g5054
Name: col, dtype: object
Desired output (the numeric string is kept as-is):
0    g5051
1    g5052
2     5053
3    g5054
Name: col, dtype: object
Just convert the column to strings first:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
print(df['col'].astype(str).str.lower())
Alternatively, pre-define the data as str when constructing the DataFrame:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']}, dtype=str)
print(df['col'].str.lower())
To add a slight variation to Tim Roberts' solution, without using the .str accessor:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
print(df['col'].astype(str).apply(lambda x: x.lower()))
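If you would rather not convert the numeric entries to strings at all, here is a small sketch that lowercases only the string values and passes everything else through unchanged (so 5053 stays an int):
import pandas as pd
df = pd.DataFrame({'col': ['G5051', 'G5052', 5053, 'G5054']})
# lowercase only the strings; leave other types untouched
print(df['col'].map(lambda x: x.lower() if isinstance(x, str) else x))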

Create multiple empty DataFrames named from a list using a loop

I'm trying to create multiple empty DataFrames with a for loop, where each DataFrame has a unique name stored in a list. Per the sample code below, I would like three empty DataFrames: one called A, another B, and the last one C. Thank you.
import pandas as pd
report=['A','B','C']
for i in report:
    report[i] = pd.DataFrame()
It would be best to use a dictionary:
import pandas as pd
report = ['A', 'B', 'C']
df_dict = {}
for i in report:
    df_dict[i] = pd.DataFrame()
print(df_dict['A'])
print(df_dict['B'])
print(df_dict['C'])
You should use a dictionary for that:
import pandas as pd
report = {'A': pd.DataFrame(), 'B': pd.DataFrame(), 'C': pd.DataFrame()}
If you have a list of strings containing the names, which I think is what you are really trying to do:
name_dataframe = ['A', 'B', 'C']
dict_dataframe = {}
for name in name_dataframe:
    dict_dataframe[name] = pd.DataFrame()
It is not good practice, and you should probably use a dictionary instead, but the code below gets the job done if you still need it; it creates the DataFrames in memory with the names from the list report:
for i in report:
    exec(i + ' = pd.DataFrame()')
And if you also want to store the empty DataFrames in a list:
df_list = []
for i in report:
    exec(i + ' = pd.DataFrame()\ndf_list.append(' + i + ')')
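If you want to avoid building code strings, here is a sketch of the same idea using globals() instead of exec (still not recommended over a plain dictionary):
import pandas as pd
report = ['A', 'B', 'C']
for name in report:
    # assigning into the module namespace creates top-level names A, B, C
    globals()[name] = pd.DataFrame()
print(A)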

how to display/view `sklearn.utils.Bunch` data set?

I am going through a tutorial that uses sklearn.utils.Bunch as a data set:
from sklearn.datasets import fetch_california_housing
cal_housing = fetch_california_housing()
I'm running this on a Databricks notebook.
I've read through the documentation that I can find like
https://scikit-learn.org/stable/modules/generated/sklearn.utils.Bunch.html and search engines aren't yielding anything useful.
but how can I see/view what's in this data set?
If I understood correctly, you can convert it to a pandas dataframe:
from sklearn.datasets import fetch_california_housing
import pandas as pd

df = fetch_california_housing()
calf_hous_df = pd.DataFrame(data=df.data, columns=df.feature_names)
calf_hous_df.sample(4)
Moreover, you can inspect its attributes:
df.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR'])
The sklearn.utils.Bunch data can also be viewed by using pandas to turn it into a dataframe:
data = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
data
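If you also want the regression target alongside the features, here is a small sketch; the Bunch exposes target and a DESCR text you can print (attribute names follow the keys shown above):
import pandas as pd
from sklearn.datasets import fetch_california_housing

cal_housing = fetch_california_housing()
full_df = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
full_df['target'] = cal_housing.target  # median house value per district
print(full_df.head())
print(cal_housing.DESCR[:500])  # first part of the bundled dataset description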

How to Create id for Mapping with plotly.express

I have a dataframe "states" that has each state's child poverty rate, and a JSON file called "us_states". I want to create a choropleth map using plotly express, but I'm struggling to create the id column. Here is my entire code.
import pandas as pd
import json
import plotly.express as px
states = pd.read_csv('https://raw.githubusercontent.com/ngpsu22/Child-Poverty-State-Map/master/poverty_rate_map.csv')
us_states = pd.read_json('https://github.com/ngpsu22/Child-Poverty-State-Map/raw/master/gz_2010_us_040_00_500k.json')
state_id_map = {}
for feature in us_states['features']:
    feature['id'] = feature['properties']['NAME']
    state_id_map[feature['properties']['STATE']] = feature['id']
states['id'] = states['state'].apply(lambda x: state_id_map[x])
But I get this error:
KeyError: 'Maine'
Since Maine is the first state in my dataframe, something is going wrong right from the start.
Any suggestions?
Each entry in us_states.features is a dict.
Use pd.json_normalize to extract those dicts into a dataframe.
'geometry.coordinates' for each row is a large nested list.
It's not clear what the loop is supposed to do; as written, state_id_map is keyed by the numeric STATE code rather than the name, so state_id_map['Maine'] raises the KeyError. The data from the two dataframes can instead be joined together for easier access, using pd.merge.
us_states = pd.read_json('https://github.com/ngpsu22/Child-Poverty-State-Map/raw/master/gz_2010_us_040_00_500k.json')
# convert the dict to dataframe
us_states_features = pd.json_normalize(us_states.features, sep='_')
# the NAME column is then addressed with
us_states_features['properties_NAME']
# join the two dataframe into one
df = pd.merge(states, us_states_features, left_on='state', right_on='properties_NAME')
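From there, here is a hedged sketch of the actual map call; plotly wants the raw GeoJSON dict rather than the normalized dataframe, and the poverty-rate column name ('child_poverty_rate') is an assumption you should check against states.columns:
import requests
import plotly.express as px

url = 'https://github.com/ngpsu22/Child-Poverty-State-Map/raw/master/gz_2010_us_040_00_500k.json'
us_states_geojson = requests.get(url).json()  # the raw GeoJSON dict

fig = px.choropleth(
    states,
    geojson=us_states_geojson,
    locations='state',               # state names in the dataframe
    featureidkey='properties.NAME',  # matched against each feature's NAME property
    color='child_poverty_rate',      # assumed column name
    scope='usa',
)
fig.show()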

Convert pandas dataframe to JSON or dict and then back to df with non-unique columns

I need to send a dataframe from a backend to a frontend, so I first need to convert it either to an object that is JSON serialisable or directly to JSON. The problem is that some of my dataframes don't have unique columns. I've looked into the orient parameter and the to_json(), to_dict() and from_dict() methods, but still can't get it to work.
The goal is to be able to convert the df to something JSON serialisable and then back to its initial self.
I'm also having a hard time copy-pasting it using pd.read_clipboard, so I've included a sample df causing problems as an image (sorry!).
I found a way to make it work.
Here is a simple reproducible example:
import pandas as pd
import json
# create simple df with two identical named columns
df = pd.DataFrame([[1, 2, 3, 4]], columns=['col1', 'col2', 'col1', 'col2'])
# orient='split' preserves column order
jsonized_df = df.to_json(orient='split')
# suppose the df is part of a bigger data structure being sent to another app
random_dict = {'foo': 'bar'}
all_data = [random_dict, jsonized_df]
data_to_frontend = json.dumps(all_data)
# then from the other app
all_data = json.loads(data_to_frontend)
final_df = pd.read_json(all_data[1], orient='split')  # important: pass the same orient when reading the JSON back
The final_df will be identical to the initial df, with column order preserved!
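As a quick sanity check that the round trip really is lossless (note that on recent pandas, 2.1 and later, you may want to wrap the string in io.StringIO when calling pd.read_json to avoid a FutureWarning):
# verify values, columns, and order all survived the round trip
assert final_df.equals(df)
assert list(final_df.columns) == list(df.columns)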
