How do I convert a list/dictionary into a Dataframe?


I have a JSON response (sample below) that I'm trying to convert into a DataFrame. I've had several issues with the data being listed as columns (1 x 346), etc. I only need the 5 columns listed below:
area_name,
date,
month,
unemployment_rate,
year
Here's my code:
edd_ca_df = pd.DataFrame.from_dict(edd_ca, orient="index",
columns=["area_name", "month", "date", "year", "unemployment_rate"])
and here's a sample of the JSON response:
[[{'area_name': 'California',
'area_type': 'State',
'date': '1990-01-01T00:00:00.000',
'employment': '14099700',
'labor_force': '14953900',
'month': 'January',
'seasonally_adjusted_y_n': 'N',
'status_preliminary_final': 'Final',
'unemployment': '854200',
'unemployment_rate': '5.7',
'year': '1990'},
{'area_name': 'California',
'area_type': 'State',
'date': '1990-02-01T00:00:00.000',
'employment': '14206700',
'labor_force': '15049400',
'month': 'February',
'seasonally_adjusted_y_n': 'N',
'status_preliminary_final': 'Final',
'unemployment': '842800',
'unemployment_rate': '5.6',
'year': '1990'},
Any help would be greatly appreciated.

Since you have a list of dictionaries, this is as simple as passing all the data to a new DataFrame and specifying what columns you want to keep:
import pandas as pd
all_data = [{'area_name': 'California',
'area_type': 'State',
'date': '1990-01-01T00:00:00.000',
'employment': '14099700',
'labor_force': '14953900',
'month': 'January',
'seasonally_adjusted_y_n': 'N',
'status_preliminary_final': 'Final',
'unemployment': '854200',
'unemployment_rate': '5.7',
'year': '1990'},
{'area_name': 'California',
'area_type': 'State',
'date': '1990-02-01T00:00:00.000',
'employment': '14206700',
'labor_force': '15049400',
'month': 'February',
'seasonally_adjusted_y_n': 'N',
'status_preliminary_final': 'Final',
'unemployment': '842800',
'unemployment_rate': '5.6',
'year': '1990'}]
keep_columns = ['area_name','date','month','unemployment_rate','year']
df = pd.DataFrame(columns=keep_columns, data=all_data)
print(df)
Output
area_name date month unemployment_rate year
0 California 1990-01-01T00:00:00.000 January 5.7 1990
1 California 1990-02-01T00:00:00.000 February 5.6 1990
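The sample in the question opens with `[[`, which suggests the response may be a list of records wrapped in an outer list. Passing that shape straight to a DataFrame would produce a single row with one column per record, which may explain the 1 x 346 result. A minimal sketch under that assumption follows; `edd_ca` here is a shortened, hypothetical stand-in for the real response, and the type conversions at the end are optional.

```python
import pandas as pd

# Hypothetical stand-in for the response: records wrapped in an outer list
edd_ca = [[
    {'area_name': 'California', 'area_type': 'State',
     'date': '1990-01-01T00:00:00.000', 'month': 'January',
     'unemployment_rate': '5.7', 'year': '1990'},
    {'area_name': 'California', 'area_type': 'State',
     'date': '1990-02-01T00:00:00.000', 'month': 'February',
     'unemployment_rate': '5.6', 'year': '1990'},
]]

keep_columns = ['area_name', 'date', 'month', 'unemployment_rate', 'year']

# Unwrap the outer list so that each dictionary becomes one row
records = edd_ca[0]
edd_ca_df = pd.DataFrame(records, columns=keep_columns)

# The values arrive as strings; convert them where real types are needed
edd_ca_df['unemployment_rate'] = edd_ca_df['unemployment_rate'].astype(float)
edd_ca_df['date'] = pd.to_datetime(edd_ca_df['date'])

print(edd_ca_df.shape)
```

If the real response is not nested this way, the `edd_ca[0]` step should be dropped.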

Related

How to extract information in a dictionary in json

data = {'gems': [{'name': 'garnet', 'colour': 'red', 'month': 'January'},
{'name': 'emerald', 'colour': 'green', 'month': 'May'},
{'name': "cat's eye", 'colour': 'yellow', 'month': 'June'},
{'name': 'sardonyx', 'colour': 'red', 'month': 'August'},
{'name': 'peridot', 'colour': 'green', 'month': 'September'},
{'name': 'ruby', 'colour': 'red', 'month': 'December'}]}
How do I create a list of colours and then just find the months with the colour red?
I've tried for and if, but I keep getting the error message
string indices must be integers

Because you have dictionaries within a list, you can use a list comprehension with an if condition to filter out the values you don't want:
[x['month'] for x in data['gems'] if x['colour'] == 'red']
Returns:
['January', 'August', 'December']
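The question also asks for a list of all colours and mentions the error "string indices must be integers". The failing code is not shown, so the cause is a guess: a likely one is looping over `data` itself, which yields the key `'gems'` (a string) rather than the gem dictionaries. A small sketch of both lists, with the variable names `colours` and `red_months` chosen only for illustration:

```python
data = {'gems': [{'name': 'garnet', 'colour': 'red', 'month': 'January'},
                 {'name': 'emerald', 'colour': 'green', 'month': 'May'},
                 {'name': "cat's eye", 'colour': 'yellow', 'month': 'June'},
                 {'name': 'sardonyx', 'colour': 'red', 'month': 'August'},
                 {'name': 'peridot', 'colour': 'green', 'month': 'September'},
                 {'name': 'ruby', 'colour': 'red', 'month': 'December'}]}

# Iterating over the dict itself would yield its keys (the string 'gems'),
# and indexing a string with 'colour' raises a TypeError.
# Iterate over the list stored under 'gems' instead.
colours = [gem['colour'] for gem in data['gems']]

# Keep only the months whose gem colour is red
red_months = [gem['month'] for gem in data['gems'] if gem['colour'] == 'red']

print(colours)
print(red_months)
```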

If you want the output as a pandas object, you can use pandas.json_normalize and pandas.DataFrame.query as follows (selecting ['month'] returns a Series rather than a DataFrame):
df = pd.json_normalize(data['gems']).query('colour == "red"')['month']
[Out]:
0 January
3 August
5 December
If you want the index to be reset, chain reset_index(drop=True) onto the result:
df = pd.json_normalize(data['gems']).query('colour == "red"')['month'].reset_index(drop=True)
[Out]:
0 January
1 August
2 December

How to distinct (count), group by and sum data in DataFrame in Python?

I have the following DataFrame:
a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1}, {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1}, {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9}, {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1}, {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)
print(df)
Grouping by the columns name and date, I need to count the distinct values in the column order and store that count in a new column order_count, and also sum the values in the column sum.
I need to get the following result:

In your case do
out = df.groupby(['name','date'],as_index=False).agg({'sum':'sum','order':'nunique'})
Out[652]:
name date sum order
0 A 20220501 34.1 2
1 B 20220502 77.6 3
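The result above keeps the column name order, while the question asks for order_count. One way to get that name directly is pandas named aggregation, sketched here with the data from the question; this is a variation on the answer above, not a different method.

```python
import pandas as pd

a = [{'order': '789', 'name': 'A', 'date': 20220501, 'sum': 15.1},
     {'order': '456', 'name': 'A', 'date': 20220501, 'sum': 19},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 14.1},
     {'order': '704', 'name': 'B', 'date': 20220502, 'sum': 22.9},
     {'order': '700', 'name': 'B', 'date': 20220502, 'sum': 30.1},
     {'order': '710', 'name': 'B', 'date': 20220502, 'sum': 10.5}]
df = pd.DataFrame(a)

# Named aggregation: output_column=(input_column, aggregation function)
out = df.groupby(['name', 'date'], as_index=False).agg(
    sum=('sum', 'sum'),                # total of 'sum' per group
    order_count=('order', 'nunique'),  # number of distinct orders per group
)
print(out)
```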

import pandas as pd
# Sum 'sum' per (name, date), then join the number of distinct orders as 'order_count'
grouped = df.groupby(['name', 'date'])
out = grouped['sum'].sum().to_frame().join(grouped['order'].nunique().rename('order_count')).reset_index()

pandas : drop duplicates in the same time when grouping by

I'm doing a simple groupby on my data, as shown in the code below. Is there a way to do it directly in a single line of code, without the drop_duplicates step?
Thank you.
df_brut['Revenue'] = df_brut.groupby(['cod', 'date', 'zone'])['Revenue'].transform('sum')
df_brut = df_brut.drop_duplicates()
df_brut.columns = ['cod','date', 'zone','SUM_']
My data
data1 = {'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-07', '2021-07'], 'cod': ['12', '12', '14', '15', '15', '18'], 'zone': ['LA', 'LA', 'LA', 'PARIS', 'PARIS', 'PARIS'], 'Revenue': [10, 20, 30, 50, 40, 10]}
df_brut= pd.DataFrame(data1)
The expected grouped data is:
data2 = {'date': ['2021-06', '2021-07', '2021-07', '2021-07'], 'cod': ['12', '14', '15','18'], 'zone': ['LA', 'LA', 'PARIS', 'PARIS'], 'SUM_': [30, 30, 90, 10]}
df_grouped= pd.DataFrame(data2)

You could do:
(df_brut.groupby(['cod', 'date', 'zone'], as_index=False)['Revenue']
.sum()
.rename({'Revenue': 'SUM_'}, axis=1)
)
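To check a single-expression groupby against the expected frame from the question, a self-contained sketch might look like the following. It names the summed column SUM_ as in the expected data and reorders the columns to match; the name `out` is illustrative.

```python
import pandas as pd

data1 = {'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-07', '2021-07'],
         'cod': ['12', '12', '14', '15', '15', '18'],
         'zone': ['LA', 'LA', 'LA', 'PARIS', 'PARIS', 'PARIS'],
         'Revenue': [10, 20, 30, 50, 40, 10]}
df_brut = pd.DataFrame(data1)

data2 = {'date': ['2021-06', '2021-07', '2021-07', '2021-07'],
         'cod': ['12', '14', '15', '18'],
         'zone': ['LA', 'LA', 'PARIS', 'PARIS'],
         'SUM_': [30, 30, 90, 10]}
df_grouped = pd.DataFrame(data2)

# One row per (cod, date, zone) group, so no drop_duplicates is needed
out = (df_brut.groupby(['cod', 'date', 'zone'], as_index=False)['Revenue']
       .sum()
       .rename(columns={'Revenue': 'SUM_'})
       [['date', 'cod', 'zone', 'SUM_']])  # match the expected column order

print(out.equals(df_grouped))
```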

Changing values by condition in Pandas Python, similar to the lookup function in Excel

Following dataframe df is given:
df = pd.DataFrame({'ISIN': ['Cash', 'CH0038863350', 'DE0007164600'],
'Country': ['United States', 'Switzerland', 'Germany'], 'Category': ['A', 'B', 'C']})
If the value of 'ISIN' is 'Cash', the value of 'Category' should be changed to 'Cash'. This means df will become
df = pd.DataFrame({'ISIN': ['Cash', 'CH0038863350', 'DE0007164600'],
'Country': ['United States', 'Switzerland', 'Germany'], 'Category': ['Cash', 'B', 'C']})
How to do this?
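No answer is recorded for this question here. One common approach, offered as a sketch rather than the definitive answer, is a boolean mask with DataFrame.loc:

```python
import pandas as pd

df = pd.DataFrame({'ISIN': ['Cash', 'CH0038863350', 'DE0007164600'],
                   'Country': ['United States', 'Switzerland', 'Germany'],
                   'Category': ['A', 'B', 'C']})

# The mask selects rows where ISIN equals 'Cash';
# .loc assigns the new value only to 'Category' in those rows
df.loc[df['ISIN'] == 'Cash', 'Category'] = 'Cash'

print(df)
```

Rows that do not match the mask keep their original 'Category' value.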

How to find out difference of two dataframes in terms of column name using Python

I want to find the difference between two DataFrames in terms of column names.
This is sample table1
d1 = {'row_num': [1, 2, 3, 4, 5], 'name': ['john', 'tom', 'bob', 'rock', 'jimy'], 'DoB': ['01/02/2010', '01/02/2012', '11/22/2014', '11/22/2014', '09/25/2016'], 'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
df1 = pd.DataFrame(data=d1)
df1['month'] = pd.DatetimeIndex(df1['DoB']).month
df1['year'] = pd.DatetimeIndex(df1['DoB']).year
This is sample table2
d2 = {'row_num': [1, 2, 3, 4, 5], 'name': ['john', 'tom', 'bob', 'rock', 'jimy'], 'DoB': ['01/02/2010', '01/02/2012', '11/22/2014', '11/22/2014', '09/25/2016'], 'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}
df2 = pd.DataFrame(data=d2)
Table 2 (df2) does not have the month and year columns that df1 has. I want to find out which columns of df1 are missing in df2.
I know there's EXCEPT in SQL, but how can I do this using pandas/Python? Any suggestions?

There's a method meant just for this purpose: pd.Index.difference
df1.columns.difference(df2.columns)
Index(['month', 'year'], dtype='object')
And the corresponding columns:
df1[df1.columns.difference(df2.columns)]
month year
0 1 2010
1 1 2012
2 11 2014
3 11 2014
4 9 2016

You can use a list comprehension to find the columns of df1 that are not in df2:
[col for col in df1.columns if col not in df2.columns]
The output is a list of column names.
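The two approaches can be compared in a self-contained sketch built from the question's sample data. One difference worth knowing: Index.difference returns its result sorted, while the list comprehension keeps the column order of df1. The names `missing_sorted` and `missing_in_order` are illustrative.

```python
import pandas as pd

d = {'row_num': [1, 2, 3, 4, 5],
     'name': ['john', 'tom', 'bob', 'rock', 'jimy'],
     'DoB': ['01/02/2010', '01/02/2012', '11/22/2014', '11/22/2014', '09/25/2016'],
     'Address': ['NY', 'NJ', 'PA', 'NY', 'CA']}

df1 = pd.DataFrame(d)
df1['month'] = pd.DatetimeIndex(df1['DoB']).month
df1['year'] = pd.DatetimeIndex(df1['DoB']).year
df2 = pd.DataFrame(d)

# Set difference on the column indexes; the result is sorted
missing_sorted = df1.columns.difference(df2.columns).tolist()

# List comprehension; the result follows the column order of df1
missing_in_order = [col for col in df1.columns if col not in df2.columns]

print(missing_sorted)
print(missing_in_order)
```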
