Using python and pandas, I would like to achieve the output below. Whenever there are Null or Nan values present in the file then it needs to print the both row number and column name.
import pandas as pd
# List of Tuples
employees = [('Stuti', 'Null', 'Varanasi', 20000),
('Saumya', 'NAN', 'NAN', 35000),
('Saumya', 32, 'Delhi', 30000),
('Aaditya', 40, 'Dehradun', 24000),
('NAN', 45, 'Delhi', 70000)
]
# Create a DataFrame object from list
df = pd.DataFrame(employees,
columns =['Name', 'Age',
'City', 'Salary'])
print(df)
Expected Output:
Row 0: column Age missing
Row 1: Column Age, column City missing
Row 4: Column Name missing
Try isin to mask the missing values, then matrix multiply # with the columns to concatenate them:
s = df.isin(['Null','NAN'])
missing = s.loc[s.any(1)] # ('column ' + df.columns + ', ')
for r, val in missing.str[:-2].items():
print(f'Row {r}: {val} is missing')
Output:
Row 0: column Age is missing
Row 1: column Age, column City is missing
Row 4: column Name is missing
I pulled a list of historical option price of AAPL from the RobinHoood function robin_stocks.get_option_historicals(). The data was returned in a form of dictional of list of dictionary as shown below.
I am having difficulties to convert the below object (named historicalData) into a DataFrame. Can someone please help?
historicalData = {'data_points': [{'begins_at': '2020-10-05T13:30:00Z',
'open_price': '1.430000',
'close_price': '1.430000',
'high_price': '1.430000',
'low_price': '1.430000',
'volume': 0,
'session': 'reg',
'interpolated': False},
{'begins_at': '2020-10-05T13:40:00Z',
'open_price': '1.430000',
'close_price': '1.340000',
'high_price': '1.440000',
'low_price': '1.320000',
'volume': 0,
'session': 'reg',
'interpolated': False}],
'open_time': '0001-01-01T00:00:00Z',
'open_price': '0.000000',
'previous_close_time': '0001-01-01T00:00:00Z',
'previous_close_price': '0.000000',
'interval': '10minute',
'span': 'week',
'bounds': 'regular',
'id': '22b49380-8c50-4c76-8fb1-a4d06058f91e',
'instrument': 'https://api.robinhood.com/options/instruments/22b49380-8c50-4c76-8fb1-a4d06058f91e/'}
I tried the below code code but that didn't help:
import pandas as pd
df = pd.DataFrame(historicalData)
df
You didn't write that you want only data_points (as in the
other answer), so I assume that you want your whole dictionary
converted to a DataFrame.
To do it, start with your code:
df = pd.DataFrame(historicalData)
It creates a DataFrame, with data_points "exploded" to
consecutive rows, but they are still dictionaries.
Then rename open_price column to open_price_all:
df.rename(columns={'open_price': 'open_price_all'}, inplace=True)
The reason is to avoid duplicated column names after join
to be performed soon (data_points contain also open_price
attribute and I want the corresponding column from data_points
to "inherit" this name).
The next step is to create a temporary DataFrame - a split of
dictionaries in data_points to individual columns:
wrk = df.data_points.apply(pd.Series)
Print wrk to see the result.
And the last step is to join df with wrk and drop
data_points column (not needed any more, since it was
split into separate columns):
result = df.join(wrk).drop(columns=['data_points'])
This is pretty easy to solve with the below. I have chucked the dataframe to a list via list comprehension
import pandas as pd
df_list = [pd.DataFrame(dic.items(), columns=['Parameters', 'Value']) for dic in historicalData['data_points']]
You then could do:
df_list[0]
which will yield
Parameters Value
0 begins_at 2020-10-05T13:30:00Z
1 open_price 1.430000
2 close_price 1.430000
3 high_price 1.430000
4 low_price 1.430000
5 volume 0
6 session reg
7 interpolated False
I have a few Python dataframes in Pandas, I want to loop through them to find out which data frame meet my rows' criteria and save it in a new data frame.
d = {'Count' : ['10', '11', '12', '13','13.4','12.5']}
df_1= pd.DataFrame(data=d)
df_1
d = {'Count' : ['10', '-11', '-12', '13','16','2']}
df_2= pd.DataFrame(data=d)
df_2
Here is the logic I want to use, but it does not contain the right syntax,
for df in (df_1,df_2)
if df['Count'][0] >0 and df['Count'][1] >0 and df['Count'][2]>0 and df['Count'][3]>0
and (df['Count'][4] is between df['Count'][3]+0.5 and df['Count'][3]-0.5) is True:
df.save
The correct output is df_1... because it meets my condition. How do I create a new DataFrame or LIST to save the result as well?
Let me know if you have any questions in the comments. Main updates I made to your code was:
Replacing your chained indexing with .loc
Consolidating your first few separate and'd comparisons into a comparison on a slice of the series, reduced down to a single T/F with .all()
Code below:
import pandas as pd
# df_1 & df_2 input taken from you
d = {'Count' : ['10', '11', '12', '13','13.4','12.5']}
df_1= pd.DataFrame(data=d)
d = {'Count' : ['10', '-11', '-12', '13','16','2']}
df_2= pd.DataFrame(data=d)
# my solution here
df_1['Count'] = df_1['Count'].astype('float')
df_2['Count'] = df_2['Count'].astype('float')
my_dataframes = {'df_1': df_1, 'df_2': df_2}
good_dataframes = []
for df_name, df in my_dataframes.items():
if (df.loc[0:3, 'Count'] > 0).all() and (df.loc[3,'Count']-0.5 <= df.loc[4, 'Count'] <= df.loc[3, 'Count']+0.5):
good_dataframes.append(df_name)
good_dataframes_df = pd.DataFrame({'good': good_dataframes})
TEST:
>>> print(good_dataframes_df)
good
0 df_1
I have a dataframe df
df
Object Action Cost1 Cost2
0 123 renovate 10000 2000
1 456 do something 0 10
2 789 review 1000 50
and a dictionary (called dictionary)
dictionary
{'Object_new': ['Object'],
'Action_new': ['Action'],
'Total_Cost': ['Cost1', 'Cost2']}
Further, I have a (at the beginning empty) dataframe df_new that should contain almost the identicall information as df, except that the column names need to be different (naming according to the dictionary) and that some columns from df should be consolidated (e.g. a sum-operation) based on the dictionary.
The result should look like this:
df_new
Object_new Action_new Total_Cost
0 123 renovate 12000
1 456 do something 10
2 789 review 1050
How can I achieve this result using only the dictionary? I tried to use the .map() function but could not figure out how to perform the sum-operation with it.
The code to reproduce both dataframes and the dictionary are attached:
# import libraries
import pandas as pd
### create df
data_df = {'Object': [123, 456, 789],
'Action': ['renovate', 'do something', 'review'],
'Cost1': [10000, 0, 1000],
'Cost2': [2000, 10, 50],
}
df = pd.DataFrame(data_df)
### create dictionary
dictionary = {'Object_new':['Object'],
'Action_new':['Action'],
'Total_Cost' : ['Cost1', 'Cost2']}
### create df_new
# data_df_new = pd.DataFrame(columns=['Object_new', 'Action_new', 'Total_Cost' ])
data_df_new = {'Object_new': [123, 456, 789],
'Action_new': ['renovate', 'do something', 'review'],
'Total_Cost': [12000, 10, 1050],
}
df_new = pd.DataFrame(data_df_new)
A play with groupby:
inv_dict = {x:k for k,v in dictionary.items() for x in v}
df_new = df.groupby(df.columns.map(inv_dict),
axis=1).sum()
Output:
Action_new Object_new Total_Cost
0 renovate 123 12000
1 do something 456 10
2 review 789 1050
Given the complexity of your algorithm, I would suggest performing a Series addition operation to solve this problem.
Why? In Pandas, every column in a DataFrame works as a Series under the hood.
data_df_new = {
'Object_new': df['Object'],
'Action_new': df['Action'],
'Total_Cost': (df['Cost1'] + df['Cost2']) # Addition of two series
}
df_new = pd.DataFrame(data_df_new)
Running this code will map every value contained in your dataset, which will be stored in our dictionary.
You can use an empty data frame to copy the new column and use the to_dict to convert it to a dictionary.
import pandas as pd
import numpy as np
data_df = {'Object': [123, 456, 789],
'Action': ['renovate', 'do something', 'review'],
'Cost1': [10000, 0, 1000],
'Cost2': [2000, 10, 50],
}
df = pd.DataFrame(data_df)
print(df)
MyEmptydf = pd.DataFrame()
MyEmptydf['Object_new']=df['Object']
MyEmptydf['Action_new']=df['Action']
MyEmptydf['Total_Cost'] = df['Cost1'] + df['Cost2']
print(MyEmptydf)
dictionary = MyEmptydf.to_dict(orient="index")
print(dictionary)
you can run the code here:https://repl.it/repls/RealisticVillainousGlueware
If you trying to entirely avoid pandas and only use the dictionary this should solve it
Object = []
totalcost = []
action = []
for i in range(0,3):
Object.append(data_df['Object'][i])
totalcost.append(data_df['Cost1'][i]+data_df['Cost2'][i])
action.append(data_df['Action'][i])
dict2 = {'Object':Object, 'Action':action, 'TotalCost':totalcost}
I have a list of columns that I want to rename a portion of based on a list of values.
I am looking at a file which has 12 months of data and each month is a different column (I need to keep it in this specific format unfortunately). This file is generated once per month and I keep the column names more general since I have to do a lot of calculations on them based the month number (for example, I need to compare information against the average of month 8, 9, and 10 every month).
Here are the columns I want to rename:
['month_1_Sign',
'month_2_Sign',
'month_3_Sign',
'month_4_Sign',
'month_5_Sign',
'month_6_Sign',
'month_7_Sign',
'month_8_Sign',
'month_9_Sign',
'month_10_Sign',
'month_11_Sign',
'month_12_Sign',
'month_1_Actual',
'month_2_Actual',
'month_3_Actual',
'month_4_Actual',
'month_5_Actual',
'month_6_Actual',
'month_7_Actual',
'month_8_Actual',
'month_9_Actual',
'month_10_Actual',
'month_11_Actual',
'month_12_Actual',
'month_1_Target',
'month_2_Target',
'month_3_Target',
'month_4_Target',
'month_5_Target',
'month_6_Target',
'month_7_Target',
'month_8_Target',
'month_9_Target',
'month_10_Target',
'month_11_Target',
'month_12_Target']
Here are the names I want to place:
required_date_range = sorted(list(pd.Series(pd.date_range((dt.datetime.today().date() + pd.DateOffset(months=-13)), periods=12, freq='MS')).dt.strftime('%Y-%m-%d')))
['2015-03-01',
'2015-04-01',
'2015-05-01',
'2015-06-01',
'2015-07-01',
'2015-08-01',
'2015-09-01',
'2015-10-01',
'2015-11-01',
'2015-12-01',
'2016-01-01',
'2016-02-01']
So month_12 columns (month_12 is always the latest month) would be changed to '2016-02-01_Sign', '2016-02-01_Actual', '2016-02-01_Target' in this example.
I tried doing this but it doesn't change anything (trying to change the month_# with the actual date it refers to):
final.replace('month_10', required_date_range[9], inplace=True)
final.replace('month_11', required_date_range[10], inplace=True)
final.replace('month_12', required_date_range[11], inplace=True)
final.replace('month_1', required_date_range[0], inplace=True)
final.replace('month_2', required_date_range[1], inplace=True)
final.replace('month_3', required_date_range[2], inplace=True)
final.replace('month_4', required_date_range[3], inplace=True)
final.replace('month_5', required_date_range[4], inplace=True)
final.replace('month_6', required_date_range[5], inplace=True)
final.replace('month_7', required_date_range[6], inplace=True)
final.replace('month_8', required_date_range[7], inplace=True)
final.replace('month_9', required_date_range[8], inplace=True)
You could construct a dict and then call map on the split column str:
In [27]:
d = dict(zip([str(x) for x in range(1,13)], required_date_range))
d
Out[27]:
{'1': '2015-03-01',
'10': '2015-12-01',
'11': '2016-01-01',
'12': '2016-02-01',
'2': '2015-04-01',
'3': '2015-05-01',
'4': '2015-06-01',
'5': '2015-07-01',
'6': '2015-08-01',
'7': '2015-09-01',
'8': '2015-10-01',
'9': '2015-11-01'}
In [26]:
df.columns = df.columns.to_series().str.rsplit('_').str[1].map(d) + '_' + df.columns.to_series().str.rsplit('_').str[-1]
df.columns
Out[26]:
Index(['2015-03-01_Sign', '2015-04-01_Sign', '2015-05-01_Sign',
'2015-06-01_Sign', '2015-07-01_Sign', '2015-08-01_Sign',
'2015-09-01_Sign', '2015-10-01_Sign', '2015-11-01_Sign',
'2015-12-01_Sign', '2016-01-01_Sign', '2016-02-01_Sign',
'2015-03-01_Actual', '2015-04-01_Actual', '2015-05-01_Actual',
'2015-06-01_Actual', '2015-07-01_Actual', '2015-08-01_Actual',
'2015-09-01_Actual', '2015-10-01_Actual', '2015-11-01_Actual',
'2015-12-01_Actual', '2016-01-01_Actual', '2016-02-01_Actual',
'2015-03-01_Target', '2015-04-01_Target', '2015-05-01_Target',
'2015-06-01_Target', '2015-07-01_Target', '2015-08-01_Target',
'2015-09-01_Target', '2015-10-01_Target', '2015-11-01_Target',
'2015-12-01_Target', '2016-01-01_Target', '2016-02-01_Target'],
dtype='object')
You're going to want to use the .rename method instead of the .replace! For instance this code:
import pandas as pd
d = {'a': [1, 2, 4], 'b':[2,3,4],'c':[3,4,5]}
df = pd.DataFrame(d)
df.rename(columns={'a': 'x1', 'b': 'x2'}, inplace=True)
Changes the 'a' and 'b' column title to 'x1' and 'x2' respectively.
The first line of the renaming code you have would change to:
final.rename(columns={'month_10':required_date_range[9]}, inplace=True)
In fact you could do every column in that one command by adding entries to the columns dictionary argument.
final.rename(columns={'month_10':required_date_range[9],
'month_9':required_date-range[8], ... (and so on) }, inplace=True)
from collections import product
df = pd.DataFrame(np.random.rand(3, 12 * 3), columns=['month_' + str(c[0]) + '_' + c[1] for c in product(range(1, 13), ['Sign', 'Actual', 'Target'])])
First create a mapping to the relevant months.
mapping = {'month_' + str(n): date for n, date in enumerate(required_date_range, 1)}
>>> mapping
{'month_1': '2015-03-01',
'month_10': '2015-12-01',
'month_11': '2016-01-01',
'month_12': '2016-02-01',
'month_2': '2015-04-01',
'month_3': '2015-05-01',
'month_4': '2015-06-01',
'month_5': '2015-07-01',
'month_6': '2015-08-01',
'month_7': '2015-09-01',
'month_8': '2015-10-01',
'month_9': '2015-11-01'}
Then reassign columns, joining the mapped month (e.g. '2016-02-01') to the rest of the column name. This was done using a list comprehension.
df.columns = [mapping.get(c[:c.find('_', 6)]) + c[c.find('_', 6):] for c in cols]
>>> df.columns.tolist()
['2015-03-01_Sign',
'2015-04-01_Sign',
'2015-05-01_Sign',
'2015-06-01_Sign',
'2015-07-01_Sign',
'2015-08-01_Sign',
'2015-09-01_Sign',
'2015-10-01_Sign',
'2015-11-01_Sign',
'2015-12-01_Sign',
'2016-01-01_Sign',
'2016-02-01_Sign',
'2015-03-01_Actual',
'2015-04-01_Actual',
'2015-05-01_Actual',
'2015-06-01_Actual',
'2015-07-01_Actual',
'2015-08-01_Actual',
'2015-09-01_Actual',
'2015-10-01_Actual',
'2015-11-01_Actual',
'2015-12-01_Actual',
'2016-01-01_Actual',
'2016-02-01_Actual',
'2015-03-01_Target',
'2015-04-01_Target',
'2015-05-01_Target',
'2015-06-01_Target',
'2015-07-01_Target',
'2015-08-01_Target',
'2015-09-01_Target',
'2015-10-01_Target',
'2015-11-01_Target',
'2015-12-01_Target',
'2016-01-01_Target',
'2016-02-01_Target']