combining dataframes and adding values on common date index - python

I have many dataframes with one column (same name in all) whose indexes are date ranges. I want to merge/combine these dataframes into one, summing the values wherever dates are common. Below is a simplified example:
range1 = pd.date_range('2021-10-01','2021-11-01')
range2 = pd.date_range('2021-11-01','2021-12-01')
df1 = pd.DataFrame(np.random.rand(len(range1),1), columns=['value'], index=range1)
df2 = pd.DataFrame(np.random.rand(len(range2),1), columns=['value'], index=range2)
Here '2021-11-01' appears in both df1 and df2 with different values.
I would like to obtain a single dataframe of 62 rows (32+31-1) where the 2021-11-01 row contains the sum of its values in df1 and df2.

We can use pd.concat() on the two dataframes, then df.reset_index() to get a new regular integer index, rename the date column, and then use df.groupby().sum().
df = pd.concat([df1,df2]) # this gives 63 rows by 1 column, where the column is the values and the dates are the index
df = df.reset_index() # moves the dates to a column, now called 'index', and makes a new integer index
df = df.rename(columns={'index':'Date'}) #renames the column
df.groupby('Date').sum()
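The same result can probably be reached without resetting the index by grouping on the index level directly. The sketch below reuses the question's setup; `combined` and `combined_add` are names chosen here for illustration, and the `DataFrame.add` variant assumes there are exactly two frames to combine.

```python
import numpy as np
import pandas as pd

range1 = pd.date_range('2021-10-01', '2021-11-01')
range2 = pd.date_range('2021-11-01', '2021-12-01')
df1 = pd.DataFrame(np.random.rand(len(range1), 1), columns=['value'], index=range1)
df2 = pd.DataFrame(np.random.rand(len(range2), 1), columns=['value'], index=range2)

# Stack the frames, then sum all rows that share the same index label (date).
# This extends to any number of frames in the list.
combined = pd.concat([df1, df2]).groupby(level=0).sum()

# Alternative for exactly two frames: align on the index and treat
# dates missing from one frame as 0.
combined_add = df1.add(df2, fill_value=0)
```

With many dataframes, passing the whole list to pd.concat and grouping once is likely simpler than chaining pairwise additions.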

Related

How to map one dataframe to another dataframe for cross-sectional panel data?

I have df1 and df2, where df1 is a balanced panel of 20 stocks with daily datetime data. Because some days are missing (weekends, holidays), I number each available day with an integer (1-252). df2 is a two-column table that maps each day to its integer.
df2
date integer
2020-06-26, 1
2020-06-29, 2
2020-06-30, 3
2020-07-01, 4
2020-07-02, 5
...
2021-06-25, 252
I would like to map these integers onto every asset in df1 for each date, returning a single column of (1-252) repeated for each asset.
So far I have tried this:
df3 = (df1.merge(df2, left_on='date', right_on='integer'))
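which returns an empty dataframe. I don't think I'm fully understanding the logic here.
Your call compares df1's date column with df2's integer column, so no rows can match. Assuming both df1 and df2 have a column labelled date, merge on that shared column: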
df3 = df1.merge(df2)
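As a minimal sketch of that merge, the frames below are hypothetical stand-ins for the panel (two made-up tickers, three trading days). Naming the key explicitly with `on='date'` and using `how='left'` is a choice made here so that every row of df1 is kept; the answer above relies on the default, which joins on all shared column names.

```python
import pandas as pd

# Hypothetical panel: two tickers observed on the same three trading days.
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2020-06-26', '2020-06-29', '2020-06-30'] * 2),
    'ticker': ['AAA'] * 3 + ['BBB'] * 3,
})

# Lookup table mapping each trading day to its running integer.
df2 = pd.DataFrame({
    'date': pd.to_datetime(['2020-06-26', '2020-06-29', '2020-06-30']),
    'integer': [1, 2, 3],
})

# Left join: keep every row of df1 and attach the matching day number.
df3 = df1.merge(df2, on='date', how='left')
```

Both date columns need the same dtype (here datetime64) for the keys to match.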

Rows into columns in a multi-index Pandas DataFrame

I have a DataFrame with a multi-index, one of dates and the other numbered from 0 to 1267 as the image shows.
How do I have the index 0-1267 as columns instead of rows and have the dates as the only row index?
Select a column, then use Series.unstack on the first level:
df1 = df['CUMULATIVE FRACTION'].unstack(0)
Or, if you need a MultiIndex in the columns, use DataFrame.unstack:
df2 = df.unstack(0)
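To see what the unstack does, here is a small sketch on made-up data. It assumes, as the answer does, that the numbered level is the first (outer) index level; the level names 'n' and 'date' and the variable `wide` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: outer level numbered 0-2, inner level holding two dates.
dates = pd.to_datetime(['2021-01-01', '2021-01-02'])
index = pd.MultiIndex.from_product([range(3), dates], names=['n', 'date'])
df = pd.DataFrame({'CUMULATIVE FRACTION': np.arange(6) / 10}, index=index)

# Move index level 0 (the numbers) into the columns; dates remain as the rows.
wide = df['CUMULATIVE FRACTION'].unstack(0)
```

If the numbered level were the inner one instead, unstack(1) or unstack('n') would be the call to use.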

Indexing pandas dataframe when column names are integers

I don't seem to be able to subset data by integer column names using the loc command.
# 6*4 data set with column names as x,y,8,9
df = pd.DataFrame(np.random.randint(0,10,(6,4)),
                  index=('a','b','c','1','2','3'),
                  columns=['x','y', 8, 9])
df2 = df.loc[:,:'x']
df3 = df.loc[:,:'8']
df2 works but df3 throws an error.
The column label is the integer 8, not the string '8'. You can do either:
df3 = df.loc[:,8]
To get only column 8
Or:
df3 = df.loc[:,df.columns[:list(df.columns).index(8)+1]]
To get all columns until column 8 (inclusive - remove +1 to get exclusive).
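A runnable sketch of both options follows. It swaps the list-based lookup for `Index.get_loc` combined with `iloc`, which should select the same columns; `col8` and `upto8` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, (6, 4)),
                  index=('a', 'b', 'c', '1', '2', '3'),
                  columns=['x', 'y', 8, 9])

# Single column: the label is the integer 8, so pass an int, not '8'.
col8 = df.loc[:, 8]

# All columns up to and including 8: look up the position of the label,
# then slice by position (drop the +1 to exclude column 8).
upto8 = df.iloc[:, :df.columns.get_loc(8) + 1]
```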

Check Series label does not exist in a separate DataFrame

I'm iterating over two separate dataframes, where one dataframe is a subset of the other. I need to ensure that only the columns in the set (df1) which are not contained in the subset (df2) pass the conditional statement.
In this case, it would be comparing the Series object during each iteration in df1 to the dataframe, df2. Ideally I would like to compare just the labels associated with each column, not the values contained in the columns. My code below. Any help would be greatly appreciated!
for i in df1:
    for j in df2:
        if i not in df2.columns:
            pass  # ...do some stuff between df1[i] and df2[j]
To find out if the values of df1 are in df2 you can use:
df1.isin(df2)
To find all values in df1 that are not in df2 you can use:
df1[~df1.isin(df2)]
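Values that are in both df1 and df2 will be NaN in this case. Note that isin with a DataFrame argument only matches where both the index and column labels agree.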
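Since the question asks about column labels rather than values, a label-only comparison may be closer to what is wanted. The sketch below uses small hypothetical frames and `Index.difference` on the column indexes; `only_in_df1` and `visited` are illustrative names.

```python
import pandas as pd

# Hypothetical data: df2 is a column subset of df1.
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
df2 = df1[['a', 'c']]

# Labels present in df1 but not in df2; the cell values are never compared.
only_in_df1 = df1.columns.difference(df2.columns)

visited = []
for label in only_in_df1:
    series = df1[label]      # the Series to work on
    visited.append(label)    # placeholder for the real per-column work
```

Inside a loop, the single-label test `label not in df2.columns` gives the same answer for one column at a time.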

pandas - Subtract 2 similar pivot table of data frame

I have a 36 rows x 36 columns dataframe of pivot table which I transform using code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract those two dataframes with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried another approach:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, it yields a 72 rows x 72 columns dataframe that looks like this:
Please help me find the difference between the original dataframe and the transposed dataframe.
After transposing your DataFrame (pivot table) you have a new DataFrame in which the columns have become the index and vice versa. When you subtract one DataFrame from another, pandas aligns them on both index and column labels and fills NaN wherever a label exists in only one of them.
If you need to subtract values regardless of index and columns, use:
delta = df_pivoted.values - df_trans.values
If you want df_trans to carry the index and columns of df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
Hope that helps!
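A small end-to-end sketch of the idea, on invented From/To counts. The `reindex` step is an assumption added here: it forces the pivot to be square with the same labels on both axes, which the positional subtraction needs in order to be meaningful. The names `labels` and `delta` are illustrative.

```python
import pandas as pd

# Hypothetical edge list of counts between three nodes.
df = pd.DataFrame({
    'From': ['A', 'A', 'B', 'C'],
    'To': ['B', 'C', 'A', 'A'],
    'count': [5, 2, 3, 7],
})

labels = ['A', 'B', 'C']
# Assumption: reindex so rows and columns carry identical labels in the same order.
df_pivoted = (pd.pivot_table(df, index='From', columns='To', values='count')
              .reindex(index=labels, columns=labels)
              .fillna(0))

# Subtract the transposed values by position, then reattach the original labels.
delta = pd.DataFrame(df_pivoted.values - df_pivoted.values.T,
                     index=df_pivoted.index,
                     columns=df_pivoted.columns)
```

Here delta.loc['A', 'B'] is the count from A to B minus the count from B to A, so the result is antisymmetric.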
