Combine series by date

Combine series by date - python

I have the following two stock series in a single Excel file.
Can they be combined using the date as the index?
The result should look like this:

You need a simple df.merge() here:
df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
OR
df = df1.join(df2, how='outer')
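As a minimal sketch of the index-based outer join above, assuming each series has been loaded as a DataFrame indexed by date (the column names AAA and BBB, the dates, and the prices are made up for illustration):

```python
import pandas as pd

# Hypothetical data: two price series with partly overlapping dates.
df1 = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
df2 = pd.DataFrame(
    {"BBB": [20.0, 21.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-04"]),
)
df1.index.name = df2.index.name = "Date"

# An outer join on the index keeps every date found in either frame;
# dates missing from one series are filled with NaN.
combined = df1.join(df2, how="outer")
print(combined)
```

With how='inner' instead, only the dates present in both series would be kept.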

I am trying this:
df3 = pd.concat([df1, df2]).sort_values('Date').reset_index(drop=True)
or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0):
df3 = df1.append(df2).sort_values('Date').reset_index(drop=True)

Related

pandas merge by excluding certain columns from merge

I want to merge two dataframes like:
df1.columns = A, B, C, E, ..., D
df2.columns = A, B, C, F, ..., D
If I merge them, the merge uses all common columns. Also, since the number of columns is large, I don't want to list them in on; I would prefer to exclude the columns that I don't want to merge on. How can I do that?
mdf = pd.merge(df1, df2, exclude D)
I expect the result to be like:
mdf.columns = A, B, C, E, F ..., D_x, D_y

You mentioned that you don't want to use on "since the number of columns is much".
You can still use on even if there are a lot of columns (this assumes every column other than D exists in both dataframes):
mdf = pd.merge(df1, df2, on=[i for i in df1.columns if i != 'D'])
Or, using pd.Index.difference:
mdf = pd.merge(df1, df2, on=df1.columns.difference(['D']).tolist())

Another solution (list.remove() returns None, so it cannot be used inline; Index.drop returns the remaining labels):
mdf = pd.merge(df1, df2, on=df1.columns.drop('D').tolist())

What about dropping the unwanted column after the merge?
You can use pandas.DataFrame.drop:
mdf = pd.merge(df1, df2).drop('D', axis=1)
or dropping before the merge:
mdf = pd.merge(df1.drop('D', axis=1), df2.drop('D', axis=1))

One solution is using intersection and then difference on df1 and df2 columns:
mdf = pd.merge(df1, df2, on=df1.columns.intersection(df2.columns).difference(['D']).tolist())
The other solution could be renaming columns you want to exclude from merge:
df2.rename(columns={"D":"D_y"}, inplace=True)
mdf = pd.merge(df1, df2)
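The intersection-and-difference approach above can be sketched as follows. The frames and their values are hypothetical; taking the intersection first keeps columns that exist in only one frame (E and F here) out of the join keys:

```python
import pandas as pd

# Hypothetical frames sharing key columns A, B, C and a non-key column D.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "E": [7, 8], "D": [0, 1]})
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "F": [9, 10], "D": [2, 3]})

# Join keys: columns present in both frames, minus the excluded one.
keys = df1.columns.intersection(df2.columns).difference(["D"]).tolist()

# D is in both frames but not a key, so pandas keeps both copies
# and disambiguates them with the default suffixes _x and _y.
mdf = pd.merge(df1, df2, on=keys)
print(keys)
print(mdf.columns.tolist())
```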

python pandas loops to melt or pivot multiple df

I have several DataFrames with the same structure. I'd like to create a loop to melt them or create a pivot table.
I tried the following, but it is not working:
my_df = [df1, df2, df3]
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
for df in my_df:
    df = pd.pivot_table(df, values = 'my_value', index = ['A','B','C'], columns = ['my_column'])
Any help would be great. Thank you in advance

You need to assign the output to a new list of DataFrames:
out = []
for df in my_df:
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    out.append(df)
The same idea as a list comprehension:
out = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
If you need to overwrite the original values in the list:
for i, df in enumerate(my_df):
    df = pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value')
    my_df[i] = df
print (my_df)
If you need to overwrite the variables df1, df2, df3:
df1, df2, df3 = [pd.melt(df, id_vars=['A','B','C'], value_name = 'my_value') for df in my_df]
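A small end-to-end sketch of the list-comprehension approach, covering both the melt and the pivot step. The frames, the value columns x and y, and the use of var_name='my_column' (so that the pivot can refer to that column) are assumptions for illustration:

```python
import pandas as pd

# Hypothetical wide frames with identical structure.
df1 = pd.DataFrame({"A": [1], "B": [2], "C": [3], "x": [10], "y": [20]})
df2 = pd.DataFrame({"A": [4], "B": [5], "C": [6], "x": [30], "y": [40]})
my_df = [df1, df2]

# Rebinding the loop variable does not change the list, so build a new list.
melted = [
    pd.melt(df, id_vars=["A", "B", "C"], var_name="my_column", value_name="my_value")
    for df in my_df
]

# Pivot each melted frame back to wide form in the same way.
pivoted = [
    pd.pivot_table(df, values="my_value", index=["A", "B", "C"], columns=["my_column"])
    for df in melted
]
print(melted[0])
print(pivoted[0])
```

The original frames in my_df are left unchanged; only the new lists hold the reshaped results.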

How to append two completely different data sets in python?

I have to append two data sets. They have completely different rows and columns. I have tried the command:
df1 = pd.merge(df1, df2)
but it gives an error.
Data Frame 1
Data Frame 2

If they have the same number of columns, in the same order, you could do:
df2.columns = df1.columns
df_concat = pd.concat([df1, df2], ignore_index=True)
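A minimal sketch of this, with hypothetical frames and column names. It also shows what pd.concat does when the names are not aligned first: it keeps the union of the columns and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical frames: same number of columns, same order, different names.
df1 = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
df2 = pd.DataFrame({"label": ["c"], "amount": [3]})

# Without aligning names, concat keeps all four columns and fills gaps with NaN.
df_union = pd.concat([df1, df2], ignore_index=True)

# Aligning the names positionally first stacks the rows under df1's columns.
df2_aligned = df2.copy()
df2_aligned.columns = df1.columns
df_concat = pd.concat([df1, df2_aligned], ignore_index=True)

print(df_union)
print(df_concat)
```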

How to do left outer join exclusion in pandas

I have two dataframes, A and B, and I want to get those in A but not in B, just like the one right below the top left corner.
Dataframe A has columns ['a','b' + others] and B has columns ['a','b' + others]. There are no NaN values. I tried the following:
1.
dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a'])) | (~dfA['b'].isin(dfm['b']))]
2.
dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a'])) & (~dfA['b'].isin(dfm['b']))]
3.
dfe = dfA[(~dfA['a'].isin(dfB['a'])) | (~dfA['b'].isin(dfB['b']))]
4.
dfe = dfA[(~dfA['a'].isin(dfB['a'])) & (~dfA['b'].isin(dfB['b']))]
but when I compute len(dfm) and len(dfe), they don't sum up to len(dfA) (it's off by a few rows). I've tried doing this on dummy cases and #1 works, so maybe my dataset has some peculiarities I am unable to reproduce.
What's the right way to do this?

Use an outer merge with indicator=True, then keep only the rows marked left_only:
df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True)
df = df[df['_merge'] == 'left_only']
One-liner:
df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True
).query('_merge=="left_only"')

I think it would go something like the examples in: Pandas left outer join multiple dataframes on multiple columns
dfe = pd.merge(dfA, dfB, how='left', on=['a','b'], indicator=True)
dfe = dfe[dfe['_merge'] == 'left_only']
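A small sketch of the indicator-based exclusion, with made-up data. It also shows one way the column-by-column isin checks from the question can miss rows: the pair (1, 'y') is not in B, but 1 and 'y' each appear in B separately. Restricting B to its key columns and dropping duplicate key pairs before the merge is an added precaution, not something the answers above require.

```python
import pandas as pd

# Hypothetical data: B contains the key pairs (1, 'x') and (2, 'y').
dfA = pd.DataFrame({"a": [1, 1, 2, 3], "b": ["x", "y", "y", "z"], "v": [10, 20, 30, 40]})
dfB = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "w": [0.1, 0.2]})

# Left merge on the key pair; _merge records where each row came from.
keys_b = dfB[["a", "b"]].drop_duplicates()
merged = dfA.merge(keys_b, on=["a", "b"], how="left", indicator=True)

# Keep the rows of A whose (a, b) pair has no match in B.
dfe = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Column-wise check from the question (attempt 3), for comparison.
columnwise = dfA[(~dfA["a"].isin(dfB["a"])) | (~dfA["b"].isin(dfB["b"]))]

print(dfe)         # rows (1, 'y') and (3, 'z')
print(columnwise)  # only row (3, 'z')
```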

Intersection of pandas dataframe with multiple columns

I have a list of dataframes as:
[df1, df2, df3, ..., df100, oddDF]
Each dataframe dfi has DateTime as column1 and Temperature as column2, except the dataframe oddDF, which has DateTime as column1 and temperature columns in column2 and column3.
I am looking to create a list of dataframes, or one dataframe, which has the common temperatures from each of df1, .. df100 and oddDF.
I am trying the following:
dfs = [df0, df1, df2, .., df100, oddDF]
df_final = reduce(lambda left,right: pd.merge(left,right,on='DateTime'), dfs)
But this produces an empty df_final.
If, however, I do just:
dfs = [df0, df1, df2, .., df100]
df_final = reduce(lambda left,right: pd.merge(left,right,on='DateTime'), dfs)
df_final produces the right answer.
How do I incorporate oddDF in the code as well? I have checked to make sure that oddDF's DateTime column has dates in common with df1, df2, .., df100.
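The question does not show why the result becomes empty once oddDF is included. One possibility, and this is only an assumption, is that the DateTime values in oddDF are represented differently from those in the other frames (for example as strings or with a different format), so that no keys match. The sketch below uses small hypothetical stand-ins for the frames, converts the key column with pd.to_datetime, and gives every value column a unique name before the reduce-based merge; the helper prepare and the column names TempA and TempB are invented for illustration.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for df1..df100 and oddDF.
df1 = pd.DataFrame(
    {"DateTime": pd.to_datetime(["2020-01-01", "2020-01-02"]), "Temperature": [1.0, 2.0]}
)
df2 = pd.DataFrame(
    {"DateTime": pd.to_datetime(["2020-01-02", "2020-01-03"]), "Temperature": [3.0, 4.0]}
)
oddDF = pd.DataFrame(
    {
        "DateTime": ["2020-01-02", "2020-01-05"],  # stored as strings (an assumption)
        "TempA": [5.0, 6.0],
        "TempB": [7.0, 8.0],
    }
)


def prepare(df, tag):
    """Normalize the key dtype and give the value columns unique names."""
    out = df.copy()
    out["DateTime"] = pd.to_datetime(out["DateTime"])
    return out.rename(columns=lambda c: c if c == "DateTime" else f"{c}_{tag}")


dfs = [prepare(df, f"df{i}") for i, df in enumerate([df1, df2, oddDF])]

# Inner merge on DateTime keeps only the dates present in every frame.
df_final = reduce(lambda left, right: pd.merge(left, right, on="DateTime"), dfs)
print(df_final)
```

Renaming the value columns up front also avoids repeated _x and _y suffixes when many frames share the column name Temperature.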
