This question already has answers here:
How do I combine two dataframes?
(8 answers)
Pandas Merging 101
(8 answers)
How to group dataframe rows into list in pandas groupby
(17 answers)
Closed 1 year ago.
I have two data frames with the same column names and the same indices. Each entry
in the data frames is an int or a float. I would like to combine the data frames into
a single data frame. I would like each entry of this data frame to be a list containing the individual elements from the separate data frames.
As an example, df1 and df2 are the original data frames:
df1:
   A  B
0  0  1

df2:
   A  B
0  2  3
I would like to produce the following dataframe:
df3:
        A       B
0  [0, 2]  [1, 3]
I tried the following:
merger = lambda s1, s2: s1.append(s2)
df1.combine(df2, merger)
This gives me the error:
ValueError: cannot reindex from a duplicate axis
I can think of a few ways to do it with loops but I'd like to avoid that if possible. It seems like this is something that should be built into pandas.
Cheers
Try with
out = pd.concat([df1, df2]).groupby(level=0).agg(list)
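For reference, a minimal end-to-end run with the two one-row frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0], 'B': [1]})
df2 = pd.DataFrame({'A': [2], 'B': [3]})

# Stack the frames, then gather the values sharing an index label into lists
out = pd.concat([df1, df2]).groupby(level=0).agg(list)
print(out)
```

Because both frames carry index label 0, the groupby on level 0 collects one value from each frame into a two-element list per column.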
Related
This question already has answers here:
Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?
(3 answers)
Closed 2 years ago.
The below code creates a table containing the max temp for each day. What I would like to do is return the index of each of these max-temp rows so I can apply it to the original df:
df = pd.DataFrame({'date': list1, 'max_temp': list2})
grouped = df.groupby(by='date', as_index=False).max()
You can define another column called "index" before sorting the dataframe:
import pandas as pd
list1 = [7, 9, 3, 4]
list2 = [8, 6, 8, 9]
df = pd.DataFrame({'date': list1, 'max_temp': list2})
df['index'] = df.index
grouped = df.groupby(by="date", as_index=False).max()
print(grouped)
Output:
date max_temp index
0 3 8 2
1 4 9 3
2 7 8 0
3 9 6 1
Now, using df.query, we can look up the "date" value for a given original index:
print(grouped.query("index==0")["date"])
Output:
2 7
Name: date, dtype: int64
df.groupby('date')['max_temp'].idxmax()
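idxmax returns, per group, the index label of the row holding the maximum, so those labels can be fed straight back into df.loc. A quick sketch with invented repeated dates (the data here is made up purely so each group has more than one row):

```python
import pandas as pd

df = pd.DataFrame({'date': [3, 3, 7, 7], 'max_temp': [8, 9, 8, 6]})

# Index label of the hottest row within each date
idx = df.groupby('date')['max_temp'].idxmax()

# Use those labels to pull the full rows from the original frame
hottest = df.loc[idx]
print(hottest)
```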
It would seem I've found a great solution from the following link...
Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?
(Although this doesn't seem to be the accepted answer there, for some reason.) Anyway, the following worked well for me, if anyone finds themselves in the same position...
idx = df.groupby('date')['max_temp'].transform(max) == df['max_temp']
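Unlike idxmax, this boolean mask keeps every row that ties the group maximum; filtering with it looks like this (again with invented data, including a deliberate tie):

```python
import pandas as pd

df = pd.DataFrame({'date': [3, 3, 7, 7], 'max_temp': [9, 9, 8, 6]})

# True wherever a row matches its date's maximum temperature
idx = df.groupby('date')['max_temp'].transform('max') == df['max_temp']

# Both rows for date 3 survive because they tie for the maximum
result = df[idx]
print(result)
```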
This question already has answers here:
Pandas Melt Function
(2 answers)
Closed 2 years ago.
I have a data frame where there's multiple options for a certain index (1-M relationship) - e.g. States as Index and Counties as respective columns. I want to group it in a way that creates just one column but with all the values. This is a basic transformation but somehow I can't get it right.
Sorry, I don't know how to insert code output that has already been run, so instead here is the code to create example DFs showing what I'd like to achieve.
pd.DataFrame({'INDEX': ['INDEX1', 'INDEX2', 'INDEX3'],
              'col1': ['a', 'b', 'd'],
              'col2': ['c', 'f', np.nan],
              'col3': ['e', np.nan, np.nan]})
and I want to transform it so that I end up with this data frame:
pd.DataFrame({'INDEX': ['INDEX1', 'INDEX1', 'INDEX1', 'INDEX2', 'INDEX2', 'INDEX3'],
              'col1': ['a', 'c', 'e', 'b', 'f', 'd']})
You can use melt here:
df = pd.melt(df, id_vars=['INDEX']).drop(columns=['variable']).dropna()
print(df)
INDEX value
0 INDEX1 a
1 INDEX2 b
2 INDEX3 d
3 INDEX1 c
4 INDEX2 f
6 INDEX1 e
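To land exactly on the target frame from the question, a sort and rename can be chained on (sorting by value happens to reproduce the order shown here; with other data you may only care about the INDEX ordering):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'INDEX': ['INDEX1', 'INDEX2', 'INDEX3'],
                   'col1': ['a', 'b', 'd'],
                   'col2': ['c', 'f', np.nan],
                   'col3': ['e', np.nan, np.nan]})

# Melt to long form, drop the helper 'variable' column and the NaN fillers,
# then sort and rename to match the desired frame
out = (pd.melt(df, id_vars=['INDEX'])
         .drop(columns=['variable'])
         .dropna()
         .sort_values(['INDEX', 'value'])
         .rename(columns={'value': 'col1'})
         .reset_index(drop=True))
print(out)
```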
This question already has answers here:
How to shift a column in Pandas DataFrame
(9 answers)
Closed 3 years ago.
In my code I assign a value to a cell in a data frame, based on another value in the same data frame but in another row.
The code, using a for-loop is as follows:
df = pd.DataFrame({'A':[1, 2, 3],'B':[4, 5, 6]})
for i in range(1, df.shape[0]):
    df.loc[i, 'C'] = df.loc[i-1, 'B']
Output:
A B C
0 1 4 NaN
1 2 5 4.0
2 3 6 5.0
This code gives me the output I want, but it is rather slow. I read about df.iterrows and df.apply, but I cannot work out how to use them here since I refer to other rows. Does anyone know a faster way to iterate over rows while referring to other rows in the data frame?
You don't need a loop at all; chained indexing like df.iloc[i]['C'] = ... silently fails to write into the frame anyway. A vectorized shift gives the same result:
df['C'] = df['B'].shift()
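Per the linked duplicate, shift does this without any loop; a minimal sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# shift() moves column B down one row, so row i receives B's value from row i-1
df['C'] = df['B'].shift()
print(df)
```

The first row of C is NaN because there is no preceding row to take a value from, matching the looped output.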
This question already has answers here:
Merge two dataframes by index
(7 answers)
Pandas Merging 101
(8 answers)
Closed 5 years ago.
I have two dataframes:
df1
id A
1 wer
3 dfg
5 dfg
df2
id A
2 fgv
4 sdfsdf
I want to join these two dataframes into one that will look like this:
df3
id A
1 wer
2 fgv
3 dfg
...
df3 = df1.merge(df2,how='outer',sort=True)
There is a concat method in pandas that you can use:
df3 = pd.concat([df1, df2])
You can then sort the index with:
df3 = df3.sort_index()
Or reset the index like:
df3 = df3.reset_index(drop=True)
I see you have an ellipsis (...) at the end of your df3 dataframe; if that means the dataframe continues, use the above, otherwise go for Jibril's answer.
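Putting the concat approach together with frames shaped like the question's (sorting on id rather than the index, which guarantees the interleaved order regardless of the original index labels):

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 3, 5], 'A': ['wer', 'dfg', 'dfg']})
df2 = pd.DataFrame({'id': [2, 4], 'A': ['fgv', 'sdfsdf']})

# Stack the frames, then interleave the rows by sorting on id
df3 = pd.concat([df1, df2]).sort_values('id').reset_index(drop=True)
print(df3)
```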
This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 6 months ago.
I currently have the dataframe shown at the top. Is there a way to use a groupby function to produce another dataframe that groups the data and concatenates the words into the format shown further below, using python pandas?
Thanks
You can apply join on your column after groupby:
df.groupby('index')['words'].apply(','.join)
Example:
In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df
Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd
In [327]:
df.groupby('id')['words'].apply(','.join)
Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
If you want to save even more ink, you don't need to use .apply() since .agg() can take a function to apply to each group:
df.groupby('id')['words'].agg(','.join)
OR
# this way you can add multiple columns and different aggregates as needed.
df.groupby('id').agg({'words': ','.join})
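A quick check that the dict form gives the same strings, but as a DataFrame rather than a Series:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'b', 'c', 'c'],
                   'words': ['asd', 'rtr', 's', 'rrtttt', 'dsfd']})

# The dict form returns a DataFrame and extends naturally to more columns
out = df.groupby('id').agg({'words': ','.join})
print(out)
```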