Pandas: concat dataframes keeping only rows with the same index - python

I have two dataframes and I want to concat them, but keep only the rows that have the same index.
My first dataframe looks like this:
a b c d e f
2018-01-05 1.702556 -0.885554 0.766257 -0.731700 -1.071232 1.806680
2018-01-06 -0.968689 -0.700311 1.024988 -0.705764 0.804285 -0.337177
2018-01-07 1.249893 -0.613356 1.975736 -0.093838 0.428004 0.634204
2018-01-08 0.430000 0.502100 0.194092 0.588685 -0.507332 1.404635
2018-01-09 1.005721 0.604771 -2.296667 0.157201 1.583537 1.359332
and I want to concat it with this dataframe:
g h
2018-01-05 13.702556 -3.885554
2018-01-06 -3.968689 -3.700311
2018-01-07 13.249893 -3.613356
2018-01-22 3.430000 3.502100
2018-01-23 13.005721 3.604771
I would like to concat just the first three lines, which have the same index, and drop the others.
My final dataframe should look like this:
a b c d e f g h
2018-01-05 1.702556 -0.885554 0.766257 -0.731700 -1.071232 1.806680 13.702556 -3.885554
2018-01-06 -0.968689 -0.700311 1.024988 -0.705764 0.804285 -0.337177 -3.968689 -3.700311
2018-01-07 1.249893 -0.613356 1.975736 -0.093838 0.428004 0.634204 13.249893 -3.613356

Try join='inner', which keeps only the index labels present in both frames (the old .ix indexer has been removed from pandas):
>>> pd.concat([df1, df2], axis=1, join='inner')
                   a         b  ...          g         h
2018-01-05  1.702556 -0.885554  ...  13.702556 -3.885554
2018-01-06 -0.968689 -0.700311  ...  -3.968689 -3.700311
2018-01-07  1.249893 -0.613356  ...  13.249893 -3.613356
[3 rows x 8 columns]
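A minimal runnable sketch of that behavior: pd.concat with join='inner' keeps only the rows whose index appears in both frames (the frames below are small illustrative stand-ins for the ones in the question):

```python
import pandas as pd

df1 = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2018-01-05", "2018-01-06", "2018-01-07"]),
)
df2 = pd.DataFrame(
    {"g": [10.0, 20.0, 99.0]},
    index=pd.to_datetime(["2018-01-05", "2018-01-06", "2018-01-22"]),
)

# join='inner' keeps only index labels present in both frames,
# so 2018-01-07 and 2018-01-22 are dropped
result = pd.concat([df1, df2], axis=1, join="inner")
print(result)
```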

Related

Create list of several datasets

I have several datasets like df_1,df_2,...df_100.
First I want to create a list of these datasets.
df=[df_1,df_2,...,df_100]
This is what I tried, but it did not work:
df = []
for i in range(1, 101):
    df.append("df_" + str(i))
I need the above so that I can do the following:
final=pandas.concat(df,ignore_index=True)
This gives me an error since df is a list of strings, not datasets. I want to create a list of several datasets.
In R, I did the following
final=do.call(rbind,mget(paste0("df_",1:100)))
Is there anything similar in python?
Use the built-in functions globals() or locals() to look up a variable by name:
>>> [globals()[d] for d in df]
Example:
>>> df_1
A B C
9l6rvsotz5 0.209350 -1.360556 0.059560
jTonmSOIVv 1.046584 0.251718 0.567056
eGaK0n8y9N -0.347716 -0.292623 0.591843
>>> df_2
A B C
TIVsJWSDWe -0.169969 0.345766 0.674683
EJjXuhL3pi -0.527015 -1.089954 -1.658116
dm3IYAyC7z 1.653666 -0.203685 -1.441150
>>> df_3
A B C
DbmE1sc3MI 0.215871 -0.382257 0.662477
9qZd6bvPVy 0.150985 0.135556 0.308615
qiVrxD64IF -1.384027 0.765303 -0.734394
>>> df = ["df_{}".format(i) for i in range(1, 4)]
>>> df
['df_1', 'df_2', 'df_3']
>>> pd.concat([globals()[d] for d in df], ignore_index=True)
A B C
0 0.209350 -1.360556 0.059560
1 1.046584 0.251718 0.567056
2 -0.347716 -0.292623 0.591843
3 -0.169969 0.345766 0.674683
4 -0.527015 -1.089954 -1.658116
5 1.653666 -0.203685 -1.441150
6 0.215871 -0.382257 0.662477
7 0.150985 0.135556 0.308615
8 -1.384027 0.765303 -0.734394
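Looking variables up through globals() works, but it is usually cleaner to build the frames into a container in the first place, so no name lookup is needed. A sketch of that pattern (the data here is illustrative):

```python
import numpy as np
import pandas as pd

# build the frames directly into a dict keyed by name,
# instead of creating df_1, df_2, ... as separate variables
frames = {
    f"df_{i}": pd.DataFrame(np.random.randn(3, 3), columns=list("ABC"))
    for i in range(1, 4)
}

# concatenate all of them, discarding the original row labels
final = pd.concat(list(frames.values()), ignore_index=True)
print(final.shape)
```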

groupby and sum two columns and set as one column in pandas

I have the following data frame:
import pandas as pd
data = pd.DataFrame()
data['Home'] = ['A','B','C','D','E','F']
data['HomePoint'] = [3,0,1,1,3,3]
data['Away'] = ['B','C','A','E','D','D']
data['AwayPoint'] = [0,3,1,1,0,0]
I want to group the columns ['Home', 'Away'] together under a single name, Team, and sum HomePoint and AwayPoint into a column named Points.
Team Points
A 4
B 0
C 4
D 1
E 4
F 3
How can I do it?
I tried a different approach using the following post:
Link
But I was not able to get the format that I wanted.
Greatly appreciate your advice.
Thanks
Zep.
A simple way is to create two new Series indexed by the teams, aggregating each side first (a team can appear more than once on the same side, as D does in Away, and duplicate labels would otherwise produce duplicate rows when aligning):
home = pd.Series(data.HomePoint.values, data.Home).groupby(level=0).sum()
away = pd.Series(data.AwayPoint.values, data.Away).groupby(level=0).sum()
Then, the result you want is:
home.add(away, fill_value=0).astype(int)
Note that home + away does not work, because team F never played away, so would result in NaN for them. So we use Series.add() with fill_value=0.
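A runnable sketch of this approach end to end, using the question's data (each side is aggregated first because D appears twice in Away):

```python
import pandas as pd

data = pd.DataFrame({
    "Home": ["A", "B", "C", "D", "E", "F"],
    "HomePoint": [3, 0, 1, 1, 3, 3],
    "Away": ["B", "C", "A", "E", "D", "D"],
    "AwayPoint": [0, 3, 1, 1, 0, 0],
})

# points scored at home and away, one entry per team on each side
home = pd.Series(data.HomePoint.values, data.Home).groupby(level=0).sum()
away = pd.Series(data.AwayPoint.values, data.Away).groupby(level=0).sum()

# fill_value=0 covers teams missing from one side (F never plays away)
points = home.add(away, fill_value=0).astype(int)
print(points)
```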
A complicated way is to use DataFrame.melt():
goo = data.melt(['HomePoint', 'AwayPoint'], var_name='At', value_name='Team')
goo.HomePoint.where(goo.At == 'Home', goo.AwayPoint).groupby(goo.Team).sum()
Or from the other perspective:
ooze = data.melt(['Home', 'Away'])
ooze.value.groupby(ooze.Home.where(ooze.variable == 'HomePoint', ooze.Away)).sum()
You can concatenate, pairwise, columns of your input dataframe. Then use groupby.sum.
# number of (team, point) column pairs
n = len(data.columns) // 2
# create list of pairwise dataframes
df_lst = [data.iloc[:, 2*i:2*(i+1)].set_axis(['Team', 'Points'], axis=1)
          for i in range(n)]
# concatenate list of dataframes
df = pd.concat(df_lst, axis=0)
# perform groupby
res = df.groupby('Team', as_index=False)['Points'].sum()
print(res)
Team Points
0 A 4
1 B 0
2 C 4
3 D 1
4 E 4
5 F 3

How to get the highest values from many columns and show in what rows it happened using pandas?

I have a dataframe from which I want to know the highest value for each column. But I also want to know in what row it happened.
With my code I have to put the name of each column each time. Is there a better way to get all highest values from all columns?
df2.loc[df2['ALL'].idxmax()]
(Screenshots of the dataframe, the current output, and the desired output are omitted here.)
You can stack your frame and then sort the values from largest to smallest and then take the first occurrence of your column names.
First I will create some fake data
df = pd.DataFrame(np.random.rand(10,5), columns=list('abcde'),
index=list('nopqrstuvw'))
df.columns.name = 'level_0'
df.index.name = 'level_1'
Output
level_0 a b c d e
level_1
n 0.417317 0.821350 0.443729 0.167315 0.281859
o 0.166944 0.223317 0.418765 0.226544 0.508055
p 0.881260 0.789210 0.289563 0.369656 0.610923
q 0.893197 0.494227 0.677377 0.065087 0.228854
r 0.394382 0.573298 0.875070 0.505148 0.334238
s 0.046179 0.039642 0.930811 0.326114 0.880804
t 0.143488 0.561449 0.832186 0.486752 0.323215
u 0.891823 0.616401 0.247078 0.497050 0.995108
v 0.888553 0.386260 0.816100 0.874761 0.769073
w 0.557239 0.601758 0.932839 0.274614 0.854063
Now stack, sort and drop all but the first column occurrence
df.stack()\
.sort_values(ascending=False)\
.reset_index()\
.drop_duplicates('level_0')\
.sort_values('level_0')[['level_0', 0, 'level_1']]
level_0 0 level_1
3 a 0.893197 q
12 b 0.821350 n
1 c 0.932839 w
9 d 0.874761 v
0 e 0.995108 u
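As a simpler alternative sketch, max() and idxmax() give the same two pieces of information directly, one row per column, without stacking:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 5)), columns=list("abcde"),
                  index=list("nopqrstuvw"))

# one row per column: the maximum value and the row label where it occurs
summary = pd.DataFrame({"max": df.max(), "row": df.idxmax()})
print(summary)
```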

Summing 3 columns in a dataframe

This should be easy:
I have a data frame with the following columns
a,b,min,w,w_min
all I want to do is sum the columns min, w, and w_min and read the result into another data frame.
I've looked, but I cannot find a previously asked question that relates directly to this. Everything I've found seems much more complex than what I'm trying to do.
You can just pass a list of cols and select these to perform the summation on:
In [64]:
df = pd.DataFrame(columns=['a','b','min','w','w_min'], data = np.random.randn(10,5) )
df
Out[64]:
a b min w w_min
0 0.626671 0.850726 0.539850 -0.669130 -1.227742
1 0.856717 2.108739 -0.079023 -1.107422 -1.417046
2 -1.116149 -0.013082 0.871393 -1.681556 -0.170569
3 -0.944121 -2.394906 -0.454649 0.632995 1.661580
4 0.590963 0.751912 0.395514 0.580653 0.573801
5 -1.661095 -0.592036 -1.278102 -0.723079 0.051083
6 0.300866 -0.060604 0.606705 1.412149 0.916915
7 -1.640530 -0.398978 0.133140 -0.628777 -0.464620
8 0.734518 1.230869 -1.177326 -0.544876 0.244702
9 -1.300137 1.328613 -1.301202 0.951401 -0.693154
In [65]:
cols=['min','w','w_min']
df[cols].sum()
Out[65]:
min -1.743700
w -1.777642
w_min -0.525050
dtype: float64
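Since the question asks to read the result into another data frame, a minimal sketch of the full round trip (column names as in the question, data randomly generated):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5),
                  columns=['a', 'b', 'min', 'w', 'w_min'])

cols = ['min', 'w', 'w_min']
# column-wise totals as a one-row DataFrame rather than a Series
totals = df[cols].sum().to_frame().T
print(totals)
```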

DataFrame Subset

I have a dataframe already and am subsetting some of it to another dataframe.
I do that like this:
D = njm[['svntygene', 'intgr', 'lowgr', 'higr', 'lumA', 'lumB', 'wndres', 'nlbrst', 'Erneg', 'basallike']]
I want to try and set it by the integer position though, something like this:
D = njm.iloc[1:, 2:, 3:, 7:]
But I get an error. How would I do this? I read the docs but could not find a clear answer.
Also, is it possible to pass a list to this as values too?
Thanks.
This is covered in the iloc section of the documentation: you can pass a list with the desired indices.
>>> df = pd.DataFrame(np.random.random((5,5)),columns=list("ABCDE"))
>>> df
A B C D E
0 0.605594 0.229728 0.390391 0.754185 0.516801
1 0.384228 0.106261 0.457507 0.833473 0.786098
2 0.364943 0.664588 0.330835 0.846941 0.229110
3 0.025799 0.681206 0.235821 0.418825 0.878566
4 0.811800 0.761962 0.883281 0.932983 0.665609
>>> df.iloc[:,[1,2,4]]
B C E
0 0.229728 0.390391 0.516801
1 0.106261 0.457507 0.786098
2 0.664588 0.330835 0.229110
3 0.681206 0.235821 0.878566
4 0.761962 0.883281 0.665609
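On the `njm.iloc[1:, 2:, 3:, 7:]` attempt: iloc takes one row indexer and one column indexer, so several column slices have to be combined into a single list of positions first. numpy.r_ does that concatenation; a sketch with illustrative data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(5, 8), columns=list("ABCDEFGH"))

# np.r_ joins the slices 1:3 and 4:7 into one array: [1, 2, 4, 5, 6]
sub = df.iloc[:, np.r_[1:3, 4:7]]
print(sub.columns.tolist())  # ['B', 'C', 'E', 'F', 'G']
```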
