How to pivot a dataframe [duplicate] - python

I have a df with two columns, and I want to use the values of the first column as the headers of a new df.
This is what my data looks like:
ver   time
a     2.31
b     3.45
b     3.75
a     2.21
b     3.87
b     4.02
a     1.97
a     3.56
This is what I am trying to get:
a     b
2.31  3.45
2.21  3.75
1.97  3.87
3.56  4.02

Try cumcount to create a row key within each group, then pivot (newer pandas versions require the pivot arguments as keywords):
out = df.assign(key=df.groupby('ver').cumcount()).pivot(index='key', columns='ver', values='time')
ver a b
key
0 2.31 3.45
1 2.21 3.75
2 1.97 3.87
3 3.56 4.02
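As a runnable check, the whole answer can be reproduced with the sample data (a minimal sketch; the keyword form of pivot is used so it also works on pandas 2.x):

```python
import pandas as pd

# Sample data from the question: 'ver' labels and their 'time' values.
df = pd.DataFrame({
    'ver': ['a', 'b', 'b', 'a', 'b', 'b', 'a', 'a'],
    'time': [2.31, 3.45, 3.75, 2.21, 3.87, 4.02, 1.97, 3.56],
})

# cumcount numbers each occurrence within its 'ver' group (0, 1, 2, ...),
# giving a unique row key so pivot has no duplicate index/column pairs.
out = (df.assign(key=df.groupby('ver').cumcount())
         .pivot(index='key', columns='ver', values='time'))
print(out)
```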

Related

Can I add a new column in a DataFrame with interpolation?

This is my current DataFrame:
Df:
DATA
4.15
4.02
3.70
3.51
3.17
2.95
2.86
NaN
NaN
I already know that 4.15 (the first value) is 100%, 2.86 (the last value) is 30%, and 2.5 is 0%. First, I want to interpolate the second-to-last NaN in the first column, given that the last NaN is predefined as 2.5. After that, I want to create a second column and interpolate it based on the first column and the three known percentage values.
Is it possible?
I have tried this code, but it is not giving the expected results:
df = pd.DataFrame({'DATA':range(df.DATA.min(), df.DATA.max()+1)}).merge(df, on='DATA', how='left')
df.Voltage = df.Voltage.interpolate()
Expected output:
Df:
DATA %
4.15 100%
4.02 89%
3.70 75%
3.51 70%
3.17 50%
2.95 35%
2.86 30%
2.74 15%
2.5 0%
Your logic is unclear; my understanding is that you want to compute a rank, but the provided output is ambiguous, so please detail the computations.
What I would do:
df.loc[df.index[-1], 'DATA'] = 2.5
df['DATA'] = df['DATA'].interpolate()
# compute rank
s = df['DATA'].rank(pct=True)
# rescale to 0-1 and convert to %
df['%'] = ((s-s.min())/(1-s.min())).mul(100)
output:
DATA %
0 4.15 100.0
1 4.02 87.5
2 3.70 75.0
3 3.51 62.5
4 3.17 50.0
5 2.95 37.5
6 2.86 25.0
7 2.68 12.5
8 2.50 0.0
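Put together, the whole answer runs as below (a minimal sketch built from the question's DATA column; the '%' column name follows the expected output):

```python
import pandas as pd

df = pd.DataFrame({'DATA': [4.15, 4.02, 3.70, 3.51, 3.17, 2.95, 2.86, None, None]})

# Pin the last value to the known 0% point, then fill the remaining NaN linearly.
df.loc[df.index[-1], 'DATA'] = 2.5
df['DATA'] = df['DATA'].interpolate()

# rank(pct=True) gives evenly spaced percentile positions;
# rescale so the minimum maps to 0% and the maximum to 100%.
s = df['DATA'].rank(pct=True)
df['%'] = (s - s.min()) / (1 - s.min()) * 100
print(df)
```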

How to calculate a pivot table using Python

I have a sample table below:
Temperature Voltage Data
25 3.3 2.15
25 3.3 2.21
25 3.3 2.23
25 3.3 2.26
25 3.3 2.19
25 3.45 2.4
25 3.45 2.37
25 3.45 2.42
25 3.45 2.34
25 3.45 2.35
105 3.3 3.2
105 3.3 3.22
105 3.3 3.23
105 3.3 3.24
105 3.3 3.26
105 3.45 3.33
105 3.45 3.32
105 3.45 3.34
105 3.45 3.3
105 3.45 3.36
I would like to calculate the average Data for each Temperature and Voltage combination. I could do this in Excel with a pivot table, but I would like to learn how to do it in a Python script so I can automate this data-processing step.
Thank you,
Victor
P.S. sorry for the weird format table. I'm not exactly sure how to correctly copy and paste a table in here.
I think the function you need is .groupby() if you are familiar with it:
df.groupby(['Temperature','Voltage'])['Data'].mean()
This computes the mean of Data for each unique Temperature and Voltage combination. Here is an example:
import pandas as pd
data = {
'Temperature': [25,25,25,25,25,25,25,25,25,25,105,105,105,105,105,105,105,105,105,105],
'Voltage': [3.3,3.3,3.3,3.3,3.3,3.45,3.45,3.45,3.45,3.45,3.3,3.3,3.3,3.3,3.3,3.45,3.45,3.45,3.45,3.45],
'Data': [2.15,2.21,2.23,2.26,2.19,2.4,2.37,2.42,2.34,2.35,3.2,3.22,3.23,3.24,3.26,3.33,3.32,3.34,3.3,3.36]
}
df = pd.DataFrame(data)
print(df.groupby(['Temperature','Voltage'])['Data'].mean())
Output:
Temperature Voltage
25 3.30 2.208
3.45 2.376
105 3.30 3.230
3.45 3.330
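If you want output shaped like an Excel pivot table (Temperature as rows, Voltage as columns), pivot_table does the same aggregation; a minimal sketch with made-up numbers, since the principle is the same:

```python
import pandas as pd

# Hypothetical subset of the question's data; pivot_table aggregates
# with the mean, mirroring Excel's pivot-table behaviour.
df = pd.DataFrame({
    'Temperature': [25, 25, 25, 25, 105, 105],
    'Voltage': [3.3, 3.3, 3.45, 3.45, 3.3, 3.3],
    'Data': [2.0, 2.4, 2.3, 2.5, 3.2, 3.4],
})

# Rows = Temperature, columns = Voltage, cells = mean of Data.
table = df.pivot_table(index='Temperature', columns='Voltage',
                       values='Data', aggfunc='mean')
print(table)
```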

Plotting Contour plot for a dataframe with x axis as datetime and y axis as depth

I have a dataframe with the indexes as datetime and columns as depths. I would like to plot a contour plot which looks something like the image below. Any ideas how I should go about doing this? I tried using the plt.contour() function but I think I have to sort out the arrays for the data first. I am unsure about this part.
Example of my dataframe:
Datetime -1.62 -2.12 -2.62 -3.12 -3.62 -4.12 -4.62 -5.12
2019-05-24 15:45:00 4.61 5.67 4.86 3.91 3.35 3.07 3.03 2.84
2019-05-24 15:50:00 3.76 4.82 4.13 3.32 2.84 2.40 2.18 1.89
2019-05-24 15:55:00 3.07 3.77 3.23 2.82 2.41 2.21 1.93 1.81
2019-05-24 16:00:00 2.50 2.95 2.63 2.29 1.97 1.73 1.57 1.48
2019-05-24 16:05:00 2.94 3.62 3.23 2.82 2.62 2.31 2.01 1.81
2019-05-24 16:10:00 3.07 3.77 3.23 2.82 2.51 2.31 2.10 1.89
2019-05-24 16:15:00 2.71 3.20 2.86 2.70 2.51 2.31 2.18 1.97
2019-05-24 16:20:00 2.50 3.07 2.86 2.82 2.73 2.50 2.37 2.22
2019-05-24 16:25:00 2.40 3.20 3.10 2.93 2.73 2.50 2.57 2.84
2019-05-24 16:30:00 2.21 2.95 2.86 2.70 2.73 2.72 2.91 3.49
2019-05-24 16:35:00 2.04 2.72 2.63 2.59 2.62 2.72 3.03 3.35
2019-05-24 16:40:00 1.73 2.31 2.33 2.39 2.62 2.95 3.57
Example of the plot I want:
For the X Y Z input in plt.contour(), I would like to find out what structure of data it requires. It says it requires a 2D array structure, but I am confused. How do I get that with my current dataframe?
I have worked out a solution. Note that the X (tt2, the time input) and Y (depth, the depth input) dimensions have to match those of the Z (mat2) matrix for plt.contourf to work. I realised plt.contourf produces the image I want rather than plt.contour, which only draws the contour lines.
Example of my code:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

tt2 = [...]    # time values for the x axis
depth = [...]  # depth values for the y axis
plt.title('SSC Contour Plot')
cs = plt.contourf(tt2, depth, mat2, cmap='jet',
                  levels=[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26],
                  extend="both")
plt.gca().invert_yaxis()  # flip the depth axis from shallowest to deepest
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y %H:%M:%S'))
cbar = plt.colorbar()
cbar.set_label("mg/l")
yy = len(colls2)
plt.ylim(yy - 15, 0)  # assuming the last depth readings are NaN
plt.xlabel("Datetime")
plt.xticks(rotation=45)
plt.ylabel("Depth (m)")
plt.savefig(path + 'SSC contour plot.png')  # save the plot
plt.show()
Example of plot produced
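For anyone unsure how to get the 2D arrays out of the dataframe in the first place, here is a minimal sketch (using a hypothetical two-row slice of the example dataframe; the names X, Y, Z are my own):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row slice of the question's dataframe:
# datetime index, depth columns, concentration values.
idx = pd.to_datetime(['2019-05-24 15:45:00', '2019-05-24 15:50:00'])
depths = [-1.62, -2.12, -2.62]
df = pd.DataFrame([[4.61, 5.67, 4.86],
                   [3.76, 4.82, 4.13]], index=idx, columns=depths)

# contourf wants Z as a 2D array with X varying along columns and Y along
# rows, so transpose: rows become depths, columns become timestamps.
Z = df.values.T                       # shape (n_depths, n_times)
X, Y = np.meshgrid(df.index, df.columns)
print(X.shape, Y.shape, Z.shape)      # all three shapes must match

# These can then be passed straight to plt.contourf(X, Y, Z).
```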

Transposing a column in a pandas dataframe while keeping other column intact with duplicates

My data frame is as follows
selection_id last_traded_price
430494 1.46
430494 1.48
430494 1.56
430494 1.57
430495 2.45
430495 2.67
430495 2.72
430495 2.87
I have lots of rows that contain selection ids, and I need to keep the selection_id column the same but transpose the data in last_traded_price to look like this:
selection_id last_traded_price
430494 1.46 1.48 1.56 1.57 e.t.c
430495 2.45 2.67 2.72 2.87 e.t.c
I've tried to use a pivot:
df.pivot(index='selection_id', columns='last_traded_price', values='last_traded_price')
Pivot isn't working due to the duplicate rows in selection_id.
is it possible to transpose the data first and drop the duplicates after?
Option 1
groupby + apply
v = df.groupby('selection_id').last_traded_price.apply(list)
pd.DataFrame(v.tolist(), index=v.index)
0 1 2 3
selection_id
430494 1.46 1.48 1.56 1.57
430495 2.45 2.67 2.72 2.87
Option 2
You can do this with pivot, as long as you have another column of counts to pass for the pivoting (it needs to be pivoted along something, that's why).
df['Count'] = df.groupby('selection_id').cumcount()
df.pivot(index='selection_id', columns='Count', values='last_traded_price')
Count 0 1 2 3
selection_id
430494 1.46 1.48 1.56 1.57
430495 2.45 2.67 2.72 2.87
You can use cumcount to create a counter for the new column names, combined with set_index + unstack or pandas.pivot:
g = df.groupby('selection_id').cumcount()
df = df.set_index(['selection_id',g])['last_traded_price'].unstack()
print (df)
0 1 2 3
selection_id
430494 1.46 1.48 1.56 1.57
430495 2.45 2.67 2.72 2.87
Similar solution with pivot:
df = pd.pivot(index=df['selection_id'],
              columns=df.groupby('selection_id').cumcount(),
              values=df['last_traded_price'])
print (df)
0 1 2 3
selection_id
430494 1.46 1.48 1.56 1.57
430495 2.45 2.67 2.72 2.87
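A self-contained version of the set_index + unstack approach, using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'selection_id': [430494] * 4 + [430495] * 4,
    'last_traded_price': [1.46, 1.48, 1.56, 1.57, 2.45, 2.67, 2.72, 2.87],
})

# cumcount labels each price's position within its selection_id group,
# giving a unique (selection_id, position) index that unstack can spread
# out into one column per position.
g = df.groupby('selection_id').cumcount()
out = df.set_index(['selection_id', g])['last_traded_price'].unstack()
print(out)
```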

manipulating more than 2 dataframes

I have 6 different dataframes and I would like to append them one after the other.
The only way I have found is to append 2 at a time, although I believe there must be a more efficient way to do this.
After that, I would also like to change the index and header names; I know how to do that one by one, but again I believe there must be a more efficient way.
The last problem I am facing is how to set an index using the column whose header is NaN: how shall I refer to it in set_index?
 
df1
 NaN     1      2      3
1   A   17.03   13.41  19.61
7   B   3.42    1.51    5.44
8   C   5.65    2.81    1.89
df2
NaN     1      2      3
1  J   1.60   2.65   1.44
5  H   26.78  27.04  21.06
df3
NaN    1      2      3
1   L   1.20   1.41   2.04
2   M   1.23   1.72   2.47
4   R  66.13  51.49  16.62
5   F     --  46.89  22.35
df4
 NaN    1      2      3
1   A   17.03   13.41  19.61
7   B   3.42    1.51    5.44
8   C   5.65    2.81    1.89
df5
NaN    1      2      3
1  J   1.60   2.65   1.44
5  H   26.78  27.04  21.06
df6
NaN    1      2      3
1   L   1.20   1.41   2.04
2   M   1.23   1.72   2.47
4   R  66.13  51.49  16.62
5   F     --  46.89  22.35
You can use concat; to select the NaN column you can use df.columns[0] with set_index and a list comprehension:
dfs = [df1,df2, df3, ...]
df = pd.concat([df.set_index(df.columns[0], append=True) for df in dfs])
print (df)
1 2 3
NaN
1 A 17.03 13.41 19.61
7 B 3.42 1.51 5.44
8 C 5.65 2.81 1.89
1 J 1.6 2.65 1.44
5 H 26.78 27.04 21.06
1 L 1.20 1.41 2.04
2 M 1.23 1.72 2.47
4 R 66.13 51.49 16.62
5 F -- 46.89 22.35
EDIT:
It seems the NaN in the header is actually the string 'NaN':
print (df3.columns)
Index(['NaN', '1', '2', '3'], dtype='object')
dfs = [df1,df2, df3]
df = pd.concat([df.set_index('NaN', append=True) for df in dfs])
print (df)
1 2 3
NaN
1 A 17.03 13.41 19.61
7 B 3.42 1.51 5.44
8 C 5.65 2.81 1.89
1 J 1.6 2.65 1.44
5 H 26.78 27.04 21.06
1 L 1.20 1.41 2.04
2 M 1.23 1.72 2.47
4 R 66.13 51.49 16.62
5 F -- 46.89 22.35
Or, if the column name is a real np.nan, this also works for me:
#converting to `NaN` if necessary
#df1.columns = df1.columns.astype(float)
#df2.columns = df2.columns.astype(float)
#df3.columns = df3.columns.astype(float)
dfs = [df1,df2, df3]
df = pd.concat([df.set_index(np.nan, append=True) for df in dfs])
print (df)
1.0 2.0 3.0
nan
1 A 17.03 13.41 19.61
7 B 3.42 1.51 5.44
8 C 5.65 2.81 1.89
1 J 1.6 2.65 1.44
5 H 26.78 27.04 21.06
1 L 1.20 1.41 2.04
2 M 1.23 1.72 2.47
4 R 66.13 51.49 16.62
5 F -- 46.89 22.35
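For reference, the string-'NaN' variant is fully runnable as below (a minimal sketch with two small hypothetical frames standing in for df1 and df2):

```python
import pandas as pd

# Two hypothetical frames whose label column is literally named 'NaN'
# (a string), as in the question's headers.
df1 = pd.DataFrame({'NaN': ['A', 'B'], '1': [17.03, 3.42], '2': [13.41, 1.51]},
                   index=[1, 7])
df2 = pd.DataFrame({'NaN': ['J', 'H'], '1': [1.60, 26.78], '2': [2.65, 27.04]},
                   index=[1, 5])

# Move the label column into the index of each frame, then stack them
# vertically with concat.
dfs = [df1, df2]
out = pd.concat([d.set_index('NaN', append=True) for d in dfs])
print(out)
```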
