Pandas convert some value in row into columns - python

I have a problem in Python. My table looks like the one below, with value columns numbered 1 to 6 (the values are random and only illustrate the general idea):
time sensor sample value1 value2 value3 value4 value5 value6
22.10 ACCX 6 0.23 0.44 0.53 0.23 0.44 0.53
22.10 ACCY 6 0.87 0.32 0.12 0.87 0.32 0.12
22.10 ACCZ 6 0.44 0.33 0.45 0.63 0.44 0.93
22.12 ACCX 6 0.63 0.44 0.93 0.87 0.32 0.12
22.12 ACCY 6 0.87 0.32 0.12 0.44 0.33 0.45
22.12 ACCZ 6 0.44 0.33 0.45 0.34 0.22 0.78
22.15 ACCX 6 0.23 0.44 0.53 0.64 0.53 0.25
22.15 ACCY 6 0.87 0.32 0.12 0.87 0.32 0.12
22.15 ACCZ 6 0.44 0.33 0.45 0.44 0.33 0.45
22.18 ACCX 6 0.63 0.44 0.93 0.87 0.32 0.12
22.18 ACCY 6 0.87 0.32 0.12 0.44 0.33 0.45
22.18 ACCZ 6 0.44 0.33 0.45 0.87 0.32 0.12
I need to convert the rows that share the same time and sensor into columns. All rows for a given time and sensor should appear as below, with each time repeated 6 times:
time ACCX ACCY ACCZ
22.10 0.23 0.44 0.23
22.10 0.87 0.32 0.12
22.10 0.44 0.33 0.45
22.10 0.23 0.44 0.23
22.10 0.87 0.32 0.12
22.10 0.44 0.33 0.45
22.12 0.23 0.44 0.53
22.12 0.87 0.32 0.12
22.12 0.44 0.33 0.45
22.12 0.44 0.33 0.45
22.12 0.63 0.44 0.93
22.12 0.87 0.32 0.12
22.15 0.44 0.33 0.45
22.15 0.23 0.44 0.53
22.15 0.87 0.32 0.12
22.15 0.44 0.33 0.45
22.15 0.63 0.44 0.93
22.15 0.87 0.32 0.12
22.18 0.44 0.33 0.45
22.18 0.44 0.33 0.45
22.18 0.63 0.44 0.93
22.18 0.87 0.32 0.12
22.18 0.44 0.33 0.45
22.18 0.44 0.33 0.45

First move the valueN items into a single column (together with their column labels) by melting the dataframe with .melt, then .pivot the sensors into columns, and clean up the result:
res = (
    df.drop(columns="sample")
    .melt(id_vars=["time", "sensor"])
    .pivot(index=["time", "variable"], columns="sensor")
    .droplevel(-1).reset_index()
    .droplevel(0, axis=1).rename(columns={"": "time"})
)
Note, however, that the result for your sample doesn't match the values in your expected output.
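To make the melt-then-pivot idea easy to try, here is a minimal self-contained sketch. The small frame is a hypothetical cut-down sample (two timestamps, three value columns), not the asker's full data, and it assumes pandas 1.1 or later, where pivot accepts a list for index. Passing values="value" is a variation on the answer that avoids the extra column level.

```python
import pandas as pd

# Hypothetical cut-down sample: two timestamps, three value columns.
df = pd.DataFrame({
    "time": [22.10, 22.10, 22.10, 22.12, 22.12, 22.12],
    "sensor": ["ACCX", "ACCY", "ACCZ"] * 2,
    "sample": [3] * 6,
    "value1": [0.23, 0.87, 0.44, 0.63, 0.87, 0.44],
    "value2": [0.44, 0.32, 0.33, 0.44, 0.32, 0.33],
    "value3": [0.53, 0.12, 0.45, 0.93, 0.12, 0.45],
})

# Step 1: stack value1..value3 into one "value" column, labelled by "variable".
long_df = df.drop(columns="sample").melt(id_vars=["time", "sensor"])

# Step 2: spread the sensors into columns, one row per (time, variable) pair.
wide = (
    long_df.pivot(index=["time", "variable"], columns="sensor", values="value")
    .reset_index()
    .drop(columns="variable")
    .rename_axis(columns=None)  # drop the leftover "sensor" columns name
)
print(wide)
```

Each time then appears once per value column, and each row holds the n-th value of every sensor for that time. Since pivot sorts the variable labels as strings, names such as value10 would sort before value2, so wider tables may need an explicit ordering step.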

Related

How to add sum() and mean() value above the df column values in the same line?

Suppose we have a df with summary values as in the DataFrame below. Thanks to @jezrael's answer here, the sum is now in the first row and the average in the second, but that looks ugly. How can I put the sum and the average in the same row, with the index name Total, and place it as the first row, as below?
# Total 27.56 25.04 -1.31
code in pandas is as below:
df.columns=['value_a','value_b','name','up_or_down','difference']
df1 = df[['value_a','value_b']].sum().to_frame().T
df2 = df[['difference']].mean().to_frame().T
df = pd.concat([df1,df2, df], ignore_index=True)
df
value_a value_b name up_or_down difference
project_name
27.56 25.04
-1.31
2021-project11 0.43 0.48 2021-project11 up 0.05
2021-project1 0.62 0.56 2021-project1 down -0.06
2021-project2 0.51 0.47 2021-project2 down -0.04
2021-porject3 0.37 0.34 2021-porject3 down -0.03
2021-porject4 0.64 0.61 2021-porject4 down -0.03
2021-project5 0.32 0.25 2021-project5 down -0.07
2021-project6 0.75 0.81 2021-project6 up 0.06
2021-project7 0.60 0.60 2021-project7 down 0.00
2021-project8 0.85 0.74 2021-project8 down -0.11
2021-project10 0.67 0.67 2021-project10 down 0.00
2021-project9 0.73 0.73 2021-project9 down 0.00
2021-project11 0.54 0.54 2021-project11 down 0.00
2021-project12 0.40 0.40 2021-project12 down 0.00
2021-project13 0.76 0.77 2021-project13 up 0.01
2021-project14 1.16 1.28 2021-project14 up 0.12
2021-project15 1.01 0.94 2021-project15 down -0.07
2021-project16 1.23 1.24 2021-project16 up 0.01
2022-project17 0.40 0.36 2022-project17 down -0.04
2022-project_11 0.40 0.40 2022-project_11 down 0.00
2022-project4 1.01 0.80 2022-project4 down -0.21
2022-project1 0.65 0.67 2022-project1 up 0.02
2022-project2 0.75 0.57 2022-project2 down -0.18
2022-porject3 0.32 0.32 2022-porject3 down 0.00
2022-project18 0.91 0.56 2022-project18 down -0.35
2022-project5 0.84 0.89 2022-project5 up 0.05
2022-project19 0.61 0.48 2022-project19 down -0.13
2022-project6 0.77 0.80 2022-project6 up 0.03
2022-project20 0.63 0.54 2022-project20 down -0.09
2022-project8 0.59 0.55 2022-project8 down -0.04
2022-project21 0.58 0.54 2022-project21 down -0.04
2022-project10 0.76 0.76 2022-project10 down 0.00
2022-project9 0.70 0.71 2022-project9 up 0.01
2022-project22 0.62 0.56 2022-project22 down -0.06
2022-project23 2.03 1.74 2022-project23 down -0.29
2022-project12 0.39 0.39 2022-project12 down 0.00
2022-project24 1.35 1.55 2022-project24 up 0.20
project25 0.45 0.42 project25 down -0.03
project26 0.53 NaN project26 down NaN
project27 0.68 NaN project27 down NaN
Thanks so much for any advice
Use DataFrame.agg with dictionary for aggregate functions:
df.columns=['value_a','value_b','name','up_or_down','difference']
df1 = df.agg({'value_a':'sum', 'value_b':'sum', 'difference':'mean'}).to_frame('Total').T
df = pd.concat([df1,df])
print(df.head())
value_a value_b difference name up_or_down
Total 27.56 25.04 -0.035405 NaN NaN
2021-project11 0.43 0.48 0.050000 2021-project11 up
2021-project1 0.62 0.56 -0.060000 2021-project1 down
2021-project2 0.51 0.47 -0.040000 2021-project2 down
2021-porject3 0.37 0.34 -0.030000 2021-porject3 down
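For readers who want to run this without the original data, the following is a small self-contained sketch of the same approach. The three-row frame and its project labels are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row stand-in for the asker's data.
df = pd.DataFrame(
    {
        "value_a": [0.43, 0.62, 0.51],
        "value_b": [0.48, 0.56, 0.47],
        "name": ["project1", "project2", "project3"],
        "up_or_down": ["up", "down", "down"],
        "difference": [0.05, -0.06, -0.04],
    },
    index=["project1", "project2", "project3"],
)

# One aggregate per column, collected into a single row labelled "Total".
total = (
    df.agg({"value_a": "sum", "value_b": "sum", "difference": "mean"})
    .to_frame("Total")
    .T
)

# Putting the summary row first keeps it at the top of the result.
out = pd.concat([total, df])
print(out)
```

Columns that have no aggregate (name and up_or_down here) are filled with NaN in the Total row.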

Python(Pandas) - Create a column by matching column's values into dataframe

I have the below assumed dataframe
a b c d e F
0.02 0.62 0.31 0.67 0.27 a
0.30 0.07 0.23 0.42 0.00 a
0.82 0.59 0.34 0.73 0.29 a
0.90 0.80 0.13 0.14 0.07 d
0.50 0.62 0.94 0.34 0.53 d
0.59 0.84 0.95 0.42 0.54 d
0.13 0.33 0.87 0.20 0.25 d
0.47 0.37 0.84 0.69 0.28 e
Column F represents the columns of the dataframe.
For each row of column F I want to find relevant row and column from the rest of the dataframe and return the values into one column
The outcome will look like this:
a b c d e F To_Be_Filled
0.02 0.62 0.31 0.67 0.27 a 0.02
0.30 0.07 0.23 0.42 0.00 a 0.30
0.82 0.59 0.34 0.73 0.29 a 0.82
0.90 0.80 0.13 0.14 0.07 d 0.14
0.50 0.62 0.94 0.34 0.53 d 0.34
0.59 0.84 0.95 0.42 0.54 d 0.42
0.13 0.33 0.87 0.20 0.25 d 0.20
0.47 0.37 0.84 0.69 0.28 e 0.28
I am able to identify each case with the below, but not sure how to do it across the whole dataframe.
test.loc[test.iloc[:, 5] == 'a', test.columns == 'a']
Many thanks in advance.
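You can use lookup (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only works on older versions):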
df['To_Be_Filled'] = df.lookup(np.arange(len(df)), df['F'])
df
Out:
a b c d e F To_Be_Filled
0 0.02 0.62 0.31 0.67 0.27 a 0.02
1 0.30 0.07 0.23 0.42 0.00 a 0.30
2 0.82 0.59 0.34 0.73 0.29 a 0.82
3 0.90 0.80 0.13 0.14 0.07 d 0.14
4 0.50 0.62 0.94 0.34 0.53 d 0.34
5 0.59 0.84 0.95 0.42 0.54 d 0.42
6 0.13 0.33 0.87 0.20 0.25 d 0.20
7 0.47 0.37 0.84 0.69 0.28 e 0.28
np.arange(len(df)) can be replaced with df.index.
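DataFrame.lookup is not available in pandas 2.0 and later. A possible substitute, sketched below on a made-up three-row frame, is to translate the labels in F into column positions and index the underlying NumPy array with them. The value_cols list and the sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: F names the column to read from in each row.
df = pd.DataFrame({
    "a": [0.02, 0.30, 0.90],
    "d": [0.67, 0.42, 0.14],
    "e": [0.27, 0.00, 0.07],
    "F": ["a", "d", "e"],
})

value_cols = ["a", "d", "e"]  # the columns that F may refer to

# Translate each label in F into its position within value_cols.
col_pos = pd.Index(value_cols).get_indexer(df["F"])
if (col_pos == -1).any():
    raise KeyError("F contains a label that is not one of the value columns")

# Pick one element per row: row i, column col_pos[i].
values = df[value_cols].to_numpy()
df["To_Be_Filled"] = values[np.arange(len(df)), col_pos]
print(df)
```

The explicit check matters because get_indexer returns -1 for unknown labels, which NumPy would otherwise silently treat as "last column".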

Select columns from a DataFrame based on values in a row in pandas

Say I have the same dataframe from this question:
A0 A1 A2 B0 B1 B2 C0 C1
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72
But instead of returning the minimum value of each row (considering only B0, B1, B2):
A0 A1 A2 B0 B1 B2 C0 C1 Minimum
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75 0.42
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73 0.00
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03 0.51
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61 0.51
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53 0.17
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72 0.01
I want to return the column name which contains the minimum value of each row (of only B0, B1, B2):
A0 A1 A2 B0 B1 B2 C0 C1 col_of_min
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75 B2
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39 B0
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73 B0
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03 B2
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61 B0
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53 B2
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72 B1
What's the best way to do this?
You can use filter() in conjunction with the idxmin() method:
In [40]: x
Out[40]:
A0 A1 A2 B0 B1 B2 C0 C1
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72
In [41]: x['col_of_min'] = x.filter(like='B').idxmin(axis=1)
In [42]: x
Out[42]:
A0 A1 A2 B0 B1 B2 C0 C1 col_of_min
0 0.84 0.47 0.55 0.46 0.76 0.42 0.24 0.75 B2
1 0.43 0.47 0.93 0.39 0.58 0.83 0.35 0.39 B0
2 0.12 0.17 0.35 0.00 0.19 0.22 0.93 0.73 B0
3 0.95 0.56 0.84 0.74 0.52 0.51 0.28 0.03 B2
4 0.73 0.19 0.88 0.51 0.73 0.69 0.74 0.61 B0
5 0.18 0.46 0.62 0.84 0.68 0.17 0.02 0.53 B2
6 0.38 0.55 0.80 0.87 0.01 0.88 0.56 0.72 B1

How to realize the probability marginalize function using DataFrame in pandas?

I have a probability table like this:
BC_array =[np.array(['B=n','B=m','B=s','B=n','B=m','B=s']),np.array(['C=F', 'C=F', 'C=F', 'C=T', 'C=T', 'C=T'])]
pD_BC_array=np.array([[0.9,0.8,0.1,0.3,0.4,0.01],[0.08,0.17,0.01,0.05,0.05,0.01],[0.01,0.01,0.87,0.05,0.15,0.97],[0.01,0.02,0.02,0.6,0.4,0.01]])
pD_BC=pd.DataFrame(pD_BC_array,index=['D=h','D=c','D=s','D=r'],columns=BC_array)
B=n B=m B=s B=n B=m B=s
C=F C=F C=F C=T C=T C=T
D=h 0.90 0.80 0.10 0.30 0.40 0.01
D=c 0.08 0.17 0.01 0.05 0.05 0.01
D=s 0.01 0.01 0.87 0.05 0.15 0.97
D=r 0.01 0.02 0.02 0.60 0.40 0.01
How could I marginalize 'C' (sum the 'C=F' and 'C=T' columns together) and get a table:
B=n B=m B=s
D=h 1.20 1.20 0.11
D=c 0.13 0.22 0.02
D=s 0.06 0.16 1.84
D=r 0.61 0.42 0.03
like this?
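You can call sum on the df and pass axis=1 to sum row-wise and level=0 to sum within that column level (note that the level argument of sum was deprecated in pandas 1.3 and removed in 2.0, so this works only on older versions):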
In [259]:
pD_BC.sum(axis=1, level=0)
Out[259]:
B=m B=n B=s
D=h 1.20 1.20 0.11
D=c 0.22 0.13 0.02
D=s 0.16 0.06 1.84
D=r 0.42 0.61 0.03
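On pandas versions where sum no longer accepts a level argument, one way to get the same result is to group on the first column level instead. The sketch below rebuilds the table from the question and groups via a transpose; it is one possible spelling, not the only one.

```python
import numpy as np
import pandas as pd

BC_array = [
    np.array(["B=n", "B=m", "B=s", "B=n", "B=m", "B=s"]),
    np.array(["C=F", "C=F", "C=F", "C=T", "C=T", "C=T"]),
]
pD_BC_array = np.array([
    [0.90, 0.80, 0.10, 0.30, 0.40, 0.01],
    [0.08, 0.17, 0.01, 0.05, 0.05, 0.01],
    [0.01, 0.01, 0.87, 0.05, 0.15, 0.97],
    [0.01, 0.02, 0.02, 0.60, 0.40, 0.01],
])
pD_BC = pd.DataFrame(
    pD_BC_array, index=["D=h", "D=c", "D=s", "D=r"], columns=BC_array
)

# Transpose so the (B, C) labels become the row index, sum the rows that
# share the same B label (level 0), then transpose back.
marginal = pD_BC.T.groupby(level=0).sum().T
print(marginal)
```

As in the answer's output, the B labels come back in sorted order (B=m, B=n, B=s) rather than in their original order.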

A python random function acts differently when assigned to a list or called directly

I have a Python function that randomizes a dictionary representing a position-specific scoring matrix.
For example:
mat = {
    'A' : [ 0.53, 0.66, 0.67, 0.05, 0.01, 0.86, 0.03, 0.97, 0.33, 0.41, 0.26 ],
    'C' : [ 0.14, 0.04, 0.13, 0.92, 0.99, 0.04, 0.94, 0.00, 0.07, 0.23, 0.35 ],
    'T' : [ 0.25, 0.07, 0.01, 0.01, 0.00, 0.04, 0.00, 0.03, 0.06, 0.12, 0.14 ],
    'G' : [ 0.08, 0.23, 0.20, 0.02, 0.00, 0.06, 0.04, 0.00, 0.54, 0.24, 0.25 ],
}
The scrambling function:
import random

def scramble_matrix(matrix, iterations):
    mat_len = len(matrix["A"])
    pos1 = pos2 = 0
    for count in range(iterations):
        pos1,pos2 = random.sample(range(mat_len), 2)
        # shuffle the matrix: swap the same two positions in every row
        for nuc in matrix.keys():
            matrix[nuc][pos1],matrix[nuc][pos2] = matrix[nuc][pos2],matrix[nuc][pos1]
    return matrix

def print_matrix(matrix):
    for nuc in matrix.keys():
        print nuc+"[",
        for count in matrix[nuc]:
            print "%.2f"%count,
        print "]"
Now to the problem.
When I scramble a matrix directly, it works fine:
print_matrix(mat)
print ""
print_matrix(scramble_matrix(mat,10))
gives:
A[ 0.53 0.66 0.67 0.05 0.01 0.86 0.03 0.97 0.33 0.41 0.26 ]
C[ 0.14 0.04 0.13 0.92 0.99 0.04 0.94 0.00 0.07 0.23 0.35 ]
T[ 0.25 0.07 0.01 0.01 0.00 0.04 0.00 0.03 0.06 0.12 0.14 ]
G[ 0.08 0.23 0.20 0.02 0.00 0.06 0.04 0.00 0.54 0.24 0.25 ]
A[ 0.41 0.97 0.03 0.86 0.53 0.66 0.33 0.05 0.67 0.26 0.01 ]
C[ 0.23 0.00 0.94 0.04 0.14 0.04 0.07 0.92 0.13 0.35 0.99 ]
T[ 0.12 0.03 0.00 0.04 0.25 0.07 0.06 0.01 0.01 0.14 0.00 ]
G[ 0.24 0.00 0.04 0.06 0.08 0.23 0.54 0.02 0.20 0.25 0.00 ]
But when I try to collect several scrambled matrices in a list, it does not work:
print_matrix(mat)
s=[]
for x in range(3):
    s.append(scramble_matrix(mat,10))
for matrix in s:
    print ""
    print_matrix(matrix)
result:
A[ 0.53 0.66 0.67 0.05 0.01 0.86 0.03 0.97 0.33 0.41 0.26 ]
C[ 0.14 0.04 0.13 0.92 0.99 0.04 0.94 0.00 0.07 0.23 0.35 ]
T[ 0.25 0.07 0.01 0.01 0.00 0.04 0.00 0.03 0.06 0.12 0.14 ]
G[ 0.08 0.23 0.20 0.02 0.00 0.06 0.04 0.00 0.54 0.24 0.25 ]
A[ 0.01 0.66 0.97 0.67 0.03 0.05 0.33 0.53 0.26 0.41 0.86 ]
C[ 0.99 0.04 0.00 0.13 0.94 0.92 0.07 0.14 0.35 0.23 0.04 ]
T[ 0.00 0.07 0.03 0.01 0.00 0.01 0.06 0.25 0.14 0.12 0.04 ]
G[ 0.00 0.23 0.00 0.20 0.04 0.02 0.54 0.08 0.25 0.24 0.06 ]
A[ 0.01 0.66 0.97 0.67 0.03 0.05 0.33 0.53 0.26 0.41 0.86 ]
C[ 0.99 0.04 0.00 0.13 0.94 0.92 0.07 0.14 0.35 0.23 0.04 ]
T[ 0.00 0.07 0.03 0.01 0.00 0.01 0.06 0.25 0.14 0.12 0.04 ]
G[ 0.00 0.23 0.00 0.20 0.04 0.02 0.54 0.08 0.25 0.24 0.06 ]
A[ 0.01 0.66 0.97 0.67 0.03 0.05 0.33 0.53 0.26 0.41 0.86 ]
C[ 0.99 0.04 0.00 0.13 0.94 0.92 0.07 0.14 0.35 0.23 0.04 ]
T[ 0.00 0.07 0.03 0.01 0.00 0.01 0.06 0.25 0.14 0.12 0.04 ]
G[ 0.00 0.23 0.00 0.20 0.04 0.02 0.54 0.08 0.25 0.24 0.06 ]
What is the problem?
Why does the scrambling not work after the first time, and why is the whole list filled with the same matrix?
Your scrambling function is modifying the existing matrix, it is not creating a new one.
You create a matrix, scramble it and add it to a list. Then you scramble it again and add it to the list again. Both elements of the list now refer to the same matrix object, which has been scrambled twice.
You are shuffling the same matrix in place 3 times, but what you really want is to shuffle 3 copies of the original matrix. So you should do:
from copy import deepcopy
print_matrix(mat)
s=[]
for x in range(3):
    s.append(scramble_matrix(deepcopy(mat),10)) # note the deepcopy()
for matrix in s:
    print ""
    print_matrix(matrix)
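An alternative design, sketched here in Python 3, is to have the scrambling function copy its input itself so that callers cannot accidentally share one object. The name scrambled_copy and the shortened four-column matrix are hypothetical, chosen only for this illustration.

```python
import random
from copy import deepcopy

def scrambled_copy(matrix, iterations):
    """Return a scrambled deep copy of `matrix`; the input is left untouched."""
    result = deepcopy(matrix)
    mat_len = len(next(iter(result.values())))
    for _ in range(iterations):
        # Pick two distinct column positions and swap them in every row,
        # so each column of the matrix stays intact.
        pos1, pos2 = random.sample(range(mat_len), 2)
        for row in result.values():
            row[pos1], row[pos2] = row[pos2], row[pos1]
    return result

# Hypothetical shortened matrix (first four columns only).
mat = {
    "A": [0.53, 0.66, 0.67, 0.05],
    "C": [0.14, 0.04, 0.13, 0.92],
    "T": [0.25, 0.07, 0.01, 0.01],
    "G": [0.08, 0.23, 0.20, 0.02],
}
original = deepcopy(mat)

# Each call works on its own copy, so the list holds three independent matrices.
scrambled = [scrambled_copy(mat, 10) for _ in range(3)]
```

With this version, mat keeps its original order after the loop, and every list element is a separate dictionary with its own lists.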
