I have a process whose end product is a pandas DataFrame. The output varies in content and length, but it is structured like this example:
9 80340796
10 80340797
11 80340798
12 80340799
13 80340800
14 80340801
15 80340802
16 80340803
17 80340804
18 80340805
19 80340806
20 80340807
21 80340808
22 80340809
23 80340810
24 80340811
25 80340812
26 80340813
27 80340814
28 80340815
29 80340816
30 80340817
31 80340818
32 80340819
33 80340820
34 80340821
35 80340822
36 80340823
37 80340824
38 80340825
39 80340826
40 80340827
41 80340828
42 80340829
43 80340830
44 80340831
45 80340832
46 80340833
I need to get the numbers in the second column above into the following grid format, positioned according to the numbers in the first column.
1 2 3 4 5 6 7 8 9 10 11 12
A 1 9 17 25 33 41 49 57 65 73 81 89
B 2 10 18 26 34 42 50 58 66 74 82 90
C 3 11 19 27 35 43 51 59 67 75 83 91
D 4 12 20 28 36 44 52 60 68 76 84 92
E 5 13 21 29 37 45 53 61 69 77 85 93
F 6 14 22 30 38 46 54 62 70 78 86 94
G 7 15 23 31 39 47 55 63 71 79 87 95
H 8 16 24 32 40 48 56 64 72 80 88 96
So the end result in this example would be
Any advice on how to go about this would be much appreciated. I've been asked for this by a colleague, so the data is easy to read for their team (as it matches the layout of a physical test) but I have no idea how to produce it.
pandas' pivot_table can do what you want, but first you have to create two auxiliary columns: one determining which column the value goes in, and another determining which row. You can get them as shown in the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'num': list(range(9, 28)), 'val': list(range(80001, 80020))})
max_rows = 8  # number of rows in the grid (A to H)
df['row'] = (df['num'] - 1) % max_rows
df['col'] = np.ceil(df['num'] / max_rows).astype(int)
df.pivot_table(values=['val'], columns=['col'], index=['row'])
val
col 2 3 4
row
0 80001.0 80009.0 80017.0
1 80002.0 80010.0 80018.0
2 80003.0 80011.0 80019.0
3 80004.0 80012.0 NaN
4 80005.0 80013.0 NaN
5 80006.0 80014.0 NaN
6 80007.0 80015.0 NaN
7 80008.0 80016.0 NaN
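To get the plate-style layout from the question (rows A to H, columns 1 to 12), the same row/column arithmetic can be combined with pivot and a relabelled index. This is only a minimal sketch: the column names num and val and the 8 x 12 grid size are assumptions taken from the example above, and reindex is used to pad the unused cells with NaN.

```python
import string

import pandas as pd

# Assumed input: positions 9..46 paired with the values from the question.
df = pd.DataFrame({"num": range(9, 47), "val": range(80340796, 80340834)})

n_rows, n_cols = 8, 12  # assumed grid layout: rows A-H, columns 1-12

pos = df["num"] - 1  # zero-based position in the grid
df["row"] = (pos % n_rows).map(lambda i: string.ascii_uppercase[i])
df["col"] = pos // n_rows + 1  # 1-based column number

# pivot does no aggregation, so values keep their identity;
# reindex forces the full A-H by 1-12 grid, filling empty cells with NaN.
grid = df.pivot(index="row", columns="col", values="val").reindex(
    index=list(string.ascii_uppercase[:n_rows]),
    columns=range(1, n_cols + 1),
)
print(grid)
```

Because empty cells are NaN, the values are shown as floats; grid.astype("Int64") is one way to keep them as nullable integers.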
I have 12 columns filled with wages. I want to calculate the mean, but my output is 12 different means, one per column. I want a single mean calculated over the whole dataset.
This is how my df looks:
Month 1 Month 2 Month 3 Month 4 ... Month 9 Month 10 Month 11 Month 12
0 1429.97 2816.61 2123.29 2123.29 ... 2816.61 2816.61 1429.97 1776.63
1 3499.53 3326.20 3499.53 2112.89 ... 1939.56 2806.21 2632.88 2459.55
2 2599.95 3119.94 3813.26 3466.60 ... 3466.60 3466.60 2946.61 2946.61
3 2599.95 2946.61 3466.60 2773.28 ... 2253.29 3119.94 1906.63 2773.28
I used this code to calculate the mean:
mean = df.mean()
Do I have to convert these 12 columns into one column, or how else can I calculate one mean?
Just call the mean again to get the mean of those 12 values:
df.mean().mean()
Use numpy.mean after converting the values to a 2D array:
mean = np.mean(df.to_numpy())
print (mean)
2914.254166666667
Or use DataFrame.melt:
mean = df.melt()['value'].mean()
print (mean)
2914.254166666666
You can also use stack:
df.stack().mean()
Suppose this dataframe:
>>> df
A B C D E F G H
0 60 1 59 25 8 27 34 43
1 81 48 32 30 60 3 90 22
2 66 15 21 5 23 36 83 46
3 56 42 14 86 41 64 89 56
4 28 53 89 89 52 13 12 39
5 64 7 2 16 91 46 74 35
6 81 81 27 67 26 80 19 35
7 56 8 17 39 63 6 34 26
8 56 25 26 39 37 14 41 27
9 41 56 68 38 57 23 36 8
>>> df.stack().mean()
41.6625
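One caveat worth illustrating: the approaches above agree when every column has the same number of non-missing values. If some cells are NaN, the mean of the column means weights each column equally, while stack() or NumPy's nanmean weight each cell equally. A small sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up data: column "b" has one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, np.nan, 20.0]})

mean_of_means = df.mean().mean()            # (2.0 + 15.0) / 2
overall_stack = df.stack().mean()           # NaN is skipped when averaging
overall_numpy = np.nanmean(df.to_numpy())   # ignores NaN explicitly

print(mean_of_means, overall_stack, overall_numpy)
```

Here the first result is 8.5 while the other two are 7.2, so which one is "the" mean depends on whether columns or individual cells should carry equal weight.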
I want to calculate the derivatives of curvatures on a surface using VTK and Python. I first calculate the curvatures using:
curvatures = vtk.vtkCurvatures()
curvatures.SetInputConnection(reader.GetOutputPort())
curvatures.SetCurvatureTypeToGaussian()
and calculate the derivative of curvatures using:
Derivativers = vtk.vtkCellDerivatives()
Derivativers.SetInputConnection(curvatures.GetOutputPort())
It seems that the results from vtkCurvatures and vtkCellDerivatives are the same.
What should I do to get the derivative of the curvature on a surface? Many thanks!
I think your code is correct as it is, but we need to be sure that the curvature point data array is the currently active scalar array. I have attached an input data file that you can save as 'Test.vtk'. It has two point data arrays: PointIds (a scalar array) and PointNormals (a vector array). We then calculate the Gaussian curvatures, which become the third point data array (a scalar array), and print the names of all the point data arrays, whether they are scalars or vectors. Next we explicitly set the 'Gauss_Curvature' array as the active scalar. Finally we compute the cell derivatives, which creates a cell data vector array called 'ScalarGradient' holding the gradient of the curvature, and save the result to 'Output.vtk'.
import vtk
rd = vtk.vtkPolyDataReader()
rd.SetFileName('Test.vtk')
curv = vtk.vtkCurvatures()
curv.SetInputConnection(rd.GetOutputPort())
curv.SetCurvatureTypeToGaussian()
curv.Update()
pd = curv.GetOutput()
for i in range(pd.GetPointData().GetNumberOfArrays()):
    print(pd.GetPointData().GetArrayName(i))
# This will print the following:
# PointIds
# PointNormals
# Gauss_Curvature
# To set the active scalar to Gauss_Curvature
pd.GetPointData().SetActiveScalars('Gauss_Curvature')
curvdiff = vtk.vtkCellDerivatives()
curvdiff.SetInputData(pd)
curvdiff.SetVectorModeToComputeGradient()
curvdiff.Update()
writer = vtk.vtkPolyDataWriter()
writer.SetFileName('Output.vtk')
writer.SetInputConnection(curvdiff.GetOutputPort())
writer.Write()
Running this gives me the following outputs, first for the curvature and then for the gradient. Notice that the color scales in the two figures are different, so the curvature and derivative values differ even though the color scheme makes them look similar.
In case you want to reproduce the results, the input VTK file is below:
# vtk DataFile Version 4.2
vtk output
ASCII
DATASET POLYDATA
POINTS 72 double
2.0927648978 0.33091989273 -0.39812666792 1.6450815105 0.64303293033 -1.236079764 1.7000810807 1.2495041516 -0.44287861593
1.0622264471 1.4540269048 -1.1853937884 0.8533187462 0.72833963362 -1.8409362444 0.161573121 1.415272931 -1.6182009866
-0.4682233113 2.0970647997 -0.17539653223 0.30090053169 1.9778473 -0.80327873468 -0.62604403311 1.746197318 -1.0984268611
0.62604948422 1.746195345 1.0984268742 0.4682298575 2.0970633231 0.17539654742 -0.30089435724 1.9778482191 0.80327874624
1.3794219731 1.1031586743 1.2360880686 1.9321437012 0.84755424016 0.44288858377 1.3329709879 1.6469225081 0.39813606858
-1.3329658439 1.6469266769 -0.39813605266 -1.3794185207 1.1031629885 -1.2360880529 -1.9321410548 0.84756028031 -0.44288857482
-0.16156870247 1.4152734137 1.6182009959 -1.0622219128 1.4540302146 1.1853938087 -0.85331647216 0.72834227646 1.8409362479
-1.7000771766 1.2495094572 0.44287862867 -2.0927638628 0.33092642637 0.39812667143 -1.6450795106 0.64303805991 1.2360797754
0.10502897512 0.5677157381 2.0771002606 -0.54417928828 -0.19289519204 2.0770984773 0.43913323132 -0.37482057542 2.077101172
1.0574135878 0.37481822068 1.8409414841 1.3064404335 -0.56771795917 1.6182050108 1.7903331906 0.19289323113 1.1854016225
-0.72812102639 -1.6469234624 1.18539471 -0.20411225533 -1.1031605232 1.8409380189 -1.1448850389 -0.84755547744 1.6181982897
0.26564737208 -1.7461967516 1.236085002 -0.23207016686 -2.0970637037 0.44288263714 0.75978960067 -1.9778489401 0.39813448025
1.1992202745 -1.4152750453 1.0984284306 1.5819944619 -1.4540310306 0.17539958384 1.8633106814 -0.72834386503 0.80328466622
-1.825278792 -0.33092031521 1.0984201446 -2.0502257619 -0.64303229501 0.17538963068 -1.5624229303 -1.2495043655 0.80327527281
-0.26565282447 -1.7461959014 -1.2360850131 0.23206361633 -2.0970644256 -0.44288265596 -0.7597957797 -1.977846564 -0.39813449851
-1.1992246997 -1.4152712955 -1.0984284473 -1.5819990123 -1.4540260972 -0.17539960215 -1.8633129661 -0.72833804688 -0.80328468018
0.20410881451 -1.1031611451 -1.8409380327 1.1448823984 -0.84755903977 -1.6181983017 0.72811588321 -1.6469257176 -1.1853947189
2.0502237661 -0.64303869999 -0.17538964133 1.5624190405 -1.2495092418 -0.80327529169 1.8252777661 -0.33092600698 -1.0984201511
-0.43913440065 -0.37481918558 -2.0771011678 -0.10502720377 0.56771608521 -2.0771002475 0.54417868626 -0.19289687027 -2.0770984714
-1.3064422115 -0.56771386838 -1.6182050202 -1.7903325818 0.19289882961 -1.185401614 -1.057412421 0.3748215375 -1.8409414839
-0.76083174443 1.3178134523 -1.9919051229 -0.7608358562 -1.3178110596 -1.9919051353 -2.4621262785 3.8465962003e-06 -0.47023127203
1.5216839818 -2.3645462409e-06 -1.991898872 2.4621262803 -3.846902628e-06 0.47023127288 1.2310617434 -2.1322669408 -0.47022115796
-1.2310684033 -2.1322631023 0.47022113869 -1.5216839821 2.3661982943e-06 1.9918988726 0.76083174316 -1.3178134534 1.9919051234
0.76083585779 1.317811059 1.9919051359 -1.2310617441 2.1322669425 0.47022115881 1.2310684021 2.1322631008 -0.47022113785
POLYGONS 140 560
3 12 14 9
3 27 69 24
3 70 21 19
3 1 53 63
3 2 14 13
3 38 36 37
3 28 68 36
3 39 67 23
3 64 38 51
3 13 14 12
3 20 24 18
3 34 35 33
3 40 41 39
3 16 58 17
3 20 18 19
3 26 27 24
3 11 6 70
3 10 14 71
3 22 39 23
3 6 10 7
3 3 5 7
3 29 64 13
3 41 30 32
3 57 45 47
3 54 61 57
3 66 30 41
3 50 43 42
3 30 33 31
3 33 35 36
3 65 37 35
3 37 36 35
3 26 68 28
3 68 33 36
3 27 28 29
3 28 36 38
3 29 28 38
3 38 37 51
3 61 48 42
3 37 65 52
3 66 34 30
3 43 65 35
3 32 30 31
3 30 34 33
3 40 39 22
3 41 32 39
3 66 41 46
3 32 67 39
3 67 32 25
3 33 68 31
3 32 31 25
3 31 26 25
3 27 26 28
3 26 31 68
3 64 29 38
3 12 69 27
3 18 9 11
3 69 12 9
3 18 24 69
3 20 67 25
3 26 24 25
3 24 20 25
3 13 12 29
3 12 27 29
3 18 11 19
3 11 9 10
3 69 9 18
3 9 14 10
3 70 6 15
3 11 10 6
3 10 71 7
3 71 14 2
3 70 15 21
3 6 8 15
3 21 17 22
3 15 8 16
3 20 23 67
3 19 11 70
3 21 23 19
3 23 20 19
3 22 17 62
3 22 23 21
3 15 17 21
3 62 40 22
3 58 57 47
3 62 17 58
3 62 47 40
3 58 16 59
3 15 16 17
3 6 7 8
3 16 60 59
3 48 54 56
3 8 5 60
3 2 1 3
3 7 5 8
3 3 1 4
3 2 3 71
3 71 3 7
3 3 4 5
3 5 55 60
3 49 50 48
3 8 60 16
3 60 55 59
3 4 55 5
3 54 57 59
3 1 63 4
3 56 55 4
3 49 48 56
3 44 45 42
3 63 56 4
3 48 61 54
3 56 54 55
3 54 59 55
3 59 57 58
3 47 62 58
3 40 46 41
3 57 61 45
3 47 45 46
3 43 34 44
3 47 46 40
3 46 44 66
3 37 52 51
3 42 48 50
3 42 43 44
3 43 35 34
3 45 44 46
3 44 34 66
3 61 42 45
3 50 65 43
3 65 50 52
3 56 63 49
3 51 52 53
3 49 63 53
3 50 49 52
3 49 53 52
3 2 0 1
3 1 0 53
3 0 51 53
3 0 64 51
3 13 64 0
3 2 13 0
POINT_DATA 72
SCALARS PointIds vtkIdType
LOOKUP_TABLE default
0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71
NORMALS PointNormals double
0.94738623196 0.18727650058 -0.25958975291 0.78313719053 0.35367076216 -0.51148131227 0.83545291047 0.50824408436 -0.20906072109
0.47898857295 0.62402000487 -0.61738884061 0.34465195337 0.40584589543 -0.84646567573 0.15649087604 0.66776200195 -0.72773931766
-0.15609353126 0.97764567412 -0.14086782943 0.059136449433 0.91410106494 -0.40115099829 -0.27742338135 0.85504231805 -0.43810832201
0.27739675558 0.85505949665 0.43809165386 0.1561128187 0.97764027026 0.14088395868 -0.05910174957 0.91410169764 0.40115467037
0.6978536347 0.50139725414 0.51146954756 0.85786633279 0.46941626794 0.20907826874 0.63588503517 0.72681646701 0.25959207486
-0.63587825439 0.72682167945 -0.25959409059 -0.69785435483 0.50138010962 -0.51148537136 -0.85787788306 0.46940090711 -0.20906536337
-0.15651072102 0.66775823558 0.72773850593 -0.47897825964 0.62400201859 0.61741502054 -0.34463970914 0.40587327082 0.84645753521
-0.8354604399 0.50822639825 0.20907362693 -0.94738511041 0.18728497536 0.25958773195 -0.78315168201 0.35366155935 0.51146548701
-0.0042059530133 0.19834561529 0.98012311821 -0.16967339936 -0.10281294988 0.98012266318 0.17387129188 -0.095532679284 0.98012360499
0.52381294065 0.095528092331 0.84645991446 0.65654796659 -0.19833859028 0.72774073074 0.77988819356 0.10280486141 0.61741846913
-0.30091501143 -0.72680688867 0.6174155023 -0.17918019275 -0.50140137515 0.8464579845 -0.50004323045 -0.46941965564 0.72773755886
0.085293841035 -0.8550588207 0.51146786196 -0.022407612419 -0.97764266904 0.20907584885 0.31149794024 -0.91410144601 0.25959117785
0.60180306843 -0.66776342878 0.4380925359 0.76860432156 -0.62401813766 0.14088563004 0.82118451863 -0.40586818759 0.40115707731
-0.87920342686 -0.18729607405 0.43808847833 -0.92471829912 -0.35362213463 0.14088098936 -0.76208606823 -0.50823351579 0.40115273655
-0.08527862936 -0.85505086473 -0.51148369876 0.022426691996 -0.97764498848 -0.20906295701 -0.31150584284 -0.91409817614 -0.25959320919
-0.60177487516 -0.6677778914 -0.43810921854 -0.76861864432 -0.62400413386 -0.14086951598 -0.82120131764 -0.40583781416 -0.40115341765
0.17915035572 -0.50139827931 -0.84646613373 0.5000564084 -0.4694043424 -0.72773838139 0.30092542605 -0.72682480482 -0.61738933507
0.92471333385 -0.3536415386 -0.14086487277 0.7620681671 -0.50826324483 -0.40114907784 0.8792018588 -0.18726442077 -0.43810515655
-0.17386795802 -0.09554490851 -0.98012300434 0.0041936861197 0.19834885542 -0.98012251507 0.16968233044 -0.10280393451 -0.9801220627
-0.65654129971 -0.19835764899 -0.72774155087 -0.77990892195 0.10280480888 -0.61739229403 -0.52379534825 0.095552395563 -0.84646805779
-0.3035934934 0.52568869256 -0.79465866212 -0.30345974591 -0.52576590314 -0.79465866742 -0.98224561547 8.0181630296e-06 -0.18759944248
0.60705977261 7.7220189155e-05 -0.79465616874 0.98224726985 9.1023794904e-07 0.18759078032 0.49111663606 -0.85065410882 -0.18759540755
-0.49112361722 -0.85065199031 0.18758673723 -0.60706560357 -2.8418937296e-07 0.79465171803 0.30352977695 -0.52573221141 0.79465421184
0.30352929122 0.52573248433 0.79465421681 -0.49112519531 0.85065107744 0.18758674521 0.49113052171 0.85064609373 -0.18759539936
import random
import pandas as pd
random.sample(range(1, 100), 10)
df = pd.DataFrame({"A": random.sample(range(1, 100), 10),
"B":random.sample(range(1, 100), 10),
"C":random.sample(range(1, 100), 10)})
df["D"]="need_to_calc"
df
I need the value of Column D, Row 9 to equal the average of the block of cells from rows 6 through 8 across columns A through C. I want to do this for all rows.
I am not sure how to do this in a single pythonic action. Instead I have hacky temporary columns and ugly nonsense.
Is there a cleaner way to define this column without temporary tables?
You can do it like this:
means = df.rolling(3).mean().shift(1)
df['D'] = (means['A'] + means['B'] + means['C'])/3
Output:
A B C D
0 43 57 15 NaN
1 86 34 68 NaN
2 40 12 78 NaN
3 97 24 54 48.111111
4 90 42 10 54.777778
5 34 54 98 49.666667
6 98 36 31 55.888889
7 16 5 24 54.777778
8 35 53 67 44.000000
9 80 66 37 40.555556
You can also do it like this:
df["D"]= (df.sum(axis=1).rolling(window=3, min_periods=3).sum()/9).shift(1)
Example:
A B C D
0 62 89 12 need_to_calc
1 44 13 63 need_to_calc
2 28 21 54 need_to_calc
3 93 93 4 need_to_calc
4 95 84 42 need_to_calc
5 68 68 35 need_to_calc
6 3 92 56 need_to_calc
7 13 88 83 need_to_calc
8 22 37 23 need_to_calc
9 64 58 5 need_to_calc
Output:
A B C D
0 62 89 12 NaN
1 44 13 63 NaN
2 28 21 54 NaN
3 93 93 4 42.888889
4 95 84 42 45.888889
5 68 68 35 57.111111
6 3 92 56 64.666667
7 13 88 83 60.333333
8 22 37 23 56.222222
9 64 58 5 46.333333
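To convince yourself that the rolling expression really is the block average, it can be compared against a direct mean over the block. The sketch below uses seeded random data (the seed and the names cols and window are just illustrative choices) and checks row 9 against rows 6 through 8, as in the question. It assumes there are no missing values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
cols = ["A", "B", "C"]
df = pd.DataFrame(rng.integers(1, 100, size=(10, 3)), columns=cols)

window = 3
# Rolling sum of the per-row totals, divided by the number of cells in the
# block, then shifted so that row i only sees rows i-3 .. i-1.
df["D"] = (
    df[cols].sum(axis=1).rolling(window).sum() / (window * len(cols))
).shift(1)

# Direct check for row 9: plain mean of rows 6..8 across A..C
# (.loc slicing is inclusive of both end labels).
expected = df.loc[6:8, cols].to_numpy().mean()
print(df["D"].iloc[9], expected)
```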
I just had a quick question. How would one go about getting the last cell value in every column of an Excel spreadsheet when working with it as a pandas DataFrame? I'm having quite some difficulty with this. I know the index can be found with len(), but I can't quite wrap my head around it. Thank you, any help would be greatly appreciated.
If by the last cell you mean the bottom-right cell of the dataframe, then you can use .iloc:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1, 101).reshape((10, -1)))
df
Output:
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9 10
1 11 12 13 14 15 16 17 18 19 20
2 21 22 23 24 25 26 27 28 29 30
3 31 32 33 34 35 36 37 38 39 40
4 41 42 43 44 45 46 47 48 49 50
5 51 52 53 54 55 56 57 58 59 60
6 61 62 63 64 65 66 67 68 69 70
7 71 72 73 74 75 76 77 78 79 80
8 81 82 83 84 85 86 87 88 89 90
9 91 92 93 94 95 96 97 98 99 100
Use .iloc with -1 index selection on both rows and columns.
df.iloc[-1,-1]
Output:
100
DataFrame.head(n) gets the top n results from the dataframe. DataFrame.tail(n) gets the bottom n results from the dataframe.
If your dataframe is named df, you could use df.tail(1) to get the last row of the dataframe. The returned value is also a dataframe.
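Since the question asks for the last value in every column, here is a minimal sketch of two readings of that. The data is made up to stand in for a sheet whose columns end at different rows; in practice df would come from something like pd.read_excel.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a sheet whose columns end at different rows.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, np.nan, np.nan],
})

last_row = df.iloc[-1]             # last row; NaN where a column ended early
last_filled = df.ffill().iloc[-1]  # last non-missing value in each column

print(last_row)
print(last_filled)
```

Both results are Series indexed by column name, so last_filled["y"] gives the last filled cell of column y.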
I am trying to randomize all rows in a data frame except for the first. I would like for the first row to always appear first, and the remaining rows can be in any randomized order.
My data frame is:
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
Any suggestions as to how I can approach this?
Try this:
df = pd.concat([df[:1], df[1:].sample(frac=1)]).reset_index(drop=True)
test:
In [38]: df
Out[38]:
a b c d e
0 2.070074 2.216060 -0.015823 0.686516 -0.738393
1 -1.213517 0.994057 0.634805 0.517844 -0.128375
2 0.937532 0.814923 -0.231120 1.970019 1.438927
3 1.499967 0.105707 1.255207 0.929084 -3.359826
4 0.418702 -0.894226 -1.088968 0.631398 0.152026
5 1.214119 -0.122633 0.983818 -0.445202 -0.807955
6 0.252078 -0.258703 -0.445209 -0.179094 1.180077
7 1.428827 -0.569009 -0.718485 0.161108 1.300349
8 -1.403100 2.154548 -0.492264 -0.544538 -0.061745
9 0.468671 0.004839 -0.738240 -0.385624 -0.532640
In [39]: df = pd.concat([df[:1], df[1:].sample(frac=1)]).reset_index(drop=True)
In [40]: df
Out[40]:
a b c d e
0 2.070074 2.216060 -0.015823 0.686516 -0.738393
1 0.468671 0.004839 -0.738240 -0.385624 -0.532640
2 0.418702 -0.894226 -1.088968 0.631398 0.152026
3 -1.213517 0.994057 0.634805 0.517844 -0.128375
4 1.428827 -0.569009 -0.718485 0.161108 1.300349
5 0.937532 0.814923 -0.231120 1.970019 1.438927
6 0.252078 -0.258703 -0.445209 -0.179094 1.180077
7 1.499967 0.105707 1.255207 0.929084 -3.359826
8 -1.403100 2.154548 -0.492264 -0.544538 -0.061745
9 1.214119 -0.122633 0.983818 -0.445202 -0.807955
Use NumPy's shuffle:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(100).reshape(20, 5), columns=list('ABCDE'))
np.random.shuffle(df.values[1:, :])
print(df)
A B C D E
0 0 1 2 3 4
1 55 56 57 58 59
2 10 11 12 13 14
3 80 81 82 83 84
4 90 91 92 93 94
5 70 71 72 73 74
6 25 26 27 28 29
7 40 41 42 43 44
8 65 66 67 68 69
9 5 6 7 8 9
10 45 46 47 48 49
11 85 86 87 88 89
12 15 16 17 18 19
13 30 31 32 33 34
14 60 61 62 63 64
15 20 21 22 23 24
16 35 36 37 38 39
17 95 96 97 98 99
18 75 76 77 78 79
19 50 51 52 53 54
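np.random.shuffle shuffles an ndarray in place. When every column has the same dtype, the dataframe is backed by a single ndarray, which you can access through the values attribute. To shuffle all but the first row, operate on the array slice [1:, :]. Note that this relies on values returning a writable view of the underlying data, which is not guaranteed: with mixed dtypes values is a copy, and with copy-on-write enabled in newer pandas versions the array is read-only.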
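If you would rather not depend on modifying the underlying array in place, a positional permutation gives the same effect. This is a sketch under the same setup as above; the seed is arbitrary and only there to make the run repeatable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(100).reshape(20, 5), columns=list("ABCDE"))

rng = np.random.default_rng(42)  # arbitrary seed for repeatability
# Positions 1..n-1 in random order, with position 0 kept in front.
order = np.concatenate(([0], rng.permutation(np.arange(1, len(df)))))
shuffled = df.iloc[order].reset_index(drop=True)

print(shuffled.head())
```

This builds a new DataFrame rather than shuffling in place, so it also works when the columns have different dtypes.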