I want to convert N columns into one Series. How can I do it efficiently?
Input:
0 1 2 3
0 64 98 47 58
1 80 94 81 46
2 18 43 79 84
3 57 35 81 31
Expected Output:
0 64
1 80
2 18
3 57
4 98
5 94
6 43
7 35
8 47
9 81
10 79
11 81
12 58
13 46
14 84
15 31
dtype: int64
So far I have tried:
print(df[0].append(df[1]).append(df[2]).append(df[3]).reset_index(drop=True))
I'm not satisfied with my solution; moreover, it won't work for a dynamic number of columns. Please help me find a better approach.
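For reference, the example input above can be reconstructed like this:
import pandas as pd

df = pd.DataFrame([[64, 98, 47, 58],
                   [80, 94, 81, 46],
                   [18, 43, 79, 84],
                   [57, 35, 81, 31]])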
You can use unstack:
pd.Series(df.unstack().values)
You need ndarray.flatten with column-major (Fortran) order:
pd.Series(df.values.flatten(order='F'))
Output:
0 64
1 80
2 18
3 57
4 98
5 94
6 43
7 35
8 47
9 81
10 79
11 81
12 58
13 46
14 84
15 31
dtype: int64
Here's yet another short one.
>>> pd.Series(df.values.ravel(order='F'))
0 64
1 80
2 18
3 57
4 98
5 94
6 43
7 35
8 47
9 81
10 79
11 81
12 58
13 46
14 84
15 31
dtype: int64
You can also use the Series constructor with the transposed .values array:
pd.Series(df.values.T.flatten())
Output:
0 64
1 80
2 18
3 57
4 98
5 94
6 43
7 35
8 47
9 81
10 79
11 81
12 58
13 46
14 84
15 31
dtype: int64
Use pd.melt():
df.melt()['value']
Output:
0 64
1 80
2 18
3 57
4 98
5 94
6 43
7 35
8 47
9 81
10 79
11 81
12 58
13 46
14 84
15 31
Name: value, dtype: int64
df.T.stack().reset_index(drop=True)
Out:
0 64
1 80
2 18
3 57
4 98
5 94
6 43
7 35
8 47
9 81
10 79
11 81
12 58
13 46
14 84
15 31
dtype: int64
n: 8
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
How can I print a number table like this in Python, where n can be any number?
I am using a very stupid way to print it, but the result is not the one expected:
n = int(input('n: '))
if n == 4:
    print(' 0 1 2 3\n4 5 6 7\n8 9 10 11\n12 13 14 15')
if n == 5:
    print(' 0 1 2 3 4\n5 6 7 8 9\n10 11 12 13 14\n15 16 17 18 19\n20 21 22 23 24')
if n == 6:
    print(' 0 1 2 3 4 5\n6 7 8 9 10 11\n12 13 14 15 16 17\n18 19 20 21 22 23\n24 25 26 27 28 29\n30 31 32 33 34 35')
if n == 7:
    print(' 0 1 2 3 4 5 6\n7 8 9 10 11 12 13\n14 15 16 17 18 19 20\n21 22 23 24 25 26 27\n28 29 30 31 32 33 34\n35 36 37 38 39 40 41\n42 43 44 45 46 47 48')
if n == 8:
    print(' 0 1 2 3 4 5 6 7\n8 9 10 11 12 13 14 15\n16 17 18 19 20 21 22 23\n24 25 26 27 28 29 30 31\n32 33 34 35 36 37 38 39\n40 41 42 43 44 45 46 47\n48 49 50 51 52 53 54 55\n56 57 58 59 60 61 62 63')
if n == 9:
    print(' 0 1 2 3 4 5 6 7 8\n9 10 11 12 13 14 15 16 17\n18 19 20 21 22 23 24 25 26\n27 28 29 30 31 32 33 34 35\n36 37 38 39 40 41 42 43 44\n45 46 47 48 49 50 51 52 53\n54 55 56 57 58 59 60 61 62\n63 64 65 66 67 68 69 70 71\n72 73 74 75 76 77 78 79 80')
if n == 10:
    print(' 0 1 2 3 4 5 6 7 8 9\n10 11 12 13 14 15 16 17 18 19\n20 21 22 23 24 25 26 27 28 29\n30 31 32 33 34 35 36 37 38 39\n40 41 42 43 44 45 46 47 48 49\n50 51 52 53 54 55 56 57 58 59\n60 61 62 63 64 65 66 67 68 69\n70 71 72 73 74 75 76 77 78 79\n80 81 82 83 84 85 86 87 88 89\n90 91 92 93 94 95 96 97 98 99')
Here is the result:
n: 8
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
I won't show you the code directly; here are some tips instead. Do you know the % (modulo) operator in Python, and how to use it to decide where to break lines? As for the formatting, a padding function like zfill will help (rjust does the same job with spaces). You may also need to learn the for or while statement to solve your problem.
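For example, here is a minimal sketch along those lines (using rjust for space padding, since the sample output pads with spaces rather than zeros):
n = int(input('n: '))
width = len(str(n * n - 1))  # width of the largest number in the square
for i in range(n * n):
    # i % n tells us where we are within the current row;
    # print a newline after the last number of each row
    end = '\n' if i % n == n - 1 else ' '
    print(str(i).rjust(width), end=end)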
You can do this with a range loop and a list comprehension.
In order for the output to look right you need to figure out what the width of the largest value in the square will be. You then need to format each value to fit in that width (right-justified). Something like this:
def number_square(n):
    w = len(str(n*n-1))
    for r in range(n):
        print(*[f'{c:>{w}}' for c in range(r*n, r*n+n)])
number_square(8)
Output:
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
I have a process where the end product is a pandas DataFrame. The output, which is variable in terms of data and length, is structured like this example:
9 80340796
10 80340797
11 80340798
12 80340799
13 80340800
14 80340801
15 80340802
16 80340803
17 80340804
18 80340805
19 80340806
20 80340807
21 80340808
22 80340809
23 80340810
24 80340811
25 80340812
26 80340813
27 80340814
28 80340815
29 80340816
30 80340817
31 80340818
32 80340819
33 80340820
34 80340821
35 80340822
36 80340823
37 80340824
38 80340825
39 80340826
40 80340827
41 80340828
42 80340829
43 80340830
44 80340831
45 80340832
46 80340833
I need to get the numbers in the second column above into the following grid format, based on the numbers in the first column:
1 2 3 4 5 6 7 8 9 10 11 12
A 1 9 17 25 33 41 49 57 65 73 81 89
B 2 10 18 26 34 42 50 58 66 74 82 90
C 3 11 19 27 35 43 51 59 67 75 83 91
D 4 12 20 28 36 44 52 60 68 76 84 92
E 5 13 21 29 37 45 53 61 69 77 85 93
F 6 14 22 30 38 46 54 62 70 78 86 94
G 7 15 23 31 39 47 55 63 71 79 87 95
H 8 16 24 32 40 48 56 64 72 80 88 96
So the end result in this example would be that grid populated with the values from the second column.
Any advice on how to go about this would be much appreciated. I've been asked for this by a colleague so the data is easy to read for their team (it matches the layout of a physical test), but I have no idea how to produce it.
A pandas pivot table can do what you want here, but first you have to create two auxiliary columns: one determining which column each value has to go in, and another for which row. You can get those as shown in the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'num': list(range(9, 28)), 'val': list(range(80001, 80020))})
max_rows = 8
df['row'] = (df['num']-1)%8
df['col'] = np.ceil(df['num']/8).astype(int)
df.pivot_table(values=['val'], columns=['col'], index=['row'])
val
col 2 3 4
row
0 80001.0 80009.0 80017.0
1 80002.0 80010.0 80018.0
2 80003.0 80011.0 80019.0
3 80004.0 80012.0 NaN
4 80005.0 80013.0 NaN
5 80006.0 80014.0 NaN
6 80007.0 80015.0 NaN
7 80008.0 80016.0 NaN
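As a sketch of the final step (assuming the same row/col helper columns as in the example above), you could map the row numbers to the letter labels from your desired layout afterwards; passing values='val' as a plain string also avoids the extra column level:
import string

grid = df.pivot_table(values='val', columns='col', index='row')
grid.index = [string.ascii_uppercase[i] for i in grid.index]  # 0 -> A, 1 -> B, ...
print(grid)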
I want to calculate the derivatives of curvatures on a surface using VTK and Python. I first calculate the curvatures using:
curvatures = vtk.vtkCurvatures()
curvatures.SetInputConnection(reader.GetOutputPort())
curvatures.SetCurvatureTypeToGaussian()
and calculate the derivative of curvatures using:
Derivativers = vtk.vtkCellDerivatives()
Derivativers.SetInputConnection(curvatures.GetOutputPort())
It seems that the results are the same with vtkCurvatures and vtkCellDerivatives.
What should I do to get the derivative of curvature on a surface? Many thanks!
I think your code is correct as it is, but we need to be sure that the curvature point data array is the currently active scalar array. I have attached an input data file that you can save under the name 'Test.vtk'. It has two point data arrays: PointIds (a scalar array) and PointNormals (a vector array).
We will calculate Gaussian curvatures, which become a third scalar array in the point data. We will print the names of all the point data arrays, whether scalars or vectors, and then explicitly set the 'Gauss_Curvature' array as the active scalars. Computing cell derivatives will then create a cell data vector array called 'ScalarGradient', which is the gradient of the curvatures. The result is saved to a file 'Output.vtk'.
import vtk
rd = vtk.vtkPolyDataReader()
rd.SetFileName('Test.vtk')
curv = vtk.vtkCurvatures()
curv.SetInputConnection(rd.GetOutputPort())
curv.SetCurvatureTypeToGaussian()
curv.Update()
pd = curv.GetOutput()
for i in range(pd.GetPointData().GetNumberOfArrays()):
print(pd.GetPointData().GetArrayName(i))
# This will print the following:
# PointIds
# PointNormals
# Gauss_Curvature
# To set the active scalar to Gauss_Curvature
pd.GetPointData().SetActiveScalars('Gauss_Curvature')
curvdiff = vtk.vtkCellDerivatives()
curvdiff.SetInputData(pd)
curvdiff.SetVectorModeToComputeGradient()
curvdiff.Update()
writer = vtk.vtkPolyDataWriter()
writer.SetFileName('Output.vtk')
writer.SetInputConnection(curvdiff.GetOutputPort())
writer.Write()
This gives me the following outputs, first for the curvature and then for the gradient. Notice that the color scales in the two figures are different, so the curvature and derivative values are different even though the color scheme makes them look similar.
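As a quick sanity check (a sketch, not part of the original answer), you can list the cell data arrays on the filter output and confirm that the 'ScalarGradient' array described above was created:
out = curvdiff.GetOutput()
for i in range(out.GetCellData().GetNumberOfArrays()):
    print(out.GetCellData().GetArrayName(i))
# 'ScalarGradient' should appear among the printed names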
In case you want to reproduce the results, the input VTK file is below:
# vtk DataFile Version 4.2
vtk output
ASCII
DATASET POLYDATA
POINTS 72 double
2.0927648978 0.33091989273 -0.39812666792 1.6450815105 0.64303293033 -1.236079764 1.7000810807 1.2495041516 -0.44287861593
1.0622264471 1.4540269048 -1.1853937884 0.8533187462 0.72833963362 -1.8409362444 0.161573121 1.415272931 -1.6182009866
-0.4682233113 2.0970647997 -0.17539653223 0.30090053169 1.9778473 -0.80327873468 -0.62604403311 1.746197318 -1.0984268611
0.62604948422 1.746195345 1.0984268742 0.4682298575 2.0970633231 0.17539654742 -0.30089435724 1.9778482191 0.80327874624
1.3794219731 1.1031586743 1.2360880686 1.9321437012 0.84755424016 0.44288858377 1.3329709879 1.6469225081 0.39813606858
-1.3329658439 1.6469266769 -0.39813605266 -1.3794185207 1.1031629885 -1.2360880529 -1.9321410548 0.84756028031 -0.44288857482
-0.16156870247 1.4152734137 1.6182009959 -1.0622219128 1.4540302146 1.1853938087 -0.85331647216 0.72834227646 1.8409362479
-1.7000771766 1.2495094572 0.44287862867 -2.0927638628 0.33092642637 0.39812667143 -1.6450795106 0.64303805991 1.2360797754
0.10502897512 0.5677157381 2.0771002606 -0.54417928828 -0.19289519204 2.0770984773 0.43913323132 -0.37482057542 2.077101172
1.0574135878 0.37481822068 1.8409414841 1.3064404335 -0.56771795917 1.6182050108 1.7903331906 0.19289323113 1.1854016225
-0.72812102639 -1.6469234624 1.18539471 -0.20411225533 -1.1031605232 1.8409380189 -1.1448850389 -0.84755547744 1.6181982897
0.26564737208 -1.7461967516 1.236085002 -0.23207016686 -2.0970637037 0.44288263714 0.75978960067 -1.9778489401 0.39813448025
1.1992202745 -1.4152750453 1.0984284306 1.5819944619 -1.4540310306 0.17539958384 1.8633106814 -0.72834386503 0.80328466622
-1.825278792 -0.33092031521 1.0984201446 -2.0502257619 -0.64303229501 0.17538963068 -1.5624229303 -1.2495043655 0.80327527281
-0.26565282447 -1.7461959014 -1.2360850131 0.23206361633 -2.0970644256 -0.44288265596 -0.7597957797 -1.977846564 -0.39813449851
-1.1992246997 -1.4152712955 -1.0984284473 -1.5819990123 -1.4540260972 -0.17539960215 -1.8633129661 -0.72833804688 -0.80328468018
0.20410881451 -1.1031611451 -1.8409380327 1.1448823984 -0.84755903977 -1.6181983017 0.72811588321 -1.6469257176 -1.1853947189
2.0502237661 -0.64303869999 -0.17538964133 1.5624190405 -1.2495092418 -0.80327529169 1.8252777661 -0.33092600698 -1.0984201511
-0.43913440065 -0.37481918558 -2.0771011678 -0.10502720377 0.56771608521 -2.0771002475 0.54417868626 -0.19289687027 -2.0770984714
-1.3064422115 -0.56771386838 -1.6182050202 -1.7903325818 0.19289882961 -1.185401614 -1.057412421 0.3748215375 -1.8409414839
-0.76083174443 1.3178134523 -1.9919051229 -0.7608358562 -1.3178110596 -1.9919051353 -2.4621262785 3.8465962003e-06 -0.47023127203
1.5216839818 -2.3645462409e-06 -1.991898872 2.4621262803 -3.846902628e-06 0.47023127288 1.2310617434 -2.1322669408 -0.47022115796
-1.2310684033 -2.1322631023 0.47022113869 -1.5216839821 2.3661982943e-06 1.9918988726 0.76083174316 -1.3178134534 1.9919051234
0.76083585779 1.317811059 1.9919051359 -1.2310617441 2.1322669425 0.47022115881 1.2310684021 2.1322631008 -0.47022113785
POLYGONS 140 560
3 12 14 9
3 27 69 24
3 70 21 19
3 1 53 63
3 2 14 13
3 38 36 37
3 28 68 36
3 39 67 23
3 64 38 51
3 13 14 12
3 20 24 18
3 34 35 33
3 40 41 39
3 16 58 17
3 20 18 19
3 26 27 24
3 11 6 70
3 10 14 71
3 22 39 23
3 6 10 7
3 3 5 7
3 29 64 13
3 41 30 32
3 57 45 47
3 54 61 57
3 66 30 41
3 50 43 42
3 30 33 31
3 33 35 36
3 65 37 35
3 37 36 35
3 26 68 28
3 68 33 36
3 27 28 29
3 28 36 38
3 29 28 38
3 38 37 51
3 61 48 42
3 37 65 52
3 66 34 30
3 43 65 35
3 32 30 31
3 30 34 33
3 40 39 22
3 41 32 39
3 66 41 46
3 32 67 39
3 67 32 25
3 33 68 31
3 32 31 25
3 31 26 25
3 27 26 28
3 26 31 68
3 64 29 38
3 12 69 27
3 18 9 11
3 69 12 9
3 18 24 69
3 20 67 25
3 26 24 25
3 24 20 25
3 13 12 29
3 12 27 29
3 18 11 19
3 11 9 10
3 69 9 18
3 9 14 10
3 70 6 15
3 11 10 6
3 10 71 7
3 71 14 2
3 70 15 21
3 6 8 15
3 21 17 22
3 15 8 16
3 20 23 67
3 19 11 70
3 21 23 19
3 23 20 19
3 22 17 62
3 22 23 21
3 15 17 21
3 62 40 22
3 58 57 47
3 62 17 58
3 62 47 40
3 58 16 59
3 15 16 17
3 6 7 8
3 16 60 59
3 48 54 56
3 8 5 60
3 2 1 3
3 7 5 8
3 3 1 4
3 2 3 71
3 71 3 7
3 3 4 5
3 5 55 60
3 49 50 48
3 8 60 16
3 60 55 59
3 4 55 5
3 54 57 59
3 1 63 4
3 56 55 4
3 49 48 56
3 44 45 42
3 63 56 4
3 48 61 54
3 56 54 55
3 54 59 55
3 59 57 58
3 47 62 58
3 40 46 41
3 57 61 45
3 47 45 46
3 43 34 44
3 47 46 40
3 46 44 66
3 37 52 51
3 42 48 50
3 42 43 44
3 43 35 34
3 45 44 46
3 44 34 66
3 61 42 45
3 50 65 43
3 65 50 52
3 56 63 49
3 51 52 53
3 49 63 53
3 50 49 52
3 49 53 52
3 2 0 1
3 1 0 53
3 0 51 53
3 0 64 51
3 13 64 0
3 2 13 0
POINT_DATA 72
SCALARS PointIds vtkIdType
LOOKUP_TABLE default
0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71
NORMALS PointNormals double
0.94738623196 0.18727650058 -0.25958975291 0.78313719053 0.35367076216 -0.51148131227 0.83545291047 0.50824408436 -0.20906072109
0.47898857295 0.62402000487 -0.61738884061 0.34465195337 0.40584589543 -0.84646567573 0.15649087604 0.66776200195 -0.72773931766
-0.15609353126 0.97764567412 -0.14086782943 0.059136449433 0.91410106494 -0.40115099829 -0.27742338135 0.85504231805 -0.43810832201
0.27739675558 0.85505949665 0.43809165386 0.1561128187 0.97764027026 0.14088395868 -0.05910174957 0.91410169764 0.40115467037
0.6978536347 0.50139725414 0.51146954756 0.85786633279 0.46941626794 0.20907826874 0.63588503517 0.72681646701 0.25959207486
-0.63587825439 0.72682167945 -0.25959409059 -0.69785435483 0.50138010962 -0.51148537136 -0.85787788306 0.46940090711 -0.20906536337
-0.15651072102 0.66775823558 0.72773850593 -0.47897825964 0.62400201859 0.61741502054 -0.34463970914 0.40587327082 0.84645753521
-0.8354604399 0.50822639825 0.20907362693 -0.94738511041 0.18728497536 0.25958773195 -0.78315168201 0.35366155935 0.51146548701
-0.0042059530133 0.19834561529 0.98012311821 -0.16967339936 -0.10281294988 0.98012266318 0.17387129188 -0.095532679284 0.98012360499
0.52381294065 0.095528092331 0.84645991446 0.65654796659 -0.19833859028 0.72774073074 0.77988819356 0.10280486141 0.61741846913
-0.30091501143 -0.72680688867 0.6174155023 -0.17918019275 -0.50140137515 0.8464579845 -0.50004323045 -0.46941965564 0.72773755886
0.085293841035 -0.8550588207 0.51146786196 -0.022407612419 -0.97764266904 0.20907584885 0.31149794024 -0.91410144601 0.25959117785
0.60180306843 -0.66776342878 0.4380925359 0.76860432156 -0.62401813766 0.14088563004 0.82118451863 -0.40586818759 0.40115707731
-0.87920342686 -0.18729607405 0.43808847833 -0.92471829912 -0.35362213463 0.14088098936 -0.76208606823 -0.50823351579 0.40115273655
-0.08527862936 -0.85505086473 -0.51148369876 0.022426691996 -0.97764498848 -0.20906295701 -0.31150584284 -0.91409817614 -0.25959320919
-0.60177487516 -0.6677778914 -0.43810921854 -0.76861864432 -0.62400413386 -0.14086951598 -0.82120131764 -0.40583781416 -0.40115341765
0.17915035572 -0.50139827931 -0.84646613373 0.5000564084 -0.4694043424 -0.72773838139 0.30092542605 -0.72682480482 -0.61738933507
0.92471333385 -0.3536415386 -0.14086487277 0.7620681671 -0.50826324483 -0.40114907784 0.8792018588 -0.18726442077 -0.43810515655
-0.17386795802 -0.09554490851 -0.98012300434 0.0041936861197 0.19834885542 -0.98012251507 0.16968233044 -0.10280393451 -0.9801220627
-0.65654129971 -0.19835764899 -0.72774155087 -0.77990892195 0.10280480888 -0.61739229403 -0.52379534825 0.095552395563 -0.84646805779
-0.3035934934 0.52568869256 -0.79465866212 -0.30345974591 -0.52576590314 -0.79465866742 -0.98224561547 8.0181630296e-06 -0.18759944248
0.60705977261 7.7220189155e-05 -0.79465616874 0.98224726985 9.1023794904e-07 0.18759078032 0.49111663606 -0.85065410882 -0.18759540755
-0.49112361722 -0.85065199031 0.18758673723 -0.60706560357 -2.8418937296e-07 0.79465171803 0.30352977695 -0.52573221141 0.79465421184
0.30352929122 0.52573248433 0.79465421681 -0.49112519531 0.85065107744 0.18758674521 0.49113052171 0.85064609373 -0.18759539936
import random
import pandas as pd

df = pd.DataFrame({"A": random.sample(range(1, 100), 10),
                   "B": random.sample(range(1, 100), 10),
                   "C": random.sample(range(1, 100), 10)})
df["D"] = "need_to_calc"
df
I need the value of Column D, Row 9 to equal the average of the block of cells from rows 6 through 8 across columns A through C. I want to do this for all rows.
I am not sure how to do this in a single pythonic action. Instead I have hacky temporary columns and ugly nonsense.
Is there a cleaner way to define this column without temporary tables?
You can do it like this:
means = df[['A', 'B', 'C']].rolling(3).mean().shift(1)
df['D'] = (means['A'] + means['B'] + means['C']) / 3
Output:
A B C D
0 43 57 15 NaN
1 86 34 68 NaN
2 40 12 78 NaN
3 97 24 54 48.111111
4 90 42 10 54.777778
5 34 54 98 49.666667
6 98 36 31 55.888889
7 16 5 24 54.777778
8 35 53 67 44.000000
9 80 66 37 40.555556
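Since all nine cells carry equal weight, the mean of the three row means equals the mean of the 3x3 block, so the same computation can be written as a one-liner:
df['D'] = df[['A', 'B', 'C']].mean(axis=1).rolling(3).mean().shift(1)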
You can also do it this way:
df["D"] = (df[["A", "B", "C"]].sum(axis=1).rolling(window=3, min_periods=3).sum() / 9).shift(1)
Example:
A B C D
0 62 89 12 need_to_calc
1 44 13 63 need_to_calc
2 28 21 54 need_to_calc
3 93 93 4 need_to_calc
4 95 84 42 need_to_calc
5 68 68 35 need_to_calc
6 3 92 56 need_to_calc
7 13 88 83 need_to_calc
8 22 37 23 need_to_calc
9 64 58 5 need_to_calc
Output:
A B C D
0 62 89 12 NaN
1 44 13 63 NaN
2 28 21 54 NaN
3 93 93 4 42.888889
4 95 84 42 45.888889
5 68 68 35 57.111111
6 3 92 56 64.666667
7 13 88 83 60.333333
8 22 37 23 56.222222
9 64 58 5 46.333333
So I am trying to merge the following columns of data, which are currently indexed as daily entries (but only have points once per week). I have separated the columns into year variables, but am having trouble getting them into a combined DataFrame that disregards the date index, so that I can build out min/max columns by week over the years. I am not sure how to get the merge/join functions to do this.
I have the following:
# Create year variables, append to new dataframe with new index
def minmaxdata():
    Totrigs = dataforgraphs()
    tr = Totrigs
    yrs = [tr['2007'], tr['2008'], tr['2009'], tr['2010'], tr['2011'], tr['2012'], tr['2013'], tr['2014']]
    yrlist = ['tr07', 'tr08', 'tr09', 'tr10', 'tr11', 'tr12', 'tr13', 'tr14']
    dic = dict(zip(yrlist, yrs))
    yr07, yr08, yr09, yr10, yr11, yr12, yr13, yr14 = dic['tr07'], dic['tr08'], dic['tr09'], dic['tr10'], dic['tr11'], dic['tr12'], dic['tr13'], dic['tr14']
    minmax = yr07.append([yr08, yr09, yr10, yr11, yr12, yr13, yr14], ignore_index=True)
I would like a DataFrame like the following:
2007 2008 2009 2010 2011 2012 2013 2014 min max
1 10 13 10 12 34 23 22 14 10 34
2 25 ...
3 22
4 ...
5
.
.
. ...
52
I'm not sure what your original data look like, but I don't think it's a good idea to hard-code all the years; you lose reusability. I'll set up a sequence of random integers indexed by date, with one date per week.
In [65]: idx = pd.date_range ('2007-1-1','2014-12-31',freq='W')
In [66]: df = pd.DataFrame(np.random.randint(100, size=len(idx)), index=idx, columns=['value'])
In [67]: df.head()
Out[67]:
value
2007-01-07 7
2007-01-14 2
2007-01-21 85
2007-01-28 55
2007-02-04 36
In [68]: df.tail()
Out[68]:
value
2014-11-30 76
2014-12-07 34
2014-12-14 43
2014-12-21 26
2014-12-28 17
Then get the year and the week number within each year:
In [69]: df['year'] = df.index.year
In [70]: df['week'] = df.groupby('year').cumcount()+1
(You may try df.index.week for the week number, but I've seen weird behavior, like numbering that starts from week #53 in January.)
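If you do want ISO calendar weeks (which are what cause that week-53 behavior), newer pandas exposes them directly; a sketch, assuming pandas 1.1+ where DatetimeIndex.isocalendar() is available:
df['iso_week'] = df.index.isocalendar().week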
Finally, do a pivot table to transform and get row-wise max/min:
In [71]: df2 = df.pivot_table(index='week', columns='year', values='value')
In [72]: df2['max'] = df2.max(axis=1)
In [73]: df2['min'] = df2.min(axis=1)
And now our dataframe df2 looks like this and should be what you need:
In [74]: df2
Out[74]:
year 2007 2008 2009 2010 2011 2012 2013 2014 max min
week
1 7 82 13 32 24 58 18 10 82 7
2 2 5 29 0 2 97 59 83 97 0
3 85 89 8 83 63 73 47 49 89 8
4 55 5 1 44 78 10 13 87 87 1
5 36 41 48 98 98 24 24 69 98 24
6 51 43 62 60 44 57 34 33 62 33
7 37 66 72 46 28 11 73 36 73 11
8 30 13 86 93 46 67 95 15 95 13
9 78 84 16 21 70 39 43 90 90 16
10 9 2 88 15 39 81 44 96 96 2
11 34 76 16 44 44 26 30 77 77 16
12 2 24 23 13 25 69 25 74 74 2
13 66 91 67 77 18 47 95 66 95 18
14 59 52 22 42 40 99 88 21 99 21
15 76 17 31 57 43 31 91 67 91 17
16 76 38 53 43 84 45 78 9 84 9
17 88 53 34 22 99 93 61 42 99 22
18 78 19 82 19 5 80 55 69 82 5
19 54 92 56 6 2 85 7 67 92 2
20 8 56 86 41 60 76 31 81 86 8
21 64 76 11 38 41 98 39 72 98 11
22 21 86 34 1 15 27 26 95 95 1
23 82 90 3 17 62 18 93 20 93 3
24 47 42 32 27 83 8 22 14 83 8
25 15 66 70 16 4 22 26 14 70 4
26 12 68 21 7 86 2 27 10 86 2
27 85 85 9 39 17 94 67 42 94 9
28 73 80 96 49 46 23 69 84 96 23
29 57 74 6 71 79 31 79 7 79 6
30 18 84 85 34 71 69 0 62 85 0
31 24 40 93 53 72 46 44 71 93 24
32 95 4 58 57 68 27 95 71 95 4
33 65 84 87 41 38 45 71 33 87 33
34 62 14 41 83 79 63 44 13 83 13
35 49 96 50 62 25 45 69 63 96 25
36 6 38 86 34 98 60 67 80 98 6
37 99 44 26 19 19 20 57 17 99 17
38 2 40 7 65 68 58 68 13 68 2
39 72 31 83 65 69 39 10 76 83 10
40 90 31 42 20 7 8 62 79 90 7
41 10 46 82 96 30 43 12 84 96 10
42 79 38 28 78 25 9 80 2 80 2
43 64 83 63 40 29 86 10 15 86 10
44 89 91 62 48 53 69 16 0 91 0
45 99 26 85 45 26 53 79 86 99 26
46 35 14 46 25 74 6 68 44 74 6
47 17 9 84 88 29 83 85 1 88 1
48 18 69 55 16 77 35 16 76 77 16
49 60 4 36 50 81 28 50 34 81 4
50 36 29 38 28 81 86 71 43 86 28
51 41 82 95 27 95 77 74 26 95 26
52 2 81 89 82 28 2 11 17 89 2
53 NaN NaN NaN NaN NaN 0 NaN NaN 0 0
EDIT:
If you need the max/min over only certain columns, just list them. In this case (2007-2013) they are consecutive, so you can do the following:
df2['max_2007to2013'] = df2[range(2007,2014)].max(axis=1)
If not, simply list them, e.g. df2[[2007, 2010, 2012, 2013]].max(axis=1).