Changing column names in pandas - python

I'm trying to change the names of the columns in a pandas dataframe, using Python 3.7. I have 30 columns numbered 0-29 and I want to rename them to 1-30. I know it's a simple question, but I'd like to do it in as few lines as possible and couldn't find anything efficient online. Can anyone please help me?
Thank you

If you have a dataframe like this:
0 1 2 3
0 a d e f
1 b g h i
2 c j k l
Then you can do:
df.columns = df.columns.astype(int) + 1
print(df)
Prints:
1 2 3 4
0 a d e f
1 b g h i
2 c j k l

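Another way is to recreate the index with RangeIndex:
df.columns = pd.RangeIndex(1, len(df.columns)+1)
FYI, you can read the documentation about Int64Index and RangeIndex: RangeIndex is an optimized version of Int64Index for a monotonic integer range. (Int64Index was removed in pandas 2.0, where a plain Index with an int64 dtype takes its place.)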
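As a rough check of the two approaches above, the sketch below rebuilds the small four-column frame from the first answer and applies both. The `rename` variant is an extra alternative that does not appear in the answers, and the variable names are only illustrative.

```python
import pandas as pd

# Same small frame as in the first answer: default columns 0..3.
df = pd.DataFrame([list("adef"), list("bghi"), list("cjkl")])

# Approach 1: add 1 to the existing integer column labels.
shifted = df.copy()
shifted.columns = shifted.columns + 1

# Approach 2: replace the labels with a fresh RangeIndex starting at 1.
ranged = df.copy()
ranged.columns = pd.RangeIndex(1, len(df.columns) + 1)

# Alternative (not from the answers): rename returns a new frame.
renamed = df.rename(columns=lambda c: c + 1)

print(list(shifted.columns))
print(list(ranged.columns))
print(list(renamed.columns))
```

All three leave the data untouched and only change the labels; `rename` is convenient when you do not want to modify `df` in place.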

Here you can use this. I believe you will find it short and simple enough.
df.columns = list(range(1, 31))

In this case, you can use a list comprehension to rename your dataframe columns:
df.columns = [i + 1 for i in df.columns]

You can use below ...
Sample Data:
Create random sample data with 30 columns as follows. The columns get the default RangeIndex, which starts at 0 with step=1; we can change it to get the desired result.
df = pd.DataFrame(np.random.randint(0,100,size=(100, 30)))
print(df)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
0 37 87 10 94 76 42 94 80 2 54 98 18 27 32 94 41 97 61 22 87 67 43 12 49 67 92 69 52 78 49
1 80 77 64 81 91 36 46 83 54 25 55 5 4 57 68 59 36 94 79 14 27 7 36 37 15 3 9 32 50 95
2 58 91 87 59 60 65 90 97 55 48 11 62 76 28 89 99 78 60 92 25 93 35 41 69 88 19 85 18 56 52
3 50 5 80 32 42 96 89 62 77 89 72 8 1 3 52 92 71 95 42 18 9 76 5 53 56 18 17 5 3 40
4 37 92 30 45 14 15 96 29 0 45 59 59 82 51 78 30 25 95 50 22 34 12 24 59 63 5 75 15 85 95
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
95 49 58 9 18 44 48 15 74 76 70 81 88 36 32 35 96 93 95 2 69 20 40 22 19 55 92 33 45 20 82
96 75 15 65 77 4 2 45 16 42 25 12 47 35 64 3 89 47 68 59 52 82 37 67 32 64 62 7 81 79 42
97 7 95 21 52 42 84 0 85 0 2 16 97 45 56 30 15 33 49 82 60 51 29 3 37 51 8 65 73 55 56
98 69 66 25 61 85 50 76 27 51 44 46 53 56 67 20 15 5 77 54 18 18 48 34 2 89 84 55 26 19 4
99 41 63 23 46 33 78 86 32 4 9 13 40 13 17 22 78 60 96 56 3 30 78 65 66 15 43 98 79 10 23
[100 rows x 30 columns]
print(df.columns)
RangeIndex(start=0, stop=30, step=1) <-- default behaviour
Solution:
We can change the default RangeIndex to start=1 as follows in order to get the result you desired.
df.columns = df.columns+1
print(df.columns)
RangeIndex(start=1, stop=31, step=1)
print(df)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
0 37 87 10 94 76 42 94 80 2 54 98 18 27 32 94 41 97 61 22 87 67 43 12 49 67 92 69 52 78 49
1 80 77 64 81 91 36 46 83 54 25 55 5 4 57 68 59 36 94 79 14 27 7 36 37 15 3 9 32 50 95
2 58 91 87 59 60 65 90 97 55 48 11 62 76 28 89 99 78 60 92 25 93 35 41 69 88 19 85 18 56 52
3 50 5 80 32 42 96 89 62 77 89 72 8 1 3 52 92 71 95 42 18 9 76 5 53 56 18 17 5 3 40
4 37 92 30 45 14 15 96 29 0 45 59 59 82 51 78 30 25 95 50 22 34 12 24 59 63 5 75 15 85 95
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
95 49 58 9 18 44 48 15 74 76 70 81 88 36 32 35 96 93 95 2 69 20 40 22 19 55 92 33 45 20 82
96 75 15 65 77 4 2 45 16 42 25 12 47 35 64 3 89 47 68 59 52 82 37 67 32 64 62 7 81 79 42
97 7 95 21 52 42 84 0 85 0 2 16 97 45 56 30 15 33 49 82 60 51 29 3 37 51 8 65 73 55 56
98 69 66 25 61 85 50 76 27 51 44 46 53 56 67 20 15 5 77 54 18 18 48 34 2 89 84 55 26 19 4
99 41 63 23 46 33 78 86 32 4 9 13 40 13 17 22 78 60 96 56 3 30 78 65 66 15 43 98 79 10 23
[100 rows x 30 columns]
For more, you can look at help(df.columns):
| start : int (default: 0), or other RangeIndex instance
| If int and "stop" is not given, interpreted as "stop" instead.
| stop : int (default: 0)
| step : int (default: 1)
| name : object, optional
| Name to be stored in the index.
| copy : bool, default False
| Unused, accepted for homogeneity with other index types.
|
| Attributes
| ----------
| start
| stop
| step
|
| Methods
| -------
| from_range

Related

issubset method different than subSet in superSet - Error in Python3.x

Why doesn't the issubset method of sets in Python 3.x return the same result as subSet in superSet?
The logic looks correct to me, but the console returns an unexpected result.
It works fine with short sets, but with large sets (subSet in superSet) gives wrong results.
def isStrictSuperset(superSet, subSet):
    strictSuperset = False
    # condition1 = subSet.issubset(superSet) # why is this different from the following condition?
    condition1 = subSet in superSet # Error! incorrect result line
    condition2 = superSet != subSet
    if condition1 and condition2:
        strictSuperset = True
    return strictSuperset # return if strict superset or not
if __name__ == "__main__":
    # list of string
    superSet = input().split(' ')
    subSet = input().split(" ")
    # convert the list of string to set of integers
    superSet = set(int(x) for x in superSet)
    subSet = set(int(x) for x in subSet)
    # output
    print( isStrictSuperset(superSet, subSet) )
input:
51 28 10 61 99 31 55 7 88 48 18 80 18 36 49 21 36 1 49 53 11 78 46 87 82 28 76 50 89 31 14 81 87 39 3 69 26 18 85 18 23 43 75 5 64 47 34 19 2 54 92 45 79 80 59 16 75 80 55 24 56 74 76 31 22 74 20 93 79 81 12 57 21 79 65 32 57 37 47 84 82 28 72 15 53 50 86 58 83 88 3 44 76 63 32 14 13 38 29 70 38 4 71 15 45 4 94 24 46 6 95 48 15 82 92 62 6 67 38 20 60 78 37 84 32 39 51 88 13 99 6 3 64 37 83 68 18 51 98 37 11 48 63 97 30 90 73 44 63 25 78 12 25 91 36 38 59 12 36 51 58 61 82 91 31 41 36 99 28 50 28 64 22 56 26 39 75 53 8 41 94 86 35 69 48 17 80 32 12 29 2 33 51 79 58 74 91 46 6 54 66 0 75 60 30 95 57 36 70 32 83 1 88 27 57 2 67 28 18 51 61 16 40 79 96 78 27 72 85 45 73 12 89 31 11 24 42 94 22 84 1 67 8 62 80 77 81 58 1 6 63 30 64 37 44 60 11 14 68 28 81 86 30 17 81 14 30 44 64 89 7 94 89 13 59 88 34 42 6 51 10 19 66 91 46 22 41 34 98 4 26 90 84 90 44 90 84 13 36 6 97 21 30 52 46 15 83 89 45 83 33 11 3 18 6 82 17 23 13 91 27 39 76 11 86 12 97 64 51 48 84 35 66 15 48 32 99 11 18 93 11 85 71 63 57 76 1 80 45 19 7 39 80 70 78 3 17 51 14 99 47 83 17 82 23 59 59 41 77 22 7 35 22 98 59 90 80 72 60 67 22 75 3 99 18 81 47 48 18 98 18 37 47 65 98 86 82 5 30 87 25 17 97 60 93 33 99 89 62 98 40 27 70 57 49 93 46 11 38 94 43 75 61 75 55 45 26 9 84 89 40 87 14 61 31 99 53 6 83 55 15 95 46 8 58 73 58 57 9 7 49 21 31 88 31 32 61 30 19 69 78 33 3 0 70 73 40 91 91 96 72 79 0 41 91 51 10 80 50 77 30 38 1 85 56 90 78 36 31 0 82 12 95 28 1 65 72 75 89 54
81 79 97 20 68 23 19 12 53 86 26 36 4 64 10 43 12 75 98 30 12 33 27 1 32 68 64 49 99 10 16 9 7 47 23 29 30 94 57 25 38 15 57 33 79 28 45 98 20 50 34 93 6 14 9 29 56 13 44 67 5 23 32 38 78 20 55 35 25 91 64 10 47 32 97 44 85 65 87 36 91 88 78 6 48 86 67 56 44 18 98 39 10 80 47 65 49 98 63 21
output: False
expected output: True
subset in superset checks whether subset is an element of superset; i.e., it checks ∈, not ⊆.
You can simply use < to check whether a set is a proper subset of another: https://docs.python.org/3/library/stdtypes.html#frozenset.issubset
print({1, 2} in {1,2,3}) # False
print({1, 2} < {1,2,3}) # True

Project Euler problem 11 in Python - Row by row iterations not working

In order to solve problem 11, I have sought to implement 4 loops. Each of the 4 loops iterates in a different direction, so for example the first loop (which I will use to demonstrate my issue below) starts vertically from the top left of the grid. The logic of the loop is to go through the top row and then move down a row and follow the same multiplication pattern. After 16 iterations there are no more combinations of numbers and so the loop stops.
In order to test whether or not the function works, I want to print a list of all the iterations to ensure that it prints 360 unique numbers. The idea being that I can then alter the code to start with figure = 0, and with each iteration I can check to see if the number produced is bigger than the current value for figure. If it is, then figure is replaced with the value of that iteration.
My issue is that the output of my code is the same list of 20 numbers 16 times. Any help with this one would be highly appreciated! I know that there are many ways of doing this, and that I can look up the answers, but I want to get my own logic/solution working before I look at any answers, and this is the main blocker at the moment.
#code starts here
twenmat = [20*20 matrix]
newlist = []
figure = 0
for items in twenmat:
    for x in range(0,20):
        y = 0
        newlist.append(twenmat[0+y][x]*twenmat[1+y][x]*twenmat[2+y][x]*twenmat[3+y][x])
        y = y + 1
        if y == 16:
            break
print(newlist)
#end of script
Rather than manipulating individual coordinates, you could shift the matrix by 1, 2 and 3 in each direction and multiply the shifted matrices cell by cell. Record the maximum of these products as you go through the 4 directions (right, down, down-right, up-right):
data =\
"""08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48"""
M = [ [*map(int,line.split())] for line in data.split("\n") ]
...
# shift the matrix by a positive or negative amount vertically and horizontally
# empty positions are filled with 1 so that the products aren't impacted
def shift(m,v,h):
    if v<0 : m = [[1]*len(m)]*-v + m[:v]
    else   : m = m[v:] + [[1]*len(m)]*v
    if h<0 : m = [ [1]*-h + r[:h] for r in m ]
    else   : m = [ r[h:] + [1]*h for r in m ]
    return m
# base matrix multiplied cell by cell with 3 shifted versions ...
maxProd = 0
for dv,dh in [(0,1),(1,0),(1,1),(-1,1)]:
    m = M # start with non-shifted values
    for i in range(1,4):
        # multiply by each shifted copy cell by cell
        m = [ [a*b for a,b in zip(r0,r1)]
              for r0,r1 in zip(m,shift(M,dv*i,dh*i)) ]
    # record maximum of all resulting products
    maxProd = max(maxProd,max((max(row) for row in m)))
print(maxProd) # 70600674
To illustrate this shifting process, let's look at the 3 shifted versions going down-right on the main diagonal (offset: 1,1):
shifted by 1:
49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 4 56 62 0 1
49 31 73 55 79 14 29 93 71 40 67 53 88 30 3 49 13 36 65 1
70 95 23 4 60 11 42 69 24 68 56 1 32 56 71 37 2 36 91 1
31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80 1
47 32 60 99 3 45 2 44 75 33 53 78 36 84 20 35 17 12 50 1
98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70 1
26 20 68 2 62 12 20 95 63 94 39 63 8 40 91 66 49 94 21 1
55 58 5 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72 1
36 23 9 75 0 76 44 20 45 35 14 0 61 33 97 34 31 33 95 1
17 53 28 22 75 31 67 15 94 3 80 4 62 16 14 9 53 56 92 1
39 5 42 96 35 31 47 55 58 88 24 0 17 54 24 36 29 85 57 1
56 0 48 35 71 89 7 5 44 44 37 44 60 21 58 51 54 17 58 1
80 81 68 5 94 47 69 28 73 92 13 86 52 17 77 4 89 55 40 1
52 8 83 97 35 99 16 7 97 57 32 16 26 26 79 33 27 98 66 1
36 68 87 57 62 20 72 3 46 33 67 46 55 12 32 63 93 53 69 1
42 16 73 38 25 39 11 24 94 72 18 8 46 29 32 40 62 76 36 1
69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 4 36 16 1
73 35 29 78 31 90 1 74 31 49 71 48 86 81 16 23 57 5 54 1
70 54 71 83 51 54 69 16 92 33 48 61 43 52 1 89 19 67 48 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
shifted by 2:
31 73 55 79 14 29 93 71 40 67 53 88 30 3 49 13 36 65 1 1
95 23 4 60 11 42 69 24 68 56 1 32 56 71 37 2 36 91 1 1
16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80 1 1
32 60 99 3 45 2 44 75 33 53 78 36 84 20 35 17 12 50 1 1
81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70 1 1
20 68 2 62 12 20 95 63 94 39 63 8 40 91 66 49 94 21 1 1
58 5 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72 1 1
23 9 75 0 76 44 20 45 35 14 0 61 33 97 34 31 33 95 1 1
53 28 22 75 31 67 15 94 3 80 4 62 16 14 9 53 56 92 1 1
5 42 96 35 31 47 55 58 88 24 0 17 54 24 36 29 85 57 1 1
0 48 35 71 89 7 5 44 44 37 44 60 21 58 51 54 17 58 1 1
81 68 5 94 47 69 28 73 92 13 86 52 17 77 4 89 55 40 1 1
8 83 97 35 99 16 7 97 57 32 16 26 26 79 33 27 98 66 1 1
68 87 57 62 20 72 3 46 33 67 46 55 12 32 63 93 53 69 1 1
16 73 38 25 39 11 24 94 72 18 8 46 29 32 40 62 76 36 1 1
36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 4 36 16 1 1
35 29 78 31 90 1 74 31 49 71 48 86 81 16 23 57 5 54 1 1
54 71 83 51 54 69 16 92 33 48 61 43 52 1 89 19 67 48 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
shifted by 3:
23 4 60 11 42 69 24 68 56 1 32 56 71 37 2 36 91 1 1 1
71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80 1 1 1
60 99 3 45 2 44 75 33 53 78 36 84 20 35 17 12 50 1 1 1
28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70 1 1 1
68 2 62 12 20 95 63 94 39 63 8 40 91 66 49 94 21 1 1 1
5 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72 1 1 1
9 75 0 76 44 20 45 35 14 0 61 33 97 34 31 33 95 1 1 1
28 22 75 31 67 15 94 3 80 4 62 16 14 9 53 56 92 1 1 1
42 96 35 31 47 55 58 88 24 0 17 54 24 36 29 85 57 1 1 1
48 35 71 89 7 5 44 44 37 44 60 21 58 51 54 17 58 1 1 1
68 5 94 47 69 28 73 92 13 86 52 17 77 4 89 55 40 1 1 1
83 97 35 99 16 7 97 57 32 16 26 26 79 33 27 98 66 1 1 1
87 57 62 20 72 3 46 33 67 46 55 12 32 63 93 53 69 1 1 1
73 38 25 39 11 24 94 72 18 8 46 29 32 40 62 76 36 1 1 1
41 72 30 23 88 34 62 99 69 82 67 59 85 74 4 36 16 1 1 1
29 78 31 90 1 74 31 49 71 48 86 81 16 23 57 5 54 1 1 1
71 83 51 54 69 16 92 33 48 61 43 52 1 89 19 67 48 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Each number is moved to the next position diagonally so the product of cells at a given position corresponds to the 4 values going down-right on the main diagonal.
We do this for all directions to get the maximum product.
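If you would rather keep the coordinate-based loops from the question, the sketch below shows one way the vertical pass could look. In the question's loop, y is set back to 0 on every pass, so the same top rows are multiplied again and again; here the starting row is driven by its own loop instead. The function name `vertical_products` and the small 5x3 grid are made up for illustration, and the same pattern would apply to the other three directions.

```python
def vertical_products(grid, run=4):
    """Return the product of every vertical run of `run` cells, row by row."""
    rows, cols = len(grid), len(grid[0])
    products = []
    for y in range(rows - run + 1):   # top row of each vertical window
        for x in range(cols):         # every column in that row
            p = 1
            for k in range(run):      # walk down `run` cells
                p *= grid[y + k][x]
            products.append(p)
    return products

# Small illustrative grid (5 rows x 3 columns), not the Project Euler data.
small = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [1, 1, 2],
    [2, 3, 1],
]

prods = vertical_products(small)
print(prods)       # → [28, 80, 324, 56, 120, 108]
print(max(prods))  # → 324
```

With 5 rows and runs of 4 there are two starting rows, so the list has 2 x 3 = 6 products, each from a different window.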

How to write this code in an optimal (pythonic) way?

I have the following code in R and I need to write it in an optimal way in Python using pandas. I wrote it, but it takes a long time to run.
1) Can someone confirm that this is equivalent to the R code?
2) How can I write it in a pythonic (optimal) way?
In R:
for (i in 1:dim(df1)[1])
  df1$column1[i] <- sum(df2[i,4:33])
In Python:
for i in range(df1.shape[0]):
    df1['column1'][i] = df2.iloc[i,3:34].sum()
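These are two ways to make the replacement. One caveat on equivalence: R's 4:33 is 1-based and inclusive (30 columns), which corresponds to iloc[:, 3:33] in pandas; the 3:34 in the question takes one extra column.
df1['column1'] = df2.iloc[:, 3:33].sum(axis=1)
OR
df1.loc[:, 'column1'] = df2.iloc[:, 3:33].sum(axis=1)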
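As a sanity check, the sketch below compares the vectorized sum with a row-by-row loop on made-up data. The frames `df1` and `df2`, their shapes and the `loop` column are assumptions for illustration; the slice `3:33` is the 0-based, end-exclusive counterpart of R's 1-based, inclusive `4:33`.

```python
import numpy as np
import pandas as pd

# Made-up data: 5 rows, 40 integer columns, default 0..n-1 index on both frames.
rng = np.random.default_rng(0)
df2 = pd.DataFrame(rng.integers(0, 100, size=(5, 40)))
df1 = pd.DataFrame(index=df2.index)

# Vectorized: sum positions 3..32 (R columns 4..33) across each row.
df1["column1"] = df2.iloc[:, 3:33].sum(axis=1)

# Row-by-row reference, written with .loc to avoid chained assignment.
df1["loop"] = 0
for i in range(df1.shape[0]):
    df1.loc[i, "loop"] = df2.iloc[i, 3:33].sum()

print(df2.iloc[:, 3:33].shape[1])              # number of columns being summed
print((df1["column1"] == df1["loop"]).all())   # do both approaches agree?
```

The assignment aligns on the index, so this relies on `df1` and `df2` sharing the same row labels.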
Use vectorized operations:
>>> df = pd.DataFrame(np.random.randint(0, 100, (10, 15)), columns=list('abcdefghijklmno'))
>>> df
a b c d e f g h i j k l m n o
0 71 93 12 32 17 23 35 57 26 89 4 29 28 83 30
1 98 78 75 0 61 81 8 17 93 71 48 47 72 52 11
2 13 62 93 48 31 23 42 66 77 99 59 1 40 72 87
3 7 5 5 43 83 19 59 36 18 96 50 60 46 45 54
4 32 69 93 6 7 12 15 49 29 11 37 83 75 97 84
5 52 53 43 61 93 85 91 99 65 62 35 89 55 77 62
6 44 7 41 56 40 11 39 91 87 46 95 48 30 75 16
7 93 15 63 23 14 20 7 33 29 31 41 40 82 0 16
8 46 63 59 59 81 51 34 41 89 68 20 64 95 70 74
9 33 58 49 91 51 46 43 83 37 53 47 32 42 12 59
Then simply:
>>> df['column1'] = df.iloc[:, 3:8].sum(axis=1)
>>> df
a b c d e f g h i j k l m n o column1
0 71 93 12 32 17 23 35 57 26 89 4 29 28 83 30 164
1 98 78 75 0 61 81 8 17 93 71 48 47 72 52 11 167
2 13 62 93 48 31 23 42 66 77 99 59 1 40 72 87 210
3 7 5 5 43 83 19 59 36 18 96 50 60 46 45 54 240
4 32 69 93 6 7 12 15 49 29 11 37 83 75 97 84 89
5 52 53 43 61 93 85 91 99 65 62 35 89 55 77 62 429
6 44 7 41 56 40 11 39 91 87 46 95 48 30 75 16 237
7 93 15 63 23 14 20 7 33 29 31 41 40 82 0 16 97
8 46 63 59 59 81 51 34 41 89 68 20 64 95 70 74 266
9 33 58 49 91 51 46 43 83 37 53 47 32 42 12 59 314
>>>

How to create a pandas dataframe in which a specific column always has a value greater than another column, using np.random.randint

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
print(df)
I want column 'A' always to have a value greater than column 'B'.
df.A, df.B = df[['A', 'B']].max(axis=1), df[['A', 'B']].min(axis=1)
Try this:
newdf = df.apply(lambda x: x if x[0]>x[1] else [*x[:2][::-1],*x[2:]],axis=1)
print(newdf)
Output:
A B C D
0 85 14 22 85
1 62 54 20 1
2 82 78 48 59
3 81 59 54 39
4 92 12 79 44
5 69 64 8 11
6 49 34 48 69
7 68 28 80 27
8 72 17 2 40
9 26 15 49 62
10 29 2 86 12
11 69 7 32 99
12 39 35 65 32
13 45 36 36 12
14 54 21 29 79
15 91 82 35 80
16 67 16 4 37
17 94 82 93 37
18 64 18 2 15
19 13 11 28 82
20 78 9 93 45
21 72 41 16 33
22 92 71 62 69
23 87 79 71 11
24 31 14 8 24
25 85 27 43 3
26 82 34 14 52
27 41 32 39 48
28 13 12 24 86
29 96 17 14 80
.. .. .. .. ..
70 17 13 20 91
71 26 7 57 96
72 41 0 24 58
73 98 68 90 13
74 88 35 81 56
75 65 43 70 86
76 82 81 44 68
77 97 45 23 66
78 81 45 78 48
79 62 24 43 62
80 43 13 42 49
81 97 28 75 45
82 3 0 54 40
83 57 46 16 38
84 87 46 35 13
85 41 13 78 89
86 62 36 94 23
87 84 35 69 93
88 63 18 39 3
89 45 42 30 6
90 81 8 49 82
91 28 28 11 47
92 97 81 49 92
93 86 24 82 40
94 76 72 30 51
95 93 92 1 69
96 97 76 38 81
97 87 49 26 64
98 98 25 93 55
99 57 2 87 10
[100 rows x 4 columns]
You can apply it to any number of columns.
import numpy as np
import pandas as pd
#np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
#we are just sorting values of each rows in descending order.
df.values[:,::-1].sort()
print(df)
It gives the following output:
A B C D
0 72 37 12 9
1 79 75 64 5
2 76 71 16 1
3 50 25 20 6
4 84 28 18 11
5 68 50 29 14
6 96 94 87 87
7 86 13 9 7
8 63 61 57 22
9 81 60 1 0
10 88 47 13 8
11 72 71 30 3
12 70 57 49 21
13 68 43 24 3
14 80 76 52 26
15 82 64 41 15
16 98 87 68 25
17 26 25 22 7
18 67 27 23 9
19 83 57 38 37
20 34 32 10 8
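The in-place sort above depends on df.values handing back a writable view of the frame's data, which may not hold in every pandas version or for mixed dtypes. A minimal sketch that avoids that assumption builds a new frame from a sorted copy instead; the names `sorted_desc` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

# np.sort returns a sorted copy (ascending along each row);
# reversing the column order turns that into descending rows.
sorted_desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]
result = pd.DataFrame(sorted_desc, columns=df.columns, index=df.index)

print(result.head())
print((result["A"] >= result["B"]).all())
```

Because the values are drawn at random, ties are possible, so this guarantees A >= B rather than a strict A > B.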

Issue with merging time series variables to create new DataFrame with arbitrary index

So I am trying to merge the following columns of data, which are currently indexed as daily entries (but only have points once per week). I have separated the columns into year variables, but I am having trouble getting them into a combined dataframe that disregards the date index, so that I can build out min/max columns by week over the years. I am not sure how to get the merge/join functions to do this.
#Create year variables, append to new dataframe with new index
I have the following:
def minmaxdata():
    Totrigs = dataforgraphs()
    tr = Totrigs
    yrs=[tr['2007'],tr['2008'],tr['2009'],tr['2010'],tr['2011'],tr['2012'],tr['2013'],tr['2014']]
    yrlist = ['tr07','tr08','tr09','tr10','tr11','tr12','tr13','tr14']
    dic = dict(zip(yrlist,yrs))
    yr07,yr08,yr09,yr10,yr11,yr12,yr13,yr14 =dic['tr07'],dic['tr08'],dic['tr09'],dic['tr10'],dic['tr11'],dic['tr12'],dic['tr13'],dic['tr14']
    minmax = yr07.append([yr08,yr09,yr10,yr11,yr12,yr13,yr14],ignore_index=True)
I would like a Dataframe like the following:
2007 2008 2009 2010 2011 2012 2013 2014 min max
1 10 13 10 12 34 23 22 14 10 34
2 25 ...
3 22
4 ...
5
.
.
. ...
52
I'm not sure what your original data look like, but I don't think it's a good idea to hard-code all the years, since you lose re-usability. I'll set up a sequence of random integers indexed by date, with one date per week.
In [65]: idx = pd.date_range ('2007-1-1','2014-12-31',freq='W')
In [66]: df = pd.DataFrame(np.random.randint(100, size=len(idx)), index=idx, columns=['value'])
In [67]: df.head()
Out[67]:
value
2007-01-07 7
2007-01-14 2
2007-01-21 85
2007-01-28 55
2007-02-04 36
In [68]: df.tail()
Out[68]:
value
2014-11-30 76
2014-12-07 34
2014-12-14 43
2014-12-21 26
2014-12-28 17
Then get year of the week:
In [69]: df['year'] = df.index.year
In [70]: df['week'] = df.groupby('year').cumcount()+1
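(You may try df.index.week, or df.index.isocalendar().week in newer pandas versions, for the week number, but I've seen weird behavior like starting from week #53 in January. These are ISO week numbers, so the first days of January can belong to the last week of the previous year.)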
Finally, do a pivot table to transform and get row-wise max/min:
In [71]: df2 = df.pivot_table(index='week', columns='year', values='value')
In [72]: df2['max'] = df2.max(axis=1)
In [73]: df2['min'] = df2.min(axis=1)
And now our dataframe df2 looks like this and should be what you need:
In [74]: df2
Out[74]:
year 2007 2008 2009 2010 2011 2012 2013 2014 max min
week
1 7 82 13 32 24 58 18 10 82 7
2 2 5 29 0 2 97 59 83 97 0
3 85 89 8 83 63 73 47 49 89 8
4 55 5 1 44 78 10 13 87 87 1
5 36 41 48 98 98 24 24 69 98 24
6 51 43 62 60 44 57 34 33 62 33
7 37 66 72 46 28 11 73 36 73 11
8 30 13 86 93 46 67 95 15 95 13
9 78 84 16 21 70 39 43 90 90 16
10 9 2 88 15 39 81 44 96 96 2
11 34 76 16 44 44 26 30 77 77 16
12 2 24 23 13 25 69 25 74 74 2
13 66 91 67 77 18 47 95 66 95 18
14 59 52 22 42 40 99 88 21 99 21
15 76 17 31 57 43 31 91 67 91 17
16 76 38 53 43 84 45 78 9 84 9
17 88 53 34 22 99 93 61 42 99 22
18 78 19 82 19 5 80 55 69 82 5
19 54 92 56 6 2 85 7 67 92 2
20 8 56 86 41 60 76 31 81 86 8
21 64 76 11 38 41 98 39 72 98 11
22 21 86 34 1 15 27 26 95 95 1
23 82 90 3 17 62 18 93 20 93 3
24 47 42 32 27 83 8 22 14 83 8
25 15 66 70 16 4 22 26 14 70 4
26 12 68 21 7 86 2 27 10 86 2
27 85 85 9 39 17 94 67 42 94 9
28 73 80 96 49 46 23 69 84 96 23
29 57 74 6 71 79 31 79 7 79 6
30 18 84 85 34 71 69 0 62 85 0
31 24 40 93 53 72 46 44 71 93 24
32 95 4 58 57 68 27 95 71 95 4
33 65 84 87 41 38 45 71 33 87 33
34 62 14 41 83 79 63 44 13 83 13
35 49 96 50 62 25 45 69 63 96 25
36 6 38 86 34 98 60 67 80 98 6
37 99 44 26 19 19 20 57 17 99 17
38 2 40 7 65 68 58 68 13 68 2
39 72 31 83 65 69 39 10 76 83 10
40 90 31 42 20 7 8 62 79 90 7
41 10 46 82 96 30 43 12 84 96 10
42 79 38 28 78 25 9 80 2 80 2
43 64 83 63 40 29 86 10 15 86 10
44 89 91 62 48 53 69 16 0 91 0
45 99 26 85 45 26 53 79 86 99 26
46 35 14 46 25 74 6 68 44 74 6
47 17 9 84 88 29 83 85 1 88 1
48 18 69 55 16 77 35 16 76 77 16
49 60 4 36 50 81 28 50 34 81 4
50 36 29 38 28 81 86 71 43 86 28
51 41 82 95 27 95 77 74 26 95 26
52 2 81 89 82 28 2 11 17 89 2
53 NaN NaN NaN NaN NaN 0 NaN NaN 0 0
EDIT:
If you need max/min over certain columns only, just list them. In this case (2007-2013) they are consecutive, so you can do the following.
df2['max_2007to2013'] = df2[range(2007,2014)].max(axis=1)
If not, simply list them like: df2[[2007,2010,2012,2013]].max(axis=1)
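Putting the steps together, here is a self-contained sketch of the same pipeline on random data. It remembers the year columns before adding any summary column, so min and max are both computed from the yearly values only. The names `table` and `year_cols` are illustrative, and the random values will not match the output shown above.

```python
import numpy as np
import pandas as pd

# One random value per week (Sundays) from 2007 through 2014.
idx = pd.date_range("2007-01-01", "2014-12-31", freq="W")
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.integers(0, 100, size=len(idx))}, index=idx)

# Year of each date and a running week counter within each year.
df["year"] = df.index.year
df["week"] = df.groupby("year").cumcount() + 1

# Weeks as rows, years as columns.
table = df.pivot_table(index="week", columns="year", values="value")

# Keep the list of year columns so the summaries ignore each other.
year_cols = list(table.columns)
table["min"] = table[year_cols].min(axis=1)
table["max"] = table[year_cols].max(axis=1)

print(table.head())
```

Years with only 52 entries leave NaN in any 53rd row, and min/max skip those NaN values by default.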
