Using apply to the pandas group object with original function - python

I have a multi-index df and I want to add a new column by applying an operation to each group:
class weight height time
A 45 150 85
50 160 80
55 155 74
B 78 180 90
51 158 65
40 155 68
C 80 185 90
86 175 81
52 162 73
def operation(col):
    concat = ''
    for i in col:
        concat += (str(i))
    return concat
I call it like this:
df['new'] = df.groupby(level=0)['height'].apply(operation)
and the result df should look like
class weight height time new
A 45 150 85 150160155
50 160 80
55 155 74
B 78 180 90 180158155
51 158 65
40 155 68
C 80 185 90 185175162
86 175 81
52 162 73
However, the resulting df actually has NaN in the new column. What am I doing wrong?

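IIUC, use transform instead of apply. transform returns a result with the same index as the original data, so it lines up with df when you assign it: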
df['new'] = df.groupby(level=0)['height'].transform(operation)
Output:
height time new
class weight
A 45 150 85 150160155
50 160 80 150160155
55 155 74 150160155
B 78 180 90 180158155
51 158 65 180158155
40 155 68 180158155
C 80 185 90 185175162
86 175 81 185175162
52 162 73 185175162
Or, to keep only the first occurrence of each concatenated value:
df['new'] = df.groupby(level=0)['height'].transform(operation).drop_duplicates()
Output:
height time new
class weight
A 45 150 85 150160155
50 160 80 NaN
55 155 74 NaN
B 78 180 90 180158155
51 158 65 NaN
40 155 68 NaN
C 80 185 90 185175162
86 175 81 NaN
52 162 73 NaN
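
To make the difference between the two calls concrete, here is a minimal sketch on a cut-down version of the question's data (two classes only; the frame is rebuilt by hand, so its exact construction is an assumption). apply returns one value per group, indexed by class alone, while transform returns one value per original row.

```python
import pandas as pd

# Cut-down, hand-built version of the question's frame (assumed construction).
index = pd.MultiIndex.from_tuples(
    [("A", 45), ("A", 50), ("A", 55), ("B", 78), ("B", 51), ("B", 40)],
    names=["class", "weight"],
)
df = pd.DataFrame({"height": [150, 160, 155, 180, 158, 155]}, index=index)


def operation(col):
    # Concatenate the string form of every value in the group.
    return "".join(str(i) for i in col)


# apply: one row per group, indexed by the group key only.
per_group = df.groupby(level=0)["height"].apply(operation)

# transform: one row per original row, indexed like df itself.
df["new"] = df.groupby(level=0)["height"].transform(operation)

print(per_group)
print(df)
```

Because the transform result carries the same (class, weight) index as df, the assignment aligns row by row.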

# Concatenate the heights in each class, put the result in a dict, and map it back onto the class column
# (this assumes class is a regular column, e.g. after df.reset_index(), and that numpy is imported as np)
df['new']=df['class'].map(df.groupby('class').height.apply(lambda x: x.astype(str).str.cat()).to_dict())
# Keep the value on the first occurrence only; np.where blanks out the rows flagged as duplicates
df['new']=np.where(~df['new'].duplicated(keep='first'),df['new'],'')
print(df)
class weight height time new
0 A 45 150 85 150160155
1 A 50 160 80
2 A 55 155 74
3 B 78 180 90 180158155
4 B 51 158 65
5 B 40 155 68
6 C 80 185 90 185175162
7 C 86 175 81
8 C 52 162 73

How to drop all the 1's in a correlation matrix

I'm trying to change/eliminate the 1's that run diagonally in a correlation matrix so that when I take the average of the rows of the correlation matrix, the 1s don't affect the mean of each of the rows.
Let's say I have the dataset,
A B C D E F
0 45 100 58 78 80 35
1 49 80 80 104 58 20
2 49 80 65 78 79 20
3 65 100 80 159 83 45
4 65 123 78 115 100 50
5 45 122 84 100 85 20
6 60 120 78 44 105 55
7 62 80 109 48 78 25
8 63 39 85 65 79 25
9 80 52 100 50 103 30
10 80 43 78 64 120 60
11 60 60 130 43 135 45
12 80 50 111 59 115 50
13 82 65 130 63 78 90
14 83 58 85 80 45 80
15 100 64 100 65 30 70
When I do dfcorr = df.corr()
dfcorr, I get
A B C D E F
A 1.000000 0.842125 0.834808 0.832773 0.844158 0.806787
B 0.842125 1.000000 0.847606 0.907595 0.818668 0.863645
C 0.834808 0.847606 1.000000 0.718199 0.804671 0.582033
D 0.832773 0.907595 0.718199 1.000000 0.884236 0.878421
E 0.844158 0.818668 0.804671 0.884236 1.000000 0.718668
F 0.806787 0.863645 0.582033 0.878421 0.718668 1.000000
I want all the 1's to be dropped so that if I want to take the mean of each of the rows, the 1's won't affect them.
If you are working with it as a data frame this will work:
df=pd.DataFrame({'c1':[1, 0, 0.3, 0.4], 'c2':[0.2, 1, 0.6, 0.4], 'c3':[0.1, 0, 1, 0.4], 'c4':[0.7, 0.2, 0.2, 1]} )
df.where(df!=1).mean(axis=1)
This only works correctly if the only 1's are on the diagonal; any off-diagonal value that is exactly 1 would be excluded from the mean as well.
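
If off-diagonal values could also equal 1, an alternative is to mask the diagonal by position rather than by value. The following is a sketch of that idea using np.fill_diagonal on a copy of the matrix; the small input frame is made up for illustration and is not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical small dataset, used only to get a correlation matrix.
df = pd.DataFrame({
    "A": [45, 49, 49, 65, 65],
    "B": [100, 80, 80, 100, 123],
    "C": [58, 80, 65, 80, 78],
})
dfcorr = df.corr()

# Work on a copy so the original correlation matrix stays intact.
values = dfcorr.to_numpy(dtype=float, copy=True)
np.fill_diagonal(values, np.nan)  # blank out the diagonal by position

masked = pd.DataFrame(values, index=dfcorr.index, columns=dfcorr.columns)

# mean() skips NaN by default, so each row averages its off-diagonal entries.
row_means = masked.mean(axis=1)
print(row_means)
```

This gives the same row means as subtracting the diagonal 1 from each row sum and dividing by the number of remaining columns.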

From Matlab to Python Code [z,index]=sort(abs(z));

I am trying to convert code from MATLAB to Python.
Can you please help me convert this code?
In the MATLAB code,
z is a list and its length is 121:
z= 7.0502 5.8030 4.4657 3.0404 1.5416 0 -1.5416 -3.0404 -4.4657
-5.8030 -7.0502 7.5944 6.3059 4.8990 3.3662 1.7189 0 -1.7189 -3.3662 -4.8990 -6.3059 -7.5944 8.2427 6.9282 5.4611 3.8122 1.9735 0 -1.9735 -3.8122 -5.4611 -6.9282 -8.2427 9.0135 7.7027 6.2075 4.4590 2.3803 0 -2.3803 -4.4590 -6.2075 -7.7027 -9.0135 9.9185 8.6576 7.2038 5.4466 3.1530 0 -3.1530 -5.4466 -7.2038 -8.6576 -9.9185 10.9545 9.7980 8.4853 6.9282 4.8990 0 -4.8990 -6.9282 -8.4853 -9.7980 -10.9545 12.0986 11.0885 9.9947 8.8128 7.6119 -6.9282 -7.6119 -8.8128 -9.9947 -11.0885 -12.0986 13.3133 12.4632 11.5988 10.7649 10.0829 -9.7980 -10.0829 -10.7649 -11.5988 -12.4632 -13.3133 14.5583 13.8564 13.1842 12.5910 12.1612 -12.0000 -12.1612 -12.5910 -13.1842 -13.8564 -14.5583 15.8011 15.2238 14.6969 14.2594 13.9626 -13.8564 -13.9626 -14.2594 -14.6969 -15.2238 -15.8011 17.0207 16.5431 16.1227 15.7875 15.5684 -15.4919 -15.5684 -15.7875 -16.1227 -16.5431 -17.0207
Matlab code : [z,index]=sort(abs(z));
after the code
z = 0 0 0 0 0 0 1.5416 1.5416 1.7189 1.7189 1.9735 1.9735 2.3803 2.3803 3.0404 3.0404 3.1530 3.1530 3.3662 3.3662 3.8122 3.8122 4.4590 4.4590 4.4657 4.4657 4.8990 4.8990 4.8990 4.8990 5.4466 5.4466 5.4611 5.4611 5.8030 5.8030 6.2075 6.2075 6.3059 6.3059 6.9282 6.9282 6.9282 6.9282 6.9282 7.0502 7.0502 7.2038 7.2038 7.5944 7.5944 7.6119 7.6119 7.7027 7.7027 8.2427 8.2427 8.4853 8.4853 8.6576 8.6576 8.8128 8.8128 9.0135 9.0135 9.7980 9.7980 9.7980 9.9185 9.9185 9.9947 9.9947 10.0829 10.0829 10.7649 10.7649 10.9545 10.9545 11.0885 11.0885 11.5988 11.5988 12.0000 12.0986 12.0986 12.1612 12.1612 12.4632 12.4632 12.5910
12.5910 13.1842 13.1842 13.3133 13.3133 13.8564 13.8564 13.8564 13.9626 13.9626 14.2594 14.2594 14.5583 14.5583 14.6969 14.6969 15.2238 15.2238 15.4919 15.5684 15.5684 15.7875 15.7875 15.8011 15.8011 16.1227 16.1227 16.5431 16.5431 17.0207 17.0207
and index is
index = 6 17 28 39 50 61 5 7 16 18 27 29 38 40 4 8 49 51 15 19 26 30 37 41 3 9 14 20 60 62 48 52 25 31 2 10 36 42 13 21 24 32 59 63 72 1 11 47 53 12 22 71 73 35 43 23 33 58 64 46 54 70 74 34 44 57 65 83 45 55 69 75 82 84 81 85 56 66 68 76 80 86 94 67 77 93 95 79 87 92 96 91 97 78 88 90 98 105 104 106 103 107 89 99 102 108 101 109 116 115 117 114 118 100 110 113 119 112 120 111 121
So what is the equivalent of [z,index] in Python?
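Do you need to return the index? If you don't, you could use:
new_list = sorted(map(abs, z))
If you do, sort the positions by absolute value as well:
index = sorted(range(len(z)), key=lambda k: abs(z[k]))
where new_list is the sorted output and z is the original list. Note that Python indices start at 0, whereas MATLAB indices start at 1.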
EDIT:
Try that now
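
If NumPy is an option, the same pair of outputs can be sketched with np.argsort. This is an illustration on a short made-up vector, not the 121-element list from the question. A stable sort is requested so that ties keep their original order, which matches the index shown in the question, and 1 is added only to mimic MATLAB's 1-based numbering.

```python
import numpy as np

# Short made-up vector standing in for the question's z.
z = np.array([7.0502, -1.5416, 0.0, 1.5416, -7.0502])

abs_z = np.abs(z)

# Stable sort: equal values keep their original relative order.
order = np.argsort(abs_z, kind="stable")

z_sorted = abs_z[order]   # sorted absolute values
index = order + 1         # 1-based positions, as MATLAB would report them

print(z_sorted)
print(index)
```

For indexing back into a Python array, use order (0-based) rather than index.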

Grayscale image array rotation for data augmentation

I am trying to rotate some images for data augmentation to train a network for an image segmentation task. After a lot of searching, the best candidate for rotating each image and its corresponding mask seemed to be the scipy.ndimage.rotate function. The problem is that the mask is a numpy array containing only the pixel values 0 and 255, but after rotation it contains many values between 0 and 255, while I expect it to still contain only 0 and 255.
Here is the code:
from scipy.ndimage import rotate
import numpy as np
sample = dataset[1]
print(np.unique(sample['image']))
print(np.unique(sample['mask']))
print(sample['image'].shape)
print(sample['mask'].shape)
rot_image = rotate(sample['image'], 60, reshape = False)
rot_mask = rotate(sample['mask'], 60, reshape = False)
print(np.unique(rot_image))
print(np.unique(rot_mask))
print(rot_image.shape)
print(rot_mask.shape)
Here are the results:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 114 115 118 119 120 121 125 139]
[ 0 255]
(512, 512, 1)
(512, 512, 1)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 115 117 118 124 125 132 135]
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 18 20
24 25 26 28 31 34 35 38 39 41 42 43 45 46 48 49 50 51
52 58 59 62 66 67 68 73 75 76 79 80 82 85 86 88 90 96
98 101 108 109 111 114 116 118 119 123 124 125 127 128 130 138 140 142
146 148 151 156 157 158 161 164 165 166 168 169 176 180 184 185 188 189
194 196 197 198 199 201 203 204 205 207 208 210 211 213 216 217 218 219
220 221 222 225 228 229 230 231 233 234 235 237 239 240 241 242 243 244
245 246 247 248 249 250 251 252 253 254 255]
(512, 512, 1)
(512, 512, 1)
It seems like a simple problem to rotate an image array, but I have been searching for days without finding a solution. I am really confused about how to prevent the mask array values (0 and 255) from taking on other values between 0 and 255 after rotation. I mean something like this:
x = np.unique(sample['mask'])
rot_mask = rotate(sample['mask'], 30, reshape = False)
x_rot = np.unique(rot_mask)
print(np.unique(x - x_rot))
[ 0]
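Since you are using numpy arrays to represent images, why not use numpy functions? The library has all sorts of array manipulations. Try the rot90 function. Note that it only rotates by multiples of 90 degrees, but it never interpolates, so the mask values stay unchanged.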
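
For angles that are not multiples of 90 degrees, one option that is not part of the answer above is to keep scipy.ndimage.rotate and pass order=0, which selects nearest-neighbour interpolation so that every output pixel is copied from an input pixel or from the fill value. The sketch below uses a made-up 64x64x1 mask in place of sample['mask'] and also shows np.rot90 for comparison.

```python
import numpy as np
from scipy.ndimage import rotate

# Made-up binary mask shaped like the question's (H, W, 1) arrays.
mask = np.zeros((64, 64, 1), dtype=np.uint8)
mask[16:48, 16:48, 0] = 255

# order=0 means nearest-neighbour interpolation: no in-between values are
# created. Pixels rotated in from outside the image get the default cval of 0.
rot_mask = rotate(mask, 60, reshape=False, order=0)

# np.rot90 only moves pixels around, here within the first two axes.
rot90_mask = np.rot90(mask, k=1, axes=(0, 1))

print(np.unique(rot_mask))
print(np.unique(rot90_mask))
```

The image itself can still be rotated with the default spline interpolation; only the mask needs order=0.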

Conditional summing of columns in pandas

I have the following DataFrame in Pandas:
Student-ID Last-name First-name HW1 HW2 HW3 HW4 HW5 M1 M2 Final
59118211 Alf Brian 96 90 88 93 96 78 60 59.0
59260567 Anderson Jill 73 83 96 80 84 80 52 42.5
59402923 Archangel Michael 99 80 60 94 98 41 56 0.0
59545279 Astor John 93 88 97 100 55 53 53 88.9
59687635 Attach Zach 69 75 61 65 91 90 63 69.0
I want to sum only those columns which have "HW" in their names. Any suggestions on how I can do that?
Note: The number of columns containing HW may differ, so I can't reference them directly.
You could call df.filter(regex='HW') to return the columns whose names contain 'HW' and then sum row-wise via sum(axis=1)
In [23]: df
Out[23]:
StudentID Lastname Firstname HW1 HW2 HW3 HW4 HW5 HW6 HW7 M1
0 59118211 Alf Brian 96 90 88 93 96 97 88 10
1 59260567 Anderson Jill 73 83 96 80 84 99 80 100
2 59402923 Archangel Michael 99 80 60 94 98 73 97 50
3 59545279 Astor John 93 88 97 100 55 96 86 60
4 59687635 Attach Zach 69 75 61 65 91 89 82 55
5 59829991 Bake Jake 56 0 77 78 0 79 0 10
In [24]: df.filter(regex='HW').sum(axis=1)
Out[24]:
0 648
1 595
2 601
3 615
4 532
5 290
dtype: int64
John's solution - using df.filter() - is more elegant, but you could also consider a list comprehension ...
df[[x for x in df.columns if 'HW' in x]].sum(axis=1)
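
As a self-contained sketch of the filter approach, the snippet below rebuilds two rows of the question's table by hand and stores the result in a new column. The column name HW-total is a made-up label. The anchored pattern ^HW\d+$ is a slightly stricter assumption than plain 'HW': it only matches names that are exactly HW followed by digits.

```python
import pandas as pd

# Two rows taken from the question's table, rebuilt by hand.
df = pd.DataFrame({
    "Last-name": ["Alf", "Anderson"],
    "HW1": [96, 73],
    "HW2": [90, 83],
    "M1": [78, 80],
})

# Select the homework columns by name pattern, then sum across each row.
hw_total = df.filter(regex=r"^HW\d+$").sum(axis=1)

# "HW-total" is a hypothetical column name chosen for this example.
df["HW-total"] = hw_total
print(df)
```

Because the columns are selected by pattern, the same line keeps working when more HW columns are added.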

Python Nested Loop For Sequence of Several Rows

How do I use a loop nested within a loop to create this output:
100
101 102
103 104 105
106 107 108 109
You can do this using a while loop with certain other variables:
>>> st, end, length = 100, 110, 1
>>> while st < end:
...     print (" ".join(map(lambda x: "%3d" % x, range(st,st+length))))
...     st += length
...     length += 1
...
100
101 102
103 104 105
106 107 108 109
Note that I have used a lambda instead of str in the map. This pads every number to the same width, so numbers with different digit counts don't break the alignment.
You can similarly base this on the width of the final line, as below (in which case there is no need to keep track of the end variable I used above):
>>> st, width = 1, 1
>>> while width <= 15:
...     print (" ".join(map(lambda x: "%3d" % x, range(st, st + width))))
...     st += width
...     width += 1
...
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45
46 47 48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74 75 76 77 78
79 80 81 82 83 84 85 86 87 88 89 90 91
92 93 94 95 96 97 98 99 100 101 102 103 104 105
106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
Something like this?
start = 100
lines = 4
for i in range(lines):
    stop = start + i + 1
    # print the numbers of this row on one line
    for j in range(start, stop):
        print(j, end=" ")
    start = stop
    print()
Just use two-level for loop:
x = 100
for i in range(1, 5):
    for j in range(i):
        print(x, end=" ")
        x += 1
    print("")
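
The same two-level loop can also be sketched as a function that collects the rows before printing, which makes the result easy to check. The function name triangle_rows and its parameters are made up for this example.

```python
def triangle_rows(start, n_rows):
    """Return n_rows lists of consecutive integers; row k holds k numbers."""
    rows = []
    value = start
    for width in range(1, n_rows + 1):   # outer loop: one pass per row
        row = []
        for _ in range(width):           # inner loop: fill the current row
            row.append(value)
            value += 1
        rows.append(row)
    return rows


for row in triangle_rows(100, 4):
    print(" ".join(str(v) for v in row))
```

Calling triangle_rows(100, 4) yields the four rows asked for in the question, ending at 109.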
