re.findall seems to have side effects - python

I'm rather new to Python and interested in the imaging library (Pillow). I wrote a program that collects data from images and writes it to a file. To retrieve the data from the file, I use regular expressions (re.findall()) to find and separate the data, then a print statement to see the data in the terminal (Linux).
The part that creates and writes to the file is:
fout = open('pic_data', 'a')
fout.write(' '.join(map(str, FN1)))
fout.write('\n')
fout.write(' '.join(map(str, X1)))
fout.write('\n')
fout.close()
The data file contains:
/img-col/new-one/pic1_picture-file_1280.jpg
19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81
/img-col/new-one/pic2_picture-file_500.jpg
108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27
/img-col/new-one/pic3_picture-file_500.jpg
169 126 91 37 24 18 170 115 74 158 117 87 196 130 96 201 136 104 207 135 110 200 136 101 160 110 61 188 121 92 196 174 153 176 158 144
/img-col/new-one/pic4_picture-file_500.jpg
108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27
/img-col/new-one/pic5_picture-file_1280.jpg
19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81
The code that opens the file and prints it to the terminal is:
import re
from numpy import *
file = open('pic_data', 'r')
for line in file.readlines():
    line = line.strip('\n')
    #print line, type(line)
    a1 = re.findall('^\/.*\.jpg', line, flags=0)
    print a1, 'number 1 - a1'
    a2 = re.findall('^\d.*\d$', line, flags=0)
    print a2, 'number 2 - a2'
    x = array(a2)
    print x, 'number 3 - x'
This is where the misbehavior happens: when printing to the terminal, there are empty holes in the data. This is what the output looks like:
grumpy#grumpy-desktop ~ $ python /home/grumpy/z-working-stuff/py-2.py
['/img-col/new-one/pic1_picture-file_1280.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 2 - a2
[ '19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 3 - x
['/img-col/new-one//img-col/new-one/pic2_picture-file_500.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 2 - a2
[ '108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 3 - x
['/img-col/new-one//img-col/new-one/pic3_picture-file_500.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['169 126 91 37 24 18 170 115 74 158 117 87 196 130 96 201 136 104 207 135 110 200 136 101 160 110 61 188 121 92 196 174 153 176 158 144'] number 2 - a2
[ '169 126 91 37 24 18 170 115 74 158 117 87 196 130 96 201 136 104 207 135 110 200 136 101 160 110 61 188 121 92 196 174 153 176 158 144'] number 3 - x
['/img-col/new-one//img-col/new-one/pic4_picture-file_500.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 2 - a2
[ '108 74 64 51 40 38 152 116 102 155 121 109 165 133 120 198 171 160 255 255 253 119 85 73 26 28 23 26 26 24 90 67 61 28 30 27'] number 3 - x
['/img-col/new-one//img-col/new-one/pic5_picture-file_1280.jpg'] number 1 - a1
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
['19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 2 - a2
[ '19 16 7 197 161 127 38 28 18 180 119 90 202 124 102 215 151 116 255 235 208 252 216 192 244 208 174 84 36 26 193 158 126 170 118 81'] number 3 - x
grumpy#grumpy-desktop ~ $
Technically all the data is present, but with holes: the empty square brackets, which should not be empty.
[] number 2 - a2
[] number 3 - x
[] number 1 - a1
The 'number # - X' text is for debugging; let me know what should be there, etc.
The problem is that I am running a correlation (coef = corrcoef(x,y)) on the data and the module chokes on empty data sets. Where are these empty matches coming from, and how can I avoid or ignore them?
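The empty lists are most likely not a side effect of re.findall: every line is tested against both patterns, and each line can match only one of them, so the other call returns []. A minimal sketch of one way around this (Python 3, with shortened values copied from the data file above; the names path_re, names and rows are illustrative, and the lines are held in a list instead of being read from pic_data) is to branch on the line type instead of running both patterns on every line:

```python
import re

# Shortened sample of the data file shown above (assumption: the real
# file alternates one path line and one line of numbers).
lines = [
    "/img-col/new-one/pic1_picture-file_1280.jpg\n",
    "19 16 7 197 161 127\n",
    "/img-col/new-one/pic2_picture-file_500.jpg\n",
    "108 74 64 51 40 38\n",
]

path_re = re.compile(r"^/.*\.jpg$")

names = []  # one entry per image path
rows = []   # one list of ints per data line

for line in lines:
    line = line.strip()
    if path_re.match(line):
        names.append(line)
    elif line:
        # Anything that is not a path and not blank is treated as data.
        rows.append([int(token) for token in line.split()])

print(names)
print(rows)
```

With this layout only non-empty rows reach the later corrcoef call, and each row is already a list of integers rather than a single string.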

Related

Compute average on pandas dataframe based on another column's criteria

I have the following dataframe:
CodigoCliente Ene-2021 Feb-2021 Mar-2021 Abr-2021 May-2021 Jun-2021 Jul-2021 Ago-2021 Sep-2021 Oct-2021 Nov-2021 Diciembre-2021 Ceros Promedio GM LM IC PromP
Cliente1 166 155 166 172 153 162 127 141 158 163 148 147 0 154.83 7 5 G
Cliente2 180 147 139 133 138 145 136 149 131 139 107 116 0 138.33 6 6 L
Cliente3 57 263 142 146 152 179 159 150 120 164 160 149 0 153.42 5 7 L
Cliente4 152 70 103 145 130 140 143 123 125 86 65 68 0 112.5 7 5 G
Cliente5 82 70 49 58 45 70 50 51 58 48 55 67 0 58.58 4 8 L
I compute the average of the data (Promedio), and the columns GM and LM store the number of months that are greater and lower than the average, respectively. The column IC holds a G if the GM cell is greater than LM, or an L if it is lower. With this information, I want to fill the column PromP based on the IC column: if the cell has a G, compute an average using only the months that are greater than the average stored in the Promedio column; if it has an L, compute the average using only the months that are lower than that average.
The desired result is
CodigoCliente Ene-2021 Feb-2021 Mar-2021 Abr-2021 May-2021 Jun-2021 Jul-2021 Ago-2021 Sep-2021 Oct-2021 Nov-2021 Diciembre-2021 Ceros Promedio GM LM IC PromP
Cliente1 166 155 166 172 153 162 127 141 158 163 148 147 0 154.83 7 5 G 163.14
Cliente2 180 147 139 133 138 145 136 149 131 139 107 116 0 138.33 6 6 L 149.83
Cliente3 57 263 142 146 152 179 159 150 120 164 160 149 0 153.42 5 7 L 130.85
Cliente4 152 70 103 145 130 140 143 123 125 86 65 68 0 112.5 7 5 G 136.86
Cliente5 82 70 49 58 45 70 50 51 58 48 55 67 0 58.58 4 8 L 51.75
I solved the problem with nested for loops because I got confused by the where method of pandas, but that is too slow. Does anyone have a suggestion for doing this with pandas operations only?
Use apply!
Write your function and apply your function. You might need lambda for that.
Look here:
https://www.statology.org/pandas-apply-lambda/
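As an alternative to apply, the rule described in the question can also be written with boolean masks and DataFrame.where, which avoids Python-level loops. The following is only a sketch under assumptions: it uses two rows copied from the question (Cliente1 and Cliente5), shortens the month column names to m01..m12, and recomputes Promedio, GM, LM and IC instead of reading them from the existing columns.

```python
import numpy as np
import pandas as pd

# Month columns are renamed m01..m12 for brevity (assumption).
months = [f"m{i:02d}" for i in range(1, 13)]
df = pd.DataFrame(
    [
        [166, 155, 166, 172, 153, 162, 127, 141, 158, 163, 148, 147],
        [82, 70, 49, 58, 45, 70, 50, 51, 58, 48, 55, 67],
    ],
    index=["Cliente1", "Cliente5"],
    columns=months,
)

vals = df[months]
df["Promedio"] = vals.mean(axis=1)

# Boolean masks: which months are above / below each row's own average.
above = vals.gt(df["Promedio"], axis=0)
below = vals.lt(df["Promedio"], axis=0)

df["GM"] = above.sum(axis=1)
df["LM"] = below.sum(axis=1)
df["IC"] = np.where(df["GM"] > df["LM"], "G", "L")

# where() keeps the selected months and turns the rest into NaN,
# which mean() skips by default.
mean_above = vals.where(above).mean(axis=1)
mean_below = vals.where(below).mean(axis=1)
df["PromP"] = np.where(df["IC"] == "G", mean_above, mean_below)

print(df[["Promedio", "GM", "LM", "IC", "PromP"]].round(2))
```

Both candidate averages are computed for every row and np.where picks one per row, which is usually much faster than row-wise loops on larger frames.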

Select pandas dataframe where row and column== 0 to F

I have a dataframe A of index and column labelled 0 to F (0-15) in hex.
0 1 2 3 4 5 6 7 8 9 A B C D E F
0 99 124 119 123 242 107 111 197 48 1 103 43 254 215 171 118
1 202 130 201 125 250 89 71 240 173 212 162 175 156 164 114 192
2 183 253 147 38 54 63 247 204 52 165 229 241 113 216 49 21
3 4 199 35 195 24 150 5 154 7 18 128 226 235 39 178 117
4 9 131 44 26 27 110 90 160 82 59 214 179 41 227 47 132
5 83 209 0 237 32 252 177 91 106 203 190 57 74 76 88 207
6 208 239 170 251 67 77 51 133 69 249 2 127 80 60 159 168
7 81 163 64 143 146 157 56 245 188 182 218 33 16 255 243 210
8 205 12 19 236 95 151 68 23 196 167 126 61 100 93 25 115
9 96 129 79 220 34 42 144 136 70 238 184 20 222 94 11 219
A 224 50 58 10 73 6 36 92 194 211 172 98 145 149 228 121
B 231 200 55 109 141 213 78 169 108 86 244 234 101 122 174 8
C 186 120 37 46 28 166 180 198 232 221 116 31 75 189 139 138
D 112 62 181 102 72 3 246 14 97 53 87 185 134 193 29 158
E 225 248 152 17 105 217 142 148 155 30 135 233 206 85 40 223
F 140 161 137 13 191 230 66 104 65 153 45 15 176 84 187 22
I created dataframe A like this:
df_sbox=pd.DataFrame(from_a_2d_nparray)
df_sbox.index = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F']
df_sbox.columns = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'A', 'B', 'C', 'D', 'E', 'F']
I want to select A where the index is 0 to F and the column is 0 to F, and assign it to a 2D matrix.
What can I use to select A where "index == 0 - F and column == 0 - F" in one statement?
You can use hex with pandas.DataFrame.loc:
num1 = 10 #row 'A' in hex
num2 = 3 #column 3
df.loc[hex(num1)[2:].upper(), hex(num2)[2:].upper()]
#10
Explanation
You can use python built-in function hex to get the hex representation of an integer:
hex(12)
#0xc
Since we are not interested in the first two characters, we can omit them slicing the str:
hex(12)[2:] #from index 2 onwards
#c
Since the dataframe uses uppercase for its indices and columns, we can use str.upper to match them:
hex(12)[2:].upper()
#'C'
Additional
You can also get the upper-case hex representation using the Standard Format Specifiers:
"{:X}".format(43)
#2B
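One caveat: the lookup above produces string labels such as '3', while the frame in the question uses integers for 0 to 9 and strings only for 'A' to 'F', so a label like '3' would not match the integer 3. A minimal sketch, assuming all labels are built as strings and using placeholder values instead of the real table (the helper name lookup is illustrative):

```python
import numpy as np
import pandas as pd

# Build every label as an upper-case hex string: '0'..'9', 'A'..'F'.
labels = [format(i, "X") for i in range(16)]

# Placeholder 16x16 values (NOT the table from the question).
values = np.arange(256).reshape(16, 16)
df_sbox = pd.DataFrame(values, index=labels, columns=labels)


def lookup(byte):
    """Return the cell addressed by the high and low hex digit of byte."""
    row, col = divmod(byte, 16)
    return df_sbox.loc[format(row, "X"), format(col, "X")]


# Selecting every row and column label in one statement gives the 2D matrix.
matrix = df_sbox.loc[labels, labels].to_numpy()

print(lookup(0xA3))
print(matrix.shape)
```

With uniform string labels the same format call works for digits and letters alike, and the full selection is simply the frame converted with to_numpy().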

Values in pandas dataframe not getting sorted

I have a dataframe as shown below:
Category 1 2 3 4 5 6 7 8 9 10 11 12 13
A 424 377 161 133 2 81 141 169 297 153 53 50 197
B 231 121 111 106 4 79 68 70 92 93 71 65 66
C 480 379 159 139 2 116 148 175 308 150 98 82 195
D 88 56 38 40 0 25 24 55 84 36 24 26 36
E 1084 1002 478 299 7 256 342 342 695 378 175 132 465
F 497 246 283 206 4 142 151 168 297 224 194 198 148
H 8 5 4 3 0 2 3 2 7 5 3 2 0
G 3191 2119 1656 856 50 826 955 739 1447 1342 975 628 1277
K 58 26 27 51 1 18 22 42 47 35 19 20 14
S 363 254 131 105 6 82 86 121 196 98 81 57 125
T 54 59 20 4 0 9 12 7 36 23 5 4 20
O 554 304 207 155 3 130 260 183 287 204 98 106 195
P 756 497 325 230 5 212 300 280 448 270 201 140 313
PP 64 43 26 17 1 15 35 17 32 28 18 9 27
R 265 157 109 89 1 68 68 104 154 96 63 55 90
S 377 204 201 114 5 112 267 136 209 172 147 90 157
St 770 443 405 234 5 172 464 232 367 270 290 136 294
Qs 47 33 11 14 0 18 14 19 26 17 5 6 13
Y 1806 626 1102 1177 14 625 619 1079 1273 981 845 891 455
W 123 177 27 28 0 18 62 34 64 27 14 4 51
Z 2770 1375 1579 1082 17 900 1630 1137 1465 1383 861 755 1201
I want to sort the dataframe by values in each row. Once done, I want to sort the index also.
For example the values in first row corresponding to category A, should appear as:
2 50 53 81 133 141 153 161 169 197 297 377 424
I have tried df.sort_values(by=df.index.tolist(), ascending=False, axis=1), but this doesn't work: the values don't appear in sorted order at all.
np.sort + sort_index
You can use np.sort along axis=1, then sort_index:
cols, idx = df.columns[1:], df.iloc[:, 0]
res = pd.DataFrame(np.sort(df.iloc[:, 1:].values, axis=1), columns=cols, index=idx)\
.sort_index()
print(res)
1 2 3 4 5 6 7 8 9 10 11 12 \
Category
A 2 50 53 81 133 141 153 161 169 197 297 377
B 4 65 66 68 70 71 79 92 93 106 111 121
C 2 82 98 116 139 148 150 159 175 195 308 379
D 0 24 24 25 26 36 36 38 40 55 56 84
E 7 132 175 256 299 342 342 378 465 478 695 1002
F 4 142 148 151 168 194 198 206 224 246 283 297
G 50 628 739 826 856 955 975 1277 1342 1447 1656 2119
H 0 0 2 2 2 3 3 3 4 5 5 7
K 1 14 18 19 20 22 26 27 35 42 47 51
O 3 98 106 130 155 183 195 204 207 260 287 304
P 5 140 201 212 230 270 280 300 313 325 448 497
PP 1 9 15 17 17 18 26 27 28 32 35 43
Qs 0 5 6 11 13 14 14 17 18 19 26 33
R 1 55 63 68 68 89 90 96 104 109 154 157
S 6 57 81 82 86 98 105 121 125 131 196 254
S 5 90 112 114 136 147 157 172 201 204 209 267
St 5 136 172 232 234 270 290 294 367 405 443 464
T 0 4 4 5 7 9 12 20 20 23 36 54
W 0 4 14 18 27 27 28 34 51 62 64 123
Y 14 455 619 625 626 845 891 981 1079 1102 1177 1273
Z 1 17 755 861 900 1082 1137 1375 1383 1465 1579 1630
Another way is to apply sorted with axis=1, then apply pd.Series to get a dataframe back instead of a list per row, and finally sort by Category:
(df.loc[:, '1':].apply(sorted, axis=1).apply(pd.Series)
   .set_index(df.Category).sort_index())
Category 0 1 2 3 4 5 6 7 8 9 10 ...
0 A 2 50 53 81 133 141 153 161 169 197 297 ...
1 B 4 65 66 68 70 71 79 92 93 106 111 ...
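For a quick check of the np.sort approach on a small case, here is a self-contained sketch. It assumes a two-row frame with five value columns taken from rows A and B of the question, and string column labels:

```python
import numpy as np
import pandas as pd

# Two rows (B first, then A) with the first five value columns from the question.
df = pd.DataFrame(
    {
        "Category": ["B", "A"],
        "1": [231, 424],
        "2": [121, 377],
        "3": [111, 161],
        "4": [106, 133],
        "5": [4, 2],
    }
)

values = df.drop(columns="Category").to_numpy()

# Sort each row independently, then order the rows by Category.
res = pd.DataFrame(
    np.sort(values, axis=1),
    index=df["Category"],
    columns=df.columns[1:],
).sort_index()

print(res)
```

After sorting, the column labels no longer identify the original columns; they only give the rank position within each row.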

Grayscale image array rotation for data augmentation

I am trying to rotate some images for data augmentation to train a network for an image segmentation task. After a lot of searching, the best candidate for rotating each image and its corresponding mask seemed to be the scipy.ndimage.rotate function. The problem is that the mask is a numpy array containing only the pixel values 0 and 255, but after rotation the mask contains values across the whole range from 0 to 255, while I expect it to still contain only 0 and 255.
Here is the code:
from scipy.ndimage import rotate
import numpy as np
sample = dataset[1]
print(np.unique(sample['image']))
print(np.unique(sample['mask']))
print(sample['image'].shape)
print(sample['mask'].shape)
rot_image = rotate(sample['image'], 60, reshape = False)
rot_mask = rotate(sample['mask'], 60, reshape = False)
print(np.unique(rot_image))
print(np.unique(rot_mask))
print(rot_image.shape)
print(rot_mask.shape)
Here are the results:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 114 115 118 119 120 121 125 139]
[ 0 255]
(512, 512, 1)
(512, 512, 1)
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 115 117 118 124 125 132 135]
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 17 18 20
24 25 26 28 31 34 35 38 39 41 42 43 45 46 48 49 50 51
52 58 59 62 66 67 68 73 75 76 79 80 82 85 86 88 90 96
98 101 108 109 111 114 116 118 119 123 124 125 127 128 130 138 140 142
146 148 151 156 157 158 161 164 165 166 168 169 176 180 184 185 188 189
194 196 197 198 199 201 203 204 205 207 208 210 211 213 216 217 218 219
220 221 222 225 228 229 230 231 233 234 235 237 239 240 241 242 243 244
245 246 247 248 249 250 251 252 253 254 255]
(512, 512, 1)
(512, 512, 1)
Rotating an image array seems like a simple problem, but I have been searching for days without finding a solution. I am really confused about how to keep the mask values (0 and 255) from spreading over the whole range from 0 to 255 after rotation. I mean something like this:
x = np.unique(sample['mask'])
rot_mask = rotate(sample['mask'], 30, reshape = False)
x_rot = np.unique(rot_mask)
print(np.unique(x - x_rot))
[ 0]
Since you are using numpy arrays to represent images, why not use numpy functions? The library has all sorts of array manipulations. Try the rot90 function.
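Note that rot90 only covers multiples of 90 degrees. For other angles, the intermediate values come from interpolation: scipy.ndimage.rotate uses spline interpolation of order 3 by default. A minimal sketch, assuming a synthetic 64x64 mask in place of the asker's dataset, that passes order=0 (nearest-neighbour) so that no new pixel values are created:

```python
import numpy as np
from scipy.ndimage import rotate

# Synthetic binary mask with shape (H, W, 1), values 0 and 255 only
# (a stand-in for sample['mask'], which is not available here).
mask = np.zeros((64, 64, 1), dtype=np.uint8)
mask[16:48, 16:48, 0] = 255

# order=0 picks the nearest input pixel instead of interpolating,
# so the output can only contain values already present (plus cval=0).
rot_mask = rotate(mask, 60, reshape=False, order=0)

# np.rot90 is exact as well, but limited to 90-degree steps.
rot90_mask = np.rot90(mask, k=1, axes=(0, 1))

print(np.unique(rot_mask))
print(np.unique(rot90_mask))
```

The image itself can still be rotated with the default interpolation; only the mask needs order=0, as long as both calls use the same angle and reshape setting.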

Numpy 1d array subtraction - not getting expected result.

I have 2 arrays of shape (128,). I want the elementwise difference between them.
for idx, x in enumerate(test):
    if idx == 0:
        print(test[idx])
        print()
        print(library[idx])
        print()
        print(np.abs(np.subtract(library[idx], test[idx])))
output:
[186 3 172 80 187 120 127 172 96 213 103 107 137 119 33 53 54 113
200 78 140 234 77 94 151 64 199 218 170 73 152 73 0 5 121 42
0 106 166 80 115 220 56 66 194 187 51 132 55 73 150 83 91 204
108 58 183 0 32 240 255 55 151 255 189 153 77 89 42 176 204 170
93 117 194 195 59 204 149 55 111 255 218 48 72 171 122 163 255 155
198 179 69 173 108 0 0 176 249 214 193 255 106 116 0 47 255 255
255 255 210 175 67 0 95 120 21 158 0 72 120 255 121 208 255 0
61 255]
[189 0 178 72 177 124 123 167 81 235 110 123 139 107 39 54 34 102
195 59 156 255 66 112 161 65 180 236 181 69 142 82 0 0 152 38
0 102 146 86 117 230 59 77 220 182 44 121 63 59 146 41 92 213
146 70 184 0 0 255 255 42 165 255 245 152 114 88 63 138 255 158
96 141 221 201 47 191 179 42 156 255 237 7 136 168 133 142 254 164
236 250 56 202 141 0 0 197 255 184 212 255 108 133 0 7 255 255
255 255 243 197 74 0 50 143 24 175 0 74 101 255 121 207 255 0
146 255]
[ 3 253 6 248 246 4 252 251 241 22 7 16 2 244 6 1 236 245
251 237 16 21 245 18 10 1 237 18 11 252 246 9 0 251 31 252
0 252 236 6 2 10 3 11 26 251 249 245 8 242 252 214 1 9
38 12 1 0 224 15 0 243 14 0 56 255 37 255 21 218 51 244
3 24 27 6 244 243 30 243 45 0 19 215 64 253 11 235 255 9
38 71 243 29 33 0 0 21 6 226 19 0 2 17 0 216 0 0
0 0 33 22 7 0 211 23 3 17 0 2 237 0 0 255 0 0
85 0]
To read the output: the last array printed is the difference between the first two arrays.
189 - 186 is 3
3 - 0 is 3 (not 253)
I must be missing something trivial.
I'd rather not zip and subtract the values, as I have a ton of data.
Your arrays probably have dtype uint8; they cannot hold values outside the interval [0, 256), and subtracting 3 from 0 wraps around to 253. The absolute value of 253 is still 253.
Use a different dtype, or restructure your computation to avoid hitting the limits of the dtype you're using.
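A short sketch of the wrap-around and of one possible fix, using the first three values of each array from the question and assuming the arrays are uint8. Casting to a wider signed type such as int16 before subtracting leaves room for negative intermediate results:

```python
import numpy as np

# First three values of each array from the question, stored as uint8
# (the dtype is an assumption based on the value range).
test = np.array([186, 3, 172], dtype=np.uint8)
library = np.array([189, 0, 178], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256: 0 - 3 becomes 253.
wrapped = np.abs(library - test)

# Cast to a signed type wide enough for the range -255..255 first.
diff = np.abs(library.astype(np.int16) - test.astype(np.int16))

print(wrapped)  # [  3 253   6]
print(diff)     # [3 3 6]
```

The cast is applied to whole arrays at once, so this stays vectorized and needs no zip or Python loop.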
You can simply subtract two numpy arrays like this; it is an element-wise operation:
>>> test = np.array([1,2,3])
>>> library = np.array([1,1,1])
>>> np.abs(library - test)
array([0, 1, 2])
