Can't reshape to the right shape - python

My original dataset is 7049 images (96x96) in the following format:
train_x.shape= (7049,)
train_x[:3]
0 238 236 237 238 240 240 239 241 241 243 240 23...
1 219 215 204 196 204 211 212 200 180 168 178 19...
2 144 142 159 180 188 188 184 180 167 132 84 59 ...
Name: Image, dtype: object
I want to split each image string into 96x96 values and get a (7049, 96, 96) array.
I tried this method:
import numpy as np
def split_reshape(row):
    return np.array(row.split(' ')).reshape(96, 96)
result = train_x.apply(split_reshape)
But I still get result.shape = (7049,).
How can I reshape it to (7049, 96, 96)?

Demo:
Source Series:
In [129]: train_X
Out[129]:
0 238 236 237 238 240 240 239 241 241
1 219 215 204 196 204 211 212 200 180
2 144 142 159 180 188 188 184 180 167
Name: 1, dtype: object
In [130]: type(train_X)
Out[130]: pandas.core.series.Series
In [131]: train_X.shape
Out[131]: (3,)
Solution:
In [132]: X = train_X.str \
.split(expand=True) \
.astype(np.int16) \
.values.reshape(len(train_X), 3, 3)
In [133]: X
Out[133]:
array([[[238, 236, 237],
[238, 240, 240],
[239, 241, 241]],
[[219, 215, 204],
[196, 204, 211],
[212, 200, 180]],
[[144, 142, 159],
[180, 188, 188],
[184, 180, 167]]], dtype=int16)
In [134]: X.shape
Out[134]: (3, 3, 3)
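For reference, the apply() attempt in the question returns shape (7049,) because each element of the result is its own 96x96 array, held in a one-dimensional Series of dtype object. A minimal sketch, assuming a small 3x3 stand-in for the real data, shows np.stack joining those per-image arrays along a new first axis. Splitting with .split() and converting to an integer dtype are additions not in the question's code (np.array of the split strings alone would give an array of strings).

```python
import numpy as np
import pandas as pd

# Small stand-in for train_x: three "images" of 3x3 pixels stored as strings
train_x = pd.Series([
    "238 236 237 238 240 240 239 241 241",
    "219 215 204 196 204 211 212 200 180",
    "144 142 159 180 188 188 184 180 167",
], name="Image")

side = 3  # use 96 for the real data

# apply() returns an object Series whose elements are (side, side) arrays
per_image = train_x.apply(
    lambda s: np.array(s.split(), dtype=np.int16).reshape(side, side)
)
print(per_image.shape)  # (3,)

# np.stack joins the per-image arrays along a new first axis
X = np.stack(per_image.tolist())
print(X.shape)  # (3, 3, 3)
```

With side = 96 on the real Series, the stacked result should have shape (7049, 96, 96).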

Related

How to convert image Run length Encoded Pixels mask to binary mask and reshape in python?

I have a file with EncodedPixels masks of different sizes.
I want to convert these EncodedPixels to binary masks, resize them all to 1024x1024, and then convert them back to EncodedPixels.
Explanation:
The file contains image masks in EncodedPixels form, and the images have different dimensions (5000x5000, 260x260, etc.). I resize all images to 1024x1024, so now I want to resize each image mask to match its 1024x1024 image.
In my mind there is only one possible solution (there might be more): to resize a mask, first convert the run-length encoded pixels to a binary mask; then the mask can be resized easily.
File Link: link here
I will use this code to resize the binary mask:
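from PIL import Image
import numpy as np
# binary_mask: 2-D NumPy array (e.g. uint8 0/1); new_width, new_height: target size, e.g. 1024, 1024
pil_image = Image.fromarray(binary_mask)
# NEAREST resampling copies pixel values instead of interpolating, so the mask stays binary
pil_image = pil_image.resize((new_width, new_height), Image.NEAREST)
resized_binary_mask = np.asarray(pil_image)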
Encoded Pixels Example
['6068157 7 6073371 20 6078584 34 6083797 48 6089010 62 6094223 72 6099436 76 6104649 80
6109862 85 6115075 89 6120288 93 6125501 98 6130714 102 6135927 106 6141140 111 6146354 114 6151567 118 6156780 123 6161993 127 6167206 131 6172419 136 6177632 140 6182845 144 6188058 149 6193271 153 6198484 157 6203697 162 6208910 166 6214124 169 6219337 174 6224550 178 6229763 182 6234976 187 6240189 191 6245402 195 6250615 200 6255828 204 6261041 208 6266254 213 6271467 218 6276680 224 6281893 229 6287107 233 6292320 238 6297533 244 6302746 249 6307959 254 6313172 259 6318385 265 6323598 270 6328811 275 6334024 280 6339237 286 6344450 291 6349663 296 6354877 300 6360090 306 6365303 311 6370516 316 6375729 322 6380942 327 6386155 332 6391368 337 6396581 343 6401794 348 6407007 353 6412220 358 6417433 364 6422647 368 6427860 373 6433073 378 6438286 384 6443499 389 6448712 394 6453925 399 6459138 405 6464351 410 6469564 415 6474777 420 6479990 426 17204187 78 17208797 227 17209412 56 17214025 203 17214637 34 17219253 179 17219862 11 17224481 155 17229709 131 17234937 107 17240165 83 17245393 60 17250621 36 17255849 12']
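No answer is quoted here, but the approach described in the question (decode, resize, re-encode) can be sketched in NumPy. This is a minimal sketch under assumptions that must be checked against the file: each EncodedPixels string is a sequence of "start length" pairs with 1-indexed starts, and pixels are numbered column by column (order="F", a convention used in some Kaggle segmentation datasets); pass order="C" if the numbering runs row by row. The helper names rle_decode, rle_encode and resize_nearest are hypothetical, and resize_nearest is a simple index-based stand-in for the PIL NEAREST resize shown in the question, so the two may pick slightly different source pixels at block boundaries.

```python
import numpy as np

def rle_decode(rle, shape, order="F"):
    """Turn 'start length start length ...' into a 2-D uint8 mask (1 = foreground).

    Assumes 1-indexed starts; order="F" means pixels are numbered column by
    column, order="C" row by row. Check which convention your file uses.
    """
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    values = np.asarray(rle.split(), dtype=np.int64)
    starts, lengths = values[0::2] - 1, values[1::2]
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape, order=order)


def rle_encode(mask, order="F"):
    """Inverse of rle_decode: 2-D 0/1 mask -> 'start length ...' string."""
    flat = mask.flatten(order=order)
    # Pad with zeros so runs touching either end still produce a start and an end
    padded = np.concatenate([[0], flat, [0]])
    # 1-indexed positions where the value changes: run starts and run ends alternate
    runs = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    runs[1::2] -= runs[0::2]  # turn end positions into run lengths
    return " ".join(str(x) for x in runs)


def resize_nearest(mask, new_shape):
    """Nearest-neighbour resize by index arithmetic (no interpolation, stays binary)."""
    rows = np.arange(new_shape[0]) * mask.shape[0] // new_shape[0]
    cols = np.arange(new_shape[1]) * mask.shape[1] // new_shape[1]
    return mask[rows[:, None], cols]


small = rle_decode("1 3 7 2", (3, 3))  # [[1, 0, 1], [1, 0, 1], [1, 0, 0]]
big = resize_nearest(small, (6, 6))    # each pixel becomes a 2x2 block
big_rle = rle_encode(big)
```

For the real data this would look like resized = resize_nearest(rle_decode(rle, (height, width)), (1024, 1024)) followed by rle_encode(resized), where height and width are each image's original size.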

Extract left and right limit from a Series of pandas Intervals

I want to get the bounds of the intervals in a column of pandas Intervals and write them to the columns 'left' and 'right'. iterrows does not work (the documentation says it should not be used for writing data), and it would not be the best solution anyway.
import pandas as pd
i1 = pd.Interval(left=85, right=94)
i2 = pd.Interval(left=95, right=104)
i3 = pd.Interval(left=105, right=114)
i4 = pd.Interval(left=115, right=124)
i5 = pd.Interval(left=125, right=134)
i6 = pd.Interval(left=135, right=144)
i7 = pd.Interval(left=145, right=154)
i8 = pd.Interval(left=155, right=164)
i9 = pd.Interval(left=165, right=174)
data = pd.DataFrame(
{
"intervals":[i1,i2,i3,i4,i5,i6,i7,i8,i9],
"left" :[0,0,0,0,0,0,0,0,0],
"right" :[0,0,0,0,0,0,0,0,0]
},
index=[0,1,2,3,4,5,6,7,8]
)
# this is not working (has no effect):
for index, row in data.iterrows():
    print(row.intervals.left, row.intervals.right)
    row.left = row.intervals.left
    row.right = row.intervals.right
How can we do something like:
data['left']=data['intervals'].left
data['right']=data['intervals'].right
Thanks!
Create a pandas.IntervalIndex from your intervals. You can then access its .left and .right attributes.
import pandas as pd
idx = pd.IntervalIndex([i1, i2, i3, i4, i5, i6, i7, i8, i9])
pd.DataFrame({'intervals': idx, 'left': idx.left, 'right': idx.right})
intervals left right
0 (85, 94] 85 94
1 (95, 104] 95 104
2 (105, 114] 105 114
3 (115, 124] 115 124
4 (125, 134] 125 134
5 (135, 144] 135 144
6 (145, 154] 145 154
7 (155, 164] 155 164
8 (165, 174] 165 174
Another option is using map and operator.attrgetter (look ma, no lambda...):
from operator import attrgetter
data['left'] = data['intervals'].map(attrgetter('left'))
data['right'] = data['intervals'].map(attrgetter('right'))
data
intervals left right
0 (85, 94] 85 94
1 (95, 104] 95 104
2 (105, 114] 105 114
3 (115, 124] 115 124
4 (125, 134] 125 134
5 (135, 144] 135 144
6 (145, 154] 145 154
7 (155, 164] 155 164
8 (165, 174] 165 174
A pandas.arrays.IntervalArray is the preferred way to store interval data in Series-like structures.
For @coldspeed's first example, IntervalArray is basically a drop-in replacement:
In [2]: pd.__version__
Out[2]: '1.1.3'
In [3]: ia = pd.arrays.IntervalArray([i1, i2, i3, i4, i5, i6, i7, i8, i9])
In [4]: df = pd.DataFrame({'intervals': ia, 'left': ia.left, 'right': ia.right})
In [5]: df
Out[5]:
intervals left right
0 (85, 94] 85 94
1 (95, 104] 95 104
2 (105, 114] 105 114
3 (115, 124] 115 124
4 (125, 134] 125 134
5 (135, 144] 135 144
6 (145, 154] 145 154
7 (155, 164] 155 164
8 (165, 174] 165 174
If you already have interval data in a Series or DataFrame, @coldspeed's second example becomes a bit simpler by accessing the array attribute:
In [6]: df = pd.DataFrame({'intervals': ia})
In [7]: df['left'] = df['intervals'].array.left
In [8]: df['right'] = df['intervals'].array.right
In [9]: df
Out[9]:
intervals left right
0 (85, 94] 85 94
1 (95, 104] 95 104
2 (105, 114] 105 114
3 (115, 124] 115 124
4 (125, 134] 125 134
5 (135, 144] 135 144
6 (145, 154] 145 154
7 (155, 164] 155 164
8 (165, 174] 165 174
A simple way is to use the apply() method:
data['left'] = data['intervals'].apply(lambda x: x.left)
data['right'] = data['intervals'].apply(lambda x: x.right)
data
intervals left right
0 (85, 94] 85 94
1 (95, 104] 95 104
...
8 (165, 174] 165 174
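As for why the loop in the question has no effect: the pandas documentation warns that, depending on the data types, iterrows() returns a copy rather than a view. Here the columns have mixed dtypes, so each row is a new Series and assigning to it never reaches data. If a row-by-row loop is really wanted, writing through the frame itself with DataFrame.at does work, as in this small sketch (three intervals only); the vectorized answers above remain the better choice for real data.

```python
import pandas as pd

data = pd.DataFrame({
    "intervals": [pd.Interval(85, 94), pd.Interval(95, 104), pd.Interval(105, 114)],
    "left": 0,
    "right": 0,
})

# Each `row` from iterrows() is a copy here, so this assignment never reaches `data`
for index, row in data.iterrows():
    row["left"] = row["intervals"].left
left_after_iterrows = data["left"].tolist()  # still [0, 0, 0]

# Writing through the DataFrame itself (label-based .at) does update it
for index, interval in data["intervals"].items():
    data.at[index, "left"] = interval.left
    data.at[index, "right"] = interval.right
```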

train_test_split converts string type label to np.array. Is there any way to get back the original label name?

I have an image dataset with string labels. When I split the data using train_test_split from the sklearn library, it converts the labels to np.array type. Is there a way to get back the original string label names?
The code below splits the data into train and test sets:
imgs, y = load_images()
train_img,ytrain_img,test_img,ytest_img = train_test_split(imgs,y, test_size=0.2, random_state=1)
If I print y, it gives me the label name, but if I print the split label values, it gives an array:
for k in y:
    print(k)
    break
for k in ytrain_img:
    print(k)
    break
Output:
001.Affenpinscher
[[[ 97 180 165]
[ 93 174 159]
[ 91 169 152]
...
[[ 88 171 156]
[ 88 170 152]
[ 84 162 145]
...
[130 209 222]
[142 220 233]
[152 230 243]]
[[ 99 181 163]
[ 98 178 161]
[ 92 167 151]
...
[130 212 224]
[137 216 229]
[143 222 235]]
...
[[ 85 147 158]
[ 85 147 158]
[111 173 184]
...
[227 237 244]
[236 248 250]
[234 248 247]]
[[ 94 154 166]
[ 96 156 168]
[133 194 204]
...
[226 238 244]
[237 249 253]
[237 252 254]]
...
[228 240 246]
[238 252 255]
[241 255 255]]]
Is there a way to convert back the array to the original label name?
No, you are misreading the output of train_test_split.
train_test_split works in this way:
A_train, A_test, B_train, B_test, C_train, C_test ...
= train_test_split(A, B, C ..., test_size=0.2)
You can pass as many arrays as you like. For each array, in the order given, it returns the train split followed by the test split, then does the same for the next array, and so on.
So in your case it actually is:
train_img, test_img, ytrain_img, ytest_img = train_test_split(imgs, y,
test_size=0.2,
random_state=1)
But you mixed up the names of the outputs: in your code, ytrain_img actually receives the test images, not the training labels.
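A minimal sketch of the corrected call, assuming scikit-learn is installed and using hypothetical stand-ins for load_images() (ten 2x2 "images" and made-up string labels): with the outputs unpacked in the right order, ytrain_img and ytest_img hold the original string labels, and each label stays paired with its image.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for load_images(): tiny "images" and string labels
imgs = [[[i, i], [i, i]] for i in range(10)]
y = [f"{i:03d}.Breed{i}" for i in range(10)]

# Outputs come as (imgs_train, imgs_test, y_train, y_test)
train_img, test_img, ytrain_img, ytest_img = train_test_split(
    imgs, y, test_size=0.2, random_state=1
)

print(ytrain_img[0])  # a string label, not an image array
```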

How to convert from String to List/Array?

I have 3 strings
a=38 186 298 345 0.93345
27 198 277 389 0.86006
33 127 293 354 0.89782
I checked type(a); len(a) shows 22, including the spaces between the numbers.
I want to convert them to a list,
like this:
b=[[38 186 298 345 0.93345][27 198 277 389 0.86006][33 127 293 354 0.89782]]
Is this what you are aiming for?
a = '''38 186 298 345 0.93345
27 198 277 389 0.86006
33 127 293 354 0.89782'''
b = [line.split() for line in a.split('\n')]
b
#[['38', '186', '298', '345', '0.93345'],
# ['27', '198', '277', '389', '0.86006'],
# ['33', '127', '293', '354', '0.89782']]
Split them by newlines and spaces. Use Python's str.split() method; called with no arguments, it splits on any run of whitespace, including newlines:
string_name.split()
For more info: https://www.tutorialspoint.com/python/string_split.htm
This is one solution specific to your data.
Note that your inputs are not valid Python; I have fixed that below.
a1 = '38 186 298 345 0.93345'
a2 = '27 198 277 389 0.86006'
a3 = '33 127 293 354 0.89782'
res = [[float(j) if float(j) < 1 else int(j) for j in i.split()] \
for i in [a1, a2, a3]]
# [[38, 186, 298, 345, 0.93345],
# [27, 198, 277, 389, 0.86006],
# [33, 127, 293, 354, 0.89782]]
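The float(j) < 1 check above depends on this data: a token such as "1.5" would go to int() and raise ValueError. A minimal sketch of a more general variant tries int first and falls back to float (parse_number is a hypothetical helper name). If a NumPy array is acceptable instead of a list, np.loadtxt can parse the whole block at once, with every value becoming a float.

```python
import io

import numpy as np

a = """38 186 298 345 0.93345
27 198 277 389 0.86006
33 127 293 354 0.89782"""

def parse_number(token):
    """Return an int for integer-looking tokens, otherwise a float."""
    try:
        return int(token)
    except ValueError:
        return float(token)

# One inner list per line, one number per whitespace-separated token
b = [[parse_number(token) for token in line.split()] for line in a.splitlines()]

# NumPy alternative: a 3x5 float array
arr = np.loadtxt(io.StringIO(a))
```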

Separate specific value in a dataframe

I have a large dataset that I am reading into a pandas DataFrame. I want to select some values from one of the columns. Assuming the column is named "A", its values range from 90 to 300, and I want to select the values between 270 and 280. I tried the code below, but it is wrong:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('....csv')
df2 = df[ 270 < df['A'] < 280]
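The chained comparison in the question fails because Python evaluates 270 < s < 280 as (270 < s) and (s < 280), and `and` needs a single True/False value; pandas refuses to collapse a whole boolean Series into one value and raises ValueError. A small sketch using the same sample range as the answers below:

```python
import pandas as pd

df = pd.DataFrame({"A": range(90, 300)})

# `270 < df["A"] < 280` becomes `(270 < df["A"]) and (df["A"] < 280)`;
# `and` calls bool() on a Series, which pandas rejects.
error_message = None
try:
    df[270 < df["A"] < 280]
except ValueError as exc:
    error_message = str(exc)  # "The truth value of a Series is ambiguous. ..."

# Element-wise "and" is `&`; each comparison needs its own parentheses
df2 = df[(df["A"] > 270) & (df["A"] < 280)]
```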
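Use between with boolean indexing (inclusive="neither" excludes both endpoints; it requires pandas 1.3 or later, where the older inclusive=False spelling was deprecated):
df = pd.DataFrame({'A':range(90,300)})
df2 = df[df['A'].between(270, 280, inclusive="neither")]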
print (df2)
A
181 271
182 272
183 273
184 274
185 275
186 276
187 277
188 278
189 279
Or:
df2 = df[(df['A'] > 270) & (df['A'] < 280)]
print (df2)
A
181 271
182 272
183 273
184 274
185 275
186 276
187 277
188 278
189 279
You can use NumPy to speed things up and then reconstruct a new DataFrame.
Using jezrael's sample data:
a = df.A.values
m = (a > 270) & (a < 280)
pd.DataFrame(a[m], df.index[m], df.columns)
A
181 271
182 272
183 273
184 274
185 275
186 276
187 277
188 278
189 279
You can also use the query() method:
df2 = df.query("270 < A < 280")
Demo:
In [40]: df = pd.DataFrame({'A':range(90,300)})
In [41]: df.query("270 < A < 280")
Out[41]:
A
181 271
182 272
183 273
184 274
185 275
186 276
187 277
188 278
189 279
