convert a numbered list into an order list - python

I have a python list question:
Input:
l=[2, 5, 6, 7, 10, 11, 12, 19, 20, 26, 28, 33, 34, 45, 46, 47, 50, 57, 59, 64, 67, 77, 79, 87, 93, 97, 106, 110, 111, 113, 115, 120, 125, 126, 133, 135, 142, 148, 160, 166, 169, 176, 202, 228, 234, 253, 274, 365, 433, 435, 436, 468, 476, 529, 570, 575, 577, 581, 614, 766, 813, 944, 1058, 1079, 1245, 1363, 1389, 1428, 1758, 2129, 2336, 2402, 2405, 2576, 3013, 3993, 7687, 8142, 8455, 8456]
Now I want to mark these numbers in a [0]*10000 list, so that the beginning looks like:
Output:
lp=[0,1,0,0,1,...]
The second and fifth elements are marked since they appeared in the input.

lp = [0] * 10000
for index in l:
    lp[index - 1] = 1

You could use the following list comprehension
lp = [1 if i in l else 0 for i in range(1, 10001)]
Though since l could be long, I'd recommend converting it to a set first, so that each membership test is a hash lookup rather than a scan of the whole list:
set_l = set(l)
lp = [1 if i in set_l else 0 for i in range(1, 10001)]
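As a quick sanity check, the loop and the set-based comprehension above should produce the same list. This is only a sketch: the input is shortened to the first five numbers of the question's list, and the names size, lp_loop, members and lp_comp are made up for the example.

```python
l = [2, 5, 6, 7, 10]  # shortened sample of the input list from the question
size = 10000          # length of the marker list (assumed name)

# Loop version: mark position n - 1 for every number n in the input.
lp_loop = [0] * size
for n in l:
    lp_loop[n - 1] = 1

# Comprehension version, using a set for fast membership tests.
members = set(l)
lp_comp = [1 if i in members else 0 for i in range(1, size + 1)]

assert lp_loop == lp_comp
print(lp_loop[:10])  # [0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
```

The loop only touches the marked positions, while the comprehension visits all 10000, so the loop is likely the cheaper of the two when the input is short.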

Related

How do I match similar bounding boxes that are in two separate lists?

I have two lists of bounding boxes. One list is the expected location of the bounding boxes, and the second list is the value of the bounding boxes that are returned by an OCR program. The bounding box lists (below) are in the format of [Top, Left, Width, Height]
Expected_Boxes= [[96, 752, 784, 172],
[876, 754, 674, 174],
[1536, 756, 620, 170],
[2146, 754, 318, 176],
[1136, 960, 66, 70],
[1406, 928, 906, 112],
[184, 1076, 60, 56],
[442, 1192, 812, 132],
[1710, 1232, 62, 54],
[2012, 1228, 58, 58],
[176, 1332, 1062, 128],
[1302, 1334, 1128, 126],
[128, 1526, 950, 106],
[1098, 1532, 402, 98],
[1534, 1538, 450, 88],
[2010, 1512, 434, 110],
[804, 1680, 62, 62],
[992, 1684, 56, 60],
[742, 1816, 62, 60],
[1158, 1814, 64, 60],
[100, 1994, 776, 102],
[910, 1996, 748, 98],
[1728, 1994, 714, 96],
[1728, 1994, 714, 96],
[2218, 2302, 58, 62],
[2072, 2486, 60, 60],
[2218, 2486, 60, 62],
[56, 1430, 336, 66]]
OCR_Boxes = [[793, 1660, 248, 81],
[806, 223, 215, 85],
[812, 1009, 219, 67],
[812, 2248, 86, 53],
[947, 1563, 556, 80],
[970, 1143, 44, 44],
[1080, 188, 46, 46],
[1208, 651, 406, 82],
[1234, 2015, 47, 46],
[1235, 1710, 46, 47],
[1364, 1422, 827, 96],
[1375, 338, 602, 93],
[1536, 1523, 516, 102],
[1550, 2115, 180, 76],
[1562, 429, 648, 70],
[1691, 991, 48, 47],
[1692, 808, 47, 46],
[1822, 1765, 46, 48],
[1823, 1166, 47, 47],
[1824, 746, 46, 45],
[2007, 195, 374, 91],
[2011, 1858, 380, 82],
[2014, 1019, 339, 81],
[2304, 2223, 49, 50],
[2305, 2078, 47, 46],
[2492, 2224, 46, 47],
[2492, 2081, 46, 47],
[2553, 485, 1124, 48],
[2790, 1168, 1269, 210],
[2906, 193, 391, 89]]
As you can tell, the expected list might have more or fewer boxes than the OCR list, and the values will not be the same. I attempted to solve this by using the following code:
def intersection_over_union(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # compute the area of intersection rectangle
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    # compute the area of both the prediction and ground-truth
    # rectangles
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)
    # return the intersection over union value
    return iou
def match_bounding_boxes(image1, image2):
    matches = []
    for box1 in image1:
        best_iou = 0
        best_box = None
        for box2 in image2:
            iou = intersection_over_union(box1, box2)
            if iou > best_iou:
                best_iou = iou
                best_box = box2
        matches.append((box1, best_box))
    return matches
However, all matches return "None"... meaning something is logically wrong with the code. Can anyone spot it?
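One likely cause: intersection_over_union indexes each box as corner coordinates [x1, y1, x2, y2], but the lists are described as [Top, Left, Width, Height]. With the sample data the fourth value is smaller than the second for every box, so the intersection height is clamped to 0, every IoU is 0, and best_box is never replaced. A minimal sketch of a fix, assuming both lists really use the [Top, Left, Width, Height] layout; the helper to_corners is made up for this example, and the +1 pixel convention from the original is dropped for simplicity:

```python
def to_corners(box):
    """Convert [top, left, width, height] to (x1, y1, x2, y2)."""
    top, left, width, height = box
    return left, top, left + width, top + height


def intersection_over_union(box_a, box_b):
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    # Width and height of the overlap, clamped at 0 when the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


def match_bounding_boxes(expected, detected):
    """Pair each expected box with the detected box of highest IoU (or None)."""
    matches = []
    for box1 in expected:
        best_iou, best_box = 0.0, None
        for box2 in detected:
            iou = intersection_over_union(box1, box2)
            if iou > best_iou:
                best_iou, best_box = iou, box2
        matches.append((box1, best_box))
    return matches


print(match_bounding_boxes([[0, 0, 10, 10]], [[100, 100, 10, 10], [0, 5, 10, 10]]))
# [([0, 0, 10, 10], [0, 5, 10, 10])]
```

Judging only from the sample numbers, the two lists may also order their first two fields differently (the expected boxes form rows along the first field, the OCR boxes along the second), so it is worth checking the OCR tool's output format before trusting the matches.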

Increment values in a python list by 6 until it reaches a condition [closed]

I'm struggling with this seemingly simple Python problem.
I need a list of 90 values that starts at 120 and increases by 6 until it reaches 240; once it reaches 240 it should decrease by 6 back down to 120. This cycle should repeat until the list holds 90 values.
x = [30, 36, 42, 48, 54, 60]
e = [120]
for row in range(90):
    if e[row] >= 120 and e[row] != 240:
        e.append(e[row] + 6)
        print(e[row], "1")
    elif e[row] <= 240 and e[row] != 120:
        e.append(e[row] - 6)
        print(e[row])
The code I have so far doesn't work well. After it reaches 240, it goes down to 234. 234 satisfies the >= 120 and != 240 condition, so it just goes back up to 240.
Any guidance would be appreciated!
A one-line way to do this by just gluing ranges together would be:
>>> ((list(range(120, 240, 6)) + list(range(240, 120, -6))) * 3)[:90]
[120, 126, 132, 138, 144, 150, 156, 162, 168, 174, 180, 186, 192, 198, 204, 210, 216, 222, 228, 234, 240, 234, 228, 222, 216, 210, 204, 198, 192, 186, 180, 174, 168, 162, 156, 150, 144, 138, 132, 126, 120, 126, 132, 138, 144, 150, 156, 162, 168, 174, 180, 186, 192, 198, 204, 210, 216, 222, 228, 234, 240, 234, 228, 222, 216, 210, 204, 198, 192, 186, 180, 174, 168, 162, 156, 150, 144, 138, 132, 126, 120, 126, 132, 138, 144, 150, 156, 162, 168, 174]
To build it in a loop the way you're trying to do, I'd have the delta in another variable and only flip it when you hit one of the edges, like this:
>>> e = [120]
>>> d = 6
>>> for _ in range(89):
...     n = e[-1] + d
...     if n >= 240 or n <= 120:
...         d *= -1
...     e.append(n)
...
>>> e
[120, 126, 132, 138, 144, 150, 156, 162, 168, 174, 180, 186, 192, 198, 204, 210, 216, 222, 228, 234, 240, 234, 228, 222, 216, 210, 204, 198, 192, 186, 180, 174, 168, 162, 156, 150, 144, 138, 132, 126, 120, 126, 132, 138, 144, 150, 156, 162, 168, 174, 180, 186, 192, 198, 204, 210, 216, 222, 228, 234, 240, 234, 228, 222, 216, 210, 204, 198, 192, 186, 180, 174, 168, 162, 156, 150, 144, 138, 132, 126, 120, 126, 132, 138, 144, 150, 156, 162, 168, 174]
I guess you want something like this:
x = [30, 36, 42, 48, 54, 60]
e = [120]
dir = 1
for row in range(90):
    if dir == 1:
        if e[row] >= 240:
            dir = -1
    else:
        if e[row] <= 120:
            dir = 1
    e.append(e[row] + (dir * 6))
    print(e[row])
print(f' LENGTH: {len(e)}')
You can use a variable to hold the amount that you're adding, and switch it between 6 and -6. Test whether it's positive or negative to know which end to check for.
e = [120]
increment = 6
# 89 appends plus the initial 120 give the 90 values asked for
for _ in range(89):
    e.append(e[-1] + increment)
    if increment > 0 and e[-1] == 240:
        increment = -6
    elif increment < 0 and e[-1] == 120:
        increment = 6
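Another way to look at it, sketched here with itertools: one full up-and-down cycle is a fixed sequence of 40 values, so the list is just that cycle repeated and cut off at 90 values. The names up, down and e below are only illustrative.

```python
from itertools import cycle, islice

up = range(120, 240, 6)     # 120, 126, ..., 234
down = range(240, 120, -6)  # 240, 234, ..., 126

# Repeat the 40-value cycle indefinitely and take the first 90 values.
e = list(islice(cycle([*up, *down]), 90))

print(len(e))              # 90
print(e[0], e[20], e[40])  # 120 240 120
```

This avoids any direction-tracking state, at the cost of being less obvious to extend if the step or the bounds need to change while the list is being built.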

Could you please explain this while loop in a simple way?

You need to write a loop that takes the numbers in a given list named num_list:
num_list = [422, 136, 524, 85, 96, 719, 85, 92, 10, 17, 312, 542, 87, 23, 86, 191, 116, 35, 173, 45, 149, 59, 84, 69, 113, 166]
Your code should add up the odd numbers in the list, but only the first 5 of them. If there are more than 5 odd numbers, you should stop at the fifth. If there are fewer than 5 odd numbers, add all of the odd numbers.
num_list = [422, 136, 524, 85, 96, 719, 85, 92, 10, 17, 312, 542, 87, 23, 86, 191, 116, 35, 173, 45, 149, 59, 84, 69, 113, 166]
count_odd = 0
list_sum = 0
i = 0
len_num_list = len(num_list)
while (count_odd < 5) and (i < len_num_list):
    if num_list[i] % 2 != 0:
        list_sum += num_list[i]
        count_odd += 1
    i += 1
print("The number of odd numbers added is: {}".format(count_odd))
print("The sum of the odd numbers added is: {}".format(list_sum))
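In plain words: i walks through the list one position at a time. Whenever the number at position i is odd, it is added to list_sum and count_odd goes up by one. The loop stops as soon as either five odd numbers have been added or i has run past the end of the list. The same logic can be sketched without manual index bookkeeping, which may make it easier to follow; the names odds and first_five are made up for this example.

```python
from itertools import islice

num_list = [422, 136, 524, 85, 96, 719, 85, 92, 10, 17, 312, 542, 87, 23,
            86, 191, 116, 35, 173, 45, 149, 59, 84, 69, 113, 166]

# Lazily yield the odd numbers in list order.
odds = (n for n in num_list if n % 2 != 0)

# Take at most the first five of them; fewer if the list runs out.
first_five = list(islice(odds, 5))

print(first_five)                        # [85, 719, 85, 17, 87]
print(len(first_five), sum(first_five))  # 5 993
```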

Produce pandas Series of numpy.arrays from DataFrame in parallel with dask

I've got a pandas DataFrame with a column containing images as numpy 2D arrays.
I need a Series or DataFrame holding their histograms, again in a single column, computed in parallel with dask.
Sample code:
import numpy as np
import pandas as pd
import dask.dataframe as dd
def func(data):
    result = np.histogram(data.image.ravel(), bins=128)[0]
    return result
n = 10
df = pd.DataFrame({'image': [(np.random.random((60, 24)) * 255).astype(np.uint8) for i in np.arange(n)],
                   'n1': np.arange(n),
                   'n2': np.arange(n) * 2,
                   'n3': np.arange(n) * 4
                   }
                  )
print('DataFrame\n', df)
hists = pd.Series([func(r[1]) for r in df.iterrows()])
# MAX_PROCESSORS = 4
# ddf = dd.from_pandas(df, npartitions=MAX_PROCESSORS)
# hists = ddf.apply(func, axis=1, meta=pd.Series(name='data', dtype=np.ndarray)).compute()
print('Histograms \n', hists)
Desired output
DataFrame
image n1 n2 n3
0 [[51, 254, 167, 61, 230, 135, 40, 194, 101, 24... 0 0 0
1 [[178, 130, 204, 196, 80, 97, 61, 51, 195, 38,... 1 2 4
2 [[122, 126, 47, 31, 208, 130, 85, 189, 57, 227... 2 4 8
3 [[185, 141, 206, 233, 9, 157, 152, 128, 129, 1... 3 6 12
4 [[131, 6, 95, 23, 31, 182, 42, 136, 46, 118, 2... 4 8 16
5 [[111, 89, 173, 139, 42, 131, 7, 9, 160, 130, ... 5 10 20
6 [[197, 223, 15, 40, 30, 210, 145, 182, 74, 203... 6 12 24
7 [[161, 87, 44, 198, 195, 153, 16, 195, 100, 22... 7 14 28
8 [[0, 158, 60, 217, 164, 109, 136, 237, 49, 25,... 8 16 32
9 [[222, 64, 64, 37, 142, 124, 173, 234, 88, 40,... 9 18 36
Histograms
0 [81, 87, 80, 94, 99, 79, 86, 90, 90, 113, 96, ...
1 [93, 76, 103, 83, 76, 101, 85, 83, 96, 92, 87,...
2 [84, 93, 87, 113, 83, 83, 89, 89, 114, 92, 86,...
3 [98, 101, 95, 111, 77, 92, 106, 72, 91, 100, 9...
4 [95, 96, 87, 82, 89, 87, 99, 82, 70, 93, 76, 9...
5 [77, 94, 95, 85, 82, 90, 77, 92, 87, 89, 94, 7...
6 [73, 86, 81, 91, 91, 82, 96, 94, 112, 95, 74, ...
7 [88, 89, 87, 88, 76, 95, 96, 98, 108, 96, 92, ...
8 [83, 84, 76, 88, 96, 112, 89, 80, 93, 94, 98, ...
9 [91, 78, 85, 98, 105, 75, 83, 66, 79, 86, 109,...
You can see the commented-out lines that call dask.DataFrame.apply. If I uncomment them, I get the exception dask.async.ValueError: Shape of passed values is (3, 128), indices imply (3, 4)
And here is the exception stack:
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\base.py", line 94, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\base.py", line 201, in compute
results = get(dsk, keys, **kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\threaded.py", line 76, in get
**kwargs)
File "C:\Anaconda\envs\MBA\lib\site-packages\dask\async.py", line 500, in get_async
raise(remote_exception(res, tb))
dask.async.ValueError: Shape of passed values is (3, 128), indices imply (3, 4)
How can I overcome it?
My goal is to process this data frame in parallel.
map_partitions was the answer. After several days of experiments and time measurements, I've come to the following code. It gives 2-4 times speedup compared to list comprehensions or generator expressions wrapping pandas.DataFrame.itertuples
def func(data, additional_args):
    filtered = data.image  # placeholder: filter data.image here, using additional_args
    result = np.histogram(filtered)
    return result
def func_partition(data, additional_args):
    # apply func row by row within one partition
    result = data.apply(func, args=(additional_args, ), axis=1)
    return result
if __name__ == '__main__':
    # older dask API; recent dask versions use dask.config.set(scheduler='processes') instead
    dask.set_options(get=dask.multiprocessing.get)
    n = 30000
    df = pd.DataFrame({'image': [(np.random.random((180, 64)) * 255).astype(np.uint8) for i in np.arange(n)],
                       'n1': np.arange(n),
                       'n2': np.arange(n) * 2,
                       'n3': np.arange(n) * 4
                       }
                      )
    # MAX_PROCESSORS and bfilter (the filter argument) are not shown in this snippet
    ddf = dd.from_pandas(df, npartitions=MAX_PROCESSORS)
    dhists = ddf.map_partitions(func_partition, bfilter, meta=pd.Series(dtype=np.ndarray))
    print('Delayed dhists = \n', dhists)
    hists = pd.Series(dhists.compute())

Pandas Groupby date index and count values in list of integers

I have the below data which is a date index that has date range between '2014-08-22' and '2014-08-28' and one column with list of integers. I am trying to figure out a nice Pandas method for just grouping the numbers by date. Desired Result also below.
Data:
values
date
2014-08-22 [179, 187, 188, 190, 194, 198, 2, 226, 26, 311, 322, 325, 341, 6]
2014-08-22 [179, 187, 188, 190, 194, 198, 2, 226, 26, 311, 322, 325, 341, 6]
2014-08-22 [167, 172, 178, 189, 198, 2, 20, 211, 212, 22, 274, 276, 287, 318, 321, 326, 48]
2014-08-23 [167, 172, 178, 189, 198, 2, 20, 211, 212, 22, 274, 276, 287, 318, 321, 326, 48]
2014-08-23 [167, 172, 178, 189, 198, 2, 20, 211, 212, 22, 274, 276, 287, 318, 321, 326, 48]
Desired pivot/groupby/crosstab Output:
2014-08-22 2014-08-23
179 2 0
167 1 2
etc...
I know how to create a dict with the counts of occurrence as below but not sure how to group it by the index
from collections import Counter
from itertools import chain
values_list = list(chain.from_iterable(df['values']))
Counter(values_list)
Here's an approach.
# expand each list into columns, one column per distinct value
t = df['values'].apply(lambda x: pd.Series(1, index=x))
t = t.fillna(0)  # values absent from a row become 0
# sum observations per day and transpose
t.groupby(level=0).sum().T
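The Counter idea from the question can also be grouped by date directly, by keeping one Counter per date. This is a plain-Python sketch, not a pandas method: the rows list below is a shortened, made-up stand-in for the (date, list) pairs in the DataFrame, and counts is an illustrative name.

```python
from collections import Counter, defaultdict

# Shortened stand-in for the question's data: (date, list of values) pairs.
rows = [
    ("2014-08-22", [179, 187, 2]),
    ("2014-08-22", [179, 187, 6]),
    ("2014-08-22", [167, 2]),
    ("2014-08-23", [167, 2]),
    ("2014-08-23", [167, 48]),
]

# One Counter per date; update() adds the occurrences from each row's list.
counts = defaultdict(Counter)
for date, values in rows:
    counts[date].update(values)

print(counts["2014-08-22"][179], counts["2014-08-23"][179])  # 2 0
print(counts["2014-08-22"][167], counts["2014-08-23"][167])  # 1 2
```

With the real DataFrame the pairs would presumably come from iterating over df['values'].items(), and passing the resulting dict of Counters to pd.DataFrame and filling the missing entries with 0 should give a table shaped like the desired output, though that last step is untested here.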
