I want to extract a rectangular ROI from an image.
The image contains a single connected non zero part.
I need it to be efficient in run time.
I was thinking maybe:
Summing along each direction.
Finding first non zero and last non zero.
Slicing the image accordingly.
Is there a better way?
My code:
First is a function to find the first and last non zero:
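import numpy as np
from PIL import Image

def first_last_nonzero(boolean_vector):
    # Returns (first, last): the index of the first True and the index
    # just past the end of that run of Trues (usable as a slice end).
    first = last = -1
    for idx, val in enumerate(boolean_vector):
        if val and first == -1:
            first = idx
        if not val and first != -1:
            last = idx
            break  # stop at the first False after the run
    if first != -1 and last == -1:
        last = len(boolean_vector)  # the run reaches the end of the vector
    return first, last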
Then creating an image:
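# uint8 so that Image.fromarray can handle the array
np_im = np.array([[0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0, 255, 154, 251,  60, 0, 0, 0],
                  [0, 0, 0, 0,   4,  66,   0,   0, 255, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0, 134,  48, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0, 236,  70,   0, 0, 0, 0],
                  [0, 0, 0, 0,   1, 255,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0, 255,  24,  24,  24,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0],
                  [0, 0, 0, 0,   0,   0,   0,   0,   0, 0, 0, 0]], dtype=np.uint8)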
Then running our function on the sum along each axis:
y_start, y_end = first_last_nonzero(np.sum(np_im, 1)>0)
x_start, x_end = first_last_nonzero(np.sum(np_im, 0)>0)
cropped_np_im = np_im[y_start:y_end, x_start:x_end]
# show the cropped image
Image.fromarray(cropped_np_im).show()
This works, but there are probably plenty of unnecessary calculations.
Is there a better way to do this? Or maybe a more Pythonic way?
You can make use of the functions from this post:
Numpy: How to find first non-zero value in every column of a numpy array?
def first_nonzero(arr, axis, invalid_val=-1):
    mask = arr != 0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)

def last_nonzero(arr, axis, invalid_val=-1):
    mask = arr != 0
    val = arr.shape[axis] - np.flip(mask, axis=axis).argmax(axis=axis) - 1
    return np.where(mask.any(axis=axis), val, invalid_val)
arr = np.array([
[0, 0, 0, 0, 1, 1],
[0, 0, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 1, 0],
[0, 0, 0, 0, 0, 0] ])
y_Min, y_Max, x_Min, x_Max = (0, 0, 0, 0)
y_Min = first_nonzero(arr, axis = 0, invalid_val = -1)
y_Min = (y_Min[y_Min >= 0]).min()
x_Min = first_nonzero(arr, axis = 1, invalid_val = -1)
x_Min = (x_Min[x_Min >= 0]).min()
y_Max = last_nonzero(arr, axis = 0, invalid_val = -1)
y_Max = (y_Max[y_Max >= 0]).max()
x_Max = last_nonzero(arr, axis = 1, invalid_val = -1)
x_Max = (x_Max[x_Max >= 0]).max()
print(x_Min)
print(y_Min)
print(x_Max)
print(y_Max)
For this example of mine, the code will return 1, 0, 5, 4.
As a general rule of thumb in Python with NumPy: try to avoid explicit loops whenever a vectorized operation exists. In my own experience that holds in 99 out of 100 cases.
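For the cropping question itself, the loop-free idea can be sketched as below. This is only a minimal sketch under the assumption of a 2-D array; the helper name crop_to_nonzero is hypothetical and does not come from the linked post.

```python
import numpy as np

def crop_to_nonzero(img):
    """Return the smallest rectangular slice of a 2-D array holding all nonzero values."""
    rows = np.flatnonzero(img.any(axis=1))  # indices of rows with a nonzero entry
    cols = np.flatnonzero(img.any(axis=0))  # indices of columns with a nonzero entry
    if rows.size == 0:
        return img[:0, :0]  # nothing nonzero: return an empty slice
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

arr = np.array([
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0]])

cropped = crop_to_nonzero(arr)
print(cropped.shape)  # (5, 5): rows 0..4, columns 1..5
```

Because the result is a basic slice, it is a view on the original array rather than a copy.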
Essentially, I want to convert consecutive duplicates of True to False, keeping only the last one of each run, as the title suggests.
For example, say I have an array of 0s and 1s:
x = pd.Series([1,0,0,1,1])
should become:
y = pd.Series([0,0,0,0,1])
# The 1st element of x becomes 0 because it is not part of a consecutive run,
# and the 4th element becomes 0 because it is the first element of a run of duplicates.
# Everything else stays the same.
This also applies to runs of more than two. Say I have a much longer array, e.g.:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
becomes:
y = pd.Series([0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1])
The posts I have found mostly delete consecutive duplicates and do not retain the original length. In this case, the original length should be retained.
It is something like the following code:
for i in range(len(x)):
    if x[i] == x[i+1]:
        x[i] = True
    else:
        x[i] = False
But this never finishes running, and it does not handle runs of more than two.
Pandas solution: create a Series, label the groups of consecutive values with shift and cumsum, then keep only the last 1 of each group of duplicates using Series.duplicated:
s = pd.Series(x)
g = s.ne(s.shift()).cumsum()
s1 = (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
print (s1.tolist())
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
EDIT:
For multiple columns use function:
x = pd.Series([1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1])
df = pd.DataFrame({'a':x, 'b':x})
def f(s):
    g = s.ne(s.shift()).cumsum()
    return (~g.duplicated(keep='last') & g.duplicated(keep=False) & s.eq(1)).astype(int)
df = df.apply(f)
print (df)
    a  b
0   0  0
1   0  0
2   0  0
3   0  0
4   0  0
5   1  1
6   0  0
7   0  0
8   1  1
9   0  0
10  0  0
11  0  0
12  0  0
13  1  1
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  0
20  1  1
Vanilla Python :
x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
counter = 0
for i, e in enumerate(x):
    if not e:
        counter = 0
        continue
    if not counter or (i < len(x) - 1 and x[i+1]):
        counter += 1
        x[i] = 0
print(x)
Prints :
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
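If NumPy is available, the same rule can also be expressed without a loop: an element survives only if it is 1, its predecessor is 1, and its successor is not 1. The following is a minimal sketch assuming a 1-D input of 0s and 1s; the function name last_of_runs is made up for illustration.

```python
import numpy as np

def last_of_runs(values):
    """Keep only the last 1 of each run of two or more consecutive 1s."""
    x = np.asarray(values, dtype=bool)
    prev = np.concatenate(([False], x[:-1]))  # value to the left of each element
    nxt = np.concatenate((x[1:], [False]))    # value to the right of each element
    return (x & prev & ~nxt).astype(int)

x = [1,0,0,1,1,1,0,1,1,0,1,1,1,1,0,0,1,1,1,1,1]
print(last_of_runs(x).tolist())
# [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```

Padding both ends with False means a run touching either edge of the array is handled like any other run.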
I basically want to reorder (I don't think this is a shuffling task) a list of 100 binary digits. The following properties should hold after the reordering: the number of 1s stays fixed at 10, and the 1s should be spread roughly evenly, as shown below, so that every 9th, 10th, or 11th digit is a 1. I want this reordering to be random. The trivial approach I had in mind is to track the index of the first 1 in the input list and generate a new start index. Any ideas for other solutions?
x = [1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
My code is as follows:
def main():
    from random import shuffle
    from random import randint
    from itertools import chain

    num_of_10th = randint(0, 5) * 2
    num_of_11th = num_of_9th = int((10 - num_of_10th) / 2)
    lsts = []
    for i in range(num_of_10th):
        lsts.append([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    for i in range(num_of_9th):
        lsts.append([1, 0, 0, 0, 0, 0, 0, 0, 0])
    for i in range(num_of_11th):
        lsts.append([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    shuffle(lsts)
    lsts = list(chain.from_iterable(lsts))
    print(lsts)
You can use Python's list multiplication.
My solution generates a random size between 1 and 10 using random.randint. From this size I create repeated_part, which starts with a 1 and fills the rest with zeros. For example, when size is 5, repeated_part will be [1, 0, 0, 0, 0].
From the size we can calculate the number of times it fits in a list of 100, which is 100 // size, and we add one to allow for overflow. Now the list may be too long: for example, with a size of 3 the total length is (100 // 3 + 1) * 3 = 102, so we truncate the list to a length of 100 with [:100].
import random
size = random.randint(1, 10)
repeated_part = [1] + [0]*(size-1)
result = (repeated_part * (100 // size + 1)) [:100]
Note: if you do not want the 1 to come first in each block, you could use random.shuffle(repeated_part), which still holds all your other requirements.
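To make the arithmetic concrete, here is a small sketch that wraps the steps of this answer in a function and checks the size-3 case. The name spread_ones and the length parameter are illustrative only and are not part of the answer.

```python
def spread_ones(size, length=100):
    """Repeat [1, 0, ..., 0] (size elements) and truncate to `length` items."""
    repeated_part = [1] + [0] * (size - 1)
    # One extra repetition guarantees at least `length` items before truncating.
    untruncated = repeated_part * (length // size + 1)
    return untruncated[:length]

result = spread_ones(3)
print(len(result))  # 100
print(sum(result))  # 34 ones, at indices 0, 3, ..., 99
```

With this construction the number of 1s depends on the chosen size: among sizes 1 to 10, only size 10 yields exactly ten 1s in a list of 100.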
I have a python dictionary formatted in the following way:
data[author1][author2] = 1
This dictionary contains an entry for every possible author pair (all pairs of 8500 authors), and I need to output a matrix that looks like this for all author pairs:
        "auth1" "auth2" "auth3" "auth4" ...
"auth1"    0       1       0       3
"auth2"    1       0       2       0
"auth3"    0       2       0       1
"auth4"    3       0       1       0
...
I have tried the following method:
x = numpy.array([[data[author1][author2] for author2 in sorted(data[author1])] for author1 in sorted(data)])
print x
outf.write(x)
However, printing this leaves me with this:
[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
...,
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]
and the output file is just a blank text file. I am trying to format the output in a way to read into Gephi (https://gephi.org/users/supported-graph-formats/csv-format/)
You almost got it right, your list comprehension is inverted. This will give you the expected result:
import numpy as np

d = dict(auth1=dict(auth1=0, auth2=1, auth3=0, auth4=3),
         auth2=dict(auth1=1, auth2=0, auth3=2, auth4=0),
         auth3=dict(auth1=0, auth2=2, auth3=0, auth4=1),
         auth4=dict(auth1=3, auth2=0, auth3=1, auth4=0))
# every author appears as a key at both levels, so the outer keys serve for both loops
np.array([[d[i][j] for i in sorted(d.keys())] for j in sorted(d.keys())])
#array([[0, 1, 0, 3],
# [1, 0, 2, 0],
# [0, 2, 0, 1],
# [3, 0, 1, 0]])
You could use pandas. Using @Saullo Castro's input:
import pandas as pd
df = pd.DataFrame.from_dict(d)
Result:
>>> df
       auth1  auth2  auth3  auth4
auth1      0      1      0      3
auth2      1      0      2      0
auth3      0      2      0      1
auth4      3      0      1      0
And if you want to save it, you can just call df.to_csv(file_name).
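As a rough sketch of how the labeled matrix could be produced as text, the example below builds the DataFrame from the nested dict, sorts both axes, and writes it to an in-memory buffer. The semicolon separator is an assumption about what the importer expects; check the format page linked in the question before relying on it.

```python
import io
import pandas as pd

d = dict(auth1=dict(auth1=0, auth2=1, auth3=0, auth4=3),
         auth2=dict(auth1=1, auth2=0, auth3=2, auth4=0),
         auth3=dict(auth1=0, auth2=2, auth3=0, auth4=1),
         auth4=dict(auth1=3, auth2=0, auth3=1, auth4=0))

# orient="index" makes the outer keys the row labels
df = pd.DataFrame.from_dict(d, orient="index")
df = df.sort_index(axis=0).sort_index(axis=1)

buffer = io.StringIO()
df.to_csv(buffer, sep=";")  # separator is an assumption, see above
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file name to to_csv in place of the buffer.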
I want to create a matrix like the following.
I am still a beginner in this language and would really appreciate some help. Thanks.
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
You can create a list of lists and print it however you like:
matrix = [[0] * 5 for _ in range(5)]
for i in range(5):
    matrix[i][i] = 1
    print(" ".join(str(num) for num in matrix[i]))
print(matrix)
Output
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
If you're planning to do any real work with matrices, you should strongly consider looking at NumPy.
Once you get it installed:
>>> import numpy as np
>>> matrix = np.diag([1]*5)
>>> print(matrix)
[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 1 0 0]
[0 0 0 1 0]
[0 0 0 0 1]]
So far, not too exciting. But check this out:
>>> print(matrix * 2)
[[2 0 0 0 0]
[0 2 0 0 0]
[0 0 2 0 0]
[0 0 0 2 0]
[0 0 0 0 2]]
>>> print(matrix + 1)
[[2 1 1 1 1]
 [1 2 1 1 1]
 [1 1 2 1 1]
 [1 1 1 2 1]
 [1 1 1 1 2]]
>>> print((1 + matrix) * (1 - matrix))
[[0 1 1 1 1]
[1 0 1 1 1]
[1 1 0 1 1]
[1 1 1 0 1]
[1 1 1 1 0]]
>>> print(np.arccos(matrix) / np.pi)
[[ 0. 0.5 0.5 0.5 0.5]
[ 0.5 0. 0.5 0.5 0.5]
[ 0.5 0.5 0. 0.5 0.5]
[ 0.5 0.5 0.5 0. 0.5]
[ 0.5 0.5 0.5 0.5 0. ]]
All that math, and a whole lot more, you don't have to implement yourself. And it's generally at least 10x as fast as if you did implement it yourself. All that, plus fancy indexing (like slicing by row, column, or both), and all kinds of other things you don't yet know you were going to ask for, but will.
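For this particular matrix, NumPy also offers a direct constructor. The snippet below is a small sketch showing that np.eye with an integer dtype gives the same array as the np.diag call above.

```python
import numpy as np

# np.eye(n) builds an n x n identity matrix; dtype=int avoids the float default
identity = np.eye(5, dtype=int)
print(identity)

# same result as the np.diag construction
same = np.array_equal(identity, np.diag([1] * 5))
print(same)  # True
```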
My approach would be:
Code:
size = 5
for i in range(size):
    for j in range(size):
        print(1 if i == j else 0, end=' ')
    print()
Output:
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
Hope this helps :)
I think this is the simplest way. Enjoy!
def fun(N):
    # x zeros, then a 1, then N-x-1 zeros, so each row has exactly N entries
    return [[0]*x + [1] + [0]*(N-x-1) for x in range(N)]
print(fun(5))
The result:
[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]