I have a dataset for which I would like to create additional training labels, by creating a buffer zone around the true labels in a two-dimensional dataset (lon, lat). For the sake of my question, say that my dataset looks like:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]])
code: df = np.array([0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0]).reshape(5,5)
After creating the buffer zone, my output data should look something like:
array([[0, 0, 0, 0, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]])
Technically my dataset is 3D, with 5000 time steps. I know ArcGIS has a tool that does this, but it only processes one time step at a time, and I would rather not export 5000 separate files. Does anyone know how to tackle this issue?
It may be good to know that each 'pixel' is 0.5 by 0.5.
It might not be the prettiest of answers, but I did find a way to tackle it. The code below creates (where possible) a 3x3 block of true labels around each true label in the original dataset. It handles corners and borders as well. If anyone knows a better or faster solution, please share!
import numpy as np
df = np.array([0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0]).reshape(5,5)
values = np.where(df == 1) # find indexes of true labels
for ind in range(len(values[0])):
    x = values[0][ind]
    y = values[1][ind]
    #upper left
    if x == 0 and y == 0:
        df[x,y:y+2] = 1
        df[x+1,y:y+2] = 1
    #upper right
    elif x == 0 and y == df.shape[1]-1:
        df[x,y-1:y+1] = 1
        df[x+1,y-1:y+1] = 1
    #bottom left
    elif x == df.shape[0]-1 and y == 0:
        df[x-1,y:y+2] = 1
        df[x,y:y+2] = 1
    #bottom right
    elif x == df.shape[0]-1 and y == df.shape[1]-1:
        df[x-1,y-1:y+1] = 1
        df[x,y-1:y+1] = 1
    ### along borders
    #along top border
    elif x == 0 and y < df.shape[1]-1:
        df[x,y-1:y+2] = 1
        df[x+1,y-1:y+2] = 1
    #along bottom border
    elif x == df.shape[0]-1 and y < df.shape[1]-1:
        df[x-1,y-1:y+2] = 1
        df[x,y-1:y+2] = 1
    #along left border
    elif x < df.shape[0]-1 and y == 0:
        df[x-1,y:y+2] = 1
        df[x,y:y+2] = 1
        df[x+1,y:y+2] = 1
    #along right border (compare y with the number of columns, shape[1])
    elif x < df.shape[0]-1 and y == df.shape[1]-1:
        df[x-1,y-1:y+1] = 1
        df[x,y-1:y+1] = 1
        df[x+1,y-1:y+1] = 1
    ### everywhere aside along borders
    else:
        df[x-1,y-1:y+2] = 1
        df[x,y-1:y+2] = 1
        df[x+1,y-1:y+2] = 1
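If SciPy is available, the same 3x3 buffer can probably be produced without the explicit corner and border cases by using binary dilation. The following is only a sketch and not the method of the answer above: it assumes the full dataset is a NumPy array laid out as (time, lat, lon), and the helper name buffer_labels is made up for illustration.

```python
import numpy as np
from scipy import ndimage

def buffer_labels(labels):
    """Grow every true label into a 3x3 block (hypothetical helper)."""
    labels = np.asarray(labels).astype(bool)
    if labels.ndim == 2:
        structure = np.ones((3, 3), dtype=bool)
    elif labels.ndim == 3:
        # Assumed layout (time, lat, lon): size 1 along the time axis so
        # that labels never leak into neighbouring time steps.
        structure = np.ones((1, 3, 3), dtype=bool)
    else:
        raise ValueError("expected a 2D or 3D array")
    # Cells outside the array are treated as 0, so borders need no special case.
    return ndimage.binary_dilation(labels, structure=structure).astype(int)

df = np.zeros((5, 5), dtype=int)
df[2, 2] = 1
print(buffer_labels(df))
```

Because the structuring element has size 1 along the first axis, a (5000, n_lat, n_lon) array could be passed in one call instead of looping over time steps.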
I have an array y composed of 0s and 1s that occur with different frequencies.
For example:
y = np.array([0, 0, 1, 1, 1, 1, 0])
And I have an array x of the same length.
x = np.array([0, 1, 2, 3, 4, 5, 6])
The idea is to filter out elements until there are the same number of 0 and 1.
A valid solution would be to remove index 5:
x = np.array([0, 1, 2, 3, 4, 6])
y = np.array([0, 0, 1, 1, 1, 0])
A naive method I can think of is to take the difference between the value counts of y (in this case 4-3=1), create a mask for y == 1, and switch random elements from True to False until the difference is 0. Then create a mask for y == 0, OR the two masks together, and apply the result to both x and y.
This doesn't really seem the best "python/numpy way" of doing it though.
Any suggestions? Something like randomly select n elements from the highest count, where n is the count of the lowest value.
If this is easier with pandas then that would work for me too.
Naive algorithm, assuming there are more 1s than 0s:
mask_pos = y == 1
mask_neg = y == 0
pos = len(y[mask_pos])
neg = len(y[mask_neg])
diff = pos-neg
while diff > 0:
    rand = np.random.randint(0, len(y))
    if mask_pos[rand]:
        mask_pos[rand] = False
        diff -= 1
mask_final = mask_pos | mask_neg
y_new = y[mask_final]
x_new = x[mask_final]
This naive algorithm is really slow.
One way to do that with NumPy is this:
import numpy as np
# Makes a mask to balance ones and zeros
def balance_binary_mask(binary_array):
    # Cast to bool so that ~ below is a logical NOT
    # (on an integer array, ~ is bitwise and would turn 0/1 into -1/-2)
    binary_array = np.asarray(binary_array, dtype=bool).ravel()
    # Count number of ones
    z = np.count_nonzero(binary_array)
    # If there are no more ones than zeros
    if z <= len(binary_array) // 2:
        # Invert the array
        binary_array = ~binary_array
    # Find ones (now the majority value)
    idx = np.nonzero(binary_array)[0]
    # Number of elements to remove
    rem = 2 * len(idx) - len(binary_array)
    # Pick random indices to remove
    rem_idx = np.random.choice(idx, size=rem, replace=False)
    # Make mask
    mask = np.ones_like(binary_array, dtype=bool)
    # Mask elements to remove
    mask[rem_idx] = False
    return mask
# Test
np.random.seed(0)
y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
m = balance_binary_mask(y)
print(m)
# [ True  True  True  True False  True  True]
y = y[m]
x = x[m]
print(y)
# [0 0 1 1 1 0]
print(x)
# [0 1 2 3 5 6]
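The idea from the question of "randomly select n elements from the highest count, where n is the count of the lowest value" can also be written directly in terms of indices. This is a sketch of that variant, not a replacement for the mask function above; the name balanced_indices and the optional rng argument are illustrative assumptions.

```python
import numpy as np

def balanced_indices(y, rng=None):
    """Return sorted indices that keep the same number of items per class."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()  # size of the rarest class
    # Draw n positions without replacement from every class.
    keep = [rng.choice(np.flatnonzero(y == c), size=n, replace=False)
            for c in classes]
    # Sorting preserves the original order of the kept elements.
    return np.sort(np.concatenate(keep))

y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
idx = balanced_indices(y, np.random.default_rng(0))
print(x[idx], y[idx])
```

Unlike the mask version, this form is not limited to two classes, at the cost of one np.flatnonzero call per class.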
I'm working on the A* algorithm and trying to build a trajectory for a drone with it. My code is below. I need to take the height of obstacles into account and modify my equation from
F = G + H to F = G + H + E
E represents the elevation of obstacles. The drone flies at a specific altitude over a map. If an obstacle is very high, the risk is high because the distance between the obstacle and the drone is small, so the drone should prefer to fly over lower obstacles. If an obstacle is higher than the drone's altitude, the drone should go around it.
I added an elevation map with randomly generated heights and a drone altitude (drone_height in the code), but it does not work for me. Could I get some assistance, please?
The A-star Python Code:
import numpy
grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
heuristic = [[9, 8, 7, 6, 5, 4],
             [8, 7, 6, 5, 4, 3],
             [7, 6, 5, 4, 3, 2],
             [6, 5, 4, 3, 2, 1],
             [5, 4, 3, 2, 1, 0]]
init = [0,0]
goal = [len(grid)-1,len(grid[0])-1]
delta = [[-1 , 0], #up
         [ 0 ,-1], #left
         [ 1 , 0], #down
         [ 0 , 1]] #right
delta_name = ['^','<','V','>'] #The name of above actions
cost = 1 #Each step costs you one
drone_height = 60
def search():
    #open list elements are of the type [f,g,h,x,y]
    closed = [[0 for row in range(len(grid[0]))] for col in range(len(grid))]
    action = [[-1 for row in range(len(grid[0]))] for col in range(len(grid))]
    #We initialize the starting location as checked
    closed[init[0]][init[1]] = 1
    expand=[[-1 for row in range(len(grid[0]))] for col in range(len(grid))]
    elvation = numpy.random.randint(0, 100+1, size=(5, 6))
    print(elvation)
    # we assign the coordinates and g value
    x = init[0]
    y = init[1]
    g = 0
    h = heuristic[x][y]
    e = elvation[x][y]
    f = g + h + e
    #our open list will contain our initial value
    open = [[f, g, h, x, y]]
    found = False #flag that is set when search complete
    resign = False #Flag set if we can't find expand
    count = 0
    #print('initial open list:')
    #for i in range(len(open)):
        #print(' ', open[i])
    #print('----')
    while found is False and resign is False:
        #Check if we still have elements in the open list
        if len(open) == 0: #If our open list is empty, there is nothing to expand.
            resign = True
            print('Fail')
            print('############# Search terminated without success')
            print()
        else:
            #if there are still elements on our list
            #remove node from list
            open.sort()
            open.reverse() #reverse the list
            next = open.pop()
            #print('list item')
            #print('next')
            x = next[3]
            y = next[4]
            g = next[1]
            expand[x][y] = count
            count+=1
            #Check if we are done
            if x == goal[0] and y == goal[1]:
                found = True
                print(next) #The three elements above this "if".
                print('############## Search is success')
                print()
            else:
                #expand winning element and add to new open list
                for i in range(len(delta)):
                    x2 = x + delta[i][0]
                    y2 = y + delta[i][1]
                    #if x2 and y2 falls into the grid
                    if x2 >= 0 and x2 < len(grid) and y2 >=0 and y2 <= len(grid[0])-1:
                        #if x2 and y2 not checked yet and there is not obstacles
                        if closed[x2][y2] == 0 and grid[x2][y2] == 0 and e < drone_height:
                            g2 = g + cost #we increment the cost
                            h2 = heuristic[x2][y2]
                            e2 = elvation[x2][y2]
                            f2 = g2 + h2 + e2
                            open.append([f2,g2,h2,x2,y2]) #we add them to our open list
                            #print('append list item')
                            #print([g2,x2,y2])
                            #Then we check them to never expand again
                            closed[x2][y2] = 1
                            action[x2][y2] = i
    for i in range(len(expand)):
        print(expand[i])
    print()
    policy=[[' ' for row in range(len(grid[0]))] for col in range(len(grid))]
    x=goal[0]
    y=goal[1]
    policy[x][y]='*'
    while x !=init[0] or y !=init[1]:
        x2=x-delta[action[x][y]][0]
        y2=y-delta[action[x][y]][1]
        policy[x2][y2]= delta_name[action[x][y]]
        x=x2
        y=y2
    for i in range(len(policy)):
        print(policy[i])
search()
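One possible way to express the behaviour described above, as a sketch rather than a fix of the posted code: cells whose elevation reaches the drone's altitude are treated as blocked, and the elevation of every entered cell is added to the step cost, so that lower terrain is preferred. Note that this folds E into G instead of adding it as a separate term of F. The function name, the weight parameter, the Manhattan heuristic and the heapq-based open list are all assumptions made for this illustration.

```python
import heapq

def a_star_with_elevation(grid, elevation, start, goal, drone_altitude, weight=0.1):
    """Return a list of (row, col) cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance; admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0.0}
    parent = {}
    heap = [(h(start), start)]
    while heap:
        _, cell = heapq.heappop(heap)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        for dx, dy in ((-1, 0), (0, -1), (1, 0), (0, 1)):
            nxt = (cell[0] + dx, cell[1] + dy)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            # Blocked by the grid, or too tall to fly over: go around.
            if grid[nxt[0]][nxt[1]] == 1 or elevation[nxt[0]][nxt[1]] >= drone_altitude:
                continue
            # Step cost grows with the elevation of the cell being entered.
            g2 = best_g[cell] + 1 + weight * elevation[nxt[0]][nxt[1]]
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                parent[nxt] = cell
                heapq.heappush(heap, (g2 + h(nxt), nxt))
    return None

grid = [[0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
flat = [[0] * 6 for _ in range(5)]
print(a_star_with_elevation(grid, flat, (0, 0), (4, 5), drone_altitude=60))
```

With weight set to 0 this reduces to plain A* on the grid; larger values trade path length for lower terrain.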
I am using music21 to handle MIDI and mXML files and to convert them to a piano roll that I use in my project.
My piano roll is a sequence of 88-dimensional vectors, where each element of a vector represents one pitch. One vector is one time step, which can be a 16th, an 8th, a 4th, and so on. Each element can take one of three values {0, 1, 2}. 0 means the note is off. 1 means the note is on. 2 also means the note is on, but it always follows a 1; that is how I distinguish multiple key presses of the same note. E.g., let the time step be an 8th and these two pitches be C and E:
[0 0 0 ... 1 0 0 0 1 ... 0]
[0 0 0 ... 1 0 0 0 1 ... 0]
[0 0 0 ... 2 0 0 0 2 ... 0]
[0 0 0 ... 2 0 0 0 2 ... 0]
[0 0 0 ... 1 0 0 0 0 ... 0]
[0 0 0 ... 1 0 0 0 0 ... 0]
We see that C and E are played simultaneously for a quarter note, then again for a quarter note, and we end with a C that lasts a quarter note.
Right now, I create a Stream() for every pitch and fill it as notes arrive. That gives me 88 streams, and when I convert that to MIDI and open the MIDI file in MuseScore, I am left with a mess that is not readable.
My question is: is there a nicer way to transform this kind of piano roll to MIDI? An algorithm or an idea I could use would be appreciated.
In my opinion music21 is a very good library but too high-level for this job. There is no such thing as streams, quarter notes or chords in MIDI, only messages. Try the Mido library instead. Here is sample code:
from mido import Message, MidiFile, MidiTrack
def stop_note(note, time):
    return Message('note_off', note = note,
                   velocity = 0, time = time)
def start_note(note, time):
    return Message('note_on', note = note,
                   velocity = 127, time = time)
def roll_to_track(roll):
    delta = 0
    # State of the notes in the roll.
    notes = [False] * len(roll[0])
    # MIDI note for first column.
    midi_base = 60
    for row in roll:
        for i, col in enumerate(row):
            note = midi_base + i
            if col == 1:
                if notes[i]:
                    # First stop the ringing note
                    yield stop_note(note, delta)
                    delta = 0
                yield start_note(note, delta)
                delta = 0
                notes[i] = True
            elif col == 0:
                if notes[i]:
                    # Stop the ringing note
                    yield stop_note(note, delta)
                    delta = 0
                    notes[i] = False
        # Time per row; delta times in a MIDI track are in ticks, not ms
        delta += 500
    # Stop any notes still ringing when the roll ends
    for i, ringing in enumerate(notes):
        if ringing:
            yield stop_note(midi_base + i, delta)
            delta = 0
roll = [[0, 0, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 2, 0, 0, 0, 2, 0],
        [0, 1, 0, 2, 0, 0, 0, 2, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0]]
midi = MidiFile(type = 1)
midi.tracks.append(MidiTrack(roll_to_track(roll)))
midi.save('test.mid')
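Before emitting MIDI messages it may help to turn the roll into explicit note events first. The sketch below does that in plain Python under one assumption taken from the question's example: a new key press is signalled whenever a cell changes to a different non-zero value (1 to 2 or 2 to 1), so 1, 1, 2, 2 is read as two notes of two steps each. This differs from the code above, which treats every 1 as a fresh onset. The name roll_to_notes is made up for illustration.

```python
def roll_to_notes(roll):
    """Return (column, start_step, length_in_steps) tuples for a piano roll."""
    notes = []
    for col in range(len(roll[0])):
        prev = 0   # value seen in the previous step for this column
        start = 0  # step at which the current note began
        for step, row in enumerate(roll):
            value = row[col]
            if value != prev:
                if prev != 0:
                    # The previous note ends where the value changes.
                    notes.append((col, start, step - start))
                start = step
                prev = value
        if prev != 0:
            # Note still sounding when the roll ends.
            notes.append((col, start, len(roll) - start))
    # Order by onset time, then by column.
    return sorted(notes, key=lambda n: (n[1], n[0]))

example = [[0, 0, 0, 1, 0, 0, 0, 1, 0],
           [0, 0, 0, 1, 0, 0, 0, 1, 0],
           [0, 0, 0, 2, 0, 0, 0, 2, 0],
           [0, 0, 0, 2, 0, 0, 0, 2, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0]]
print(roll_to_notes(example))
# [(3, 0, 2), (7, 0, 2), (3, 2, 2), (7, 2, 2), (3, 4, 2)]
```

Each tuple could then be mapped to a note_on and a note_off message, or to a single note object in a higher-level library, with the step length converted to ticks or quarter lengths.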
I have this code:
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.find(" ")
gp_nodeCount = int(gp[0:gp_splitIndex])
gp_edgeCount = int(gp[gp_splitIndex+1:-1])
matrix = [] # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount-1):
    gp = gs.readline()
    gp_splitIndex = gp.find(" ") # get the index of space, dividing the 2 numbers on a row
    gp_from = int(gp[0:gp_splitIndex])
    gp_to = int(gp[gp_splitIndex+1:-1])
    matrix[gp_from][gp_to] = 1
print(matrix)
The file graph.txt contains this:
5 10
0 1
1 2
2 3
3 4
4 0
0 3
3 1
1 4
4 2
2 0
The first two numbers tell me that the graph has 5 nodes and 10 edges. The following number pairs are the edges between nodes. For example, "1 4" means an edge from node 1 to node 4.
Problem is, the output should be this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
But instead of that, I get this:
[[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Only one number is different and I can't understand why this is happening. The edge "3 1" is not present. Can someone explain where the problem is?
Change for i in range(0, gp_edgeCount-1): to
for i in range(0, gp_edgeCount):
range() already excludes its end value, so there is no need to subtract 1 yourself: range(0, 3) yields 0, 1, 2.
Also, it is not the "3 1" edge that is missing but the "2 0" edge, which is the last one in the file. Remember that the matrix indices start at 0.
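A quick check of the off-by-one, using the edge count of 10 from the file header (the variable name here is only for illustration):

```python
edge_count = 10

# range() stops before its end value.
print(list(range(0, 3)))                # [0, 1, 2]

# Subtracting 1 therefore reads one line too few.
print(len(range(0, edge_count - 1)))    # 9
print(len(range(0, edge_count)))        # 10
```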
Matthias has it; you don't need edgeCount - 1 since the range function doesn't include the end value in the iteration.
There are several other things you can do to clean up your code:
The with statement is preferred for opening files, since it closes them automatically for you.
You don't need to call find and slice manually; split already does what you want.
You can convert and assign directly to a pair of numbers using a generator expression and iterable unpacking.
You can call range with just an end value; the 0 start is implicit.
The multiplication operator is handy for initializing lists.
With all of those changes:
with open('graph.txt', 'r') as graph:
    node_count, edge_count = (int(n) for n in graph.readline().split())
    matrix = [[0]*node_count for _ in range(node_count)]
    for i in range(edge_count):
        src, dst = (int(n) for n in graph.readline().split())
        matrix[src][dst] = 1
print(matrix)
# [[0, 1, 0, 1, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 1, 0, 0]]
Keeping your code and style (it could of course be much more readable):
gs = open("graph.txt", "r")
gp = gs.readline()
gp_splitIndex = gp.split(" ")
gp_nodeCount = int(gp_splitIndex[0])
gp_edgeCount = int(gp_splitIndex[1])
matrix = [] # predeclare the array
for i in range(0, gp_nodeCount):
    matrix.append([])
    for y in range(0, gp_nodeCount):
        matrix[i].append(0)
for i in range(0, gp_edgeCount):
    gp = gs.readline()
    gp_Index = gp.split(" ") # split the row into its 2 numbers
    gp_from = int(gp_Index[0])
    gp_to = int(gp_Index[1])
    matrix[gp_from][gp_to] = 1
print(matrix)
As noted, it is the last line of your file, the "2 0" edge, that was never read; hence the missing 1. Have a nice day!
The other answers are correct. Here is another version, similar to tzaman's:
with open('graph.txt', mode='r') as txt_file:
    lines = [l.strip() for l in txt_file.readlines()]
number_pairs = [[int(n) for n in line.split(' ')] for line in lines]
header = number_pairs[0]
edge_pairs = number_pairs[1:]
num_nodes, num_edges = header
edges = [[0] * num_nodes for _ in range(num_nodes)]
for edge_start, edge_end in edge_pairs:
    edges[edge_start][edge_end] = 1
print(edges)
I have 2d binary numpy arrays of varying size, which contain certain patterns.
Just like this:
import numpy
a = numpy.zeros((6,6), dtype=int)
a[1,2] = a[1,3] = 1
a[4,4] = a[5,4] = a[4,3] = 1
Here the "image" contains two patches, one with 2 and one with 3 connected cells.
print a
array([[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 1, 0]])
I want to know how often a non-zero cell borders another non-zero cell (neighbours defined as rook's case, i.e. the cells to the left, right, below and above each cell), counting each pair in both directions (pseudo-replication).
A previous approach for inner boundaries returns wrong values (5) as it was intended to calculate outer boundaries.
numpy.abs(numpy.diff(a, axis=1)).sum()
So for the above test array, the correct total result would be 6 (the upper patch has two internal borders, the lower four).
Grateful for any tips!
EDIT:
Mistake: The lower obviously has 4 internal edges (neighbouring cells with the same value)
Explained the desired neighbourhood a bit more
I think the result is 8 if it's an 8-connected neighborhood. Here is the code:
import numpy as np
a = np.zeros((6,6), dtype=int)
a[1,2] = a[1,3] = 1
a[4,4] = a[5,4] = a[4,3] = 1
from scipy.ndimage import convolve
kernel = np.ones((3, 3))
kernel[1, 1] = 0
b = convolve(a, kernel, mode="constant")
b[a != 0].sum()  # 8
But you said rook's case.
Edit:
Here is the code for a 4-connected neighborhood:
import numpy as np
a = np.zeros((6,6), dtype=int)
a[1,2] = a[1,3] = 1
a[4,4] = a[5,4] = a[4,3] = 1
from scipy import ndimage
kernel = ndimage.generate_binary_structure(2, 1)
kernel[1, 1] = 0
b = ndimage.convolve(a, kernel, mode="constant")
b[a != 0].sum()  # 6
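For the rook's case the same count can also be obtained without SciPy by comparing the array with shifted copies of itself. This is a sketch of an alternative, not the method above; it assumes the array is binary, and count_rook_borders is a made-up name.

```python
import numpy as np

def count_rook_borders(binary):
    """Count non-zero cells bordering non-zero cells, in both directions."""
    binary = np.asarray(binary).astype(bool)
    # Pairs of horizontally adjacent non-zero cells.
    horizontal = np.count_nonzero(binary[:, :-1] & binary[:, 1:])
    # Pairs of vertically adjacent non-zero cells.
    vertical = np.count_nonzero(binary[:-1, :] & binary[1:, :])
    # Each pair is counted once per member cell.
    return 2 * (horizontal + vertical)

a = np.zeros((6, 6), dtype=int)
a[1, 2] = a[1, 3] = 1
a[4, 4] = a[5, 4] = a[4, 3] = 1
print(count_rook_borders(a))  # 6
```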