I'm having trouble making a function that places a number inside a binary grid. For instance,
if I'm given 4 3 2 1 and I have a 5x5 grid, it would look like the following...
4 4 4 4 1
4 4 4 4 0
4 4 4 4 0
4 4 4 4 0
0 0 0 0 0
My current code reads a text file and creates a list arranged in descending order. For instance, if the text file contained 1 2 3, it would create the list of integers 3 2 1. My code also prompts for a bin #, which creates a bin x bin square. I don't know how to actually place a number (such as 4) into the bin. This is the function that should place the values, which I'm stuck on.
def isSpaceFree(bin, row, column, block):
    if row + block > len(bin):
        return False
    if column + block > len(bin):
        return False
    if bin[row][column] == 0:
        return True
    else:
        return False
    for r in range(row, row+block):
        if bin[row][column] != 0:
It sounds like isSpaceFree should return True if you can create a square with origin (row, column) and size block, without going out of bounds or overlapping any non-zero elements. In that case, you're 75% of the way there: you have the bounds checking ready, and half of the overlap-check loop.
def isSpaceFree(bin, row, column, block):
    # return False if the block would go out of bounds
    if row + block > len(bin):
        return False
    if column + block > len(bin):
        return False

    # possible todo:
    # return False if row or column is negative

    # return False if the square would overlap an existing element
    for r in range(row, row+block):
        for c in range(column, column+block):
            if bin[r][c] != 0:  # oops, overlap will occur
                return False

    # square is in bounds, and doesn't overlap anything. Good to go!
    return True
Then, actually placing the block is the same double-nested loop, but instead performing an assignment.
def place(bin, row, column, block):
    if isSpaceFree(bin, row, column, block):
        for r in range(row, row+block):
            for c in range(column, column+block):
                bin[r][c] = block

x = [
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
    [0,0,0,0,0],
]

place(x, 0, 0, 4)
print("\n".join(str(row) for row in x))
Result:
[4, 4, 4, 4, 0]
[4, 4, 4, 4, 0]
[4, 4, 4, 4, 0]
[4, 4, 4, 4, 0]
[0, 0, 0, 0, 0]
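Putting the two helpers together, a greedy driver that walks the descending list of block sizes and scans row-major for the first free spot reproduces the grid from the question. This is a sketch: the `pack` function and its scan order are my assumptions (blocks that fit nowhere are simply skipped, which is what the expected grid implies for 3 and 2), and `place` is extended to report success.

```python
def isSpaceFree(bin, row, column, block):
    # block must stay inside the grid...
    if row + block > len(bin) or column + block > len(bin):
        return False
    # ...and must not overlap anything already placed
    for r in range(row, row + block):
        for c in range(column, column + block):
            if bin[r][c] != 0:
                return False
    return True

def place(bin, row, column, block):
    # returns True when the block was actually placed
    if isSpaceFree(bin, row, column, block):
        for r in range(row, row + block):
            for c in range(column, column + block):
                bin[r][c] = block
        return True
    return False

def pack(bin, blocks):
    # greedy: put each block at the first free spot, scanning row-major;
    # any(...) short-circuits, so scanning stops at the first success
    for block in blocks:
        for row in range(len(bin)):
            if any(place(bin, row, col, block) for col in range(len(bin))):
                break

grid = [[0] * 5 for _ in range(5)]
pack(grid, [4, 3, 2, 1])
for row in grid:
    print(row)
```

With the input 4 3 2 1 this prints the exact grid from the question: the 4x4 block in the top-left, the 1 at (0, 4), and 3 and 2 skipped because no free square of that size remains.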
I have the following dataframe:
d_test = {
    'random_staff': ['gfda', 'fsd', 'gec', 'erw', 'gd', 'kjhk', 'fd', 'kui'],
    'cluster_number': [1, 2, 3, 3, 2, 1, 4, 2]
}
df_test = pd.DataFrame(d_test)
The cluster_number column contains values from 1 to n. Values may repeat, but none are missing. In the example above those values are 1, 2, 3, 4.
I want to select some value from the cluster_number column and change every occurrence of it after the first to a new unique value, again leaving no gaps. For example, if we select the value 2, the desired outcome for cluster_number is [1, 2, 3, 3, 5, 1, 4, 6]. Note that we had three 2s in the column: we kept the first as 2, changed the next occurrence to 5, and changed the last to 6.
I wrote code for the logic above and it works fine:
cluster_number_to_change = 2
max_cluster = max(df_test['cluster_number'])
first_iter = True
i = cluster_number_to_change

for index, row in df_test.iterrows():
    if row['cluster_number'] == cluster_number_to_change:
        df_test.loc[index, 'cluster_number'] = i
        if first_iter:
            i = max_cluster + 1
            first_iter = False
        else:
            i += 1
But it is written as a for-loop, and I am trying to understand whether it can be transformed into the pandas .apply method (or any other efficient vectorized solution).
Using boolean indexing:
# get cluster #2
m1 = df_test['cluster_number'].eq(2)

# identify duplicates
m2 = df_test['cluster_number'].duplicated()

# increment duplicates using the max as reference
df_test.loc[m1 & m2, 'cluster_number'] = (
    m2.where(m1).cumsum()
      .add(df_test['cluster_number'].max())
      .convert_dtypes()
)
print(df_test)
Output:
random_staff cluster_number
0 gfda 1
1 fsd 2
2 gec 3
3 erw 3
4 gd 5
5 kjhk 1
6 fd 4
7 kui 6
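An equivalent sketch of the same idea, using a running count on the selected value instead of duplicated (the names m and dup are mine, not from the question):

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({
    'random_staff': ['gfda', 'fsd', 'gec', 'erw', 'gd', 'kjhk', 'fd', 'kui'],
    'cluster_number': [1, 2, 3, 3, 2, 1, 4, 2],
})

m = df_test['cluster_number'].eq(2)    # rows holding the selected value
dup = m & m.cumsum().gt(1)             # every occurrence after the first
# hand each later occurrence the next value above the current max
df_test.loc[dup, 'cluster_number'] = (
    df_test['cluster_number'].max() + np.arange(1, dup.sum() + 1)
)
print(df_test['cluster_number'].tolist())  # [1, 2, 3, 3, 5, 1, 4, 6]
```

The max is read before the assignment happens, so the new labels 5 and 6 continue from the existing maximum of 4.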
I am trying to reverse arrays in groups. I have wasted more than half an hour looking for the problem, but I am not able to figure out how the index is out of range.
Here is my code:
def rev(A,S,N):
    start=S
    end=N
    while start<end:
        A[start],A[end]=A[end],A[start] #error here
        start+=1
        end-=1
    return A

def reverseInGroups(A,N,K):
    #Your code here
    rev(A,0,K)
    rev(A,K,N) #error here
    return A
Here is the error I am getting
Sample Input 1: N=5, K=3, A=[1,2,3,4,5]
Sample Output 1: 3 2 1 5 4
Sample Input 2: N=8, K=3, A=[1,2,3,4,5,6,7,8]
Sample Output 2: 3 2 1 6 5 4 8 7
For more information please visit this link
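The IndexError most likely comes from rev treating end as an inclusive index: rev(A, K, N) eventually executes A[N], but the last valid index is N-1 (and rev(A, 0, K) likewise reverses one element too many). A minimal sketch of the fix, keeping the swap-based rev from the question:

```python
def rev(A, S, N):
    # same swap loop as in the question; S and N are *inclusive* indices
    start, end = S, N
    while start < end:
        A[start], A[end] = A[end], A[start]
        start += 1
        end -= 1
    return A

def reverseInGroups(A, N, K):
    rev(A, 0, K - 1)   # the first group ends at index K-1, not K
    rev(A, K, N - 1)   # the last valid index is N-1, not N
    return A

out = reverseInGroups([1, 2, 3, 4, 5], 5, 3)
print(out)  # [3, 2, 1, 5, 4]
```

This matches Sample Output 1; the answers below avoid the off-by-one entirely by using slices.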
How about
def rev(a, start, end, middle):
    assert 0 <= start <= middle <= end < len(a)
    a[start:middle] = reversed(a[start:middle])
    a[middle:end] = reversed(a[middle:end])
    return a
There is no need to iterate over positions at all. This also avoids your error, because slicing handles oversized slices gracefully: [1,2,3,4][2:99] works without an error.
def rev(data, start, end):
    """Reverses the range start:end (end exclusive) of the given list.
    No safeguards whatsoever, so only use with correct data. Out of bounds
    is irrelevant because slices are used to reverse."""
    data[start:end] = data[start:end][::-1]  # you need end+1 if you want inclusive
    return data

def reverseInGroups(A, N, K):
    rev(A, 0, K)
    rev(A, K, N)
    return A

l = list(range(11))
print(reverseInGroups(l, 8, 3))  # why N (the bigger number) first?
to get
[2, 1, 0, 7, 6, 5, 4, 3, 8, 9, 10]
#0 1 2 3 4 5 6 7 8 9 10 # 0-3(exclusive) and 3-8(exclusive) reversed
To reverse all K-sized groups, do:

def reverseInGroups(A, K):
    pos_at = 0
    while pos_at < len(A):
        rev(A, pos_at, pos_at + K)
        pos_at += K
    return A
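For reference, here is a self-contained run of that grouped version against the question's Sample 2, redefining the slice-based rev so the snippet stands alone (and using a range loop instead of the while, which is equivalent):

```python
def rev(data, start, end):
    # slice-based reverse, end exclusive; oversized slices are harmless
    data[start:end] = data[start:end][::-1]
    return data

def reverseInGroups(A, K):
    # reverse every K-sized group; the final short group is handled
    # for free because slicing clips to the end of the list
    for pos_at in range(0, len(A), K):
        rev(A, pos_at, pos_at + K)
    return A

out = reverseInGroups([1, 2, 3, 4, 5, 6, 7, 8], 3)
print(out)  # [3, 2, 1, 6, 5, 4, 8, 7]
```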
I have:
hi
0 1
1 2
2 4
3 8
4 3
5 3
6 2
7 8
8 3
9 5
10 4
I have a list of lists and single integers like this:
[[2,8,3], 2, [2,8]]
For each item in the main list, I want to find the index where it appears in the column.
So for the single integers (e.g. 2) I want to know the first time it appears in the hi column (index 1; I am not interested in later appearances such as index 6).
For the lists within the list, I want the last index of where the list appears, in order, in that column.
So [2,8,3] appears in order at indexes 6, 7 and 8, so I want 8 to be returned. Note that it also appears earlier, but interrupted by a 4, so I am not interested in that occurrence.
I have so far used:
for c in chunks:
    # different method if single note chunk vs. multi
    if type(c) is int:
        # give first occurrence of correct single notes
        single_notes = df1[df1['user_entry_note'] == c]
        single_notes_list.append(single_notes)
    # for multi chunks
    else:
        multi_chunk = df1['user_entry_note'].isin(c)
        multi_chunk_list.append(multi_chunk)
You can do it with np.logical_and.reduce + shift. But there are a lot of edge cases to deal with:
import numpy as np

def find_idx(seq, df, col):
    if type(seq) != list:  # not a list: match a single value
        s = df[col].eq(seq)
        if s.sum() >= 1:  # if something matched
            idx = s.idxmax().item()
        else:
            idx = np.nan
    elif seq:  # a list that isn't empty
        seq = seq[::-1]  # reverse, so the match lands on the last index
        m = np.logical_and.reduce([df[col].shift(i).eq(seq[i]) for i in range(len(seq))])
        s = df.loc[m]
        if not s.empty:  # if something matched
            idx = s.index[0]
        else:
            idx = np.nan
    else:  # empty list
        idx = np.nan
    return idx
l = [[2,8,3], 2, [2,8]]
[find_idx(seq, df, col='hi') for seq in l]
#[8, 1, 7]
l = [[2,8,3], 2, [2,8], [], ['foo'], 'foo', [1,2,4,8,3,3]]
[find_idx(seq, df, col='hi') for seq in l]
#[8, 1, 7, nan, nan, nan, 5]
I have a vector dogSpecies showing all four unique dog species under investigation.
#a set of possible dog species
dogSpecies = [1,2,3,4]
I also have a data vector containing integer numbers corresponding to the records of dog species of all dogs tested.
# species of examined dogs
data = np.array([1, 1, 2, -1, 0, 2, 3, 5, 4])
Some of the records in data contain values different than 1,2,3 or 4. (Such as -1, 0 or 5). If an element in the data set is not equal to any element of the dogSpecies, such occurrence should be marked in an error evaluation boolean matrix as False.
#initially all the elements of the boolean error evaluation vector are True.
errorEval = np.ones((np.size(data,axis = 0)),dtype=bool)
Ideally my errorEval vector would look like this:
errorEval = np.array([True, True, True, False, False, True, True, False, True])
I want a piece of code that checks if the elements of data are not equal to the elements of dogSpecies vector. My code for some reason marks every single element of the errorEval vector as 'False'.
for i in range(np.size(data, axis=0)):
    # validation of the species
    if (data[i] != dogSpecies):
        errorEval[i] = False
I understand that I cannot compare a single element with a vector of four elements like above, but how do I do this then?
Isn't this just what you want?
for index, elem in enumerate(data):
    if elem not in dogSpecies:
        errorEval[index] = False
It's probably not very fast, since it doesn't use any vectorized numpy ufuncs, but if the array isn't very large that won't matter. Converting dogSpecies to a set will also speed things up.
As an aside, your Python looks very C/Java-esque. I'd suggest reading the Python style guide.
If I understand correctly, you have a dataframe and a list of dog species. This should achieve what you want.
df = pd.DataFrame({'dog': [1,3,4,5,1,1,8,9,0]})
dog
0 1
1 3
2 4
3 5
4 1
5 1
6 8
7 9
8 0
df['errorEval'] = df['dog'].isin(dogSpecies).astype(int)
dog errorEval
0 1 1
1 3 1
2 4 1
3 5 0
4 1 1
5 1 1
6 8 0
7 9 0
8 0 0
df.errorEval.values
# array([1, 1, 1, 0, 1, 1, 0, 0, 0])
If you don't want to create a new column then you can do:
df.assign(errorEval=df['dog'].isin(dogSpecies).astype(int)).errorEval.values
# array([1, 1, 1, 0, 1, 1, 0, 0, 0])
As @FHTMitchel stated, you have to use in to check whether an element is in a list.
But you can use a list comprehension, which is faster than a normal loop and shorter:

errorEval = np.array([elem in dogSpecies for elem in data])
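Since the question starts from numpy arrays, it may be worth noting that numpy also has a single vectorized membership test, np.isin, which produces the boolean vector directly (a sketch reusing the question's data):

```python
import numpy as np

dogSpecies = [1, 2, 3, 4]
data = np.array([1, 1, 2, -1, 0, 2, 3, 5, 4])

# True wherever the record is one of the known species
errorEval = np.isin(data, dogSpecies)
print(errorEval)
# [ True  True  True False False  True  True False  True]
```

This matches the desired errorEval vector from the question and avoids the Python-level loop entirely.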
I have a DataFrame with one column with positive and negative integers. For each row, I'd like to see how many consecutive rows (starting with and including the current row) have negative values.
So if a sequence was 2, -1, -3, 1, -1, the result would be 0, 2, 1, 0, 1.
I can do this by iterating over all the indices, using .iloc to split the column, and next() to find out where the next positive value is. But I feel like this isn't taking advantage of pandas' capabilities, and I imagine there's a better way of doing it. I've experimented with .shift() and expanding windows, but without success.
Is there a more "pandastic" way of finding out how many consecutive rows after the current one meet some logical condition?
Here's what's working now:
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1]})
df["b"] = 0
for i in df.index:
    sub = df.iloc[i:].a.tolist()
    df.b.iloc[i] = next((sub.index(n) for n in sub if n >= 0), 1)
Edit: I realize that even my own example doesn't work when there's more than one negative value at the end. So that makes a better solution even more necessary.
Edit 2: I stated the problem in terms of integers, but originally only put 1 and -1 in my example. I need to solve for positive and negative integers in general.
FWIW, here's a fairly pandastic answer that requires no functions or applies. It borrows from here (among other answers, I'm sure), and thanks to @DSM for mentioning the ascending=False option:
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})
df['pos'] = df.a > 0
df['grp'] = (df['pos'] != df['pos'].shift()).cumsum()
dfg = df.groupby('grp')
df['c'] = np.where(df['a'] < 0, dfg.cumcount(ascending=False) + 1, 0)
   a    pos  grp  c
0  2   True    1  0
1 -1  False    2  3
2 -3  False    2  2
3 -1  False    2  1
4  1   True    3  0
5  1   True    3  0
6 -1  False    4  1
7  1   True    5  0
8 -1  False    6  2
9 -2  False    6  1
I think a nice thing about this method is that once you set up the 'grp' variable you can do lots of things very easily with standard groupby methods.
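If you don't need the intermediate pos and grp columns, the same grouping idea can be condensed into a sketch like the following (the names neg and grp are mine; logic as above, just without mutating df with helper columns):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2, -1, -3, -1, 1, 1, -1, 1, -1, -2]})

neg = df['a'] < 0
# a new block ID starts wherever the sign flips
grp = (neg != neg.shift()).cumsum()
# count down within each block, but only keep counts on negative rows
df['c'] = np.where(neg, df.groupby(grp).cumcount(ascending=False) + 1, 0)
print(df['c'].tolist())  # [0, 3, 2, 1, 0, 0, 1, 0, 2, 1]
```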
This was an interesting puzzle. I found a way to do it using pandas tools, but I think you'll agree it's a lot more opaque :-). Here's the example:
data = pandas.Series([1, -1, -1, -1, 1, -1, -1, 1, 1, -1, 1])
x = data[::-1]  # reverse the data
print(x.groupby(((x < 0) != (x < 0).shift()).cumsum()).apply(lambda x: pandas.Series(
    np.arange(len(x)) + 1 if (x < 0).all() else np.zeros(len(x)),
    index=x.index))[::-1])
The output is correct:
0 0
1 3
2 2
3 1
4 0
5 2
6 1
7 0
8 0
9 1
10 0
dtype: float64
The basic idea is similar to what I described in my answer to this question, and you can find the same approach used in various answers that ask how to make use of inter-row information in pandas. Your question is slightly trickier because your criterion goes in reverse (asking for the number of following negatives rather than the number of preceding negatives), and because you only want one side of the grouping (i.e., you only want the number of consecutive negatives, not the number of consecutive numbers with the same sign).
Here is a more verbose version of the same code with some explanation that may make it easier to grasp:
def getNegativeCounts(x):
    # This function takes as input a sequence of numbers, all the same sign.
    # If they're negative, it returns an increasing count of how many there are.
    # If they're positive, it just returns the same number of zeros.
    # [-1, -2, -3] -> [1, 2, 3]
    # [1, 2, 3] -> [0, 0, 0]
    if (x < 0).all():
        return pandas.Series(np.arange(len(x)) + 1, index=x.index)
    else:
        return pandas.Series(np.zeros(len(x)), index=x.index)
# we have to reverse the data because cumsum only works in the forward direction
x = data[::-1]

# mark each position where the sign differs from the previous one
signChanged = (x < 0) != (x < 0).shift()

# cumsum this to get an "ID" for each block of consecutive same-sign numbers
sameSignBlocks = signChanged.cumsum()

# group on these block IDs
g = x.groupby(sameSignBlocks)

# for each block, apply getNegativeCounts
# this will either give us the running total of negatives in the block,
# or a stretch of zeros if the block was positive
# the [::-1] at the end reverses the result
# (to compensate for our reversing the data initially)
g.apply(getNegativeCounts)[::-1]
As you can see, run-length-style operations are not usually simple in pandas. There is, however, an open issue for adding more grouping/partitioning abilities that would ameliorate some of this. In any case, your particular use case has some specific quirks that make it a bit different from a typical run-length task.