I have a nested list containing three inner lists, which are filled by a for-loop; which list receives a value is controlled by if-conditions.
After the first iteration it could look like the following example:
a = [[1,2,0,0,0,0],[0,0,4,5,0,0],[0,0,0,0,6,7]]
which, by condition, are not overlapping. After the second iteration, new values are appended to the corresponding nested lists. To keep the lists the same length, I append zeros in each run.
As soon as I set a condition so that two lists overlap, I get a gap the size of the desired overlap after the third iteration, even though the values should be appended directly to the corresponding list. Additionally, if I set several overlaps (e.g. one per iteration), they add up: for three overlaps of size two each, I get a gap of six.
Below you can see what I mean:
w/o overlap   w overlap (returns)   w overlap (should return)
1 0 0         1 0 0                 1 0 0
1 0 0         1 0 0                 1 0 0
0 1 0         0 1 0                 0 1 0
0 1 0         0 1 1                 0 1 1
0 1 0         0 1 1                 0 1 1
0 0 1         0 0 1                 0 0 1
0 0 1         0 0 0                 1 0 0
0 0 1         0 0 0                 1 0 0
1 0 0         1 0 0
1 0 0         1 0 0
I have created a PyFiddle here with the code I am using (the shortest example I could create). It should give you an idea of what I am trying to achieve and what I have done so far.
Further, I have used this post to wrap my head around it, but it does not seem to apply to my problem.
EDIT: I think I have narrowed down the problem. Due to the overlap, the relevant list is being "pulled up" by the size of the overlap without the sizes of the remaining lists being adjusted by that offset, so the difference is filled with zeros.
EDIT2:
My idea is to add the overlap/offset before the list is filled, depending on the size of its predecessor. Since the start index depends on the size of the predecessor, it can be calculated from the difference between the predecessor's size and the gap.
Basically, in the parent for-loop for i in range(len(data_j)) I would add:
overlap = len(data_j[i-1]['axis']) - offset
Unfortunately, another problem occurred during the process, which I have solved using the steps from this post: Connect string value to a corresponding variable name.
I have created another fiddle with the solution so you can compare it with the original fiddle to see what I did.
New Fiddle
Basically, I add the offset by summing the size of the current predecessor list and the offset value (which can also be negative, to create an overlap). This sum is assigned to n_offset. Then another problem occurred with .append:
as soon as all lists are filled and you need to append more values to one of them, the gap occurs again. It is caused by the for-loop that appends the zeros: its range is n_offset, and since that is based on the size of the predecessor list, it adds as many zeros as the first filling of the same list. That is why you have to subtract the current length of the list from n_offset.
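The padding logic from these two edits can be sketched as follows (the function and variable names are mine, not from the fiddle; offset is negative for an overlap):

```python
def append_with_offset(series, new_values, predecessor, offset):
    """Pad `series` with zeros so it starts right after `predecessor`
    shifted by `offset` (negative = overlap), then append the values."""
    n_offset = len(predecessor) + offset
    # subtract the current length so zeros are not re-added on later fills
    series.extend([0] * max(0, n_offset - len(series)))
    series.extend(new_values)
    return series

first = [1, 2]
second = append_with_offset([], [4, 5], first, 0)    # [0, 0, 4, 5]
third = append_with_offset([], [6, 7], second, -1)   # [0, 0, 0, 6, 7]
```

With offset = -1 the third list starts one position early, overlapping the predecessor by one instead of leaving a gap.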
I have a large dataframe with a price column that stays at the same value as time increases, then changes, then stays at the new value for a while before going up or down. I want to write a function that looks at the price column and creates a new column called next movement, indicating whether the next movement of the price will be up or down.
For example, if the price column looked like [1,1,1,2,2,2,4,4,4,3,3,3,4,4,4,2,1], then the next movement column should be [1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,-1], with 1 representing the next movement being up, 0 representing the next movement being down, and -1 representing unknown.
def make_next_movement_column(DataFrame, column):
    DataFrame["next movement"] = -1
    for i in range(DataFrame.shape[0]):
        for j in range(i + 1, DataFrame.shape[0]):
            if DataFrame[column][j] > DataFrame[column][i]:
                DataFrame["next movement"][i:j] = 1
                break
            if DataFrame[column][j] < DataFrame[column][i]:
                DataFrame["next movement"][i:j] = 0
                break
        i = j - 1  # note: rebinding the loop variable has no effect in Python
    return DataFrame
I wrote this function and it does work, but the problem is that it is horribly inefficient. I was wondering if there is a more efficient way to write it.
This answer doesn't seem to work, because the diff method only compares against the next row, but I want to find the next movement no matter how far away it is.
Annotated code
# Calculate the diff between rows
s = df['column'].diff(-1)
# Broadcast the last diff value per group
s = s.mask(s == 0).bfill()
# Select from [1, 0] depending upon the value of diff
df['next_movement'] = np.select([s <= -1, s >= 1], [1, 0], -1)
Result
column next_movement
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 4 0
7 4 0
8 4 0
9 3 1
10 3 1
11 3 1
12 4 0
13 4 0
14 4 0
15 2 0
16 1 -1
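For reference, a self-contained version of the above on the question's example list (the same code, just with the imports and the input wired in):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': [1, 1, 1, 2, 2, 2, 4, 4, 4, 3, 3, 3, 4, 4, 4, 2, 1]})

s = df['column'].diff(-1)        # x[i] - x[i+1]
s = s.mask(s == 0).bfill()       # carry the next nonzero diff backwards
df['next_movement'] = np.select([s <= -1, s >= 1], [1, 0], -1)

print(df['next_movement'].tolist())
# [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, -1]
```

The last row has no following value, so its diff stays NaN, both conditions are False, and np.select falls through to the default -1.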
I have a 3-dimensional numpy array with binary (0 and 1) values representing a voxelized space. A value of 1 means the voxel is occupied, 0 means it is empty. For simplicity I will describe the problem with 2D data. An example of such a pocket could look like this:
1 1 1 1 1 1 1
1 1 0 0 0 1 1
1 0 0 0 0 0 1
1 0 0 1 0 0 1
1 0 0 1 1 1 1
1 1 1 1 1 1 1
I also have a dataset of fragments which are smaller than the pocket. Think of them as tetris pieces if you'd like, just in 3D. Similar to the game, the fragments can be rotated. Some examples:
0 1 1 1 1 1 0 1 1 0
1 1 0 0 0 1 0 1 1 0
I am looking to fill in the pocket with the fragments so the remaining empty space (0s) is as small as possible.
So far, my idea was to decompose the pocket into smaller rectangular pockets, compute the dimensions of these rectangular areas and of the fragments, and then match them based on those dimensions. Alternatively, I could rotate the fragments so their 1s face the "wall" and focus first on boxes close to the border; then look at the rectangular areas again and work towards filling the core/inside of the pocket. To optimise the outcome, I could wrap these steps in a Monte Carlo tree search algorithm.
Obviously I don't expect a complete answer, but if you have any better ideas on how to approach this, I would be happy to hear it. Any references to similar space search algorithms/papers would also be appreciated.
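Whichever search strategy wins out (greedy matching or MCTS), a common building block is a collision test: can a fragment, in a given rotation, be placed at a given offset? A minimal 2D sketch with numpy; the function names are my own, and 3D works the same with one extra axis (rotations via np.rot90 on the chosen plane):

```python
import numpy as np

def can_place(pocket, fragment, r, c):
    """True if the fragment's 1-cells all land on empty (0) cells of the
    pocket when its top-left corner is placed at (r, c)."""
    h, w = fragment.shape
    if r + h > pocket.shape[0] or c + w > pocket.shape[1]:
        return False
    # collision if a fragment cell and a pocket cell are both 1
    return not np.any(pocket[r:r + h, c:c + w] & fragment)

def place(pocket, fragment, r, c):
    """Return a copy of the pocket with the fragment's cells marked occupied."""
    out = pocket.copy()
    out[r:r + fragment.shape[0], c:c + fragment.shape[1]] |= fragment
    return out
```

A search then amounts to enumerating (rotation, r, c) triples that pass can_place and scoring the resulting pockets by remaining zeros.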
Given a numpy array of 2300 rows and 44 columns, I'd like my script to check for equal rows and to return, for each group of equal rows, their indices in the original matrix.
Example:
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0
1 2 3 4 5
Result:
equal_arrays1 = [1,2,3]
equal_arrays2 = [0,4]
My original data set contains all-zero rows from index 1323 to 1699. The result should then include:
equal_array1 = [1323, ..., 1699]
What I have done up to now is use the following code:
import numpy as np

input_data = np.load('1IN.npy')
print(np.shape(input_data))

for i in range(len(input_data)):
    for j in range(i + 1, len(input_data)):
        if np.array_equal(input_data[i], input_data[j]):
            if np.array_equal(input_data[:, i], input_data[:, j]):
                print(i, j)
        else:
            break
but this led to the error:
if np.array_equal(input_data[:,i],input_data[:,j]) :
IndexError: index 1302 is out of bounds for axis 1 with size 44
I think that this is not the best way to go for what I want to achieve, so if anyone has a better alternative or could explain what I need to fix, I'd be glad as I'm new to python.
You want to check only rows, so remove the check on column equality:
matching_pairs = []
for i in range(len(input_data)):
    for j in range(i + 1, len(input_data)):
        if np.array_equal(input_data[i], input_data[j]):
            matching_pairs.append((i, j))
            # break?
print(matching_pairs)
Not sure what the break is about? You may want to break once you have found a j matching your i, but you don't want to break when you don't find one; otherwise you will only check i against i+1 and nothing more.
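An alternative that avoids the O(n²) pair loop altogether: np.unique with axis=0 and return_inverse groups identical rows in a single call (the axis argument requires numpy ≥ 1.13; the function name is mine):

```python
import numpy as np

def group_equal_rows(a):
    """Return one index list per row value that occurs more than once."""
    _, inverse = np.unique(a, axis=0, return_inverse=True)
    groups = {}
    for idx, g in enumerate(inverse.ravel()):
        groups.setdefault(int(g), []).append(idx)
    # keep only rows that actually repeat
    return [idxs for idxs in groups.values() if len(idxs) > 1]

a = np.array([[1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 2, 3, 4, 5]])
```

On the question's example this yields the groups [0, 4] and [1, 2, 3], matching the expected result.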
I'm trying to write Python code to determine the number of possible permutations of a matrix where neighbouring elements can only be adjacent integers. I also wish to know how many times each total set of numbers appears (by that I mean the same count of each integer across matrices, not the same matrix permutation).
Forgive me if I'm not being clear, or if my terminology isn't ideal! Consider a 5 x 5 zero matrix. This is an acceptable permutation, as every element is adjacent to an identical number.
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
25 x 0, 0 x 1, 0 x 2
The elements within the matrix can be changed to 1 or 2. Changing any of the elements to 1 would also be an acceptable permutation, as the 1 would be surrounded by an adjacent integer, 0. For example, changing the central [2,2] element of the matrix:
0 0 0 0 0
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0
24 x 0, 1 x 1, 0 x 2
However, changing the [2,2] element in the centre to a 2 would mean that all of the elements surrounding it would have to switch to 1, as 2 is not adjacent to 0.
0 0 0 0 0
0 1 1 1 0
0 1 2 1 0
0 1 1 1 0
0 0 0 0 0
16 x 0, 8 x 1, 1 x 2
I want to know how many permutations are possible from that zeroed 5x5 matrix by changing the elements to 1 and 2, whilst keeping neighbouring elements as adjacent integers. In other words, any permutations where 0 and 2 are adjacent are not allowed.
I also wish to know how many matrices contain a certain number of each integer. For example, both of the below matrices would be 24 x 0, 1 x 1, 0 x 2. Over every permutation, I'd like to know how many correspond to this frequency of integers.
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Again, sorry if I'm not being clear or my nomenclature is poor! Thanks for your time - I'd really appreciate some help with this, and any words or guidance would be kindly received.
First, what you're calling a permutation isn't one.
Secondly, your problem is that a naive brute force would look at 3^25 = 847,288,609,443 possible combinations. (Somewhat fewer are valid, but probably still hundreds of billions.)
The right way to solve this is called dynamic programming. For your basic problem, calculate, for each i from 0 to 4 and for each possible row at position i, how many valid matrices end in that row.
Add up all of the possible answers in the last row, and you'll have your answer.
For the more detailed count, you additionally split the per-row counts by the cumulative tally of each value; otherwise the approach is the same.
The straightforward version should require tens of thousands of operations. The detailed version might require millions. But this is massively better than the hundreds of billions that the naive recursive version takes.
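A sketch of that row-by-row dynamic programme, for the basic count only (I've parametrised the grid size; the validity test assumes the adjacency rule applies both within a row and between rows, as in the question's examples):

```python
from itertools import product

def count_valid_matrices(n, values=3):
    """Count n x n matrices over {0..values-1} in which horizontally and
    vertically adjacent cells differ by at most 1."""
    # all rows whose horizontal neighbours differ by at most 1
    rows = [r for r in product(range(values), repeat=n)
            if all(abs(r[i] - r[i + 1]) <= 1 for i in range(n - 1))]

    def compatible(a, b):
        # two rows may be stacked if every column differs by at most 1
        return all(abs(x - y) <= 1 for x, y in zip(a, b))

    counts = {r: 1 for r in rows}            # matrices of height 1
    for _ in range(n - 1):                   # extend one row at a time
        counts = {r: sum(c for prev, c in counts.items() if compatible(prev, r))
                  for r in rows}
    return sum(counts.values())
```

count_valid_matrices(5) finishes in well under a second, versus enumerating 3^25 grids for the brute-force approach.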
Just look for some simpler rules:
1s can be distributed arbitrarily in the array, since the matrix so far only consists of 0s. 2s can likewise be distributed arbitrarily, since only neighbouring elements must be either 1 or 2.
Thus there are f(x) = n! / x! possibilities to distribute 1s and 2s over the matrix.
So the total number of possible permutations is 2 * sum(x = 1 to n*n) of f(x).
Calculating the number of possible permutations with a fixed number of 1s can easily be done by simply evaluating f(x).
The number of matrices with a fixed number of 2s and 1s is a bit trickier. Here you can only rely on the fact that all mirrored versions of the matrix yield the same numbers of 1s and 2s and are valid. Apart from using that fact, you can only brute-force search for correct solutions.
How would I create a matrix of zeros and ones, in a size I specify, without numpy? I tried looking this up, but I only found results using it. I guess it would use loops? Unless there's a simpler method?
For example, the size I specify could be 3 and the grid would be 3x3.
Col 0 Col 1 Col 2
Row 0 0 1 0
Row 1 0 0 1
Row 2 1 1 1
You could use a list comprehension:
def m(s):
    return [s * [0] for _ in range(s)]  # range, not xrange, on Python 3
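That gives an all-zero grid. If the grid should contain a random mix of 0s and 1s, as in the example above, a Python 3 sketch with the standard library only (the function name is mine):

```python
import random

def binary_matrix(size):
    """Build a size x size list of lists of random 0s and 1s."""
    return [[random.randint(0, 1) for _ in range(size)] for _ in range(size)]

grid = binary_matrix(3)
for row in grid:
    print(row)
```

Building each inner list inside the comprehension matters: writing [[0] * size] * size instead would alias the same row object size times, so changing one cell would change a whole column.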