Conditional indexing for time series analysis - python

In Python, I have a data set with several rows and columns. If the value in one column equals X, I am interested in the value one row above and one column to the right, i.e. an offset of [-1, +1] from the location of X.
What's the best way to approach this? If a cell equals X, is there a dynamic way to get its location and then add a given number of rows and columns to it, so I can read the value of that offset cell?
Visual Example of Question

Say our array is defined as:
import numpy as np
arr = np.array([[9, 8, 4, 5, 2],
                [0, 3, 5, 6, 7],
                [6, 6, 2, 0, 6],
                [2, 0, 2, 5, 8],
                [2, 6, 9, 7, 9]])
A straightforward solution is to just loop through the column of interest.
colIndex = 2
X = 2
rDelta = -1  # row change; up is negative
cDelta = 1   # column change; right is positive
numberRows = arr.shape[0]
output = []
# This loops through all the values in the specified column and looks for
# matches; if a match is found, the offset value is added to an output list.
# Note that we only look at rows that would result in valid output:
# for example, we skip the first row if we would need to index into
# the row above it for the output.
for i in range(max(-rDelta, 0), min(numberRows, numberRows - rDelta)):
    if arr[i, colIndex] == X:
        # replace this with your processing
        output.append(arr[i + rDelta, colIndex + cDelta])
After running this code the output is
[6, 0]
which are the values up and to the right of the 2's in the column with index 2.
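If you'd rather avoid the explicit loop, the same lookup can be done with NumPy's vectorized comparison. A sketch under the same setup (arr, colIndex, X, rDelta, cDelta as above); it only guards the row bound, since the column offset here stays in range:

```python
import numpy as np

arr = np.array([[9, 8, 4, 5, 2],
                [0, 3, 5, 6, 7],
                [6, 6, 2, 0, 6],
                [2, 0, 2, 5, 8],
                [2, 6, 9, 7, 9]])
colIndex, X = 2, 2
rDelta, cDelta = -1, 1

# row indices in the chosen column whose value equals X
rows = np.where(arr[:, colIndex] == X)[0]
# keep only matches whose shifted row stays inside the array
valid = rows[(rows + rDelta >= 0) & (rows + rDelta < arr.shape[0])]
output = arr[valid + rDelta, colIndex + cDelta].tolist()
# output == [6, 0]
```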

Related

Remove entire row of np array if there is duplicate in first column

I have the following test array:
arr = np.array([[1, 2, 3], [1, 3, 7], [2, 1, 3], [4, 5, 6], [1, 4, 7], [2, 7, 6]])
I need to remove every row that has a duplicate value in the first column (but still keeping the first instance of that value). For this test array I require the following output:
result = [[1, 2, 3], [2, 1, 3], [4, 5, 6]]
So the first row with 1 in the first column is kept, the first row with 2 in the first column is kept, and so on...
Any help would be appreciated!
return_index in np.unique is quite useful for this:
_, i = np.unique(arr[:, 0], return_index=True)
arr[i]
array([[1, 2, 3],
       [2, 1, 3],
       [4, 5, 6]])
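One caveat: np.unique sorts by value, so the returned indices come back in value order rather than row order. If you want the surviving rows in their original order, sorting the indices first should do it. A small sketch:

```python
import numpy as np

arr = np.array([[3, 1, 1], [1, 2, 3], [3, 9, 9], [1, 8, 8]])
_, i = np.unique(arr[:, 0], return_index=True)
# i holds the first-occurrence index of each unique value, in value
# order; np.sort(i) restores the original row order
result = arr[np.sort(i)]
# result rows: [[3, 1, 1], [1, 2, 3]]
```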

How to iterate through a section of a 2D list

I want to go through just a section of a 2D list, rather than the whole thing.
Here's essentially what it is I want to do:
Let's say the user inputs the coordinates [1,1] (so, row 1 column 1)
If I have the 2D list:
[[1, 3, 7],
 [4, 2, 9],
 [13, 5, 6]]
Then I want to iterate through all the elements adjacent to the element at [1,1]
Furthermore, if the element is at a corner or edge of the 2D list (so, if the user enters [0,0], for example), then I just want back the elements at [0,0], [0,1], [1,0], and [1,1].
So essentially I just want the elements adjacent to a specific point in the 2D array.
Here's what I've done so far:
I've made it so that it assigns 4 variables at the start of the code: starting_row, ending_row, starting_column, and ending_column. These variables are assigned values based on which coordinates the user inputs (if the row is 0 or len(list), then the for loop runs accordingly. The same goes for the columns).
Then, I use a nested for loop to go through every element
for row in range(row_start, row_end + 1):
    for column in range(column_start, column_end + 1):
        print(lst[row][column])
The only thing is, it doesn't seem to work correctly, and it often outputs the entire 2D list when I enter a list size of more than 3x3 elements (all the lists will be square lists).
You can slice the list of lists according to the given row and column. For the lower bounds, use max with 0 to avoid slicing with a negative index, but not so for the upper bounds since it is okay for the stopping index of a slice to be out of the range of a list:
def get_adjacent_items(matrix, row, col):
    output = []
    for r in matrix[max(row - 1, 0): row + 2]:
        for i in r[max(col - 1, 0): col + 2]:
            output.append(i)
    return output
or, with a list comprehension:
def get_adjacent_items(matrix, row, col):
    return [i for r in matrix[max(row - 1, 0): row + 2]
              for i in r[max(col - 1, 0): col + 2]]
so that given:
m = [[1, 3, 7],
     [4, 2, 9],
     [13, 5, 6]]
get_adjacent_items(m, 0, 0) returns: [1, 3, 4, 2]
get_adjacent_items(m, 1, 1) returns: [1, 3, 7, 4, 2, 9, 13, 5, 6]
get_adjacent_items(m, 2, 1) returns: [4, 2, 9, 13, 5, 6]
get_adjacent_items(m, 2, 2) returns: [2, 9, 5, 6]
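If NumPy is available, the same neighbourhood can be taken as a single 2D slice. A sketch with a hypothetical get_adjacent_items_np helper:

```python
import numpy as np

def get_adjacent_items_np(matrix, row, col):
    a = np.asarray(matrix)
    # clamp the lower bounds at 0; NumPy allows the stop index to run
    # past the end of the axis, just like list slicing
    block = a[max(row - 1, 0): row + 2, max(col - 1, 0): col + 2]
    return block.ravel().tolist()

m = [[1, 3, 7],
     [4, 2, 9],
     [13, 5, 6]]
# get_adjacent_items_np(m, 0, 0) returns [1, 3, 4, 2]
```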

python - Adding combinations of adjacent rows in a matrix

This is my first post here and I'm a python beginner - all help is appreciated!
I'm trying to add all combinations of adjacent rows in a numpy matrix. i.e. row 1 + row 2, row 2 + row 3, row 3 + row 4, etc... with output to a list
I will then look for the smallest of these outputs and select that item in the list to be printed
I believe I need to use a for loop of some sort but I really am a novice...
Just iterate over the length of the array - 1 and add the pairs as you go into a new list. Then, select the one you want. For example:
>>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> print([x[i] + x[i+1] for i in range(len(x)-1)])
[array([5, 7, 9]), array([11, 13, 15])]
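Since the rows already live in a NumPy array, the pairwise sums can also be computed without a Python loop, by adding the array to a shifted view of itself:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# x[:-1] is rows 0..n-2 and x[1:] is rows 1..n-1; adding them
# produces every adjacent-row sum at once
pair_sums = x[:-1] + x[1:]
# [[ 5,  7,  9], [11, 13, 15]]
```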
Suppose you have this
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
You can first calculate the sum of each row using np.sum(arr, axis=1); the argument axis=1 sums the entries of each row.
In this case, sums = np.sum(arr, axis=1) = array([ 6, 15, 24]).
Then you can iterate over these sums to add the adjacent pairs:
lst_sums = []
for i in range(len(sums) - 1):
    lst_sums.append(sums[i] + sums[i+1])
Then you can sort lst_sums, or get the smallest with np.min(lst_sums).
If you need more details you can look at the numpy function docs; the same goes for lists.
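Putting those pieces together (assuming, as the loop suggests, you want the sums of whole adjacent rows), a condensed NumPy sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
sums = arr.sum(axis=1)            # per-row totals: [ 6, 15, 24]
pair_sums = sums[:-1] + sums[1:]  # adjacent-pair totals: [21, 39]
smallest = pair_sums.min()        # 21
```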

Adding Values to section in numpy Array

I have a 1D array in numpy, and I want to add a certain value to part of the array.
For example, if the array is:
a = [1, 2, 3, 4, 5]
I want to add the value 7 to columns 2 and 3 (zero-based) to get:
a = [1, 2, 10, 11, 5]
Is there any simple way to do this?
Thanks!
You can index the array with another array (or list) containing the indices; note this requires a NumPy array, not a plain Python list:
a[[2, 3]] += 7
If your columns follow a pattern, as in this specific case where they are contiguous, you can use a plain slice instead of fancy indexing:
a = np.array([1, 2, 3, 4, 5])
a[2:4] += 7
Note here 2:4 means "from column 2 (included) to column 4 (excluded)", thus it covers columns 2 and 3.
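A side-by-side sketch of both forms, assuming the array is a NumPy array (a plain Python list supports neither):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
a[[2, 3]] += 7   # fancy indexing: explicit list of positions
b = np.array([1, 2, 3, 4, 5])
b[2:4] += 7      # basic slice: contiguous range of positions
# both now equal [1, 2, 10, 11, 5]
```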

Pyspark: using filter for feature selection

I have an array of dimensions 500 x 26. Using the filter operation in pyspark, I'd like to pick out the columns which are listed in another array at row i. Ex: if
a[i]= [1 2 3]
Then pick out columns 1, 2 and 3 and all rows. Can this be done with filter command? If yes, can someone show an example or the syntax?
Sounds like you need to filter columns, but not records. For that you can use Spark's map function to transform every row of your array represented as an RDD. See my example:
# generate a 13 x 10 array and create an RDD with 13 records,
# each record containing a list with 10 elements
rdd = sc.parallelize([range(10) for i in range(13)])

def make_selector(cols):
    """Use a closure to configure the select_cols function.

    :param cols: list - contains columns' indexes to select from every record
    """
    def select_cols(record):
        return [record[c] for c in cols]
    return select_cols

s = make_selector([1, 2])
s([0, 1, 2])
>>> [1, 2]

rdd.map(make_selector([0, 3, 9])).take(5)
results in
[[0, 3, 9], [0, 3, 9], [0, 3, 9], [0, 3, 9], [0, 3, 9]]
This is essentially the same answer as @vvladymyrov's, but without closures:
rdd = sc.parallelize([range(10) for i in range(13)])
columns = [0,3,9]
rdd.map(lambda record: [record[c] for c in columns]).take(5)
results in
[[0, 3, 9], [0, 3, 9], [0, 3, 9], [0, 3, 9], [0, 3, 9]]
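As an aside, if the 500 x 26 array fits in memory, the same column selection needs no Spark at all; NumPy fancy indexing on the second axis does it in one step (sketched here on the same 13 x 10 toy data):

```python
import numpy as np

data = np.array([list(range(10)) for _ in range(13)])
columns = [0, 3, 9]
selected = data[:, columns]   # all rows, just the listed columns
# every row of `selected` is [0, 3, 9]
```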
