How to iterate through a matrix column in Python

I have a matrix whose cell values are only 0 or 1.
I want to count how many ones or zeros there are in the same row or column as a given cell.
For example, if the value of matrix[r][c] is 1, I want to know how many ones there are in the same row. This code does that:
count_in_row = 0
value = matrix[r][c]
for i in matrix[r]:
    if i == value:
        count_in_row += 1
The for loop iterates through the same row and counts all the ones (cells with the same value).
What if I want to do the same for columns? Do I have to iterate through the whole matrix, or is it possible to go through just one column?
PS: I don't want to use numpy, transpose, or zip; a plain nested loop would be preferable.

You have not specified the datatype of your matrix. If it is a list of lists, then there is no way to "get just one column", but the code is still similar (assuming that r and c are ints).
I added the functionality to count only the cells adjacent to the cell in question (above, below, left and right; diagonals are NOT considered); this is done by checking that the difference between the indexes is not greater than 1:
count_in_row = 0
count_in_col = 0
value = matrix[r][c]
for j in range(len(matrix[r])):
    if abs(j - c) <= 1:  # only if it is adjacent
        if matrix[r][j] == value:
            count_in_row += 1
for i in range(len(matrix)):
    if abs(i - r) <= 1:  # only if it is adjacent
        if matrix[i][c] == value:
            count_in_col += 1
Or if following the way you started it (whole rows and columns, not only adjacent ones):
for col_val in matrix[r]:
    if col_val == value:
        count_in_row += 1
for row in matrix:
    if row[c] == value:
        count_in_col += 1
If you will be doing this for a lot of cells, then there are better ways to do it (even without numpy, though numpy is definitely a very good option).
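A minimal sketch of that idea, assuming a rectangular list-of-lists matrix of 0/1 cells (the helper names are made up): precompute the number of ones per row and per column once, and then each per-cell query becomes a constant-time lookup.
# Sketch: one pass to precompute the counts, then O(1) per query.
rowcounts = [sum(row) for row in matrix]                                      # ones per row
colcounts = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]    # ones per column

def count_same_in_row(r, c):
    ones = rowcounts[r]
    return ones if matrix[r][c] == 1 else len(matrix[r]) - ones

def count_same_in_col(r, c):
    ones = colcounts[c]
    return ones if matrix[r][c] == 1 else len(matrix) - ones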

You can create a list for rows and cols and simply iterate over your matrix once while adding up the correct parts:
Create demo data:
import random

random.seed(42)
matrix = []
for n in range(10):
    matrix.append(random.choices([0, 1], k=10))
print(*matrix, sep="\n")
Output:
[1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
[0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
[1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 1, 1, 1, 0, 1, 0, 0]
[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
[0, 1, 1, 0, 1, 0, 1, 0, 0, 0]
[1, 0, 1, 1, 0, 0, 1, 1, 0, 0]
[0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
Count things:
rows = []                     # empty list for rows - you can simply sum over each row
cols = [0] * len(matrix[0])   # list of 0s that you increment while iterating over your matrix
for row in matrix:
    rows.append(sum(row))              # simply sum over the row
    for c, col in enumerate(row):      # enumerate gives you the (index, value) tuple
        cols[c] += col                 # adds either 0 or 1 to the col index
print("rows:", rows)
print("cols:", cols)
Output:
rows: [4, 5, 5, 9, 2, 4, 6, 4, 5, 6] # row 0 == 4, row 1 == 5, ...
cols: [6, 6, 5, 4, 6, 5, 5, 5, 5, 3] # same for cols
Less code, but it takes two full passes over your matrix, using zip() to transpose the data:
rows = [sum(r) for r in matrix]
cols = [sum(c) for c in zip(*matrix)]
print("rows:",rows)
print("cols:",cols)
Output: (the same)
rows: [4, 5, 5, 9, 2, 4, 6, 4, 5, 6]
cols: [6, 6, 5, 4, 6, 5, 5, 5, 5, 3]
You would have to time it, but the overhead of the two full iterations and the zipping may still be worth it, as the zip() approach is inherently more optimized than looping over a list in Python. The tradeoff probably only pays off above a certain matrix size...
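If you want to check that on your machine, a quick timeit sketch along these lines should do (the sizes and function names are only illustrative):
import random
import timeit

random.seed(42)
big = [random.choices([0, 1], k=300) for _ in range(300)]

def one_pass(m):
    # single pass: sum rows and accumulate columns in the same loop
    rows, cols = [], [0] * len(m[0])
    for row in m:
        rows.append(sum(row))
        for c, v in enumerate(row):
            cols[c] += v
    return rows, cols

def two_pass_zip(m):
    # two passes, the second one transposing via zip()
    return [sum(r) for r in m], [sum(c) for c in zip(*m)]

print("one pass:", timeit.timeit(lambda: one_pass(big), number=50))
print("zip pass:", timeit.timeit(lambda: two_pass_zip(big), number=50))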

I will not solve this for you, but I can maybe hint in the right direction...
# assuming a list of lists of equal length
# without importing any modules
matrix = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
sum_rows = [sum(row) for row in matrix]
print(sum_rows) # [1, 2, 3, 4]
sum_columns = [sum(row[i] for row in matrix) for i in range(len(matrix[0]))]
print(sum_columns) # [4, 3, 2, 1]

This is a solution with just one for loop:
count_in_row = 0
count_in_column = 0
value = matrix[r][c]
for index, row in enumerate(matrix):
    if index == r:
        count_in_row = row.count(value)
    if row[c] == value:
        count_in_column += 1
print(count_in_row, count_in_column)
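For reference, the same idea wrapped into a runnable sketch (the sample matrix and the cell (r, c) are arbitrary):
def count_same(matrix, r, c):
    count_in_row = 0
    count_in_column = 0
    value = matrix[r][c]
    for index, row in enumerate(matrix):
        if index == r:
            count_in_row = row.count(value)   # count within the cell's own row
        if row[c] == value:
            count_in_column += 1              # count within the cell's own column
    return count_in_row, count_in_column

matrix = [[1, 0, 0, 0],
          [1, 1, 0, 0],
          [1, 1, 1, 0],
          [1, 1, 1, 1]]
print(count_same(matrix, 1, 1))  # (2, 3): two 1s in row 1, three 1s in column 1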

With numpy it's one command per direction, and much faster:
import numpy as np
A = np.array([[1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
              [0, 1, 0, 0, 1, 1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
              [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
              [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 1, 1, 0, 1, 0, 0],
              [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 1, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
              [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]])
rowsum = A.sum(axis=1)
colsum = A.sum(axis=0)
print("A ="); print(A);print()
print("rowsum:",rowsum)
print("colsum:",colsum)
rowsum: [4 5 5 9 2 4 6 4 5 6]
colsum: [6 6 5 4 6 5 5 5 5 3]
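And if you only need the counts for a single cell, as in the original question, a short sketch reusing the same array (r and c are arbitrary here):
r, c = 3, 2
value = A[r, c]
count_in_row = (A[r, :] == value).sum()   # cells in row r equal to A[r, c]
count_in_col = (A[:, c] == value).sum()   # cells in column c equal to A[r, c]
print(count_in_row, count_in_col)         # 9 5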

Related

How to create arrays with combinations between certain indexes of a fixed length and fixed sum

For example:
array = [4,3,2,0,0,0,0,0,0]
The 0th index should only have combinations with 3rd index and 6th index.
The 1st index should only have combinations with 4th index and 7th index.
The 2nd index should only have combinations with 5th index and 8th index.
(sum should stay the same between these indexes).
Then output should be:
[1,2,2,1,1,0,2,0,0]
[2,1,1,1,1,1,1,1,0]...
In both these combinations, sum between the respective indexes (listed above) remain the same.
Using the findPairs function resulting from the answer to your previous question:
from itertools import product

def findPairs(sum_value, len_value):
    lst = range(sum_value + 1)
    return [
        pair
        for pair in product(lst, repeat=len_value)
        if sum(pair) == sum_value
    ]
import itertools
combinations = itertools.product(findPairs(array[0], 3), findPairs(array[1], 3), findPairs(array[2], 3))
result = [list(itertools.chain(*zip(p1, p2, p3))) for p1, p2, p3 in combinations]
print(result[0:10])
[[0, 0, 0, 0, 0, 0, 4, 3, 2], [0, 0, 0, 0, 0, 1, 4, 3, 1],
[0, 0, 0, 0, 0, 2, 4, 3, 0], [0, 0, 1, 0, 0, 0, 4, 3, 1],
[0, 0, 1, 0, 0, 1, 4, 3, 0], [0, 0, 2, 0, 0, 0, 4, 3, 0],
[0, 0, 0, 0, 1, 0, 4, 2, 2], [0, 0, 0, 0, 1, 1, 4, 2, 1],
[0, 0, 0, 0, 1, 2, 4, 2, 0], [0, 0, 1, 0, 1, 0, 4, 2, 1]]
...
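To sanity-check the result, a small sketch that verifies every generated combination preserves the sums over the index groups (0, 3, 6), (1, 4, 7) and (2, 5, 8):
for combo in result:
    # each group of interleaved positions must keep its original sum
    assert combo[0] + combo[3] + combo[6] == array[0]
    assert combo[1] + combo[4] + combo[7] == array[1]
    assert combo[2] + combo[5] + combo[8] == array[2]
print(len(result), "combinations, all preserving the group sums")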

Is there a way to find the largest change in a pandas dataframe column?

I'm trying to find the largest drop from a value x[i] to a later value x[j] in a series, where index i must come before index j. Is there an efficient way to do this in pandas?
x = [1, 2, 5, 4, 2, 4, 2, 1, 7]
largest_change = 0
for i in range(len(x)):
    for j in range(i + 1, len(x)):
        change = x[i] - x[j]
        print(x[i], x[j], change)
        if change > largest_change:
            largest_change = change
The output would just be the value, in this case 4 from 5 to 1.
Try numpy broadcasting with np.triu and max:
arr = np.array(x)
np.triu(arr[:,None] - arr)
array([[ 0, -1, -4, -3, -1, -3, -1, 0, -6],
[ 0, 0, -3, -2, 0, -2, 0, 1, -5],
[ 0, 0, 0, 1, 3, 1, 3, 4, -2],
[ 0, 0, 0, 0, 2, 0, 2, 3, -3],
[ 0, 0, 0, 0, 0, -2, 0, 1, -5],
[ 0, 0, 0, 0, 0, 0, 2, 3, -3],
[ 0, 0, 0, 0, 0, 0, 0, 1, -5],
[ 0, 0, 0, 0, 0, 0, 0, 0, -6],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0]])
np.triu(arr[:,None] - arr).max()
Out[758]: 4
Besides Andy's smart method, here is another one that propagates the minimum value backward; its advantage is linear rather than quadratic time complexity, which helps if you handle a large amount of data.
a = np.flipud(np.array(x))
largest_change = (a - np.minimum.accumulate(a)).max()
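Since the question mentions pandas, the same backward-minimum idea can also be written directly on a Series (a sketch; s is just an arbitrary name for the Series built from x):
import pandas as pd

s = pd.Series(x)
suffix_min = s[::-1].cummin()[::-1]   # minimum of s[i:], i.e. the smallest value at or after index i
largest_change = (s - suffix_min).max()
print(largest_change)                 # 4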
How about this?
x = [1, 2, 5, 4, 2, 4, 2, 1, 7]
largest_change = 0
position = 0
for i in range(len(x) - 1):
    change = x[i] - min(x[i + 1:])
    if change > largest_change:
        largest_change = change
        position = i
print(x[position], min(x[position + 1:]), largest_change)
Why don't you just take the diff then the max of that?
x = [1, 2, 5, 4, 2, 4, 2, 1, 7]
s = pd.Series(x)
z = abs(s.diff())
idx_max_val = z[z==z.max()].index[0]
print(f'Max difference in value ({z.max()}) occurs at the indices of {idx_max_val-1}:{idx_max_val}')
I would suggest rolling window:
import pandas
df = pandas.DataFrame({'col1': [1, 2, 5, 4, 2, 4, 2, 1, 7]})
df["diff"] = df['col1'].rolling(window=2).apply(lambda x: x[1] - x[0])
print(df["diff"].max())
Output: 6.0
Or did I misunderstand you and you just want the largest difference between any two values?
This would be:
import pandas
df = pandas.DataFrame({'col1': [1, 2, 5, 4, 2, 4, 2, 1, 7]})
max_diff = df["col1"].max() - df["col1"].min()
print("Min:", df["col1"].min(), "Max:", df["col1"].max(), "Diff:", max_diff)
Output:
Min: 1 Max: 7 Diff: 6

Comparing two lists in Python

I need help comparing two lists and returning the indices where they don't match.
a = [0, 1, 1, 0, 0, 0, 1, 0, 1]
b = [0, 1, 1, 0, 1, 0, 1, 0, 0]
Indices 4 and 8 don't match, and I need to return them as a list: [4, 8].
I've tried a few methods but they haven't worked for me.
Use zip to iterate over both lists at the same time and enumerate to get the indices during iteration, then write a list comprehension that keeps the indices where the list values don't match:
>>> [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
[4, 8]
You could also just use a simple loop which scans the lists, item by item:
a = [0, 1, 1, 0, 0, 0, 1, 0, 1]
b = [0, 1, 1, 0, 1, 0, 1, 0, 0]
diff = []
for i in range(len(a)):
    if a[i] != b[i]:
        diff.append(i)
print(diff)
A list comprehension could also do the same thing:
diff = [i for i in range(len(a)) if a[i] != b[i]]
print(diff)
If you are happy to use a 3rd party library, numpy provides one way:
import numpy as np
a = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1])
b = np.array([0, 1, 1, 0, 1, 0, 1, 0, 0])
res = np.where(a != b)[0]
# array([4, 8], dtype=int64)
Relevant: Why NumPy instead of Python lists?
You can use zip:
a = [0, 1, 1, 0, 0, 0, 1, 0, 1]
b = [0, 1, 1, 0, 1, 0, 1, 0, 0]
count = 0
indices = []
for i in zip(a, b):
    if i[0] != i[1]:
        indices.append(count)
    count += 1
print(indices)
output:
[4, 8]

Count how often integer y occurs right after integer x in a numpy array

I have a very large numpy.array of integers, where each integer is in the range [0, 31].
I would like to count, for every pair of integers (a, b) in the range [0, 31] (e.g. [0, 1], [7, 9], [18, 0]) how often b occurs right after a.
This would give me a (32, 32) matrix of counts.
I'm looking for an efficient way to do this with numpy. Raw python loops would be too slow.
Here's one way...
To make the example easier to read, I'll use a maximum value of 9 instead of 31:
In [178]: maxval = 9
Make a random input for the example:
In [179]: np.random.seed(123)
In [180]: x = np.random.randint(0, maxval+1, size=100)
Create the result, initially all 0:
In [181]: counts = np.zeros((maxval+1, maxval+1), dtype=int)
Now add 1 to each coordinate pair, using numpy.add.at to ensure that duplicates are counted properly:
In [182]: np.add.at(counts, (x[:-1], x[1:]), 1)
In [183]: counts
Out[183]:
array([[2, 1, 1, 0, 1, 0, 1, 1, 1, 1],
[2, 1, 1, 3, 0, 2, 1, 1, 1, 1],
[0, 2, 1, 1, 4, 0, 2, 0, 0, 0],
[1, 1, 1, 3, 3, 3, 0, 0, 1, 2],
[1, 1, 0, 1, 1, 0, 2, 2, 2, 0],
[1, 0, 0, 0, 0, 0, 1, 1, 0, 2],
[0, 4, 2, 3, 1, 0, 2, 1, 0, 1],
[0, 1, 1, 1, 0, 0, 2, 0, 0, 3],
[1, 2, 0, 1, 0, 0, 1, 0, 0, 0],
[2, 0, 2, 2, 0, 0, 2, 2, 0, 0]])
For example, the number of times 6 is followed by 1 is
In [188]: counts[6, 1]
Out[188]: 4
We can verify that with the following expression:
In [189]: ((x[:-1] == 6) & (x[1:] == 1)).sum()
Out[189]: 4
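An alternative that also avoids Python-level loops (assuming, as above, that all values lie in [0, maxval]) is to encode each consecutive pair as a single integer and let np.bincount do the counting:
pair_ids = x[:-1] * (maxval + 1) + x[1:]   # encode the pair (a, b) as a*(maxval+1) + b
counts2 = np.bincount(pair_ids, minlength=(maxval + 1) ** 2).reshape(maxval + 1, maxval + 1)
print((counts2 == counts).all())           # True - same matrix as the np.add.at version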
You can use numpy's built-in diff routine together with boolean arrays.
import numpy as np
test_array = np.array([1, 2, 3, 1, 2, 4, 5, 1, 2, 6, 7])
a, b = (1, 2)
sum(np.bitwise_and(test_array[:-1] == a, np.diff(test_array) == b - a))
# 3
If your array is multi-dimensional, you will need to flatten it first or make some small modifications to the code above.

Is there some elegant way to manipulate my ndarray

I have a matrix named xs:
array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
[2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
Now I want to replace the zeros with the nearest previous element in the same row (assuming that the first column is always nonzero).
My rough solution is as follows:
In [55]: row, col = xs.shape
In [56]: for r in xrange(row):
    ....:     for c in xrange(col):
    ....:         if xs[r, c] == 0:
    ....:             xs[r, c] = xs[r, c-1]
    ....:
In [57]: xs
Out[57]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1],
[2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2]])
Any help will be greatly appreciated.
If you can use pandas, replace will explicitly show the replacement in one instruction:
import pandas as pd
import numpy as np
a = np.array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
[2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
df = pd.DataFrame(a, dtype=np.float64)
df.replace(0, method='pad', axis=1)
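Note that the method argument of replace is deprecated in newer pandas releases; the same forward fill can be sketched with mask and ffill (assuming the same df):
filled = df.mask(df == 0).ffill(axis=1)   # zeros become NaN, then are filled from the left
print(filled.astype(int))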
My version, based on step-by-step rolling and masking of the initial array; no additional libraries required (except numpy):
import numpy as np
a = np.array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
              [2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
for i in xrange(a.shape[1]):
    a[a == 0] = np.roll(a, i)[a == 0]
    if not (a == 0).any():  # when all of the zeros
        break               # are filled
print a
## [[1 1 1 1 1 1 1 1 1 2 1]
##  [2 1 1 1 1 1 2 1 1 2 2]]
Without going crazy with complicated indexing tricks that figure out runs of consecutive zeros, you could use a while loop that runs for as many iterations as the longest run of consecutive zeros in your array:
zero_rows, zero_cols = np.where(xs == 0)
while zero_cols.size:  # keep going while there are still zeros left
    xs[zero_rows, zero_cols] = xs[zero_rows, zero_cols - 1]
    zero_rows, zero_cols = np.where(xs == 0)
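For completeness, a fully vectorized forward fill is also possible with a well-known numpy idiom: take the running maximum of the column indices of nonzero entries and use it to index back into the array (a sketch, assuming the first column is nonzero):
import numpy as np

xs = np.array([[1, 1, 1, 1, 1, 0, 1, 0, 0, 2, 1],
               [2, 1, 0, 0, 0, 1, 2, 1, 1, 2, 2]])
idx = np.where(xs != 0, np.arange(xs.shape[1]), 0)   # column index where nonzero, else 0
np.maximum.accumulate(idx, axis=1, out=idx)          # index of the nearest nonzero cell to the left
filled = xs[np.arange(xs.shape[0])[:, None], idx]
print(filled)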
