Dividing an input into different lists by the lines - python

I'm trying to divide a multi-lined input into different lists by the lines. Is there a method for that?
For example:
#The input:
0 0 0 0 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
#What I request:
column1 = [0, 0, 0, 0, 0] #the first line as a list
column2 = [0, 0, 0, 0, 1] # the second line as a list
.
. #so goes on
.

You don't need a method. This line is pretty much the standard:
first_line = list(map(int, input().split()))
Breakdown:
First, input() reads one line from the input stream, e.g. "0 0 0 0 0".
The split method, called without arguments, breaks it on whitespace, which results in ["0", "0", "0", "0", "0"].
Then map applies int to every token, converting e.g. "0" to 0.
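The three steps can be seen one at a time on a fixed string. This is only an illustrative sketch: the variable `line` is an assumption that stands in for whatever input() would return.

```python
# `line` stands in for the string that input() would return (assumed example data).
line = "0 0 0 0 1"

tokens = line.split()             # split on whitespace -> list of strings
numbers = list(map(int, tokens))  # convert each token to an int

print(tokens)   # ['0', '0', '0', '0', '1']
print(numbers)  # [0, 0, 0, 0, 1]
```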
If you want to process all input lines regardless of how many there are, you can use fileinput.input(). Here's some sample code:
import fileinput
input_grid = [list(map(int, line.split())) for line in fileinput.input()]
input_grid:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]

You probably want a list of lists, not individual variables for each row. If you do it this way, once output is built, you can access any row by index, such as output[2].
s ='''0 0 0 0 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0'''
s = s.split('\n')
output = [list(map(int,x.split())) for x in s]
Output
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]
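The question names its variables column1, column2, and so on, but each inner list here is a row of the input. If actual columns are wanted, zip can transpose the list of lists. A small sketch, using a made-up 2x3 grid as assumed example data:

```python
# Assumed example data: a grid that has already been parsed into a list of lists.
grid = [[1, 2, 3],
        [4, 5, 6]]

second_row = grid[1]                         # rows are available by index
columns = [list(col) for col in zip(*grid)]  # transpose: one list per column
first_column = columns[0]

print(second_row)    # [4, 5, 6]
print(columns)       # [[1, 4], [2, 5], [3, 6]]
```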

Related

Numpy scalable diagonal matrices

Assuming I have the variables:
A = 3
B = 2
C = 1
How can i transform them into diagonal matrices in the following form:
np.diag([1, 1, 1, 0, 0, 0])
Out[0]:
array([[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]])
np.diag([0,0,0,1,1,0])
Out[1]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]])
np.diag([0,0,0,0,0,1])
Out[2]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1]])
I would like this to be scalable, so for instance with 4 variables a = 500, b = 20, c = 300, d = 200 the size of the matrix will be 500 + 20 + 300 + 200 = 1020.
What is the easiest way to do this?
The obligatory solution with np.einsum, roughly 2.25x slower than the accepted answer for the [500,20,200,300] arrays on a 2-core Colab instance.
import numpy as np
A = 3
B = 2
C = 1
r = [A,B,C]
m = np.arange(len(r))
np.einsum('ij,kj->ijk', m.repeat(r) == m[:,None], np.eye(np.sum(r), dtype='int'))
Output
array([[[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1]]])
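To see what the einsum call is doing, it may help to look at the boolean mask on its own. The sketch below rebuilds the same stack of matrices with plain broadcasting instead of einsum; the names `mask` and `out` are introduced here for illustration only.

```python
import numpy as np

r = [3, 2, 1]
m = np.arange(len(r))

# m.repeat(r) is [0, 0, 0, 1, 1, 2]: the block index of each diagonal position.
# Comparing against a column of block indices gives one boolean row per block.
mask = m.repeat(r) == m[:, None]                     # shape (3, 6)

# Broadcasting equivalent of the einsum: out[i] equals np.diag(mask[i]).
out = mask[:, :, None] * np.eye(sum(r), dtype=int)   # shape (3, 6, 6)

print(mask.astype(int))
```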
Here's one approach. The resulting array mats contains the matrices you're looking for.
A = 3
B = 2
C = 1
n_list = [A,B,C]
ab_list = np.cumsum([0] + n_list)
ran = np.arange(ab_list[-1])
mats = [np.diag(((a <= ran) & (ran < b)).astype('int'))
        for a,b in zip(ab_list[:-1],ab_list[1:])]
for mat in mats:
    print(mat,'\n')
Result:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]
Edit: Here's a faster solution that yields the same result
n_list = [A,B,C]
ab_list = np.cumsum([0] + n_list)
total = ab_list[-1]
ran = np.arange(total)
mats = np.zeros((len(n_list),total,total))
for k,p in enumerate(zip(ab_list[:-1],ab_list[1:])):
    idx = np.arange(p[0],p[1])
    mats[k,idx,idx] = 1
for mat in mats:
    print(mat,'\n')
This seems to yield a ~10% speedup over the currently accepted solution
Another with roughly equivalent performance:
n_list = [A,B,C]
m = len(n_list)
ab_list = np.cumsum([0] + n_list)
total = ab_list[-1]
ran = np.arange(total)
mats = np.zeros((m,total,total))
idx = [k for a,b in zip(ab_list[:-1],ab_list[1:]) for k in range(a,b)]
mats[[k for k,n in enumerate(n_list) for _ in range(n)],
     idx,idx] = 1
for mat in mats:
    print(mat,'\n')
You can achieve even better performance by just allocating the array once, then setting the values all at once by specifying the indices. The indices are fortunately easy to obtain.
import numpy as np
a = [3, 2, 1] # Put your values in a list
s = np.sum(a)
m = np.zeros((len(a), s, s), dtype=int) # Initialize array once
indices = (np.repeat(range(len(a)), a), *np.diag_indices(s, 2)) # Get indices
m[indices] = 1 # Set the diagonals at once
print(m)
Output:
[[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]]
Comparing to @Ben Grossmann's answer, with A=3000, B=2000, C=1000 and 100 repeats:
import timeit
import numpy as np
def A():
    '''My solution'''
    a = [3000, 2000, 1000] # Put your values in a list
    s = np.sum(a)
    m = np.zeros((len(a), s, s), dtype=int) # Initialize array once
    indices = (np.repeat(range(len(a)), a), *np.diag_indices(s, 2)) # Get indices
    m[indices] = 1 # Set the diagonals at once
    return m
def B():
    '''Bens solution'''
    A = 3000
    B = 2000
    C = 1000
    n_list = [A,B,C]
    ab_list = np.cumsum([0] + n_list)
    ran = np.arange(ab_list[-1])
    return [np.diag(((a <= ran) & (ran < b)).astype('int')) for a,b in zip(ab_list[:-1], ab_list[1:])]
print('Timings:')
timeA = timeit.timeit(A, number=100)
timeB = timeit.timeit(B, number=100)
ratio = timeA / timeB
print(f'This solution: {timeA} seconds')
print(f'Current accepted answer: {timeB} seconds')
if ratio < 1:
    print(f'This solution is {1 / ratio} times faster than Bens solution')
else:
    print(f'Bens solution is {ratio} times faster than this solution')
Output:
Timings:
This solution: 1.6834218999993027 seconds
Current accepted answer: 5.096610300000066 seconds
This solution is 3.027529997086397 times faster than Bens solution
EDIT: Changed the "indices" algorithm to use np.repeat instead of np.concatenate.
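For the scalable case asked about in the question, the same indexing idea can be wrapped in a function that takes any list of sizes. This is a sketch under the assumption that all sizes are non-negative integers; the function name `block_diagonal_indicators` is hypothetical and not part of NumPy.

```python
import numpy as np

def block_diagonal_indicators(sizes):
    """Return an int array of shape (len(sizes), n, n) with n = sum(sizes).

    Matrix k has ones on the stretch of the main diagonal that belongs
    to block k and zeros everywhere else.
    """
    sizes = np.asarray(sizes)
    n = int(sizes.sum())
    out = np.zeros((len(sizes), n, n), dtype=int)
    block = np.repeat(np.arange(len(sizes)), sizes)  # block index per diagonal position
    diag = np.arange(n)
    out[block, diag, diag] = 1                       # set every diagonal entry in one go
    return out

mats = block_diagonal_indicators([3, 2, 1])
print(mats.shape)  # (3, 6, 6)
```

Summing the stack over its first axis gives the identity matrix, which is a quick sanity check that every diagonal position was assigned to exactly one block.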
One possible method (I don't think it's optimal, but it works):
import numpy as np
a = 3
b = 2
c = 1
values = [a,b,c] #create a list with values
n = sum(values) #calc total length of diagonal
#create an array with cumulative sums but starting from 0 to use as index
idx_vals = np.zeros(len(values)+1,dtype=int)
np.cumsum(values,out=idx_vals[1:])
#create every diagonal using values, then create diagonal matrices and
#save them in `matrices` list
matrices = []
for idx,v in enumerate(values):
    diag = np.zeros(n)
    diag[idx_vals[idx]:idx_vals[idx]+v] = np.ones(v)
    print(diag)
    matrices.append(np.diag(diag))
Yet another possibility:
import numpy as np
# your constants here
constants = [3, 2, 1] # [A, B, C]
size = sum(constants)
cumsum = np.cumsum([0] + constants)
for i in range(len(cumsum) - 1):
    inputVector = np.zeros(size, dtype=int)
    inputVector[cumsum[i]:cumsum[i+1]] = 1
    matrix = np.diag(inputVector)
    print(matrix, '\n')
Output:
[[1 0 0 0 0 0]
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 1 0 0]
[0 0 0 0 1 0]
[0 0 0 0 0 0]]
[[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 0]
[0 0 0 0 0 1]]

Python - numpy arrays - Abelian sandpile

I'm trying to do the Abelian sandpile model using a simple numpy array.
When a 'pile' is >= 4, it collapses onto its neighbors.
I understand how the "gravity" thing works, but I can't think of a way of making it.
Here's the code to make my array :
import numpy as np
spile = np.zeros((5, 5), dtype=np.uint32)
spile[2, 2] = 16
Which gives me the following :
array([[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 16, 0, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0]], dtype=uint32)
Now, I need the "gravity" code that does these steps of calculation :
array([[ 0, 0, 0, 0, 0],
[ 0, 0, 4, 0, 0],
[ 0, 4, 0, 4, 0],
[ 0, 0, 4, 0, 0],
[ 0, 0, 0, 0, 0]], dtype=uint32)
array([[ 0, 0, 1, 0, 0],
[ 0, 2, 1, 2, 0],
[ 1, 1, 0, 1, 1],
[ 0, 2, 1, 2, 0],
[ 0, 0, 1, 0, 0]], dtype=uint32)
The last array is the final result I'm trying to get.
I'm not trying to make you guys code for me, I just need some ideas as I've never done such a thing (but feel free to provide code if you're that kind :p ).
Use np.divmod to identify where the cells tumble and how much tumbles. Then use array slicing to shift the amounts tumbled and add back into the sandpile.
import numpy as np
spile = np.zeros((5, 5), dtype=np.uint32)
spile[2, 2] = 16
def do_add( spile, tumbled ):
    """ Updates spile in place """
    spile[ :-1, :] += tumbled[ 1:, :] # Shift N and add
    spile[ 1:, :] += tumbled[ :-1, :] # Shift S
    spile[ :, :-1] += tumbled[ :, 1:] # Shift W
    spile[ :, 1:] += tumbled[ :, :-1] # Shift E
def tumble( spile ):
    while ( spile > 3 ).any():
        tumbled, spile = np.divmod( spile, 4 )
        do_add( spile, tumbled )
        # print( spile, '\n' ) # Uncomment to print steps
    return spile
print( tumble( spile ) )
# Note: np.divmod returns new arrays, so the array passed in is left unchanged; use the return value.
# [[0 0 1 0 0]
# [0 2 1 2 0]
# [1 1 0 1 1]
# [0 2 1 2 0]
# [0 0 1 0 0]]
Uncommenting the print statement prints these intermediate results:
[[0 0 0 0 0]
[0 0 4 0 0]
[0 4 0 4 0]
[0 0 4 0 0]
[0 0 0 0 0]]
[[0 0 1 0 0]
[0 2 0 2 0]
[1 0 4 0 1]
[0 2 0 2 0]
[0 0 1 0 0]]
[[0 0 1 0 0]
[0 2 1 2 0]
[1 1 0 1 1]
[0 2 1 2 0]
[0 0 1 0 0]]
http://rosettacode.org/wiki/Abelian_sandpile_model
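For readers who want the toppling rule spelled out cell by cell, here is a plain-Python sketch (no NumPy) that topples one cell at a time. The function name `stabilize_naive` is hypothetical. Because the model is Abelian, the order in which cells topple does not change the final configuration, so this slower version is expected to agree with the vectorized answer above.

```python
def stabilize_naive(pile):
    """Return a stabilized copy of `pile` (a list of lists of ints).

    Any cell holding 4 or more grains gives one grain to each of its
    four neighbours; grains pushed over the edge of the grid are lost.
    """
    pile = [row[:] for row in pile]          # work on a copy
    rows, cols = len(pile), len(pile[0])
    unstable = True
    while unstable:
        unstable = False
        for i in range(rows):
            for j in range(cols):
                if pile[i][j] >= 4:
                    unstable = True
                    pile[i][j] -= 4
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols:
                            pile[ni][nj] += 1
    return pile

start = [[0] * 5 for _ in range(5)]
start[2][2] = 16
final = stabilize_naive(start)
for row in final:
    print(row)
```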

Do you have any advice on signal processing for binary time series?

I have a binary time series with some ASK-modulated signals at different frequencies inside it.
Let's say it's something like this: x = [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0, ...]
What matters to me is having every run of '1's and '0's last 4 samples or more, but sometimes the '0' and '1' change places like this: x1 = [0,0,0,1,1,1,1,1] when it should have been x2 = [0,0,0,0,1,1,1,1]
There is also some noise in the form of spikes, as seen in n1 = [0,0,0,0,0,0,1,1,0,0,0,0,0], when it should be only zeros.
I've already tried a moving average, but it introduced a lag in the signal that wasn't good for my application.
Do you have any advice on signal processing for binary time series?
The following code finds the indices of all continuous sequences with the length smaller than 4 (min_cont_length). It also gives you the lengths of the problematic sectors, so you can decide how to handle them.
import numpy as np
def find_index_of_err(signal, min_cont_length = 4):
    # pad sides to detect problems at the edges
    signal = np.concatenate(([1-signal[0]],signal,[1-signal[-1]]))
    # calculate differences from 1 element to the next
    delta = np.concatenate(([0], np.diff(signal, 1)))
    # detect discontinuities
    discontinuity = np.where(delta!=0)[0]
    # select discontinuities with matching length (< min_cont_length)
    err_idx = discontinuity[:-1][np.diff(discontinuity) < min_cont_length] - 1
    # get also the size of the gap
    err_val = np.diff(discontinuity)[np.argwhere(np.diff(discontinuity) < min_cont_length).flatten()]
    return err_idx, err_val
# some test signals
signals = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]])
for sig in signals:
    index, value = find_index_of_err(sig)
    print(sig, index, value)
# Output:
# [1 0 0 0 0 0 0 0 0 0 0] [0] [1]
# [0 0 1 0 0 0 0 0 0 0 0] [0 2] [2 1]
# [0 0 0 0 1 0 0 0 0 0 0] [4] [1]
# [0 0 0 0 0 0 1 1 0 0 0] [6 8] [2 3]
# [0 0 0 0 0 0 1 1 1 1 1] [] []
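Once the short runs are located, one possible way to handle them is to overwrite each one with the value of the run before it. The sketch below does that with run-length grouping; the function name `remove_short_runs` and the merge-into-previous-run policy are assumptions made for illustration, not something the answer prescribes. It removes spikes like the one in n1, but it does not correct edges that arrive one sample early or late, as in the x1/x2 example.

```python
from itertools import groupby

def remove_short_runs(signal, min_len=4):
    """Return a copy of `signal` where runs shorter than `min_len` are
    overwritten with the value of the preceding (already cleaned) run.
    A short run at the very start takes the value of the run after it."""
    runs = [(value, len(list(group))) for value, group in groupby(signal)]
    cleaned = []
    for i, (value, length) in enumerate(runs):
        if length < min_len:
            if cleaned:
                value = cleaned[-1]          # merge into the previous run
            elif i + 1 < len(runs):
                value = runs[i + 1][0]       # leading short run: use the next run
        cleaned.extend([value] * length)
    return cleaned

n1 = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
print(remove_short_runs(n1))
```

Unlike a moving average, this works on whole runs, so edges that are kept stay at their original sample positions.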

How to select the rows with the same value across columns in pandas?

I have a df with 9 columns. Each column has values 0 or 1.
1 means outlier.
The columns are outlier flags from 9 different algorithms.
I want to select the true outliers, i.e. the rows flagged by every algorithm. The following query works:
true_outliers= outliers[
(outliers['isolation_forest_300000']==1) &
(outliers['knn_1000']==1) &
(outliers['knn_10000']==1)&
(outliers['abod_neighbors_5_1000']==1)&
(outliers['abod_neighbors_5_10000']==1)&
(outliers['abod_neighbors_10_1000']==1)&
(outliers['hbos_1000']==1)&
(outliers['hbos_10000']==1)&
(outliers['hbos_100000']==1)]
However, how can I refactor it into something like this:
for col in outliers.columns.tolist():
    s= outliers[outliers[col] == 1]
I want it to go through a loop and select only those rows that are '1' in every column.
If you want to select rows with 1 in every column, using a mask is better.
Sample df:
Out[266]:
   isolation_forest_300000  knn_1000  knn_10000  abod_neighbors_5_1000  \
0                        1         1          1                      1
1                        0         0          0                      1
2                        0         0          0                      0
3                        1         1          1                      1
   abod_neighbors_5_10000  abod_neighbors_10_1000  hbos_1000  hbos_10000  \
0                       1                       1          1           1
1                       1                       0          0           0
2                       0                       0          0           0
3                       1                       1          1           1
   hbos_100000
0            1
1            0
2            0
3            1
Use eq and all to create a boolean mask, then index the frame with it:
df[df.eq(1).all(axis=1)]
Out[267]:
   isolation_forest_300000  knn_1000  knn_10000  abod_neighbors_5_1000  \
0                        1         1          1                      1
3                        1         1          1                      1
   abod_neighbors_5_10000  abod_neighbors_10_1000  hbos_1000  hbos_10000  \
0                       1                       1          1           1
3                       1                       1          1           1
   hbos_100000
0            1
3            1
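The mask can be inspected on its own, and the same pattern works on a subset of columns. A minimal sketch on a made-up three-column frame; the column names and values below are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical miniature of the outliers frame: three detector columns.
df = pd.DataFrame({
    "knn_1000":  [1, 0, 1, 1],
    "hbos_1000": [1, 1, 1, 1],
    "abod_1000": [1, 0, 0, 1],
})

mask = df.eq(1).all(axis=1)   # True where every column equals 1
true_outliers = df[mask]

# Same idea restricted to two of the columns.
knn_and_hbos = df[df[["knn_1000", "hbos_1000"]].eq(1).all(axis=1)]

print(true_outliers.index.tolist())  # [0, 3]
print(knn_and_hbos.index.tolist())   # [0, 2, 3]
```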
I think this can help you:
import functools
import operator
import pandas as pd
data = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]
df = pd.DataFrame(
data, columns=[str(i) for i in range(9)]
)
condition = functools.reduce(
operator.and_,
(df[col] == 1 for col in df.columns)
)
print(df[condition])

How to read a matrix from a given file?

I have a text file which contains a matrix of N * M dimensions.
For example the input.txt file contains the following:
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0
0,0,2,1,0,2,0,0,0,0
0,0,2,1,1,2,2,0,0,1
0,0,1,2,2,1,1,0,0,2
1,0,1,1,1,2,1,0,2,1
I need to write a Python script in which I can import the matrix.
My current python script is:
f = open ( 'input.txt' , 'r')
l = []
l = [ line.split() for line in f]
print l
the output list comes like this
[['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'],
['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'], ['0,0,0,0,0,0,0,0,0,0'],
['0,0,2,1,0,2,0,0,0,0'], ['0,0,2,1,1,2,2,0,0,1'], ['0,0,1,2,2,1,1,0,0,2'],
['1,0,1,1,1,2,1,0,2,1']]
I need to fetch the values in int form. If I try to type cast, it throws errors.
Consider
with open('input.txt', 'r') as f:
    l = [[int(num) for num in line.split(',')] for line in f]
print(l)
produces
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 2, 1, 0, 2, 0, 0, 0, 0], [0, 0, 2, 1, 1, 2, 2, 0, 0, 1], [0, 0, 1, 2, 2, 1, 1, 0, 0, 2], [1, 0, 1, 1, 1, 2, 1, 0, 2, 1]]
Note that you have to split on commas.
If you do have blank lines then change
l = [[int(num) for num in line.split(',')] for line in f ]
to
l = [[int(num) for num in line.split(',')] for line in f if line.strip() != "" ]
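The blank-line filter can be tried without touching the disk by using an in-memory file. In this sketch io.StringIO stands in for open('input.txt'), and the three-line content is assumed example data.

```python
import io

# Stand-in for open('input.txt'): an in-memory file with a blank line in the middle.
f = io.StringIO("0,0,2\n\n1,0,1\n")

# int() tolerates the trailing newline on the last cell of each line.
matrix = [[int(num) for num in line.split(',')]
          for line in f if line.strip()]

print(matrix)  # [[0, 0, 2], [1, 0, 1]]
```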
You can simply use numpy.loadtxt.
It is easy to use, and you can also specify the delimiter, data types, etc.
Specifically, all you need to do is this:
import numpy as np
input = np.loadtxt("input.txt", dtype='i', delimiter=',')
print(input)
And the output would be:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 2 1 0 2 0 0 0 0]
[0 0 2 1 1 2 2 0 0 1]
[0 0 1 2 2 1 1 0 0 2]
[1 0 1 1 1 2 1 0 2 1]]
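One detail that may matter in practice: for a file with a single row, loadtxt returns a 1-D array unless ndmin=2 is passed. A small sketch using in-memory text (assumed example data) in place of input.txt:

```python
import io
import numpy as np

# In-memory stand-ins for files on disk (assumed example data).
two_rows = io.StringIO("0,0,2\n1,0,1\n")
one_row = io.StringIO("1,2,3\n")

arr = np.loadtxt(two_rows, dtype=int, delimiter=',')
flat = np.loadtxt(io.StringIO("1,2,3\n"), dtype=int, delimiter=',')
kept_2d = np.loadtxt(one_row, dtype=int, delimiter=',', ndmin=2)

print(arr.shape, flat.shape, kept_2d.shape)
```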
You can do this:
a=[]
with open('input.txt','r') as fin:
    for line in fin:
        a.append( [ int (x) for x in line.split(',') ] )
The following does what you want:
l = []
with open('input.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            l.append(list(map(int, line.split(','))))
print(l)
You should not write your own CSV parser. Consider the csv module when reading such files, and use the with statement so the file is closed after reading:
import csv
with open('input.txt', newline='') as f:
    data = [list(map(int, row)) for row in csv.reader(f)]
Check out this one-liner for reading a matrix:
matrix = [[input() for x in range(3)] for y in range(3)]
This code reads a 3*3 matrix, one entry per line of input. The entries are read as strings.
import numpy as np
f = open ( 'input.txt' , 'r')
l = []
l = np.array([ line.split() for line in f])
print (l)
type(l)
output:
[['0'] ['0'] ['0'] ['0,0,0,0,0,0,0,0,0,0,0']
['0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']
['0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0']]
numpy.ndarray
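The following code converts the above input to matrix form:
with open('input.txt', 'r') as f:
    l = [line.strip().split(',') for line in f] # split on commas so each cell can be converted
l = np.array(l)
l = l.astype(int) # np.int was removed in NumPy 1.24; the builtin int works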
