How would I create a matrix of single zeroes and ones in a size I specify without numpy? I tried looking this up, but I only found results that use it. I guess it would involve loops, unless there's a simpler method?
For example, the size I specify could be 3 and the grid would be 3x3.
       Col 0  Col 1  Col 2
Row 0    0      1      0
Row 1    0      0      1
Row 2    1      1      1
You could use a list comprehension:
def m(s):
    # range works in Python 3; xrange exists only in Python 2
    return [s*[0] for _ in range(s)]
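Note that the comprehension above fills the grid with zeroes only. If the grid should hold a mix of zeroes and ones, a minimal sketch using the standard-library random module could look like this. The names zero_matrix and random_binary_matrix and the optional seed parameter are made up for illustration and do not come from the question.

```python
import random

def zero_matrix(size):
    """Return a size x size grid of zeroes, one independent list per row."""
    return [[0] * size for _ in range(size)]

def random_binary_matrix(size, seed=None):
    """Return a size x size grid whose entries are randomly 0 or 1."""
    rng = random.Random(seed)  # seed is optional, only for repeatable output
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]

grid = zero_matrix(3)
grid[0][1] = 1  # changes only row 0, because each row is its own list
print(grid)  # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(random_binary_matrix(3))
```

One thing to avoid is [[0] * size] * size: it repeats the same row object, so setting one cell would change every row.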
I have a dataframe with one column called label which has the values [0,1,2,3,4,5,6,8,9].
I would like to make dummy columns out of this, but I would like some labels to be joined together, so for example I want dummy_012 to be 1 if the observation has either label 0, 1 or 2.
If I use the command df2 = pd.get_dummies(df, columns=['label']), it would create 9 columns, one for each label.
I know I can use df2['dummy_012']=df2['dummy_0']+df2['dummy_1']+df2['dummy_2'] after that to turn it into one joint column, but I want to know if there's a more pythonic way of doing it (or some function where I can just change the parameters to control the joins).
Maybe this approach can give you an idea:
groups = ['012', '345', '6789']
for gp in groups:
    df.loc[df['Label'].isin([int(x) for x in gp]), 'Label_Group'] = f'dummies_{gp}'
Output:
Label Label_Group
0 0 dummies_012
1 1 dummies_012
2 2 dummies_012
3 3 dummies_345
4 4 dummies_345
5 5 dummies_345
6 6 dummies_6789
7 8 dummies_6789
8 9 dummies_6789
And then apply dummy:
df_dummies = pd.get_dummies(df['Label_Group'])
dummies_012 dummies_345 dummies_6789
0 1 0 0
1 1 0 0
2 1 0 0
3 0 1 0
4 0 1 0
5 0 1 0
6 0 0 1
7 0 0 1
8 0 0 1
I don't know whether this is pythonic, because a more elegant solution might exist, but it does allow you to change parameters and it's vectorized. I've read that get_dummies() can be a bit slow with large amounts of data, and vectorizing pandas is good practice in general. So I vectorized this function and had it do its calculations with numpy arrays. It should give you a boost in performance as the dataset increases in size compared to similar functions.
This function will take your dataframe and a list of numbers as strings and will return your dataframe with the column you wanted.
def get_dummy(df,column_nos):
    new_col_name = 'dummy_'+''.join([i for i in column_nos])
    vector_sum = sum([df[i].values for i in column_nos])
    df[new_col_name] = [1 if i>0 else 0 for i in vector_sum]
    return df
In case you'd rather the input to be integers rather than strings, you can tweak the above function to look like below.
def get_dummy(df,column_nos):
    column_names = ['dummy_'+str(i) for i in column_nos]
    new_col_name = 'dummy_'+''.join([str(i) for i in sorted(column_nos)])
    vector_sum = sum([df[i].values for i in column_names])
    df[new_col_name] = [1 if i>0 else 0 for i in vector_sum]
    return df
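As an alternative that skips the intermediate per-label dummy columns, here is a rough sketch that builds each joint column straight from the original label column with Series.isin. The function name add_group_dummies and the way the groups are passed (a list of lists of integers) are my own choices for illustration, and it assumes the column is called label as in the question.

```python
import pandas as pd

def add_group_dummies(df, column, groups):
    """Return a copy of df with one 0/1 column per group of label values."""
    out = df.copy()
    for group in groups:
        # e.g. [0, 1, 2] becomes the column name 'dummy_012'
        name = "dummy_" + "".join(str(v) for v in group)
        out[name] = out[column].isin(group).astype(int)
    return out

df = pd.DataFrame({"label": [0, 1, 2, 3, 4, 5, 6, 8, 9]})
result = add_group_dummies(df, "label", [[0, 1, 2], [3, 4, 5], [6, 8, 9]])
print(result)
```

Changing which labels are joined then only means changing the groups argument.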
I want to construct a 6 x 9 matrix of zeros and ones in a specific way. In row 0, columns 0 to 2 should be one; in row 1, columns 3 to 5 should be one; and in row 2, columns 6 to 8 should be one, with all other entries zero. In row 3, elements 0, 3 and 6 should be one and the others zero. In row 4, elements 1, 4 and 7 should be one and the others zero. In row 5, elements 2, 5 and 8 should be one and the rest zero. So half of the rows follow one rule for placing the ones and the other half follow a different rule. How do I extend this to, say, a 20 x 100 case, where the first 10 rows follow the first rule and the last 10 rows follow the second?
The 6x9 matrix looks as follows:
[[1,1,1,0,0,0,0,0,0],
[0,0,0,1,1,1,0,0,0],
[0,0,0,0,0,0,1,1,1],
[1,0,0,1,0,0,1,0,0],
[0,1,0,0,1,0,0,1,0],
[0,0,1,0,0,1,0,0,1]]
EDIT: Code I used to create this matrix:
import numpy as np
m=int(input("Enter the value of m, no. of points = "))
pimatrix=np.zeros((2*m +1)*(m**2)).reshape((2*m+1),(m**2))
for i in range(2*m + 1):
    for j in range(m**2):
        if((i<m) and ((j<((i+1)*m) and j>=(i*m)))):
            pimatrix[i][j]=1
        if (i>(m-1)):
            for k in range(-1,m-1,1):
                if(j == i+(k*m)):
                    pimatrix[i][j]=1
        if i==2*m:
            pimatrix[i][j]=1
print(pimatrix)
Try using the numpy.put function.
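For what it's worth, a small sketch of how numpy.put could be used here. It works on flat (row-major) indices, so element (row, col) of a 6 x 9 array sits at row * 9 + col. The index list below is written out by hand for the top three rows of the example only; it is an illustration, not a general rule.

```python
import numpy as np

a = np.zeros((6, 9), dtype=int)

# Flat index of element (row, col) in a C-ordered array is row * n_cols + col.
# Row 0, cols 0-2 -> 0, 1, 2; row 1, cols 3-5 -> 12, 13, 14; row 2, cols 6-8 -> 24, 25, 26
flat_indices = [0, 1, 2, 12, 13, 14, 24, 25, 26]
np.put(a, flat_indices, 1)  # modifies a in place

print(a[:3])
```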
The best approach depends on the rules you plan to follow, but an easy approach would be to initialise the array as an array of zeroes:
import numpy as np
a = np.zeros([3, 4], dtype = int)
You can then write the logic to loop over the appropriate rows and set 1's as needed. You can simply access any element of the array by its coordinates:
a[2,1] = 1
print(a)
Result:
[[0 0 0 0]
[0 0 0 0]
[0 1 0 0]]
Without a general rule, it's hard to say what your intended logic is exactly, but assuming these rules: the top half of the array has runs of three ones on each consecutive row, starting in the upper left and moving down a row at the end of every run, until it reaches the bottom of the top half, where it wraps around to the top; the bottom half has runs of single ones, following the same pattern.
Implementing that, with your given example:
import numpy as np
a = np.zeros([6, 9], dtype=int)
def set_ones(a, run_length, start, end):
    for n in range(a.shape[1]):
        a[start + ((n // run_length) % (end - start)), n] = 1
set_ones(a, 3, 0, a.shape[0] // 2)
set_ones(a, 1, a.shape[0] // 2, a.shape[0])
print(a)
Result:
[[1 1 1 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0]
[0 0 0 0 0 0 1 1 1]
[1 0 0 1 0 0 1 0 0]
[0 1 0 0 1 0 0 1 0]
[0 0 1 0 0 1 0 0 1]]
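If the pattern is always "m rows of runs of m ones, then m rows of a repeated identity", the same matrix can also be built without an explicit loop. The sketch below assumes exactly that rule (so the shape is 2m x m², e.g. 6 x 9 for m = 3 and 20 x 100 for m = 10); the function name block_pattern is made up for illustration.

```python
import numpy as np

def block_pattern(m):
    """Build a (2*m) x (m*m) 0/1 matrix: block runs on top, repeated identity below."""
    identity = np.eye(m, dtype=int)
    # Each 1 of the identity is stretched into a run of m ones: row i covers columns i*m .. i*m + m - 1
    top = np.kron(identity, np.ones((1, m), dtype=int))
    # The identity repeated m times side by side: row i has ones at columns i, i + m, i + 2m, ...
    bottom = np.tile(identity, (1, m))
    return np.vstack([top, bottom])

print(block_pattern(3))
print(block_pattern(10).shape)  # (20, 100)
```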
Given a numpy array of 2300 rows and 44 columns, I'd like my script to check for equal rows and to return arrays of those equal rows with the corresponding indices in the original matrix.
Example:
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0
1 2 3 4 5
Result:
equal_arrays1 = [1,2,3]
equal_arrays2 = [0,4]
My original data set consists of zero rows starting from 1323 to 1699. The result should then be:
equal_array1=[1323,...,1699]
What I did up till now is using the following code:
import numpy as np
input_data = np.load('1IN.npy')
print(np.shape(input_data))
for i in range(len(input_data)):
    for j in range(i+1,len(input_data)):
        if np.array_equal(input_data[i],input_data[j]):
            if np.array_equal(input_data[:,i],input_data[:,j]):
                print (i, j),
        else: break
but this led to the error:
if np.array_equal(input_data[:,i],input_data[:,j]) :
IndexError: index 1302 is out of bounds for axis 1 with size 44
I think that this is not the best way to go for what I want to achieve, so if anyone has a better alternative or could explain what I need to fix, I'd be glad as I'm new to python.
You want to check only rows, so remove the check on column equality:
matching_pairs = []
for i in range(len(input_data)):
    for j in range(i+1,len(input_data)):
        if np.array_equal(input_data[i],input_data[j]):
            matching_pairs.append((i, j))
            # break?
print(matching_pairs)
Not sure what the break is about? You may want to break if you found a j matching your i, but you don't want to break if you don't find it, otherwise you will only check i against i+1 and nothing more.
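If what you want in the end is groups of indices rather than pairs, one possible sketch is to use each row's contents as a dictionary key and collect the indices that share it. This makes a single pass over the rows instead of comparing every pair. The function name group_equal_rows is made up for illustration, and the small array below just reproduces the example from the question.

```python
from collections import defaultdict

import numpy as np

def group_equal_rows(data):
    """Return lists of row indices whose rows are identical (groups of size > 1 only)."""
    groups = defaultdict(list)
    for index, row in enumerate(data):
        # A tuple of the row's values is hashable, so equal rows land in the same bucket
        groups[tuple(row.tolist())].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]

example = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 2, 3, 4, 5],
])
print(group_equal_rows(example))  # [[0, 4], [1, 2, 3]]
```

Note that this relies on exact equality, so rows of floats that differ only by rounding error would not be grouped together.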
Say I have a 1D array of values corresponding to x, y, and z values like this:
x y z arr_1D
0 0 0 0
1 0 0 1
0 1 0 2
1 1 0 3
0 2 0 4
1 2 0 5
0 0 1 6
...
0 2 3 22
1 2 3 23
I want to get arr_1D into a 3D array arr_3D with shape (nx,ny,nz) (in this case (2,3,4)). I'd like the values to be referenceable using arr_3D[x_index, y_index, z_index], so that, for example, arr_3D[1,2,0]=5. Using numpy.reshape(arr_1D, (2,3,4)) gives me a 3D matrix of the right dimensions, but not ordered the way I want. I know I can use the following code, but I'm wondering if there's a way to avoid the clunky nested for loops.
import numpy as np
arr_1d = np.arange(24)
nx = 2
ny = 3
nz = 4
arr_3d = np.empty((nx,ny,nz))
count = 0
for k in range(nz):
    for j in range(ny):
        for i in range(nx):
            arr_3d[i,j,k] = arr_1d[count]
            count += 1
print(arr_3d[1,2,0])
output: 5
What would be the most pythonic and/or fast way to do this? I'll typically want to do this for arrays of length on the order of 100,000.
You were really close, but since you want the x axis to be the one that is iterated through the fastest, you need to use something like
arr_3d = arr_1d.reshape((4,3,2)).transpose()
So you create an array with the right order of elements but the dimensions in the wrong order and then you correct the order of the dimensions.
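Another way to express the same thing, as far as I can tell, is to ask reshape for Fortran (column-major) ordering, in which the first index varies fastest. A small sketch with the sizes from the question:

```python
import numpy as np

nx, ny, nz = 2, 3, 4
arr_1d = np.arange(nx * ny * nz)

# order='F' fills the array so that the first index (x) changes fastest
arr_3d = arr_1d.reshape((nx, ny, nz), order='F')

print(arr_3d.shape)      # (2, 3, 4)
print(arr_3d[1, 2, 0])   # 5
```

Both forms give the same element layout; which one reads better is mostly a matter of taste.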
What is an efficient solution to generate all the possible graphs using an incidence matrix?
The problem is equivalent to generating all possible binary triangular matrices.
My first idea was to use python with itertools. For instance, for generating all the possible 4x4 matrices:
for b in itertools.combinations_with_replacement((0,1), n-3):
    b_1=[i for i in b]
    for c in itertools.combinations_with_replacement((0,1), n-2):
        c_1=[i for i in c]
        for d in itertools.combinations_with_replacement((0,1), n-1):
            d_1=[i for i in d]
and then you create the matrix by adding the respective number of zeroes.
But this is not correct, since we skip some graphs.
So, any ideas?
Perhaps I can use the isomorphism between an R^n matrix and an R^(n*n) vector, generate all the possible vectors of 0 and 1, and then cut each one into my matrix, but I think there's a more efficient solution.
Thank you
I added the matlab tag because it's a problem you can run into in numerical analysis and matlab.
I assume you want lower triangular matrices, and that the diagonal needs not be zero. The code can be easily modified if that's not the case.
n = 4; %// matrix size
vals = dec2bin(0:2^(n*(n+1)/2)-1)-'0'; %// each row of `vals` codes a matrix
mask = tril(reshape(1:n^2, n, n))>0; %// decoding mask
for v = vals.' %'// `for` picks one column each time
    matrix = zeros(n); %// initialize to zeros
    matrix(mask) = v; %// decode into matrix
    disp(matrix) %// Do something with `matrix`
end
Each iteration gives one possible matrix. For example, the first matrices for n=4 are
matrix =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
matrix =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 1
matrix =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 0
matrix =
0 0 0 0
0 0 0 0
0 0 0 0
0 0 1 1
Here is an example solution using numpy that generates all simple graphs:
It first generates the indices of the upper triangular part, iu. The loop converts the number k to its binary representation and then assigns it to the upper triangular part G[iu].
import numpy as np
n = 4
iu = np.triu_indices(n,1) # Start at first minor diagonal
G = np.zeros([n,n])
def dec2bin(k, bitlength=0):
    return [1 if digit=='1' else 0 for digit in bin(k)[2:].zfill(bitlength)]
for k in range(0,2**(iu[0].size)):
    G[iu] = dec2bin(k, iu[0].size)
    print(G)
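Regarding the itertools attempt in the question: combinations_with_replacement((0,1), 2) only yields sorted tuples, i.e. (0,0), (0,1) and (1,1), so (1,0) is never produced, which would explain the skipped graphs. itertools.product yields every ordering. Here is a sketch along those lines, assuming strictly upper triangular matrices with a zero diagonal as in the numpy answer above; the generator name all_upper_triangular is made up for illustration.

```python
import itertools

import numpy as np

def all_upper_triangular(n):
    """Yield every n x n 0/1 matrix that is zero on and below the diagonal."""
    rows, cols = np.triu_indices(n, 1)  # positions strictly above the diagonal
    for bits in itertools.product((0, 1), repeat=rows.size):
        G = np.zeros((n, n), dtype=int)
        G[rows, cols] = bits  # one bit per position above the diagonal
        yield G

graphs = list(all_upper_triangular(4))
print(len(graphs))  # 64, i.e. 2**6 because a 4 x 4 matrix has 6 entries above the diagonal
```

The count grows as 2**(n*(n-1)/2), so for larger n it is better to consume the generator lazily than to build the full list.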