outputting large matrix in python from a dictionary - python

I have a python dictionary formatted in the following way:
data[author1][author2] = 1
This dictionary contains an entry for every possible author pair (all pairs of 8500 authors), and I need to output a matrix that looks like this for all author pairs:
"auth1" "auth2" "auth3" "auth4" ...
"auth1" 0 1 0 3
"auth2" 1 0 2 0
"auth3" 0 2 0 1
"auth4" 3 0 1 0
...
I have tried the following method:
x = numpy.array([[data[author1][author2] for author2 in sorted(data[author1])] for author1 in sorted(data)])
print x
outf.write(x)
However, printing this leaves me with this:
[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
...,
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]
and the output file is just a blank text file. I am trying to format the output in a way to read into Gephi (https://gephi.org/users/supported-graph-formats/csv-format/)

You almost got it right; your list comprehension is inverted. This will give you the expected result:
import numpy as np
d = dict(auth1=dict(auth1=0, auth2=1, auth3=0, auth4=3),
         auth2=dict(auth1=1, auth2=0, auth3=2, auth4=0),
         auth3=dict(auth1=0, auth2=2, auth3=0, auth4=1),
         auth4=dict(auth1=3, auth2=0, auth3=1, auth4=0))
authors = sorted(d)  # the inner dicts share the same keys as the outer one
np.array([[d[i][j] for i in authors] for j in authors])
#array([[0, 1, 0, 3],
#       [1, 0, 2, 0],
#       [0, 2, 0, 1],
#       [3, 0, 1, 0]])

You could use pandas. Using @Saullo Castro's input:
import pandas as pd
df = pd.DataFrame.from_dict(d)
Result:
>>> df
auth1 auth2 auth3 auth4
auth1 0 1 0 3
auth2 1 0 2 0
auth3 0 2 0 1
auth4 3 0 1 0
And if you want to save the result, you can just call df.to_csv(file_name).
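Neither `print x` nor `outf.write(x)` in the question produces the labelled text layout that was asked for. A minimal sketch of writing the full matrix with the standard `csv` module is shown here. The helper name `write_matrix` is hypothetical, and the semicolon-separated layout with an empty top-left cell is an assumption about what Gephi's matrix CSV import accepts, so it should be checked against the Gephi page linked in the question.

```python
import csv
import io

def write_matrix(data, stream, delimiter=";"):
    """Write a dict-of-dicts as a labelled matrix (hypothetical helper)."""
    authors = sorted(data)
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    # Header row: an empty corner cell followed by the column labels.
    writer.writerow([""] + authors)
    for row_author in authors:
        # One row per author: the label, then one value per column.
        writer.writerow([row_author] + [data[row_author][col] for col in authors])

d = dict(auth1=dict(auth1=0, auth2=1, auth3=0, auth4=3),
         auth2=dict(auth1=1, auth2=0, auth3=2, auth4=0),
         auth3=dict(auth1=0, auth2=2, auth3=0, auth4=1),
         auth4=dict(auth1=3, auth2=0, auth3=1, auth4=0))

buffer = io.StringIO()  # stands in for an open text file
write_matrix(d, buffer)
print(buffer.getvalue())
```

With a real file, passing `open(file_name, "w", newline="")` instead of the `StringIO` buffer should work the same way. Writing row by row also avoids building the whole 8500 x 8500 array in memory first.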

Related

How do you convert a matrix into a string? [duplicate]

Let's say I have this matrix: m = [[0 for i in range(5)] for i in range(5)],
which, when printed, outputs this:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]
How do I make it so that it outputs something like this:
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
You can simply unpack the list:
m = [[0 for i in range(5)] for i in range(5)]
for i in m:
    print(*i)
Output:
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
You can use the code below, where m is your matrix; the string in front of .join() controls the spacing:
for line in m:
    print(' '.join(map(str, line)))
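Since the question title asks for a string rather than printed output, the same join idea can be applied twice to build one string. This is a small sketch; the function name `matrix_to_string` and its `sep` parameter are made up for illustration.

```python
def matrix_to_string(matrix, sep=" "):
    """Join the values of each row with `sep`, then join the rows with newlines."""
    return "\n".join(sep.join(str(value) for value in row) for row in matrix)

m = [[0 for _ in range(5)] for _ in range(5)]
text = matrix_to_string(m)
print(text)
```

The result is an ordinary `str`, so it can be written to a file or embedded in a larger message instead of being printed.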

one hot encode with pandas get_dummies missing values

I have a dataset in the form of a DataFrame, and each row has a label ranging from 1 to 5. I am one-hot encoding it with pd.get_dummies(). If my dataset has all 5 labels there is no problem. However, not all sets contain all 5 labels, so the encoding just skips the missing value, which creates a problem for new datasets coming in. Can I set a range so that the one-hot encoding knows there should be 5 labels? Or would I have to append 1,2,3,4,5 to the end of the array before I perform the encoding and then delete the last 5 entries?
Correct encode: values 1-5 are encoded
arr = np.array([1,2,5,3,1,5,1,4])
df = pd.DataFrame(arr, columns = ['test'])
hotarr = np.array(pd.get_dummies(df['test']))
>>>[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]
[1 0 0 0 0]
[0 0 0 1 0]]
Missing value encode: this dataset is missing label 4.
arr = np.array([1,2,5,3,1,5,1,])
df = pd.DataFrame(arr, columns = ['test'])
hotarr = np.array(pd.get_dummies(df['test']))
>>>[[1 0 0 0]
[0 1 0 0]
[0 0 0 1]
[0 0 1 0]
[1 0 0 0]
[0 0 0 1]
[1 0 0 0]]
Set up the CategoricalDtype before encoding to ensure all categories are represented when getting dummies:
import numpy as np
import pandas as pd
arr = np.array([1, 2, 5, 3, 1, 5, 1])
df = pd.DataFrame(arr, columns=['test'])
# Setup Categorical Dtype
df['test'] = df['test'].astype(pd.CategoricalDtype(categories=[1, 2, 3, 4, 5]))
hotarr = np.array(pd.get_dummies(df['test']))
print(hotarr)
Alternatively, you can reindex after get_dummies with fill_value=0 to add the missing columns:
hotarr = np.array(pd.get_dummies(df['test'])
                  .reindex(columns=[1, 2, 3, 4, 5], fill_value=0))
Both produce hotarr with 5 columns even though input does not contain 4:
[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]
[1 0 0 0 0]]
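Depending on the pandas version, `get_dummies` may return boolean columns, in which case the array prints as True/False instead of the 1/0 shown above. The sketch below reuses the CategoricalDtype approach and passes `dtype=int` to request integers explicitly. The variable names `labels` and `categorical` are only illustrative.

```python
import numpy as np
import pandas as pd

labels = [1, 2, 3, 4, 5]                # the full label range, fixed up front
arr = np.array([1, 2, 5, 3, 1, 5, 1])   # label 4 is absent from this sample

# Declaring the categories makes get_dummies emit one column per label,
# whether or not that label occurs in the data.
categorical = pd.Series(arr).astype(pd.CategoricalDtype(categories=labels))
hotarr = pd.get_dummies(categorical, dtype=int).to_numpy()

print(hotarr.shape)
print(hotarr)
```

Keeping the label list in one place means the same `labels` can be reused for every incoming dataset, so the encoded width stays constant.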

matlab's bwmorph(image, 'spur') in python

I'm porting a MATLAB image processing script over to python/skimage and haven't been able to find an equivalent of MATLAB's bwmorph function in skimage, specifically the 'spur' operation. The MATLAB docs say this about the spur operation:
Removes spur pixels. For example:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 1 0 becomes 0 0 0 0
0 1 0 0 0 1 0 0
1 1 0 0 1 1 0 0
I've implemented a version in python that handles the above case fine:
import numpy as np
from scipy import ndimage
def _neighbors_conv(image):
    # count the 8-connected neighbours of every foreground pixel
    image = image.astype(int)
    k = np.array([[1,1,1],[1,0,1],[1,1,1]])
    neighborhood_count = ndimage.convolve(image, k, mode='constant', cval=1)
    neighborhood_count[~image.astype(bool)] = 0
    return neighborhood_count
def spur(image):
    return _neighbors_conv(image) > 1
def bwmorph(image, fn, n=1):
    for _ in range(n):
        image = fn(image)
    return image
t = [[0, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 1, 0, 0]]
t = np.array(t)
print('neighbor count:')
print(_neighbors_conv(t))
print('after spur:')
print(bwmorph(t,spur).astype(int))
neighbor count:
[[0 0 0 0]
[0 0 1 0]
[0 3 0 0]
[7 5 0 0]]
after spur:
[[0 0 0 0]
[0 0 0 0]
[0 1 0 0]
[1 1 0 0]]
The above works by removing any pixels that only have a single neighboring pixel.
I have noticed that the above implementation behaves differently than matlab's spur operation though. Take this example in matlab:
0 0 0 0 0
0 0 1 0 0
0 1 1 1 1
0 0 1 0 0
0 0 0 0 0
becomes, via bwmorph(t,'spur',1):
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
The spur operation is a bit more complex than looking at the 8-neighbor count. It is not clear to me how to extend my implementation to satisfy this case without making it too aggressive (i.e. removing valid pixels).
What is the underlying logic of matlab's spur or is there a python implementation already available that I can use?
UPDATE:
I have found Octave's implementation of spur that uses a LUT:
case('spur')
## lut=makelut(inline("xor(x(2,2),(sum((x&[0,1,0;1,0,1;0,1,0])(:))==0)&&(sum((x&[1,0,1;0,0,0;1,0,1])(:))==1)&&x(2,2))","x"),3);
## which is the same as
lut=repmat([zeros(16,1);ones(16,1)],16,1); ## identity
lut([18,21,81,273])=0; ## 4 qualifying patterns
lut=logical(lut);
cmd="BW2=applylut(BW, lut);";
(via https://searchcode.com/codesearch/view/9585333/)
Assuming that is correct I just need to be able to create this LUT in python and apply it...
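One way to build and apply that LUT with NumPy alone is sketched below. It assumes the 3x3 neighbourhood is encoded with the column-major bit weights 1, 2, 4, ..., 256 (centre pixel = 16), which is my reading of how makelut/applylut index patterns. The four entries being cleared are the four corner patterns, so the result should not depend on how the corners are ordered. The function name `spur_lut` is hypothetical.

```python
import numpy as np

def spur_lut(image):
    """One pass of the four-pattern LUT quoted above (hypothetical helper)."""
    img = np.asarray(image).astype(bool).astype(int)
    padded = np.pad(img, 1, mode="constant", constant_values=0)
    # Assumed bit weight of each neighbourhood position (column-major).
    weights = np.array([[1, 8, 64],
                        [2, 16, 128],
                        [4, 32, 256]])
    rows, cols = img.shape
    index = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            # Shifted view of the padded image for this neighbourhood offset.
            index += weights[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    # Identity LUT: the output equals the centre pixel (bit value 16).
    lut = (np.arange(512) & 16) > 0
    # Octave's 1-based entries 18, 21, 81, 273 become 17, 20, 80, 272 here:
    # a centre pixel whose only neighbour is a single diagonal one.
    lut[[17, 20, 80, 272]] = False
    return lut[index].astype(int)

t = np.array([[0, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])
print(spur_lut(t))
```

Under this LUT a pixel that has any horizontal or vertical neighbour is never removed, so it handles the first example from the MATLAB docs but would leave the cross-shaped second example unchanged.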
I ended up implementing my own version of spur and other bwmorph operations. For future internet travelers who have the same need, here is a handy gist of what I ended up using:
https://gist.github.com/bmabey/4dd36d9938b83742a88b6f68ac1901a6

How to find longest consecutive occurrence of non-zero elements in 2D numpy array

I am simulating protein folding on a 2D grid where every angle is either ±90° or 0°, and have the following problem:
I have an n-by-n numpy array filled with zeros, except for certain places where the value is any integer from 1 to n. Every integer appears just once. Integer k is always a nearest neighbour to k-1 and k + 1, except for the endpoints. The array is saved as an object in the class Grid which I have created for doing energy calculations and folding the protein. Example array, with n=5:
>>> from Grid import Grid
>>> a = Grid(5)
>>> a.show()
[[0 0 0 0 0]
[0 0 0 0 0]
[1 2 3 4 5]
[0 0 0 0 0]
[0 0 0 0 0]]
My goal is to find the longest consecutive line of non-zero elements without any bends. In the above case, the result should be 5.
My idea so far is something like this:
def getDiameter(self):
    indexes = np.zeros((self.n, 2))
    for i in range(1, self.n + 1):
        indexes[i - 1] = np.argwhere(self.array == i)[0]
    for i in range(self.n):
        j = 1
        currentDiameter = 1
        while indexes[0][i] == indexes[0][i + j] and i + j <= self.n:
            currentDiameter += 1
            j += 1
        while indexes[i][0] == indexes[i + j][0] and i + j <= self.n:
            currentDiameter += 1
            j += 1
        if currentDiameter > diameter:
            diameter = currentDiameter
    return diameter
This has two problems: (1) it doesn't work, and (2) it is horribly inefficient if I get it to work. I am wondering if anybody has a better way of doing this. If anything is unclear, please let me know.
Edit:
Less trivial example
[[ 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 10 0 0 0]
[ 0 0 0 0 0 0 9 0 0 0]
[ 0 0 0 0 0 0 8 0 0 0]
[ 0 0 0 4 5 6 7 0 0 0]
[ 0 0 0 3 0 0 0 0 0 0]
[ 0 0 0 2 1 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0 0 0]]
The correct answer here is 4 (both the longest column and the longest row have four non-zero elements).
What I understood from your question is that you need to find the length of the longest occurrence of consecutive elements in a numpy array (row by row).
So for this below one, the output should be 5:
[[1 2 3 4 0]
[0 0 0 0 0]
[10 11 12 13 14]
[0 1 2 3 0]
[1 0 0 0 0]]
Because [10 11 12 13 14] are consecutive elements and they form the longest run compared to the consecutive elements in any other row.
If this is what you are expecting, consider this:
import numpy as np
from itertools import groupby
a = np.array([[1, 2, 3, 4, 0],
              [0, 0, 0, 0, 0],
              [10, 11, 12, 13, 14],
              [0, 1, 2, 3, 0],
              [1, 0, 0, 0, 0]])
a = a.astype(float)
a[a == 0] = np.nan
b = np.diff(a) # Calculate the n-th discrete difference. Consecutive numbers will have a difference of 1.
counter = []
for line in b: # for each row.
    if 1 in line: # consecutive elements differ by 1.
        counter.append(max(sum(1 for _ in g) for k, g in groupby(line) if k == 1) + 1) # find the longest length of consecutive 1's for each row.
print(max(counter)) # find the max of list holding the longest length of consecutive 1's for each row.
# 5
For your particular example:
[[0 0 0 0 0]
[0 0 0 0 0]
[1 2 3 4 5]
[0 0 0 0 0]
[0 0 0 0 0]]
# 5
Start by finding the longest consecutive occurrence in a list:
def find_longest(l):
    counter = 0
    counters =[]
    for i in l:
        if i == 0:
            counters.append(counter)
            counter = 0
        else:
            counter += 1
    counters.append(counter)
    return max(counters)
Now you can apply this function to each row and each column of the array, and find the maximum:
longest_occurrences = [find_longest(row) for row in a] + [find_longest(col) for col in a.T]
longest_occurrence = max(longest_occurrences)
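Counting non-zero runs can overcount when the chain folds back next to itself, because two touching cells in a row need not be neighbours in the chain. The sketch below is a variant that only extends a run when adjacent values differ by exactly 1. It assumes that is what "without any bends" means, and the function name `longest_straight_run` is made up for illustration.

```python
import numpy as np

def longest_straight_run(grid):
    """Longest run of chain-adjacent cells along any row or column."""
    grid = np.asarray(grid)
    best = 0
    for lines in (grid, grid.T):        # rows first, then columns
        for line in lines:
            run = 0
            prev = 0
            for value in line:
                if value == 0:
                    run = 0             # empty cell ends the current run
                elif run > 0 and abs(int(value) - int(prev)) == 1:
                    run += 1            # next residue continues straight on
                else:
                    run = 1             # start a new run at this cell
                best = max(best, run)
                prev = value
    return best

example = np.zeros((10, 10), dtype=int)
example[7, 4], example[7, 3], example[6, 3] = 1, 2, 3
example[5, 3:7] = [4, 5, 6, 7]
example[4, 6], example[3, 6], example[2, 6] = 8, 9, 10
print(longest_straight_run(example))
```

For the less trivial grid in the question, the longest row segment (4 5 6 7) and the longest column segment (10 9 8 7) both have four cells. The explicit loops are slower than a vectorised version but keep the rule easy to check.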

how to program Matrix in Python [closed]

I am still a beginner in this language and badly need help, thanks.
I want to create a matrix like the following:
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
You can create a list of lists and print them as you like:
matrix = [[0] * 5 for _ in range(5)]
for i in range(5):
    matrix[i][i] = 1
    print(" ".join(str(num) for num in matrix[i]))
print(matrix)
Output
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
If you're planning to do any real work with matrices, you should strongly consider looking at NumPy.
Once you get it installed:
>>> import numpy as np
>>> matrix = np.diag([1]*5)
>>> print(matrix)
[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 1 0 0]
[0 0 0 1 0]
[0 0 0 0 1]]
So far, not too exciting. But check this out:
>>> print(matrix * 2)
[[2 0 0 0 0]
[0 2 0 0 0]
[0 0 2 0 0]
[0 0 0 2 0]
[0 0 0 0 2]]
>>> print(matrix + 1)
[[2 1 1 1 1]
[1 2 1 1 1]
[1 1 2 1 1]
[1 1 1 2 1]
[1 1 1 1 2]]
>>> print((1 + matrix) * (1 - matrix))
[[0 1 1 1 1]
[1 0 1 1 1]
[1 1 0 1 1]
[1 1 1 0 1]
[1 1 1 1 0]]
>>> print(np.arccos(matrix) / np.pi)
[[ 0. 0.5 0.5 0.5 0.5]
[ 0.5 0. 0.5 0.5 0.5]
[ 0.5 0.5 0. 0.5 0.5]
[ 0.5 0.5 0.5 0. 0.5]
[ 0.5 0.5 0.5 0.5 0. ]]
All that math, and a whole lot more, you don't have to implement yourself. And it's generally at least 10x as fast as if you did implement it yourself. All that, plus fancy indexing (like slicing by row, column, or both), and all kinds of other things you don't yet know you were going to ask for, but will.
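As a small illustration of the indexing mentioned above, here is a sketch that builds the same identity matrix with `np.eye` and slices it by row, by column, and by both. The variable names are arbitrary.

```python
import numpy as np

matrix = np.eye(5, dtype=int)   # same identity matrix as np.diag([1]*5)

second_row = matrix[1]          # a single row
third_column = matrix[:, 2]     # a single column
top_left = matrix[:2, :3]       # first two rows, first three columns

print(second_row)
print(third_column)
print(top_left)
```

Basic slices like these return views rather than copies, so assigning into `top_left` would also change `matrix`.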
My way would be:
Code:
size = 5
for i in range(size):
    for j in range(size):
        print(1 if i == j else 0, end=' ')
    print()
Output:
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
Hope this helps :)
I think this is the simplest. Please enjoy it:
def fun(N):
    return [[0]*x + [1] + [0]*(N-x-1) for x in range(N)]
print(fun(5))
The result:
[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
