I have an image processing task in which we are prohibited from using NumPy, so we need to code everything from scratch. I've done the log image transformation, but now I'm stuck on creating an array without NumPy.
So here's my last output code :
Output :
new_log =
[[236,
232,
226,
.
.
.
198,
204]]
I need to convert this to an array so I can write the image like this (with Numpy)
new_log =
array([[236, 232, 226, ..., 208, 209, 212],
[202, 197, 187, ..., 198, 200, 203],
[192, 188, 180, ..., 205, 206, 207],
...,
[233, 226, 227, ..., 172, 189, 199],
[235, 233, 228, ..., 175, 182, 192],
[235, 232, 228, ..., 195, 198, 204]], dtype=uint8)
cv.imwrite('log_transformed.jpg', new_log)
# new_log must be shaped like the second output
You can make a straightforward function to take your list and reshape it in a similar way to NumPy's np.reshape(). But it's not going to be fast, and it doesn't know anything about data types (NumPy's dtype) so... my advice is to challenge whoever it is that doesn't like NumPy. Especially if you're using OpenCV — it depends on NumPy!
Here's an example of what you could do in pure Python:
def reshape(l, shape):
    """Reshape a list.

    Example
    -------
    >>> l = [1,2,3,4,5,6,7,8,9]
    >>> reshape(l, shape=(3, -1))
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    """
    nrows, ncols = shape
    # A value of -1 means "infer this dimension from the list length".
    if ncols == -1:
        ncols = len(l) // nrows
    if nrows == -1:
        nrows = len(l) // ncols
    array = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            row.append(l[ncols*r + c])
        array.append(row)
    return array
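The answer above notes that a plain-list reshape knows nothing about data types. A minimal sketch of how one might guard against that in pure Python follows; the helper names reshape_checked and clamp_uint8 are hypothetical, not part of the original answer. It checks that the requested shape matches the list length and clamps values to the 0..255 range that an 8-bit image expects.

```python
def reshape_checked(flat, nrows, ncols):
    """Split a flat list into nrows rows of ncols items, validating the size."""
    if nrows * ncols != len(flat):
        raise ValueError("cannot reshape %d items into %dx%d" % (len(flat), nrows, ncols))
    # Slice one row at a time instead of appending item by item.
    return [flat[r * ncols:(r + 1) * ncols] for r in range(nrows)]


def clamp_uint8(rows):
    """Truncate each value to an int and clamp it to 0..255 (hypothetical helper)."""
    return [[max(0, min(255, int(v))) for v in row] for row in rows]


pixels = reshape_checked([1, 2, 3, 4, 5, 6], nrows=2, ncols=3)
safe = clamp_uint8([[300, -5, 12.7]])
```

This still produces nested lists, not an array, so anything that expects a NumPy array (cv.imwrite included) would need a conversion step that is outside the scope of this sketch.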
I have a problem with a convolution kernel in Python. It is a simple convolution operation: I have an input matrix and an output matrix, and I want to find a possible convolution kernel of size 5x5. How can I solve this with Python, NumPy, or TensorFlow?
import numpy as np
import scipy.signal as ss
input_img = np.array([[94, 166, 76, 106, 152, 232],
                      [48, 242, 30, 98, 46, 210],
                      [52, 60, 86, 60, 216, 248],
                      [52, 236, 116, 240, 224, 184],
                      [138, 160, 146, 254, 236, 252],
                      [94, 100, 224, 246, 152, 74]], dtype=float)
output_img = np.array([[15, 49, 23, 105, 0, 0],
                       [43, 30, 108, 124, 0, 0],
                       [58, 120, 112, 92, 0, 0],
                       [73, 127, 118, 126, 0, 0],
                       [112, 123, 76, 37, 0, 0],
                       [0, 0, 0, 0, 0, 0]], dtype=float)
# I want to find this kernel
conv = np.zeros((5,5), dtype=int)
# So that applying the convolution operator gives the output_img values defined above
output_img = ss.convolve2d(input_img, conv, mode='same')
As far as I understand, you need to reconstruct the window weights from given input and output arrays and a window size. I think this is possible, especially if the input array (image) is sufficiently large.
Look at the code below:
import scipy.signal as ss
import numpy as np
source_dataset = np.random.rand(20, 10)
sample_convolution = np.diag([1, 1, 1])
output_dataset = ss.convolve2d(source_dataset, sample_convolution, mode='same')
conv_size = sample_convolution.shape[0]
# Given output_dataset, source_dataset, and conv_size we need to reconstruct
# window weights.
def reconstruct(data, output, csize):
    half_size = int(csize / 2)
    # Only visit pixels whose full window lies inside the image.
    min_row_ind = half_size
    max_row_ind = int(data.shape[0]) - half_size
    min_col_ind = half_size
    max_col_ind = int(data.shape[1]) - half_size
    A = list()
    b = list()
    for i in np.arange(min_row_ind, max_row_ind, dtype=int):
        for j in np.arange(min_col_ind, max_col_ind, dtype=int):
            # One equation per pixel: flattened window . weights = output value
            A.append(data[(i - half_size):(i + half_size + 1), (j - half_size):(j + half_size + 1)].ravel().tolist())
            b.append(output[i, j])
            # Solve as soon as there are csize*csize independent equations.
            if len(A) == csize * csize and np.linalg.matrix_rank(A) == csize * csize:
                return (np.linalg.pinv(A) @ np.array(b)[:, np.newaxis]).reshape(csize, csize)
    if len(A) < csize*csize:
        raise Exception("Insufficient data")
result = reconstruct(source_dataset, output_dataset, 3)
I got the following result
array([[ 1.00000000e+00, -1.77635684e-15, -1.11022302e-16],
[ 0.00000000e+00, 1.00000000e+00, -8.88178420e-16],
[ 0.00000000e+00, -1.22124533e-15, 1.00000000e+00]])
So it works as expected, but it definitely needs to be improved to take into account edge effects, the case when the window size is even, etc.
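One possible refinement, sketched here under stated assumptions rather than taken from the answer: use every interior window and solve with np.linalg.lstsq instead of stopping at the first csize*csize rows. The function name reconstruct_lstsq is hypothetical. Note that scipy.signal.convolve2d flips the kernel, so the fitted window weights are rotated by 180 degrees to get the kernel back; the symmetric np.diag([1, 1, 1]) example above hides this, which is why the check below uses an asymmetric kernel.

```python
import numpy as np
import scipy.signal as ss


def reconstruct_lstsq(data, output, csize):
    """Least-squares estimate of an odd-sized kernel from interior pixels only."""
    if csize % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    h = csize // 2
    rows, cols = data.shape
    A, b = [], []
    # Skip the border so that edge padding never enters the equations.
    for i in range(h, rows - h):
        for j in range(h, cols - h):
            A.append(data[i - h:i + h + 1, j - h:j + h + 1].ravel())
            b.append(output[i, j])
    if len(A) < csize * csize:
        raise ValueError("insufficient data")
    weights, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    # convolve2d applies the kernel flipped, so rotate the weights by 180 degrees.
    return weights.reshape(csize, csize)[::-1, ::-1]


rng = np.random.default_rng(0)
data = rng.random((20, 10))
kernel = np.arange(9, dtype=float).reshape(3, 3)  # asymmetric test kernel
out = ss.convolve2d(data, kernel, mode='same')
recovered = reconstruct_lstsq(data, out, 3)
```

With real data, a large least-squares residual would suggest that no single linear kernel of that size explains the output.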
I am trying to use Python to compute a local pixel color average, but my output is nothing like that.
Image:
Output:
Code:
image = cv2.imread('perspective.jpeg')
for i in range(image.shape[1]):
    for j in range(image.shape[0]):
        up = image[min(j + 1, image.shape[0]-1), i]
        down = image[max(j - 1, 0), i]
        right = image[j, min(i + 1, image.shape[1]-1)]
        left = image[j, max(i - 1, 0)]
        average = (up + down + left + right + image[j, i]) / 5
        image[j, i] = average
The issue you are observing is due to integer overflow while computing the average. The pixels are of type np.uint8, and adding them together produces a result that is also np.uint8, which is not large enough to hold the sum, so it wraps around.
The solution is to cast the pixels to a larger data type before adding them, then cast the final value back to np.uint8 before storing it in the result image.
In fact, casting only one of the values (say up) to a larger data type suffices, as the rest are automatically promoted during the addition.
The corrected code may look like this:
image = cv2.imread('perspective.jpeg')
for i in range(image.shape[1]):
    for j in range(image.shape[0]):
        up = np.float32(image[min(j + 1, image.shape[0]-1), i])
        down = image[max(j - 1, 0), i]
        right = image[j, min(i + 1, image.shape[1]-1)]
        left = image[j, max(i - 1, 0)]
        average = (up + down + left + right + image[j, i]) / 5
        image[j, i] = np.uint8(average)
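The wrap-around described above can be seen in isolation with two small uint8 arrays. This is only an illustration with made-up values, not part of the original answer.

```python
import numpy as np

a = np.array([200, 10], dtype=np.uint8)
b = np.array([100, 20], dtype=np.uint8)

# Both operands are uint8, so the sum stays uint8 and 300 wraps around to 44.
wrapped = a + b

# Casting one operand to float32 promotes the whole sum to float32.
promoted = a.astype(np.float32) + b

# Average in float, then cast back to uint8 for storage.
restored = (promoted / 2).astype(np.uint8)
```

Here wrapped holds [44, 30] while promoted holds [300.0, 30.0], which is why casting a single operand is enough.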
You can easily do this with filter2D as shown in the example below. It will work on any number of channels.
im = np.random.randint(0, 256, (5, 5), np.uint8)
kernel = np.array([[0, 1./5, 0], [1./5, 1./5, 1./5], [0, 1./5, 0]])
filt = cv2.filter2D(im, cv2.CV_8U, kernel)
For example:
im
array([[ 14, 127, 221, 74, 2],
[132, 251, 88, 19, 215],
[183, 140, 17, 60, 76],
[208, 144, 182, 11, 64],
[183, 89, 217, 131, 23]], dtype=uint8)
filt
array([[106, 173, 120, 67, 116],
[166, 148, 119, 91, 66],
[161, 147, 97, 37, 95],
[172, 153, 114, 90, 37],
[155, 155, 160, 79, 83]], dtype=uint8)
You can choose the border type, I've used the default.
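If OpenCV's filter is not an option, the same five-point average can be sketched with NumPy slicing alone. This is a swapped-in technique, not what the answer above does: it pads with np.pad in "edge" mode, which mirrors the clamped indices in the question's loop rather than filter2D's default border, and it truncates instead of rounding. The function name cross_average is hypothetical.

```python
import numpy as np


def cross_average(image):
    """Average each pixel with its four direct neighbours, without in-place updates."""
    img = image.astype(np.float32)  # avoid uint8 wrap-around
    # Pad only the two spatial axes; leave any channel axis untouched.
    pad_width = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    p = np.pad(img, pad_width, mode="edge")
    total = (p[1:-1, 1:-1]   # centre
             + p[:-2, 1:-1]  # neighbour above
             + p[2:, 1:-1]   # neighbour below
             + p[1:-1, :-2]  # neighbour to the left
             + p[1:-1, 2:])  # neighbour to the right
    return (total / 5).astype(np.uint8)


tiny = np.array([[0, 250]], dtype=np.uint8)
averaged = cross_average(tiny)
```

Because the result is written to a new array, each output pixel is computed from the original neighbours, unlike the loop versions that overwrite the image as they go.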
I am working on a NetworkX .MultiDiGraph() object built from a total of 82927 directed email records. At the current stage, I am trying to get the largest strongly connected component of the .MultiDiGraph() object and its corresponding subgraph.
The text data can be accessed here.
Here's my working code:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
email_df = pd.read_csv('email_network.txt', delimiter = '->')
edge_groups = email_df.groupby(["#Sender", "Recipient"], as_index=False).count().rename(columns={"time":"weight"})
email = nx.from_pandas_dataframe(edge_groups, '#Sender', 'Recipient', edge_attr = 'weight')
G = nx.MultiDiGraph()
G.add_edges_from(email.edges(data=True))
# G is a .MultiDiGraph object
# using .strongly_connected_components() to get the part of G that has the most nodes
# using list comprehension
number_of_nodes = [len(n) for n in sorted(nx.strongly_connected_components(G))]
number_of_nodes
# 'number_of_nodes' return a list of [1, 1, 1,...,1] of length 167 (which is the exact number of nodes in the network)
# using the recommended method in networkx documentation
largest = max(nx.strongly_connected_components(G), key=len)
largest
# 'largest' returns {92}, not sure what this means...
As I noted in the code block above, the list comprehension returns a list of [1, 1, 1,..., 1] of length 167 (which is the total number of nodes in my data), while max(nx.strongly_connected_components(G), key=len) returns {92}. I am not sure what this means.
It looks like there's something wrong with my code and I might have missed several key steps in processing the data. Could anyone take a look and enlighten me on this?
Thank you.
Note: Revised code (kudos to Eric and Joel)
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
email_df = pd.read_csv('email_network.txt', delimiter = ' ')
edge_groups = email_df.groupby(["#Sender", "Recipient"], as_index=False).count().rename(columns={"time":"weight"})
# per @Joel's comment, adding 'create_using = nx.DiGraph()'
email = nx.from_pandas_dataframe(edge_groups, '#Sender', 'Recipient', edge_attr = 'weight', create_using = nx.DiGraph())
# adding this 'directed' edge list to .MultiDiGraph() object
G = nx.MultiDiGraph()
G.add_edges_from(email.edges(data=True))
We now examine the largest strongly connected component (in terms of the number of nodes) in this network.
In [1]: largest = max(nx.strongly_connected_components(G), key=len)
In [2]: len(largest)
Out [2]: 126
The largest strongly connected component consists of 126 nodes.
[Updates]
Upon further trial and error, I found that one needs to use create_using = .MultiDiGraph() (instead of .DiGraph()) when loading data into networkx. Otherwise, even if you get the correct number of nodes for your MultiDiGraph and its weakly/strongly connected subgraphs, you might still get the number of edges wrong! This will be reflected in your .strongly_connected_subgraphs() outputs.
For my case here, I will recommend others to use this one-liner
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
G = nx.read_edgelist(path="email_network.txt", data=[('time', int)], create_using=nx.MultiDiGraph(), nodetype=str)
We can then apply .strongly_connected_components(G) and strongly_connected_subgraphs to verify.
If you use the networkx output G from the first code block, max(nx.strongly_connected_components(G), key=len) will give an output with 126 nodes and 52xx something edges, but if you apply the one-liner I listed above, you will get:
In [1]: largest = max(nx.strongly_connected_components(G), key=len)
In [2]: G_sc = max(nx.strongly_connected_subgraphs(G), key=len)
In [3]: nx.number_of_nodes(G_sc)
Out [3]: 126
In [4]: nx.number_of_edges(G_sc)
Out [4]: 82130
You will get the same number of nodes with both methods but different number of edges owing to different counting mechanisms associated with different networkx graph classes.
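The difference in edge counts can be reproduced on a toy edge list. The three edges below are made up for illustration: a DiGraph keeps at most one edge per ordered pair of nodes, while a MultiDiGraph keeps every parallel edge.

```python
import networkx as nx

# Hypothetical data: node 1 emails node 2 twice, node 2 replies once.
edges = [(1, 2), (1, 2), (2, 1)]

simple = nx.DiGraph()
simple.add_edges_from(edges)   # the repeated (1, 2) collapses into one edge

multi = nx.MultiDiGraph()
multi.add_edges_from(edges)    # every email is kept as its own edge

simple_count = simple.number_of_edges()
multi_count = multi.number_of_edges()
```

Both graphs have the same two nodes, but simple_count is 2 and multi_count is 3.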
The underlying cause of your error is that nx.from_pandas_dataframe defaults to creating an undirected graph. So email is an undirected graph. When you then create the directed graph, each edge appears in only one direction.
To fix it, use nx.from_pandas_dataframe with the argument create_using = nx.DiGraph().
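A small sketch of the effect, using a made-up three-edge list and building the graphs directly instead of going through pandas: routing the edges through an undirected graph keeps only one orientation of each pair, so the reply edge disappears and no component has more than one node.

```python
import networkx as nx

# Hypothetical data: 1 and 2 email each other, 2 also emails 3.
edges = [(1, 2), (2, 1), (2, 3)]

# Going through an undirected graph merges (1, 2) and (2, 1) into one edge.
undirected = nx.Graph(edges)
G_bad = nx.MultiDiGraph()
G_bad.add_edges_from(undirected.edges())

# Going through a directed graph preserves both directions.
directed = nx.DiGraph(edges)
G_good = nx.MultiDiGraph()
G_good.add_edges_from(directed.edges())

largest_bad = max(nx.strongly_connected_components(G_bad), key=len)
largest_good = max(nx.strongly_connected_components(G_good), key=len)
```

Here largest_bad is a single-node set, while largest_good is {1, 2}.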
older comments related to the output you were getting
All your strongly connected components have a single node.
When you do max(nx.strongly_connected_components(G), key=len) it finds the set of nodes which has the longest length and returns it. In your case, they all have length 1, so it returns one of them (I believe whichever networkx happened to put into nx.strongly_connected_components(G) first). But it's returning the set, not the length. So {92} is the set of nodes it is returning.
It happens that {92} was chosen to be the "longest" length 1 component in nx.strongly_connected_components(G) by the tiebreaker.
Example:
max([{1}, {3}, {5}], key = len)
> {1}
[1, 1, 1,...,1] of length 167 (which is the exact number of nodes in the network)
This means that there's basically no strongly connected component in your graph (except for lone vertices, that is).
If you sort those components by length, you get a random component of one single vertex, since the components all have the same length (1). In your example that is {92}, which could have been any other vertex.
If the import is correct and there's really no strongly connected component, it means that nobody ever replied to any email.
To check if the problem doesn't come from pandas, MultiDiGraph or your import, I wrote:
G = nx.DiGraph()
with open('email_network.txt') as f:
    for line in f:
        n1, n2, time = line.split()
        if n1.isdigit():
            G.add_edge(int(n1),int(n2))
It didn't change the result.
Just adding an edge with G.add_edge(2,1) creates a large strongly connected component, though:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 126, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 115, 117, 118, 119, 120, 121, 122, 123, 124, 128, 129, 134, 149, 151}
I'm trying to draw a bar plot with vertical axis labels and an axis title.
The script below makes the graph but it cuts off the x-axis label/title. Even if I try to make the picture bigger on my screen it still is cut off a bit. Also when I run this, I have to run it twice. The first time I get error about the fontdict property, but the next time it works.
Anyone know how to not make it cut that off? Also I am just saving the one that pops up on the screen as the saving is not working for some reason.
Thanks!
import numpy
import matplotlib
import matplotlib.pylab as pylab
import matplotlib.pyplot
import pdb
from collections import Counter
phenos = [128, 20, 0, 144, 4, 16, 160, 136, 192, 128, 20, 0, 4, 16, 144, 130, 136, 132, 22,
128, 160, 4, 0, 36, 132, 136, 130, 128, 22, 4, 0, 144, 160, 130, 132,
128, 4, 0, 136, 132, 68, 130, 192, 8, 128, 4, 0, 20, 22, 132, 144, 192, 130, 2,
128, 4, 0, 132, 20, 136, 144, 192, 64, 130, 128, 4, 0, 144, 132, 192, 20, 16, 136,
128, 4, 0, 130, 160, 132, 192, 2, 128, 4, 0, 132, 68, 160, 192, 36, 64,
128, 4, 0, 136, 192, 8, 160, 36, 128, 4, 0, 22, 20, 144, 132, 160,
128, 4, 0, 132, 20, 192, 144, 160, 68, 64, 128, 4, 0, 132, 160, 144, 136, 192, 68, 20]
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
from operator import itemgetter
c = Counter(phenos).items()
c.sort(key=itemgetter(1))
font = {'family' : 'sanserif',
'color' : 'black',
'weight' : 'normal',
'size' : 22,
}
font2 = {'family' : 'sansserif',
'color' : 'black',
'weight' : 'normal',
'size' : 18,
}
labels, values = zip(*c)
labels = ("GU", "IT", "AA", "SG", "A, IGI", "A, SG", "GU, A, AA", "D, GU", "D, IT", "A, AA", "D, IGI", "D, AA", "192", "D, A", "D, H", "H", "A")
pylab.show()
pylab.draw()
indexes = np.arange(0, 2*len(labels), 2)
width = 2
plt.bar(indexes, values, width=2, color="blueviolet")
plt.xlabel("Phenotype identifier", fontdict=font)
plt.ylabel("Number of occurances in top 10 \n phenotypes for cancerous tumours", fontdict=font)
#plt.title("Number of occurances for different phenotypes \n in top 10 subclones of a tumour", fontdict=font2)
plt.xticks(indexes + width * 0.5, labels, rotation='vertical', fontdict=font2)
plt.figure(figsize=(8.0, 7.0))
pictureFileName2 = "..\\Stats\\" + "Phenos2.png"
pylab.savefig(pictureFileName2, dpi=800)
#fig.set_size_inches(18.5,10.5)
#plt.savefig('test2png.png',dpi=100)
Three problems:
1, It is not true that the first time you run the code it doesn't work and the second time it does. The reason is that you call .show() before making the plot. The first time you run the code, it stops where the exception message indicates. The second time, .show() gets executed first and the partially made plot from the previous run shows up.
2, fontdict=font2 etc. is not necessary and in fact wrong. You just need **font2 etc.
3, The truncated tick labels. There are many ways to fix this, but the basic idea is to increase the white space around the plot. Some alternatives are:
plt.gcf().subplots_adjust(bottom=0.35, top=0.7) #adjusting the plotting area
plt.tight_layout() #may raise an exception, depends on which backend is in use
plt.savefig('test.png', bbox_inches='tight', pad_inches = 0.0) #use bbox and pad, if you only want to change the saved figure.
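Putting the three points together, a minimal sketch might look like the following. The counts are placeholders, the output path is a temporary file, and the Agg backend is chosen only so the script runs without a display; none of these come from the original post. The figure is created before anything is drawn, the tick-label font is passed as **font, and bbox_inches='tight' keeps the vertical labels in the saved image.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

labels = ["GU", "IT", "A, IGI"]  # a few labels from the question
values = [3, 5, 8]               # placeholder counts, not real data
font = {"family": "sans-serif", "color": "black", "weight": "normal", "size": 18}

# Create the figure first, then draw on it.
fig, ax = plt.subplots(figsize=(8.0, 7.0))
positions = list(range(len(labels)))
ax.bar(positions, values, color="blueviolet")
ax.set_xlabel("Phenotype identifier", fontdict=font)
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation="vertical", **font)  # unpack the dict as text properties

# bbox_inches='tight' grows the saved area to include the rotated labels.
out_path = os.path.join(tempfile.mkdtemp(), "phenos_sketch.png")
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)
```

Calling plt.show(), if wanted at all, would go after the plotting calls rather than before them.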