I would like to write a program that creates 100 masked plots from a set of 100 text files, i.e. for fnum in range(1, 101):
The text files are numbered xydata1.txt, xydata2.txt ... until xydata100.txt.
How is this best done in Python?
Below is my plotting program, where (file number fnum) = 1,2,3...100.
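from numpy import loadtxt, gradient, ma
import pylab
fn = 'xydata' + str(fnum) + '.txt'  # fnum is an int, so convert it before concatenating
y = loadtxt(fn, unpack=True, usecols=[0])
x = loadtxt(fn, unpack=True, usecols=[1])
n = ma.masked_where(gradient(y) < 0, y)
p = ma.masked_where(gradient(y) > 0, y)
pylab.plot(x, n, 'r', x, p, 'g')
pylab.savefig('data' + str(fnum) + '.png')
pylab.show()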
Assuming Python 2.7
from glob import glob
from pylab import *
for fname in glob("xydata*.txt"):
    x, y = loadtxt(fname, unpack=True, usecols=[1, 0])
    mask_inf = gradient(y) < 0
    mask_sup = gradient(y) >= 0
    plot(x[mask_inf], y[mask_inf], 'r')
    plot(x[mask_sup], y[mask_sup], 'g')
    legend(("grad(y) < 0", "grad(y) >= 0"))
    title(fname)
    savefig(fname.replace("xydata", "data").replace(".txt", ".svg"))
    clf()
You can also use masked arrays. But the only advantage of them is to avoid allocating new memory. If your plots are small enough, you don't need them.
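For comparison, here is a minimal sketch of the masked-array variant, assuming the same xydata1.txt ... xydata100.txt naming as in the question. The helper names split_by_gradient and numbered_files are hypothetical, not part of any library; only numpy.gradient and numpy.ma.masked_where are real calls.

```python
import numpy as np


def split_by_gradient(y):
    """Return (falling, rising) masked copies of y (hypothetical helper).

    falling keeps only the points where the gradient is negative,
    rising keeps only the points where it is zero or positive.
    """
    y = np.asarray(y, dtype=float)
    grad = np.gradient(y)
    falling = np.ma.masked_where(grad >= 0, y)  # hide the non-negative slope
    rising = np.ma.masked_where(grad < 0, y)    # hide the negative slope
    return falling, rising


def numbered_files(first=1, last=100):
    """Build the list xydata1.txt ... xydata100.txt (assumed naming)."""
    return ["xydata%d.txt" % n for n in range(first, last + 1)]


falling, rising = split_by_gradient([0, 1, 2, 1, 0])
print(falling)
print(rising)
print(numbered_files()[:3])
```

Each pair can then be handed to plot(x, falling, 'r') and plot(x, rising, 'g') inside the loop over numbered_files(). Matplotlib leaves a gap wherever a value is masked, whereas boolean indexing joins the remaining points with a straight line.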
By the way there is no "best answer".
Related
I'm new to Python and want to perform a rather simple task. I've got a two-dimensional point set, which is stored as binary data (i.e. (x, y)-coordinates) in a file, which I want to visualize. The output should look as in the picture below.
However, I'm somehow overwhelmed by the amount of google results on this topic. And many of them seem to be for three-dimensional point cloud visualization and/or a massive amount of data points. So, if anyone could point me to a suitable solution for my problem, I would be really thankful.
EDIT: The point set is contained in a file which is formatted as follows:
0.000000000000000 0.000000000000000
1.000000000000000 1.000000000000000
1
0.020375738732779 0.026169010160356
0.050815740313746 0.023209931647163
0.072530406907906 0.023975230642589
The first data vector is the one in the line below the single "1"; i.e. (0.020375738732779, 0.026169010160356). How do I read this into a vector in python? I can open the file using f = open("pointset file")
Install and import matplotlib and pyplot:
import matplotlib.pyplot as plt
Assuming this is your data:
x = [1, 2, 5, 1, 5, 7, 8, 3, 2, 6]
y = [6, 7, 1, 2, 6, 2, 1, 6, 3, 1]
If you need to, you can use a comprehension to split the coordinates into separate lists:
x = [p[0] for p in points]
y = [p[1] for p in points]
Plotting is as simple as:
plt.scatter(x=x, y=y)
Result:
Many customizations are possible.
EDIT: following question edit
In order to read the file:
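x = []
y = []
with open('pointset_file.txt', 'r') as f:
    for _ in range(3):
        next(f)  # skip the two bounds lines and the single "1" line
    for line in f:
        coords = line.split()
        if len(coords) < 2:
            continue  # ignore blank or malformed lines
        x.append(float(coords[0]))
        y.append(float(coords[1]))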
You could read your data as follows and plot it with a scatter plot. This method is meant for a small number of points in exactly the format you presented, not for CSV.
import matplotlib.pyplot as plt
with open("pointset file") as fid:
    lines = fid.read().split("\n")
# lines[:2] looks like the bounds for each axis; if so, use it in the plot
data = [[float(d) for d in line.split(" ") if d] for line in lines[3:] if line.strip()]
# data is a list of [x, y] rows, so transpose it into separate x and y sequences
xs, ys = zip(*data)
plt.scatter(xs, ys)
plt.show()
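If NumPy is available, the same file format can probably be read in one call. This is a sketch under the assumption that the header is always exactly three lines, as in the question; read_pointset is a hypothetical helper name, and the sample text simply repeats the lines shown in the question.

```python
import io

import numpy as np


def read_pointset(source, header_lines=3):
    """Read whitespace-separated x/y pairs, skipping the header lines.

    source may be a file name or an open text file object.
    """
    x, y = np.loadtxt(source, skiprows=header_lines, unpack=True)
    return x, y


# In-memory stand-in for the point set file described in the question.
sample = io.StringIO(
    "0.000000000000000 0.000000000000000\n"
    "1.000000000000000 1.000000000000000\n"
    "1\n"
    "0.020375738732779 0.026169010160356\n"
    "0.050815740313746 0.023209931647163\n"
    "0.072530406907906 0.023975230642589\n"
)
x, y = read_pointset(sample)
print(x)
print(y)
```

The resulting x and y arrays can be passed straight to plt.scatter(x, y).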
Assuming you want a plot looking pretty much exactly like the sample image you give, and you want the plot to display the data with both axes in equal proportion, one could use a general purpose multimedia library like pygame to achieve this:
#!/usr/bin/env python3
import sys
import pygame
# windows will never be larger than this in their largest dimension
MAX_WINDOW_SIZE = 400
BG_COLOUR = (255, 255, 255,)
FG_COLOUR = (0, 0, 0,)
DATA_POINT_SIZE = 2
pygame.init()
if len(sys.argv) < 2:
    print('Error: need filename to read data from')
    pygame.quit()
    sys.exit(1)
else:
    data_points = []
    # read in data points from file first
    with open(sys.argv[1], 'r') as file:
        [next(file) for _ in range(3)] # discard first 3 lines of file
        # now the rest of the file contains actual data to process
        data_points.extend(tuple(float(x) for x in line.split()) for line in file)
    # file read complete. now let's find the min and max bounds of the data
    top_left = [float('+Inf'), float('+Inf')]
    bottom_right = [float('-Inf'), float('-Inf')]
    for datum in data_points:
        if datum[0] < top_left[0]:
            top_left[0] = datum[0]
        if datum[1] < top_left[1]:
            top_left[1] = datum[1]
        if datum[0] > bottom_right[0]:
            bottom_right[0] = datum[0]
        if datum[1] > bottom_right[1]:
            bottom_right[1] = datum[1]
    # calculate space dimensions
    space_dimensions = (bottom_right[0] - top_left[0], bottom_right[1] - top_left[1])
    # take the biggest of the X or Y dimensions of the point space and scale it
    # up to our maximum window size
    biggest = max(space_dimensions)
    scale_factor = MAX_WINDOW_SIZE / biggest # all points will be scaled up by this factor
    # screen dimensions, cast to int so the window size is in whole pixels
    screen_dimensions = tuple(int(sd * scale_factor) for sd in space_dimensions)
    # basic init and draw all points to screen
    display = pygame.display.set_mode(screen_dimensions)
    display.fill(BG_COLOUR)
    for point in data_points:
        # translate and scale each point, then round down to a whole pixel
        x = int(point[0] * scale_factor - top_left[0] * scale_factor)
        y = int(point[1] * scale_factor - top_left[1] * scale_factor)
        pygame.draw.circle(display, FG_COLOUR, (x, y), DATA_POINT_SIZE)
    pygame.display.update()
    while True:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit()
                sys.exit(0)
        pygame.time.wait(50)
Execute this script and pass the name of the file which holds your data in as the first argument. It will spawn a window with the data points displayed.
I generated a bunch of uniformly distributed random x,y points to test it, with:
from random import random
for _ in range(1000):
    print(random(), random())
This produces a window looking like the following:
If the space your data points are within is not of square size, the window shape will change to reflect this. The largest dimension of the window, either width or height, will always stay at a specified size (I used 400px as a default in my demo).
Admittedly, this is not the most elegant or concise solution, and reinvents the wheel a little bit, however it gives you the most control on how to display your data points, and it also deals with both the reading in of the file data and the display of it.
To read your file:
import pandas as pd
import numpy as np
df = pd.read_csv('your_file',
                 sep=r'\s+',  # raw string so the backslash is not treated as an escape
                 header=None,
                 skiprows=3,
                 names=['x','y'])
For now I've created a random dataset
import random
df = pd.DataFrame({'x':[random.uniform(0, 1) for n in range(100)],
'y':[random.uniform(0, 1) for n in range(100)]})
I prefer Plotly for any kind of figure
import plotly.express as px
fig = px.scatter(df,
x='x',
y='y')
fig.show()
From here you can easily update labels, colors, etc.
I have two text files, each with the pixel intensities from an image. The first file I converted to a binary image by manually establishing a threshold:
import numpy as np
import matplotlib.pyplot as p
icp4 = np.loadtxt(icp4_img)
with np.nditer(icp4, op_flags=['readwrite']) as it:
    for x in it:
        if x[...] > 800:
            x[...] = 1
        else:
            x[...] = 0
p.imshow(icp4, interpolation='nearest', cmap='gray')
p.show()
print(icp4.shape)
>>>(45, 52)
With the second file, I want to sort the pixel values into two lists, which I will use to plot a histogram of pixel values, i.e. if the pixel is above threshold in the first array, then I want to sort it to the inside list.
#sorting pixels for PTM channel
ptm = np.loadtxt(ptm_img)
inside = [] #list for pixel values that colocalize with icp4 signal
outside = [] #list for pixel values that do not colocalize with icp4 signal
with np.nditer(ptm, op_flags=['readonly']) as it:
    i=0
    for x in it:
        if icp4[i] > 0:
            inside.append(x[...])
        else:
            outside.append(x[...])
        i+=1
sys.exit()
I cannot figure out how to reference the same position in array icp4 when iterating through the array ptm. I apologize if this is a duplicated question.
import sys
import numpy as np
import matplotlib.pyplot as p
icp4 = np.loadtxt(icp4_img)
ptm = np.loadtxt(ptm_img)
inside, outside = [], []
# iterate over both arrays in lockstep so each ptm pixel is paired with the icp4 pixel at the same position
with np.nditer(icp4, op_flags=['readwrite']) as icp_it, np.nditer(ptm, op_flags=['readonly']) as ptm_it:
    for icp, ptm_val in zip(icp_it, ptm_it):
        if icp[...] > 800:
            icp[...] = 1
        else:
            icp[...] = 0
        if np.all(icp > 0):
            inside.append(ptm_val.item())  # store a plain number rather than a view into ptm
        else:
            outside.append(ptm_val.item())
p.imshow(icp4, interpolation='nearest', cmap='gray')
p.show()
print(icp4.shape)
sys.exit()
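The same pairing can likely be done without any explicit iteration by using a boolean mask. The following is a sketch, assuming both arrays have the same shape and that 800 is the threshold from the question; split_by_mask is a hypothetical helper name and the small arrays are made-up stand-ins for the two images.

```python
import numpy as np


def split_by_mask(reference, values, threshold=800):
    """Split values by where reference exceeds threshold.

    Returns (inside, outside, binary), where binary is the thresholded
    0/1 version of reference.
    """
    reference = np.asarray(reference)
    values = np.asarray(values)
    if reference.shape != values.shape:
        raise ValueError("both arrays must have the same shape")
    mask = reference > threshold
    inside = values[mask]      # pixels that colocalize with the signal
    outside = values[~mask]    # pixels that do not
    return inside, outside, mask.astype(int)


# Tiny made-up arrays standing in for the icp4 and ptm images.
icp4_demo = np.array([[900, 100], [801, 800]])
ptm_demo = np.array([[1, 2], [3, 4]])
inside, outside, binary = split_by_mask(icp4_demo, ptm_demo)
print(inside, outside)
print(binary)
```

Because the mask has the same shape as both images, position bookkeeping is handled by NumPy, and inside and outside come back as 1-D arrays ready for a histogram.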
I want to run through a large tif stack +1500 frames and extract the coordinates of the local maxima for each frame. The code below does the job, however extremely slow for large files. When running on smaller bits (e.g. 20 frames) each frame is done almost instantly - when running on the whole dataset, each frame takes seconds.
Any solutions to run a faster code? I figure it is due to the loading of the large tiff file - however it should only be necessary one time initially?
I have the following code:
import numpy as np
import pandas as pd
from pims import ImageSequence
from skimage.feature import peak_local_max
def cmask(index,array):
    radius = 3
    a,b = index
    nx,ny = array.shape
    y,x = np.ogrid[-a:nx-a,-b:ny-b]
    mask = x*x + y*y <= radius*radius
    return(sum(array[mask])) # summed intensity of the pixels inside the disk
images = ImageSequence('tryhard_red_small.tif')
frame_list = []
x = []
y = []
int_liposome = []
BG_liposome = []
for i in range(len(images[0])):
    tmp_frame = images[0][i]
    xy = pd.DataFrame(peak_local_max(tmp_frame, min_distance=8,threshold_abs=3000))
    x.extend(xy[0].tolist())
    y.extend(xy[1].tolist())
    for j in range(len(xy)):
        index = x[j],y[j]
        int_liposome.append(cmask(index,tmp_frame))
    frame_list.extend([i]*len(xy))
    print("Frame: %d of %d" % (i, len(images[0])))
features = pd.DataFrame(
    {'lip_int':int_liposome,
     'y' : y,
     'x' : x,
     'frame' : frame_list})
Have you tried profiling the code, say with %prun or %lprun in ipython? That'll tell you exactly where your slowdowns are occurring.
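Outside IPython, the standard library's cProfile and pstats modules give similar information. Below is a minimal sketch; slow_sum is only a made-up stand-in for the per-frame work, and profile_call is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Stand-in for the real per-frame work (made up for this example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100000)
print(report)
```

Wrapping the body of the frame loop in a function and passing it to profile_call should show whether the time goes into reading frames, peak_local_max, or cmask.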
I can't make my own version of this without the tif stack, but I suspect the problem is the fact that you're using lists to store everything. Every time you do an append or an extension, python is having to allocate more memory. You could try getting the total count of maxima first, then allocating your output arrays, then rerunning to fill the arrays. Something like below
# run through once to get the count of local maxima
npeaks = (len(peak_local_max(f, min_distance=8, threshold_abs=3000))
          for f in images[0])
total_peaks = sum(npeaks)
# allocate storage arrays and rerun
# (np.float and np.int were removed from recent NumPy releases; the builtins are equivalent)
x = np.zeros(total_peaks, float)
y = np.zeros_like(x)
int_liposome = np.zeros_like(x)
BG_liposome = np.zeros_like(x)
frame_list = np.zeros(total_peaks, int)
index_0 = 0
for frame_ind, tmp_frame in enumerate(images[0]):
    peaks = pd.DataFrame(peak_local_max(tmp_frame, min_distance=8,threshold_abs=3000))
    index_1 = index_0 + len(peaks)
    # copy the data from the DataFrame's underlying numpy array
    x[index_0:index_1] = peaks[0].values
    y[index_0:index_1] = peaks[1].values
    # iterate over the rows (one peak each); iterating the DataFrame itself yields column labels
    for i, peak in enumerate(peaks.values, index_0):
        int_liposome[i] = cmask(peak, tmp_frame)
    frame_list[index_0:index_1] = frame_ind
    # update the starting index
    index_0 = index_1
    print("Frame: %d of %d" % (frame_ind, len(images[0])))
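Another place worth checking is cmask itself, which builds a mask over the whole frame for every peak. The sketch below restricts the work to a small window around the peak; it is an assumption that this matters for your data, so profile before relying on it. disk_sum is a hypothetical name, and the result should match cmask for integer peak coordinates.

```python
import numpy as np


def disk_sum(array, center, radius=3):
    """Sum the pixels within `radius` of `center`, using a local window only."""
    a, b = center
    nx, ny = array.shape
    # Clip the square window around the peak to the array bounds.
    r0, r1 = max(a - radius, 0), min(a + radius + 1, nx)
    c0, c1 = max(b - radius, 0), min(b + radius + 1, ny)
    # Offsets of every window pixel relative to the peak.
    rows, cols = np.ogrid[r0 - a:r1 - a, c0 - b:c1 - b]
    mask = rows * rows + cols * cols <= radius * radius
    # ndarray.sum avoids the slower Python-level built-in sum.
    return array[r0:r1, c0:c1][mask].sum()


frame = np.arange(100).reshape(10, 10)  # made-up stand-in for one frame
print(disk_sum(frame, (5, 5)))
print(disk_sum(frame, (0, 0)))
```

The window holds at most (2 * radius + 1) ** 2 pixels, so the cost per peak no longer depends on the frame size.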
I have a nested list of dictionaries, created like this:
N = 30
grid = []
for row in range(N):
    rows = []
    for column in range(N):
        each_cell = {"check": 0, "type": -1}
        rows.append(each_cell)
    grid.append(rows)
Type is the one that I want to plot, a value of -1 means nothing in the cell, and 0,1,2,3 are different types (not gradient), which I want to be represented by different colours.
I am putting a random number of types into the grid like this:
import numpy.random as rnd
import matplotlib.pyplot as plt
for i in range (rnd.randint(0, N*N)):
    x = rnd.randint(0, N)
    y = rnd.randint(0, N)
    grid[x][y]['check'] = 1
    if grid[x][y]['check'] == 1:
        grid[x][y]['type'] = rnd.randint(0,4)
I am attempting to plot it using this:
plt.imshow(grid['type'], interpolation = 'nearest', cmap = 'gist_ncar_r')
plt.show()
But obviously the grid['type'] isn't picking out only the types like I want it to, anybody know how to fix this?
Since imshow requires an 'array-like', you can change the structure of your data to make it easier to work with. Instead of using an array of dicts, use a dict of arrays.
import numpy.random as rnd
import matplotlib.pyplot as plt
N = 30
grid = {'check': [], 'type': []}
for row in range(N):
    check_rows = []
    type_rows = []
    for column in range(N):
        check_rows.append(0)
        type_rows.append(-1)  # -1 marks an empty cell, as in the question
    grid['check'].append(check_rows)
    grid['type'].append(type_rows)
for i in range (rnd.randint(0, N*N)):
    x = rnd.randint(0, N)
    y = rnd.randint(0, N)
    grid['check'][x][y] = 1
    if grid['check'][x][y] == 1:
        grid['type'][x][y] = rnd.randint(0,4)
plt.imshow(grid['type'], interpolation = 'nearest', cmap = 'gist_ncar_r')
plt.show()
You can use a list comprehension to get the data you want into an array:
from numpy import *
...
data = array([[grid[i][j]['type'] for j in range(N)] for i in range(N)])
To use array you will need to do the numpy import.
Then you can plot it like you're trying to:
matplotlib.pyplot.imshow(data, interpolation = 'nearest', cmap = 'gist_ncar_r')
matplotlib.pyplot.show()
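If you are free to change the data structure, the grid could also be kept in NumPy arrays from the start. This is a sketch, assuming one array per field is acceptable; the names check and types and the fixed seed are choices made for this example only.

```python
import numpy as np

N = 30
rng = np.random.default_rng(0)  # fixed seed, chosen only to make the example repeatable

# One array per field instead of one dict per cell.
check = np.zeros((N, N), dtype=int)
types = np.full((N, N), -1, dtype=int)  # -1 means nothing in the cell

# Pick a random number of cells (repeats allowed, as in the original loop)
# and give each picked cell a random type in 0..3.
n_cells = rng.integers(0, N * N)
rows = rng.integers(0, N, size=n_cells)
cols = rng.integers(0, N, size=n_cells)
check[rows, cols] = 1
types[rows, cols] = rng.integers(0, 4, size=n_cells)

print(np.unique(types))
```

The types array can be passed directly to imshow in place of data.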
I've got a 900 x 650 2D numpy array which I'd like to split into 10 x 10 blocks, which will be checked for nonzero elements. Is there a Pythonic way that I can achieve this with numpy?
I'm looking for functionality similar to the following:
blocks_that_have_stuff = []
my_array = getArray()
my_array.cut_into_blocks((10, 10))
for block_no, block in enumerate(my_array):
    if numpy.count_nonzero(block) > 5:
        blocks_that_have_stuff.append(block_no)
I wrote a routine that cuts your matrix into blocks; the shape argument gives the number of blocks along each axis. The example is written in a simple form that displays the result, for checking purposes only. If you are interested, you could also include the block numbers or anything else in the output.
import matplotlib.pyplot as plt
import numpy as np
def cut_array2d(array, shape):
    arr_shape = np.shape(array)
    # np.int was removed from recent NumPy releases; the builtin int is equivalent here
    xcut = np.linspace(0,arr_shape[0],shape[0]+1).astype(int)
    ycut = np.linspace(0,arr_shape[1],shape[1]+1).astype(int)
    blocks = []; xextent = []; yextent = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            blocks.append(array[xcut[i]:xcut[i+1],ycut[j]:ycut[j+1]])
            xextent.append([xcut[i],xcut[i+1]])
            yextent.append([ycut[j],ycut[j+1]])
    return xextent,yextent,blocks
nx = 900; ny = 650
X, Y = np.meshgrid(np.linspace(-5,5,nx), np.linspace(-5,5,ny))
arr = X**2+Y**2
x,y,blocks = cut_array2d(arr,(10,10))
n = 0
for x,y,block in zip(x,y,blocks):
    n += 1
    plt.imshow(block,extent=[y[0],y[1],x[0],x[1]],
               interpolation='nearest',origin='lower',
               vmin = arr.min(), vmax=arr.max(),
               cmap=plt.cm.Blues_r)
    plt.text(0.5*(y[0]+y[1]),0.5*(x[0]+x[1]),str(n),
             horizontalalignment='center',
             verticalalignment='center')
plt.xlim([0,900])
plt.ylim([0,650])
plt.savefig("blocks.png",dpi=72)
plt.show()
The output is:
Regards
Note: I think you could optimize this routine by using np.meshgrid instead of a lot of appends for xextent and yextent.
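If the goal is blocks of 10 x 10 pixels rather than a 10 x 10 grid of blocks, a reshape may be enough. The following is a sketch, assuming the array dimensions are exact multiples of the block size (900 and 650 both are); blocks_with_content is a hypothetical name, and the threshold of 5 is taken from the question.

```python
import numpy as np


def blocks_with_content(array, block_shape=(10, 10), min_nonzero=5):
    """Return the row-major numbers of blocks with more than min_nonzero nonzero cells."""
    bh, bw = block_shape
    h, w = array.shape
    if h % bh or w % bw:
        raise ValueError("array shape must be a multiple of the block shape")
    # Resulting shape: (block rows, block columns, rows in block, columns in block).
    blocks = array.reshape(h // bh, bh, w // bw, bw).swapaxes(1, 2)
    counts = np.count_nonzero(blocks, axis=(2, 3))
    return np.flatnonzero(counts > min_nonzero)


my_array = np.zeros((900, 650))
my_array[20:30, 30:40] = 1   # fill one whole block
my_array[0, 0:5] = 1         # only 5 nonzero cells, so not more than the threshold
print(blocks_with_content(my_array))
```

Block numbers count along each row of blocks first, so with 65 blocks per row the block at block-row 2, block-column 3 is number 2 * 65 + 3.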