Splitting data file columns into separate arrays in Python

I'm new to Python and have been trying to figure this out all day. I have a data file laid out as below:
time I(R_stkb)
Step Information: Temp=0 (Run: 1/11)
0.000000000000000e+000 0.000000e+000
9.999999960041972e-012 8.924141e-012
1.999999992008394e-011 9.623148e-012
3.999999984016789e-011 6.154220e-012
(Note: there are no empty lines between the data lines.)
I want to plot the data using matplotlib functions, so I'll need the two separate columns in arrays.
I currently have
def plotdata():
    Xvals=[], Yvals=[]
    i = open(file,'r')
    for line in i:
        Xvals,Yvals = line.split(' ', 1)
        print Xvals,Yvals
But obviously it's completely wrong. Can anyone give me a simple answer to this? An explanation of what exactly the lines mean would be helpful. Cheers.
Edit: The first two lines repeat throughout the file.

This is a job for the * operator combined with the zip function.
>>> asdf
[[1, 2], [3, 4], [5, 6]]
>>> list(zip(*asdf))
[(1, 3, 5), (2, 4, 6)]
So in the context of your data it might be something like:
with open(file, 'r') as handle:
    # skip blank lines and the repeated 'time' / 'Step' header lines, convert the rest to floats
    lines = [[float(v) for v in line.split()] for line in handle
             if line.strip() and line[:4] not in ('time', 'Step')]
Xvals, Yvals = zip(*lines)
or if you really need to be able to mutate the data afterwards you could just call the list constructor on each tuple:
Xvals, Yvals = [list(block) for block in zip(*lines)]
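As a minimal sketch of the same idea applied to the sample data from the question (the `sample` string and the `columns_from_lines` helper are made up for illustration, not part of any library), the header lines are skipped, each remaining line is converted to floats, and `zip(*rows)` turns the rows into columns:

```python
def columns_from_lines(lines):
    """Return (xs, ys) as lists of floats, skipping blank and header lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith(('time', 'Step')):
            continue  # blank line or repeated header, no data here
        x, y = line.split()
        rows.append((float(x), float(y)))
    if not rows:
        return [], []
    xs, ys = zip(*rows)  # transpose: a list of rows becomes two columns
    return list(xs), list(ys)


# Illustrative input copied from the layout shown in the question.
sample = """time I(R_stkb)
Step Information: Temp=0 (Run: 1/11)
0.000000000000000e+000 0.000000e+000
9.999999960041972e-012 8.924141e-012
1.999999992008394e-011 9.623148e-012
"""

xs, ys = columns_from_lines(sample.splitlines())
print(xs)
print(ys)
```

An open file handle can be passed instead of `sample.splitlines()`, since both yield one line at a time.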

One way to do it is:
Xvals = []; Yvals = []
with open(file, 'r') as i:
    for line in i:
        if line[:4] in ('time', 'Step'):
            continue  # skip the repeated header lines
        x, y = line.split()
        Xvals.append(float(x))
        Yvals.append(float(y))
print(Xvals, Yvals)
Note the call to the float function, which converts the string you get from the file into a number.

This is what numpy.loadtxt is designed for. Try:
import numpy as np
import matplotlib.pyplot as plt
# assuming the time and step information is on the first 2 lines only
# and you do not want to read them
data = np.loadtxt(file, skiprows=2)
plt.plot(data[:, 0], data[:, 1])
plt.show()
EDIT:
If the time and step information is scattered throughout the file and you want to plot the data for every step, you can read the whole file into memory (provided it's small enough) and then split it on the 'time' strings:
with open(fname, 'r') as f:
    l = f.read()
for chunk in l.split('time'):
    rows = [s.split() for s in chunk.split('\n')[2:] if s.strip()]
    if not rows:
        continue  # the text before the first 'time' header holds no data
    data = np.array(rows, dtype=float)
    plt.plot(data[:, 0], data[:, 1])
plt.show()
Or else you could add the # comment sign to the header lines and use np.loadtxt.
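If you would rather not depend on the exact position of the header lines, the per-step split can also be done line by line. This is only a sketch: `split_by_step` is a hypothetical helper, and the second header in `sample` (Temp=10, Run: 2/11) is invented to show a repeated block.

```python
def split_by_step(lines):
    """Return a list of (xs, ys) pairs, one per 'Step Information' block."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('time'):
            continue  # blank line or repeated column header
        if line.startswith('Step'):
            blocks.append(([], []))  # a new run starts here
            continue
        if not blocks:
            blocks.append(([], []))  # data that appears before any 'Step' line
        x, y = line.split()
        blocks[-1][0].append(float(x))
        blocks[-1][1].append(float(y))
    return blocks


# Made-up sample with two runs, following the layout in the question.
sample = [
    "time I(R_stkb)",
    "Step Information: Temp=0 (Run: 1/11)",
    "0.0e+000 0.0e+000",
    "1.0e-011 8.9e-012",
    "time I(R_stkb)",
    "Step Information: Temp=10 (Run: 2/11)",
    "0.0e+000 0.0e+000",
]

blocks = split_by_step(sample)
for xs, ys in blocks:
    print(len(xs), "points")
```

Each `(xs, ys)` pair can then be handed to `plt.plot(xs, ys)` inside the loop, with a single `plt.show()` at the end.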

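If you want to plot this file with matplotlib, you might want to check out its plotfile function (note that plotfile was deprecated in Matplotlib 3.1 and is no longer available in current releases). See the official documentation here.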

Related

How do I read a text file of numbers into an array of arrays

In python, using the OpenCV library, I need to create some polylines. The example code for the polylines method shows:
cv2.polylines(img,[pts],True,(0,255,255))
I have all the 'pts' laid out in a text file in the format:
x1,y1,x2,y2,x3,y3,x4,y4
x1,y1,x2,y2,x3,y3,x4,y4
x1,y1,x2,y2,x3,y3,x4,y4
How can I read this file and provide the data to the [pts] variable in the method call?
I've tried the np.array(csv.reader(...)) method as well as a few others I've found examples of. I can successfully read the file, but it's not in the format the polylines method wants. (I am a newbie when it comes to Python; if this were C++ or Java, it wouldn't be a problem.)

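I would try to use numpy to read the csv as an array.
import numpy as np
# polylines expects 32-bit integer points, so coerce the dtype while reading
p = np.genfromtxt('myfile.csv', delimiter=',', dtype=np.int32)
# one (N, 1, 2) array of x, y points per row of the file
pts = [row.reshape(-1, 1, 2) for row in np.atleast_2d(p)]
cv2.polylines(img, pts, True, (0, 255, 255))
The dtype argument to genfromtxt coerces the data to a specific format; cv2.polylines needs 32-bit integers.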
https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html

In case you know it is a fixed number of items in each row:
import csv
with open('myfile.csv') as csvfile:
    rows = csv.reader(csvfile)
    res = list(zip(*rows))
print(res)

I know it's not pretty and there is probably a MUCH BETTER way to do this, but it works. That being said, if someone could show me a better way, it would be much appreciated.
pointlist = []
f = open(args["slots"])
data = f.read().split()
for row in data:
    tmp = []
    col = row.split(";")
    for points in col:
        xy = points.split(",")
        tmp += [[int(pt) for pt in xy]]
    pointlist += [tmp]
slots = np.asarray(pointlist)

You might need to draw each polyline individually (to expand on @Chris's answer):
import numpy as np
lines = np.genfromtxt('myfile.csv', delimiter=',', dtype=np.int32)
for line in np.atleast_2d(lines):
    # polylines takes a list of integer point arrays, here a single one
    cv2.polylines(img, [line.reshape((-1, 1, 2))], True, (0, 255, 255))
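For reference, the pairing of the flat `x1,y1,x2,y2,...` rows into points can also be done with the standard library alone. The sketch below assumes integer coordinates and an even number of values per row; `read_polylines` and the sample coordinates are made up for illustration.

```python
import csv
import io


def read_polylines(handle):
    """Return one list of [x, y] integer pairs per non-empty CSV row."""
    polylines = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip empty lines
        values = [int(v) for v in row]
        if len(values) % 2:
            raise ValueError("odd number of coordinates: %r" % (row,))
        # group the flat list two at a time: [x1, y1, x2, y2] -> [[x1, y1], [x2, y2]]
        polylines.append([values[i:i + 2] for i in range(0, len(values), 2)])
    return polylines


# Stand-in for the text file; the numbers are invented.
sample = io.StringIO("10,10,50,10,50,40,10,40\n0,0,5,0,5,5,0,5\n")
polylines = read_polylines(sample)
print(polylines[0])
```

Each inner list can then be converted with `np.array(points, dtype=np.int32)` and passed as `[pts]` to `cv2.polylines`, which requires 32-bit integer arrays.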

pandas plot - plot specific lines in file

I am using the following code to make pandas plots. It takes in a file and makes plots for specific lines (identified by locus IDs, e.g. 'loc.27404').
However, this code manually specifies the lines which I want to plot.
I have another file containing all the lines (there are 100s) that I want to plot. How can I write a script that takes this file as input so that these specific lines are plotted using the code below? I can't seem to write anything that makes sense.
data = {}
for line in File:
    cols = line.strip().split('\t')
    vals = map(float,cols[6:])
    data[cols[3]] = vals
fig,ax = plt.subplots(4,figsize=[15,20])
l1= 'loc.27404'
l2= 'loc.37387'
l3 = 'loc.05134'
l4 = 'loc.10034'
pd.Series(data[l1],index=xticks).plot(ax=ax[0])
pd.Series(data[l2],index=xticks).plot(ax=ax[1])
pd.Series(data[l3],index=xticks).plot(ax=ax[2])
pd.Series(data[l4],index=xticks).plot(ax=ax[3])

A potential solution would be to read the 100s of lines into a list, like ['loc.27404','loc.37387','loc.05134','loc.10034'], and use that list in a for-loop:
for i, line_to_plot in enumerate(['loc.27404','loc.37387','loc.05134','loc.10034']):
    pd.Series(data[line_to_plot], index=xticks).plot(ax=ax[i])
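Building that list from the second file could look like the sketch below. It assumes the file holds one locus ID per line, which the question does not state; `read_ids`, `ids_file` and the stand-in `data` dictionary are hypothetical names used only for this example.

```python
import io


def read_ids(handle):
    """Return the non-empty, stripped lines of an open text file."""
    return [line.strip() for line in handle if line.strip()]


# Stand-in for open('ids.txt'); assumed format: one locus ID per line.
ids_file = io.StringIO("loc.27404\nloc.37387\n\nloc.05134\nloc.10034\n")
loci = read_ids(ids_file)

# Stand-in for the `data` dictionary built from the main file.
data = {'loc.27404': [1.0], 'loc.37387': [2.0], 'loc.05134': [3.0]}

# Keep only IDs that actually occur in the data, and report the rest.
to_plot = [locus for locus in loci if locus in data]
missing = [locus for locus in loci if locus not in data]
print(to_plot)
print(missing)
```

`to_plot` can replace the hard-coded list in the loop, and `plt.subplots(len(to_plot), ...)` would size the figure to match.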

Extract multiple arrays from .DAT file with undefined size

I have a device that stores three data sets in a .DAT file. They always have the same heading and number of columns, but the number of rows varies.
They are (n x 4), (m x 4), (L x 3).
I need to extract the three data sets into separate arrays for plotting.
I have been trying to use numpy.genfromtxt and numpy.loadtxt, but the only way I can get them to work for this format is to manually define the row at which each data set starts.
As I will regularly need to deal with this format I have been trying to automate it.
If someone could suggest a method which might work I would greatly appreciate it. I have attached an example file.
example file

Just a quick and dirty solution. At your file size, you might run into performance issues. If you know m, n and L, initialize the output vectors with the respective lengths.
Here is the strategy: load the whole file into a variable and read it line by line. As soon as you discover a keyword, raise a flag that you are in the specific block. On the following lines, read the values into the correct variables.
isblock1 = isblock2 = isblock3 = False
fout = []  # also construct all the other variables that you want to collect
with open(filename, 'r') as fh:
    lines = fh.readlines()  # read all the lines
for line in lines:
    # check for a heading first, so that a heading line is never parsed as data
    if 'Frequency' in line:
        isblock1 = True
        isblock2 = isblock3 = False
    elif 'Phasor' in line:
        isblock2 = True
        isblock1 = isblock3 = False
    elif 'Voltage' in line:
        isblock3 = True
        isblock1 = isblock2 = False
    elif isblock1:
        (f, psd, ipj, itj) = line.split()
        fout.append(f)  # do this also with the other variables
    elif isblock2:
        (t1, p1, p2, p12) = line.split()
    elif isblock3:
        (t2, v1, v2) = line.split()
Hope that helps.
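The same flag idea can be written more compactly with a dictionary keyed by the heading keyword. This is a sketch under assumptions: the example file is not shown here, so the keywords are taken from the code above, the `sample` lines are invented, and `read_blocks` is a hypothetical helper.

```python
def read_blocks(lines, keywords=('Frequency', 'Phasor', 'Voltage')):
    """Group numeric rows under the most recent heading keyword seen."""
    blocks = {key: [] for key in keywords}
    current = None
    for line in lines:
        hit = next((key for key in keywords if key in line), None)
        if hit is not None:
            current = hit  # heading line: switch block, store nothing
            continue
        fields = line.split()
        if current is None or not fields:
            continue  # preamble before the first heading, or a blank line
        try:
            blocks[current].append([float(v) for v in fields])
        except ValueError:
            pass  # non-numeric line, for example a units row
    return blocks


# Invented sample; a real file would be read with open(...).readlines().
sample = [
    "Frequency PSD IPJ ITJ",
    "1.0 0.5 0.1 0.2",
    "2.0 0.4 0.1 0.3",
    "Phasor data",
    "0.0 1.0 2.0 3.0",
    "Voltage data",
    "0.0 3.3 1.2",
]

blocks = read_blocks(sample)
print(len(blocks['Frequency']), len(blocks['Phasor']), len(blocks['Voltage']))
```

Each list of rows can be turned into an array with `numpy.array(blocks['Frequency'])` for plotting, without knowing the row counts in advance.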

Calculating and plotting a grow rate in years from a dictionary

I am trying to plot a graph from a CSV file with the following Python code:
import csv
import matplotlib.pyplot as plt
def population_dict(filename):
    """
    Reads the population from a CSV file, containing
    years in column 2 and population / 1000 in column 3.
    @param filename: the filename to read the data from
    @return dictionary containing year -> population
    """
    dictionary = {}
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        f.next()
        for row in reader:
            dictionary[row[2]] = row[3]
    return dictionary
dict_for_plot = population_dict('population.csv')
def plot_dict(dict_for_plot):
    x_list = []
    y_list = []
    for data in dict_for_plot:
        x = data
        y = dict_for_plot[data]
        x_list.append(x)
        y_list.append(y)
    plt.plot(x_list, y_list, 'ro')
    plt.ylabel('population')
    plt.xlabel('year')
    plt.show()
plot_dict(dict_for_plot)
def grow_rate(data_dict):
    # fill lists
    growth_rates = []
    x_list = []
    y_list = []
    for data in data_dict:
        x = data
        y = data_dict[data]
        x_list.append(x)
        y_list.append(y)
    # calc grow_rate
    for i in range(0, len(y_list)-1):
        var = float(y_list[i+1]) - float(y_list[i])
        var = var/y_list[i]
        print var
        growth_rates.append(var)
    # growth_rate_dict = dict(zip(years, growth_rates))
grow_rate(dict_for_plot)
However, I'm getting a rather weird error on executing this code:
Traceback (most recent call last):
  File "/home/jharvard/Desktop/pyplot.py", line 71, in <module>
    grow_rate(dict_for_plot)
  File "/home/jharvard/Desktop/pyplot.py", line 64, in grow_rate
    var = var/y_list[i]
TypeError: unsupported operand type(s) for /: 'float' and 'str'
I've been trying different methods to cast the y_list variable, for example casting to an int.
How can I solve this problem so that I can get the percentage growth rate through the years and plot it?

Since CSV files are text files, you will need to convert the values into numbers. It's easy to correct the type error. Just use
var/float(y_list[i])
Even though that gets rid of the type error, there is a minor bug which is a little more difficult to spot and which may give incorrect results under some circumstances. The main reason is that dictionaries are not ordered (they are not in Python 2, which this code uses, and only preserve insertion order from Python 3.7 on), i.e. the x and y values are not ordered in any way. The indentation of your program appears to be a bit off on my computer, so I am unable to follow it exactly. But the gist of it appears to be that you are obtaining x and y values from a file and then computing the sequence
var[i] = (y[i+1] - y[i]) / y[i]
Unfortunately, your y_list[i] may not be in the same sequence as in the CSV file, because it is being populated from a dictionary.
In the section where you did:
for row in reader:
    dictionary[row[2]] = row[3]
it is better to preserve the order by doing
x, y = zip(*[(float(row[2]), float(row[3])) for row in reader])
x, y = map(numpy.array, [x, y])
return x, y
or something like this ...
Numpy arrays then have methods for handling your problem much more efficiently. You can simply do:
growth_rates = numpy.diff(y) / y[:-1]
Hope this helps. Let me know if you have any questions.
Finally, if you do go the Numpy route, I would highly recommend its own csv reader. Check it out here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
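If you want to keep the year -> population dictionary from the question, a plain-Python sketch of the same calculation is to sort the years explicitly before taking differences. The `growth_rates` helper and the sample values are made up for illustration, and keying each rate by the later year is my own choice.

```python
def growth_rates(population_by_year):
    """Return {year: relative growth versus the previous year}, ordered by year."""
    years = sorted(population_by_year, key=int)  # do not rely on dict order
    rates = {}
    for prev, cur in zip(years, years[1:]):
        before = float(population_by_year[prev])  # CSV values arrive as strings
        after = float(population_by_year[cur])
        rates[cur] = (after - before) / before
    return rates


# Made-up values, deliberately out of order and stored as strings like CSV fields.
sample = {'2002': '110.0', '2000': '100.0', '2001': '105.0'}
rates = growth_rates(sample)
for year, rate in rates.items():
    print(year, rate)
```

Multiplying each rate by 100 gives the percentage, and `list(rates)` with `list(rates.values())` gives the x and y lists for `plt.plot`.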

Python numpy method matrix.toFile()

I need help formatting my matrix when I write it to a file. I am using the numpy method called tofile().
It takes 3 args: 1) the name of the file, 2) the separator (must be a string), 3) the format (also a string).
I don't know a lot about formatting, but I am trying to format the file so there is a new line after every 9 characters (not including spaces). The output is a 9x9 sudoku game, so I need it to be formatted 9x9.
finished = M.tofile("soduku_solved.txt", " ", "")
where M is a matrix.
My first argument is the name of the file and the second is a space, but I don't know what format argument I need to make it 9x9.

I could be wrong, but I don't think that's possible with the numpy tofile function. I think the format argument just controls how each individual item is formatted; it doesn't consider the items as a group.
You could do something like:
M = np.random.randint(1, 9, (9, 9))
each_item_fmt = '{:>3}'
each_row_fmt = ' '.join([each_item_fmt] * 9)
fmt = '\n'.join([each_row_fmt] * 9)
as_string = fmt.format(*M.flatten())
It's not a very nice way to build up the format string and there's bound to be a better way of doing it. You'll see the final result (print(fmt)) is a big block of '{:>3}', which basically says: put a bit of data in here with a fixed width of 3 characters, right aligned.
EDIT: Since you're putting it directly into a file, you could write it line by line:
M = np.random.randint(1, 9, (9, 9))
fmt = ('{:>3} ' * 9).strip()
with open('soduku_solved.txt', 'w') as f:
    for m in M:
        f.write(fmt.format(*m) + '\n')
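Another option worth trying is numpy.savetxt, which writes a 2-D array one row per line. A small sketch follows; the grid is a stand-in rather than a solved puzzle, and an in-memory buffer is used here only so the result can be printed.

```python
import io

import numpy as np

# Stand-in 9x9 grid of digits 1-9 (not a valid sudoku solution).
M = np.arange(81).reshape(9, 9) % 9 + 1

# A file name such as 'soduku_solved.txt' can be passed instead of the buffer.
buffer = io.StringIO()
np.savetxt(buffer, M, fmt='%d', delimiter=' ')  # one row of the array per line

text = buffer.getvalue()
print(text)
```

Here `fmt='%d'` writes plain integers; a width such as `fmt='%3d'` would right-align each number in three characters, similar to '{:>3}' above.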
