I'm getting really confused by all the information on here about using 'split' in Python. Basically, I want to write a function which opens a spreadsheet (with two columns in it), uses the first column as x's and the second column as y's, and then plots them in the x-y plane.
I thought I would use line.splitlines to cut each line of the file into (x, y), but I keep getting
'ValueError: need more than 1 value to unpack'
and I don't know what this means.
Below is what I've written so far (xdir is an initial condition for a different part of my question):
def plotMo(filename, xdir):
    infile = open(filename)
    data = []
    for line in infile:
        x, y = line.splitlines()
        x = float(x)
        y = float(y)
        data.append([x, y])
    infile.close()
    return data
    plt.plot(x, y)
For example with
0 0.049976
0.01 0.049902
0.02 0.04978
0.03 0.049609
0.04 0.04939
0.05 0.049123
0.06 0.048807
I would want the first point in my plane to be (0, 0.049976) and the second point to be (0.01, 0.049902).
x,y = line.splitlines() tries to split the current line into several lines.
Since splitlines returns only one element here, there's an error because Python cannot find a value to assign to y.
What you want is x,y = line.split(), which splits the line on one or more whitespace characters (like awk does) when no parameter is specified.
However, it depends on the format: if there are blank lines you'll hit the "unpack" problem at some point, so to be safe and skip malformed lines, write:
items = line.split()
if len(items) == 2:
    x, y = items
To sum it up, a more Pythonic, shorter and safer way of writing your routine would be:
def plotMo(filename):
    with open(filename) as infile:
        data = []
        for line in infile:
            items = line.split()
            if len(items) == 2:
                data.append([float(e) for e in items])
    return data
(maybe it could be condensed more, but that's good for starters)
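For what it's worth, one way it might be condensed further, assuming matplotlib is imported as plt as in the question (splitting each line twice is slightly wasteful, but it keeps the whole thing to one expression):
import matplotlib.pyplot as plt

def plotMo(filename):
    """Read two-column data and plot the second column against the first."""
    with open(filename) as infile:
        data = [[float(e) for e in line.split()]
                for line in infile
                if len(line.split()) == 2]
    xs, ys = zip(*data)  # unzip the [x, y] pairs into two sequences
    plt.plot(xs, ys)
    return data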
I have a question concerning my code for the data evaluation of an experiment:
In a first for loop I am opening file after file that I want to analyze. Inside this for loop, i.e. inside one file, I want a second for loop that evaluates some specific parameters. When I do it for just one file, the parameters are correct, but when I loop over all files, it looks like these parameters are summed up in the second for loop. The normal value should be in the range of ar = 0.0001, and for one file that works perfectly; when I loop over the files I get 0.0001 for the first one, 0.0002 for the second, 0.0003 for the third, etc.
Update:
OK, so here is the whole part of the code. For each file, after fitting the data, I want to get the sum over the differences between two data points in the first column (x[j]), multiplied by the corresponding value in the second column (y[j]) (each file has two columns with 720 data points each), and the result should then be stored in AR for each file.
def sum_list(l):
    sum = 0
    for k in l:
        sum += k
    return sum

INV = []
DIFFS = []
AR = []
for i in range(0, len(fnames)):
    data = np.loadtxt(fnames[i])
    x = data[:, 0]
    y = data[:, 1]
    gmod = lm.Model(linmod)
    result = gmod.fit(y, x=x, p=0.003, bg=0.001)
    plt.plot(x, y)
    plt.plot(x, result.best_fit, 'r-')
    plt.show()
    print result.best_values['bg']
    print result.best_values['p']
    p = result.best_values['p']
    bg1 = result.best_values['bg']
    for j in range(0, 719):
        diffs = ((x[j+1] - x[j]) * y[j])
        DIFFS.append(diffs)
    ar = sum_list(DIFFS)
    AR.append(ar)
    inr = (x[0] - bg1) * (y[0]**3) / 3 + ar
    INV[i] = inr
If you are working with files (e.g. opening them), I suggest using the os module; maybe a construct like this will help you avoid the nested for loop:
import os

for root, dirs, files in os.walk(os.getcwd()):
    for i in files:
        with open(os.path.join(root, i)) as f:
            pass  # do your summation here
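As an aside, the growing values described in the question match what the posted code does: DIFFS is created once, before the file loop, so ar = sum_list(DIFFS) for file n includes the differences of all previous files as well. A minimal sketch of the per-file fix, assuming fnames is the list of data files as in the question:
import numpy as np

AR = []
for fname in fnames:
    data = np.loadtxt(fname)
    x, y = data[:, 0], data[:, 1]
    # Build the list of differences for this file only, instead of
    # appending to one DIFFS list shared across all files.
    diffs = [(x[j + 1] - x[j]) * y[j] for j in range(len(x) - 1)]
    AR.append(sum(diffs))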
I am still new to Python, but I am using it for my linguistics research.
So I am doing some research into toponyms, and I got a list of input data from a topographic institution, which looks like the following:
Official_Name <tab> Dialect_Name <tab> Administrative_district <tab> Topographic_district <tab> Y_coordinates <tab> X_coordinates <tab> Longitude <tab> Latitude
So, I defined a class:
class MacroTop:
    def __init__(self, Official_Name, Dialect_Name, Adm_District, Topo_District, Y, X, Long, Lat):
        self.Official_Name = Official_Name
        self.Dialect_Name = Dialect_Name
        self.Adm_District = Adm_District
        self.Topo_District = Topo_District
        self.Y = Y
        self.X = X
        self.Long = Long
        self.Lat = Lat
So, with open(), I wanted to load my .txt file and read the data into the class using a loop, but it did not work.
The result I want is to be able to access a feature of the class, say Dialect_Name, and look through all the entries of that feature. I can do that just in the loop, but I wanted to define a class so I would be able to do more manipulation afterwards.
My loop:
with open("locLuxAll.txt", "r") as topo_list:
lines = topo_list.readlines()
for line in lines:
line = line.split('\t')
print(line)
print(line[0]) # This would access all the data that is characterized as Official_Name
I tried to make another loop:
for i in range(0-len(lines)):
    lines[i] = MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]), str(line[4]), str(line[5]), str(line[6]), str(line[7]))
But that did not seem to work.
This line fails:
for i in range(0-len(lines)):
You're looping over a range built from a negative number (0 - len(lines)), so it is empty:
In [11]: [i for i in range(-200)]
Out[11]: []
EDIT:
Your code seems unreadable to me: you have for i in range(len(lines)), but in this for loop you're iterating over the line variable; where does it come from? First of all, I'd not write back to the lines list, as it comes from readlines; create a new list for that. And you don't need the i variable, those lines will be kept in order anyway.
class_lines = []
for line in lines:
    class_lines.append(MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]),
                                str(line[4]), str(line[5]), str(line[6]), str(line[7])))
Or even with a list comprehension:
class_lines = [MacroTop(str(line[0]), str(line[1]), str(line[2]), str(line[3]),
                        str(line[4]), str(line[5]), str(line[6]), str(line[7]))
               for line in lines]
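Since the eight fields arrive in the same order as the constructor's parameters, argument unpacking could shorten this further; a sketch, assuming every line has at least eight tab-separated fields (they are already strings, so the str() calls can be dropped):
# Split each raw line on tabs and hand the first eight fields
# straight to the constructor via argument unpacking.
with open("locLuxAll.txt") as topo_list:
    class_lines = [MacroTop(*line.rstrip('\n').split('\t')[:8])
                   for line in topo_list]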
I have a device that stores three data sets in a .DAT file, they always have the same heading and number of columns, but the number of rows vary.
They are (n x 4), (m x 4), (L x 3).
I need to extract the three data sets into separate arrays for plotting.
I have been trying to use numpy.genfromtxt and numpy.loadtxt, but the only way I can get them to work with this format is to manually define the row at which each data set starts.
As I will regularly need to deal with this format, I have been trying to automate it.
If someone could suggest a method which might work I would greatly appreciate it. I have attached an example file.
Just a quick and dirty solution. At your file size, you might run into performance issues. If you know m, n and L, initialize the output vectors with the respective lengths.
Here is the strategy: load the whole file into a variable. Read the variable line by line. As soon as you discover a keyword, raise a flag that you are in the specific block. From the next line on, read the values into the correct variables.
isblock1 = isblock2 = isblock3 = False
fout = []  # construct also all the other variables that you want to collect
with open(file, 'r') as infile:
    lines = infile.readlines()  # read all the lines
for line in lines:
    # Check for the block keywords first, so the header line itself
    # is not parsed as data.
    if 'Frequency' in line:
        isblock1 = True
        isblock2 = isblock3 = False
        continue
    if 'Phasor' in line:
        isblock2 = True
        isblock1 = isblock3 = False
        continue
    if 'Voltage' in line:
        isblock3 = True
        isblock1 = isblock2 = False
        continue
    if isblock1:
        (f, psd, ipj, itj) = line.split()
        fout.append(f)  # do this also with the other variables
    if isblock2:
        (t1, p1, p2, p12) = line.split()
    if isblock3:
        (t2, v1, v2) = line.split()
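Note that line.split() yields strings, so fout and the other collected lists still hold text; a minimal sketch of the numeric conversion step before plotting, assuming numpy is available:
import numpy as np

# Convert the collected string values into a float array for plotting;
# repeat for the other collected variables.
fout_arr = np.array(fout, dtype=float)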
Hope that helps.
I have a file that is space-delimited with values for x, y, z. I need to visualise the data, so I guess I need to read the file into 3 separate arrays (X, Y, Z) and then plot them. How do I read the file into 3 separate arrays? I have this so far, which removes the whitespace element at the end of every line:
import csv
import sys

def fread(f=None):
    """Reads in test and training CSVs."""
    X = []
    Y = []
    Z = []
    if f is None:
        print("No file given to read, exiting...")
        sys.exit(1)
    read = csv.reader(open(f, 'r'), delimiter=' ')
    for line in read:
        line = line[:-1]
I tried to add something like:
for x, y, z in line:
    X.append(x)
    Y.append(y)
    Z.append(z)
But I get an error like "ValueError: too many values to unpack"
I have done lots of googling, but nothing seems to address reading every element of a file into a separate array.
I should add that my data isn't sorted nicely into rows/columns; it just looks like this:
"107745590026 2 0.02934046648 0.01023879368 3.331810236 2 0.02727724425 0.07867902517 3.319272757 2 0.01784882881"......
Thanks!
EDIT: If your data isn't actually separated into 3-element lines (and is instead one long space-separated list of values), you could use Python list slicing with a stride to make this easier:
X = read[::3]
Y = read[1::3]
Z = read[2::3]
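For the stride trick to work, the values first need to be in one flat list rather than a csv.reader object; a sketch of that step, assuming f is the filename as in the question and the file is one long run of space-separated numbers like the sample:
# Read the whole file as one flat list of floats, then take
# every third value starting at offsets 0, 1 and 2.
with open(f) as fh:
    values = [float(v) for v in fh.read().split()]
X = values[0::3]
Y = values[1::3]
Z = values[2::3]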
This error might be happening because some of the lines in read contain more than three space-separated values. It's unclear from your question exactly what you'd want to do in those cases. If you're using Python 3, you could put the first element of each line into X, the second into Y, and all the rest of that line into Z with extended unpacking, iterating over the reader rows directly:
for x, y, *z in read:
    X.append(x)
    Y.append(y)
    for elem in z:
        Z.append(elem)
If you're not using Python 3, you can perform the same basic logic in a slightly more verbose way:
for i, elem in enumerate(line):
    if i == 0:
        X.append(elem)
    elif i == 1:
        Y.append(elem)
    else:
        Z.append(elem)
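And for the visualisation itself, once X, Y and Z hold numbers (convert with float() while appending if needed), a 3D scatter plot is one option; a minimal sketch, assuming matplotlib is installed:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on old matplotlib
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, Z)  # one marker per (x, y, z) triple
plt.show()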
I have a large 40-million-line, 3 gigabyte text file (which probably won't fit in memory) in the following format:
399.4540176 {Some other data}
404.498759292 {Some other data}
408.362737492 {Some other data}
412.832976111 {Some other data}
415.70665675 {Some other data}
419.586515381 {Some other data}
427.316825959 {Some other data}
.......
Each line starts off with a number and is followed by some other data. The numbers are in sorted order. I need to be able to:
Given a number x and a range y, find all the lines whose number is within y of x. For example, if x=20 and y=5, I need to find all lines whose number is between 15 and 25.
Store these lines into another separate file.
What would be an efficient method to do this without having to trawl through the entire file?
If you don't want to generate a database ahead of time for line lengths, you can try this:
import os
import sys

# Configuration, change these to suit your needs
maxRowOffset = 100  # increase this if some lines are being missed
fileName = 'longFile.txt'
x = 2000
y = 25

# seek to the first character c before the current position
def seekTo(f, c):
    while f.read(1) != c:
        f.seek(-2, 1)

def parseRow(row):
    # the leading numbers in the file are floats
    return (float(row.split(None, 1)[0]), row)

minRow = x - y
maxRow = x + y
step = os.path.getsize(fileName) / 2.
with open(fileName, 'r') as f:
    while True:
        f.seek(int(step), 1)
        seekTo(f, '\n')
        row = parseRow(f.readline())
        if row[0] < minRow:
            if minRow - row[0] < maxRowOffset:
                with open('outputFile.txt', 'w') as fo:
                    for row in f:
                        row = parseRow(row)
                        if row[0] > maxRow:
                            sys.exit()
                        if row[0] >= minRow:
                            fo.write(row[1])
            else:
                step /= 2.
                step = step * -1 if step < 0 else step
        else:
            step /= 2.
            step = step * -1 if step > 0 else step
It starts by performing a binary search on the file until it is near (less than maxRowOffset) the row to find. Then it starts reading every line until it finds one that is greater than x-y. That line, and every line after it, are written to an output file until a line is found that is greater than x+y, at which point the program exits.
I tested this on a 1,000,000 line file and it runs in 0.05 seconds. Compare this to reading every line which took 3.8 seconds.
You need random access to the lines, which you won't get with a text file unless the lines are all padded to the same length.
One solution is to dump the table into a database (such as SQLite) with two columns, one for the number and one for all the other data (assuming that the data is guaranteed to fit into whatever the maximum number of characters allowed in a single column in your database is). Then index the number column and you're good to go.
Without a database, you could read through the file once and create an in-memory data structure containing (number, line-offset) pairs. You calculate the line offset by adding up the lengths of each row (including the line end). Now you can binary search these pairs on number and randomly access the lines in the file using the offset. If you need to repeat the search later, pickle the in-memory structure and reload it for later re-use.
This reads the entire file (which you said you don't want to do), but does so only once to build the index. After that you can execute as many requests against the file as you want and they will be very fast.
Note that this second solution is essentially creating a database index on your text file.
Rough code to create the index in second solution:
import pickle

offset = 0
index = []  # probably a better structure to use than a list
f = open(filename)
for row in f:
    nbr = float(row.split(' ')[0])
    index.append([nbr, offset])
    offset += len(row)  # rows read by iteration already include the line end
f.close()
pickle.dump(index, open('filename.idx', 'wb'))  # saves it for future use
Now, you can perform a binary search on the list. There's probably a much better data structure to use for accruing the index values than a list, but I'd have to read up on the various collection types.
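For the lookup step itself, the standard bisect module can do that binary search over the index; a sketch, where find_lines is a hypothetical helper, index is the sorted list of [number, offset] pairs built above, f is the open data file and out is the open output file:
import bisect

def find_lines(f, index, x, y, out):
    """Copy every line whose leading number is within y of x to out."""
    numbers = [entry[0] for entry in index]
    # Locate the first indexed number >= x - y, then scan forward.
    start = bisect.bisect_left(numbers, x - y)
    if start < len(index):
        f.seek(index[start][1])
        for row in f:
            if float(row.split(' ')[0]) > x + y:
                break
            out.write(row)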
Since you want to match the first field, you can use gawk:
$ gawk '{if ($1 >= 15 && $1 <= 25) { print }; if ($1 > 25) { exit }}' your_file
Edit: Taking a file with 261,775,557 lines that is 2.5 GiB big, searching for lines 50,010,015 to 50,010,025, this takes 27 seconds on my Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz. Sounds good enough for me.
In order to find the line that starts with the number just above your lower limit, you have to go through the file line by line until you find that line. No other way, i.e. all data in the file has to be read and parsed for newline characters.
We have to run this search up to the first line that exceeds your upper limit and stop. Hence, it helps that the file is already sorted. This code will hopefully help:
with open(outpath, 'w') as outfile:
    with open(inpath) as infile:
        for line in infile:
            t = float(line.split()[0])
            if lower_limit <= t <= upper_limit:
                outfile.write(line)
            elif t > upper_limit:
                break
I think theoretically there is no other option.