I have a question about this program:
%%file data.csv
x1,x2,y
0.4946,5.7661,0
4.7206,5.7661,1
1.2888,5.3433,0
4.2898,5.3433,1
1.4293,4.5592,0
4.2286,4.5592,1
1.1921,5.8563,0
3.1454,5.8563,1
f = open('data.csv')
data = []
f.readline()
for line in f:
    (x1,x2,y) = line.split(',')
    x1 = float(x1)
    x2 = float(x2)
    y = int(y)
    data.append((x1,x2,y))
What is the purpose of readline() here? I have seen different examples, but here it seems to delete the first line.
Python reads the data serially, so once a line has been read, Python moves on to the next one. The f.readline() call reads the first line, so the loop never reads it.
That's precisely the point: to skip the first line. If you notice, the file has the names of the columns as its first line (x1,x2,y), and the program wants to ignore that line.
Calling the readline() method before reading the file's lines in a loop
is equivalent to:
for line in f.readlines()[1:]:
    ...
For example, that may be used to skip a table header.
In your file, converting the x1 variable to float would raise a ValueError, because on the first iteration x1 would contain the non-numeric string "x1". To avoid that error, readline() is used to advance the file iterator to the second line, which contains only numbers.
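As a side note, the same effect is often achieved by advancing the file iterator with next(f) instead of f.readline(). A minimal sketch of that approach, assuming the data.csv shown above:

data = []
with open('data.csv') as f:
    next(f)  # skip the header row; same effect as f.readline()
    for line in f:
        x1, x2, y = line.split(',')
        data.append((float(x1), float(x2), int(y)))

The csv module's reader could also be used here, which would additionally handle quoting and other CSV edge cases.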
I am using Python 3 to process a results file. The structure of the file is a combination of string identifiers followed by lists of integer values in this format:
ENERGY_BOUNDS
1.964033E+07 1.733253E+07 1.491825E+07 1.384031E+07 1.161834E+07 1.000000E+07 8.187308E+06 6.703200E+06
6.065307E+06 5.488116E+06 4.493290E+06 3.678794E+06 3.011942E+06 2.465970E+06 2.231302E+06 2.018965E+06
EIGENVALUE
1.219034E+00
There are maybe 50 different sets of data with unique identifiers in this file. What I want to do is write a code that will search for a specific identifier (e.g. ENERGY_BOUNDS), then read the values that follow into a list, stopping at the next identifier (in this case EIGENVALUE). I then need to be able to manipulate the list (finding its length, printing its values, etc.).
I am writing this as a function so I can call it multiple times in my code when I want to search for different identifiers. So far what I have is:
def read_data_from_file(file_name, identifier):
    list_of_results = [] # Create list_of_results to put results in for future manipulation
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line
                nextValue = next(line)
                list_of_results.append(nextValue.rstrip())
    return list_of_results
It works fine up until it comes to reading the next line after the identifier, and I am stuck on how to continue reading the results after that line and how to make it stop at the next identifier.
The following is a simple and tested answer.
You are making two mistakes:
line is a string, not an iterator, so calling next(line) raises an error.
You are only reading one line after the identifier has been found, while you need to keep reading until another identifier appears.
Following is the code after a small modification of yours. It's also tested on your data:
def read_data_from_file(file_name, identifier):
    list_of_results = []
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line (default '' avoids StopIteration at end of file)
                nextValue = next(read_obj, '')
                # Keep on reading until the next identifier (an alphabetic line) appears
                while nextValue and not nextValue.strip().isalpha():
                    list_of_results.extend(nextValue.split())
                    nextValue = next(read_obj, '')
    return list_of_results
I would suggest adding a variable that indicates whether you have found a line containing an identifier.
Afterwards, simply add the values into the array until the next identifier has been reached.
def read_data_from_file(file_name, identifier):
    list_of_results = [] # Create list_of_results to put results in for future manipulation
    identifier_found = False
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                identifier_found = True
            elif identifier_found:
                if line.strip().isalpha(): # Next identifier reached, exit loop
                    break
                list_of_results += line.split() # Add values to result
    return list_of_results
Use booleans, continue, and break!
Try to implement logic as follows:
Set a boolean (I'll use in_range) to False
Look through the lines and see if they match the identifier.
If it does, set the boolean to True and continue
If it does not, continue
If the boolean is False AND the line begins with a space: continue
If the boolean is True AND the line begins with a space: Add the line to the list.
If the boolean is True AND the line doesn't begin with a space: break.
This ends the searching process once a new identifier has been started.
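A minimal sketch of that logic, assuming (as the steps above do) that the value lines begin with a space while identifier lines do not:

def read_data_from_file(file_name, identifier):
    list_of_results = []
    in_range = False
    with open(file_name) as read_obj:
        for line in read_obj:
            if identifier in line:
                in_range = True   # start collecting from the next line
                continue
            if not in_range:
                continue          # still before the identifier
            if line.startswith(' '):
                list_of_results.extend(line.split())  # value line: collect it
            else:
                break             # a new identifier has started: stop
    return list_of_results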
The other two answers are already helpful. Here is my method in case you need something else, with comments to explain.
If you don't want to use the end_identifier, you can use .isalpha(), which checks whether the string contains only letters.
def read_data_from_file(file_name, start_identifier, end_identifier):
    list_of_results = []
    with open(file_name, 'r') as read_obj:
        start_identifier_reached = False # variable to check if we reached the needed identifier
        for line in read_obj:
            if start_identifier in line:
                start_identifier_reached = True # now we reached the identifier
                continue # skip the identifier line itself so it doesn't end up in the list
            if start_identifier_reached:
                if end_identifier in line: # stop once we reach the end_identifier
                    break
                list_of_results.append(line.rstrip()) # put the values into the list
    return list_of_results
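A quick, hypothetical usage check, assuming the sample shown in the question is saved as results.txt:

bounds = read_data_from_file('results.txt', 'ENERGY_BOUNDS', 'EIGENVALUE')
print(len(bounds))   # number of value lines captured
print(bounds)        # the lines themselves, ready for further manipulation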
I'm following along with a Great Course tutorial for learning Python, and this code doesn't seem to work.
#Open File
filename = input("Enter the name of the data file: ")
infile = open(filename, 'r')
#Read in file
datalist = []
for line in infile:
    #Get data from line
    date, l, h, rf = (line.split(','))
    rainfall = float(rf)
    max_temp = float(h)
    min_temp = float(l)
    m, d, y = (date.split('/'))
    month = int(m)
    day = int(d)
    year = int(y)
    #Put data into list
    datalist.append([day,month,year,min_temp,max_temp,rainfall])
I'm trying to import a csv file, then create a tuple. The problem occurs when I'm converting the values in the tuple to floats. It works fine until it runs through the file. Then it presents me with this error:
Traceback (most recent call last):
  File "C:/Users/Devlin/PycharmProjects/untitled3/James' Programs/Weather.py", line 16, in <module>
    rainfall = float(rf)
ValueError: could not convert string to float:
Any ideas on what I am doing wrong?
It's hard to tell what exactly you are doing wrong without seeing the input file itself, but what seems to be wrong here (besides the fact that your values are comma-separated, so you might be better off using the csv module from Python's stdlib) is that, somewhere while iterating over the lines, you encounter a string that can't be converted to a float, which is a no-go:
>>> float('spam')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'spam'
One of the solutions is to simply skip the strings you encounter that can't be converted. You can choose between two approaches for that: LBYL (Look Before You Leap) vs. EAFP (Easier to Ask for Forgiveness than Permission). In a nutshell, I suggest you go with the latter; it's generally preferred, both because of the gap that exists between checking and using (things can change out from under you), and because you'll often have to handle the error anyway, even if you check first. Apart from that, instead of manually closing file-like objects later on (which you seem to have forgotten to do), I suggest using the with statement as a context manager, which manages that for you automatically. Taking all that into account, something along the following lines should do the trick (disclaimer: not thoroughly tested):
import csv

data = []
filename = input('Enter the name of the data file: ')

with open(filename) as f:
    reader = csv.reader(f, delimiter=',', skipinitialspace=True)
    for line in reader:
        try:
            date, (l, h, rf) = line[0], map(float, line[1:])
            m, d, y = map(int, date.split('/'))
        except ValueError as e:
            print('Skipping line: %s [because of: %s]' % (line, e))
            continue
        data.append([d, m, y, l, h, rf])
Hopefully this is enough to get you back on track ;-)
Review your csv file.
It is hard to say without seeing what's in the file, but the most likely explanation (according to the error message) is that you have a line whose fourth value is empty, e.g.:
2018-01-30,5,12,
So the rf variable would be empty when parsing that line, and then you would get that ValueError when trying to cast the value to a float.
Also, some advice on how to do it better:
You may want to split your line first, count how many data fields it has, and then discard it before assigning the whole line to date, l, h, rf. Something like this:
for line in infile:
    # Get data from line. Use strip() to avoid whitespace
    items = line.strip().split(',')
    if len(items) != 4:
        # skip the iteration - do nothing with that malformed line
        continue
    date, l, h, rf = items
You may want to have a look at the csv module for reading/writing csv files easily.
The error means that the string you are trying to cast to a float is actually not a number. In your case, it looks like it's an empty string, probably because the last line of your file is empty; you can check for that at the beginning of your loop and break or continue if it is. Another strategy would be to catch the error, but then a malformed line would be silently ignored when you might want to be alerted about it, so it's up to you to pick the one that suits you.
Using square brackets also puts your values in a list, not in a tuple. You need parentheses for that.
And you should also close your files when you are done.
Python also has a csv module you may find useful.
#Open File
filename = input("Enter the name of the data file: ")
infile = open(filename, 'r')
#Read in file
datalist = []
for line in infile:
    if line.strip() == '':  # If the line only contains spaces
        continue            # Then, just skip it
    # Some stuff ...
    # Put everything in a tuple that we add to our list
    datalist.append((day,month,year,min_temp,max_temp,rainfall))
infile.close()  # Close the file
If I have a text file that has a bunch of random text before I get to the stuff I actually want, how do I move the file pointer there?
Say for example my text file looks like this:
#foeijfoijeoijoijfoiej ijfoiejoi jfeoijfoifj i jfoei joi jo ijf eoij oie jojf
#feoijfoiejf ioj oij oi jo ij i joi jo ij oij #### oijroijf 3## # o
#foeijfoiej i jo i iojf 3 ## #io joi joij oi j## io joi joi j3# 3i ojoi joij
# The stuff I care about
(The hashtags are a part of the actual text file)
How do I move the file pointer to the line of stuff I care about, and then how would I get python to tell me the number of the line, and start the reading of the file there?
I've tried doing a loop to find the line that the last hashtag is in, and then reading from there, but I still need to get rid of the hashtag, and need the line number.
Try using the readlines function. This will return a list containing each line. You can then use a for loop to parse through the lines, searching for what you need, and keep track of each line's number as you go. For instance:
with open('some_file_path.txt') as f:
    contents = f.readlines()

target = '#the line I am looking for'
for line_num, line in enumerate(contents):
    if target in line:
        break  # line_num now holds the (0-based) number of the matching line
To get rid of the pound sign, just use the replace method, e.g. new_line = line.replace('#', '').
You can't seek to it directly without knowing the size of the junk data or scanning through the junk data. But it's not too hard to wrap the file in itertools.dropwhile to discard lines until you see the "good" data, after which it iterates through all remaining lines:
import itertools

# Or def a regular function that returns True until you see the line
# delimiting the beginning of the "good" data
not_good = '# The stuff I care about\n'.__ne__

with open(filename) as f:
    for line in itertools.dropwhile(not_good, f):
        ... # You'll iterate the lines at and after the good line ...
If you actually need the file descriptor positioned appropriately, not just the lines, this variant should work:
import io

with open(filename) as f:
    # Get first good line
    good_start = next(itertools.dropwhile(not_good, f))
    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)
    # f is now positioned at the beginning of the line that begins the good data
You can tweak this to get the actual line number if you really need it (rather than just needing the offset). It's a little less readable though, so explicit iteration via enumerate may make more sense if you need to do it (left as exercise). The way to make Python work for you is:
from future_builtins import map  # Py2 only
from operator import itemgetter

with open(filename) as f:
    linectr = itertools.count()
    # Get first good line
    # Pair each line with a 0-up number to advance the count generator, but
    # strip it immediately so not_good only processes lines, not line nums
    good_start = next(itertools.dropwhile(not_good, map(itemgetter(0), zip(f, linectr))))
    good_lineno = next(linectr)  # Keeps the 1-up line number by advancing once
    # Seek back to undo the read of the first good line:
    f.seek(-len(good_start), io.SEEK_CUR)
    # f is now positioned at the beginning of the line that begins the good data
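For comparison, a sketch of the more explicit enumerate-based approach mentioned above; filename and process() are placeholders here:

marker = '# The stuff I care about'
with open(filename) as f:
    good_lineno = None
    for lineno, line in enumerate(f):
        if line.rstrip('\n') == marker:
            good_lineno = lineno  # 0-based number of the marker line
            break
    # Continuing to iterate the same file object yields only the lines
    # after the marker, so the "good" data can be handled from here on.
    for line in f:
        process(line)  # placeholder for whatever you do with each data line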
I'm having some trouble with this problem. I'm a beginner to Python and have been searching this website but can't seem to figure out how to do this specific problem. If I have a file that looks something like this:
3
Alpha
Beta
Gamma
4
Delta
Epsilon
Omega
Zeta
I want to read that first integer (in this case, 3 but it could vary) and print it, and then read the next three lines (Alpha, Beta, Gamma) and print those. After that, I would want to read the next integer (in this case, 4) and then read the next four lines (Delta, Epsilon, Omega, Zeta) and print those.
I think I've figured out how to do it if that integer is fixed, but I'm not sure how to do it if that integer is a variable and could be anything. Here is what I have for if the integer is fixed:
with open('myfile.txt') as input_data:
    for line in input_data:
        if line.strip() == '3':
            print "3"
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == '3':
            break
        print line
I would put this in a while loop that loops over the whole file: reading (and printing) an integer x, reading (and printing) the next x lines, then reading an integer y, reading the next y lines, and so on. Since I could be dealing with big files, it seems like reading line by line with f.readline() is the way to go, so after I read the first integer and the lines after it, I should read the next one.
Any help would be appreciated a lot. Thanks!
edit:
I would want to read from the file that has my data (3 Alpha Beta Gamma 4 etc). I would then read 3 (or any integer), which would then signal to read Alpha Beta Gamma. I would then write this to a file. Then, I would read 4 (or any integer again), which would then signal to read Delta Epsilon Omega Zeta. I would go through the rest of the file like this.
I have some idea for the individual parts, so after reading from file "myfile.txt", I would write this output to "output.txt":
with open('myfile.txt', 'r') as fin:
    alllines = fin.readlines()

with open('output.txt', 'w') as fout:
    for i in range(len(alllines)):
        fout.write(alllines[i])
I could also just do fin.readline() and read it line by line.
If your goal is to read a file that is perfectly formatted such that the first line is a number N, the next N lines after the first are words to print, and the line after that is a number M, and the next M lines are the ones you want to print (and on and on), then the following would work:
with open('myfile.txt') as input_data:
    for line in input_data:
        lines_to_print = int(line.strip())      # each count line says how many words follow
        for x in xrange(lines_to_print):
            print next(input_data).strip()       # next() keeps reading from the same iterator
This only works in the hypothetical case you presented.
This would print :
Alpha
Beta
Gamma
Delta
Epsilon
Omega
Zeta
Although, if your goal is to print everything that's not a number, then you could also do this:
with open('myfile.txt') as input_data:
    for line in input_data:
        if not line.strip().isdigit():  # strip() so the trailing newline doesn't break isdigit()
            print line
I don't want to edit my original answer, so here's a new one instead.
Step one is to open and read the file, line by line:
with open('myfile.txt') as input_data:
    while True:
        line = input_data.readline()
        if not line:
            break
        print line
        ## this is where you do stuff, like file writing and whatnot
if not line: will exit the loop when there are no more lines. In your "do stuff" section above, the first thing to do is check whether line is a number:
## lineIsANumber = (check if "line" is a number)
If it is a number, then start another loop
if lineIsANumber:
    for x in xrange(number_read_from_file):
        ## read next line
        ## print line
        ## write to a file
The tricky part is writing to a different file each time. One thing you can do is name your files "1", "2", etc., in the order they are created. You can use an int and increment it each time lineIsANumber is true. You would need to keep track of this in the main loop, the while True at the top level.
If you try this and get stuck, I can help further.
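A minimal sketch of that whole flow, under the assumptions above (the counts sit on their own lines, and the output file names "1.txt", "2.txt", ... are just hypothetical):

with open('myfile.txt') as input_data:
    block_number = 0
    for line in input_data:
        if line.strip().isdigit():        # this line is a count
            count = int(line.strip())
            block_number += 1             # used to name the next output file
            with open(str(block_number) + '.txt', 'w') as fout:
                for _ in range(count):
                    fout.write(next(input_data, ''))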
I have a file like this below.
0 0 0
0.00254 0.00047 0.00089
0.54230 0.87300 0.74500
0 0 0
I want to modify this file: if a value is less than 0.05, it should become 1; otherwise it should become 0.
After python script runs, the file should be like
1 1 1
1 1 1
0 0 0
1 1 1
Would you please help me?
OK, since you're new to StackOverflow (welcome!) I'll walk you through this. I'm assuming your file is called test.txt.
with open("test.txt") as infile, open("new.txt", "w") as outfile:
opens the files we need, our input file and a new output file. The with statement ensures that the files will be closed after the block is exited.
for line in infile:
loops through the file line by line.
values = [float(value) for value in line.split()]
Now this is more complicated. Every line contains space-separated values. These can be split into a list of strings using line.split(). But they are still strings, so they must be converted to floats first. All this is done with a list comprehension. The result is that, for example, after the second line has been processed this way, values is now the following list: [0.00254, 0.00047, 0.00089].
results = ["1" if value < 0.05 else "0" for value in values]
Now we're creating a new list called results. Each element corresponds to an element of values, and it's going to be a "1" if that value < 0.05, or a "0" if it isn't.
outfile.write(" ".join(results))
joins the list of "integer strings" back into a single string, with the values separated by spaces.
outfile.write("\n")
adds a newline. Done.
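Putting those pieces together (with the same assumed file names, test.txt and new.txt), the whole loop reads:

with open("test.txt") as infile, open("new.txt", "w") as outfile:
    for line in infile:
        # convert the space-separated strings on this line to floats
        values = [float(value) for value in line.split()]
        # map each value to "1" (below the threshold) or "0" (at or above it)
        results = ["1" if value < 0.05 else "0" for value in values]
        outfile.write(" ".join(results))
        outfile.write("\n")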
The two list comprehensions could be combined into one, if you don't mind the extra complexity:
results = ["1" if float(value) < 0.05 else "0" for value in line.split()]
If you can use libraries, I'd suggest numpy:
import numpy as np

myarray = np.genfromtxt("my_path_to_text_file.txt")
out_array = np.where(myarray < 0.05, 1, 0)            # 1 where the value is below 0.05, else 0
np.savetxt("my_output_file.txt", out_array, fmt="%d")  # write integers, not floats
You can add further formatting as arguments to the savetxt function; its docstring is pretty self-explanatory.
If you are stuck with pure Python:
with open("my_path_to_text_file") as my_file:
list_of_lines = my_file.readlines()
list_of_lines = [[int( float(x) < 0.05) for x in line.split()] for line in list_of_lines]
then write that list to file as you see fit.
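For example, writing that nested list back out might look like this (the output file name is just an assumption):

with open("my_output_file.txt", "w") as my_out:
    for line_values in list_of_lines:
        # each inner list holds the 0/1 results for one input line
        my_out.write(" ".join(str(v) for v in line_values) + "\n")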
You can use this code:
f_in = open("file_in.txt", "r")  #opens a file in the reading mode
in_lines = f_in.readlines()      #reads it line by line
out = []
for line in in_lines:
    list_values = line.split()   #separate elements by the spaces, returning a list with the numbers as strings
    for i in range(len(list_values)):
        list_values[i] = float(list_values[i])  #converts them to floats
        # print list_values[i],
        if list_values[i] < 0.05:  #your condition
            # print ">>", 1
            list_values[i] = 1
        else:
            # print ">>", 0
            list_values[i] = 0
    out.append(list_values)  #stores the numbers in a list, where each inner list corresponds to a line's content
f_in.close()  #closes the file

f_out = open("file_out.txt", "w")  #opens a new file in the writing mode
for cur_list in out:
    for i in cur_list:
        f_out.write(str(i) + "\t")  #writes each number, plus a tab
    f_out.write("\n")  #writes a newline
f_out.close()  #closes the file
The following code performs the replacements in place: for that, the file is opened in 'rb+' mode. It's absolutely mandatory to open it in binary mode b. The + in 'rb+' means that it's possible both to write to and to read from the file. Note that the mode can also be written 'r+b'.
But using 'rb+' is awkward:
if you read with for line in f, the file is read in chunks, and several lines are kept in a buffer from which they are handed out one after the other, until another chunk of data is read and loaded into the buffer. That makes transformations harder to perform, because you must follow the position of the file's pointer with tell() and move it with seek(), and in fact I've never completely understood how that must be done.
Happily, there's a workable way to do the replacement, because, I don't know exactly why, but in practice, when readline() reads a line, the file's pointer doesn't go further on disk than the end of that line (that is to say, it stops at the newline).
So it's now easy to know the file pointer's position and to move it.
To write after a read, a seek() must be executed first, even if it is only seek(0, 1), meaning a move of 0 characters from the current position; that call resets the internal state of the file's pointer so that writing is allowed.
Well, for your problem, the code is as follows:
import re
from os import fsync
from os.path import getsize

reg = re.compile(r'[\d.]+')

def ripl(m):
    g = m.group()
    # pad with spaces so the replacement has exactly the same length as the original number
    return ('1' if float(g) < 0.05 else '0').ljust(len(g))

path = '...........'
print 'length of file before : %d' % getsize(path)

with open(path, 'rb+') as f:
    line = 'go'
    while line:
        line = f.readline()
        lg = len(line)
        f.seek(-lg, 1)
        f.write(reg.sub(ripl, line))
        f.flush()
        fsync(f.fileno())

print 'length of file after : %d' % getsize(path)
flush() and fsync() must be executed to ensure that the instruction f.write(reg.sub(ripl, line)) actually writes to disk at the moment it is ordered to.
Note that I've never handled a file with a Unicode encoding this way. It's certainly more difficult, since a Unicode character may be encoded over several bytes (and, in the case of UTF-8, a variable number of bytes depending on the character).