I have CSV files that contain numerous values that I want to reference. I wanted to parse them succinctly using eval. Here's what I tried:
line = fileHandle.readline()
while line != "":
    if line != "\n":
        parameter = line.split(',')[0]
        value = line.split(',')[2].replace("\n", "")
        eval("%s = \"%s\"" % (parameter, value))
        print(parameter + " = " + eval(parameter)) # a quick test
    line = fileHandle.readline()
What I get is:
Traceback (innermost last):
File "<string>", line 73, in ?
File "<string>", line 70, in createJMSProviders
File "<string>", line 49, in createJMSProviderFromFile
File "<string>", line 1
externalProviderURL="tibjmsnaming://..."
^
SyntaxError: invalid syntax
It reads to me like it is not possible to eval("externalProviderURL=\"tibjmsnaming://...\""). What is wrong with this statement?
As per S.Lott's suggestion, here is how I would solve this issue. I might be simplifying a little bit. If so, I apologize, but I haven't seen your data.
import csv
my_dict = {}
with open('my/data.csv') as f:
    my_reader = csv.reader(f)
    for row in my_reader:
        my_dict[row[0]] = row[2]
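A runnable sketch of the same idea, assuming the layout implied by the question (name in column 0, value in column 2). The sample rows and the names queueName and connectionFactory are made up, and io.StringIO stands in for the real file. One difference from the snippet above: csv.reader yields an empty list for a blank line, so row[0] would raise IndexError; the sketch skips short rows, much as the original loop skipped "\n".

```python
import csv
import io

# Hypothetical sample data standing in for my/data.csv: name, (unused column), value.
# The blank line mimics the empty lines the original while-loop skipped.
SAMPLE = """externalProviderURL,string,tibjmsnaming://...
queueName,string,orders

connectionFactory,string,QCF
"""

def load_parameters(f):
    """Build a {name: value} dict from rows that have at least three columns."""
    params = {}
    for row in csv.reader(f):
        if len(row) < 3:  # blank or short line: skip it instead of raising IndexError
            continue
        params[row[0]] = row[2]
    return params

my_dict = load_parameters(io.StringIO(SAMPLE))
print(my_dict['externalProviderURL'])
```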
As you can see, there are a number of differences from your code here. First of all, I'm using Python's with statement, which is a good habit to get into when working with files. Second, I'm using Python's csv module to create a reader object, which you can iterate over in a for loop. This is significantly more Pythonic than using a while loop. Finally, and probably most relevantly, I'm adding these values to a dictionary, rather than trying to plop them into variables in the global scope. To access these parameters, you can simply do the following:
my_dict['externalProviderURL']
However, you get a lot more than this. Storing your values in an actual data structure allows you to use all of its built-in methods. For example, you can go back and iterate over its keys and values:
for key, value in my_dict.items():
    print(key)
    print(value)
Pythonic code often involves a significant use of dictionaries. They're finely tuned for performance, and are made particularly useful since almost anything can be stored as a value in a dictionary (lists, other dictionaries, functions, classes, etc.).
eval() is for evaluating Python expressions, and assignment (a = 1) is a statement.
You'll want exec().
>>> exec("externalProviderURL=\"tibjmsnaming://...\"")
>>> externalProviderURL
'tibjmsnaming://...'
(FYI, to use eval() you'd have to do externalProviderURL=eval("\"tibjmsnaming://...\""), but it looks like your situation is more suited to exec).
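A short Python 3 sketch contrasting the two. Passing a dictionary as exec()'s namespace is an addition not in the answer above; it keeps the new name out of the module's globals. Both functions still run whatever text the CSV file contains, so this only makes sense for trusted input.

```python
# eval() accepts only an expression, so an assignment is rejected at compile time.
try:
    eval('externalProviderURL = "tibjmsnaming://..."')
    eval_ok = True
except SyntaxError:
    eval_ok = False

# exec() runs statements. Passing a dict as the globals argument puts the
# new name into that dict instead of the module's own namespace.
namespace = {}
exec('externalProviderURL = "tibjmsnaming://..."', namespace)

# eval() is fine when the string is only an expression.
value = eval('"tibjmsnaming://..."')

print(eval_ok)                                    # False
print(namespace['externalProviderURL'])           # tibjmsnaming://...
print(value == namespace['externalProviderURL'])  # True
```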
Related
X = corpus.get("Andrew Shapiro")
testsite_array = []
with X as my_file:
    for line in my_file:
        testsite_array.append(line)
where corpus is a dictionary and "Andrew Shapiro" is a key in it. It gives me the following error:
File "C:/Users/Vrushab PC/Downloads/Dissertation/untitled0.py", line 71, in <module>
with X as my_file:
AttributeError: __enter__
In order to use the with statement, the object (X in this case) must implement the __enter__ and __exit__ methods. The whole point is that it allows the object to clean itself up even when an exception is raised. Think of it as try/except/finally done much more cleanly.
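A minimal sketch of that protocol. The class name Resource is hypothetical; it only records when __enter__ and __exit__ run, and shows that __exit__ still runs when the body raises.

```python
class Resource:
    """Toy context manager: any object with __enter__ and __exit__ works with `with`."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")  # runs even if the with-body raised
        return False  # do not suppress the exception


r = Resource("demo")
try:
    with r as res:
        raise ValueError("boom")
except ValueError:
    pass

print(r.events)  # ['enter', 'exit']
```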
To answer your question, though, I would need to know what you expected X to be. You named the target of the with statement my_file, so is X supposed to be a file path that you want to open, or something else?
A full example of what you are attempting to do would be helpful.
Generally, though, you would use the with statement for things like opening files:
with open(X, 'r') as my_file:
    ...
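A sketch assuming corpus maps each name to a file path; the question doesn't say what the values are, so this is a guess, and the temporary file and its contents are made up. If the values are already the text itself, no file is involved and X.splitlines(True) gives the lines directly.

```python
import os
import tempfile

# Hypothetical setup: corpus maps an author name to the path of a text file.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "shapiro.txt")
with open(path, "w") as out:
    out.write("first line\nsecond line\n")
corpus = {"Andrew Shapiro": path}

X = corpus.get("Andrew Shapiro")  # a path string, which has no __enter__/__exit__
testsite_array = []
with open(X, "r") as my_file:     # the file object returned by open() is a context manager
    for line in my_file:
        testsite_array.append(line)

print(testsite_array)  # ['first line\n', 'second line\n']
```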
I noticed that if I iterate over a file that I opened, it is much faster to iterate over it without "read"-ing it.
i.e.
l = open('file','r')
for line in l:
    pass (or code)
is much faster than
l = open('file','r')
for line in l.read() / l.readlines():
    pass (or code)
The 2nd loop will take around 1.5x as much time (I used timeit over the exact same file, and the results were 0.442 vs. 0.660), and would give the same result.
So - when should I ever use the .read() or .readlines()?
Since I always need to iterate over the file I'm reading, and after learning the hard way how painfully slow the .read() can be on large data - I can't seem to imagine ever using it again.
The short answer to your question is that each of these three ways of reading from a file has a different use case. As noted above, f.read() reads the whole file as a single string, which makes file-wide manipulations, such as a regex search or substitution across the entire file, relatively easy.
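For example, a pattern that spans several lines can only be matched once the whole file is in one string. The BEGIN/END block format below is invented for illustration, and io.StringIO stands in for an open file.

```python
import io
import re

f = io.StringIO("BEGIN\nalpha\nbeta\nEND\nBEGIN\ngamma\nEND\n")
content = f.read()  # the whole file as one string

# DOTALL lets "." match newlines, so one match can cover several lines.
blocks = re.findall(r"BEGIN\n(.*?)\nEND", content, flags=re.DOTALL)
print(blocks)  # ['alpha\nbeta', 'gamma']
```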
f.readline() reads a single line of the file, allowing the user to parse a single line without necessarily reading the entire file. Using f.readline() also allows easier application of logic in reading the file than a complete line by line iteration, such as when a file changes format partway through.
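A sketch of the "format changes partway through" case, assuming a made-up layout: "key: value" header lines, a blank line, then numeric rows. readline() consumes the header one line at a time, and iterating over the file afterwards picks up where it stopped. All readline() calls here happen before the iteration starts.

```python
import io

# Hypothetical file layout: header lines, a blank separator, then data rows.
text = """title: demo
units: cm

1,2
3,4
"""

f = io.StringIO(text)
header = {}
while True:
    line = f.readline()
    if line.strip() == "":  # blank line (or '' at end of file) ends the header
        break
    key, value = line.rstrip("\n").split(": ", 1)
    header[key] = value

# The rest of the file is read lazily, line by line.
rows = [tuple(int(x) for x in line.split(",")) for line in f]

print(header)  # {'title': 'demo', 'units': 'cm'}
print(rows)    # [(1, 2), (3, 4)]
```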
Using the syntax for line in f: allows the user to iterate over the file line by line as noted in the question.
(As noted in the other answer, this documentation is a very good read):
https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
Note:
It was previously claimed that f.readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7, and is perhaps a questionable practice, so this claim has been removed.
Hope this helps!
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:
for line in f:
print line,
This is the first line of the file.
Second line of the file
Note that readline() is not comparable to reading all lines in a for-loop, since it reads line by line and carries an overhead, as others have already pointed out.
I ran timeit on two otherwise identical snippets, one iterating over the file object in a for-loop and the other over readlines(). You can see my snippet below:
def test_read_file_1():
    f = open('ml/README.md', 'r')
    for line in f.readlines():
        print(line)
def test_read_file_2():
    f = open('ml/README.md', 'r')
    for line in f:
        print(line)
def test_time_read_file():
    from timeit import timeit
    duration_1 = timeit(lambda: test_read_file_1(), number=1000000)
    duration_2 = timeit(lambda: test_read_file_2(), number=1000000)
    print('duration using readlines():', duration_1)
    print('duration using for-loop:', duration_2)
And the results:
duration using readlines(): 78.826229238
duration using for-loop: 69.487692794
The bottom line, I would say, is that the for-loop is faster, but when both are possible, I'd rather use readlines().
readlines() is better than for line in file when you know that the data you are interested in starts from, for example, the 2nd line. You can simply write readlines()[1:].
Such use cases are when you have a tab/comma separated value file and the first line is a header (and you don't want to use an additional module for TSV or CSV files).
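A small comparison with made-up sample data: readlines()[1:] as suggested, versus calling next() on the file iterator, which skips the header without first loading the whole file into a list.

```python
import io

data = "name,score\nalice,3\nbob,5\n"

# readlines()[1:] reads every line into a list, then slices off the header.
rows_a = io.StringIO(data).readlines()[1:]

# next() consumes only the header; the loop then streams the remaining lines.
f = io.StringIO(data)
header = next(f)
rows_b = [line for line in f]

print(rows_a == rows_b)  # True
```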
#The difference between file.read(), file.readline(), file.readlines()
file = open('samplefile', 'r')
single_string = file.read()      # Reads the whole file into a single string
                                 # (\n characters are included)
file.seek(0)                     # Rewind; otherwise the next two calls return '' and []
line = file.readline()           # Reads from the cursor's current position up to and
                                 # including the next \n, as a string
list_strings = file.readlines()  # Makes a list of strings from the remaining lines
file.close()
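All three methods read from the file's current position, so calling them one after another on the same file does not give three views of the same content. A sketch with made-up text in io.StringIO:

```python
import io

sample = "first\nsecond\nthird\n"
f = io.StringIO(sample)      # stands in for open('samplefile', 'r')

whole = f.read()
after_read = f.readline()    # '' because the cursor is already at the end

f.seek(0)                    # rewind to the start
first = f.readline()         # 'first\n'
rest = f.readlines()         # ['second\n', 'third\n']: only what is left

print(repr(whole), repr(after_read), repr(first), rest)
```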
I wrote this little Python 2.7 prototype script to try to read specified lines (in this example lines 3, 4, 5) from a formatted input file. Later I am going to parse data from this and operate on the input to construct other files.
from sys import argv
def comparator (term, inputlist):
    for i in inputlist:
        if (term==i):
            return True
    print "fail"
    return False
readthese = [3,4,5]
for filename in argv[1:]:
    with open(filename) as file:
        for line in file:
            linenum=#some kind of way to get line number from file
            if comparator(linenum, readthese):
                print(line)
I fixed all the errors I had found with the script, but currently I don't see any way to get a line number from file. It's a bit different than pulling the line number from a file object, since file is a class, not an object, if I'm not mistaken. Is there some way I can pull the line number for my input file?
I think a lot of my confusion probably stems from what I did with my with statement so if someone could also explain what exactly I have done with that line that would be great.
You could just enumerate the file object since enumerate works with anything iterable...
for line_number, line in enumerate(file):
    if comparator(line_number, readthese):
        print(line)
Note that this counts from 0. If you want the first line to be 1, just tell enumerate where to start:
for line_number, line in enumerate(file, 1):
    ...
Note, I'd recommend not using the name file: on Python 2.x, file is a built-in type, so you're effectively shadowing a builtin (albeit a rarely used one...).
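Apart from its "fail" message, comparator(term, inputlist) is just term in inputlist, so the whole loop can be condensed. A sketch assuming 1-based line numbers; io.StringIO with made-up lines stands in for the file, and readthese becomes a set for fast membership tests.

```python
import io

readthese = {3, 4, 5}  # a set: "in" is a constant-time lookup
f = io.StringIO("l1\nl2\nl3\nl4\nl5\nl6\n")  # stands in for the real file

selected = [line.rstrip("\n") for line_number, line in enumerate(f, 1)
            if line_number in readthese]
print(selected)  # ['l3', 'l4', 'l5']
```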
You could also use the list structure's index itself like so:
with open('a_file.txt','r') as f:
    lines = f.readlines()
readthese = [3,4,5]
for lineno in readthese:
    print(lines[lineno - 1])
This works because the list of lines already implicitly contains the line numbers: a line's number is its index + 1.
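If the file is too large to hold in memory, you could instead read it lazily and stop after the last wanted line:
readthese = [3,4,5]
last = max(readthese)
with open('a_file.txt','r') as f:
    for lineno, line in enumerate(f, 1):  # line numbers start at 1
        if lineno in readthese:
            print(line)
        if lineno >= last:
            break  # stop reading once the last wanted line is printed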
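For a contiguous range like 3 to 5, itertools.islice does the slicing lazily. A sketch with made-up lines in io.StringIO; islice indices are 0-based and the stop value is exclusive, so 1-based lines 3 to 5 are islice(f, 2, 5), and iteration stops after line 5.

```python
import io
from itertools import islice

f = io.StringIO("l1\nl2\nl3\nl4\nl5\nl6\n")  # stands in for open('a_file.txt')
wanted = list(islice(f, 2, 5))               # 0-based indices 2, 3, 4
print(wanted)  # ['l3\n', 'l4\n', 'l5\n']
```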
I am a Python beginner struggling to create and save a list of tuples from a CSV file.
The code I got for now is:
def load_file(filename):
    fp = open(filename, 'Ur')
    data_list = []
    for line in fp:
        data_list.append(line.strip().split(','))
    fp.close()
    return data_list
Then I would like to save the data back to a file:
def save_file(filename, data_list):
    fp = open(filename, 'w')
    for line in data_list:
        fp.write(','.join(line) + '\n')
    fp.close()
Unfortunately, my code returns a list of lists, not a list of tuples... Is there a way to create one list containing multiple tuples without using csv module?
split returns a list; if you want a tuple, convert it to one:
data_list.append(tuple(line.strip().split(',')))
Please use the csv module.
First question: why is a list of lists bad? In the sense of "duck typing", this should be fine, so maybe think about it again.
If you really need a list of tuples - only small changes are needed.
Change the line
data_list.append(line.strip().split(','))
to
data_list.append(tuple(line.strip().split(',')))
That's it.
If you ever want to get rid of custom code (less code is better code), you could stick to the csv module. I'd strongly recommend using library functions wherever possible.
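A sketch of the csv-module version. The sample data is invented to show the case where a hand-rolled split(',') goes wrong: a comma inside a quoted field. csv.writer quotes that field again on the way out (its default line terminator is \r\n).

```python
import csv
import io

# Made-up data: the second field of the first row contains a quoted comma.
data = io.StringIO('a,"x, y",1\nb,z,2\n')
data_list = [tuple(row) for row in csv.reader(data)]
print(data_list)  # [('a', 'x, y', '1'), ('b', 'z', '2')]

out = io.StringIO()
csv.writer(out).writerows(data_list)
print(repr(out.getvalue()))  # 'a,"x, y",1\r\nb,z,2\r\n'
```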
To show off some advanced Python features, your load_file function could also look like this:
def load_file(filename):
    with open(filename, 'Ur') as fp:
        data_list = [tuple(line.strip().split(",")) for line in fp]
    return data_list
I use a list comprehension here, it's very concise and easy to understand.
Additionally, I use the with statement, which closes your file even if an exception occurs inside the block. Please always use with when working with external resources, like files.
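Putting the pieces together, a sketch of both functions written with with and a list comprehension. Two deliberate changes from the question's code: mode 'r' instead of 'Ur' (universal newlines are already the default for text files in Python 3, and the 'U' flag was removed in 3.11), and rstrip('\n') instead of strip(), so leading or trailing spaces inside fields survive. Like the original, it does not handle quoted commas; the file path is a temporary stand-in.

```python
import os
import tempfile

def load_file(filename):
    """Read comma-separated lines into a list of tuples (no csv module)."""
    with open(filename, 'r') as fp:
        return [tuple(line.rstrip('\n').split(',')) for line in fp]

def save_file(filename, data_list):
    """Write each tuple back out as one comma-separated line."""
    with open(filename, 'w') as fp:
        for row in data_list:
            fp.write(','.join(row) + '\n')

path = os.path.join(tempfile.mkdtemp(), 'data.csv')
save_file(path, [('a', '1'), ('b', '2')])
print(load_file(path))  # [('a', '1'), ('b', '2')]
```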
Just wrap tuple() around line.strip().split(',') and you'll get a list of tuples.
I have a file whose lines are in list form, such as:
[1,'ab','fgf','ssd']
[2,'eb','ghf','hhsd']
[3,'ag','rtf','ssfdd']
I want to read that file line by line using f.readline and assign each line to a list.
I tried doing this:
k=[ ]
k=f.readline()
print k[1]
I expected the result to be the 2nd element of the list on the first line, but instead it printed a single character of the line, '1'.
How to get the expected output?
If all you want is to take the input format shown and store it as a list, attempting to execute the input file (with eval()) is not a good idea. This leaves your program open to all sorts of accidentally or intentionally harmful input. You are better advised to just parse the input file:
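s = f.readline().strip()[1:-1]  # drop the trailing newline, then the surrounding [ and ]
k = s.split(',')
print(k[1])  # the element keeps its quotes: 'ab'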
readline just returns strings. You need to convert them to what you want. eval does the job, but be warned that it executes everything inside the string, so this is only an option if you trust the input (i.e. you've saved it yourself).
If you need to save data from your program to a file, you might want to use pickle.
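A minimal pickle round trip, using rows from the question and a temporary path as stand-ins. Loading a pickle from an untrusted source can also execute code, so pickle shares eval's caveat about trusting the input.

```python
import os
import pickle
import tempfile

rows = [[1, 'ab', 'fgf', 'ssd'], [2, 'eb', 'ghf', 'hhsd']]
path = os.path.join(tempfile.mkdtemp(), 'rows.pkl')

with open(path, 'wb') as f:  # pickle needs binary mode
    pickle.dump(rows, f)

with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded[0][1])  # ab
```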
If the sample posted is the actual content of your file (which I highly doubt), here is what you could do starting with Python 2.6, using ast.literal_eval:
>>> import ast
>>> for line in open(fname):
...     print(ast.literal_eval(line)[1])
...
ab
eb
ag
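The advantage of ast.literal_eval over eval is that it only accepts Python literals (strings, numbers, lists, dicts, and so on). A short check, with a made-up line that tries to call a function:

```python
import ast

row = ast.literal_eval("[1,'ab','fgf','ssd']")
print(row[1])  # ab

# Unlike eval(), literal_eval refuses anything that is not a plain literal.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```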
You could use eval on each line; this would evaluate the expression in the line and should yield your expected list, if the formatting is correct.
A safer solution would be a simple CSV parser. For that your input could look something like this (comma-separated):
123,321,12,123,321,'asd',ewr,'afdg','et al',213
Maybe this is feasible.
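A sketch of that CSV route with the standard csv module. Setting quotechar="'" is an assumption that matches the single-quoted sample; every field comes back as a string. The same reader also handles the question's bracketed lines once the [ and ] are stripped.

```python
import csv
import io

line = "123,321,12,123,321,'asd',ewr,'afdg','et al',213\n"
# quotechar="'" so that 'et al' is read as one field without its quotes.
fields = next(csv.reader(io.StringIO(line), quotechar="'"))
print(fields)

bracketed = "[1,'ab','fgf','ssd']\n"
# csv.reader accepts any iterable of strings, here a one-element list.
k = next(csv.reader([bracketed.strip()[1:-1]], quotechar="'"))
print(k[1])  # ab
```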
Maybe you can use eval as suggested, but I'm just curious: is there any reason not to use JSON as the file format?
You can use the json module:
import json
with open('lists.txt', 'r') as f:
    lines = f.readlines()
for line in lines:
    line = line.replace("'", '"')
    l = json.loads(line)
    print(l[1])
Outputs:
ab
eb
ag
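If you control how the file is written, you can store it as JSON in the first place (one list per line) and skip the quote-replacing step, which breaks on values that contain an apostrophe. A sketch with the question's rows; a list of strings stands in for the file's lines.

```python
import json

rows = [[1, 'ab', 'fgf', 'ssd'], [2, 'eb', 'ghf', 'hhsd'], [3, 'ag', 'rtf', 'ssfdd']]

# Writing: one JSON array per line (json.dumps always uses double quotes).
lines = [json.dumps(row) for row in rows]
print(lines[0])  # [1, "ab", "fgf", "ssd"]

# Reading: no quote replacement needed, and numbers come back as ints.
parsed = [json.loads(line) for line in lines]
print(parsed[1][1])  # eb
```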