Update: My current question is how I can get my code to read from the beginning of the file to EOF for each new search phrase.
This is an assignment I am currently stuck on. Mind you, this is a beginner's programming class using Python.
jargon = open("jargonFile.txt","r")
searchPhrase = raw_input("Enter the search phrase: ")
while searchPhrase != "":
    result = jargon.readline().find(searchPhrase)
    if result == -1:
        print "Cannot find this term."
    else:
        print result
    searchPhrase = raw_input("Enter the search phrase: ")
jargon.close()
The assignment is to take a user's searchPhrase, find it in a file (jargonFile.txt), and print the result (the line it occurred on and the character position of the occurrence). I will be using a counter to find the line number of the occurrence, but I will implement this later. For now my question is about the problem I am running into: I can't find a way for it to search the entire file.
Sample run:
Enter the search phrase: dog
16
Enter the search phrase: hack
Cannot find this term.
Enter the search phrase:
"dog" is found in the first line; however, it is also found in other lines of the jargonFile (multiple times as a string), but only the first occurrence in the first line is shown. The string hack is found numerous times in the jargonFile, but my code is set up to search only the first line. How may I go about solving this problem?
If this is not clear enough I can post up the assignment if need be.
First you open the file and read it into a string with readline(). Later on you try to readline() from the string you obtained in the first step.
You need to take care what object (thing) you're handling: open() gave you a file "jargon", readline on jargon gave you the string "jargonFile".
So jargonFile.readline does not make sense anymore
Update as answer to comment:
Okay, now that the str error problem is solved think about the program structure:
big loop
    enter a search term
    open file
    inner loop
        read a line
        print result if string found
    close file
You'd need to change your program so it follows that description.
Update II:
SD, if you want to avoid reopening the file you'd still need two loops, but this time one loop reads the file into memory; when that's done, the second loop asks for the search term. So you would structure it like
create empty list
open file
read loop:
    read a line from the file
    append the line to the list
close file
query loop:
    ask the user for input
    for each line in the list:
        print result if string found
For extra points from your professor add some comments to your solution that mention both possible solutions and say why you chose the one you did. Hint: In this case it is a classic tradeoff between execution time (memory is fast) and memory usage (what if your jargon file contains 100 million entries ... ok, you'd use something more complicated than a flat file in that case, but you can't load it in memory either.)
Oh and one more hint to the second solution: Python supports tuples ("a","b","c") and lists ["a","b","c"]. You want to use the latter one, because a list can be modified (a tuple can't.)
myList = ["Hello", "SD"]
myList.append("How are you?")
for line in myList:
    print line
==>
Hello
SD
How are you?
Okay that last example contains all the new stuff (define list, append to list, loop over list) for the second solution of your program. Have fun putting it all together.
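For reference, here is a minimal sketch of the second structure described above (read the file once, then query the list as often as needed). It is written for Python 3, the names load_lines and find_phrase are made up for illustration, and the sample lines stand in for the real jargonFile.txt, so treat it as one possible shape rather than the expected solution.

```python
def load_lines(path):
    """Read the whole file once and return its lines as a list."""
    with open(path) as f:
        return f.read().splitlines()


def find_phrase(lines, phrase):
    """Return (line_number, column) for the first hit on each matching line."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        column = line.find(phrase)
        if column != -1:
            hits.append((line_number, column))
    return hits


# Stand-in for load_lines("jargonFile.txt"); the sample text is made up.
sample_lines = ["the quick brown dog", "a hacker's dog", "nothing here"]
print(find_phrase(sample_lines, "dog"))    # [(1, 16), (2, 11)]
print(find_phrase(sample_lines, "hack"))   # [(2, 2)]
```

An empty result list plays the role of "Cannot find this term.", so the query loop only has to check whether the list is empty.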
Hmm, I don't know anything at all about Python, but it looks to me like you are not iterating through all the lines of the file for the search string entered.
Typically, you need to do something like this:
enter search string
open file
if file has data
    start loop
        get next line of file
        search the line for your string and do something
        Exit loop if line was end of file
So for your code:
jargon = open("jargonFile.txt","r")
searchPhrase = raw_input("Enter the search phrase: ")
while searchPhrase != "":
    <<if file has data?>>
        <<while>>
            result = jargon.readline().find(searchPhrase)
            if result == -1:
                print "Cannot find this term."
            else:
                print result
        <<result is not end of file>>
    searchPhrase = raw_input("Enter the search phrase: ")
jargon.close()
Cool, did a little research on the page DNS provided and Python happens to have the "with" keyword. Example:
with open("hello.txt") as f:
    for line in f:
        print line
So another form of your code could be:
searchPhrase = raw_input("Enter the search phrase: ")
while searchPhrase != "":
    found = False
    with open("jargonFile.txt") as f:
        for line in f:
            result = line.find(searchPhrase)
            if result != -1:
                found = True
                print result
    if not found:
        # report a miss once per phrase, not once per line
        print "Cannot find this term."
    searchPhrase = raw_input("Enter the search phrase: ")
Note that "with" automatically closes the file when you're done.
Your file is jargon, not jargonFile (a string). That's probably what's causing your error message. You'll also need a second loop to read each line of the file from the beginning until you find the word you're looking for. Your code currently stops searching if the word is not found in the current line of the file.
How about trying to write code that only gives the user one chance to enter a string? Input that string, search the file until you find it (or not) and output a result. After you get that working you can go back and add the code that allows multiple searches and ends on an empty string.
Update:
To avoid iterating the file multiple times, you could start your program by slurping the entire file into a list of strings, one line at a time. Look up the readlines method of file objects. You can then search that list for each user input instead of re-reading the file.
You shouldn't try to reinvent the wheel. Just use the re module functions.
Your program could work better if you used:
result = jargon.read()
instead of:
result = jargon.readline()
Then you could use the re.findall() function and join the strings (with the indexes) you searched for with str.join().
This could get a little messy, but if you take some time to work it out, it could fix your problem.
The Python documentation covers all of this well.
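A rough sketch of the re approach, in Python 3. It uses re.finditer rather than re.findall, because finditer yields match objects that carry the position of each hit; that swap, the function name find_all and the sample text are my own choices, not something from the assignment.

```python
import re


def find_all(text, phrase):
    """Return (line_number, column) for every occurrence of phrase in text."""
    pattern = re.compile(re.escape(phrase))   # escape so the phrase is matched literally
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.start()))
    return hits


sample_text = "dog eat dog\nno match here\nhot dog"
print(find_all(sample_text, "dog"))   # [(1, 0), (1, 8), (3, 4)]
```

Here text would be the string returned by jargon.read(). Unlike str.find, this reports every occurrence on a line, not just the first.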
Every time you enter a search phrase, it looks for it on the next line, not the first one. You need to re-open the file for every search phrase if you want it to behave like you describe.
Take a look at the documentation for File objects:
http://docs.python.org/library/stdtypes.html#file-objects
You might be interested in the readlines method. For a simple case where your file is not enormous, you could use that to read all the lines into a list. Then, whenever you get a new search string, you can run through the whole list to see whether it's there.
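To see why each new phrase continues where the previous read stopped, here is a small Python 3 sketch. io.StringIO stands in for the real file and its contents are made up. As an alternative to re-opening, a file object can also be rewound with seek(0); that is not what the answers above use, just another option.

```python
import io

# io.StringIO stands in for open("jargonFile.txt"); the contents are made up.
jargon = io.StringIO("first dog\nsecond hack\n")

first_pass = [line for line in jargon]    # consumes the file up to EOF
second_pass = [line for line in jargon]   # nothing left: the position is still at EOF

jargon.seek(0)                            # rewind to the beginning
third_pass = [line for line in jargon]

print(len(first_pass), len(second_pass), len(third_pass))   # 2 0 2
```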
I am very new to Python and am looking for assistance to where I am going wrong with an assignment. I have attempted different ways to approach the problem but keep getting stuck at the same point(s):
Problem 1: When I am trying to create a list of words from a file, I keep making a list for the words per line rather than the entire file
Problem 2: When I try and combine the lists I keep receiving "None" for my result or Nonetype errors [which I think means I have added the None's together(?)].
The assignment is:
#8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order. You can download the sample data at http://www.py4e.com/code3/romeo.txt
My current code which is giving me a Nonetype error is:
poem = input("enter file:")
play = open(poem)
lst= list()
for line in play:
    line=line.rstrip()
    word=line.split()
    if not word in lst:
        lst= lst.append(word)
print(lst.sort())
If someone could just talk me through where I am going wrong that will be greatly appreciated!
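Your problem was lst = lst.append(word): append modifies the list in place and returns None, so lst becomes None.
with open(poem) as f:
    lines = f.read().split('\n')  # you can also use readlines()
lst = []
for line in lines:
    words = line.split()
    for word in words:
        if word not in lst:  # skip words that are already in the list
            lst.append(word)
lst.sort()  # sort() also works in place and returns None, so print lst afterwards
print(lst)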
Problem 1: When I am trying to create a list of words from a file, I keep making a list for the words per line rather than the entire file
You are doing play = open(poem) and then for line in play:, which processes the file line by line. If you want to process the whole content at once, do:
play = open(poem)
content = play.read()
words = content.split()
Always remember to close the file after you have used it, i.e. do
play.close()
unless you use a context manager (i.e. with open(poem) as f:)
Just to help you get into Python a little more:
You can:
1. Read the whole file at once (if you have enough RAM for it; if the file is too big, read it in reasonably sized chunks, one after another)
2. Split the data you read into words and
3. Use set() or dict() to remove duplicates
Along the way, don't forget to pay attention to upper and lower case if you want distinct words rather than just distinct strings.
This will work in Py2 and Py3 as long as you do something about input() function in Py2 or use quotes when entering the path, so:
filename = input("Filename: ")
f = open(filename)
c = f.read()
f.close()
words = set(x.lower() for x in c.split()) # This will make a set of lower case words, with no repetitions
# This is equivalent to:
#words = set()
#for x in c.split():
#    words.add(x.lower())
# set() is an unordered datatype ignoring duplicate items
# and it mimics mathematical sets and their properties (unions, intersections, ...)
# it is also fast as hell.
# Checking a list() each time for existence of the word is slow as hell
#----
# OK, you need a sorted list, so:
words = sorted(words)
# Or step-by-step:
#words = list(words)
#words.sort()
# Now words is your list
As for your errors, do not worry, they are common at the beginning in almost any object-oriented language.
Others explained them well in their comments. But so that this answer is not lacking:
Always pay attention to which functions or methods operate on the object in place (list.sort(), list.append(), list.insert(), set.add()...) and which ones return a new object (sorted(), str.lower()...).
If you run into a similar situation again, use help() in the interactive shell to see exactly what a function does.
>>> help(list.append)
>>> help(list.sort)
>>> help(str.lower)
>>> # Or any short documentation you need
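A small Python 3 illustration of the in-place versus new-object distinction described above; the word list is made up.

```python
words = ["soft", "but", "what"]

result = words.append("light")   # append changes the list in place...
print(result)                    # None  (...and returns None)
print(words)                     # ['soft', 'but', 'what', 'light']

in_order = sorted(words)         # sorted() leaves words alone and returns a new list
print(in_order)                  # ['but', 'light', 'soft', 'what']

print(words.sort())              # None: sort() also works in place
print(words == in_order)         # True
```

This is why lst = lst.append(word) and print(lst.sort()) both end up with None.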
Python, especially Python 3.x, is strict about operations between different types, but some combinations have a different meaning and actually work while doing something you might not expect.
E.g. you can do:
print(40*"x")
It will print out 40 'x' characters, because it will create a string of 40 characters.
But:
print([1, 2, 3]+None)
will, logically, not work, which is what is happening somewhere in the rest of your code.
In some languages like javascript (terrible stuff) this will work perfectly well:
v = "abc "+123+" def";
This inserts the 123 seamlessly into the string, which is useful, but from another angle a programming nightmare and nonsense.
Also, the Py2 assumption that you can mix unicode and byte strings and have them converted automatically no longer holds in Py3.
I.e. this is a TypeError:
print(b"abc"+"def")
because b"abc" is bytes() and "def" (or u"def") is str() in Py3 (what was unicode() in Py2).
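If you do need to combine the two in Python 3, convert one side explicitly. A minimal sketch, assuming the data is plain ASCII (use whatever encoding your data really has):

```python
raw = b"abc"
text = "def"

# Convert explicitly so both operands have the same type.
as_text = raw.decode("ascii") + text      # str + str
as_bytes = raw + text.encode("ascii")     # bytes + bytes

print(as_text)    # abcdef
print(as_bytes)   # b'abcdef'
```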
Enjoy Python, it is the best!
In Codio, there is a challenge to take a file, search for the number of times a string appears in it, then print that number. I was able to get the result using some suggestions, but I am still unclear on a few things.
The main question is, at what point is the loop searching for the substring S? The count() syntax that I see everywhere involves using the string to be searched, followed by the dot operator, and then the function with the substring we want to find as the parameter. It would look something like: P.count(S)
What confuses me is that the function is using line in place of P. So does this mean the function is searching line for the substring? And if so, how does that work if line is simply the counter variable for the for loop? I just want to have a clearer understanding of how this function is working in this context to get me the correct count of times that substring S appears in file P.
import sys
P= sys.argv[1]
S= sys.argv[2]
# Your code goes here
f = open(P, 'r')
c = 0
for line in f.readlines():
    c += line.count(S)
print(c)
does this mean the function is searching "line" for the substring
Yes, that's exactly what it means. And the value of line changes in every loop iteration.
And if so, how does that work if "line" is simply the counter variable for the "for" loop
It's not. Python for loops don't have counters. line is the actual line of text.
for letter in ['A', 'B', 'C']:
    print(letter)
prints
A
B
C
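If you ever do want a counter alongside each item, enumerate supplies one. A short sketch with made-up names:

```python
letters = ['A', 'B', 'C']
numbered = []
for position, letter in enumerate(letters, start=1):
    # position is the counter, letter is the actual item
    numbered.append((position, letter))
print(numbered)   # [(1, 'A'), (2, 'B'), (3, 'C')]
```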
Let's dissect the loop:
for line in f.readlines():
    c += line.count(S)
f is the file object for your open file.
readlines is a method that reads the rest of the file and returns it as a list of strings, each of which is one line of the file (including its trailing newline).
Thus, the statement for line in f.readlines(): iterates the variable line through the file contents; on each loop iteration, line will be the appropriate string value, the next line of the file.
Therefore, line.count(S) returns the number of times the target string S appears in that line of the file. The increment c += adds that to your counter.
Does that make things clear enough?
BTW, please learn to use descriptive variable names. One-letter names with mixed upper- and lower-case are a bad habit in the long run.
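Here is the same loop again with descriptive names, as a Python 3 sketch. io.StringIO stands in for the opened file, and the function name and sample text are made up for illustration.

```python
import io


def count_occurrences(lines, substring):
    """Add up how often substring appears in each line."""
    total = 0
    for line in lines:                  # line is the text of one line, not a counter
        total += line.count(substring)  # non-overlapping occurrences in this line
    return total


sample_file = io.StringIO("abc abc\nxyz\nabcabc\n")
print(count_occurrences(sample_file, "abc"))   # 4
```

Note that str.count counts non-overlapping occurrences, so "aaa".count("aa") is 1, and a match that would span a line break is not counted by a line-by-line loop.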
I have a file which currently stores the string eeb39d3e-dd4f-11e8-acf7-a6389e8e7978, which I am trying to pass as a variable into my subprocess command.
My current code looks like this
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    print(test)
But this prints ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n'] and I don't want the part with ['\n'] to be passed into my command, so I'm trying to remove them by using replace.
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    removestrings = test.replace('[', '').replace('[', '').replace('\\', '').replace("'", '').replace('n', '')
    print(removestrings)
I get the exception below. How can I replace these with nothing and store the result as a string for my subprocess command?
'list' object has no attribute 'replace'
so how can I replace these with nothing and store them as a string for my subprocess command?
readlines() returns a list. Try print(test[0].strip())
You can read the whole file and split lines using str.splitlines:
test = t.read().splitlines()
Your test variable is a list, because readlines() returns a list of all lines read.
Since you said the file only contains this one line, you probably want to work on only the first line that you read. The brackets and quotes are only part of how print displays the list, not part of the string itself, so the trailing newline is the only thing left to remove:
removestrings = test[0].replace('\n', '')
Where you went wrong...
file.readlines() in Python returns a list of the lines in the file (what other languages would call an array). Here you are treating the list as a string. You must first select the string inside it, then apply the string-only method.
In this case, however, that alone would not work, because you are trying to change the way the Python interpreter displays the list for you to read: the brackets and quotes are not part of the string.
Further information...
In memory the list is not stored as the text you see printed; print just shows a readable representation of it. The example below works for any number of lines (but it only prints the first element); you will need to change that to suit your needs.
this may be useful...
You could perhaps make the variables globally available, so that other parts of the program can read them before they go out of scope.
more useless stuff
Scope means the region of the program in which the interpreter treats a variable as usable; once a variable goes out of scope, it can be removed from memory. In much larger programs it also means you only have to worry about variables locally: for example, i is used in a lot of for loops, and without scope every variable in the whole project would need a different name. An easy way to picture this is to think of scopes as separate branches of a tree: the tips of different branches don't touch, and neither do their variables.
solution?
e.g:
with open(logfilnavn, 'r') as file:
    lines = file.readlines() # creates a list
# a list comprehension that goes through each item and takes off the last character: \n - the newline character
# this works with any number of lines, as long as every line ends with a newline
strippedLines = [line[:-1] for line in lines]
# or
strippedLines = [line.replace('\n', '') for line in lines]
# you can now print the string stored within the list
print(strippedLines[0]) # this prints the first element in the list
I hope this helped!
You get the error because readlines returns a list object. Since you mentioned in the comment that there is just one line in the file, it's better to use readline() instead:
line = "" # default value in case the file cannot be read
with open(logfilnavn, 'r') as t:
    line = t.readline().rstrip('\n') # readline() keeps the trailing newline, so strip it
print(line)
# output:
eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
readlines will return a list of lines, and you can't use replace with a list.
If you really want to use readlines, you should know that it doesn't remove the newline character from the end, you'll have to do it yourself.
lines = [line.rstrip('\n') for line in t.readlines()]
But even after removing the newline character from the end of each line, you'll still have a list of lines. From the question it looks like you only have one line, so you can just access the first one with lines[0].
Or you can leave out readlines and just use read, which reads all of the contents of the file, and then call rstrip on the result.
contents = t.read().rstrip('\n')
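Putting it together, here is a Python 3 sketch (3.7 or later for capture_output) of reading the value and handing it to a subprocess as one clean argument. The temporary file and the child command are made up so the example runs on its own; in the question the file already exists and the command would be your own.

```python
import os
import subprocess
import sys
import tempfile

# Write a throwaway file so the sketch runs on its own; in the question
# this would be the existing file named by logfilnavn.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n")
    logfilnavn = tmp.name

with open(logfilnavn) as t:
    token = t.read().strip()      # a plain str: no brackets, quotes or newline

# Hypothetical command: a child Python process that echoes its argument.
command = [sys.executable, "-c", "import sys; print(sys.argv[1])", token]
completed = subprocess.run(command, capture_output=True, text=True)
print(completed.stdout.strip())

os.remove(logfilnavn)             # clean up the throwaway file
```

Passing the command as a list means the value arrives as a single argument without any shell quoting.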
I'm writing a script that logs errors from another program and restarts the program where it left off when it encounters an error. For whatever reasons, the developers of this program didn't feel it necessary to put this functionality into their program by default.
Anyways, the program takes an input file, parses it, and creates an output file. The input file is in a specific format:
UI - 26474845
TI - the title (can be any number of lines)
AB - the abstract (can also be any number of lines)
When the program throws an error, it gives you the reference information you need to track the error - namely, the UI, which section (title or abstract), and the line number relative to the beginning of the title or abstract. I want to log the offending sentences from the input file with a function that takes the reference number and the file, finds the sentence, and logs it. The best way I could think of doing it involves moving forward through the file a specific number of times (namely, n times, where n is the line number relative to the beginning of the section). The way that seemed to make sense to do this is:
i = 1
while i <= lineNumber:
    print original.readline()
    i += 1
I don't see how this would make me lose data, but Python thinks it would, and says ValueError: Mixing iteration and read methods would lose data. Does anyone know how to do this properly?
You get the ValueError because your code probably has for line in original: in addition to original.readline(). An easy solution which fixes the problem without making your program slower or consume more memory is changing
for line in original:
    ...
to
while True:
    line = original.readline()
    if not line: break
    ...
Use for and enumerate.
Example:
for line_num, line in enumerate(file):
    if line_num < cut_off:
        print line
NOTE: This assumes you are already cleaning up your file handles, etc.
Also, the takewhile function could prove useful if you prefer a more functional flavor.
Assuming you need only one line, this could be of help
import itertools
def getline(fobj, line_no):
    "Return a (1-based) line from a file object"
    return next(itertools.islice(fobj, line_no-1, line_no)) # 1-based!
>>> getline(open("/etc/passwd", "r"), 4)
'adm:x:3:4:adm:/var/adm:/bin/false\n'
You might want to catch StopIteration errors (if the file has fewer lines).
Here's a version without the ugly while True pattern and without other modules:
for line in iter(original.readline, ''):
    if …: # to the beginning of the title or abstract
        for i in range(lineNumber):
            print original.readline(),
        break
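Another way to avoid mixing iteration with readline() is to pull every line from one shared iterator. Below is a Python 3 sketch of the whole lookup. It assumes that each record starts with a line beginning with UI, that section lines begin with TI or AB, that continuation lines are indented, and that line 1 is the section's own first line; the function name find_line and the sample data are made up. It does not stop at the end of the section, so a line number that is too large will run on into whatever follows.

```python
import io
from itertools import chain, islice


def find_line(fobj, ui, section, line_number):
    """Return line `line_number` (1-based) of `section` in record `ui`, or None."""
    lines = iter(fobj)  # one shared iterator; readline() is never called

    # Step 1: advance to the record with the wanted UI.
    for line in lines:
        if line.startswith("UI") and line.partition("-")[2].strip() == ui:
            break
    else:
        return None  # no such record

    # Step 2: advance to the wanted section inside that record.
    for line in lines:
        if line.startswith("UI"):
            return None  # reached the next record first
        if line.startswith(section):
            # Step 3: count forward, starting from the section's first line.
            rest = chain([line], lines)
            return next(islice(rest, line_number - 1, None), None)
    return None


sample = io.StringIO(
    "UI - 1\n"
    "TI - first title\n"
    "AB - first abstract\n"
    "UI - 26474845\n"
    "TI - the title\n"
    "     continues here\n"
    "AB - the abstract\n"
    "     more abstract\n"
)
print(repr(find_line(sample, "26474845", "AB", 2)))   # '     more abstract\n'
```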
I am searching for particular lines in a file.
If a required line is not present, I want to print that the line is missing from the file.
For example my file contains below lines:
list 0
list 7
list 2
list 5
Here is the code I have written so far:
fo=open(filename,"r")
for i in range(0,6):
    str="list"+str(i)
    for line in fo.readlines():
        if not str in line:
            print "%s%s" %(str,"is missing in file"
Can anyone please help me?
The first problem is that list+str(i) isn't going to work, unless you happen to have defined list = 'list ' somewhere earlier.
The second problem is that by naming your variable str, you're hiding the function str, which means you can't call that function the next time through the loop.
The third problem is that you only open the file once, but you call readlines() on it 6 times (once per pass through the outer loop). After the first time, there are no more lines to read, so you'll get an empty list back. Just call it once, outside the loop, and store the value: lines = fo.readlines(). Or, alternatively, reopen the file each time through the loop, instead of just once.
The fourth problem is that you're going to print the output once for every line that doesn't match, instead of just once if no line matches. This one is the only part that's tricky, so I'll come back to it.
The fifth problem is that your print statement is missing a ).
Finally, you've tagged your question with both python-2.7 and python-3.x. I'll assume you weren't just trying to throw on every tag in the world in hopes that would get more viewers, and actually want your code to run under both 2.7 and 3.3. In that case, you can't use print as a statement; you have to use it as a function.
So, how do you say "if the string is not in any of the lines"? The easy way is with the any function:
if not any(s in line for line in lines):
If you can't understand that, you can get the same effect by writing out the loop explicitly, something like this:
found = False
for line in lines:
    if s in line:
        found = True
        break
if not found:
There are a lot of other problems that you should fix (e.g., close the file—ideally by using a with statement; avoid readlines when possible; etc.), but no more that you need to fix. So, here's a minimally-edited working version:
fo=open(filename,"r")
lines = fo.readlines()
for i in range(0,6):
    s='list '+str(i)
    if not any(s in line for line in lines):
        print("%s %s" %(s,"is missing in file"))
You should never name a variable str as str is a builtin type in Python. Also, you were attempting to concatenate list, a function and a type, and a string. Instead you want to concatenate the string 'list ' with the string representation of the numbers. You don't need to call readlines() where you did. Instead, read the file into a list early and iterate over that list.
fo=open(filename,"r").readlines()
for i in range(0,6):
    s = 'list ' + str(i)
    foundline = False
    for line in fo:
        if s in line:
            foundline = True
            break
    if not foundline:
        print "%s %s" % (s,"is missing in file")
with open(filename,'r') as infile:
    inlines=infile.readlines()
for term in xrange(0,10):
    line='list %d\n' % term
    if line not in inlines:
        print line.rstrip(),'is not in the file'
# NB: this matches the entire search string against the whole input line. I assume you mean this rather than looking for the search term inside the input line, because searching for 'list 1' inside the line would match 'list 1', 'list 10', 'list 1000', etc.
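The whole-line comparison can also be sketched in Python 3 with a set, which additionally copes with a last line that has no trailing newline. The function name missing_entries and the sample lines (including the extra 'list 10') are made up for illustration.

```python
def missing_entries(lines, expected):
    """Return the expected entries that do not appear as a whole line."""
    present = {line.strip() for line in lines}   # strip newlines and stray spaces
    return [entry for entry in expected if entry not in present]


# Stand-in for infile.readlines(); note the last line has no newline.
sample_lines = ["list 0\n", "list 7\n", "list 2\n", "list 5\n", "list 10"]
expected = ["list %d" % i for i in range(6)]
print(missing_entries(sample_lines, expected))   # ['list 1', 'list 3', 'list 4']
```

A substring test such as 'list 1' in line would have treated 'list 10' as a match and wrongly reported 'list 1' as present.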