reading lines of a file between two words - python

I have a file containing numbers and 2 words : "start" and "middle"
I want to read numbers from "start" to "middle" in one array and numbers from "middle" to end of the file into another array.
this is my python code:
with open("../MyList","r") as f:
    for x in f.readlines():
        if x == "start\n":
            continue
        if x == "middle\n":
            break
        x = x.split("\n")[0]
        list_1.append(int(x))
    print list_1
    for x in f.readlines():
        if x == "middle\n":
            continue
        list_2.append(int(x))
    print list_2
but the problem is that my program never enters second loop and jumps to
print list_2
I searched in older questions but can not figure out the problem.

That's because f.readlines() in the first loop reads the whole file. When the second loop starts, the file pointer is already at the end of the file, so f.readlines() returns an empty list.
You can fix that either by reopening the file or by moving the file pointer back to the beginning with f.seek(0) before the second for loop:
with open("../MyList","r") as f:
    for x in f.readlines():
        # process your stuff for 1st loop
        pass
    # reset file pointer to beginning of file again
    f.seek(0)
    for x in f.readlines():
        # process your stuff for 2nd loop
        pass
Reading the whole file into memory is inefficient if you are processing a large file. You can iterate over the file object instead of reading everything at once, as in the code below:
list1 = []
list2 = []
list1_start = False
list2_start = False
with open("../MyList","r") as f:
    for x in f:
        if x.strip() == 'start':
            list1_start = True
            continue
        elif x.strip() == 'middle':
            list2_start = True
            list1_start = False
            continue
        if list1_start:
            list1.append(x.strip())
        elif list2_start:
            list2.append(x.strip())
print(list1)
print(list2)

Your first loop is reading the entire file to the end, but processes only half of it. When the second loop hits, the file pointer is already at the end, so no new lines are read.
From the python docs:
file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines
thus read. If the optional sizehint argument is present, instead of
reading up to EOF, whole lines totalling approximately sizehint bytes
(possibly after rounding up to an internal buffer size) are read.
Objects implementing a file-like interface may choose to ignore
sizehint if it cannot be implemented, or cannot be implemented
efficiently.
Either process everything in one loop, or read line-by-line (using readline instead of readlines).
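One way to handle everything in a single pass is to keep iterating the same file object: a file object is its own iterator, so a second for loop picks up where the first one stopped. The following is only a sketch under the file layout from the question; the helper name split_at_markers and the in-memory sample are made up for illustration.

```python
import io

def split_at_markers(lines):
    """Return (numbers between 'start' and 'middle', numbers after 'middle')."""
    it = iter(lines)          # a file object is already its own iterator
    first, second = [], []
    for line in it:           # skip everything up to and including 'start'
        if line.strip() == "start":
            break
    for line in it:           # resumes right after 'start'
        if line.strip() == "middle":
            break
        first.append(int(line))
    for line in it:           # resumes right after 'middle'
        if line.strip():      # ignore blank lines
            second.append(int(line))
    return first, second

# Hypothetical sample data standing in for the real file
sample = io.StringIO("start\n5\n6\n7\nmiddle\n9\n10\n")
list_1, list_2 = split_at_markers(sample)
print(list_1, list_2)
```

With a real file, passing the object returned by open() in place of the StringIO should behave the same way, since nothing is ever read twice.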

You can read the whole file into a list once and slice it afterwards.
If possible, you can try this:
with open("sample.txt","r") as f:
    list_1 = []
    list_2 = []
    fulllist = []
    for x in f.readlines():
        x = x.split("\n")[0]
        fulllist.append(x)
    print fulllist
    start_position = fulllist.index('start')
    middle_position = fulllist.index('middle')
    # assumes the file has an 'end' line; index() raises ValueError otherwise
    end_position = fulllist.index('end')
    list_1 = fulllist[start_position+1 :middle_position]
    list_2 = fulllist[middle_position+1 :end_position]
    print "list1 : ",list_1
    print "list2 : ",list_2

Discussion
Your problem is that you read the whole file at once, and when you
start the 2nd loop there's nothing left to read.
A possible solution involves reading the file line by line, tracking
the start and middle keywords and updating one of two lists
accordingly.
This implies that your script, during the loop, has to maintain info about
its current state, and for this purpose we are going to use a
variable, code, that's either 0, 1 or 2, meaning no action,
append to list no. 1 or append to list no. 2. Because in the beginning
we don't want to do anything, its initial value must be 0
code = 0
If we want to access one of the two lists using the value of code as
a switch, we could write a test or, in place of a test, we can use a
list of lists, lists, containing a dummy list and two lists that are
updated with valid numbers. Initially all these inner lists are equal
to the empty list []
l1, l2 = [], []
lists = [[], l1, l2]
so that later we can do as follows
lists[code].append(number)
With these premises, it's easy to write the body of the loop on the
file lines,
read a number
if it's not a number, look if it is a keyword
if it is a keyword, change state
in any case, no further processing
if we have to append, append to the correct list
try:
    n = int(line)
except ValueError:
    if line == 'start\n' : code=1
    if line == 'middle\n': code=2
    continue
if code: lists[code].append(n)
We have just to add a bit of boilerplate, opening the file and
looping, that's all.
Below you can see my test data, the complete source code with all the
details and a test execution of the script.
Demo
$ cat start_middle.dat
1
2
3
start
5
6
7
middle
9
10
$ cat start_middle.py
l1, l2 = [], []
code, lists = 0, [[], l1, l2]
with open('start_middle.dat') as infile:
    for line in infile.readlines():
        try:
            n = int(line)
        except ValueError:
            if line == 'start\n' : code=1
            if line == 'middle\n': code=2
            continue
        if code: lists[code].append(n)
print(l1)
print(l2)
$ python start_middle.py
[5, 6, 7]
[9, 10]
$

Related

Python pop and append are not moving all elems in list1 to list 2

Why do pop and append not finish out the entire loop? My first guess was that pop didn't readjust the index of the original list, but that doesn't appear to be true when I print(txt[0]) to confirm it's still at the front. I'm trying to figure out why the below does not work. Thank you.
txt = 'shOrtCAKE'
txt = list(txt)
new_list = []
for x in txt:
    value = txt.pop(0)
    new_list.append(value)
print(new_list)
print(txt)
print(txt[0])
You shouldn't modify the list while iterating over it. Instead, use this code:
for x in txt:
    value = x
    new_list.append(value)
txt = [] # assuming you want txt to be empty for some reason
But then printing txt[0] will raise an IndexError, because the list is empty.
However, you don't really need a loop. Just do the following:
new_list = txt[:] # [:] makes a copy, so later changes to txt won't be reflected in new_list
You should not remove elements from a list that you are iterating over. In this case, you are not even using the values of the list obtained during iteration.
There are various possibilities if you still want to use pop, which don't involve iterating over txt. For example:
Loop a fixed number of times (len(txt) computed at the start):
for _ in range(len(txt)):
    new_list.append(txt.pop(0))
Loop while txt is not empty:
while txt:
    new_list.append(txt.pop(0))
Loop until pop fails:
while True:
    try:
        new_list.append(txt.pop(0))
    except IndexError:
        break
Of course, you don't have to use pop. You could do this for example:
new_list.extend(txt) # add all the elements of the old list
txt.clear() # and then empty the old list
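To see why the original loop stops early, it may help to count the passes. A list iterator walks an internal index, and each pop(0) shortens the list, so the index catches up with the shrinking length about halfway through. The sketch below reruns the question's loop with a counter added (the steps variable is only there for illustration).

```python
txt = list("shOrtCAKE")
new_list = []
steps = 0
for _ in txt:                      # the iterator tracks an index into txt
    new_list.append(txt.pop(0))    # txt shrinks by one on every pass
    steps += 1

print(steps)      # 5
print(new_list)   # ['s', 'h', 'O', 'r', 't']
print(txt)        # ['C', 'A', 'K', 'E']
```

After five passes the iterator's index is 5 while only 4 items remain, so the loop ends even though txt is not empty.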

I'm looking for a string in a file, seems to not be working

My function first calculates all possible anagrams of the given word. Then, for each of these anagrams, it checks whether it is a valid word by checking whether it equals any of the words in the wordlist.txt file. The file is a giant file with a bunch of words, one per line. So I decided to just read each line and check if each anagram is there. However, it comes up blank. Here is my code:
def perm1(lst):
    if len(lst) == 0:
        return []
    elif len(lst) == 1:
        return [lst]
    else:
        l = []
        for i in range(len(lst)):
            x = lst[i]
            xs = lst[:i] + lst[i+1:]
            for p in perm1(xs):
                l.append([x] + p)
        return l
def jumbo_solve(string):
    '''jumbo_solve(string) -> list
    returns list of valid words that are anagrams of string'''
    passer = list(string)
    allAnagrams = []
    validWords = []
    for x in perm1(passer):
        allAnagrams.append((''.join(x)))
    for x in allAnagrams:
        if x in open("C:\\Users\\Chris\\Python\\wordlist.txt"):
            validWords.append(x)
    return(validWords)
print(jumbo_solve("rarom"))
I have put in many print statements to debug, and the list "allAnagrams" is built correctly. For example, with the input "rarom", one valid anagram is the word "armor", which is contained in the wordlist.txt file. However, when I run it, it does not detect it for some reason. Thanks again, I'm still a little new to Python so all the help is appreciated, thanks!
You missed a tiny but important aspect of:
word in open("C:\\Users\\Chris\\Python\\wordlist.txt")
This will search the file line by line, as if open(...).readlines() was used, and attempt to match the entire line, with '\n' in the end. Really, anything that demands iterating over open(...) works like readlines().
You would need
x+'\n' in open("C:\\Users\\Chris\\Python\\wordlist.txt")
if the file is a list of words on separate lines to make this work to fix what you have, but it's inefficient to do this on every function call. Better to do once:
wordlist = open("C:\\Users\\Chris\\Python\\wordlist.txt").read().split('\n')
This will create a list of words if the file is a '\n' separated word list. Note you can use readlines() instead of read().split('\n'), but this will keep the \n on every word, like you have, and you would need to include that in your search as I show above. Now you can use the list as a global variable or as a function argument.
if x in wordlist: stuff
Note Graphier raised an important suggestion in the comments. A set:
wordlist = set(open("C:\\Users\\Chris\\Python\\wordlist.txt").read().split('\n'))
Is better suited for a word lookup than a list, since it's O(word length).
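The trailing-newline issue is easy to reproduce without the real word list. The sketch below uses io.StringIO as a stand-in for the file (its contents are made up for illustration) and shows why the bare word is not found, while a set of stripped lines works.

```python
import io

# Stand-in for the word list file; the contents are made up for illustration.
fake_file = io.StringIO("armor\nroam\nzebra\n")
lines = list(fake_file)                 # the same items a for loop over the file sees
print(lines)                            # ['armor\n', 'roam\n', 'zebra\n']

print("armor" in lines)                 # False: each line still ends with '\n'
print("armor\n" in lines)               # True

words = {line.strip() for line in lines}
print("armor" in words)                 # True
```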
You have used the following code in the wrong way:
if x in open("C:\\Users\\Chris\\Python\\wordlist.txt"):
Instead, try the following code, it should solve your problem:
with open("words.txt", "r") as file:
    lines = file.read().splitlines()
    for line in lines:
        # do something here
        pass
So, putting all advice together, your code could be as simple as:
from itertools import permutations
def get_valid_words(file_name):
    with open(file_name) as f:
        return set(line.strip() for line in f)
def jumbo_solve(s, valid_words=None):
    """jumbo_solve(s: str) -> list
    returns list of valid words that are anagrams of `s`"""
    if valid_words is None:
        valid_words = get_valid_words("C:\\Users\\Chris\\Python\\wordlist.txt")
    # permutations() yields tuples of characters, so join each one into a string;
    # the set drops duplicates caused by repeated letters
    candidates = {"".join(p) for p in permutations(s)}
    return sorted(word for word in candidates if word in valid_words)
if __name__ == "__main__":
    print(jumbo_solve("rarom"))

python - if statement and list index outside of range

I have a csv file with a 1st part made up of 3 entries and 2nd part made up of 2 entries. I want to isolate the data made up of 3 entries.
I wrote the following:
filename=open("datafile.txt","r")
data_3_entries_list=[]
for line in filename:
    fields=line.split(",")
    if fields[2] == 0:
        break
    else:
        data_3_entries_list.extend(line)
i get the error message:
if fields[2] == 0 :
IndexError: list index out of range
print(data_3_entries_list)
I also tried with if fields[2] is None but I get the same error.
I don't understand why I'm getting this error.
Use len() or str.count():
for line in filename:
    fields=line.split(",")
    if len(fields) == 3:
        data_3_entries_list.append(line)
or
for line in filename:
    if line.count(",") == 2:
        data_3_entries_list.append(line)
There is no implicit value for non-existent elements of a list; if fields only has 2 items, then fields[2] simply does not exist and will produce an error.
Check the length of the list explicitly:
if len(fields) != 3:
    break
data_3_entries_list.append(line) # You may want append here, not extend
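Here is a small self-contained sketch of the length check, run on made-up rows instead of the real file (the rows list and variable names are assumptions for illustration). It stops at the first row that does not have exactly three fields, which is where fields[2] would otherwise raise IndexError.

```python
# Made-up sample: two 3-field rows followed by 2-field rows
rows = ["1,2,3\n", "4,5,6\n", "7,8\n", "9,10\n"]

three_field_rows = []
for row in rows:
    fields = row.strip().split(",")
    if len(fields) != 3:     # fields[2] would raise IndexError for this row
        break
    three_field_rows.append(fields)

print(three_field_rows)      # [['1', '2', '3'], ['4', '5', '6']]
```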
If I have a csv file like this one, then this works without an error:
with open('datafile.csv', 'w') as p:
    p.write("Welcome,to,Python")
    p.write("Interpreted,Programming,Language")
filename=open("datafile.csv","r")
data_3_entries_list=[]
for line in filename:
    fields=line.split(",")
    if fields[2] == 0:
        break
    else:
        data_3_entries_list.extend(line)
Otherwise you'd have to at least show us how your csv file is formatted.
You're getting the error because you're trying to get the value of an index that may not exist. There's a better way to do this, though.
Instead of testing the length outright, you can also use a slice to see if the index exists. Slicing a list with fewer than 3 items from index 2 simply returns an empty list.
>>> mylist = list(range(3)) #this list is [0, 1, 2]
>>> mylist[2:]
[2]
>>> mylist = list(range(2)) #however THIS list is [0, 1]
>>> mylist[2:]
[]
In this way you can use the pythonic if a_list: do_work() to test if you even want to work with the data. This syntax is generally preferred to if len(a_list) == value: do_work().
filename=open("datafile.txt","r")
data_3_entries_list=[]
for line in filename:
    fields=line.split(",")
    if fields[2:]: ##use a slice to verify that a third field exists
        data_3_entries_list.append(line)

while loop inside other while loop, infinite loop

I want to parse a huge file consisting of thousands of blocks, each of which contains several sub-blocks. To keep it simple, consider an input file containing the lines below:
a
2
3
4
b
3
9
2
c
7
Each is on a separate line, where letters define each block and numbers are properties of the block.
I want the output as a dictionary with the block name as key and a list of only the properties 2 and 3 (if present), like this:
{a:[2,3],b:[3,2],c:[]}
I think the best way is to use two while loops to read and search the lines, like this:
dict={}
with open('sample_basic.txt','r') as file:
    line=file.readline()
    line=line.strip()
    while line:
        if line.isalpha():
            block_name=line
            line=file.readline()
            line=line.strip()
            list=[]
            while line:
                lev_1=line
                if lev_1 in ['2','3']:
                    list.append(lev_1)
                    line=file.readline()
                    line=line.strip()
                if lev_1.isalpha():
                    dict[block_name]=list
                    break
        else:
            line=file.readline()
            line=line.strip()
But it just goes into an infinite loop when executed.
I have looked for the error but I can't find where it is.
I would appreciate it if anyone could give me a hint about it.
I did not check your code too closely, so I can not help you with the infinite loop, but I wrote new code without nested loops:
import collections
d = collections.defaultdict(list)
with open('sample_basic.txt') as f:
    for line in f:
        line = line.strip()
        if line.isalpha():
            blockname=line
        else:
            if line in ('2', '3'):
                d[blockname].append(int(line))
The output using a file with the content you write is {'b': [3, 2], 'a': [2, 3]}.
If you want the empty list with the key c included in your dictionary do
d={}
with open('sample_basic.txt') as f:
    for line in f:
        line = line.strip()
        if line.isalpha():
            blockname=line
            d[blockname] = []
        else:
            if line in ('2', '3'):
                d[blockname].append(int(line))

GoTo(basic) program

I'm doing some tutorials online, and I'm stuck with an exercise :Write a function getBASIC() which takes no arguments, and does the following: it should keep reading lines from input using a while loop; when it reaches the end it should return the whole program in the form of a list of strings. Example of list of strings:
5 GOTO 30
10 GOTO 20
20 GOTO 10
30 GOTO 40
40 END
I wrote a program, but it doesn't work, however I will post it too:
def getBASIC():
    L=[]
    while "END" not in L:
        L.append(str(input()))
        if str(input()).endswith("END"):
            break
    return L
Also, please note that I'm not allowed to use IS or RECURSION.
There are several errors:
you call input() twice without appending it to the list the second time
'END' in L determines whether there is 'END' (whole) line in the list L (there isn't)
Note: input() already returns a str object; you don't need to call str() on its returned value.
To read input until you've got empty line you could:
def getBASIC():
    return list(iter(input, ''))
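The two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel. The sketch below illustrates this without a keyboard by replacing input with a stand-in that hands out prepared lines; read_until and prepared are hypothetical names used only for this example.

```python
def read_until(read_line, sentinel=""):
    """Collect values from read_line() until it returns the sentinel."""
    return list(iter(read_line, sentinel))

# Stand-in for input(): hands out prepared lines one at a time.
prepared = iter(["5 GOTO 30", "10 GOTO 20", "40 END", "", "never read"])
program = read_until(lambda: next(prepared))
print(program)   # ['5 GOTO 30', '10 GOTO 20', '40 END']
```

The empty string ends the collection and is not included in the result, and anything after it is never requested.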
Or to read until END is encountered at the end of line:
def getBASIC():
    L = []
    while True:
        line = input()
        L.append(line)
        if line.endswith("END"):
            break #NOTE: it doesn't break even if `line` is empty
    return L
Back when I was learning Pascal, we used a priming read for loops that needed at least one iteration. This still works well in Python (I prefer it to a while True / break loop).
By simply testing the last line in the list (rather than the last line read) we eliminate the need for a variable to store the input and can combine the reading and appending operations.
def getBASIC():
    lines = [input("]")] # use Applesoft BASIC prompt :-)
    while not lines[-1].upper().rstrip().endswith("END"):
        lines.append(input("]"))
    return lines
Try this one:
def get_basic():
    L = []
    while True:
        line = str( input() )
        L.append( line )
        if "END" in line:
            break
    return L
If you are on Python 2, use raw_input() instead of input(). In Python 2, input() takes a string from standard input and tries to evaluate it as Python source code, while raw_input() returns the string as expected.
You call input() twice inside the loop, which means every iteration waits for two lines of input and the second one is never appended. The while condition only stops the loop when some line in L is exactly "END", so it is simpler to check the line you have just read.
The following code should do the job:
def getBASIC():
    L=[]
    while True:
        inp = raw_input()
        L.append(inp)
        if inp.endswith('END'):
            break
    return L
Your code has the following problems.
"while "END" not in L" will not work if your input is "40 END"
In Python 2.7, "input()" is equivalent to "eval(raw_input())". So, Python is trying to evaluate the "GOTO" statements.
"if str(input()).endswith("END"):" does not append input to L
So, here is an updated version of your function:
def getBASIC():
    L = []
    while True:
        # Grab input
        L.append(str(raw_input()))
        # Check if last input ends with "END"
        if L[-1].endswith("END"):
            break
    return L
