python and iteration specifically "for line in first_names" - python

If I 'pop' an item off an array in python, I seem to shoot myself in the foot by messing up the array's total length? See the following example:
Also am I just being an idiot or is this normal behaviour? And is there a better way to achieve what I am trying to do?
first_names = []
last_names = []
approved_names = []
blacklisted_names = []
loopcounter = 0
with open("first_names.txt") as file:
    first_names = file.readlines()
    #first_names = [line.rstrip() for line in first_names]
for line in first_names:
    line = line.strip("\r\n")
    line = line.strip("\n")
    line = line.strip(" ")
    if line == "":
        first_names.pop(loopcounter)
        #first_names.pop(first_names.index("")) # Does not work as expected
        #loopcounter -= 1 # Does not work as expected either......
    loopcounter += 1
loopcounter = 0
def save_names():
    with open("first_names2.txt",'wt',encoding="utf-8") as file:
        file.writelines(first_names)
and the resulting files:
first_names.txt
{
Abbey
Abbie
Abbott
Abby
Abe
Abie
Abstinence
Acton
}
And the output file
{
Abbey
Abbie
Abbott
Abe
Abie
Abstinence
Acton
}

list.pop() removes an item from a list and returns its value. For the very basic task of cleaning and writing the list of names, an easy edit would be:
with open("first_names.txt") as file:
    first_names = file.readlines()
cleaned_lines = []
for line in first_names:
    clean_l = line.strip("\r\n").strip("\n").strip(" ")
    if clean_l != "":
        cleaned_lines.append(clean_l)
with open("first_names2.txt",'wt',encoding="utf-8") as file:
    # strip() removed the line endings, so add one back per name
    file.writelines(name + "\n" for name in cleaned_lines)
If you don't want to create a cleaned copy of the list first_names, you could write the lines to the file one at a time instead:
with open("first_names.txt") as file:
    first_names = file.readlines()
with open("first_names2.txt",'wt',encoding="utf-8") as file:
    for line in first_names:
        clean_l = line.strip("\r\n").strip("\n").strip(" ")
        if clean_l != "":
            # add back the newline that strip() removed
            file.write(clean_l + "\n")

In general it is not a good idea to mutate a list while you are iterating over it, as you noticed in your question. Popping an element doesn't corrupt the list's length, but the remaining elements shift to fill the gap while the loop keeps advancing, so the index you pop no longer matches the element you are looking at and some elements get skipped.
A quick solution is to iterate over a copy of the list with the built-in enumerate() function. Note that list.remove() expects a value, not an index, so the index goes to pop(), and the copy is walked backwards so that a removal never shifts an index that is still to come:
copy = first_names.copy()
for i, line in reversed(list(enumerate(copy))):
    line = line.strip("\r\n")
    line = line.strip("\n")
    line = line.strip(" ")
    if line == "":
        first_names.pop(i)
More on enumerate() here.
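To see why elements get skipped when the list being iterated is modified, here is a minimal sketch. The sample list `names` is made up for illustration and is not the asker's file. The for loop keeps an internal position; after a pop, the following items shift left by one, so the item that slides into the current slot is never visited.

```python
# Hypothetical sample data: two consecutive blank lines in the middle.
names = ["Abbey\n", "\n", "\n", "Abbie\n"]

visited = []
for index, line in enumerate(names):
    visited.append(line)
    if line.strip() == "":
        # Removing the current item shifts everything after it one slot
        # to the left, so the loop's next step jumps over one element.
        names.pop(index)

print(visited)  # ['Abbey\n', '\n', 'Abbie\n'] -> the second blank was never seen
print(names)    # ['Abbey\n', '\n', 'Abbie\n'] -> and it is still in the list
```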

The usual practice would be to filter or create a new list, rather than change the list you are iterating. It's not uncommon to create a new list with the changes you want, and then just assign it back to the original variable name. Here is a list comprehension. Note the if statement that filters out the undesirable blank lines.
first_names = [name.strip() for name in first_names if name.strip()]
https://docs.python.org/3/glossary.html#term-list-comprehension
And you can do the same with iterators using map to apply a function to each item in the list, and filter to remove the blank lines.
first_names_iterator = filter(lambda x: bool(x), map(lambda x: x.strip(), first_names))
first_names = list(first_names_iterator)
https://docs.python.org/3/library/functions.html#map
https://docs.python.org/3/library/functions.html#filter
The last line demonstrates that the iterator can simply be passed to list's constructor to get a list. If you only need to loop over the results, though, keep the iterator: it produces items one at a time, so the whole list never has to be in memory at once. If you do want a list, the list comprehension is the more readable choice.
The lambda notation is just a fast way to write a function. I could have defined a function with a good name, but that's often overkill for things like map, filter, or a sort key.
Full code:
test_cases = [
    'Abbey',
    ' Abbie ',
    '',
    'Acton',
]
print(test_cases)
first_names = list(test_cases)
first_names = [name.strip() for name in first_names if name.strip()]
print(first_names)
first_names = list(test_cases)
for name in filter(lambda x: bool(x),
                   map(lambda x: x.strip(),
                       first_names)):
    print(name)
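As a side note, the two lambdas are not strictly needed. A small sketch with the same test data: `str.strip` can be handed to map directly, and passing `None` as the first argument of filter keeps only the truthy items, which drops the empty strings.

```python
test_cases = ['Abbey', ' Abbie ', '', 'Acton']

# str.strip is applied to each item; filter(None, ...) drops empty strings.
cleaned = list(filter(None, map(str.strip, test_cases)))
print(cleaned)  # ['Abbey', 'Abbie', 'Acton']
```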

Related

Return multiple lists in Python

I have this part of a code:
def readTXT():
    part_result = []
    '''Reading all data from text file'''
    with open('dataset/sometext.txt', 'r') as txt:
        for lines in txt:
            part = lines.split()
            part_result = [int(i) for i in part]
            #sorted([(p[0], p[14]) for p in part_result], key=lambda x: x[1])
            print(part_result)
            return part_result
And I'm trying to get all lists as a return, but for now I only get the first one, which is quite obvious, because my return is inside the for loop. But still, shouldn't the loop go through every line and return the corresponding list?
After doing research, all I found was return list1, list2 etc. But how should I manage it if my lists are generated from a text file line by line?
It frustrates me, not being able to return multiple lists at once.
Here's my suggestion. Creating a 'major_array' and adding 'part_result' in that array on each iteration of loop. This way if your loop iterates 10 times, you will then have 10 arrays added in your 'major_array'. And finally the array is returned when the for loop finishes. :)
def readTXT():
    #create a new array
    major_array = []
    part_result = []
    '''Reading all data from text file'''
    with open('dataset/sometext.txt', 'r') as txt:
        for lines in txt:
            part = lines.split()
            part_result = [int(i) for i in part]
            #sorted([(p[0], p[14]) for p in part_result], key=lambda x: x[1])
            print(part_result)
            major_array.append(part_result)
    return major_array
Here is a solution:
def readTXT():
    with open('dataset/sometext.txt') as lines:
        all_lists = []
        for line in lines:
            all_lists.append([int(cell) for cell in line.split()])
    return all_lists
Note that the return statement is outside of the loop. You get only one list because you return inside the loop.
For a more advanced user, this solution is shorter and more efficient, at the cost of being a little harder to understand:
def readTXT():
    with open('dataset/sometext.txt') as lines:
        return [[int(x) for x in line.split()] for line in lines]
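If the file is large, a generator is another option: `yield` hands back one list per line without ending the function, which is roughly what the question expected `return` to do. This is only a sketch; the name `read_rows` and the in-memory sample standing in for dataset/sometext.txt are assumptions made for illustration.

```python
import io

def read_rows(lines):
    """Yield one list of ints per non-empty line (hypothetical helper)."""
    for line in lines:
        cells = line.split()
        if cells:  # skip blank lines
            yield [int(cell) for cell in cells]

# io.StringIO stands in for an open text file here.
sample = io.StringIO("1 2 3\n4 5 6\n\n7 8 9\n")
all_lists = list(read_rows(sample))
print(all_lists)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

With a real file, the same function could be called as `read_rows(open_file)` inside a `with open(...)` block, consuming the rows before the block ends.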

reading lines of a file between two words

I have a file containing numbers and 2 words : "start" and "middle"
I want to read numbers from "start" to "middle" in one array and numbers from "middle" to end of the file into another array.
this is my python code:
with open("../MyList","r") as f:
    for x in f.readlines():
        if x == "start\n":
            continue
        if x == "middle\n":
            break
        x = x.split("\n")[0]
        list_1.append(int(x))
    print list_1
    for x in f.readlines():
        if x == "middle\n":
            continue
        list_2.append(int(x))
    print list_2
but the problem is that my program never enters second loop and jumps to
print list_2
I searched in older questions but can not figure out the problem.
It's because you read the whole file in the 1st loop; when execution reaches the 2nd loop, the file pointer is already at the end of the file, so f.readlines() returns an empty list.
You can fix that either by reopening the file or by setting the file pointer back to the beginning of the file with f.seek(0) before the 2nd for loop:
with open("../MyList","r") as f:
    for x in f.readlines():
        # process your stuff for 1st loop
        pass
    # reset file pointer to beginning of file again
    f.seek(0)
    for x in f.readlines():
        # process your stuff for 2nd loop
        pass
Reading the whole file into memory is not efficient for large files; you can iterate over the file object instead, as in the code below:
list1 = []
list2 = []
list1_start = False
list2_start = False
with open("../MyList","r") as f:
    for x in f:
        if x.strip() == 'start':
            list1_start = True
            continue
        elif x.strip() == 'middle':
            list2_start = True
            list1_start = False
            continue
        if list1_start:
            list1.append(x.strip())
        elif list2_start:
            list2.append(x.strip())
print(list1)
print(list2)
Your first loop is reading the entire file to the end, but processes only half of it. When the second loop hits, the file pointer is already at the end, so no new lines are read.
From the python docs:
file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines
thus read. If the optional sizehint argument is present, instead of
reading up to EOF, whole lines totalling approximately sizehint bytes
(possibly after rounding up to an internal buffer size) are read.
Objects implementing a file-like interface may choose to ignore
sizehint if it cannot be implemented, or cannot be implemented
efficiently.
Either process everything in one loop, or read line-by-line (using readline instead of readlines).
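The behaviour is easy to reproduce without a real file. In this minimal sketch, `io.StringIO` stands in for the open file and the sample content is made up; the second readlines() call returns an empty list until the position is rewound with seek(0).

```python
import io

# io.StringIO behaves like an open text file for this purpose.
f = io.StringIO("start\n1\n2\nmiddle\n3\n")

first = f.readlines()   # consumes everything up to EOF
second = f.readlines()  # nothing left to read
f.seek(0)               # rewind to the beginning
third = f.readlines()

print(len(first), len(second), len(third))  # 5 0 5
```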
You can read the whole file into a list once and slice it afterwards.
If possible, you can try this:
with open("sample.txt","r") as f:
    list_1 = []
    list_2 = []
    fulllist = []
    for x in f.readlines():
        x = x.split("\n")[0]
        fulllist.append(x)
print(fulllist)
start_position = fulllist.index('start')
middle_position = fulllist.index('middle')
# the file has no 'end' marker, so the second slice runs to the end of the list
list_1 = fulllist[start_position+1:middle_position]
list_2 = fulllist[middle_position+1:]
print("list1 : %s" % list_1)
print("list2 : %s" % list_2)
Discussion
Your problem is that you read the whole file at once, and when you
start the 2nd loop there's nothing to be read...
A possible solution involves reading the file line by line, tracking
the start and middle keywords and updating one of two lists
accordingly.
This implies that your script, during the loop, has to maintain info about
its current state, and for this purpose we are going to use a
variable, code, that's either 0, 1 or 2 meaning no action,
append to list no. 1 or append to list no. 2. Because in the beginning
we don't want to do anything, its initial value must be 0
code = 0
If we want to access one of the two lists using the value of code as
a switch, we could write a test or, in place of a test, we can use a
list of lists, lists, containing a dummy list and two lists that are
updated with valid numbers. Initially all these inner lists are equal
to the empty list []
l1, l2 = [], []
lists = [[], l1, l2]
so that later we can do as follows
lists[code].append(number)
With these premises, it's easy to write the body of the loop on the
file lines:
read a number
if it's not a number, look if it is a keyword
    if it is a keyword, change state
    in any case, no further processing
if we have to append, append to the correct list
try:
    n = int(line)
except ValueError:
    if line == 'start\n' : code=1
    if line == 'middle\n': code=2
    continue
if code: lists[code].append(n)
We have just to add a bit of boilerplate, opening the file and
looping, that's all.
Below you can see my test data, the complete source code with all the
details and a test execution of the script.
Demo
$ cat start_middle.dat
1
2
3
start
5
6
7
middle
9
10
$ cat start_middle.py
l1, l2 = [], []
code, lists = 0, [[], l1, l2]
with open('start_middle.dat') as infile:
    for line in infile.readlines():
        try:
            n = int(line)
        except ValueError:
            if line == 'start\n' : code=1
            if line == 'middle\n': code=2
            continue
        if code: lists[code].append(n)
print(l1)
print(l2)
print(l1)
print(l2)
$ python start_middle.py
[5, 6, 7]
[9, 10]
$

Cannot write sorted list into new file.

I'm trying to write a sorted list onto a file. I have 1000 integers that I've sorted in ascending order, but cannot manage to write the new list of ascending numbers into my new file 'results'. I am new to programming and any help would be greatly appreciated.
This is my code so far:
def insertion_sort():
    f = open("integers.txt", "r")
    lines = f.read().splitlines()
    print(lines)
    print(type(lines[0]))
    results = list(map(int, lines))
    print(type(results[0]))
    results.sort()
    print(results)
f=open("integers.txt", "r")
lines = f.read().splitlines()
results = list(map(int,lines))
insertion_sort()
value = results.sort()
file_to_save_to = open("results.txt", "w")
file_to_save_to.write(str(value))
file_to_save_to.close()
Your problem is from this line
value = results.sort()
the sort method of a list doesn't return anything, it modifies the list in place.
Instead you should use sorted
value = sorted(results)
or use the list results directly, without storing the return value of its sort method:
results.sort()
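The difference between the two also explains why results.txt ends up containing the text None. A minimal sketch, with sample numbers made up for illustration:

```python
results = [3, 1, 2]

value = results.sort()      # sorts in place and returns None
print(value)                # None
print(results)              # [1, 2, 3]

original = [3, 1, 2]
ordered = sorted(original)  # returns a new list, original is untouched
print(ordered)              # [1, 2, 3]
print(original)             # [3, 1, 2]
```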
First of all, it would be great if you would change your function to receive a parameter (list of ints), and return sorted list instead of None (as it is right now)
def insertion_sort(T):
    return sorted(T)
f=open("integers.txt", "r")
lines = f.read().splitlines()
results = list(map(int,lines))
value = insertion_sort(results)
file_to_save_to = open("results.txt", "w")
file_to_save_to.write(str(value))
file_to_save_to.close()
You should try the sorted function:
value = sorted(results)
You also don't need list(map(int, lines)); you can pass the map object straight to sorted, so the code becomes:
results = sorted(map(int,lines))
And the full example looks like this:
f=open("integers.txt", "r")
lines = f.read().splitlines()
f.close()
sorted_lines = sorted(map(int,lines))
file_to_save_to = open("results.txt", "w")
for line in sorted_lines:
    # one number per line; without the newline all digits would run together
    file_to_save_to.write(str(line) + "\n")
file_to_save_to.close()
You're calling insertion_sort() which is a function that performs the operation, but doesn't yet return anything. You'll want to return the results by simply adding the return statement at the end of your function:
def insertion_sort():
    f = open("integers.txt", "r")
    lines = f.read().splitlines()
    print(lines)
    print(type(lines[0]))
    results = list(map(int, lines))
    print(type(results[0]))
    results.sort()
    print(results)
    return results # <- Add this line
Then, use the returned list by changing:
insertion_sort() to value = insertion_sort()

appending a list from a read text file python3

I am attempting to read a txt file and create a dictionary from the text. a sample txt file is:
John likes Steak
John likes Soda
John likes Cake
Jane likes Soda
Jane likes Cake
Jim likes Steak
My desired output is a dictionary with the name as the key, and the "likes" as a list of the respective values:
{'John':('Steak', 'Soda', 'Cake'), 'Jane':('Soda', 'Cake'), 'Jim':('Steak')}
I continue to run into the error of adding my stripped word to my list and have tried a few different ways:
pred = ()
prey = ()
spacedLine = inf.readline()
line = spacedLine.rstrip('\n')
while line!= "":
    line = line.split()
    pred.append = (line[0])
    prey.append = (line[2])
    spacedLine = inf.readline()
    line = spacedLine.rstrip('\n')
and also:
spacedLine = inf.readline()
line = spacedLine.rstrip('\n')
while line!= "":
    line = line.split()
    if line[0] in chain:
        chain[line[0] = [0, line[2]]
    else:
        chain[line[0]] = line[2]
    spacedLine = inf.readline()
    line = spacedLine.rstrip('\n')
any ideas?
This will do it (without needing to read the entire file into memory first):
likes = {}
for who, _, what in (line.split()
                     for line in (line.strip()
                                  for line in open('likes.txt', 'rt'))):
    likes.setdefault(who, []).append(what)
print(likes)
Output:
{'Jane': ['Soda', 'Cake'], 'John': ['Steak', 'Soda', 'Cake'], 'Jim': ['Steak']}
Alternatively, to simplify things slightly you could use a temporary collections.defaultdict:
from collections import defaultdict
likes = defaultdict(list)
for who, _, what in (line.split()
                     for line in (line.strip()
                                  for line in open('likes.txt', 'rt'))):
    likes[who].append(what)
print(dict(likes)) # convert to plain dictionary and print
Your input is a sequence of sequences. Parse the outer sequence first, parse each item next.
Your outer sequence is:
Statement
<empty line>
Statement
<empty line>
...
Assume that f is the open file with the data. Read each statement and return a list of them:
def parseLines(f):
    result = []
    for line in f: # file objects iterate over text lines
        line = line.strip() # drop the trailing newline, otherwise no line is ever empty
        if line: # line is non-empty
            result.append(line)
    return result
Note that the function above accepts a much wider grammar: it allows arbitrarily many empty lines between non-empty lines, and two non-empty lines in a row. But it does accept any correct input.
Then, your statement is a triple: X likes Y. Parse it by splitting it by whitespace, and checking the structure. The result is a correct pair of (x, y).
def parseStatement(s):
    parts = s.split() # by default, it splits by all whitespace
    assert len(parts) == 3, "Syntax error: %r is not three words" % s
    x, likes, y = parts # unpack the list of 3 items into variables
    assert likes == "likes", "Syntax error: %r instead of 'likes'" % likes
    return x, y
Make a list of pairs for each statement:
pairs = [parseStatement(s) for s in parseLines(f)]
Now you need to group values by key. Let's use defaultdict which supplies a default value for any new key:
from collections import defaultdict
the_answer = defaultdict(list) # the default value is an empty list
for key, value in pairs:
the_answer[key].append(value)
# we can append because the_answer[key] is set to an empty list on first access
So here the_answer is what you need, only it uses lists as dict values instead of tuples. This must be enough for you to understand your homework.
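If the tuple values from the desired output matter, the lists can be converted in one extra step at the end. A minimal sketch, assuming `pairs` holds what parseStatement would produce for the sample file in the question; the name `as_tuples` is made up for illustration.

```python
from collections import defaultdict

# Sample pairs as parseStatement would produce them for the question's file.
pairs = [("John", "Steak"), ("John", "Soda"), ("John", "Cake"),
         ("Jane", "Soda"), ("Jane", "Cake"), ("Jim", "Steak")]

the_answer = defaultdict(list)
for key, value in pairs:
    the_answer[key].append(value)

# Freeze each list into a tuple and drop the defaultdict behaviour.
as_tuples = {key: tuple(values) for key, values in the_answer.items()}
print(as_tuples)
```

Note that a one-element tuple prints as ('Steak',) with a trailing comma; ('Steak') without the comma is just a string in parentheses.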
dic={}
for i in f.readlines():
    if i.strip(): # skip blank lines, which still contain "\n"
        if i.split()[0] in dic.keys():
            dic[i.split()[0]].append(i.split()[2])
        else:
            dic[i.split()[0]]=[i.split()[2]]
print(dic)
This should do it.
Here we iterate through f.readlines() (f being the file object), and on each line we fill up the dictionary, using the first part of the split as the key and the last part of the split as the value.

String Object Not Callable When Using Tuples and Ints

I am utterly flustered. I've created a list of tuples from a text file and done all of the conversions to ints:
for line in f:
    if firstLine is True: #first line of file is the total knapsack size and # of items.
        knapsackSize, nItems = line.split()
        firstLine = False
    else:
        itemSize, itemValue = line.split()
        items.append((int(itemSize), int(itemValue)))
print items
knapsackSize, nItems = int(knapsackSize), int(nItems) #convert strings to ints
I have functions that access the tuples for more readable code:
def itemSize(item): return item[0]
def itemValue(item): return item[1]
Yet when I call these functions, i.e.,:
elif itemSize(items[nItems-1]) > sizeLimit
I get an inexplicable "'str' object is not callable" error, referencing the foregoing line of code. I have type checked everything that should be a tuple or an int using isinstance, and it all checks out. What gives?
Because at this point:
itemSize, itemValue = line.split()
the name itemSize is rebound to a string, which hides your itemSize() function. You appended the int-converted values to items, but itemSize itself is still a str, so calling itemSize(...) afterwards fails.
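The effect can be reproduced in a few lines. This is a minimal sketch with made-up sample values, not the asker's full program: once the unpacking assignment runs, the name itemSize no longer points at the function.

```python
def itemSize(item):
    return item[0]

items = [(3, 10)]
print(itemSize(items[0]))  # 3, the name still refers to the function

# Same pattern as in the question's loop: the tuple unpacking rebinds
# the name itemSize to a string.
itemSize, itemValue = "3 10".split()

try:
    itemSize(items[0])
except TypeError as error:
    message = str(error)
    print(message)  # 'str' object is not callable
```

Using different names for the loop variables (for example size_text and value_text) avoids the clash.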
I would also change your logic slightly for handling first line:
with open('file') as fin:
    knapsackSize, nItems = next(fin).split() # take first line
    for other_lines in fin: # everything after
        pass # do stuff for rest of file
Or just change the whole lot (assuming it's a 2-column file of ints):
with open('file') as fin:
    # tuple(...) so that each row is a real tuple on Python 3 as well, where map() is lazy
    lines = (tuple(map(int, line.split())) for line in fin)
    knapsackSize, nItems = next(lines)
    items = list(lines)
And possibly, instead of your functions to return items, use a dict or a namedtuple...
Or if you want to stay with functions, then go to the operator module and use:
import operator
itemSize = operator.itemgetter(0)
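For the namedtuple suggestion, a minimal sketch could look like this. The type name `Item`, its field names, and the two sample lines are assumptions made for illustration; attribute access replaces the accessor functions, so there is no function name left to shadow.

```python
from collections import namedtuple

# Field names are chosen for illustration.
Item = namedtuple("Item", ["size", "value"])

sample_lines = ["3 10", "5 20"]  # stands in for the item lines of the file
items = [Item(int(size), int(value))
         for size, value in (line.split() for line in sample_lines)]

print(items[0].size)   # 3
print(items[1].value)  # 20
```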
