I am reading a file with about 13,000 names on it into a list.
Then, I look at each character of each item on that list and if there is a match I remove that line from the list of 13,000.
If I run it once, it removes about half of the list. On the 11th run it seems to cut it down to 9%. Why is this script missing results? Why does it catch them with successive runs?
Using Python 3.
with open(fname) as f:
    lines = f.read().splitlines()
bad_letters = ['B', 'C', 'F', 'G', 'H', 'J', 'L', 'O', 'P', 'Q', 'U', 'W', 'X']
def clean(callsigns, bad):
    removeline = 0
    for line in callsigns:
        for character in line:
            if character in bad:
                removeline = 1
        if removeline == 1:
            lines.remove(line)
            removeline = 0
    return callsigns
for x in range (0, 11):
    lines = clean(lines, bad_letters)
print (len(lines))
You are changing (i.e., mutating) the lines list while you're looping (i.e., iterating) over it. This is a bad idea: when remove() deletes an item, every later item shifts one position to the left, but the loop's internal index still moves forward, so the item that followed each removed line is skipped. That is why one pass misses lines and repeated runs keep catching more of them.
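To see the skipping on a small scale, here is a minimal sketch with a made-up five-name list (the names, the short BAD string and the has_bad_letter helper are assumptions for illustration, not data from the question). Iterating over a shallow copy such as fixed[:] is one simple way to avoid the problem.

```python
# Hypothetical sample data for illustration only.
BAD = "BCF"

def has_bad_letter(name):
    return any(ch in BAD for ch in name)

# Buggy: removing from the list we are iterating over.
names = ["BOB", "CAL", "ANN", "FAY", "TIM"]
for name in names:
    if has_bad_letter(name):
        names.remove(name)
print(names)  # "CAL" survives because it slid into the slot already visited

# Safer: iterate over a shallow copy and remove from the original.
fixed = ["BOB", "CAL", "ANN", "FAY", "TIM"]
for name in fixed[:]:
    if has_bad_letter(name):
        fixed.remove(name)
print(fixed)
```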
There are many ways of fixing this. In the below example, we keep track of which lines to remove, and remove them in a separate loop in a way so that the indices do not change.
with open(fname) as f:
    lines = f.read().splitlines()
bad_letters = ['B', 'C', 'F', 'G', 'H', 'J', 'L', 'O', 'P', 'Q', 'U', 'W', 'X']
def clean(callsigns, bad):
    to_remove = []
    for line_i, line in enumerate(callsigns):
        for b in bad:
            if b in line:
                # We're removing this line, take note of it.
                to_remove.append(line_i)
                break
    # Remove the lines in a second step. Reverse it so the indices don't change.
    for r in reversed(to_remove):
        del callsigns[r]
    return callsigns
# A single pass is enough now, so the loop of 11 runs is no longer needed.
lines = clean(lines, bad_letters)
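Why the reversed() matters can be shown with a tiny sketch. The five-letter list and the two indices are made up for illustration; the goal is to delete 'b' and 'd'.

```python
# Hypothetical data: delete the items at indices 1 and 3 ('b' and 'd').
to_remove = [1, 3]

forward = list("abcde")
for i in to_remove:
    del forward[i]      # after deleting index 1, index 3 now holds 'e'
print(forward)

backward = list("abcde")
for i in reversed(to_remove):
    del backward[i]     # deleting from the end leaves earlier indices intact
print(backward)
```

Deleting in forward order removes 'e' instead of 'd', because everything after the first deletion has already shifted left by one.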
Save the names you want to keep in a separate list, for example like this:
with open(fname) as f:
    lines = f.read().splitlines()
bad_letters = ['B', 'C', 'F', 'G', 'H', 'J', 'L', 'O', 'P', 'Q', 'U', 'W', 'X']
def clean(callsigns, bad):
    valid = [i for i in callsigns if not any(j in i for j in bad)]
    return valid
valid_names = clean(lines,bad_letters)
print (len(valid_names))
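A possible variation on the same idea, sketched here with a set instead of a list of bad letters: set.isdisjoint() returns True when a name shares no character with the set. The sample callsigns are invented for illustration and are not from the question's file.

```python
# Same letters as in the question, stored as a set for fast membership tests.
BAD_LETTERS = set("BCFGHJLOPQUWX")

def clean(callsigns, bad=BAD_LETTERS):
    """Return a new list with every callsign that shares no character with bad."""
    return [name for name in callsigns if bad.isdisjoint(name)]

# Hypothetical sample callsigns, not taken from the question's file.
sample = ["KD7ZZZ", "W1AW", "N5TEST", "AB1CD"]
print(clean(sample))
```

Because a new list is built and the input is never modified, there is nothing to skip and one call is enough.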
Related
string = "Python, program!"
result = []
for x in string:
    if x not in result:
        result.append(x)
print(result)
This program adds each character of the string to a list only once, even if the character is repeated. In this case, the string "Python, program!" will appear as
['P', 'y', 't', 'h', 'o', 'n', ',', ' ', 'p', 'r', 'g', 'a', 'm', '!']
My question is, how do I make the program ignore punctuation such as ". , ; ? ! -" as well as whitespace? The final output would then look like this instead:
['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
Just check if the string (letter) is alphanumeric using str.isalnum as an additional condition before appending the character to the list:
string = "Python, program!"
result = []
for x in string:
    if x.isalnum() and x not in result:
        result.append(x)
print(result)
Output:
['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
If you don't want numbers in your output, try str.isalpha() instead (returns True if the character is alphabetic).
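As a small sketch of the difference between the two methods, plus an order-preserving alternative to the explicit loop (the helper name unique_chars and the second sample string are made up for illustration):

```python
def unique_chars(text, keep=str.isalnum):
    """Return the first occurrence of each character accepted by keep, in order."""
    # dict keys preserve insertion order (Python 3.7+), so this de-duplicates
    # without the repeated "x not in result" list scan.
    return list(dict.fromkeys(ch for ch in text if keep(ch)))

print(unique_chars("Python, program!"))
print(unique_chars("area 51!"))                    # digits are kept by isalnum
print(unique_chars("area 51!", keep=str.isalpha))  # digits are dropped by isalpha
```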
You can filter them out using the string module. This built-in library contains several constants that hold collections of characters, such as letters and whitespace.
import string
start = "Python, program!" #Can't name it string since that's the module's name
result = []
for x in start:
    if x not in result and (x in string.ascii_letters):
        result.append(x)
print(result)
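One caveat worth sketching: string.ascii_letters only covers A-Z and a-z, so accented letters are dropped, while a blacklist built from string.punctuation and string.whitespace keeps them. The sample text below is made up for illustration.

```python
import string

text = "Caf\u00e9, ol\u00e9!"  # hypothetical input containing the non-ASCII letter é

# Whitelist: keep only ASCII letters.
ascii_only = []
for ch in text:
    if ch in string.ascii_letters and ch not in ascii_only:
        ascii_only.append(ch)

# Blacklist: drop ASCII punctuation and whitespace, keep everything else.
excluded = set(string.punctuation + string.whitespace)
not_excluded = []
for ch in text:
    if ch not in excluded and ch not in not_excluded:
        not_excluded.append(ch)

print(ascii_only)
print(not_excluded)
```

Which of the two fits depends on whether the input is expected to stay within plain ASCII.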
I am writing a simple text comparison tool. It takes two text files - a template and a target - and compares each character in each line using two for-loops. Any differences are highlighted with a Unicode full block symbol (\u2588). In the case that the target line is longer than the template, I am using itertools.zip_longest to fill the non-existent characters with a fill value.
from itertools import zip_longest
def compare(filename1, filename2):
    file1 = open(filename1, "r")
    file2 = open(filename2, "r")
    for line1, line2 in zip_longest(file1, file2):
        for char1, char2 in zip_longest(line1, line2, fillvalue=None):
            if char1 == char2:
                print(char2, end='')
            elif char1 == None:
                print('\u2588', end='')
compare('template.txt', 'target.txt')
Template file:    Target file:
First line        First lineXX
Second line       Second line
Third line        Third line
However, this appears to mess with Python's automatic line break placement. When a line ends with such a fill value, a line break is not generated, giving this result:
First line██Second line
Third line
Instead of:
First line██
Second line
Third line
The issue persisted after rewriting the script to use .append and .join (not shown to keep it short), though it allowed me to highlight the issue:
Result when both files are identical:
['F', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', '\n']
First line
['S', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '\n']
Second line
['T', 'h', 'i', 'r', 'd', ' ', 'l', 'i', 'n', 'e']
Third line
Result when first line of target file has two more characters:
['F', 'i', 'r', 's', 't', ' ', 'l', 'i', 'n', 'e', '█', '█']
First line██['S', 'e', 'c', 'o', 'n', 'd', ' ', 'l', 'i', 'n', 'e', '\n']
Second line
['T', 'h', 'i', 'r', 'd', ' ', 'l', 'i', 'n', 'e']
Third line
As you can see, Python automatically adds a line break \n if the lines are of identical length, but as soon as zip_longest is involved, the last character in the list is the block, not a line break. Why does this happen?
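Each line read from a file still ends with its own '\n'. When the target line is longer, the template's '\n' is paired with a different character (so nothing is printed for it), and the target's '\n' is paired with the fill value (so a block is printed instead). Strip your lines before comparing characters and print a new line after each line: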
from itertools import zip_longest
def compare(filename1, filename2):
    # The with statement closes both files when the block ends.
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        # fillvalue='' keeps .strip() from failing if one file has fewer lines.
        for line1, line2 in zip_longest(file1, file2, fillvalue=''):
            line1, line2 = line1.strip(), line2.strip() # <- HERE
            for char1, char2 in zip_longest(line1, line2, fillvalue=None):
                if char1 == char2:
                    print(char2, end='')
                elif char1 is None:
                    print('\u2588', end='')
            print() # <- HERE
compare('template.txt', 'target.txt')
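To see where the line break goes, it can help to print the character pairs that zip_longest produces for the first line of each file. A minimal sketch, using the two example lines from the question as string literals (the variable names are made up):

```python
from itertools import zip_longest

template_line = "First line\n"
target_line = "First lineXX\n"

# Look only at the tail of the pairing, where the two lines diverge.
pairs = list(zip_longest(template_line, target_line, fillvalue=None))
for char1, char2 in pairs[-3:]:
    print(repr(char1), repr(char2))
```

The template's '\n' lands opposite 'X', which matches neither branch of the if/elif, and the target's '\n' lands opposite None, which prints a block. So no line break is ever printed for that line.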
I have this list which contains letters, and I need to check if a pre-determined word located in another list is horizontally inside this list of letters.
i.e.:
mat_input = [['v', 'e', 'd', 'j', 'n', 'a', 'e', 'o'], ['i', 'p', 'y', 't', 'h', 'o', 'n', 'u'], ['s', 'u', 'e', 'w', 'e', 't', 'a', 'e']]
words_to_search = ['python', 'fox']
I don't need to tell if a word was not found, but if it was I need to tell which one.
My problem is that so far I've tried to compare letter by letter, in a loop similar to this:
for i in range(n): # n = number of words
    for j in range(len(words_to_search[i])): # size of the word I'm searching
        for k in range(h): # h = height of crossword
            for m in range(l): # l = length of crossword
But it's not working. Inside the last loop I tried several if/else conditions to tell whether the whole word was found. How can I solve this?
You can use str.join:
mat_input = [['v', 'e', 'd', 'j', 'n', 'a', 'e', 'o'], ['i', 'p', 'y', 't', 'h', 'o', 'n', 'u'], ['s', 'u', 'e', 'w', 'e', 't', 'a', 'e']]
words_to_search = ['python', 'fox']
joined_input = list(map(''.join, mat_input))
results = {i:any(i in b or i in b[::-1] for b in joined_input) for i in words_to_search}
Output:
{'python': True, 'fox': False}
I'd start by joining each sublist in mat_input into one string:
mat_input_joined = [''.join(x) for x in mat_input]
Then loop over your words to search and simply use the in operator to see if the word is contained in each string:
for word_to_search in words_to_search:
    result = [word_to_search in x for x in mat_input_joined]
    print('Word:',word_to_search,'found in indices:',[i for i, x in enumerate(result) if x])
Result:
Word: python found in indices: [1]
Word: fox found in indices: []
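If the column where the word starts is also useful, str.find can report it. A minimal sketch along the same lines (the function name find_horizontal is made up, and it only reports the first left-to-right occurrence in each row):

```python
mat_input = [['v', 'e', 'd', 'j', 'n', 'a', 'e', 'o'],
             ['i', 'p', 'y', 't', 'h', 'o', 'n', 'u'],
             ['s', 'u', 'e', 'w', 'e', 't', 'a', 'e']]
words_to_search = ['python', 'fox']

def find_horizontal(grid, words):
    """Map each found word to a list of (row, column) start positions."""
    rows = [''.join(row) for row in grid]
    found = {}
    for word in words:
        for row_index, row in enumerate(rows):
            col = row.find(word)  # -1 when the word is not in this row
            if col != -1:
                found.setdefault(word, []).append((row_index, col))
    return found

print(find_horizontal(mat_input, words_to_search))
```

Words that are not found simply do not appear in the returned dictionary, which matches the requirement to report only the words that were found.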
I'm reading a text file and appending it to a list row by row, but I need to change every row as shown below.
My list looks like [[1st row], [2nd row], [3rd row], ..., [8000th row]], i.e. a list of lists.
My rows: every row contains 23 letters, but I only want to change letters 2 to 23, not the first one.
e,a,b,c,d,r,y,t,w,s,e,t......s (the 23rd letter, or index 22 if you count from 0)
t,y,e,e,s,f,g,r,t,q,w,e,r,.....s
What I want is
e,a1,b2,c3,d4,r5,y6,t7,w8,s9,e10,t11......s22
t,y1,e2,e3,s4,f5,g6,r7,t8,q9,w10,e11,r12,.....s22
My main code:
import csv
datas = []
with open('C:/Users/xxx/Desktop/input/mushrooms.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for row in spamreader:
        datas.append(row)
print(datas[0])
# --> ['p', 'x', 's', 'n', 't', 'p', 'f', 'c', 'n', 'k', 'e', 'e', 's', 's', 'w', 'w', 'p', 'w', 'o', 'p', 'k', 's', 'u']
How can I do that with Python?
row = ['e','a','b','c','d','r','y','t','w','s','e','t']
newrow = row[0:1] + [letter + str(num) for num,letter in enumerate(row[1:],1)]
In your specific example,
newdatas = [row[0:1] + [letter + str(num) for num,letter in enumerate(row[1:],1)] for row in datas]
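The same expression can be wrapped in a small function to make it easier to read and test. This is only a sketch: the name number_row and the two short sample rows are made up, standing in for the rows read from the CSV file.

```python
def number_row(row):
    """Keep row[0] as is and append its 1-based position to every later item."""
    return row[:1] + [letter + str(num) for num, letter in enumerate(row[1:], 1)]

# Hypothetical two-row sample standing in for the rows read from the CSV file.
datas = [['e', 'a', 'b', 'c'], ['t', 'y', 'e', 'e']]
newdatas = [number_row(row) for row in datas]
print(newdatas)
```

Slicing with row[:1] rather than indexing with row[0] keeps the first item as a one-element list, so the two lists can be concatenated, and it also copes with an empty row.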
So I have a text file that looks like this:
abcd
efghij
klm
and I need to convert it into a two-dimensional list. And it should look like this:
[['a', 'b', 'c', 'd'],
['e', 'f', 'g', 'h', 'i', 'j'],
['k', 'l', 'm']]
so far I have managed to get this result:
[["abcd"], ["efghij"], ["klm"]]
Can anyone help me figure out what the next step should be?
Here is my code so far:
def readMaze(filename):
    with open(filename) as textfile:
        global mazeList
        mazeList = [line.split() for line in textfile]
    print(mazeList)
str.split() splits on whitespace, so a line that contains no spaces comes back as a single-item list. Note that str.split('') does not split a string into characters; it raises ValueError: empty separator.
Instead, build a list from each line with list().
text = """abcd
efghij
klm"""
mazelist = [list(line) for line in text.splitlines()]
# the splitlines call just makes it work since it's a string not a file
print(mazelist)
# [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h', 'i', 'j'], ['k', 'l', 'm']]
Make a list of each line in the file:
with open('tmp.txt') as f:
    z = [list(thing.strip()) for thing in f]
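One thing to watch if the grid can contain spaces (an assumption here, since the sample file in the question has none): strip() also removes leading and trailing spaces, which would shift the columns. A minimal sketch that removes only the line break, with io.StringIO standing in for the open file and read_grid as a made-up name:

```python
import io

def read_grid(lines):
    """Turn an iterable of text lines into a list of character lists."""
    # rstrip('\n') removes only the line break, so spaces inside the line
    # (including leading and trailing ones) are kept as cells.
    return [list(line.rstrip('\n')) for line in lines]

# io.StringIO stands in for an open file so the sketch runs without a file on disk.
fake_file = io.StringIO("abcd\n e f\nklm")
print(read_grid(fake_file))
```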
As explained above, you just need to build a list from the strings.
Assuming the string is held in some_text:
lines = some_text.split('\n')
my_list = []
for line in lines:
    line_split = list(line)
    my_list.append(line_split)
Or as a one-liner (in Python 3, map returns an iterator, so wrap it in list()):
my_list = list(map(list, some_text.split('\n')))
That should do the trick.