How can I shorten this loop in python? - python

I'm working on a "word generator", but I have a problem.
Sometimes it creates unwanted words.
I would like to remove them without making the loop too heavy.
As in the example code below, I would have to repeat the same check for every letter of the alphabet.
That alone wouldn't be too bad, but there are a few other problems I could have dealt with if only I knew how to shorten it.
# alph = "abcdefghijklmnopqrstuvwyzxą"
my_var = ["abc", "aabc", "cbd", "ccbd", "qwe", "qqwe"]
my_var2 = []
def removeDup():
    for x in my_var:
        if x.find("aa") == -1 and x.find("cc") == -1 and x.find("qq") == -1:
            my_var2.append(x)
    print(my_var2)
removeDup()
My idea was to use dynamic variables, but I can't nest one loop inside the other without creating chaos.
I tried something like the one in the picture, but that way I can only take out words with repeated letters.

There's no need for dynamic variables. Just make a list of all the duplicate characters.
dups = [z*2 for z in alph]
for x in open('xxx.txt', encoding='utf-8'):
    if not any(dup in x for dup in dups):
        print(x.strip())
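As a self-contained sketch of the same idea applied to the list from the question rather than a file: the function name without_doubled_letters is made up for illustration, and a plain a-z alphabet is assumed.

```python
def without_doubled_letters(words, alphabet):
    """Return the words that contain no doubled letter such as 'aa' or 'qq'."""
    # Build the list of forbidden pairs once, outside the loop over words.
    doubled = [letter * 2 for letter in alphabet]
    return [word for word in words if not any(pair in word for pair in doubled)]


my_var = ["abc", "aabc", "cbd", "ccbd", "qwe", "qqwe"]
result = without_doubled_letters(my_var, "abcdefghijklmnopqrstuvwxyz")
print(result)
```

Building the pairs from the alphabet string means extra letters (such as ą) only need to be added to that string, not to the condition.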

Related

Runtime Error (Python3) when you manipulate lists with very long strings

I wrote some Python 3 code to manipulate lists of strings, but it gives a Runtime Error for long strings. Here is my code for the problem:
string = "BANANA"
slist = list(string)
mark = list(range(len(slist)))
vowel_substrings = list()
consonants_substrings = list()
#print(mark)
for i in range(len(slist)):
    if slist[i]=='A' or slist[i]=='E' or slist[i]=='I' or slist[i]=='O' or slist[i]=='U':
        mark[i] = 1
    else:
        mark[i] = 0
#print(mark)
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j,len(string)):
            vowel_substrings.append(string[j:l+1])
            #print(string[j:l+1])
    else:
        for l in range(j,len(string)):
            consonants_substrings.append(string[j:l+1])
#print(consonants_substrings)
unique_consonants = list(set(consonants_substrings))
unique_vowels = list(set(vowel_substrings))
##add two lists
all_substrings = consonants_substrings+(vowel_substrings)
#print(all_substrings)
##Find points earned by vowel guy and consonant guy
vowel_guy_score = 0
consonant_guy_score = 0
for strng in unique_vowels:
    vowel_guy_score += vowel_substrings.count(strng)
for strng in unique_consonants:
    consonant_guy_score += consonants_substrings.count(strng)
#print(vowel_guy_score) #Kevin
#print(consonant_guy_score) #Stuart
if vowel_guy_score > consonant_guy_score:
    print("Kevin ",vowel_guy_score)
elif vowel_guy_score < consonant_guy_score:
    print("Stuart ",consonant_guy_score)
else:
    print("Draw")
This gives the right answer. But if you have a long string, like the one shown below, it fails.
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
NANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANANNANAN
I think initialization or memory allocation might be a problem but I don't know how to allocate memory before even knowing how much memory the code will need. Thank you in advance for any help you can provide.
In the middle there, you generate a data structure of size O(n³): for each starting position × each ending position × length of the substring. That's probably where your memory problems appear (you haven't posted a traceback).
One possible optimisation would be, instead of having a list of substrings and then generating the set, use instead a Counter class. That would let you know how many times each substring appears without storing all the copies:
import collections

vowel_substrings = collections.Counter()
consonants_substrings = collections.Counter()
for j in range(len(slist)):
    if mark[j] == 1:
        for l in range(j,len(string)):
            vowel_substrings[string[j:l+1]] += 1
            #print(string[j:l+1])
    else:
        for l in range(j,len(string)):
            consonants_substrings[string[j:l+1]] += 1
Even better would be to calculate the scores as you go along, without storing any of the substrings. If I'm reading the code correctly, the substrings aren't actually used for anything — each letter is effectively scored based on its distance from the end of the string, and the scores are added up. This can be calculated in a single pass through the string, without making any additional copies or keeping track of anything other than the cumulative scores and the length of the string.
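A minimal sketch of that single-pass idea, assuming the scoring rule the question's code implements: a letter at position i starts len(string) - i substrings, so it adds that many points to either the vowel or the consonant total. The function name minion_scores is hypothetical.

```python
def minion_scores(string, vowels="AEIOU"):
    """Return (vowel_score, consonant_score) without building any substrings."""
    n = len(string)
    vowel_score = 0
    consonant_score = 0
    for i, letter in enumerate(string):
        # Every substring starting at position i counts once for its first letter,
        # and there are n - i such substrings.
        if letter in vowels:
            vowel_score += n - i
        else:
            consonant_score += n - i
    return vowel_score, consonant_score


kevin, stuart = minion_scores("BANANA")
if kevin > stuart:
    print("Kevin", kevin)
elif kevin < stuart:
    print("Stuart", stuart)
else:
    print("Draw")
```

This uses constant extra memory and a single loop, so the length of the input string should no longer matter much.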

Amateur Text Editor Malfunctioning

My brother and I are creating a simple text editor that changes entries to pig latin using Python. Code below:
our_word = ("cat")
vowels = ("a","e","i","o","u")
#remember I have to compare variables not strings
way = "way"
for i in range(len(our_word)):
    for j in range (len(vowels)):
        #checking if there is any vowel present
        if our_word[i] == vowels[j]:
            # if there were to be any vowels our_word[i] will now be changed with way
            #.replace is our function the dot is what notates this in the python library
            our_word = our_word.replace(our_word[i], way)
print(our_word)
Right now we're testing the word 'cat' but the program when run returns the following:
/Users/x/PycharmProjects/pythonProject3/venv/bin/python /Users/x/PycharmProjects/pythonProject3/main.py
cwwayyt
Process finished with exit code 0
We're not sure why there is a double 'w' and a double 'y'. It seems the word 'cat' is edited once to 'cwayt' and then a second time to 'cwwayyt'.
Any suggestions are welcome!
The problem is that on the iteration after the substitution, the for loop looks at the next position, which is part of the "way" that you just substituted into place. Instead, you need to skip past it. You would also hit a second problem: the loop only runs up to the original length of the word, rather than the new, increased length. In this situation you are probably better off using a while loop with an index variable that you can manipulate to point to the correct place as needed. For example:
our_word = "cat"
vowels = "aeiou"
way = "way"
i = 0
while i < len(our_word):
    if our_word[i] in vowels:
        our_word = our_word[:i] + way + our_word[i + 1:]
        i += len(way)  # <=== if you made a substitution, skip over the bit
                       #      that you just substituted in place
    else:
        i += 1  # <=== if you didn't make any substitution
                #      just go to the next position next time
print(our_word)
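If the goal is only to replace every vowel with "way", another option (not the approach used in the answer above) is to build a new string in one pass instead of editing the word while indexing into it. A sketch, with the function name replace_vowels made up for illustration:

```python
def replace_vowels(word, replacement="way", vowels="aeiou"):
    """Return a copy of word with every vowel swapped for replacement."""
    # Each original character is looked at exactly once, so text that was
    # just substituted in is never examined again.
    return "".join(replacement if letter in vowels else letter for letter in word)


print(replace_vowels("cat"))
```

Because the original word is never modified, there is no index to keep in sync with a changing length.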

How can I keep track of what combinations have been tried in a brute force approach?

I'm using Python 3 to create a brute-force Vigenere decipherer. Vigenere codes basically work by adding strings of letters together.
The way I want my code to work is: the user puts in however many keys they want (this bit's done), the letters are turned into their numbers (also done), then it adds every pair of keys together (working on this, and it's what I need help with) and prints out the two keys and what they added to.
To do this, I need to be able to keep track of which pairs of keys have been added together. How can I do this?
BTW, my current code is below. I'm doing this both for the decoding and for the programming practice, so I really just want a way to keep track of added key pairs, not the whole program.
#defines start variables
import math
alph = "abcdefghijklmnopqrstuvwxyz"
keyqty = int(input("how many keys?"))
listofkeys = []
listofindex = []
timer = 0
#gets keys
while True:
    if timer >= keyqty:
        break
    else:
        pass
    listofkeys.append(input("key: ").lower())
    timer += 1
tempkey = ""
#blank before key
for item in listofkeys:
    listofindex.append("")
    for letter in item:
        listofindex.append(alph.find(letter))
timer = 0
newkey = False
key1index = []
key2index = []
endex = []
printletter = ""
doneadds = []
Obviously, it still needs some other work, but some help would be appreciated.
You can either use a set for fast lookup (amortized constant time):
tried = set()
for ...
    if word not in tried:
        attempt(word)  # attempt() stands in for your own function; try is a reserved word
        tried.add(word)
or use itertools.product() to generate your trials without needing to keep track of the ones already tried:
for password in itertools.product(alph, repeat=keyqty):
    attempt(password)
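For the specific case in the question, adding every pair of keys together exactly once, itertools.combinations() may be a closer fit than product(), since it yields each unordered pair a single time and so needs no bookkeeping. A rough sketch: the helpers add_keys and all_pair_sums are hypothetical, and cycling the shorter key and wrapping modulo 26 are assumptions about how two keys should be added.

```python
from itertools import combinations

ALPH = "abcdefghijklmnopqrstuvwxyz"


def add_keys(key1, key2):
    """Add two keys letter by letter, modulo 26 (assumption: the shorter key is cycled)."""
    length = max(len(key1), len(key2))
    letters = []
    for i in range(length):
        a = ALPH.index(key1[i % len(key1)])
        b = ALPH.index(key2[i % len(key2)])
        letters.append(ALPH[(a + b) % len(ALPH)])
    return "".join(letters)


def all_pair_sums(keys):
    """Return (key1, key2, sum) for every unordered pair of keys, each pair once."""
    return [(k1, k2, add_keys(k1, k2)) for k1, k2 in combinations(keys, 2)]


for k1, k2, total in all_pair_sums(["ab", "ba", "cc"]):
    print(k1, "+", k2, "=", total)
```

If the order of the two keys matters, itertools.permutations(keys, 2) would give both orderings instead.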

AIO Castle Cavalry - My code is too slow, is there a way I can shorten this?

So I am currently preparing for a competition (Australian Informatics Olympiad) and in the training hub, there is a problem in AIO 2018 intermediate called Castle Cavalry. I finished it:
input = open("cavalryin.txt").read()
output = open("cavalryout.txt", "w")
squad = input.split()
total = squad[0]
squad.remove(squad[0])
squad_sizes = squad.copy()
squad_sizes = list(set(squad))
yn = []
for i in range(len(squad_sizes)):
    n = squad.count(squad_sizes[i])
    if int(squad_sizes[i]) == 1 and int(n) == int(total):
        yn.append(1)
    elif int(n) == int(squad_sizes[i]):
        yn.append(1)
    elif int(n) != int(squad_sizes[i]):
        yn.append(2)
ynn = list(set(yn))
if len(ynn) == 1 and int(ynn[0]) == 1:
    output.write("YES")
else:
    output.write("NO")
output.close()
I submitted this code and I didn't pass because it was too slow, at 1.952secs. The time limit is 1.000 secs. I wasn't sure how I would shorten this, as to me it looks fine. PLEASE keep in mind I am still learning, and I am only an amateur. I started coding only this year, so if the answer is quite obvious, sorry for wasting your time 😅.
Thank you for helping me out!
One performance issue is calling int() over and over on the same entity, or on things that are already int:
if int(squad_sizes[i]) == 1 and int(n) == int(total):
elif int(n) == int(squad_sizes[i]):
elif int(n) != int(squad_sizes[i]):
if len(ynn) == 1 and int(ynn[0]) == 1:
But the real problem is your code doesn't work. And making it faster won't change that. Consider the input:
4
2
2
2
2
Your code will output "NO" (with missing newline) despite it being a valid configuration. This is due to your collapsing the squad sizes using set() early in your code. You've thrown away vital information and are only really testing a subset of the data. For comparison, here's my complete rewrite that I believe handles the input correctly:
with open("cavalryin.txt") as input_file:
    string = input_file.read()
total, *squad_sizes = map(int, string.split())
success = True
while squad_sizes:
    squad_size = squad_sizes.pop()
    for _ in range(1, squad_size):
        try:
            squad_sizes.remove(squad_size)  # eliminate n - 1 others like me
        except ValueError:
            success = False
            break
    else:  # no break
        continue
    break
with open("cavalryout.txt", "w") as output_file:
    print("YES" if success else "NO", file=output_file)
Note that I convert all the input to int early on so I don't have to consider that issue again. I don't know whether this will meet AIO's timing constraints.
I can see some things in there that might be inefficient, but the best way to optimize code is to profile it: run it with a profiler and sample data.
You can easily waste time trying to speed up parts that don't need it without having much effect. Read up on the cProfile module in the standard library to see how to do this and interpret the output. A profiling tutorial is probably too long to reproduce here.
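As a very small starting point rather than a tutorial, here is a sketch of profiling one function with cProfile and pstats from the standard library. The function slow_count and its sample data are made up for illustration; they only mimic the list.count() pattern from the question.

```python
import cProfile
import io
import pstats


def slow_count(items):
    """Count each distinct item by rescanning the whole list every time."""
    return {item: items.count(item) for item in set(items)}


sample = ["1", "2", "3"] * 1000  # made-up sample data

profiler = cProfile.Profile()
profiler.enable()
slow_count(sample)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

The rows with the largest cumulative time are the ones worth looking at first.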
My suggestions, without profiling:
squad.remove(squad[0])
Removing the start of a big list is slow, because the rest of the list has to be copied as it is shifted down. (Removing the end of the list is faster, because lists are typically backed by arrays that are overallocated (more slots than elements) anyway, to make .append()s fast, so it only has to decrease the length and can keep the same array.)
It would be better to set this to a dummy value and remove it when you convert it to a set (sets are backed by hash tables, so removals are fast), e.g.
dummy = object()
squad[0] = dummy # len() didn't change. No shifting required.
...
squad_sizes = set(squad)
squad_sizes.remove(dummy) # Fast lookup by hash code.
Since we know these will all be strings, you can just use None instead of a dummy object, but the above technique works even when your list might contain Nones.
squad_sizes = squad.copy()
This line isn't required; it's just doing extra work. The set() already makes a shallow copy.
n = squad.count(squad_sizes[i])
This line might be the real bottleneck. It's effectively a loop inside a loop, so it basically has to scan the whole list for each outer loop. Consider using collections.Counter for this task instead. You generate the count table once outside the loop, and then just look up the numbers for each string.
You can also avoid generating the set altogether if you do this. Just use the Counter object's keys for your set.
Another point unrelated to performance. It's unpythonic to use indexes like [i] when you don't need them. A for loop can get elements from an iterable and assign them to variables in one step:
from collections import Counter
...
count_table = Counter(squad)
for squad_size, n in count_table.items():
    ...
You can collect all occurrences of the preferred number for each knight in a dictionary.
Then test whether the number of knights with a given preferred number is divisible by that number.
with open('cavalryin.txt', 'r') as f:
    lines = f.readlines()
# convert to int
list_int = [int(a) for a in lines]
#initialise counting dictionary: key: preferred number, item: empty list to collect all knights with preferred number.
collect_dict = {a:[] for a in range(1,1+max(list_int[1:]))}
print(collect_dict)
# loop through list, ignoring first entry.
for a in list_int[1:]:
    collect_dict[a].append(a)
# initialise output
out='YES'
for key, item in collect_dict.items():
    # check number of items with preference for number is divisible
    # by that number
    if item: # if list has entries:
        if (len(item) % key) > 0:
            out='NO'
            break
with open('cavalryout.txt', 'w') as f:
    f.write(out)
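The same divisibility test can also be sketched with collections.Counter, which stores one count per preferred size instead of a list per size. Reading and writing the files is left out here, and the function name can_form_squads is hypothetical.

```python
from collections import Counter


def can_form_squads(squad_sizes):
    """Return True if the knights can be split into squads of their preferred sizes.

    squad_sizes holds one preferred squad size (an int) per knight.
    """
    counts = Counter(squad_sizes)
    # Knights wanting squads of size s can be grouped only if their number
    # is a multiple of s.
    return all(n % size == 0 for size, n in counts.items())


print("YES" if can_form_squads([2, 2, 2, 2]) else "NO")
```

Counting once up front avoids the repeated list scans, so the work grows roughly linearly with the number of knights.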

How to match fields from two lists and further filter based upon the values in subsequent fields?

EDIT: My question was answered on reddit. Here is the link if anyone is interested in the answer to this problem https://www.reddit.com/r/learnpython/comments/42ibhg/how_to_match_fields_from_two_lists_and_further/
I am attempting to get the pos and alt strings from file1 to match up with what is in
file2, which is fairly simple. However, file2 has values from the 17th split element/column to the
last element/column (340th), which contain strings such as 1/1:1.2.2:51:12 that
I also want to filter for.
I want to extract the rows from file2 that contain/match the pos and alt from file1.
Thereafter, I want to further filter the matched results that only contain certain
values in the 17th split element/column onwards. But to do so the values would have to
be split by ":" so I can filter for split[0] = "1/1" and split[2] > 50. The problem is
I have no idea how to do this.
I imagine I will have to iterate over these and split but I am not sure how to do this
as the code is presently in a loop and the values I want to filter are in columns not rows.
Any advice would be greatly appreciated, I have sat with this problem since Friday and
have yet to find a solution.
import os,itertools,re
file1 = open("file1.txt","r")
file2 = open("file2.txt","r")
matched = []
for (x),(y) in itertools.product(file2,file1):
    if not x.startswith("#"):
        cells_y = y.split("\t")
        pos_y = cells_y[0]
        alt_y = cells_y[3]
        cells_x = x.split("\t")
        pos_x = cells_x[0]+":"+cells_x[1]
        alt_x = cells_x[4]
        if pos_y in pos_x and alt_y in alt_x:
            matched.append(x)
for z in matched:
    cells_z = z.split("\t")
    if cells_z[16:len(cells_z)]:
Your requirement is not clear, but you might mean this:
for (x),(y) in itertools.product(file2,file1):
    if x.startswith("#"):
        continue
    cells_y = y.rstrip("\n").split("\t")
    pos_y = cells_y[0]
    alt_y = cells_y[3]
    cells_x = x.rstrip("\n").split("\t")
    pos_x = cells_x[0]+":"+cells_x[1]
    alt_x = cells_x[4]
    if pos_y != pos_x: continue
    if alt_y != alt_x: continue
    extra_match = False
    # the 17th column onwards (index 16) of the file2 row holds the colon-separated values
    for field in cells_x[16:]:
        x_extra = field.split(':')
        if x_extra[0] != '1/1': continue
        if int(x_extra[2]) <= 50: continue  # convert to int before comparing with a number
        extra_match = True
        break
    if not extra_match: continue
    xy = x + y
    matched.append(xy)
I chose to concatenate x and y into the matched array, since I wasn't sure whether or not you would want all the data. If not, feel free to go back to just appending x or y.
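The per-column test can also be pulled out into a small helper, which makes it easier to check on its own. This is a sketch under assumptions: the values look like 1/1:1.2.2:51:12 as in the question, the third part is meant to be an integer, and a row qualifies if at least one column from the 17th onwards passes. The names passes_filter and row_matches are made up for illustration.

```python
def passes_filter(field, genotype="1/1", min_value=50):
    """Check one colon-separated value such as '1/1:1.2.2:51:12'."""
    parts = field.strip().split(":")
    if len(parts) < 3:
        return False  # not enough parts to test
    try:
        return parts[0] == genotype and int(parts[2]) > min_value
    except ValueError:
        return False  # third part was not an integer


def row_matches(cells, first_index=16):
    """Return True if any column from first_index onwards passes the filter."""
    return any(passes_filter(field) for field in cells[first_index:])


example_row = ["x"] * 16 + ["0/1:1.2.2:10:12", "1/1:1.2.2:51:12"]
print(row_matches(example_row))
```

If every column has to pass rather than just one, any() can be swapped for all().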
You may want to look into the csv library, which can use tab as a delimiter. You can also use a generator and/or guards to make the code a bit more pythonic and efficient. I think your approach with indexes works pretty well, but it would be easy to break when trying to modify down the road, or to update if your file lines change shape. You may wish to create objects (I use NamedTuples in the last part) to represent your lines and make it much easier to read/refine down the road.
Lastly, remember that Python short-circuits boolean expressions in an 'if'.
For example:
if x_evaluation and y_evaluation:
    do some stuff
When x_evaluation returns False, Python will skip y_evaluation entirely. In your code, cells_x[0]+":"+cells_x[1] is evaluated every single time you iterate the loop. Instead of storing this value, I wait until the easier alt comparison evaluates to True before doing this (comparatively) heavier/uglier check.
import csv
def filter_matching_alt_and_pos(first_file, second_file):
    # newline='' is what the csv module expects for files opened in text mode (Python 3)
    for x in csv.reader(open(first_file, newline=''), delimiter='\t'):
        for y in csv.reader(open(second_file, newline=''), delimiter='\t'):
            # `and` short-circuits, so the join only runs when the alt fields already match
            # .. todo:: we could make a filter function and even use the filter() built-in depending on needs!
            if x[3] == y[4] and x[0] == ":".join(y[:2]):
                # yield the row from the second file, which is the one holding the extra columns
                yield y
def match_datestamp_and_alt_and_pos(first_file, second_file):
    for z in filter_matching_alt_and_pos(first_file, second_file):
        for element in z[16:]:
            # I am not sure I fully understood your filter needs for the 2nd half. Here, I split each element from the 17th onward and look for the two cases you mentioned. This seems like it might be very heavy, but at least we're using generators!
            chunk = element.split(":")
            # WARNING: if you aren't 100% sure the 3rd value is an int, this is very dangerous
            # here, I use the continue keyword and the negative-check so we move on to the next element as soon as one check fails
            if not int(chunk[2]) > 50:
                continue
            if not chunk[0] == "1/1":
                continue
            yield z
            break  # one matching element is enough; don't yield the same row twice
if __name__ == '__main__':
    first_file = "first.txt"
    second_file = "second.txt"
    # match_datestamp_and_alt_and_pos returns a generator; for loop through it for the lines which matched all 4 cases
    for row in match_datestamp_and_alt_and_pos(first_file=first_file, second_file=second_file):
        print(row)
Using namedtuples for the first part:
import csv
from collections import namedtuple
FirstFileElement = namedtuple("FirstFileElement", "pos unused1 unused2 alt")
SecondFileElement = namedtuple("SecondFileElement", "pos1 pos2 unused2 unused3 alt")
def filter_matching_alt_and_pos(first_file, second_file):
    for x in csv.reader(open(first_file, newline=''), delimiter='\t'):
        for y in csv.reader(open(second_file, newline=''), delimiter='\t'):
            # only the leading columns are named; any further columns stay in the raw row
            # .. todo:: we could make a filter function and even use the filter() built-in depending on needs!
            x_element = FirstFileElement(*x[:4])
            y_element = SecondFileElement(*y[:5])
            if x_element.alt == y_element.alt and x_element.pos == ":".join([y_element.pos1, y_element.pos2]):
                yield y
