I have attempted to write a program that asks the user for a string and a number (on the same line) and then prints all possible combinations of the string's letters, up to the size of the number. The output format should be: all capitals, one combination per line, sorted by length (shortest first) and then alphabetically.
My code outputs the right combinations in the right order, but it prints an empty line before them and I'm not sure why.
from itertools import combinations
allcombo = []
S = input().strip()
inputlist = S.split()
k = int(inputlist[1])
S = inputlist[0]
#
for L in range(0, k+1):
    allcombo = []
    for pos in combinations(S, L):
        pos = sorted(pos)
        pos = str(pos).translate({ord(c): None for c in "[]()', "})
        allcombo.append(pos)
    allcombo = sorted(allcombo)
    print(*allcombo, sep = '\n')
Input:
HACK 2
Output:
(Empty Line)
A
C
H
K
AC
AH
AK
CH
CK
HK
Also, I've only been coding for about a week, so if anyone would like to show me how to write this properly, I'd be very pleased.
Observe the line:
for L in range(0, k+1):  # Notice that L is starting at 0.
Now, observe this line:
for pos in combinations(S, L):
So, during the first iteration of the outer loop, we have:
for pos in combinations(S, 0):  # There is exactly one combination of length 0: the empty tuple ().
The inner loop therefore runs once, with pos == (). sorted(()) gives [], str([]) gives '[]', and translate() strips the brackets, leaving the empty string ''. So allcombo is [''], and print(*allcombo, sep='\n') prints it as a blank line.
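A quick interpreter check shows what the L = 0 iteration really produces: combinations() with length 0 yields a single empty tuple, and the question's translate() clean-up turns it into an empty string, which print() then emits as a blank line.

```python
from itertools import combinations

# A length-0 combination is the empty tuple, and there is exactly one of them.
zero_length = list(combinations("HACK", 0))
print(zero_length)  # [()]

# Apply the same clean-up the question's loop uses to that single item.
as_text = str(sorted(zero_length[0])).translate({ord(c): None for c in "[]()', "})
print(repr(as_text))  # ''
```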
Change the following code:
for L in range(0, k+1):
to this:
for L in range(1, k+1):  # Skips the empty combination since L starts at 1.
and this will fix your problem.
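Since the question asks how this could be written more idiomatically, here is one possible sketch (the function name sorted_combinations is just for illustration). It relies on a documented property of itertools.combinations: when its input is sorted, the combinations come out in sorted order, so sorting the letters once replaces the str()/translate() clean-up and the extra sort.

```python
from itertools import combinations

def sorted_combinations(text, max_len):
    """Combinations of the letters in text with length 1..max_len,
    shortest first and alphabetical within each length, in capitals."""
    letters = sorted(text.upper())  # sorted input -> combinations emitted in sorted order
    result = []
    for length in range(1, max_len + 1):  # start at 1: length 0 is the empty combination
        result.extend("".join(combo) for combo in combinations(letters, length))
    return result

print(*sorted_combinations("HACK", 2), sep="\n")
```

To read the input the way the original program does, you could call it as word, size = input().split() followed by print(*sorted_combinations(word, int(size)), sep='\n'). If the string contains repeated letters, duplicate combinations will appear, just as they do in the original code.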
I am trying to create a dataframe, data, with two columns: 'word' and 'misspelling'. My attempt has five parts: one function, three dataframes, and one loop.
A function that generates misspellings (taken from Peter Norvig):
def generate(word):
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) +1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)
A dataframe with the words to generate misspellings for:
import pandas as pd
wl = ['a', 'is', 'the']
word_list = pd.DataFrame(wl, columns = ['word'])
An empty dataframe meant to be filled up in the loop:
data = pd.DataFrame(columns = ['word', 'misspelling'])
An empty dataframe meant to temporarily hold the values from the function 'generate' in the loop:
temp_list = pd.DataFrame(columns = ['misspelling'])
A loop that will fill up the dataframe data:
y = 0
for a in range(len(word_list)):
    temp_list['misspelling'] = pd.DataFrame(generate(word_list.at[a,'word']))
    data = pd.concat([data,temp_list], ignore_index = True)
    print(len(temp_list)) #to check the length of 'temp_list' in each loop
    for x in range(len(temp_list)):
        data.at[y,'word'] = word_list.at[a,'word']
        y = y + 1
    y = data.index[-1] + 1
    temp_list.drop(columns = ['misspelling'])
When I check data after the loop, I expect it to have 390 rows in total, which is len(generate('is')) + len(generate('a')) + len(generate('the')).
Instead, data has only 234 rows, which is far fewer. When I checked which variable was not adding up, it turned out to be len(temp_list), which I expected to change on every iteration, since new values replace the old ones.
len(temp_list) stays the same, so temp_list['misspelling'] = pd.DataFrame(generate(word_list.at[a,'word'])) never holds more than len(generate('a')) rows ('a' being the first value in word_list), even though the generated misspellings in temp_list are different on each iteration.
I thought adding temp_list.drop(columns = ['misspelling']) at the end of the outer loop would reset temp_list, but it doesn't seem to reset len(temp_list).
temp_list.drop() with inplace=False (which is the default) does not modify the existing dataframe, but returns a new one. However, even if you fix that, it still won’t work, because you would also need to drop the index, and I’m not sure that’s even possible.
I don’t quite understand what you are trying to do (for example, the for x in ... loop never uses x) but I suspect you might be better off using plain Python lists instead of dataframes.
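Following the suggestion to use plain lists, here is a minimal sketch that collects (word, misspelling) pairs in a list and only builds a table at the end; misspelling_rows is a hypothetical helper name. As for why len(temp_list) never changes: assigning a column to an existing dataframe aligns the new values to that dataframe's existing index, so after the first word temp_list is most likely stuck at len(generate('a')) rows, and longer results are cut off. Building each result fresh avoids that.

```python
def generate(word):
    """Peter Norvig's edits1: every string one edit away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def misspelling_rows(words):
    """Hypothetical helper: one (word, misspelling) pair per generated misspelling."""
    rows = []
    for word in words:
        for misspelling in sorted(generate(word)):  # sorted() only makes the order repeatable
            rows.append((word, misspelling))
    return rows

rows = misspelling_rows(["a", "is", "the"])
print(len(rows))  # 390
```

If you need a dataframe afterwards, you can build it in one step with pd.DataFrame(rows, columns=['word', 'misspelling']).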
For example, given:
On_A_Line = [2,2,3]
Lengths_Of_Lines = [5,2,4,3,2,3,2]
Characters = ['a','t','i','e','u','w','x']
I want it to print:
aaaaatt
iiiieee
uuwwwxx
So far I have tried:
iteration = 0
for number in Lengths_Of_Lines:
    s = Lengths_Of_Lines[iteration]*Characters[iteration]
    print(s, end = "")
    iteration += 1
which prints what I want, but without the line breaks:
aaaaattiiiieeeuuwwwxx
I just don't have the python knowledge to know what to do from there.
Solution using a generator and itertools:
import itertools
def repeat_across_lines(chars, repetitions, per_line):
    gen = ( c * r for c, r in zip(chars, repetitions) )
    return '\n'.join(
        ''.join(itertools.islice(gen, n))
        for n in per_line
    )
Example:
>>> repeat_across_lines(Characters, Lengths_Of_Lines, On_A_Line)
'aaaaatt\niiiieee\nuuwwwxx'
>>> print(_)
aaaaatt
iiiieee
uuwwwxx
The generator gen yields each character repeated the appropriate number of times. These are joined together n at a time with itertools.islice, where n comes from per_line. Those results are then joined with newline characters. Because gen is a generator, the next call to islice yields the next n of them that haven't been consumed yet, rather than the first n.
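A tiny demonstration of islice() continuing where the previous call stopped; the three-character input here is made up for illustration.

```python
import itertools

# A generator of repeated characters, like gen in repeat_across_lines().
gen = (c * r for c, r in zip("ati", [5, 2, 4]))

first = list(itertools.islice(gen, 2))   # takes the first two items
second = list(itertools.islice(gen, 2))  # continues from the third; only one is left
print(first, second)  # ['aaaaa', 'tt'] ['iiii']
```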
You need to loop over the On_A_Line list. It tells you how many iterations of the inner loop to perform before printing a newline.
iteration = 0
for count in On_A_Line:
    for _ in range(count):
        s = Lengths_Of_Lines[iteration]*Characters[iteration]
        print(s, end = "")
        iteration += 1
    print("") # Print newline
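If you would rather get the lines back than print them as you go, the same nested-loop idea can be wrapped in a function. build_lines is a hypothetical name; the input lists are the ones from the question.

```python
def build_lines(on_a_line, lengths, chars):
    """on_a_line[k] says how many (length, char) pairs are joined onto line k."""
    lines = []
    iteration = 0  # index into lengths/chars, shared across all lines
    for count in on_a_line:
        parts = []
        for _ in range(count):
            parts.append(lengths[iteration] * chars[iteration])
            iteration += 1
        lines.append("".join(parts))
    return lines

result = build_lines([2, 2, 3], [5, 2, 4, 3, 2, 3, 2], ['a', 't', 'i', 'e', 'u', 'w', 'x'])
print("\n".join(result))
```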
I am new to coding, so I apologize in advance if what I am asking is simple or doesn't make much sense, but I will try to elaborate as much as I can. First of all, this is not for any work or project; I am simply studying to learn a bit of coding for my own satisfaction. I've been trying to find some real-life problems to apply coding to (pseudocode mostly, but Python would also be understandable to me).
I want to have a list of x elements and compare them sequentially, 4 at a time.
For example, myList = [a, b, c, d, e, f, g, h, i, j, k, l]
First I want to compare a,b,c and d.
If b>a, c>b, d>c and d > all 3 previous ones (d>a, d>b, d>c), I want to do something; otherwise, go to the next comparison.
Then I wanted to compare b,c,d and e.
Similarly, if c>b, d>c, e>d and e > all 3 previous ones (e>b, e>c, e>d), I want to do something; otherwise, go to the next comparison.
What if my list contains infinitely many elements? myList = [:]
Where do I start? Do I have to have a starting point?
I am guessing I have to use a for loop to iterate through the list but I honestly can't figure out how to iterate through the first 4 elements and then continue from the second element in 4 element batches.
Since I am currently studying arrays and lists, maybe there is some functionality I am missing? Or maybe my brain simply can't grasp it.
I tried looking at other posts in stackoverflow but honestly I can't figure it out from other people's answers. I would appreciate any help or guidance.
Thanks in advance.
You can use the built-in all() function for this problem:
myList = [5, 4, 3, 6, 3, 5, 6, 2, 3, 10, 11, 3]
def do_something():
    #your code here
    pass
for i in range(len(myList)-3):  # the last 4-element window starts at index len(myList)-4
    new_list = myList[i:i+4]  # list slicing takes the 4 elements starting at index i
    if all(new_list[-1] > b for b in new_list[:-1]) and all(new_list[c] < new_list[c+1] for c in range(len(new_list)-2)):
        do_something()
L = [...]  # your list here
# get the index at which each 4-element window starts: every valid index except the last 3
for i in range(len(L)-3):
    window = L[i:i+4]  # the 4 elements you want to compare
    print("I am considering the elements starting at index", i, ". They are:", window)
    a,b,c,d = window
    if a<b<c<d:  # b>a, c>b, d>c; together these also mean d is greater than a, b and c
        print("The checks pass!")
Now, there is a simpler way to do this:
for a,b,c,d in (L[i:i+4] for i in range(len(L)-3)):
    if a<b<c<d:
        print("The checks pass!")
To consume just one item at a time from an iterator and operate on the 4 most recent elements, try a circular buffer:
# make a generator as an example of an 'infinite list'
import string
agen = (e for e in string.ascii_lowercase)
# initialize len 4 circle buffer
cb = [next(agen) for _ in range(4)]  # assumes there are at least 4 items
ptr = 0  # initialize circle buffer pointer (index of the oldest item)
while True:
    a,b,c,d = (cb[(i+ptr)%4] for i in range(4))  # get current 4 saved items, oldest first
    # some function here
    print(a,b,c,d)
    # get next item from generator, catch StopIteration on empty
    try:
        cb[ptr] = next(agen)  # overwrite the oldest item
    except StopIteration:
        break
    ptr = (ptr + 1)%4  # update circle buffer pointer
a b c d
b c d e
c d e f
d e f g
e f g h
f g h i
g h i j
h i j k
i j k l
j k l m
k l m n
l m n o
m n o p
n o p q
o p q r
p q r s
q r s t
r s t u
s t u v
t u v w
u v w x
v w x y
w x y z
'some function' could include a stopping condition too:
# random.choice() as an example of an 'infinite iterator'
import string
import random
# initialize len 4 circle buffer
cb = [random.choice(string.ascii_lowercase) for _ in range(4)]
ptr = 0  # initialize circle buffer pointer
while True:
    a,b,c,d = (cb[(i+ptr)%4] for i in range(4))  # get current 4 saved items
    # some function here
    print(a,b,c,d)
    if a<b<c<d:  # stopping condition
        print("found ordered string: ", a,b,c,d)
        break
    # draw the next random letter; random.choice() never runs out, so no StopIteration handling is needed
    cb[ptr] = random.choice(string.ascii_lowercase)
    ptr = (ptr + 1)%4  # update circle buffer pointer
o s w q
s w q k
w q k j
q k j r
k j r q
j r q r
r q r u
q r u v
found ordered string: q r u v
Since you can index a list, how about starting from index 0 and comparing the 0th, (0+1)th, (0+2)th and (0+3)th elements? Then, in the next round, increase your index to 1 and compare the 1st, (1+1)th, (1+2)th and (1+3)th elements, and so on. In round i, you compare the i, i+1, i+2 and (i+3)th elements, until i reaches the 4th element from the end. This is the general way to do things like 'testing m elements at a time from a sequence of length n', and you can easily extend this pattern to matrices or 3D arrays. The code in the other answers is basically all doing this, and certain features of Python make the job very easy.
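The index pattern described above can be written as a small generic helper. windows is a hypothetical name, and it assumes an indexable sequence such as a list: for a sequence of length n it yields n - m + 1 windows, so the last one starts at index n - m.

```python
def windows(seq, m):
    """Yield every run of m consecutive elements of an indexable sequence."""
    # The last window starts at index len(seq) - m, so range() must stop one past it.
    for start in range(len(seq) - m + 1):
        yield seq[start:start + m]

myList = [5, 4, 3, 6, 3, 5, 6, 2, 3, 10, 11, 3]
for a, b, c, d in windows(myList, 4):
    if a < b < c < d:  # b>a, c>b, d>c; this also makes d larger than a, b and c
        print("increasing run:", a, b, c, d)
```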
Now, what if the list contains infinitely many elements? Then you'll need a generator, which is probably a bit advanced at this stage, but the concept is very simple: a function reads that endless stream of elements in a (possibly infinite) loop, keeps a cursor on one of them, returns (yields) the element under the cursor together with the 3 elements following it each time, and moves the cursor forward by one before the next loop starts:
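def unsc_infinity(somelist):
    cur = 0
    while True:
        window = somelist[cur:cur+4]
        if len(window) < 4:  # fewer than 4 elements left: only happens if somelist is finite
            return
        yield window
        cur = cur + 1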
infinity_reader = unsc_infinity(endless_stream)
next(infinity_reader)
# gives the 0, 1, 2, 3 th elements in endless_stream
next(infinity_reader)
# gives the 1, 2, 3, 4 th elements in endless_stream
next(infinity_reader)
# ...
And you can loop over that generator too:
for a, b, c, d in unsc_infinity(endless_stream):
    if a < b < c < d:
        do_something()
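Slicing only works on sequences; a generator cannot be sliced. A minimal sketch that works on any iterator, including an endless one, swaps in a different technique: keep the last four items in a collections.deque with maxlen=4, so each new item pushes the oldest one out. The name sliding_windows is hypothetical.

```python
from collections import deque
from itertools import count, islice

def sliding_windows(iterable, size=4):
    """Yield tuples of `size` consecutive items from any iterable, even an endless one."""
    it = iter(iterable)
    window = deque(islice(it, size), maxlen=size)  # prime with the first `size` items
    if len(window) < size:
        return  # too few items for even one window
    yield tuple(window)
    for item in it:
        window.append(item)  # with maxlen set, the oldest item falls off the left end
        yield tuple(window)

# count() yields 0, 1, 2, ... forever, so only take the first three windows.
first_three = list(islice(sliding_windows(count()), 3))
print(first_three)  # [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
```

With a truly endless stream, a loop such as for a, b, c, d in sliding_windows(stream): never finishes on its own, so you would break out of it once your condition is met.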
Hope that helps a bit for you to build a mental model about how this kind of problems are done.
My function looks like this:
def accum(s):
    a = []
    for i in s:
        b = s.index(i)
        a.append(i * (b+1))
    x = "-".join(a)
    return x.title()
with the expected input of:
'abcd'
the output should be and is:
'A-Bb-Ccc-Dddd'
but if the input has a recurring character:
'abccba'
it returns:
'A-Bb-Ccc-Ccc-Bb-A'
instead of:
'A-Bb-Ccc-Cccc-Bbbbb-Aaaaaa'
how can I fix this?
Don't use str.index(); it returns the first match. Since c, b and a appear early in the string, you get 2, 1 and 0 back regardless of the position of the current letter.
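To see the problem concretely, compare the positions str.index() reports for each character of the failing input with the real positions that enumerate() gives:

```python
s = "abccba"

# str.index() always reports the first occurrence of the character.
first_positions = [s.index(ch) for ch in s]
print(first_positions)  # [0, 1, 2, 2, 1, 0]

# enumerate() reports the actual position of each character.
real_positions = [i for i, ch in enumerate(s)]
print(real_positions)  # [0, 1, 2, 3, 4, 5]
```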
Use the enumerate() function to give you a position counter instead:
for i, letter in enumerate(s, 1):
    a.append(i * letter)
The second argument is the starting value; setting this to 1 means you can avoid having to + 1 later on. See What does enumerate mean? if you need more details on what enumerate() does.
You can use a list comprehension here rather than list.append() calls:
def accum(s):
    a = [i * letter for i, letter in enumerate(s, 1)]
    x = "-".join(a)
    return x.title()
which could, at a pinch, be turned into a one-liner:
def accum(s):
    return '-'.join([i * c for i, c in enumerate(s, 1)]).title()
This is because s.index(i) returns the index of the first occurrence of the character. You can use enumerate to pair elements with their indices.
Here is a Pythonic solution:
def accum(s):
    return "-".join(c*(i+1) for i, c in enumerate(s)).title()
Simple:
def accum(s):
    a = []
    for i in range(len(s)):
        a.append(s[i]*(i+1))
    x = "-".join(a)
    return x.title()
I'm not a python expert, and I ran into this snippet of code which actually works and produces the correct answer, but I'm not sure I understand what happens in the second line:
for i in range(len(motifs[0])):
    best = ''.join([motifs[j][i] for j in range(len(motifs))])
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
I was trying to replace it with something like:
for i in range(len(motifs[0])):
    for j in range(len(motifs)):
        best =[motifs[j][i]]
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
and also tried to break down the last line like this:
for i in range(len(motifs[0])):
    for j in range(len(motifs)):
        best =[motifs[j][i]]
        for base in 'ACGT':
            profile.append(best.count(base)+1)/float(len(best)
I tried some more variations, but none of them worked.
My question is: what do those expressions (the second and third lines of the first snippet) mean, and how would you break them down into a few lines?
Thanks :)
''.join([motifs[j][i] for j in range(len(motifs))])
is idiomatically written
''.join(m[i] for m in motifs)
so it concatenates the i'th character of every motif, in order. Similarly,
[(best.count(base)+1)/float(len(best)) for base in 'ACGT']
builds a list of four values, one for each of A, C, G and T: the number of times that base occurs in best, plus 1, divided by len(best).
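A runnable version of the question's two lines broken down into explicit loops may help. column_profile is a hypothetical name, and the motifs in the example are made up for illustration.

```python
def column_profile(motifs):
    """For each column i, join the i'th characters of all motifs,
    then compute (count + 1) / len for each base in 'ACGT'."""
    profile = []
    for i in range(len(motifs[0])):
        best = ""
        for motif in motifs:
            best += motif[i]  # same as ''.join(m[i] for m in motifs)
        row = []
        for base in "ACGT":
            row.append((best.count(base) + 1) / float(len(best)))
        profile.append(row)
    return profile

example = column_profile(["ACG", "ATG"])  # hypothetical motifs
print(example)
```

Each inner list corresponds to one column: for the first column the joined string is "AA", so A gets (2 + 1) / 2 and the other bases get (0 + 1) / 2.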
for i in range(len(motifs[0])):
    best = ''.join([motifs[j][i] for j in range(len(motifs))])
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
is equivalent to:
for i in range(len(motifs[0])):
    best = ''
    for j in range(len(motifs)):
        best += motifs[j][i]
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
which can be improved in countless ways.
For example, assuming all motifs have the same length, zip(*motifs) walks over the columns directly:
for column in zip(*motifs):  # each column holds the i'th character of every motif
    best = ''.join(column)
    profile.append([(best.count(base)+1)/float(len(best)) for base in 'ACGT'])
I cannot test this against real data, since the question gives no sample input or output.
The closest I got, without being able to test it:
for i, _ in enumerate(motifs[0]):
    seq = ""
    for m in motifs:
        seq += m[i]
    tmp = []
    for base in "ACGT":
        tmp.append((seq.count(base) + 1) / float(len(seq)))
    profile.append(tmp)