Stuck: My own version for generating permutations - python

I am trying to write my own code for generating permutations of items represented by numbers. Say 4 items can be represented by 0, 1, 2, 3.
I've seen the code for itertools.product. That code is pretty neat. My way of coding this is to count in binary, ternary, ... (base n). My code below only works for bases less than 10. Part of this code splits the string using list(s). The number 120 in base 11 is written '1010' (its digits are 10 and 10), and splitting '1010' yields 1, 0, 1, 0. For it to work correctly, I need it to split into 10, 10. Is there a way around this that still works with the rest of the code?
Alternatively, what would a recursive version of this look like? Thanks
aSet = 11
subSet = 2
s = ''
l = []
number = aSet**subSet
# finding all permutations, repeats allowed
for num in range(number):
    s = ''
    while num//aSet != 0:
        s = str(num%aSet) + s
        num = num//aSet
    else:
        s = str(num%aSet) + s
    s = s.zfill(subSet)
    l.append(list(s))

Indeed, the problem with using a string is that list(s) will chop it into individual characters. You should not create a string at all, but use a list for s from the start:
aSet = 11
subSet = 2
l = []
number = aSet**subSet
# finding all permutations, repeats allowed
for num in range(number):
    s = []
    for _ in range(subSet):
        s.insert(0, num%aSet)
        num = num//aSet
    l.append(s)
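The question also asked for a recursive version. Below is a minimal sketch, assuming the goal is the same list l as above (every length-subSet list over 0..aSet-1, repeats allowed, in counting order). The function name tuples is my own. Because the digits stay ints, 10 is never split into 1 and 0. For comparison, the standard library's itertools.product(range(aSet), repeat=subSet) yields the same sequence.

```python
from itertools import product

def tuples(n_items, length):
    """Recursively build every list of `length` items drawn from 0..n_items-1,
    repeats allowed, in the same order as counting in base n_items."""
    if length == 0:
        return [[]]  # exactly one empty sequence
    shorter = tuples(n_items, length - 1)
    # Put each possible first item in front of every shorter sequence
    return [[first] + rest for first in range(n_items) for rest in shorter]

aSet = 11
subSet = 2
l = tuples(aSet, subSet)
print(len(l))   # 121
print(l[120])   # [10, 10]
# Same result as the standard-library Cartesian product
print(l == [list(t) for t in product(range(aSet), repeat=subSet)])  # True
```

The recursion builds all shorter sequences once and reuses them, so it is a direct analogue of one nested loop per position.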

Related

How to generate a number sequence 1111222233334444....9999...?

I want to generate 111122223333...: a sequence of numbers in which each number appears the same number of times, up to a certain number.
I use a Python for loop to generate the number sequence, but it takes too much time when the end number is 7000.
import pandas as pd
startNum = 1
endNum = 7000
sequence = []
for i in range(endNum):
    for j in range(endNum):
        sequence.append(i)
        print(i)
So what should I do to reduce the time and get my sequence? Any method is fine, except Excel. Thanks!
I'd like to get the number sequence 111122223333.

Your code is not really doing what you're asking, so I'll add a few comments on what it does, to show where it goes wrong, and then provide an answer that does what you want.
import pandas as pd
startNum = 1
endNum = 7000
sequence = []
for i in range(endNum):  # Here you are looping from 0 to endNum - 1 = 6999
    for j in range(endNum):  # Here you are again looping endNum = 7000 times
        sequence.append(i)  # So you add each i 7000 times (because of the previous loop), not 4
        print(i)  # You probably mean to print sequence?
You probably want the second loop to run once per repetition you want (which is 4).
Here's the code that does what you want:
startNum = 1
endNum = 7000
sequence = []
repeat = 4
for i in range(startNum, endNum + 1):  # 1, 2, ..., 7000
    for _ in range(repeat):
        sequence.append(i)
print(sequence)
In your case, I'd prefer using extend and a list comprehension (both versions are equivalent):
startNum = 1
endNum = 7000
sequence = []
repeat = 4
for i in range(startNum, endNum + 1):
    sequence.extend([i for _ in range(repeat)])
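If speed is the main concern, the whole list can also be built by a single nested list comprehension, which avoids 7000 separate extend calls. A minimal sketch; the function name repeated_sequence is my own.

```python
def repeated_sequence(start, end, repeat):
    """Return [start]*repeat + [start+1]*repeat + ... + [end]*repeat."""
    # One comprehension builds the whole list in a single pass,
    # without repeated .append/.extend method calls.
    return [i for i in range(start, end + 1) for _ in range(repeat)]

seq = repeated_sequence(1, 7000, 4)
print(len(seq))   # 28000
print(seq[:9])    # [1, 1, 1, 1, 2, 2, 2, 2, 3]
```

Either way, only 28,000 elements are created, instead of the 49 million produced by the original double loop.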

I don't know what it's for, but:
endnum = 7
result = ''.join(str(i) * 4 for i in range(endnum))
print(result)
Output:
0000111122223333444455556666
and it takes less than 1 s with endnum = 7000 (measured: 0:00:00.006550).

You can try this:
startNum = 1
endNum = 5
repeat = 4
seq = [str(i) * repeat for i in range(startNum, endNum + 1)]
print("".join(seq))

Reverse digits in a number

I want to reverse the digits of a number in Python. Here are my two implementations.
One: convert the number into a string and reverse its characters
number = 2376674032
number_s = str(number)
index = len(number_s) - 1
str_list = []
while index > -1:
    str_list.append(number_s[index])
    index -= 1
result = int("".join(str_list))
print(result)
Two: using simple mathematics
number = 2376674032
N = 0
K = number
R = number % 10
while K > 0:
    N = N*10 + R
    K = K // 10
    R = K % 10
result = N
print(result)
As I'm pretty new to Python programming, could someone help me with the following questions:
With the first approach, will "".join(str_list) produce a new string from the list elements? If so, is there a better way to concatenate strings in Python (something similar to StringBuffer in Java)?
Which of the implementations is better from a performance perspective?

You can reverse a string by using -1 as the step in a slice. So this works:
number = 2376674032
number_s = str(number)
reverse_s = number_s[::-1]
reversed_number = int(reverse_s)  # not named reversed, which would shadow the built-in
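On the two questions above: "".join(str_list) builds one new string from all the list elements, which is the usual Python idiom where Java would use a StringBuffer/StringBuilder. For the performance question, measuring is more reliable than guessing. Below is a minimal timeit sketch comparing the slice approach with the arithmetic loop from the question; the function names are my own, and the timings it prints depend on your machine and Python version.

```python
import timeit

def reverse_by_slice(number):
    # String route: convert, reverse with a -1 step slice, convert back
    return int(str(number)[::-1])

def reverse_by_arithmetic(number):
    # Arithmetic route: peel off the last digit with % 10 and push it onto the result
    result = 0
    while number > 0:
        result = result * 10 + number % 10
        number //= 10
    return result

value = 2376674032
print(reverse_by_slice(value), reverse_by_arithmetic(value))  # 2304766732 2304766732

for func in (reverse_by_slice, reverse_by_arithmetic):
    # Compare the two numbers relative to each other; don't treat them as absolute
    seconds = timeit.timeit(lambda: func(value), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s per 100,000 calls")
```

Both versions drop trailing zeros of the input (1200 becomes 21), because the result is an int.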

If you want to reverse a number, input it as a string and do this:
number = "8374783246837"
revnumber = number[::-1]
Done.

a = 1234
a = int("".join(reversed(str(a))))
This will give a = 4321.
The reversed() function returns an iterator over the characters in reverse order.
If we do:
a = list(reversed(str(1234)))
it gives ['4', '3', '2', '1']. We then join it and convert the result into an int.

To make the number an integer type, we have to use the int function, as below:
numbers = str(123456)
# or numbers = "123456"
print(int(numbers[::-1]))
print(type(int(numbers[::-1])))
output:
654321
<class 'int'>

We can do this in a single line as well using [::-1]
print(int(str(int(input()))[::-1]))

Here is my answer: you can do it using simple recursion.
# print the digits of n in reverse order, recursively
def reverse_digits(n):
    # base case: no digits left
    if n == 0:
        return
    # recursive case: print the last digit, then recurse on the rest
    print(n % 10, end='')
    reverse_digits(n // 10)
# reverse 123 (prints 321)
reverse_digits(123)
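The recursive function above prints the digits instead of producing a number. If you need the reversed value as an int, an accumulator parameter lets the recursion return it. A minimal sketch; reverse_number and acc are my own names, and it assumes a non-negative int.

```python
def reverse_number(n, acc=0):
    """Return the digits of the non-negative int n in reverse order, as an int.
    acc holds the reversed number built so far."""
    if n == 0:
        return acc
    # Move the last digit of n onto the end of acc, then recurse on the rest
    return reverse_number(n // 10, acc * 10 + n % 10)

print(reverse_number(123))         # 321
print(reverse_number(2376674032))  # 2304766732
print(reverse_number(1200))        # 21 (trailing zeros are lost)
```

Each call handles one digit, so the recursion depth equals the number of digits, which stays far below Python's default recursion limit for ordinary integers.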

Compressing multiple nested `for` loops

Similar to this and many other questions, I have many nested loops (up to 16) of the same structure.
Problem: I have a 4-letter alphabet and want to get all possible words of length 16. I need to filter those words. These are DNA sequences (hence 4 letters: ATGC), and the filtering rules are quite simple:
no XXXX substrings (i.e. the same letter can't appear more than 3 times in a row; ATGCATGGGGCTA is "bad")
specific GC content, that is, the number of Gs + the number of Cs should be in a specific range (40-50%). ATATATATATATA and GCGCGCGCGCGC are bad words
itertools.product will work for that, but the data structure here is going to be giant (4^16 = 4,294,967,296, about 4*10^9 words).
More importantly, if I do use product, then I still have to go through each element to filter it. Thus I will have 4 billion steps, times 2.
My current solution is nested for loops:
alphabet = ['a','t','g','c']
for p1 in alphabet:
    for p2 in alphabet:
        for p3 in alphabet:
            ...skip...
                for p16 in alphabet:
                    word = p1+p2+p3+...+p16
                    if word_is_good(word):
                        good_words.append(word)
                        counter+=1
Is there a good pattern to program that without 16 nested loops? Is there a way to parallelize it efficiently (on multiple cores or multiple EC2 nodes)?
Also, with that pattern, can I plug the word_is_good check into the middle of the loops? A word that starts badly is bad:
...skip...
for p3 in alphabet:
    word_3 = p1+p2+p3
    if not word_is_good(word_3):
        continue  # skip this prefix and try the next letter
    for p4 in alphabet:
        ...skip...

from itertools import product, islice
from time import time
length = 16
def generate(start, alphabet):
    """
    A recursive generator function which works like itertools.product
    but restricts the alphabet as it goes based on the letters accumulated so far.
    """
    if len(start) == length:
        yield start
        return
    gcs = start.count('g') + start.count('c')
    if gcs >= length * 0.5:
        alphabet = 'at'
    # consider the maximum number of Gs and Cs we can have in the end
    # if we add one more A/T now
    elif length - len(start) - 1 + gcs < length * 0.4:
        alphabet = 'gc'
    for c in alphabet:
        if start.endswith(c * 3):
            continue
        for string in generate(start + c, alphabet):
            yield string
def brute_force():
    """ Straightforward method for comparison """
    lower = length * 0.4
    upper = length * 0.5
    for s in product('atgc', repeat=length):
        if lower <= s.count('g') + s.count('c') <= upper:
            s = ''.join(s)
            if not ('aaaa' in s or
                    'tttt' in s or
                    'cccc' in s or
                    'gggg' in s):
                yield s
def main():
    funcs = [
        lambda: generate('', 'atgc'),
        brute_force
    ]
    # Testing performance
    for func in funcs:
        # This needs to be big to get an accurate measure,
        # otherwise `brute_force` seems slower than it really is.
        # This is probably because of how `itertools.product`
        # is implemented.
        count = 100000000
        start = time()
        for _ in islice(func(), count):
            pass
        print(time() - start)
    # Testing correctness
    global length
    length = 12
    for x, y in zip(*[func() for func in funcs]):
        assert x == y, (x, y)
if __name__ == '__main__':  # keeps worker processes that import this module from rerunning main()
    main()
On my machine, generate was just a bit faster than brute_force, at about 390 seconds vs 425. This was pretty much as fast as I could make them. I think the full thing would take about 2 hours. Of course, actually processing them will take much longer. The problem is that your constraints don't reduce the full set much.
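Here's an example of how to use this in parallel across 16 processes (it relies on generate from the code above):
from itertools import product
from multiprocessing.pool import Pool
alpha = 'atgc'
def generate_worker(start):
    start = ''.join(start)
    for s in generate(start, alpha):
        print(s)
if __name__ == '__main__':  # required on platforms that start workers with "spawn"
    with Pool(16) as pool:
        pool.map(generate_worker, product(alpha, repeat=2))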
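The generate function above bakes the GC rules into the recursion. The same idea also answers the "plug word_is_good into the middle of the loops" part in general: recursion replaces the 16 nested loops, and a prefix check at every depth prunes every word that starts badly. A minimal sketch; good_words and no_run_of_four are hypothetical names, and the GC-content range would still need its own check (on the finished word, or as a prefix bound like the one in generate).

```python
def good_words(alphabet, length, prefix_ok, prefix=""):
    """Depth-first replacement for `length` nested loops over `alphabet`.
    prefix_ok(prefix) is called at every depth; returning False skips
    every word that starts with that prefix."""
    if not prefix_ok(prefix):
        return
    if len(prefix) == length:
        yield prefix
        return
    for letter in alphabet:
        yield from good_words(alphabet, length, prefix_ok, prefix + letter)

def no_run_of_four(prefix):
    # Shorter prefixes already passed, so only a run ending at the newest letter can be new
    return len(prefix) < 4 or len(set(prefix[-4:])) > 1

words = list(good_words('atgc', 5, no_run_of_four))
print(len(words))  # 996 of the 4**5 = 1024 words
print(words[0])    # aaata
```

Because the generator is lazy, the words can also be streamed to a file or a worker pool without storing them all.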

Since you happen to have an alphabet of length 4 (or any power-of-2 length), the idea of using an integer ID and bitwise operations comes to mind, instead of checking for consecutive characters in strings. We can assign an integer value to each of the characters in the alphabet; for simplicity, let's use the index corresponding to each letter.
Example (with a = 0, b = 1, c = 2, d = 3):
65463543 (base 10) = 3321232103313 (base 4) = 'aaaddcbcdcbaddbd'
The following function converts a base-10 integer ID to a word using ALPHABET:
def id_to_word(word_id, word_len):
    word = ''
    while word_id:
        rem = word_id & 0x3  # 2 bits per letter
        word = ALPHABET[rem] + word
        word_id >>= 2  # Bit shift to the next letter
    # left-pad with ALPHABET[0] (the zero digit) up to word_len letters
    return '{2:{0}>{1}}'.format(ALPHABET[0], word_len, word)
Now for a function to check whether a word is "good" based on its integer ID. The following function has a similar structure to id_to_word, except that a counter keeps track of consecutive identical characters. It returns False if the maximum number of identical consecutive characters is exceeded; otherwise it returns True.
def check_word(word_id, max_consecutive):
    consecutive = 0
    previous = None
    while word_id:
        rem = word_id & 0x3
        if rem != previous:
            consecutive = 0
        consecutive += 1
        if consecutive == max_consecutive + 1:
            return False
        word_id >>= 2
        previous = rem
    return True
We're effectively treating each word as an integer in base 4. If the alphabet length were not a power of 2, then modulo (% alpha_len) and integer division (// alpha_len) could be used in place of & (alpha_len - 1) and >> log2(alpha_len) respectively, although that may be slower.
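A minimal sketch of that generalisation, assuming the same "at most max_consecutive identical letters in a row" rule; the function name check_word_any_base is my own. divmod gives the quotient and remainder in one step. Unlike check_word, it loops over exactly word_len digits, so leading zero digits (leading ALPHABET[0] letters) are counted as well.

```python
def check_word_any_base(word_id, alpha_len, word_len, max_consecutive):
    """Return False if word_id, read as a word_len-digit number in base alpha_len,
    has more than max_consecutive identical digits in a row."""
    consecutive = 0
    previous = None
    # Fixed number of iterations so leading zeros (the first letters) are checked too
    for _ in range(word_len):
        word_id, rem = divmod(word_id, alpha_len)  # like >> and & when alpha_len is a power of 2
        consecutive = consecutive + 1 if rem == previous else 1
        if consecutive > max_consecutive:
            return False
        previous = rem
    return True

print(check_word_any_base(0, 4, 16, 3))                             # False: sixteen a's
print(check_word_any_base(int('0001' * 4, 4), 4, 16, 3))            # True: 'aaabaaabaaabaaab'
print(check_word_any_base(int('0000120120120120', 3), 3, 16, 3))    # False: four leading zeros
print(check_word_any_base(int('0120' * 4, 3), 3, 16, 3))            # True
```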
Finally, finding all the good words for a given word_len. The advantage of using a range of integer values is that you can reduce the number of for loops in your code from word_len to 2, albeit with a very large outer loop. This may make the good-word search friendlier to multiprocessing. I have also added a quick calculation of the smallest and largest IDs corresponding to good words, which trims the range of IDs that has to be searched.
ALPHABET = ('a', 'b', 'c', 'd')
def find_good_words(word_len):
    max_consecutive = 3
    alpha_len = len(ALPHABET)
    # Determine the words corresponding to the smallest and largest ids
    smallest_word = ''  # aaabaaabaaabaaab
    largest_word = ''  # dddcdddcdddcdddc
    for i in range(word_len):
        # append (not prepend) so every (max_consecutive + 1)-th letter from the left breaks the run
        if (i + 1) % (max_consecutive + 1):
            smallest_word += ALPHABET[0]
            largest_word += ALPHABET[-1]
        else:
            smallest_word += ALPHABET[1]
            largest_word += ALPHABET[-2]
    # Determine the integer ids of said words
    trans_table = str.maketrans({c: str(i) for i, c in enumerate(ALPHABET)})
    smallest_id = int(smallest_word.translate(trans_table), alpha_len)  # 16843009 for word_len = 16
    largest_id = int(largest_word.translate(trans_table), alpha_len)  # 4278124286 for word_len = 16
    # Find and store the ids of "good" words
    counter = 0
    goodies = []
    for i in range(smallest_id, largest_id + 1):
        if check_word(i, max_consecutive):
            goodies.append(i)
            counter += 1
    return goodies
In this loop I have specifically stored each word's ID rather than the word itself, in case you are going to use the words for further processing. However, if you are just after the words, change goodies.append(i) to goodies.append(id_to_word(i, word_len)).
NOTE: I receive a MemoryError when attempting to store all good IDs for word_len >= 14. I suggest writing these IDs/words to a file of some sort!
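One way to follow that suggestion is to yield IDs one at a time and write them straight to disk, so memory use stays flat. A minimal sketch; iter_good_ids and write_good_ids are hypothetical helpers, and the predicate you pass in would be something like lambda i: check_word(i, 3) from above. The demo uses a toy predicate so that it runs on its own.

```python
import os
import tempfile

def iter_good_ids(smallest_id, largest_id, is_good):
    """Lazily yield every id in [smallest_id, largest_id] for which is_good(id) is true."""
    for word_id in range(smallest_id, largest_id + 1):
        if is_good(word_id):
            yield word_id

def write_good_ids(path, ids):
    """Write one id per line and return how many were written; no list is built."""
    count = 0
    with open(path, "w") as out:
        for word_id in ids:
            out.write(f"{word_id}\n")
            count += 1
    return count

# Toy demo: keep the even ids between 0 and 9
path = os.path.join(tempfile.gettempdir(), "good_ids_demo.txt")
written = write_good_ids(path, iter_good_ids(0, 9, lambda i: i % 2 == 0))
print(written)  # 5
with open(path) as f:
    print(f.read().split())  # ['0', '2', '4', '6', '8']
```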

How to split a returned integer: (Trying to find recurrences within it)

This might be a very simple question, but it's giving me a lot of trouble.
Code:
import requests
from bs4 import BeautifulSoup
def search_likes(passed_list):  # passed_list contains links to find below
    print("Found", len(passed_list), "videos, now finding likes.")
    x = 0
    print("Currently finding likes for video", x, ".")
    while x < len(passed_list):
        likeFINDER = []
        r = requests.get(passed_list[x])  # fetch the x-th link, not always the first one
        soup = BeautifulSoup(r.content, 'lxml')
        d_data = soup.find_all("span", {"class": "yt-uix-button-content"})  # Location of the number I'm looking for
        likeFINDER.append(d_data)
        str1 = ''.join(str(e) for e in likeFINDER)  # Converts the list into a string
        likeNUMBER = int(''.join(filter(lambda ch: ch.isdigit(), str1)))  # Removes non-digits and leaves an integer
        x += 1  # count
Output:
845528455314391440
I would like to split the number where it begins to repeat itself, i.e. ['84552', '8455314391440'].
If you have any insight on how to do this I would really appreciate it!
Thanks,
Ben

Given a string s containing your numbers, and a number n that is the size of the repetition you want to find, then you can do:
s.find(s[:n], n)
This finds the index of the first occurrence, after the start of the string, that is equal to the first n characters of the string. For example:
s = str(845528455314391440)
n = 3
r = s.find(s[:n], n)
print(r)
Output:
5
You can then use that to split the string and turn the parts into numbers:
a, b = int(s[:r]), int(s[r:])
print(a, b)
Output:
84552 8455314391440
All combined into a function, accounting for numbers without repetition:
def split_repeat(i, n):
    s = str(i)
    r = s.find(s[:n], n)
    if r == -1:
        return None
    else:
        return int(s[:r]), int(s[r:])
Usage:
print(split_repeat(845528455314391440, 3))
print(split_repeat(876543210, 3))
print(split_repeat(1122, 3))
Output:
(84552, 8455314391440)
None
None

Here is a simple example that shows you how to match however many of the first digits you need. This example uses the first 2 digits.
We can turn the numbers into strings and then use string_name[:2], where 2 is the number of digits from the front you want to match. I am using the number 11 to match my list of numbers, but this is only an example.
Let me know if you have any questions:
set_list = []
var1 = 11012314
for i in range(1000):
    set_list.append(i)
for item in set_list:
    x = str(item)
    y = str(var1)
    if x[:2] == y[:2]:
        print(item)
When you run this code, the numbers whose first two digits match those of our variable (11) are printed to the console.
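The same prefix test can also be written with str.startswith in a single list comprehension. A short sketch using the same var1 and range as the example above; prefix_len is a name I introduced.

```python
var1 = 11012314
prefix_len = 2  # how many leading digits must match
prefix = str(var1)[:prefix_len]  # '11'
matches = [item for item in range(1000) if str(item).startswith(prefix)]
print(matches)  # [11, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]
```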

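You can do:
def myfunc(num):
    a = str(num)
    b = ''
    for i in range(len(a)):
        l1 = a[:i]
        l2 = a[i:]
        if l1 in l2:  # the prefix a[:i] appears again later in the string
            b = l1
    if not b:
        return None  # no repeated prefix found
    return [int(b), int(a[len(b):])]
That will give you:
>>> myfunc(845538455314391440)
[84553, 8455314391440]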

Python - replay values in list

Please help with this list task in Python; my logic doesn't work :(
This is the full text of the task: Write a program that takes a list of numbers on one line and displays, in a single row, the values that are repeated in it more than once.
To solve the problem, the list sort method may be useful.
The order in which the repeated elements are printed may be arbitrary.
My beginning code is:
st = (int(i) for i in input().split())
ls = []
for k in st:
    if k == k + 1 and k > 1:
The task is: if the list has a repeated value, we must print it. We can only use the sort() method, without importing any modules.
Example results:
Sample Input 1:
4 8 0 3 4 2 0 3
Sample Output 1:
0 3 4
Sample Input 2:
10
Sample Output 2:
Sample Input 3:
1 1 2 2 3 3
Sample Output 3:
1 2 3
This code doesn't run (sort() doesn't want to sort my_list). But I must input the values like my_list = (int(k) for k in input().split())
st = list(int(k) for k in input())
st.sort()
for i in range(0, len(st) - 1):
    if st[i] == st[i + 1]:
        print(str(st[i]), end=" ")

my_list = (int(k) for k in input().split())
After running this line, my_list is a generator, something that will create a sequence - but hasn't yet done so. You can't sort a generator. You either need to use []:
my_list = [int(k) for k in input().split()]
my_list.sort()
which makes my_list into a list from the start, instead of a generator, or:
my_list = list(int(k) for k in input().split())
my_list.sort()
gather up the results from the generator using list() and then store it in my_list.
Edit: for single digits written all together, e.g. 48304, try [int(k) for k in input()]. You can't usefully do this with split().
Edit: for printing the results too many times: make the top of the loop look back one number, like this, so that when it gets to the second or third copy of a repeated number, it skips it, continues around the loop, and doesn't print anything.
for i in range(0, len(st) - 1):
    if i > 0 and st[i] == st[i - 1]:  # i > 0: st[-1] would wrap around to the last element
        continue
    if st[i] == st[i + 1]:
        print(st[i], end=" ")
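Putting the two edits together, a complete version that stays within the task's rules (sort() plus basic built-ins, no imports) might look like this. It is a sketch: the function name repeated_values is my own, and it takes the input line as an argument so that it can be tested; in the real program you would call it with input().

```python
def repeated_values(line):
    """Return the values that occur more than once in a whitespace-separated line,
    each reported once, in ascending order."""
    numbers = [int(k) for k in line.split()]  # a list, not a generator, so it can be sorted
    numbers.sort()  # equal values become neighbours
    repeats = []
    for i in range(len(numbers) - 1):
        # Skip the 2nd, 3rd, ... copy of a value so each repeat is reported once
        if i > 0 and numbers[i] == numbers[i - 1]:
            continue
        if numbers[i] == numbers[i + 1]:
            repeats.append(numbers[i])
    return repeats

for sample in ("4 8 0 3 4 2 0 3", "10", "1 1 2 2 3 3"):
    print(*repeated_values(sample))  # 0 3 4 / (empty line) / 1 2 3
```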

st = (int(i) for i in input().split())
used = []
ls = []
for k in st:
    if k in used:  # If the number has shown up before:
        if k not in ls:
            ls.append(k)  # Add the number to the repeats list if it isn't already there
    else:
        used.append(k)  # Add the number to our used list
print(' '.join(map(str, ls)))
In summary, this method uses two lists at once. One keeps track of numbers that have already shown up, and the other keeps track of second-timers. At the end, the program prints out the second-timers.

I'd probably make a set to keep track of what you've seen, and append to a list to keep track of the repeats.
lst = [num for num in input("prompt ").split()]
s = set()
repeats = []
for num in lst:
    if num in s and num not in repeats:
        repeats.append(num)
    s.add(num)
print(' '.join(map(str, repeats)))
Note that if you don't need to maintain order in your output, this is faster:
lst = [num for num in input("prompt ").split()]
s = set()
repeats = set()
for num in lst:
    if num in s:
        repeats.add(num)
    s.add(num)
print(' '.join(map(str, repeats)))
Although if you can use imports, there are a couple of cool ways to do it.
# Canonically...
from collections import Counter
' '.join([num for num, count in Counter(input().split()).items() if count > 1])
# or...
from itertools import groupby
' '.join([num for num, group in groupby(sorted(input().split())) if len(list(group)) > 1])
# or even...
from itertools import tee
lst = sorted(input('prompt ').split())
cur, nxt = tee(lst)
next(nxt)  # consumes the first element, putting it one ahead.
' '.join({a for (a, b) in zip(cur, nxt) if a == b})

This gives the answers you're looking for; I'm not sure if it's exactly the intended algorithm:
st = (int(i) for i in input().split())
st = [i for i in st]
st.sort()
previous = None
printed = None
for current in st:
    # after sorting, a repeated value sits next to an equal neighbour;
    # printed stops a value that occurs 3+ times from being printed twice
    if current == previous and current != printed:
        print(current, end=' ')
        printed = current
    previous = current
>>> "4 8 0 3 4 2 0 3"
0 3 4
>>> "10"
>>> "1 1 2 2 3 3"
1 2 3
updated to:
start with st = (int(i) for i in input().split())
use only sort method, no other functions or methods... except print (Python3 syntax)
does that fit the rules?
