Can I pause itertools on python, and resume later? - python

I need to create a list of strings containing every possible combination of uppercase and lowercase letters, with no repeated characters, of length 14. This is massive, and I know it will take a lot of time and space.
My code right now is this:
import itertools
filename = open("strings.txt", "w")
for com in itertools.permutations('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 14):
    filename.write("\n"+"1_"+"".join(com)+"\n"+"0_"+"".join(com))
    print ("".join(com))
It is pretty basic, but it does the job, and I have not found a faster way yet (I tried a Java algorithm that looked faster, but Python turned out to be quicker).
Since this will take a long time, I need to turn off my computer from time to time, so I need to save where I left off and continue from there. Otherwise I will start from the beginning every time it crashes, I turn off my PC, or anything else happens.
Is there any way to do that?

You can pickle that iterator object. Its internal state will be stored in the pickle file. When you resume it should start from where it left off.
Something like this:
import itertools
import os
import pickle
import time
# if the iterator was saved, load it
if os.path.exists('saved_iter.pkl'):
    with open('saved_iter.pkl', 'rb') as f:
        iterator = pickle.load(f)
# otherwise recreate it
else:
    iterator = itertools.permutations('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 14)
try:
    for com in iterator:
        # process the object from the iterator
        print(com)
        time.sleep(1.0)
except KeyboardInterrupt:
    # if the script is about to exit, save the iterator state
    with open('saved_iter.pkl', 'wb') as f:
        pickle.dump(iterator, f)
Which results in:
>python so_test.py
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'o')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'p')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'q')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'r')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 's')
>python so_test.py
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 't')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'u')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'v')
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'w')
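One caveat: as far as I know, pickling itertools objects was deprecated in Python 3.12 and is scheduled for removal in 3.14, so the approach above may not work on newer interpreters. A possible alternative is to save only an integer position and rebuild the permutation at that position on restart. The sketch below assumes the same lexicographic order that itertools.permutations produces; nth_permutation and permutations_from are hypothetical helper names, not part of itertools.

```python
import itertools
from math import factorial

def nth_permutation(pool, r, index):
    """Return the permutation at position `index` in itertools.permutations(pool, r) order."""
    items = list(pool)
    n = len(items)
    result = []
    for k in range(r):
        # Number of permutations that share one fixed choice at position k.
        block = factorial(n - k - 1) // factorial(n - r)
        choice, index = divmod(index, block)
        result.append(items.pop(choice))
    return tuple(result)

def permutations_from(pool, r, start=0):
    """Yield (index, permutation) pairs beginning at `start` (hypothetical helper)."""
    total = factorial(len(pool)) // factorial(len(pool) - r)
    for index in range(start, total):
        yield index, nth_permutation(pool, r, index)

# Small check against itertools on a 4-letter pool.
expected = list(itertools.permutations('abcd', 2))
resumed = [p for _, p in permutations_from('abcd', 2, start=5)]
print(resumed == expected[5:])  # True
```

To resume, you would write the last processed index to a small text file and pass index + 1 as start on the next run. Rebuilding each permutation from its index is slower than itertools.permutations, so this trades speed for a checkpoint that is just a number.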

Related

Custom re-arrange list using python

I have the following list:
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
I would like to reorder the list so that the sixth element comes after the first, the eleventh after the sixth, the second after the eleventh, and so on. The output should be:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
What I tried so far?
lst = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
new_lst = [lst[0], lst[5], lst[10], lst[1], lst[6], lst[11], lst[2], lst[7], lst[12], lst[3], lst[8], lst[13] , lst[4], lst[9], lst[14]]
new_lst
This provides the desired output, but I am looking for an optimal script. How do I do that?
From the pattern: reshape to 2D, then transpose and flatten.
sum is a convenient function here because you can pass a start value; in this case the identity is () or [], depending on the type.
# Sol 1
import numpy as np
print('Using numpy')
x = ['A', 'B', 'C','D', 'E', 'F', 'G', 'H','I','J', 'K', 'L','M','N','O']
np.array(x).reshape((-1, 5)).transpose().reshape(-1)
# array(['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O'], dtype='<U1')
# Sol 2: zip the three blocks of five, then concatenate the tuples
print('One more way without numpy')
list(
    sum(
        zip(x[:5], x[5:10], x[10:]),
        ()
    )
)
# Sol 3: same idea, concatenating lists instead of tuples
print('One more way without numpy')
sum(
    [list(y) for y in zip(x[:5], x[5:10], x[10:])],
    []
)
# Sol 4: same as Sol 2, with the zip materialized as a list first
print('One more way without numpy')
list(
    sum(
        [y for y in zip(x[:5], x[5:10], x[10:])],
        ()
    )
)
You can also use a list comprehension if you want to avoid libraries:
lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
[x for t in zip(lst[:5], lst[5:10], lst[10:]) for x in t]
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
If you want to take the fifth and tenth element after the current one, repeatedly, it would be:
# Assumes 15 values: three blocks of five
input_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
output_list = []
for i in range(len(input_list) // 3):
    output_list.append(input_list[i])
    output_list.append(input_list[i + 5])
    output_list.append(input_list[i + 10])
print(output_list)
No libraries used. It will give the desired result:
['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
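The answers above all hard-code a block size of five. A minimal sketch of a more general version, assuming the goal is "take one item from each block in turn", can use extended slicing; interleave_blocks is a hypothetical name chosen for illustration.

```python
def interleave_blocks(items, block_size):
    """Take one item from each block of block_size in turn."""
    result = []
    for offset in range(block_size):
        # items[offset::block_size] picks position `offset` of every block.
        result.extend(items[offset::block_size])
    return result

lst = list("ABCDEFGHIJKLMNO")
print(interleave_blocks(lst, 5))
# ['A', 'F', 'K', 'B', 'G', 'L', 'C', 'H', 'M', 'D', 'I', 'N', 'E', 'J', 'O']
```

Because it slices rather than indexes, this version does not raise an error when the list length is not a multiple of the block size; the last block is simply shorter.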

How can I get all the unique categories within my dataframe using python? [duplicate]

This question already has answers here:
Find the unique values in a column and then sort them
(8 answers)
Closed 3 years ago.
I'm new to Python and trying to work with dataframe manipulation.
I have a df with unique categories:
I am unable to paste the dataframe because I use the Spyder IDE, which is not interactive and does not display all fields.
My input to get all these unique categories within a dataframe:
uc = []
for i in df['Category']:
    if i[0] not in df['Category']:
        uc.append(i[0])
print(uc)
But when I use this script, I only receive the first letters of these categories:
Output:
['F', 'P', 'N', 'F', 'L', 'T', 'W', 'S', 'W', 'B', 'S', 'F', 'T', 'T', 'B', 'T', 'B', 'L', 'S', 'F', 'F', 'F', 'N', 'P', 'H', 'T', 'L', 'T', 'S', 'E', 'P', 'N', 'T', 'L', 'P', 'L', 'W', 'F', 'N', 'L', 'N', 'L', 'F', 'F', 'N', 'T', 'P', 'L', 'B', 'W', 'L', 'W', 'F', 'F', 'H', 'T', 'F', 'T', 'T', 'N', 'G', 'L', 'M', 'N', 'F', 'N', 'F', 'L', 'N', 'P', 'F', 'B', 'B', 'S', 'F', 'P', 'F', 'P', 'P', 'P', 'B', 'P', 'B', 'B', 'L', 'B', 'F', 'P', 'P', 'B', 'B', 'C', 'G', 'C', 'G', 'B', 'P', 'T', 'P', 'P', 'N', 'G', 'S', 'G', 'F', 'G', 'F', 'T', 'S', 'P', 'F', 'C', 'C', 'C', 'C', 'C', 'G', 'C', 'F', 'C', 'F', 'B', 'G', 'C', 'B', 'B', 'B', 'C', 'P', 'G', 'S', 'D', 'P', 'G', 'F', 'L', 'C', 'G', 'P', 'S', 'B', 'P', 'T', 'T', 'L', 'M', 'F', 'T', 'P', 'C', 'F', 'B', 'M', 'G', 'C', 'P', 'T', 'L', 'F', 'F', 'F', 'T', 'P', 'C', 'G', 'T', 'F', 'F', 'S', 'B', 'M', 'T', 'T', 'T', 'T', 'H', 'B', 'N', 'F', 'A', 'T', 'E', 'M', 'L', 'G', 'P', 'B', 'L', 'N', 'S', 'G', 'G', 'F', 'F', 'F', 'G', 'G', 'G', 'G', 'F', 'T', 'G', 'P', 'G', 'C', 'G', 'G', 'G', 'F', 'T', 'T', 'L', 'F', 'S', 'T', 'F', 'F', 'G', 'G', 'L', 'M', 'T', 'L', 'F', 'B', 'A', 'F', 'B', 'F', 'B', 'B', 'T', 'F', 'B', 'F', 'F', 'P', 'V', 'M', 'S', 'F', 'C', 'B', 'N', 'M', 'W', 'B', 'F', 'B', 'F', 'F', 'M', 'L']
How do I change my script to receive the unique categories within a dataframe?
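Try
df['Category'].unique()
Run print(df['Category'].unique()) and see what you get.
Also, i[0] retrieves the first character of each string value in df['Category'], which is why you only get first letters. Note too that "in" on a Series checks the index labels, not the values, so the "not in" test does not filter out duplicates.
If you are new to pandas, try to drop the habit of looping over columns with for, and call type() on your results to better understand what you are working with.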
Do you want this?
uc = set(df['Category'])
This will create a set containing the unique values of 'Category'
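One difference between the two answers: a set does not keep the order in which categories first appear, while unique() returns values in order of appearance. A minimal plain-Python sketch of that difference follows, with a hypothetical list of category names standing in for df['Category'].

```python
# Hypothetical category values standing in for df['Category'].
categories = ["Food", "Pets", "News", "Food", "Lifestyle", "Pets"]

# dict keys are unique and keep insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(categories))
print(unique_in_order)  # ['Food', 'Pets', 'News', 'Lifestyle']

# A set holds the same values, but its iteration order is not guaranteed.
print(set(categories) == set(unique_in_order))  # True
```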

Remove a '\n' list item from a list of a reading file

I have a similar problem. My code so far is:
file='/Users/Giannis/Desktop/Python Assegments/Week 6/boardlist1.txt'
boardlist = []
file = open(file, 'r')
line = file.readlines()
wordstring = ''
for i in range(0,len(line)):
    final_list = []
    raw = list(line[i])
    boardlist.append(raw)
print(boardlist)
file.close()
The file you see is a txt which is:
EFJAJCOWSS
SDGKSRFDFF
ASRJDUSKLK
HEANDNDJWA
ANSDNCNEOP
PMSNFHHEJE
JEPQLYNXDL
My print results are:
[['E', 'F', 'J', 'A', 'J', 'C', 'O', 'W', 'S', 'S', '\n'], ['S', 'D', 'G', 'K', 'S', 'R', 'F', 'D', 'F', 'F', '\n'], ['A', 'S', 'R', 'J', 'D', 'U', 'S', 'K', 'L', 'K', '\n'], ['H', 'E', 'A', 'N', 'D', 'N', 'D', 'J', 'W', 'A', '\n'], ['A', 'N', 'S', 'D', 'N', 'C', 'N', 'E', 'O', 'P', '\n'], ['P', 'M', 'S', 'N', 'F', 'H', 'H', 'E', 'J', 'E', '\n'], ['J', 'E', 'P', 'Q', 'L', 'Y', 'N', 'X', 'D', 'L']]
And I want to remove every \n character in it. How can I do that with this code?
Use strip(). Here is the shortest version of your code:
file='/Users/Giannis/Desktop/Python Assegments/Week 6/boardlist1.txt'
with open(file, 'r') as f:
    print([list(line.strip()) for line in f])
[['E', 'F', 'J', 'A', 'J', 'C', 'O', 'W', 'S', 'S'], ['S', 'D', 'G', 'K', 'S', 'R', 'F', 'D', 'F', 'F'], ['A', 'S', 'R', 'J', 'D', 'U', 'S', 'K', 'L', 'K'], ['H', 'E', 'A', 'N', 'D', 'N', 'D', 'J', 'W', 'A'], ['A', 'N', 'S', 'D', 'N', 'C', 'N', 'E', 'O', 'P'], ['P', 'M', 'S', 'N', 'F', 'H', 'H', 'E', 'J', 'E'], ['J', 'E', 'P', 'Q', 'L', 'Y', 'N', 'X', 'D', 'L']]
Note: You don't need to use readlines(); just iterate over the file object.
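strip() removes all leading and trailing whitespace, not only the newline. If a board row could legitimately start or end with a space, rstrip("\n") is the narrower choice. The sketch below uses io.StringIO with a made-up two-row board in place of the real file, so it runs without the path from the question.

```python
import io

# io.StringIO stands in for the opened text file (hypothetical two-row board).
fake_file = io.StringIO("EFJ\nSDG\n")

# rstrip("\n") drops only the trailing newline and keeps other whitespace.
board = [list(line.rstrip("\n")) for line in fake_file]
print(board)  # [['E', 'F', 'J'], ['S', 'D', 'G']]
```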

Sliding window in Python

I used the following famous code for my sliding window through the tokenised text document:
from itertools import islice

def window(fseq, window_size):
    "Sliding window"
    it = iter(fseq)
    result = tuple(islice(it, 0, window_size, round(window_size/4)))
    if len(result) == window_size:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        result_list = list(result)
        yield result_list
When I call my function with a window size less than 6, everything is OK, but when I increase it, the beginning of the text is cut off.
For example:
c=['A','B','C','D','E', 'F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
print(list(window(c, 4)))
print(list(window(c, 8)))
Output:
[('A', 'B', 'C', 'D'), ['B', 'C', 'D', 'E'], ['C', 'D', 'E', 'F'], ['D', 'E', 'F', 'G'], ['E', 'F', 'G', 'H'], ['F', 'G', 'H', 'I'],...
[['C', 'E', 'G', 'I'], ['E', 'G', 'I', 'J'], ['G', 'I', 'J', 'K'], ['I', 'J', 'K', 'L'], ['J', 'K', 'L', 'M']...
What's wrong? And why is the first element of the first output in round brackets?
My expected output for print(list(window(c, 8))) is:
[['A','B','C', 'D', 'E', 'F','G','H'], ['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], ['E', 'F', 'G', 'H', 'I', 'K', 'L', 'M']...
Your version is incorrect. It adds a 4th argument (the step size) to the islice() function that limits how large the first slice taken is going to be:
result = tuple(islice(it, 0, window_size, round(window_size/4)))
For 4 or 5, round(window_size/4) produces 1, the default step size. But for larger values, this produces a step size that guarantees that values will be omitted from that first window, so the next test, len(result) == window_size is guaranteed to be false.
Remove that step size argument, and you'll get your first window back again. Also see Rolling or sliding window iterator in Python.
The first result is in 'round brackets' because it is a tuple. If you wanted a list instead, use list() rather than tuple() in your code.
If you wanted to have your window slide along in steps larger than 1, you should not alter the initial window. You need to add and remove step size elements from the window as you iterate along. That's easier done with a while loop:
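from itertools import islice

def window_with_larger_step(fseq, window_size):
    """Sliding window
    The step size the window moves over increases with the size of the window.
    """
    it = iter(fseq)
    result = list(islice(it, 0, window_size))
    if len(result) == window_size:
        yield result
    step_size = max(1, int(round(window_size / 4)))  # no smaller than 1
    while True:
        # read the next step_size elements exactly once
        new_elements = list(islice(it, step_size))
        if len(new_elements) < step_size:
            break
        result = result[step_size:] + new_elements
        yield result
This adds step_size elements to the running result, removing step_size elements from the start to keep the window size even.
Demo (Python 3, where round(2.5) is 2 because halves round to the nearest even integer, so a window of 10 also moves in steps of 2):
>>> print(list(window_with_larger_step(c, 6)))
[['A', 'B', 'C', 'D', 'E', 'F'], ['C', 'D', 'E', 'F', 'G', 'H'], ['E', 'F', 'G', 'H', 'I', 'J'], ['G', 'H', 'I', 'J', 'K', 'L'], ['I', 'J', 'K', 'L', 'M', 'N'], ['K', 'L', 'M', 'N', 'O', 'P'], ['M', 'N', 'O', 'P', 'Q', 'R'], ['O', 'P', 'Q', 'R', 'S', 'T'], ['Q', 'R', 'S', 'T', 'U', 'V'], ['S', 'T', 'U', 'V', 'W', 'X'], ['U', 'V', 'W', 'X', 'Y', 'Z']]
>>> print(list(window_with_larger_step(c, 8)))
[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], ['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], ['E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], ['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], ['I', 'J', 'K', 'L', 'M', 'N', 'O', 'P'], ['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R'], ['M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'], ['O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V'], ['Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X'], ['S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']]
>>> print(list(window_with_larger_step(c, 10)))
[['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], ['C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L'], ['E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N'], ['G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P'], ['I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R'], ['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'], ['M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V'], ['O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X'], ['Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']]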
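If you would rather choose the step yourself than derive it from the window size, one possible variation keeps the window in a collections.deque with maxlen set, so old elements fall off the front automatically. This is only a sketch; window_with_step and its step parameter are hypothetical names, not part of the answer above.

```python
from collections import deque
from itertools import islice

def window_with_step(seq, window_size, step):
    """Yield lists of window_size items, advancing by step items each time."""
    it = iter(seq)
    win = deque(islice(it, window_size), maxlen=window_size)
    if len(win) < window_size:
        return  # input is shorter than one window
    yield list(win)
    while True:
        new_items = list(islice(it, step))
        if len(new_items) < step:
            return  # not enough items left for a full step
        win.extend(new_items)  # maxlen drops the oldest items automatically
        yield list(win)

print(list(window_with_step("ABCDEFGH", 4, 2)))
# [['A', 'B', 'C', 'D'], ['C', 'D', 'E', 'F'], ['E', 'F', 'G', 'H']]
```

Like the while-loop version, it stops as soon as fewer than step new elements remain, so a trailing partial step is discarded rather than yielded.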

Python - Permutation/Combination column-wise

I have a list
mylist = [
['f', 'l', 'a', 'd', 'l', 'f', 'k'],
['g', 'm', 'b', 'b', 'k', 'g', 'l'],
['h', 'n', 'c', 'a', 'm', 'j', 'o'],
['i', 'o', 'd', 'c', 'n', 'i', 'm'],
['j', 'p', 'e', 'e', 'o', 'h', 'n'],
]
I want to do permutations/combinations column-wise, such that the elements of each column are restricted to that column, i.e., f,g,h,i,j remain in Column 1, l,m,n,o,p remain in Column 2 and so on, in the results. How can this be achieved in Python 2.7?
You could use zip(*mylist) to list the "columns" of mylist. Then use the * operator (again) to unpack those lists as arguments to IT.product or IT.combinations. For example,
import itertools as IT
list(IT.product(*zip(*mylist)))
yields
[('f', 'l', 'a', 'd', 'l', 'f', 'k'),
('f', 'l', 'a', 'd', 'l', 'f', 'l'),
('f', 'l', 'a', 'd', 'l', 'f', 'o'),
('f', 'l', 'a', 'd', 'l', 'f', 'm'),
...]
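To see what the two unpacking steps do, it can help to run them on a grid small enough to list every result. The 2x2 rows below are made up for illustration; the calls are the same zip(*...) and product(*...) used above.

```python
import itertools

# Hypothetical 2x2 grid, small enough to list every result.
rows = [['a', 'x'],
        ['b', 'y']]

columns = list(zip(*rows))  # [('a', 'b'), ('x', 'y')]
picks = list(itertools.product(*columns))
print(picks)
# [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
```

Each result takes exactly one element from each column, so position 1 only ever holds values from Column 1, and so on. For the 5-row, 7-column list in the question, this gives 5**7 = 78125 tuples.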
