I have two lists: one with names, and one with numbers that correspond to the names in the first list (the corresponding name and number sit at the same index in each list). I need to reference each name and number in a URL that can only handle 25 different names & points at a time.
pointNames = ['name1', 'name2', 'name3']
points = ['1', '2', '3'] #yes, the numbers are meant to be strings
My actual lists have roughly 600 values in each. What I'm trying to do is loop through each list at the same time, but in increments of 25. I'm able to do this with a single list using the following:
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

for group in chunker(pointNames, 25):
    print(group)
This prints multiple groups of 25 values from the list until it has gone through the entire list. I want to do exactly this, but with two lists. I'm able to print both lists in full with for (point, name) in zip(points, pointNames): but I need the pairs in groups of 25.
I've also tried combining the two lists into a dictionary:
dictionary = dict(zip(points, pointNames))

for group in chunker(dictionary, 25):
    print(group)
but I get the following error:
TypeError: unhashable type: 'slice'
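The error happens because chunker evaluates dictionary[pos:pos + size]; a dict cannot be sliced, so Python tries to use the slice object as a key, and slices are not hashable. If the dict form is convenient, one option (a sketch that simply reuses the chunker above) is to chunk the dict's items instead:

dictionary = dict(zip(points, pointNames))
items = list(dictionary.items())        # [('1', 'name1'), ('2', 'name2'), ...]

for group in chunker(items, 25):        # each group is a list of up to 25 (point, name) pairs
    print(group)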
A generator would be more efficient:
import itertools

def chunker(size, *seq):
    # zip() returns a lazy iterator in Python 3, so islice can consume it chunk by chunk
    seq = zip(*seq)
    while True:
        val = list(itertools.islice(seq, size))
        if not val:
            break
        yield val

for group in chunker(2, pointNames, points):
    print(group)
gen_groups = chunker(2, pointNames, points, pointNames, points)
group = next(gen_groups)
print(group)
Using *seq allows you to pass any number of lists as parameters.
How about this relatively minimal change to your first function:
def chunker(seq1, seq2, size):
    seq = list(zip(seq1, seq2))
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
Call as follows:
for group in chunker(pointNames, points, 25):
    print(group)
itertools can slice an iterator (or generator) into chunks; a small helper function keeps going until the iterator is exhausted:
import itertools

# helper function, https://stackoverflow.com/a/24527424
def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield itertools.chain([first], itertools.islice(iterator, size - 1))

# 600 points and pointNames
points = (str(i) for i in range(600))
pointNames = ('name ' + str(i) for i in range(600))

# work with the chunks
for chunk in chunks(zip(pointNames, points), 25):
    print('-' * 40)
    print(list(chunk))
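If it helps to see how the chunks could feed the URL from the question, here is a rough sketch: BASE_URL and the name=point query format are placeholders (the real endpoint will differ), and the lists are rebuilt because the generators above have already been consumed.

BASE_URL = 'https://example.com/api?'                 # placeholder, not the real endpoint

pointNames = ['name' + str(i) for i in range(600)]    # fresh lists; the generators
points = [str(i) for i in range(600)]                 # above are exhausted

for chunk in chunks(zip(pointNames, points), 25):
    query = '&'.join('{}={}'.format(name, point) for name, point in chunk)
    print(BASE_URL + query)                           # one request per group of 25 pairs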
I would like to know how I could write every output of the crack function below into a text file.
For example if I get:
abandon able ability
I would like to have this output (what I call the output of crack) written to a text file, and so on for every combination.
def generate(arr, i, s, len):
    # base case: when the requested length has been reached, print the string
    if (i == 0):
        print(s)
        return
    # iterate through the array: create a new string with the next character
    # and call generate again until the string has reached its length
    for j in range(0, len):
        appended = s + arr[j]
        generate(arr, i - 1, appended, len)
    return

# function to generate all possible passwords
def crack(arr, len):
    # call for all required lengths
    for i in range(3, 5):
        generate(arr, i, "", len)

# Driver Code
arr = ["abandon ", "ability ", "able ", "about ", "above ", "absent "]
len = len(arr)
crack(arr, len)
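One minimal way to capture that output in a file (a sketch of one approach, not the only one; generate_to_file, crack_to_file and the file name are made-up names) is to pass an open file handle down the recursion and write instead of print:

# Sketch: same recursion as above, but writing each generated string to a file
def generate_to_file(arr, i, s, n, outfile):
    if i == 0:
        outfile.write(s + '\n')     # was: print(s)
        return
    for j in range(n):
        generate_to_file(arr, i - 1, s + arr[j], n, outfile)

def crack_to_file(arr, n, filename='crack_output.txt'):   # file name is arbitrary
    with open(filename, 'w') as f:
        for i in range(3, 5):
            generate_to_file(arr, i, '', n, f)

words = ["abandon ", "ability ", "able ", "about ", "above ", "absent "]
crack_to_file(words, len(words))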
Demonstration of permutation vs combination of 3 items taken from a group of 5 items
In math, the words combination and permutation have specific and different meanings. For 2048 items taken 12 at a time:
permutation means the order of the 12 items is significant, and the result is about 5.271537971E+39 items.
combination means the order of the 12 items is not significant, and the result is about 1.100526171E+31 items.
Both combinations and permutations are easy to program in Python using the itertools library, but either program will take a long time to run.
import itertools

lower_case_letters = ['a', 'b', 'c', 'd', 'e']

print(' Permutations '.center(80, '*'))
permutations = []
sample_size = 3
for unique_sample in itertools.permutations(lower_case_letters, sample_size):
    permutations.append(unique_sample)
    print(f'{len(permutations)=} {unique_sample=}')

f = open('permutations_results.txt', 'w')
for permutation in permutations:
    f.write(", ".join(permutation) + '\n')
f.close()

print(' Combinations '.center(80, '*'))
combinations = []
for unique_sample in itertools.combinations(lower_case_letters, sample_size):
    combinations.append(unique_sample)
    print(f'{len(combinations)=} {unique_sample=}')

f = open('combinations_results.txt', 'w')
for combination in combinations:
    f.write(", ".join(combination) + '\n')
f.close()
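If only the counts quoted above are needed (not the items themselves), the standard library can compute them directly; math.perm and math.comb exist in Python 3.8 and later:

import math

print(math.perm(2048, 12))   # ordered selections, roughly 5.27e+39
print(math.comb(2048, 12))   # unordered selections, roughly 1.10e+31
print(math.perm(5, 3))       # 60, the number of permutations written by the loop above
print(math.comb(5, 3))       # 10, the number of combinations written by the loop above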
How do you remove all occurrences of a given item from a list in Python? Example:
l = list('need')
If 'e' is the given item then
l = list('nd')
The set() function will not do the trick since it will remove all duplicates.
count() and remove() are not efficient.
Use filter, assuming you write a function that decides which items you want to keep in the list. For your example:
def pred(x):
    return x != "e"

l = list("need")
l = list(filter(pred, l))
Assuming given = 'e' and l = list('need'):
for i in range(l.count(given)):
    l.remove(given)
If you just want to remove 'e' from each word in a list, you can use regex re.sub(). If you also want a count of how many occurrences of 'e' were removed from each word, you can use re.subn(). The first gives you a list of strings; the second gives you a list of tuples (string, n), where n is the number of occurrences replaced.
import re
lst = list(('need','feed','seed','deed','made','weed','said'))
j = [re.sub('e','',i) for i in lst]
k = [re.subn('e','',i) for i in lst]
The outputs for j and k are:
j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]
If you want to count the total changes made, just iterate through k and sum the counts. There are simpler ways too; you can use a single regex call:
re.subn('e', '', ''.join(lst))[1]
This will give you the total number of occurrences replaced across the list.
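For example, summing the per-word counts from k gives the same total as the single subn call (11 for the sample list above):

total = sum(n for _, n in k)
print(total)                                  # 11
print(re.subn('e', '', ''.join(lst))[1])      # also 11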
List comprehension method. I'm not sure if the size/complexity is less than that of count and remove:
def scrub(l, given):
    return [i for i in l if i not in given]
Filter method; again, I'm not sure:
def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))
Brute force with recursion; there are a lot of potential downfalls, but it's still an option. Again, I don't know the size/complexity:
def bruteforce(l, given):
    try:
        l.remove(given[0])
        return bruteforce(l, given)
    except ValueError:
        return bruteforce(l, given[1:])
    except IndexError:
        return l
    return l
For those of you curious about the actual time taken by the above methods, I've taken the liberty of testing them below!
Below is the method I've chosen to use.
import datetime

def timer(func, name):
    print("-------{}-------".format(name))
    try:
        start = datetime.datetime.now()
        x = func()
        end = datetime.datetime.now()
        print((end - start).microseconds)
    except Exception as e:
        print("Failed: {}".format(e))
    print("\r")
The dataset we are testing against, where l is our original list, q is the items we want to remove, and r is our expected result:
l = list("need"*50000)
q = list("ne")
r = list("d"*50000)
For posterity I've added the count / remove method the OP was against. (For good reason!)
def count_remove(l, given):
    for i in given:
        for x in range(l.count(i)):
            l.remove(i)
    return l
All that's left to do is test!
timer(lambda: scrub(l, q), "List Comp")
assert(scrub(l,q) == r)
timer(lambda: filter_by(l, q), "Filter")
assert(filter_by(l,q) == r)
timer(lambda : count_remove(l, q), "Count/Remove")
assert(count_remove(l,q) == r)
timer(lambda: bruteforce(l, q), "Bruteforce")
assert(bruteforce(l,q) == r)
And our results
-------List Comp-------
10000
-------Filter-------
28000
-------Count/Remove-------
199000
-------Bruteforce-------
Failed: maximum recursion depth exceeded
Process finished with exit code 0
The Recursion method failed with a larger dataset, but we expected this. I tested on smaller datasets, and Recursion is marginally slower. I thought it would be faster.
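As a side note on methodology (my addition, not part of the original test): the timeit module repeats the call and usually gives steadier numbers than a single datetime delta, and count_remove and bruteforce mutate l in place, so they need a fresh copy per call or later runs see an already-scrubbed list.

import timeit

l = list("need" * 50000)
q = list("ne")

# scrub and filter_by do not mutate l, so they can be timed directly
print(timeit.timeit(lambda: scrub(l, q), number=10))
print(timeit.timeit(lambda: filter_by(l, q), number=10))

# count_remove mutates its argument, so hand it a fresh copy on every call
print(timeit.timeit(lambda: count_remove(list(l), q), number=10))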
I would like to parse an HDF file that has the following format
HDFFile/
    Group1/Subgroup1/DataNDArray
           ...
          /SubgroupN/DataNDArray
    ...
    GroupM/Subgroup1/DataNDArray
           ...
          /SubgroupN/DataNDArray
I am trying to use itertools.product, but I get stuck on what to use for the second iterator.
MWE:
from itertools import *
import h5py

hfilename = 'data.hdf'

with h5py.File(hfilename, 'r') as hfile:
    for group, subgroup, dim in product(hfile.itervalues(), ????, range(10)):
        parse(group, subgroup, dim)
Obviously my problem is that the second iterator would depend on the extracted value of the first iterator, which can't be available in the same one liner.
I know that I can do it with for loops or with the following example:
with h5py.File(hfilename, 'r') as hfile:
    for group in hfile.itervalues():
        for subgroup, dim in product(group.itervalues(), range(10)):
            parse(group, subgroup, dim)
but I was wondering if there is a way to do it in one itertools run.
Does the second iterator depend on the extracted value of the first iterator? From your example it seems like there are N subgroups in every group.
A solution with list comprehensions and () generators (instead of product) would look like:
M = 3
N = 2
a = ['Group' + str(m) for m in range(1, M + 1)]
b = ['Subgroup' + str(n) for n in range(1, N + 1)]
c = ('{}/{}/DataNDArray'.format(ai, bi) for ai in a for bi in b)
for key in c:
    print(key)
and prints:
Group1/Subgroup1/DataNDArray
Group1/Subgroup2/DataNDArray
Group2/Subgroup1/DataNDArray
Group2/Subgroup2/DataNDArray
Group3/Subgroup1/DataNDArray
Group3/Subgroup2/DataNDArray
which should be what you want.
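If the goal is literally a single itertools expression over the real file, a rough sketch along these lines might work. Assumptions on my part: .values() is the Python 3 h5py spelling of the question's itervalues(), and parse is the question's own function.

from itertools import chain, product
import h5py

with h5py.File(hfilename, 'r') as hfile:
    # one flat stream of (group, subgroup, dim) triples
    triples = chain.from_iterable(
        product([group], group.values(), range(10)) for group in hfile.values()
    )
    for group, subgroup, dim in triples:
        parse(group, subgroup, dim)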
input = ['beleriand','mordor','hithlum','eol','morgoth','melian','thingol']
I'm having trouble creating X number of lists of size Y without repeating any elements.
What I have been doing is using:
x = 3
y = 2
import random
output = random.sample(input, y)
# ['mordor', 'thingol']
but if I repeat this then I will have repeats.
I would like the output to be something like
[['mordor', 'thingol'], ['melian', 'hithlum'], ['beleriand', 'eol']]
since I chose x = 3 (3 lists) of size y = 2 (2 elements per list).
def random_generator(x,y):
....
You can simply shuffle the original list and then generate n groups of m elements successively from it. There may be fewer or more than that number of groups possible. Note that input is the name of a Python built-in function, so I renamed it words.
import itertools
from pprint import pprint
import random

def random_generator(seq, n, m):
    rand_seq = seq[:]  # make a copy to avoid changing the input argument
    random.shuffle(rand_seq)
    lists = []
    limit = n - 1
    for i, group in enumerate(itertools.izip(*([iter(rand_seq)] * m))):
        lists.append(group)
        if i == limit:
            break  # have enough
    return lists

words = ['beleriand', 'mordor', 'hithlum', 'eol', 'morgoth', 'melian', 'thingol']
pprint(random_generator(words, 3, 2))
Output:
[('mordor', 'hithlum'), ('thingol', 'melian'), ('morgoth', 'beleriand')]
It would be more Pythonic to generate the groups iteratively. The above function could easily be turned into a generator by making it yield each group, one by one, instead of returning them all in a relatively much longer list-of-lists:
def random_generator_iterator(seq, n, m):
    rand_seq = seq[:]
    random.shuffle(rand_seq)
    limit = n - 1
    for i, group in enumerate(itertools.izip(*([iter(rand_seq)] * m))):
        yield group
        if i == limit:
            break

pprint([group for group in random_generator_iterator(words, 3, 2)])
Rather than randomly taking two things from your list, just randomize your list and iterate through it to create your new array of the dimensions you specify!
import random

my_input = ['beleriand', 'mordor', 'hithlum', 'eol', 'morgoth', 'melian', 'thingol']

def random_generator(array, x, y):
    random.shuffle(array)
    result = []
    count = 0
    while count < x:
        section = []
        y1 = y * count
        y2 = y * (count + 1)
        for i in range(y1, y2):
            section.append(array[i])
        result.append(section)
        count += 1
    return result

print random_generator(my_input, 3, 2)
You could use random.sample in combination with the grouper recipe from the itertools documentation.
input = ['beleriand', 'mordor', 'hithlum', 'eol', 'morgoth', 'melian', 'thingol']

import itertools
import random

def grouper(iterable, group_size):
    return itertools.izip(*([iter(iterable)] * group_size))

def random_generator(x, y):
    k = x * y
    sample = random.sample(input, k)
    return list(grouper(sample, y))

print random_generator(3, 2)
print random_generator(3, 2)
print random_generator(3, 2)
print random_generator(3, 2)
print random_generator(3, 2)
print random_generator(3, 2)
For one run, this results in:
[('melian', 'mordor'), ('hithlum', 'eol'), ('thingol', 'morgoth')]
[('hithlum', 'thingol'), ('mordor', 'beleriand'), ('morgoth', 'eol')]
[('morgoth', 'beleriand'), ('melian', 'thingol'), ('hithlum', 'mordor')]
[('beleriand', 'thingol'), ('melian', 'hithlum'), ('eol', 'morgoth')]
[('mordor', 'hithlum'), ('eol', 'beleriand'), ('melian', 'morgoth')]
[('mordor', 'melian'), ('thingol', 'beleriand'), ('morgoth', 'eol')]
And the next run:
[('mordor', 'thingol'), ('eol', 'hithlum'), ('melian', 'beleriand')]
[('eol', 'beleriand'), ('mordor', 'melian'), ('hithlum', 'thingol')]
[('hithlum', 'mordor'), ('thingol', 'morgoth'), ('melian', 'eol')]
[('morgoth', 'eol'), ('mordor', 'thingol'), ('melian', 'beleriand')]
[('melian', 'morgoth'), ('mordor', 'eol'), ('thingol', 'hithlum')]
[('mordor', 'morgoth'), ('hithlum', 'thingol'), ('eol', 'melian')]
I am trying to find all occurrences of sub-strings in a main string (of all lengths). My function takes one string and then returns a dictionary of every sub-string (which occurs more than once, of course) and how many times it occurs (format of the dictionary: {substring: # of occurrences, ...}). I am using collections.Counter(s) to help me with it.
Here is my function:
from collections import Counter

def patternFind(s):
    patterns = {}
    for index in range(1, len(s) + 1)[::-1]:
        d = nChunks(s, step=index)
        parts = dict(Counter(d))
        patterns.update({elem: parts[elem] for elem in parts.keys() if parts[elem] > 1})
    return patterns

def nChunks(iterable, start=0, step=1):
    return [iterable[i:i+step] for i in range(start, len(iterable), step)]
I have a string, data with about 2500 random letters (in a random order). However, there are 2 strings inserted into it (random points). Say this string is 'TEST'. data.count('TEST') returns 2. However, patternFind(data)['TEST'] gives me a KeyError. Therefore, my program does not detect the two strings in it.
What have I done wrong? Thanks!
Edit: My method of creating testing-instances:
from random import randint, choice
from string import uppercase  # ascii_uppercase in Python 3

def createNewTest():
    n = randint(500, 2500)
    x, y = randint(500, n), randint(500, n)
    s = ''
    for i in range(n):
        s += choice(uppercase)
        if i == x or i == y:
            s += "TEST"
    return s
Using Regular Expressions
Apart from the count() method you described, regex is an obvious alternative
import re
needle = r'TEST'
haystack = 'khjkzahklahjTESTkahklaghTESTjklajhkhzkhjkzahklahjTESTkahklagh'
pattern = re.compile(needle)
print len(re.findall(pattern, haystack))
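One caveat worth noting (my addition): a plain pattern, like str.count(), only counts non-overlapping matches. When counting arbitrary substrings, overlaps matter, and a zero-width lookahead finds them:

print(len(re.findall(r'(?=ana)', 'banana')))   # 2 overlapping hits
print('banana'.count('ana'))                   # 1, because overlaps are skipped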
Short Cut
If you need to build a dictionary of substrings, possibly you can do this with only a subset of those strings. Assuming you know the needle you are looking for in the data, you only need the dictionary of substrings of data that are the same length as the needle. This is very fast.
from collections import Counter

needle = "TEST"

def gen_sub(s, len_chunk):
    for start in range(0, len(s) - len_chunk + 1):
        yield s[start:start + len_chunk]

data = 'khjkzahklahjTESTkahklaghTESTjklajhkhzkhjkzahklahjTESTkahklaghTESz'
parts = Counter([sub for sub in gen_sub(data, len(needle))])
print parts[needle]
Brute Force: building dictionary of all substrings
If you need to have a count of all possible substrings, this works but it is very slow:
from collections import Counter

def gen_sub(s):
    for start in range(0, len(s)):
        for end in range(start + 1, len(s) + 1):
            yield s[start:end]

data = 'khjkzahklahjTESTkahklaghTESTjklajhkhz'
parts = Counter([sub for sub in gen_sub(data)])
print parts['TEST']
Substring generator adapted from this: https://stackoverflow.com/a/8305463/1290420
While jurgenreza has explained why your program didn't work, the solution is still quite slow. If you only examine substrings s for which you know that s[:-1] repeats, you get a much faster solution (typically a hundred times faster and more):
from collections import defaultdict

def pfind(prefix, sequences):
    collector = defaultdict(list)
    for sequence in sequences:
        collector[sequence[0]].append(sequence)
    for item, matching_sequences in collector.items():
        if len(matching_sequences) >= 2:
            new_prefix = prefix + item
            yield (new_prefix, len(matching_sequences))
            for r in pfind(new_prefix, [sequence[1:] for sequence in matching_sequences]):
                yield r

def find_repeated_substrings(s):
    s0 = s + " "
    return pfind("", [s0[i:] for i in range(len(s))])
If you want a dict, you call it like this:
result = dict(find_repeated_substrings(s))
On my machine, for a run with 2247 elements, it took 0.02 sec, while the original (corrected) solution took 12.72 sec.
(Note that this is a rather naive implementation; using indexes instead of substrings should be even faster.)
Edit: The following variant works with other sequence types (not only strings). Also, it doesn't need a sentinel element.
from collections import defaultdict

def pfind(s, length, ends):
    collector = defaultdict(list)
    if ends[-1] >= len(s):
        del ends[-1]
    for end in ends:
        if end < len(s):
            collector[s[end]].append(end)
    for key, matching_ends in collector.items():
        if len(matching_ends) >= 2:
            end = matching_ends[0]
            yield (s[end - length: end + 1], len(matching_ends))
            for r in pfind(s, length + 1, [end + 1 for end in matching_ends if end < len(s)]):
                yield r

def find_repeated_substrings(s):
    return pfind(s, 0, list(range(len(s))))
This still has the problem that very long substrings will exceed recursion depth. You might want to catch the exception.
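One possible workaround (an assumption on my part, not from the original answer) is to raise CPython's recursion limit, which defaults to roughly 1000 frames, though rewriting the recursion iteratively is the safer fix for very long inputs:

import sys

sys.setrecursionlimit(10000)   # allow deeper recursion before the exception fires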
The problem is in your nChunks function. It does not give you all the chunks that are necessary.
Let's consider a test string:
s='1test2345test'
For the chunks of size 4 your nChunks function gives this output:
>>>nChunks(s, step=4)
['1tes', 't234', '5tes', 't']
But what you really want is:
>>> def nChunks(iterable, start=0, step=1):
...     return [iterable[i:i+step] for i in range(len(iterable)-step+1)]
>>> nChunks(s, step=4)
['1tes', 'test', 'est2', 'st23', 't234', '2345', '345t', '45te', '5tes', 'test']
You can see that this way there are two 'test' chunks and your patternFind(s) will work like a charm:
>>> patternFind(s)
{'tes': 2, 'st': 2, 'te': 2, 'e': 2, 't': 4, 'es': 2, 'est': 2, 'test': 2, 's': 2}
Here you can find a solution that uses a recursive wrapper around string.find() to search for all the occurrences of a substring in a main string.
The collectallchuncks() function returns a defaultdict with all the substrings as keys and, for each substring, a list of all the indexes where the substring is found in the main string.
import collections

# Minimum substring size, may be 1
MINSIZE = 3

# Recursive wrapper
def recfind(p, data, pos, acc):
    res = data.find(p, pos)
    if res == -1:
        return acc
    else:
        acc.append(res)
        return recfind(p, data, res + 1, acc)

def collectallchuncks(data):
    res = collections.defaultdict(str)
    size = len(data)
    for base in xrange(size):
        for seg in xrange(MINSIZE, size - base + 1):
            chunk = data[base:base + seg]
            if data.count(chunk) > 1:
                res[chunk] = recfind(chunk, data, 0, [])
    return res

if __name__ == "__main__":
    data = 'khjkzahklahjTESTkahklaghTESTjklajhkhzkhjkzahklahjTESTkahklaghTESz'
    allchuncks = collectallchuncks(data)
    print 'TEST', allchuncks['TEST']
    print 'hklag', allchuncks['hklag']
EDIT: If you just need the number of occurrences of each substring in the main string, you can easily obtain it by getting rid of the recursive function:
import collections

MINSIZE = 3

def collectallchuncks2(data):
    res = collections.defaultdict(str)
    size = len(data)
    for base in xrange(size):
        for seg in xrange(MINSIZE, size - base + 1):
            chunk = data[base:base + seg]
            cnt = data.count(chunk)
            if cnt > 1:
                res[chunk] = cnt
    return res

if __name__ == "__main__":
    data = 'khjkzahklahjTESTkahklaghTESTjklajhkhzkhjkzahklahjTESTkahklaghTESz'
    allchuncks = collectallchuncks2(data)
    print 'TEST', allchuncks['TEST']
    print 'hklag', allchuncks['hklag']