I have created the dictionary d below, and I want the key ('a','b','c') to hold both values 'd' and 'e'.
test = ['a','b','c','d','a','b','c','e','p','q','r','s']
test2= tuple(test)
d = {test2[i:i+3]:test2[i+3] for i in range(0,len(test2)-3,1)}
print(d)
The output is:
{('a', 'b', 'c'): 'e', ('b', 'c', 'd'): 'a', ('c', 'd', 'a'): 'b', ('d', 'a', 'b'): 'c', ('b', 'c', 'e'): 'p', ('c', 'e', 'p'): 'q', ('e', 'p', 'q'): 'r', ('p', 'q', 'r'): 's'}
The intended output is:
{('a', 'b', 'c'): ('d','e'), ('b', 'c', 'd'): 'a', ('c', 'd', 'a'): 'b', ('d', 'a', 'b'): 'c', ('b', 'c', 'e'): 'p', ('c', 'e', 'p'): 'q', ('e', 'p', 'q'): 'r', ('p', 'q', 'r'): 's'}
Question: As the first comment points out, a key keeps only its most recent value, here 'e'. How can I change the code to get the intended output? Thanks.
Option 1:
This creates d as a defaultdict (d = defaultdict(list)).
It loops through the data in a for loop and appends every value for a key to that key's list.
from collections import defaultdict
d = defaultdict(list)  # missing keys start out as an empty list
for i in range(0, len(test2)-3):
    d[test2[i:i+3]].append(test2[i+3])  # append the value to the list stored under this key
Option 2: Similar in form to option 1, but uses normal dictionary with setdefault to have key entries be lists
d = {}
for i in range(0,len(test2)-3):
    d.setdefault(test2[i:i+3], []).append(test2[i+3])
Both options produce the same entries (option 2 as a plain dict rather than a defaultdict):
defaultdict(<class 'list'>,
{ ('a', 'b', 'c'): ['d', 'e'],
('b', 'c', 'd'): ['a'],
('b', 'c', 'e'): ['p'],
('c', 'd', 'a'): ['b'],
('c', 'e', 'p'): ['q'],
('d', 'a', 'b'): ['c'],
('e', 'p', 'q'): ['r'],
('p', 'q', 'r'): ['s']})
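If the exact shape from the question is needed (a bare value for a single follower, a tuple for several), the lists can be collapsed afterwards. This is a minimal sketch that assumes the list-valued mapping built by the options above; the names grouped and intended are only illustrative.

```python
from collections import defaultdict

test = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'p', 'q', 'r', 's']

# Same grouping as option 1: every 3-item window maps to the items that follow it.
grouped = defaultdict(list)
for i in range(len(test) - 3):
    grouped[tuple(test[i:i + 3])].append(test[i + 3])

# Collapse one-element lists to the bare value; keep longer ones as tuples.
intended = {key: values[0] if len(values) == 1 else tuple(values)
            for key, values in grouped.items()}

print(intended[('a', 'b', 'c')])  # ('d', 'e')
print(intended[('b', 'c', 'd')])  # a
```

Mixed value types mean every consumer has to check what it got, so keeping lists throughout is often easier to work with.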
Here's another solution you can try as well:
from collections import defaultdict
test = ['a','b','c','d','a','b','c','e','p','q','r','s']
d = defaultdict(list)
while len(test) >= 4:
    *key, value = test[:4]  # key -> a, b, c  value -> d
    d[tuple(key)].append(value)
    test = test[1:]
print(d)
# defaultdict(<class 'list'>, {('a', 'b', 'c'): ['d', 'e'], ('b', 'c', 'd'): ['a'], ('c', 'd', 'a'): ['b'], ('d', 'a', 'b'): ['c'], ('b', 'c', 'e'): ['p'], ('c', 'e', 'p'): ['q'], ('e', 'p', 'q'): ['r'], ('p', 'q', 'r'): ['s']})
The basic idea is that it keeps shrinking the list test until fewer than 4 items are left. On each pass it groups the first three items (a, b, c) into a tuple key and takes the fourth item (d) as the value.
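The shrinking-list loop copies the remaining list on every pass (test = test[1:]) and leaves test nearly empty afterwards. A possible variant, sketched here under the assumption that the input should stay untouched, walks an index instead; follow_map and width are made-up names for illustration.

```python
from collections import defaultdict

def follow_map(seq, width=3):
    """Map each `width`-item window of seq to the list of items that follow it."""
    result = defaultdict(list)
    for i in range(len(seq) - width):
        # Same star-unpacking as above, applied to a window instead of the list head.
        *key, value = seq[i:i + width + 1]
        result[tuple(key)].append(value)
    return result

test = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'p', 'q', 'r', 's']
d = follow_map(test)
print(d[('a', 'b', 'c')])  # ['d', 'e']
print(len(test))           # 12, the input list is not consumed
```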
I have a list of URLs to crawl: ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
Code:
prev_domain = ''
while urls:
    url = urls.pop()
    if base_url(url) == prev_domain:  # base_url is a custom function that returns the domain of a URL
        urls.append(url)  # is this possible?
        continue
    else:
        crawl(url)
Basically, I don't want to crawl pages of the same domain back to back. Crawling one domain continuously returns HTTP status 429 Too Many Requests: the user has sent too many requests in a given amount of time ("rate limiting"). To work around this, I'm planning to use the logic below.
Loop through all items in the list and compare the base URL of the current element with the base URL of the previously processed element.
If the base URLs are different, process the element; otherwise do not process it, just append it to the same list.
Note: If all URLs in the list are of the same domain, add a delay before processing each element.
Please provide your thoughts.
Your algorithm is almost correct, but not the implementation:
>>> L = [1,2,3]
>>> L.pop()
3
>>> L.append(3)
>>> L
[1, 2, 3]
That's why your program loops forever: if the domain is the same as the previous domain, you just append then pop then append, then.... What you need is not a stack, it's a round robin:
>>> L.pop()
3
>>> L.insert(0, 3)
>>> L
[3, 1, 2]
Let's take a shuffled list of permutations of "abcd":
>>> L = [('b', 'c', 'd', 'a'), ('d', 'c', 'b', 'a'), ('a', 'c', 'd', 'b'), ('c', 'd', 'a', 'b'), ('b', 'd', 'a', 'c'), ('b', 'a', 'd', 'c'), ('b', 'c', 'a', 'd'), ('a', 'b', 'd', 'c'), ('d', 'a', 'b', 'c'), ('a', 'b', 'c', 'd'), ('d', 'c', 'a', 'b'), ('a', 'd', 'c', 'b'), ('d', 'a', 'c', 'b'), ('c', 'd', 'b', 'a'), ('d', 'b', 'c', 'a'), ('d', 'b', 'a', 'c'), ('a', 'd', 'b', 'c'), ('b', 'd', 'c', 'a'), ('c', 'b', 'd', 'a'), ('c', 'a', 'b', 'd'), ('b', 'a', 'c', 'd')]
The first letter is the domain. Here's a slightly modified version of your code:
>>> prev = None
>>> while L:
...     e = L.pop()
...     if L and e[0] == prev:
...         L.insert(0, e)
...     else:
...         print(e)
...         prev = e[0]
...
('b', 'a', 'c', 'd')
('c', 'a', 'b', 'd')
('b', 'd', 'c', 'a')
('a', 'd', 'b', 'c')
('d', 'b', 'a', 'c')
('c', 'd', 'b', 'a')
('d', 'a', 'c', 'b')
('a', 'd', 'c', 'b')
('d', 'c', 'a', 'b')
('a', 'b', 'c', 'd')
('d', 'a', 'b', 'c')
('a', 'b', 'd', 'c')
('b', 'c', 'a', 'd')
('c', 'd', 'a', 'b')
('a', 'c', 'd', 'b')
('d', 'c', 'b', 'a')
('b', 'c', 'd', 'a')
('c', 'b', 'd', 'a')
('d', 'b', 'c', 'a')
('b', 'a', 'd', 'c')
('b', 'd', 'a', 'c')
The modification is the "if L and" test: if the last remaining element has the same domain as prev, then without that test you would loop forever on a one-element list: pop, same as prev, insert, pop, ... (as with pop/append).
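Applied to real URLs, the same round robin might look like the sketch below. It is only an illustration under some assumptions: base_url is a hypothetical stand-in for the asker's helper built on urllib.parse.urlsplit, crawl is passed in as a callable, and the delay parameter covers the note about lists where only one domain is left. A collections.deque is used because appendleft is cheaper than list.insert(0, ...).

```python
import time
from collections import deque
from urllib.parse import urlsplit

def base_url(url):
    # Hypothetical stand-in for the question's helper: scheme plus host.
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

def crawl_order(urls, crawl, delay=0.0):
    """Crawl urls, avoiding two consecutive hits on one domain where possible."""
    queue = deque(urls)
    prev = None
    skipped = 0  # consecutive deferrals since the last crawl
    while queue:
        url = queue.pop()
        domain = base_url(url)
        if domain == prev and skipped < len(queue):
            # Some queued URL has not been looked at yet: defer this one.
            queue.appendleft(url)
            skipped += 1
            continue
        if domain == prev:
            # Everything left belongs to the previous domain: wait, then go on.
            time.sleep(delay)
        crawl(url)
        prev = domain
        skipped = 0

visited = []
crawl_order(['http://domain1.com', 'http://domain1.com/page1', 'http://domain2.com'],
            visited.append)
print(visited)
# ['http://domain2.com', 'http://domain1.com/page1', 'http://domain1.com']
```

The skipped counter plays the role of the "if L and" guard: once every other queued URL has been deferred, the loop stops rotating and falls back to the delay.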
Here's another option: create a dict domain -> list of urls:
>>> d = {}
>>> for e in L:
... d.setdefault(e[0], []).append(e)
>>> d
{'b': [('b', 'c', 'd', 'a'), ('b', 'd', 'a', 'c'), ('b', 'a', 'd', 'c'), ('b', 'c', 'a', 'd'), ('b', 'd', 'c', 'a'), ('b', 'a', 'c', 'd')], 'd': [('d', 'c', 'b', 'a'), ('d', 'a', 'b', 'c'), ('d', 'c', 'a', 'b'), ('d', 'a', 'c', 'b'), ('d', 'b', 'c', 'a'), ('d', 'b', 'a', 'c')], 'a': [('a', 'c', 'd', 'b'), ('a', 'b', 'd', 'c'), ('a', 'b', 'c', 'd'), ('a', 'd', 'c', 'b'), ('a', 'd', 'b', 'c')], 'c': [('c', 'd', 'a', 'b'), ('c', 'd', 'b', 'a'), ('c', 'b', 'd', 'a'), ('c', 'a', 'b', 'd')]}
Now take one element from every domain, drop the domains that are exhausted, and repeat until the dict is empty:
>>> while d:
...     for k, vs in d.items():
...         e = vs.pop()
...         print (e)
...     d = {k: vs for k, vs in d.items() if vs} # drop domains that have no elements left
...
('b', 'a', 'c', 'd')
('d', 'b', 'a', 'c')
('a', 'd', 'b', 'c')
('c', 'a', 'b', 'd')
('b', 'd', 'c', 'a')
('d', 'b', 'c', 'a')
('a', 'd', 'c', 'b')
('c', 'b', 'd', 'a')
('b', 'c', 'a', 'd')
('d', 'a', 'c', 'b')
('a', 'b', 'c', 'd')
('c', 'd', 'b', 'a')
('b', 'a', 'd', 'c')
('d', 'c', 'a', 'b')
('a', 'b', 'd', 'c')
('c', 'd', 'a', 'b')
('b', 'd', 'a', 'c')
('d', 'a', 'b', 'c')
('a', 'c', 'd', 'b')
('b', 'c', 'd', 'a')
('d', 'c', 'b', 'a')
The output is more uniform.
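The same grouping idea can also be written with itertools.zip_longest, which takes one item per domain per round. This is a sketch rather than a drop-in replacement: interleave_by_domain and domain_of are illustrative names, and unlike the loop above it takes items from the front of each group instead of popping from the end.

```python
from itertools import zip_longest

def interleave_by_domain(items, domain_of):
    """Return items reordered so that each round takes one item per domain."""
    groups = {}
    for item in items:
        groups.setdefault(domain_of(item), []).append(item)
    missing = object()  # sentinel that marks an exhausted group
    return [item
            for batch in zip_longest(*groups.values(), fillvalue=missing)
            for item in batch
            if item is not missing]

order = interleave_by_domain(['b1', 'b2', 'b3', 'd1', 'a1', 'a2'],
                             domain_of=lambda s: s[0])
print(order)  # ['b1', 'd1', 'a1', 'b2', 'a2', 'b3']
```

If one domain has far more items than the others, the tail of the result is still all that domain, so a delay is needed there as well.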
Check the following code snippet,
urls = ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
crawl_for_urls = {}
for url in urls:
    domain = base_url(url)
    if domain not in crawl_for_urls:
        crawl_for_urls[domain] = url
        crawl(url)
crawl() will be called only once per unique domain.
Or you can use:
urls = ['http://domain1.com','http://domain1.com/page1','http://domain2.com']
crawl_for_urls = {}
for url in urls:
    domain = base_url(url)
    if domain not in crawl_for_urls:
        crawl_for_urls[domain] = [url]
        crawl(url)
    else:
        crawl_for_urls[domain].append(url)
This way you can categorize the URLs by domain and still call crawl() only once per unique domain.
Given a vocabulary dict {'A': 3, 'B': 4, 'C': 5, 'AB':6} and a sentence to be segmented, ABCAB,
I need to create all possible segmentations of this sentence, such as
[['A', 'B', 'C', 'A', 'B'], ['A', 'B', 'C', 'AB'], ['AB', 'C', 'AB'], ['AB', 'C', 'A', 'B']]
That's what I have:
def find_words(sentence):
    for i in range(len(sentence)):
        for word_length in range(1, max_word_length + 1):
            word = sentence[i:i+word_length]
            print(word)
            if word not in test_dict:
                continue
            if i + word_length <= len(sentence):
                if word.startswith(sentence[0]) and word not in words and word not in ''.join(words):
                    words.append(word)
                else:
                    continue
                next_position = i + word_length
                if next_position >= len(sentence):
                    continue
                else:
                    find_ngrams(sentence[next_position:])
    return words
But it returns only one list.
I was also looking for something useful in itertools but I couldn't find anything obviously useful. Might've missed it, though.
Try all possible prefixes and recursively do the same for the rest of the sentence.
VOC = {'A', 'B', 'C', 'AB'} # could be a dict
def parse(snt):
    if snt == '':
        yield []
    for w in VOC:
        if snt.startswith(w):
            for rest in parse(snt[len(w):]):
                yield [w] + rest
print(list(parse('ABCAB')))
# [['AB', 'C', 'AB'], ['AB', 'C', 'A', 'B'],
# ['A', 'B', 'C', 'AB'], ['A', 'B', 'C', 'A', 'B']]
# (the order can differ between runs because VOC is a set)
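On longer sentences the recursion above parses the same suffix many times. One way to avoid that, sketched here as an assumption rather than part of the original answer, is to cache the result per suffix with functools.lru_cache; the function then returns tuples, because cached values should not be mutable. The name segmentations is illustrative, and sorting the vocabulary makes the order reproducible.

```python
from functools import lru_cache

VOC = {'A': 3, 'B': 4, 'C': 5, 'AB': 6}  # the dict from the question; only the keys are used

@lru_cache(maxsize=None)
def segmentations(snt):
    """Return every split of snt into vocabulary words, as a tuple of tuples."""
    if not snt:
        return ((),)  # exactly one way to segment the empty string
    found = []
    for word in sorted(VOC):
        if snt.startswith(word):
            for rest in segmentations(snt[len(word):]):
                found.append((word,) + rest)
    return tuple(found)

print(segmentations('ABCAB'))
# (('A', 'B', 'C', 'A', 'B'), ('A', 'B', 'C', 'AB'),
#  ('AB', 'C', 'A', 'B'), ('AB', 'C', 'AB'))
```

A sentence that cannot be segmented yields an empty tuple, for example segmentations('ABX').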
Although not the most efficient solution, this should work:
from itertools import product
dic = {'A': 3, 'B': 4, 'C': 5, 'AB': 6}
choices = list(dic.keys())
prod = []
for a in range(1, len(choices)+2):
    prod = prod + list(product(choices, repeat=a))
result = list(filter(lambda x: ''.join(x) == ''.join(choices), prod))
print(result)
# prints [('AB', 'C', 'AB'), ('A', 'B', 'C', 'AB'), ('AB', 'C', 'A', 'B'), ('A', 'B', 'C', 'A', 'B')]
Use itertools.permutations to get every ordering of the vocabulary keys.
import itertools
d ={'A': 3, 'B': 4, 'C': 5, 'AB':6}
l = list(d)  # the keys of d
print(list(itertools.permutations(l)))
[('A', 'B', 'C', 'AB'), ('A', 'B', 'AB', 'C'), ('A', 'C', 'B', 'AB'), ('A', 'C', 'AB', 'B'), ('A', 'AB', 'B', 'C'), ('A', 'AB', 'C', 'B'), ('B', 'A', 'C', 'AB'), ('B', 'A', 'AB', 'C'), ('B', 'C', 'A', 'AB'), ('B', 'C', 'AB', 'A'), ('B', 'AB', 'A', 'C'), ('B', 'AB', 'C', 'A'), ('C', 'A', 'B', 'AB'), ('C', 'A', 'AB', 'B'), ('C', 'B', 'A', 'AB'), ('C', 'B', 'AB', 'A'), ('C', 'AB', 'A', 'B'), ('C', 'AB', 'B', 'A'), ('AB', 'A', 'B', 'C'), ('AB', 'A', 'C', 'B'), ('AB', 'B', 'A', 'C'), ('AB', 'B', 'C', 'A'), ('AB', 'C', 'A', 'B'), ('AB', 'C', 'B', 'A')]
I have this code:
number = 2
size = 5
list_b = [("b","b","b")]
list_a = [("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a")]
for i in range(number):
    list_a.insert(size,list_b)
print(list_a)
it gives me this:
[('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('b', 'b', 'b'),
('b', 'b', 'b'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a')]
Basically, it inserts list_b twice at the position given by size.
I want a loop that inserts list_b number times after every size elements of list_a. It's difficult to explain, so here is the result that I want:
[('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('b', 'b', 'b'),
('b', 'b', 'b'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('b', 'b', 'b'),
('b', 'b', 'b'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('b', 'b', 'b'),
('b', 'b', 'b'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('b', 'b', 'b'),
('b', 'b', 'b'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('a', 'a', 'a'),
('b', 'b', 'b'),
('b', 'b', 'b'),...and so on]
EDIT
and if I had this:
list_a = [a, ] * 15
list_b = [b,]
s = 5
n = 2
I want to obtain this:
[b,b,a,a,a,a,a,b,b,b,b,a,a,a,a,a,b,b,b,b,a,a,a,a,a,b,b]
Since this is only an example and list_a, s and n will vary, how can I do this in one or two loops?
Thanks,
Favolas
For the sake of argument, I'll abbreviate ('a', 'a', 'a') as a and ('b', 'b', 'b') as b.
number=2
size=5
list_a=["a"]*20
list_b=["b"]
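workfor=len(list_a)+(len(list_a)//size)*number*len(list_b)  # final length after all insertions
i=0
while i<workfor:
    i+=size  # skip over the next block of original elements
    for times in range(number):
        for elem in list_b:
            list_a.insert(i,elem)
        i+=len(list_b)  # step past what was just inserted
print(list_a)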
Results in =>
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'a', 'b', 'b']
#!/usr/bin/python
number = 2
size = 5
list_b = [("b","b","b")]
list_a = [("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a"),("a","a","a")]
if __name__ == '__main__':
    insertion_count = len(list_a) // size
    for j in range(insertion_count):
        # compute insertion position, allowing for the items already inserted
        pos = (j+1)*size + j * number
        for i in range(number):
            list_a.insert(pos, list_b[0])  # insert the tuple itself, not the list wrapping it
    print(list_a)
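from itertools import chain, repeat
list_a = [('a', 'a', 'a')] * 15
list_b = [('b', 'b', 'b')]
# five references to one iterator over list_a, then two endless streams of the b item
a5b2s = [iter(list_a)] * 5 + [repeat(*list_b)] * 2
# zip stops when list_a runs out; leftover items beyond a multiple of 5 are dropped
list_a[:] = chain.from_iterable(zip(*a5b2s))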
>>> s,n=5,2
>>> a=[1,]*17
>>> b=2
>>> for i in range(len(a)//s*s,0,-s):
...     for j in range(n):
...         a.insert(i,b)
...
>>> a
[1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1]
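All of the answers above insert into list_a in place, which means tracking how positions shift after each insertion. An alternative, sketched here with the made-up name insert_every, is to build a new list chunk by chunk. It assumes the filler should follow only complete runs of size items, which matches the outputs shown above.

```python
def insert_every(items, filler, size, number):
    """Return a new list with `number` copies of filler after each full run of `size` items."""
    result = []
    for start in range(0, len(items), size):
        chunk = items[start:start + size]
        result.extend(chunk)
        if len(chunk) == size:  # a shorter trailing chunk gets no filler
            result.extend([filler] * number)
    return result

print(insert_every([1] * 17, 2, size=5, number=2))
# [1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1]
```

Because nothing is inserted into the middle of an existing list, no index arithmetic is needed and the input list is left unchanged.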