I'm constructing lists in Python, with the aim of iterating through each one individually and concatenating all the different combinations. There are three components to each "product" code in a list:
1) a letter, e.g. 'A' (first part of the product code, unique to each product entry). Let's say the range here is:
['A', 'B', 'C']
2) a letter and number, e.g. 'S2' (2nd part, has several variations...might be 'D3' or 'E5' instead)
3) a period ('.') and a letter, e.g. '.X' (3rd part, unique to each product entry). Let's say the range here is:
['.X', '.Y', '.Z']
Since the 2nd part listed above has the most variations, my starting assumption is to construct lists with the 1st and 3rd parts already joined, since they are uniquely paired, e.g. 'A.Z', in order to reduce the number of different lists. But I would then still need to split each entry and insert the 2nd part between them via some 'concatenate' operation. So the question is: if I have another list with all variations of the 2nd part, what function(s) should I use to construct all variants of a product?
The full set of example values:
ListOne = ['A', 'B', 'C']
ListTwo = ['D3', 'D4', 'D5', 'E3', 'E4', 'E5']
ListThr = ['.X', '.Y', '.Z']
I need to create new lists as concatenations of all three lists, e.g. 'AD3.X', but there are no variants for ListOne vs ListThr: it will always be 'A' matched to '.X', 'B' to '.Y', and 'C' to '.Z'. The ListTwo products concatenated between the ListOne and ListThr products will need to be iterated so that all possible combinations are output as a new list, e.g.
ListOneNew = ['AD3.X', 'AD4.X', 'AD5.X', 'AE3.X', 'AE4.X', 'AE5.X']
ListTwoNew = ['BD3.Y', 'BD4.Y', 'BD5.Y', <and so on...>]
For simplicity's sake, should the script have a merged version of ListOne and ListThr e.g.
List = ['A.X', 'B.Y', 'C.Z']
and then split & concatenate with ListTwo products, or just have three Lists and concatenate from there?
from itertools import product
result = [a + b + c for a, b, c in product(ListOne, ListTwo, ListThr)]
With a plain list comprehension:
final = sorted([a + b + c for c in ListThr for b in ListTwo for a in ListOne])
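Note that both snippets above generate every ListOne/ListThr combination, while the question says those two parts are uniquely paired. If the lists line up index by index (an assumption based on the question's examples), a sketch that keeps the pairing and only iterates the middle part:
ListOne = ['A', 'B', 'C']
ListTwo = ['D3', 'D4', 'D5', 'E3', 'E4', 'E5']
ListThr = ['.X', '.Y', '.Z']

# zip keeps 'A' with '.X', 'B' with '.Y', 'C' with '.Z';
# only the middle part runs through all of its variations
new_lists = [[first + mid + last for mid in ListTwo]
             for first, last in zip(ListOne, ListThr)]
# new_lists[0] == ['AD3.X', 'AD4.X', 'AD5.X', 'AE3.X', 'AE4.X', 'AE5.X']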
I have my data in the form of a list of lists of lists; each item in the main list is a pair of quartets of strings like [['compasso', 'mail', 'coscia', 'nuotata'], ['braciola', 'pianto', 'violino', 'bevuta']], so that the whole list looks like this:
[[['compasso', 'mail', 'coscia', 'nuotata'],
  ['braciola', 'pianto', 'violino', 'bevuta']],
 [['compasso', 'pianto', 'racchetta', 'chiamata'],
  ['cornetto', 'fumata', 'femore', 'serenata']],
 [['fagiolo', 'frustata', 'racchetta', 'sussurro'],
  ['foglia', 'urlo', 'cappello', 'complimento']],
 ...
 [['kit-kat', 'trailer', 'piffero', 'ovazione'],
  ['gessetto', 'esplosione', 'ocarina', 'complimento']]]
I have to find a way to build all the possible lists of lists of lists where no string appears more than once, or, as another acceptable outcome, the biggest such list of lists of lists. To be clear, the output should be a list of lists of lists like the input, but where every string appears only once in the whole nested structure, and it should keep the same shape as the input, i.e. four strings in each inner list. For now, what I did was:
items = set()
unique = []
for octect in octects:
    quartet1, quartet2 = octect
    if set(quartet1 + quartet2).isdisjoint(items):
        unique.append(octect)
    for word in quartet1 + quartet2:
        items.add(word)
Where octects is my original list.
Although this solution returns a list where no string is repeated, it is not guaranteed to be the biggest possible subset of the original list's items, and of course it gives me only this one alternative.
Another option that came to my mind was to iterate over all the itertools.combinations(octects, n) and check whether the constraint is met, but this would not be very efficient computationally and I would have to decide a priori the number of items in the combination, so not ideal. However, the output I am looking for is of the kind that I would obtain with itertools.combinations: a combination of the original pairs of quartets, where no string is repeated. So a way to obtain that would be:
from itertools import chain, combinations

stimuli = []
for comb in combinations(octects, 40):
    merged = set(chain.from_iterable(chain.from_iterable(comb)))  # flatten the nested list
    if len(merged) == 320:  # 8 words x 40 pairs = 320 means there are no duplicates
        stimuli.append(comb)
        print(comb)
But (a) it would not be computationally efficient and (b) I would be specifying the number of items, without any way to maximize it.
General example of input
input = [[['a','b','c','d'],
          ['e','f','g','h']],  # (1) shares 'a' with (2)
         [['a','i','q','r'],
          ['s','t','u','v']],  # (2) shares 'a' with (1), 'i' with (3), and 'q','r','s','t','u','v' with (4)
         [['i','j','k','l'],
          ['m','n','o','p']],  # (3) shares 'i' with (2)
         [['q','r','s','t'],
          ['u','v','w','x']]]  # (4) shares 'q','r','s','t','u','v' with (2)
Output
The biggest list of lists of lists that I can obtain from this input is the one below, without (2), which shares items with the most other lists.
output = [[['a','b','c','d'],
           ['e','f','g','h']],
          [['i','j','k','l'],
           ['m','n','o','p']],
          [['q','r','s','t'],
           ['u','v','w','x']]]
If I were simply iterating as in my first solution, I would have excluded (3) and (4).
Thanks for your help!
Is this what you are looking for?
lll = [[['compasso', 'mail', 'coscia', 'nuotata'],
        ['braciola', 'pianto', 'violino', 'bevuta']],
       [['compasso', 'pianto', 'racchetta', 'chiamata'],
        ['cornetto', 'fumata', 'femore', 'serenata']],
       [['fagiolo', 'frustata', 'racchetta', 'sussurro'],
        ['foglia', 'urlo', 'cappello', 'complimento']]]
import itertools

items = set()
keep = []
for ll in lll:
    combined_set = set(itertools.chain(*ll))
    if combined_set & items:
        continue  # this pair repeats a word from an earlier kept pair
    items.update(combined_set)
    keep.append(ll)
lll[:] = keep  # overwrite the original list's contents in place
The original list, deduped in place, no longer contains pairs with items that appeared in an earlier kept pair:
print(lll)
[[['compasso', 'mail', 'coscia', 'nuotata'],
  ['braciola', 'pianto', 'violino', 'bevuta']],
 [['fagiolo', 'frustata', 'racchetta', 'sussurro'],
  ['foglia', 'urlo', 'cappello', 'complimento']]]
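Note that this greedy pass keeps the first non-overlapping pairs it encounters, which is not guaranteed to be the biggest subset the question asks for. A sketch of an exact alternative (exponential in the worst case, so only practical for small inputs) that tries subset sizes from the largest down:
from itertools import chain, combinations

def largest_disjoint_subset(octects):
    # Try every subset size n from len(octects) down to 1 and return the
    # first subset whose flattened words are all distinct.
    for n in range(len(octects), 0, -1):
        for comb in combinations(octects, n):
            words = list(chain.from_iterable(chain.from_iterable(comb)))
            if len(words) == len(set(words)):
                return list(comb)
    return []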
Let's suppose we have a list of strings in Python, named strings, and we execute this line:
lengths = [ len(value) for value in strings ]
Is the strings list order kept? I mean, can I be sure that lengths[i] corresponds to strings[i]?
I've tried many times and it works, but I'm not sure whether my experiments were special cases or the rule.
Thanks in advance
For lists, yes. That is one of the fundamental properties of lists: that they're ordered.
It should be noted, though, that what you're doing is known as "parallel arrays" (having several "arrays" to maintain a linked state), and it is often considered poor practice. If you change one list, you must change the other in the same way, or they'll fall out of sync, and then you have real problems.
A dictionary would likely be the better option here:
lengths_dict = {value:len(value) for value in strings}
print(lengths_dict["some_word"]) # Prints its length
Or maybe if you want lookups by index, a list of tuples:
lengths = [(value, len(value)) for value in strings]
word, length = lengths[1]
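Or, if the two lists already exist, zip pairs them back up for iteration without any extra bookkeeping (a small sketch using the question's names):
# iterate both lists in lockstep, relying on the preserved order
for word, length in zip(strings, lengths):
    print(word, length)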
Yes: since lists in Python are sequences, you can be sure that each length in lengths corresponds to the string at the same index, as the following code demonstrates:
a = ['a', 'ab', 'abc', 'abcd']
print([len(i) for i in a])
Output
[1, 2, 3, 4]
I have a list in Python called "multiple_ids" with a bunch of ids, and I have another list called "ids_singular" as well as another list called "alias".
"ids_singular" and "alias" are the same size, and each index of "ids_singular" corresponds to the same index of "alias". This means that, say, the third value in the "alias" list is another way to represent the third value in "ids_singular".
The list "multiple_ids" is larger than the other two lists and includes all values in "ids_singular", but with duplicates as well. Every id in "multiple_ids" can be found in "ids_singular".
What I am looking for is code that will replace each item (id) in "multiple_ids" with the matching alias from the "alias" list, based on the "ids_singular" list.
I have tried a double for loop: first iterate through all of "multiple_ids", then iterate through all of "ids_singular", and if there is a match, append the alias at the matching index of the "alias" list to a new list.
for i in multiple_ids:
    for j in range(len(ids_singular)):
        if i == ids_singular[j]:
            new_multiple_ids.append(alias[j])
print(new_multiple_ids)
When I run this code, nothing happens.
I believe this is what you want:
multiple_ids = ['abc', 'def', 'xyz', 'def', 'xyz']
ids_singular = ['abc','def','xyz']
alias = ['a_abc','a_def', 'a_xyz']
d = dict(zip(ids_singular, alias))
result = [d[item] for item in multiple_ids]
print(result)  # -> ['a_abc', 'a_def', 'a_xyz', 'a_def', 'a_xyz']
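One note on the lookup: d[item] raises a KeyError for any id with no alias. The question guarantees every id has a match, but if that ever changed, a defensive variant (an assumption beyond the question's requirements) would pass unmatched ids through unchanged:
# fall back to the original id when no alias is known
result = [d.get(item, item) for item in multiple_ids]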
I am working in Python (2.7.9) and am trying to filter a list of tuples by a list of elements of those tuples. In particular, my objects have the following form:
tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])]
filter = ['a', 'c']
I am new to Python and the easiest way to filter the tuples that I could discover was with the following list comprehension:
tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]
The resulting filtered list looks like:
tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]
Unfortunately, this list comprehension seems to be very inefficient. I suspect this is because my list of tuples is much larger than my filter, the list of strings. In particular, the filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.
The first elements of the 2-tuples are largely distinct, but there are a few instances of duplicate first elements (not sure how many, actually, but by comparison to the cardinality of the list it's not many).
My question: Is there a more efficient way to filter a list of tuples by a list of elements of those tuples?
(Apologies if this is off-topic or a dupe.)
Related question (which does not mention efficiency):
Filter a list of lists of tuples
In a comment you write:
The filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.
An in containment test against a list takes O(N) linear time, which is slow when you do it 134k times: each time you have to iterate over all those elements to find a match. Given that you are filtering, not all of those first elements are going to be present in the 30k list, so you are executing up to 30k * 134k == 4 billion comparisons.
Use a set instead:
filter_set = set(filter)
Set containment tests are O(1) constant time; now you reduced your problem to 134k tests.
A much smaller saving comes from avoiding the tuple unpacking; use indexing to extract just the one element you are testing:
tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]
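Putting both fixes together with the sample data from the question (with the filter list renamed so it no longer shadows the filter() builtin):
tuples = [('a', ['a1', 'a2']), ('b', ['b1', 'b2']), ('c', ['c1', 'c2'])]
wanted = ['a', 'c']  # the question's filter list, renamed

filter_set = set(wanted)  # O(1) membership tests
tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]
# [('a', ['a1', 'a2']), ('c', ['c1', 'c2'])]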
I'm new to Python and am still trying to tear myself away from C++ coding techniques while in Python, so please forgive me if this is a trivial question. I can't seem to find the most Pythonic way of doing this.
I have two lists of dicts. The individual dicts in both lists may contain nested dicts. (It's actually some Yelp data, if you're curious.) The first list of dicts contains entries like this:
{'business_id': 'JwUE5GmEO-sH1FuwJgKBlQ',
 'categories': ['Restaurants'],
 'type': 'business'
 ...}
The second list of dicts contains entries like this:
{'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA',
 'date': '2010-03-22',
 'review_id': 'RF6UnRTtG7tWMcrO2GEoAg',
 'stars': 2,
 'text': "This is a basic review",
 ...}
What I would like to do is extract all the entries in the second list that match specific categories in the first list. For example, if I'm interested in restaurants, I only want the entries in the second list where the business_id matches a business_id in the first list and the word Restaurants appears in that business's list of values for categories.
If I had these two lists as tables in SQL, I'd do a join on the business_id attribute then just a simple filter to get the rows I want (where Restaurants IN categories, or something similar).
These two lists are extremely large, so I'm running into both efficiency and memory space issues. Before I go and shove all of this into a SQL database, can anyone give me some pointers? I've messed around with Pandas some, so I do have some limited experience with that. I was having trouble with the merge process.
Suppose your lists are called l1 and l2:
All elements from l1:
[each for each in l1]
All elements from l1 with the Restaurant category:
[each for each in l1
 if 'Restaurants' in each['categories']]
All elements from l2 matching id with elements from l1 with the Restaurant category:
[x for each in l1 for x in l2
 if 'Restaurants' in each['categories']
 and x['business_id'] == each['business_id']]
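Be aware that this last comprehension compares every element of l1 with every element of l2, so it does O(len(l1) * len(l2)) work; for lists as large as the question describes, collecting the matching business ids first and then filtering, as the answers below do, scales much better.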
Let's define sample lists of dictionaries:
first = [
    {'business_id': 100, 'categories': ['Restaurants']},
    {'business_id': 101, 'categories': ['Printer']},
    {'business_id': 102, 'categories': ['Restaurants']},
]
second = [
    {'business_id': 100, 'stars': 5},
    {'business_id': 101, 'stars': 4},
    {'business_id': 102, 'stars': 3},
]
We can extract the items of interest in two steps. The first step is to collect the list of business ids that belong to restaurants:
ids = [d['business_id'] for d in first if 'Restaurants' in d['categories']]
The second step is to get the dicts that correspond to those ids:
[d for d in second if d['business_id'] in ids]
This results in:
[{'business_id': 100, 'stars': 5}, {'business_id': 102, 'stars': 3}]
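One refinement: d['business_id'] in ids scans the ids list on every test. Converting the ids to a set (a minor variation on the same two steps) makes each membership test constant time:
ids = set(d['business_id'] for d in first if 'Restaurants' in d['categories'])
[d for d in second if d['business_id'] in ids]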
Python programmers like using list comprehensions to express both their logic and their design.
List comprehensions lead to terser, more compact expressions, and you're right to think of them much like a query language.
x = [comparison(a, b) for (a, b) in zip(A, B)]
x = [comparison(a, b) for (a, b) in itertools.product(A, B)]
x = [comparison(a, b) for a in A for b in B if test(a, b)]
x = [comparison(a, b) for X in Y for (a, b) in X if test(a, b, X)]
...are all patterns that I use.
This is pretty tricky, and I had fun with it. This is what I'd do:
def match_fields(business, review):
    return business['business_id'] == review['business_id'] and 'Restaurants' in business['categories']

def search_businesses(review):
    # the generator expression binds the given review while scanning business_list
    return any(match_fields(business, review) for business in business_list)

answer = filter(search_businesses, review_list)
This is the most readable way I found. I'm not terribly fond of list comprehensions that go past one line, and three lines is really pushing it. If you want this to look more terse, just use shorter variable names. I favor long ones for clarity.
I defined a function that returns true if an entry can be matched between lists, and a second function that helps me search through the review list. I then can say: get rid of any review that doesn't have a matching entry in the businesses list. This pattern works well with arbitrary checks between lists.
As a variation to the list comprehension only approaches, it may be more efficient to use a set and generator comprehension. This is especially true if the size of your first list is very large or if the total number of restaurants is very large.
restaurant_ids = set(biz['business_id'] for biz in first if 'Restaurants' in biz['categories'])
restaurant_data = [rest for rest in second if rest['business_id'] in restaurant_ids]
Note the brute force list comprehension approach is O(len(first)*len(second)), but it uses no additional memory storage whereas this approach is O(len(first)+len(second)) and uses O(number_of_restaurants) extra memory for the set.
You could do:
restaurant_ids = [biz['business_id'] for biz in list1 if 'Restaurants' in biz['categories']]
restaurant_data = [rest for rest in list2 if rest['business_id'] in restaurant_ids]
Then restaurant_data would contain all of the dictionaries from list2 that contain restaurant data.
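As the previous answer points out, restaurant_ids here is a list, so each membership test in the second comprehension is a linear scan; wrapping it in a set (restaurant_ids = set(restaurant_ids)) gives constant-time lookups for large inputs.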