Python: Ignore repetition in a loop - python

Let's say I have a function like this, to merge names in two lists:
import pandas as pd
def merge_name(list1="John", list2="Doe"):
    def merge(name1=list1, name2=list2):
        merge = name1 + "-" + name2
        data = {"Merged": merge}
        return data
    d = pd.DataFrame()
    for i, j in [(i, j) for i in list1 for j in list2]:
        if i == j:
            d = d
        else:
            x = merge(name1=i, name2=j)
            ans = pd.DataFrame({"Merged": [x["Merged"]]})
            d = pd.concat([d, ans])
    return d
What I am interested in are unique combinations, i.e., "John-Doe" and "Doe-John" are the same for my purposes. So if I run something like this:
names1=["John","Doe","Richard"]
names2=["John","Doe","Richard","Joana"]
df=merge_name(list1=names1,list2=names2)
I will get:
John-Doe
John-Richard
John-Joana
Doe-John
Doe-Richard
Doe-Joana
Richard-John
Richard-Doe
Richard-Joana
Doe-John, Richard-John, and Richard-Doe are all repetitions. Essentially, every time the loop moves to the next i, it creates n-1 repeated groups, with n being the position of i in names1. Is there a way to avoid this, for example by dropping the first name in list2 every time j reaches the last element of the list?
Thanks in advance.
I have tried to update list2 inside the loop, but obviously that does not work.

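The code below may be useful:
import pandas as pd
def merge_name(list1="John", list2="Doe"):
    merged = []
    for i in list1:
        for j in list2:
            # Skip identical names and pairs already stored in reverse order
            if (i != j) and (f"{j} - {i}" not in merged):
                merged.append(f"{i} - {j}")
    # Drop exact duplicates while keeping order; recent pandas versions
    # refuse to build a DataFrame directly from a set
    df = pd.DataFrame(list(dict.fromkeys(merged)))
    return df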
names1 = ["John", "Doe", "Richard"]
names2 = ["John", "Doe", "Richard", "Joana"]
df = merge_name(list1=names1, list2=names2)
print(df)

Below is my solution with some explanations:
def combineName(listName):
    res = []
    for i in range(len(listName)):
        for j in range(i+1, len(listName)):
            res.append(listName[i] + "-" + listName[j])
    return res
names1=["John","Doe","Richard"]
names2=["John","Doe","Richard","Joana"]
listName = list(set(names1 + names2))
print(listName)
print(combineName(listName))
First, build a single list without repetitions, so that each name appears only once. To do this, I used a set.
I then convert the set back into a list, because the function later walks through the structure by index, and a set supports neither indexing nor a guaranteed order.
Second, the function creates all the combinations. There are two loops, and the second loop has a special range: it starts at i+1.
That way you do not get repetitions such as "John-Doe" and "Doe-John".
Each combination is created exactly once!
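The same idea can be sketched with itertools.combinations, which yields each unordered pair exactly once. This is only an illustration, not the asker's original function: like the answer above it treats both lists as a single pool of names, and it uses dict.fromkeys instead of a set so that the first-seen order is kept. The variable names unique_names and pairs are made up for the example.

```python
from itertools import combinations

names1 = ["John", "Doe", "Richard"]
names2 = ["John", "Doe", "Richard", "Joana"]

# dict.fromkeys drops duplicates while keeping first-seen order
unique_names = list(dict.fromkeys(names1 + names2))

# combinations(..., 2) yields each unordered pair once, never (x, x)
pairs = ["-".join(pair) for pair in combinations(unique_names, 2)]
print(pairs)
```

With four distinct names this gives six pairs, and no pair appears in both orders.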

Related

Create words by combining all elements in a set of lists with each other

I have 6 different lists, which I need to combine and create sequences/words for all possible combinations.
These are my lists:
product = ['SOC', 'SOCCAMP', 'ETI', 'CARDASS', 'VRS', 'PRS', 'INT', 'GRS', 'VeloVe', 'HOME']
fam = ['IND', 'FAM']
area = ['EUR', 'WOR']
type = ['STD', 'PLU']
assist = ['MOT', 'NMOT']
All of the elements in all of the lists need to be combined in words.
The result will be a list of all the elements possible.
Among all of the elements I will have, for example, the following words:
('SOC', 'SOCIND', 'SOCFAM', 'SOCFAMMOT', 'SOCMOTFAM'...)
I thus combine each element of a precise list with all the elements of the other lists.
Up to now I managed to combine them with a series of loops:
combos = []
##1##
for i in range(len(product)):
    combos.append(str(product[i]))
##2##
for a in range(len(product)):
    for b in range(len(fam)):
        combos.append(str(product[a]) + str(fam[b]))
##3##
for a in range(len(product)):
    for b in range(len(fam)):
        for c in range(len(area)):
            combos.append(str(product[a]) + str(fam[b]) + str(area[c]))
##4##
for a in range(len(product)):
    for b in range(len(fam)):
        for c in range(len(area)):
            for d in range(len(type)):
                combos.append(str(product[a]) + str(fam[b]) + str(area[c]) + str(type[d]))
##5##
for a in range(len(product)):
    for b in range(len(fam)):
        for c in range(len(area)):
            for d in range(len(type)):
                for e in range(len(assist)):
                    combos.append(str(product[a]) + str(fam[b]) + str(area[c]) + str(type[d]) + str(assist[e]))
This code manages to combine the words in a list of combinations but solely in the precise order the lists are mentioned:
['SOC', 'SOCCAMP', 'ETI', 'CARDASS', 'VRS', 'PRS', 'INT', 'GRS', 'VeloVe', 'HOME', 'SOCIND', 'SOCFAM', 'SOCCAMPIND', 'SOCCAMPFAM', 'ETIIND', 'ETIFAM', 'CARDASSIND', 'CARDASSFAM', 'VRSIND', 'VRSFAM', 'PRSIND', 'PRSFAM', 'INTIND', 'INTFAM', 'GRSIND', 'GRSFAM', 'VeloVeIND', 'VeloVeFAM', 'HOMEIND', 'HOMEFAM', ...]
So, 'SOCINDEUR' is a combination in this list but 'SOCEURIND' is not.
Is there a smart way to avoid writing down another 100 loops to look for all the possible combinations?
You can make use of various tools provided by itertools.
Overall, one solution is:
import itertools
source_lists = product, fam, area, type, assist
combos = [
    ''.join(prod)
    # + 1 so that combinations using all of the source lists are included
    for l in range(1, len(source_lists) + 1)
    for subset in itertools.permutations(source_lists, l)
    for prod in itertools.product(*subset)
]
This code is equivalent to:
combos = []
source_lists = product, fam, area, type, assist
for l in range(1, len(source_lists) + 1):
    for subset in itertools.permutations(source_lists, l):
        for prod in itertools.product(*subset):
            combos.append(''.join(prod))
The first loop selects how many source lists to combine, so it first produces the results from a single list, i.e. the "original" input, then moves on to combining two lists, and so on.
The second loop selects which source lists to combine and in which order (it goes over all possible permutations).
The third and last loop takes the selected source lists and produces all possible combinations (or, better said, the "product") of their elements.
Finally, this produces tuples, and since you want a single string per result, we need to join them.
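To see the three loops at work on something small enough to check by hand, here is a minimal sketch using only two of the lists from the question. It assumes the size range should run up to and including the number of source lists; the loop variable names size and parts are chosen for the example.

```python
import itertools

fam = ["IND", "FAM"]
area = ["EUR", "WOR"]
source_lists = (fam, area)

combos = [
    "".join(parts)
    # how many lists to combine: 1, then 2
    for size in range(1, len(source_lists) + 1)
    # which lists, and in which order
    for subset in itertools.permutations(source_lists, size)
    # one element from each selected list
    for parts in itertools.product(*subset)
]
print(combos)
```

The result has 4 single words plus 8 two-part words, and both "INDEUR" and "EURIND" are present, which is what the nested loops in the question could not produce.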
The magic of itertools!
# Note: this import shadows any list that is also named product
from itertools import product, permutations
def concat_combinations(sequence):
    new_seq = []
    for i, j in enumerate(sequence):
        if i == 0:
            new_str = j
            new_seq.append(new_str)
        else:
            new_str += j
            new_seq.append(new_str)
    return new_seq
def final_list(*iterables) -> list:
    s = set()
    for seq in product(*iterables):
        # every ordering of the picked elements, and every prefix of it
        for i in permutations(seq):
            s.update(concat_combinations(i))
    return sorted(s, key=len)

How do I use a while loop to access all the 2nd elements of lists which are the values stored in a dictionary?

If I have a dictionary like this, filled with similar lists, how can I use a while loop to extract a list containing the second element of each list:
racoona_valence={}
racoona_valence={"rs13283416": ["7:87345874365-839479328749+","BOBB7"],\}
I need to print the part that says "BOBB7" for 2nd element of the lists in a larger dictionary. There are ten key-value pairs in it, so I am starting it like so, but unsure what to do because all the examples I can find don't relate to my problem:
n=10
gene_list = []
while n>0:
Any help greatly appreciated.
Well, there's a bunch of ways to do it depending on how well-structured your data is.
racoona_valence={"rs13283416": ["7:87345874365-839479328749+","BOBB7"], "rs13283414": ["7:87345874365-839479328749+","BOBB4"]}
output = []
for key in racoona_valence.keys():
    output.append(racoona_valence[key][1])
print(output)
other_output = []
for key, value in racoona_valence.items():
    other_output.append(value[1])
print(other_output)
list_comprehension = [value[1] for value in racoona_valence.values()]
print(list_comprehension)
n = len(racoona_valence.values())-1
counter = 0
gene_list = []
while counter<=n:
    # Index with counter, not n, so that each entry is visited once
    gene_list.append(list(racoona_valence.values())[counter][1])
    counter += 1
print(gene_list)
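If the data is not perfectly structured, for instance if some values have no second element, indexing with [1] raises an IndexError. A minimal sketch of a while loop that skips such entries is shown below; the third dictionary entry is made up purely to illustrate the short case.

```python
racoona_valence = {
    "rs13283416": ["7:87345874365-839479328749+", "BOBB7"],
    "rs13283414": ["7:87345874365-839479328749+", "BOBB4"],
    "rs00000000": ["7:1-2+"],  # made-up entry with no second element
}

values = list(racoona_valence.values())  # build the list once, not per iteration
gene_list = []
index = 0
while index < len(values):
    entry = values[index]
    # Append only when a second element exists; skip short entries
    if len(entry) > 1:
        gene_list.append(entry[1])
    index += 1

print(gene_list)
```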
Here is a list comprehension that does what you want:
second_element = [x[1] for x in racoona_valence.values()]
Here is a for loop that does what you want:
second_element = []
for value in racoona_valence.values():
    second_element.append(value[1])
Here is a while loop that does what you want:
# don't use a while loop to loop over iterables, it's a bad idea
i = 0
second_element = []
dict_values = list(racoona_valence.values())
while i < len(dict_values):
    second_element.append(dict_values[i][1])
    i += 1
Regardless of which approach you use, you can see the results by doing the following:
for item in second_element:
    print(item)
For the example that you gave, this is the output:
BOBB7

How to remove list elements within a loop effectively in python

I have a code as follows.
for item in my_list:
    print(item[0])
    temp = []
    current_index = my_list.index(item)
    garbage_list = creategarbageterms(item[0])
    for ele in my_list:
        if my_list.index(ele) != current_index:
            for garbage_word in garbage_list:
                if garbage_word in ele:
                    print("concepts: ", item, ele)
                    temp.append(ele)
    print(temp)
Now, I want to remove ele from mylist when it gets appended to temp (so that it won't get processed in the main loop, as it is a garbage word).
I know it is bad to remove elements directly from a list while looping over it. Thus, I would like to know whether there is an efficient way of doing this.
For example, if mylist is as follows;
mylist = [["tim_tam", 879.3000000000001], ["yummy_tim_tam", 315.0], ["pudding", 298.2],
["chocolate_pudding", 218.4], ["biscuits", 178.20000000000002], ["berry_tim_tam", 171.9],
["tiramusu", 158.4], ["ice_cream", 141.6], ["vanilla_ice_cream", 122.39999999999999]]
1st iteration
For the first element, tim_tam, I get garbage words such as yummy_tim_tam and berry_tim_tam, so they get added to my temp list.
Now I want to remove yummy_tim_tam and berry_tim_tam from the list (because they have already been added to temp), so that they are not processed again from the beginning.
2nd iteration
Now, since yummy_tim_tam is no longer in the list, this will process pudding. For pudding I get a different set of garbage words such as chocolate_pudding, biscuits, and tiramusu. So they will get added to temp and removed.
3rd iteration
ice_cream will be selected, and the process will go on.
My final objective is to get three separate lists as follows.
["tim_tam", 879.3000000000001], ["yummy_tim_tam", 315.0], ["berry_tim_tam", 171.9]
["pudding", 298.2], ["chocolate_pudding", 218.4], ["biscuits", 178.20000000000002], ["tiramusu", 158.4]
["ice_cream", 141.6], ["vanilla_ice_cream", 122.39999999999999]
This code produces what you want:
my_list = [['tim_tam', 879.3], ['yummy_tim_tam', 315.0], ['pudding', 298.2],
['chocolate_pudding', 218.4], ['biscuits', 178.2], ['berry_tim_tam', 171.9],
['tiramusu', 158.4], ['ice_cream', 141.6], ['vanilla_ice_cream', 122.39]
]
creategarbageterms = {'tim_tam' : ['tim_tam','yummy_tim_tam', 'berry_tim_tam'],
'pudding': ['pudding', 'chocolate_pudding', 'biscuits', 'tiramusu'],
'ice_cream': ['ice_cream', 'vanilla_ice_cream']}
all_data = {}
temp = []
for idx1, item in enumerate(my_list):
    if item[0] in temp: continue
    all_data[idx1] = [item]
    garbage_list = creategarbageterms[item[0]]
    for idx2, ele in enumerate(my_list):
        if idx1 != idx2:
            for garbage_word in garbage_list:
                if garbage_word in ele:
                    temp.append(ele[0])
                    all_data[idx1].append(ele)
for item in all_data.values():
    print('-', item)
This produces:
- [['tim_tam', 879.3], ['yummy_tim_tam', 315.0], ['berry_tim_tam', 171.9]]
- [['pudding', 298.2], ['chocolate_pudding', 218.4], ['biscuits', 178.2], ['tiramusu', 158.4]]
- [['ice_cream', 141.6], ['vanilla_ice_cream', 122.39]]
Note that for the purpose of the example I created a mock creategarbageterms (as a dictionary rather than a function) that produces the term lists as you defined them in your post. Note the use of the all_data dictionary, keyed by the index of each group's first item, which allows any number of final lists to be produced.
I would propose to do it like this:
mylist = [["tim_tam", 879.3000000000001],
["yummy_tim_tam", 315.0],
["pudding", 298.2],
["chocolate_pudding", 218.4],
["biscuits", 178.20000000000002],
["berry_tim_tam", 171.9],
["tiramusu", 158.4],
["ice_cream", 141.6],
["vanilla_ice_cream", 122.39999999999999]]
d = set() # remembers unique keys, first one in wins
for i in mylist:
    shouldAdd = True
    for key in d:
        if i[0].find(key) != -1: # a key already in the set is part of this name
            shouldAdd = False # do not add it
    if not d or shouldAdd: # empty set or unique: add to set
        d.add(i[0])
myCleanList = [x for x in mylist if x[0] in d] # clean list to use only keys in set
print(myCleanList)
Output:
[['tim_tam', 879.3000000000001],
['pudding', 298.2],
['biscuits', 178.20000000000002],
['tiramusu', 158.4],
['ice_cream', 141.6]]
If the order of things in the list is not important, you could use a dictionary directly - and create a list from the dict.
If you need sublists, create them:
similarThings = [ [x for x in mylist if x[0].find(y) != -1] for y in d]
print(similarThings)
Output:
[
[['tim_tam', 879.3000000000001], ['yummy_tim_tam', 315.0], ['berry_tim_tam', 171.9]],
[['tiramusu', 158.4]],
[['ice_cream', 141.6], ['vanilla_ice_cream', 122.39999999999999]],
[['pudding', 298.2], ['chocolate_pudding', 218.4]],
[['biscuits', 178.20000000000002]]
]
As @joaquin pointed out in the comments, I am missing the creategarbageterms() function that groups tiramusu and biscuits with pudding, so this does not fit the question 100%. My answer advocates the following: do not modify lists during iteration; use an appropriate set or dict to filter the list into groups. Unique keys here are names that do not contain any earlier key.
You want to have an outer loop that's looping through a list, and an inner loop that can modify that same list.
I saw you got suggestions in the comments to simply not remove entries during the inner loop at all, but instead check if terms already are in temp. This is possible, and may be easier to read, but is not necessarily the best solution with respect to processing time.
I also see you received an answer from Patrick using dictionaries. This is probably the cleanest solution for your specific use-case, but does not address the more general question in your title which is specifically about removing items in a list while looping through it. If for whatever reason this is really necessary, I would propose the following:
idx = 0
while idx < len(my_list):
    item = my_list[idx]
    print(item[0])
    temp = []
    garbage_list = creategarbageterms(item[0])
    ele_idx = 0
    while ele_idx < len(my_list):
        ele = my_list[ele_idx]
        # Never remove the current item itself
        if ele_idx != idx and any(word in ele for word in garbage_list):
            print("concepts: ", item, ele)
            temp.append(ele)
            del my_list[ele_idx]
            if ele_idx < idx:
                idx -= 1  # the current item shifted one position to the left
            # Do not advance: the next element now sits at ele_idx
        else:
            ele_idx += 1
    print(temp)
    idx += 1
The key insight here is that, by using a while loop instead of a for loop, you can take more detailed, "manual" control of the control flow of the program, and more safely do "unconventional" things in your loop. I'd only recommend doing this if you really have to for whatever reason though. This solution is closer to the literal question you asked, and closer to your own original code, but maybe not the easiest to read or the most Pythonic code.
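A variation that avoids deleting by index altogether is to rebuild the work list on each pass. The sketch below is only an illustration under stated assumptions: group_by_terms, MOCK_TERMS and mock_terms are hypothetical names, and mock_terms is a stand-in for the asker's creategarbageterms, returning the related names described in the question.

```python
def group_by_terms(entries, related_terms):
    """Group entries; entries that join a group leave the work list."""
    remaining = list(entries)  # work on a copy, the caller's list is untouched
    groups = []
    while remaining:
        head = remaining.pop(0)
        terms = related_terms(head[0])
        group = [head]
        kept = []
        for entry in remaining:
            # Entries related to the head join its group, the rest are kept
            if entry[0] in terms:
                group.append(entry)
            else:
                kept.append(entry)
        remaining = kept  # rebuilt list replaces in-place deletion
        groups.append(group)
    return groups


# Hypothetical stand-in for creategarbageterms()
MOCK_TERMS = {
    "tim_tam": ["yummy_tim_tam", "berry_tim_tam"],
    "pudding": ["chocolate_pudding", "biscuits", "tiramusu"],
    "ice_cream": ["vanilla_ice_cream"],
}


def mock_terms(name):
    return MOCK_TERMS.get(name, [])


mylist = [["tim_tam", 879.3], ["yummy_tim_tam", 315.0], ["pudding", 298.2],
          ["chocolate_pudding", 218.4], ["biscuits", 178.2],
          ["berry_tim_tam", 171.9], ["tiramusu", 158.4],
          ["ice_cream", 141.6], ["vanilla_ice_cream", 122.4]]

groups = group_by_terms(mylist, mock_terms)
for group in groups:
    print(group)
```

Because each pass builds a new list, no index ever has to be corrected after a removal, at the cost of one extra list per group.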

iterate over several collections in parallel

I am trying to create a list of objects (from a class defined earlier) through a loop. The structure looks something like:
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in ticker_symbols:
    stock = Share(i)
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    stock_object = Company(pe, ps)
    stock_list.append(stock_object)
However, I would like to add one more attribute to the Company objects (stock_object) through the loop. The attribute would be a value from another list, like (arbitrary numbers) [5, 10, 20], where the first value would go to the first object, the second to the second object, etc. Is it possible to do something like this?
for i, j in ticker_symbols, list2:
    # do stuff
I could not get this sort of nested loop to work on my own. Thankful for any help.
I believe that all you have to do is change the for loop.
Instead of "for i in ticker_symbols:" you should loop with
"for i in range(len(ticker_symbols)):" and then use the index i to access the second list.
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in range(len(ticker_symbols)):
    stock = Share(ticker_symbols[i])
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    # And then you can write
    px = list2[i]
    stock_object = Company(pe, ps, px)
    stock_list.append(stock_object)
Some people say that iterating by index is not good practice, but I don't think so, especially if the code works.
Try:
for i, j in zip(ticker_symbols, list2):
Or
for (k, i) in enumerate(ticker_symbols):
    j = list2[k]
Equivalently:
for index in range(len(ticker_symbols)):
    i = ticker_symbols[index]
    j = list2[index]
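As a runnable illustration of the zip approach, here is a small sketch. The Company dataclass and its field names are hypothetical stand-ins for the asker's class, and the Share lookups are left out so that the example runs on its own. Note that zip stops at the shorter input, so the sketch checks the lengths first.

```python
from dataclasses import dataclass


@dataclass
class Company:  # hypothetical stand-in for the asker's Company class
    ticker: str
    extra: int


ticker_symbols = ["AZN", "AAPL", "YHOO"]
list2 = [5, 10, 20]

# zip silently truncates to the shorter list, so fail loudly on a mismatch
if len(ticker_symbols) != len(list2):
    raise ValueError("ticker_symbols and list2 must have the same length")

stock_list = []
for symbol, value in zip(ticker_symbols, list2):
    stock_list.append(Company(symbol, value))

print(stock_list)
```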

looping dictionaries of {tuple:NumPy.array}

I have a set of dictionaries k of the form {(i,j):NumPy.array}, and I want to loop over the NumPy arrays for a certain evaluation.
I made the dictionaries as follows:
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
for i in range(len(dat)): exec(datarr[i]+'={}')
so I can always change the set of data I want to evaluate in my bigger set of code by changing the original list of strings. However, this means I have to access my dictionaries as eval(k) for k in datarr.
As a result, the loop I want to do looks like this for the moment:
for i in filarr:
    for j in buiarr:
        for l in datarrdif:
            a = eval(l)[(i, j)]
            a[abs(a)<.01] = float('NaN')
            eval(l).update({(i, j):a})
But is there a much nicer way to write this? I tried the following, but it didn't work:
[eval(l)[(i, j)][abs(eval(l)[(i, j)])<.01 for i in filarr for j in buiarr for k in datarrdiff] = float('NaN')
Thanks in advance.
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
for i in range(len(dat)): exec(datarr[i]+'={}')
Why don't you create them as a dictionary of dictionaries?
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
data = dict((name, {}) for name in datarr)
Then you can avoid all the eval().
import numpy as np
for i in filarr:
    for j in buiarr:
        for l in datarr:
            a = data[l][(i, j)]
            np.putmask(a, np.abs(a)<.01, np.nan)
            data[l].update({(i, j):a})
or probably just:
# data is a dictionary of dictionaries, so loop over the inner values too
for arrays in data.values():
    for arr in arrays.values():
        np.putmask(arr, np.abs(arr)<.01, np.nan)
if you want to set all elements of all dictionary values where abs(element) < .01 to NaN.
