looping dictionaries of {tuple:NumPy.array} - python

I have a set of dictionaries of the form {(i, j): numpy.array} over whose numpy arrays I want to loop for a certain evaluation.
I made the dictionaries as follows:
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
for i in range(len(datarr)): exec(datarr[i] + '={}')
so that I can always change the set of data I want to evaluate in my bigger body of code by changing the original list of strings. However, this means I have to access my dictionaries as eval(k) for k in datarr.
As a result, the loop I want to do currently looks like this:
for i in filarr:
    for j in buiarr:
        for l in datarrdif:
            a = eval(l)[(i, j)]
            a[abs(a) < .01] = float('NaN')
            eval(l).update({(i, j): a})
But is there a much nicer way to write this? I tried the following, but it didn't work:
[eval(l)[(i, j)][abs(eval(l)[(i, j)]) < .01 for i in filarr for j in buiarr for k in datarrdiff] = float('NaN')
Thanks in advance.

datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
for i in range(len(datarr)): exec(datarr[i] + '={}')
Why don't you create them as a dictionary of dictionaries?
datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
data = dict((name, {}) for name in datarr)
Then you can avoid all the eval().
for i in filarr:
    for j in buiarr:
        for l in datarr:
            a = data[l][(i, j)]
            np.putmask(a, np.abs(a) < .01, np.nan)
            data[l].update({(i, j): a})
or probably just:
for d in data.values():
    for arr in d.values():
        np.putmask(arr, np.abs(arr) < .01, np.nan)
if you want to set all elements of all the arrays in the dictionaries where abs(element) < .01 to NaN.
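Putting the pieces together, here is a small hedged, self-contained sketch of the dict-of-dicts approach with made-up data; filarr, buiarr and the array contents are dummies for illustration:
import numpy as np

datarr = ['PowUse', 'PowHea', 'PowSol', 'Top']
filarr = ['file1', 'file2']   # dummy stand-ins for the real file identifiers
buiarr = ['bui1']             # dummy stand-ins for the real building identifiers

data = {name: {} for name in datarr}
for name in datarr:
    for i in filarr:
        for j in buiarr:
            data[name][(i, j)] = np.array([0.005, 1.2, -0.003, 3.4])

# set all near-zero entries to NaN, in place
for d in data.values():
    for arr in d.values():
        arr[np.abs(arr) < .01] = np.nan

print(data['PowUse'][('file1', 'bui1')])   # [nan 1.2 nan 3.4]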


Python: Ignore repetition in a loop

Let's say I have a function like this, to merge names in two lists:
def merge_name(list1="John", list2="Doe"):
    def merge(name1=list1, name2=list2):
        merge = name1 + "-" + name2
        data = {"Merged": merge}
        return data
    d = pd.DataFrame()
    for i, j in [(i, j) for i in list1 for j in list2]:
        if i == j:
            d = d
        else:
            x = merge(name1=i, name2=j)
            ans = pd.DataFrame({"Merged": [x["Merged"]]})
            d = pd.concat([d, ans])
    return d
What I am interested in are unique combinations, i.e., "John-Doe" and "Doe-John" are the same for my purposes. So if I run something like this:
names1=["John","Doe","Richard"]
names2=["John","Doe","Richard","Joana"]
df=merge_name(list1=names1,list2=names2)
I will get:
John-Doe
John-Richard
John-Joana
Doe-John
Doe-Richard
Doe-Joana
Richard-John
Richard-Doe
Richard-Joana
The repeated pairs (Doe-John, Richard-John, Richard-Doe) are all repetitions of earlier groups. Essentially, every time it comes to the next i, it creates n-1 repeated groups, with n being the position in names1. Is there a way to avoid this, like dropping the top name in "list2" every time j reaches the last element in the list?
Thanks in advance.
I have tried updating list2 inside the loop, but that obviously does not work.
The code below may be useful:
import pandas as pd
def merge_name(list1="John", list2="Doe"):
    merged = []
    for i in list1:
        for j in list2:
            if (i != j) and (f"{j} - {i}" not in merged):
                merged.append(f"{i} - {j}")
    df = pd.DataFrame(set(merged))
    return df
names1 = ["John", "Doe", "Richard"]
names2 = ["John", "Doe", "Richard", "Joana"]
df = merge_name(list1=names1, list2=names2)
print(df)
Below is my solution with some explanations:
def combineName(listName):
    res = []
    for i in range(len(listName)):
        for j in range(i + 1, len(listName)):
            res.append(listName[i] + "-" + listName[j])
    return res
names1=["John","Doe","Richard"]
names2=["John","Doe","Richard","Joana"]
listName = list(set(names1 + names2))
print(listName)
print(combineName(listName))
First, you should create a simple list without repetitions. This way you only get unique elements in your list. To do this, I used a set.
I take care to transform my set into a list because later I go through the structure in a given order, which is not supposed to be true for a set.
Secondly, the function creates all the combinations. There are two loops; notice that the inner loop starts at i + 1.
That way you do not get repetitions such as "John-Doe" and "Doe-John":
each combination is generated exactly once.
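For reference, here is a hedged alternative sketch using itertools.combinations, which yields each unordered pair exactly once; the function signature and DataFrame output simply mirror the question:
from itertools import combinations
import pandas as pd

def merge_name(list1, list2):
    unique_names = sorted(set(list1) | set(list2))   # deduplicate the two lists first
    merged = [f"{a}-{b}" for a, b in combinations(unique_names, 2)]
    return pd.DataFrame({"Merged": merged})

names1 = ["John", "Doe", "Richard"]
names2 = ["John", "Doe", "Richard", "Joana"]
print(merge_name(names1, names2))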

How do I use a while loop to access all the 2nd elements of lists which are the values stored in a dictionary?

If I have a dictionary like this, filled with similar lists, how can I apply a while loop to extract and print the second element of each list?
racoona_valence = {}
racoona_valence = {"rs13283416": ["7:87345874365-839479328749+", "BOBB7"], ...}
I need to print the part that says "BOBB7", i.e. the second element of each list, in a larger dictionary. There are ten key-value pairs in it, so I am starting like this, but I am unsure what to do, because all the examples I can find don't relate to my problem:
n=10
gene_list = []
while n>0:
Any help greatly appreciated.
Well, there's a bunch of ways to do it depending on how well-structured your data is.
racoona_valence={"rs13283416": ["7:87345874365-839479328749+","BOBB7"], "rs13283414": ["7:87345874365-839479328749+","BOBB4"]}
output = []
for key in racoona_valence.keys():
output.append(racoona_valence[key][1])
print(output)
other_output = []
for key, value in racoona_valence.items():
other_output.append(value[1])
print(other_output)
list_comprehension = [value[1] for value in racoona_valence.values()]
print(list_comprehension)
n = len(racoona_valence.values())-1
counter = 0
gene_list = []
while counter<=n:
gene_list.append(list(racoona_valence.values())[n][1])
counter += 1
print(gene_list)
Here is a list comprehension that does what you want:
second_element = [x[1] for x in racoona_valence.values()]
Here is a for loop that does what you want:
second_element = []
for value in racoona_valence.values():
    second_element.append(value[1])
Here is a while loop that does what you want:
# don't use a while loop to loop over iterables, it's a bad idea
i = 0
second_element = []
dict_values = list(racoona_valence.values())
while i < len(dict_values):
    second_element.append(dict_values[i][1])
    i += 1
Regardless of which approach you use, you can see the results by doing the following:
for item in second_element:
    print(item)
For the example that you gave, this is the output:
BOBB7

how to make this changing dataframe faster?

list_nn = [k for k in list(df['job_keyword'].unique()) if not str(k).isdigit()]
i = 0
for k in list_nn:
    df.loc[df.job_keyword == k, 'job_keyword'] = i
    df.loc[df.user_keyword == k, 'user_keyword'] = i
    i += 1
The loop goes through the dataframe columns and, wherever a value matches one of the keywords, replaces it with a number.
It takes more than 3 minutes; is there a way to make this faster?
It looks through the entire dataframe, and I want to speed it up.
You can try something like the following:
df[df.job_keyword.isin(list_nn)] = df[df.job_keyword.isin(list_nn)].index
df[df.user_keyword.isin(list_nn)] = df[df.user_keyword.isin(list_nn)].index
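A hedged sketch of the usual vectorised alternative: build the keyword-to-number mapping once and let pandas apply it with replace(), instead of one .loc assignment per keyword; df and list_nn are taken from the question:
# hypothetical mapping from each non-numeric keyword to a number
mapping = {k: i for i, k in enumerate(list_nn)}

# replace matching values in both columns in one vectorised call
df[['job_keyword', 'user_keyword']] = df[['job_keyword', 'user_keyword']].replace(mapping)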

iterate over several collections in parallel

I am trying to create a list of objects (from a class defined earlier) through a loop. The structure looks something like:
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in ticker_symbols:
    stock = Share(i)
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    stock_object = Company(pe, ps)
    stock_list.append(stock_object)
I would however want to add one more attribute to the Company objects (stock_object) through the loop. The attribute would be a value from another list of (arbitrary) numbers, like [5, 10, 20], where the first value would go to the first object, the second to the second object, etc. Is it possible to do something like:
for i, j in ticker_symbols, list2:
    # do stuff
? I could not get this sort of loop to work on my own. Thankful for any help.
I believe that all you have to do is change the for loop.
Instead of "for i in ticker_symbols:" you should loop like
"for i in range(len(ticker_symbols))" and then use the index i to do whatever you want with the second list.
ticker_symbols = ["AZN", "AAPL", "YHOO"]
stock_list = []
for i in range(len(ticker_symbols):
stock = Share(ticker_symbols[i])
pe = stock.get_price_earnings_ratio()
ps = stock.get_price_sales()
# And then you can write
px = whatever list2[i]
stock_object = Company(pe, ps, px)
stock_list.append(stock_object)
Some people say that using index to iterate is not good practice, but I don't think so specially if the code works.
Try:
for i, j in zip(ticker_symbols, list2):
Or
for (k, i) in enumerate(ticker_symbols):
    j = list2[k]
Equivalently:
for index in range(len(ticker_symbols)):
    i = ticker_symbols[index]
    j = list2[index]
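A hedged sketch combining the original loop with zip(); it assumes list2 holds the extra per-ticker values and that Company accepts a third constructor argument (both assumptions based on the question):
ticker_symbols = ["AZN", "AAPL", "YHOO"]
list2 = [5, 10, 20]   # hypothetical extra values, one per ticker
stock_list = []
for symbol, extra in zip(ticker_symbols, list2):
    stock = Share(symbol)                      # Share and Company come from the question's code
    pe = stock.get_price_earnings_ratio()
    ps = stock.get_price_sales()
    stock_list.append(Company(pe, ps, extra))  # assumed three-argument constructor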

Calculating means of values for subgroups of keys in python dictionary

I have a dictionary which looks like this:
cq = {'A1_B2M_01': 2.04, 'A2_B2M_01': 2.58, 'A3_B2M_01': 2.80, 'B1_B2M_02': 5.00,
      'B2_B2M_02': 4.30, 'B3_B2M_02': 2.40, ...}
I need to calculate the mean of each triplet whose keys[2:] agree. So I would ideally like to get another dictionary like:
new={'_B2M_01': 2.47, '_B2M_02': 3.9}
The data is/should be in triplets, so in theory I could just take the means of consecutive values. But first of all, I have it in a dictionary, so the keys/values will likely get reordered; besides, I'd rather stick to the names as a quality check for the triplets assigned to them (I will later add a bit that shows an error message when there are more than three per group).
I've tried creating a dictionary whose keys would be _B2M_01 and _B2M_02 and then looping through the original dictionary, first appending all the values assigned to these groups of keys so I could later calculate an average, but I am getting errors even in the first step, and anyway I am not sure this is the most effective way to do it...
cq = {'A1_B2M_01': 2.4, 'A2_B2M_01': 5, 'A3_B2M_01': 4, 'B1_B2M_02': 3, 'B2_B2M_02': 7, 'B3_B2M_02': 6}
trips = set([x[2:] for x in cq.keys()])
new = {}
for each in trips:
    for k, v in cq.items():
        if k[2:] == each:
            new[each].append(v)
Traceback (most recent call last):
File "<pyshell#28>", line 4, in <module>
new[each].append(v)
KeyError: '_B2M_01'
I would be very grateful for any suggestions. It seems like a fairly easy operation but I got stuck.
An alternative result which would be even better would be to get a dictionary which contains all the names used as in cq, but with values being the means of the group. So the end result would be:
final = {'A1_B2M_01': 2.47, 'A2_B2M_01': 2.47, 'A3_B2M_01': 2.47, 'B1_B2M_02': 3.9,
         'B2_B2M_02': 3.9, 'B3_B2M_02': 3.9}
Something like this should work. You can probably make it a little more elegant.
cq = {'A1_B2M_01': 2.04, 'A2_B2M_01': 2.58, 'A3_B2M_01': 2.80,
      'B1_B2M_02': 5.00, 'B2_B2M_02': 4.30, 'B3_B2M_02': 2.40}
sums = {}   # named sums to avoid shadowing the built-in sum
count = {}
mean = {}
for k in cq:
    if k[2:] in sums:
        sums[k[2:]] += cq[k]
        count[k[2:]] += 1
    else:
        sums[k[2:]] = cq[k]
        count[k[2:]] = 1
for k in sums:
    mean[k] = sums[k] / count[k]
cq = {'A1_B2M_01': 2.4, 'A2_B2M_01': 5, 'A3_B2M_01': 4, 'B1_B2M_02': 3, 'B2_B2M_02': 7, 'B3_B2M_02': 6}
sums = dict()
for k, v in cq.items():
    _, p2 = k.split('_', 1)
    if p2 not in sums:
        sums[p2] = [0, 0]
    sums[p2][0] += v
    sums[p2][1] += 1
res = {}
for k, v in sums.items():
    res[k] = v[0] / float(v[1])
print(res)
This can also be done in a single pass over the dictionary.
Grouping:
SEPARATOR = '_'
cq = {'A1_B2M_01': 2.4, 'A2_B2M_01': 5, 'A3_B2M_01': 4, 'B1_B2M_02': 3, 'B2_B2M_02': 7, 'B3_B2M_02': 6}
groups = {}
for key in cq:
    group_key = SEPARATOR.join(key.split(SEPARATOR)[1:])
    if group_key in groups:
        groups[group_key].append(cq[key])
    else:
        groups[group_key] = [cq[key]]
Generate means:
def means(groups):
    for group, group_vals in groups.items():
        yield (group, float(sum(group_vals)) / len(group_vals))
print(list(means(groups)))
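As one more hedged sketch, the same grouping can be written with collections.defaultdict and statistics.mean; it also builds the per-key final dictionary the question mentions as the ideal result:
from collections import defaultdict
from statistics import mean

cq = {'A1_B2M_01': 2.04, 'A2_B2M_01': 2.58, 'A3_B2M_01': 2.80,
      'B1_B2M_02': 5.00, 'B2_B2M_02': 4.30, 'B3_B2M_02': 2.40}

groups = defaultdict(list)
for key, value in cq.items():
    groups[key[2:]].append(value)          # group by the key minus its first two characters

new = {group: mean(values) for group, values in groups.items()}
final = {key: new[key[2:]] for key in cq}  # every original key mapped to its group mean
print(new)     # {'_B2M_01': 2.473..., '_B2M_02': 3.9}
print(final)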
