Python: print multiple values of a key on separate lines

I have a dictionary with each key containing multiple values (list):
one = [1,2,3]
two = [1,2,3]
three= [1,2,3]
It was obtained with the following line of code:
output_file.write('{0}\t{1}\n'.format(key,"\t".join(value)))
So my final printed output looks like this:
one 1 2 3
two 1 2 3
three 1 2 3
My goal now is to have the output looking like this instead:
one 1
one 2
one 3
two 1
two 2
…
Any tips?

You can use the key itself as part of the delimiter:
#key = "one"
#value = ['1','2','3']
print(key+'\t'+'\n{0}\t'.format(key).join(value))
Output:
one 1
one 2
one 3

You could do this with nested for loops:
for key, value_list in my_dict.items():
    for value in value_list:
        output_file.write("{}\t{}\n".format(key, value))

This may also work:
#key = "one"
#value = ['1','2','3']
print('\n'.join(map(lambda i: key+'\t'+str(i), value)))
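For reference, the nested-loop approach can be sketched as a self-contained Python 3 example. The sample dictionary, the function name write_pairs, and the in-memory buffer standing in for output_file are assumptions made for illustration only.

```python
import io

# Hypothetical sample data; the values are ints here, so format() converts them.
my_dict = {"one": [1, 2, 3], "two": [1, 2, 3]}


def write_pairs(mapping, out):
    """Write one tab-separated "key<TAB>value" line per value."""
    for key, values in mapping.items():
        for value in values:
            out.write("{}\t{}\n".format(key, value))


buffer = io.StringIO()  # stands in for output_file
write_pairs(my_dict, buffer)
print(buffer.getvalue(), end="")
```

Because each value is written on its own line together with its key, no join() call is needed, and non-string values work without an explicit str().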

Related

Drop duplicate lists within a nested list value in a column

I have a pandas dataframe with nested lists as values in a column as follows:
sample_df = pd.DataFrame({'single_proj_name': [['jsfk'],['fhjk'],['ERRW'],['SJBAK']],
'single_item_list': [['ABC_123'],['DEF123'],['FAS324'],['HSJD123']],
'single_id':[[1234],[5678],[91011],[121314]],
'multi_proj_name':[['AAA','VVVV','SASD'],['QEWWQ','SFA','JKKK','fhjk'],['ERRW','TTTT'],['SJBAK','YYYY']],
'multi_item_list':[[['XYZAV','ADS23','ABC_123'],['XYZAV','ADS23','ABC_123']],['XYZAV','DEF123','ABC_123','SAJKF'],['QWER12','FAS324'],[['JFAJKA','HSJD123'],['JFAJKA','HSJD123']]],
'multi_id':[[[2167,2147,29481],[2167,2147,29481]],[2313,57567,2321,7898],[1123,8775],[[5237,43512],[5237,43512]]]})
As you can see above, in some columns, the same list is repeated twice or more.
So, I would like to remove the duplicated list and only retain one copy of the list.
I was trying something like the below:
for i, (single, multi_item, multi_id) in enumerate(zip(sample_df['single_item_list'],sample_df['multi_item_list'],sample_df['multi_id'])):
    if (any(isinstance(i, list) for i in multi_item)) == False:
        for j, item_list in enumerate(multi_item):
            if single[0] in item_list:
                pos = item_list.index(single[0])
                sample_df.at[i,'multi_item_list'] = [item_list]
                sample_df.at[i,'multi_id'] = [multi_id[j]]
    else:
        print("under nested list")
        for j, item_list in enumerate(zip(multi_item,multi_id)):
            if single[0] in multi_item[j]:
                pos = multi_item[j].index(single[0])
                sample_df.at[i,'multi_item_list'][j] = single[0]
                sample_df.at[i,'multi_id'][j] = multi_id[j][pos]
            else:
                sample_df.at[i,'multi_item_list'][j] = np.nan
                sample_df.at[i,'multi_id'][j] = np.nan
But this assigns NA to the whole column value. I expect to remove that specific list (within a nested list).
In my expected output, each duplicated list appears only once.
In the data it looks like removing duplicates is equivalent to keeping the first element in any list of lists while any standard lists are kept as they are. If this is true, then you can solve it as follows:
def get_first_list(x):
    if isinstance(x[0], list):
        return [x[0]]
    return x

for c in ['multi_item_list', 'multi_id']:
    sample_df[c] = sample_df[c].apply(get_first_list)
Result:
single_proj_name single_item_list single_id multi_proj_name multi_item_list multi_id
0 [jsfk] [ABC_123] [1234] [AAA, VVVV, SASD] [[XYZAV, ADS23, ABC_123]] [[2167, 2147, 29481]]
1 [fhjk] [DEF123] [5678] [QEWWQ, SFA, JKKK, fhjk] [XYZAV, DEF123, ABC_123, SAJKF] [2313, 57567, 2321, 7898]
2 [ERRW] [FAS324] [91011] [ERRW, TTTT] [QWER12, FAS324] [1123, 8775]
3 [SJBAK] [HSJD123] [121314] [SJBAK, YYYY] [[JFAJKA, HSJD123]] [[5237, 43512]]
To handle the case where there can be more than a single unique list the get_first_list method can be adjusted to:
def get_first_list(x):
    if isinstance(x[0], list):
        new_x = []
        for i in x:
            if i not in new_x:
                new_x.append(i)
        return new_x
    return x
This will keep the order of the sublists while removing any sublist duplicates.
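To illustrate the order-preserving behaviour without pandas, here is a minimal sketch on plain lists. The function name dedupe_sublists and the extra [1, 2] sublist are made up for the example; the logic mirrors the adjusted function above and assumes each list is non-empty.

```python
def dedupe_sublists(x):
    """Drop repeated sublists, keeping first occurrences in their original order.

    Flat lists (first element is not a list) are returned unchanged.
    Assumes x is non-empty, as in the sample data.
    """
    if isinstance(x[0], list):
        unique = []
        for sub in x:
            if sub not in unique:
                unique.append(sub)
        return unique
    return x


# Hypothetical inputs: one nested list with a repeated sublist, one flat list.
nested = [[2167, 2147, 29481], [2167, 2147, 29481], [1, 2]]
flat = [2313, 57567, 2321, 7898]

print(dedupe_sublists(nested))  # [[2167, 2147, 29481], [1, 2]]
print(dedupe_sublists(flat))    # [2313, 57567, 2321, 7898]
```

The membership test compares whole sublists, so this is quadratic in the number of sublists, which should be fine for short lists like the ones in the sample data.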
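More concisely, with np.unique (note that it returns NumPy arrays and sorts the unique sublists rather than keeping their original order):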
cols = ['multi_item_list', 'multi_id']
sample_df[cols] = sample_df[cols].apply(lambda x: [np.unique(a, axis=0) if type(a[0]) == list else a for a in x.values])
In [382]: sample_df
Out[382]:
single_proj_name single_item_list single_id multi_proj_name \
0 [jsfk] [ABC_123] [1234] [AAA, VVVV, SASD]
1 [fhjk] [DEF123] [5678] [QEWWQ, SFA, JKKK, fhjk]
2 [ERRW] [FAS324] [91011] [ERRW, TTTT]
3 [SJBAK] [HSJD123] [121314] [SJBAK, YYYY]
multi_item_list multi_id
0 [[XYZAV, ADS23, ABC_123]] [[2167, 2147, 29481]]
1 [XYZAV, DEF123, ABC_123, SAJKF] [2313, 57567, 2321, 7898]
2 [QWER12, FAS324] [1123, 8775]
3 [[JFAJKA, HSJD123]] [[5237, 43512]]

Python - find lowest value across dictionaries

I have a dictionary that is structured as follows:
MSE = {}
MSE[1] = {}
MSE[2] = {}
MSE[3] = {}
That is, the dictionary itself consists of a number of dictionaries. These look as follows:
MSE[1][1] = 5
MSE[1][2] = 3
MSE[1][3] = 7
MSE[2][1] = 4
MSE[2][2] = 3
MSE[2][3] = 7
MSE[3][1] = 1
MSE[3][2] = 1
MSE[3][3] = 2
I want to find the lowest of these values across all the different dictionaries. How do I do that?
The values of a dictionary d:
d.values()
The minimum of an iterable, like the result of .values() of a dictionary:
min(d.values())
So, you want the minimum of all the minimums for each dictionary in some dictionary (say, MSE):
min(min(d.values()) for d in MSE.values())
This loops over all of the values in MSE, which in your case are dictionaries. It finds the minimum value for each and then takes the minimum out of all of those.
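As a quick check of the expression above, here is a small runnable sketch. It assumes the third entry in each inner dictionary was meant to have key 3, and the extra lookup of where the minimum occurs is an addition that the question did not ask for.

```python
# Sample data modelled on the question (inner key 3 is an assumption).
MSE = {
    1: {1: 5, 2: 3, 3: 7},
    2: {1: 4, 2: 3, 3: 7},
    3: {1: 1, 2: 1, 3: 2},
}

# Minimum of the per-dictionary minimums.
lowest = min(min(d.values()) for d in MSE.values())

# Optionally, find the (outer key, inner key) pair of the first lowest value.
location = min(
    ((outer, inner) for outer, d in MSE.items() for inner in d),
    key=lambda pair: MSE[pair[0]][pair[1]],
)

print(lowest)    # 1
print(location)  # (3, 1)
```

When several entries share the lowest value, min() returns the first one it encounters, which here follows the insertion order of the dictionaries.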
You can do it like this:
minm = min([min(i.values()) for i in MSE.values()])

using a dictionary with a file

I have a file that maps letters to numbers, one pair per line, like this:
a 1
b 2
c 3
d 4
etc
I also have another file with the letters and the number of times to multiply each of them by, like this:
a 3 b 5
c 6 d 2
So basically, I want to look up the value of a from the original file and multiply it by 3, then look up b and multiply it by 5, and so on.
I have made a dictionary from the original file, but I don't know how to retrieve each number so that I can multiply it. Python essentially needs to go through the multiplier file, see the a, get the corresponding value from the other file, and then multiply it by 3.
d = {}
with open("numbers.txt") as numbers:
    for line in numbers:
        (key, val) = line.split()
        d[key] = int(val)
print(d)

d = {}
with open("numbers.txt") as numbers:
    for line in numbers:
        # A line may hold several key/value pairs, so step through the tokens two at a time.
        pairs = line.split()
        for i in range(0,len(pairs),2):
            d[pairs[i]] = int(pairs[i+1])
print(d)
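Putting the two reading loops together, the lookup and multiplication might be sketched as follows. The in-memory io.StringIO objects stand in for the two files, and the names values, products, numbers_file, and multipliers_file are assumptions for illustration; with real files you would use open() instead.

```python
import io

# Hypothetical file contents; io.StringIO stands in for the opened files.
numbers_file = io.StringIO("a 1\nb 2\nc 3\nd 4\n")
multipliers_file = io.StringIO("a 3 b 5\nc 6 d 2\n")

# First file: one "letter number" pair per line.
values = {}
for line in numbers_file:
    key, val = line.split()
    values[key] = int(val)

# Second file: several "letter factor" pairs per line.
products = {}
for line in multipliers_file:
    tokens = line.split()
    # Pair every even-indexed token (letter) with the token after it (factor).
    for letter, factor in zip(tokens[0::2], tokens[1::2]):
        products[letter] = values[letter] * int(factor)

print(products)  # {'a': 3, 'b': 10, 'c': 18, 'd': 8}
```

A letter that appears in the multiplier file but not in the first file would raise a KeyError here; values.get(letter) with a default is one way to handle that if it can happen in your data.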

Incorrect output dictionary from user's input

I need output to be in the form
{0: {1:11,2:13}, 1: {0:11,3:14}}
But it comes out to
{0: {1:['11'],2:['13']}, 1: {0:['11'],3:['14']}}
using this
graph = {}
N, C = map(int, raw_input().split())
# print N, C
for x in range(0, C):
    i, j, w = raw_input().split()
    graph.setdefault(int(i), {}).setdefault(int(j), []).append(w)
print graph
on this input:
1st line: N=4 can be ignored, while C=4 is the number of lines that follow.
Remaining lines: i and j are vertices, w is the edge weight.
4 4
0 1 11
0 2 13
1 0 11
1 3 14
You are storing lists as the values inside your nested dictionary in the following line:
graph.setdefault(int(i), {}).setdefault(int(j),[]).append(w)
That is why each value ends up inside a list. If you are 100% sure that each key inside a nested dictionary occurs only once, you can simply assign the value to the key. Since w is read as a string, convert it with int() to get the integer weights shown in your desired output. Example:
graph.setdefault(int(i), {})[int(j)] = int(w)
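The same idea as a self-contained Python 3 sketch, reading the sample input from a string instead of raw_input(). The function name build_graph and the variable names are hypothetical, and the sketch assumes each (i, j) pair appears at most once, so a later duplicate would overwrite the earlier weight.

```python
def build_graph(lines):
    """Build {vertex: {neighbour: weight}} from "i j w" lines."""
    graph = {}
    for line in lines:
        i, j, w = map(int, line.split())
        # Assign directly rather than appending to a list.
        graph.setdefault(i, {})[j] = w
    return graph


# Sample input from the question; the first line holds N and C.
raw_input_text = """4 4
0 1 11
0 2 13
1 0 11
1 3 14"""

header, *edge_lines = raw_input_text.splitlines()
n, c = map(int, header.split())
graph = build_graph(edge_lines[:c])

print(graph)  # {0: {1: 11, 2: 13}, 1: {0: 11, 3: 14}}
```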

In python, how do I find the sum of values in a dictionary? Where each key has multiple values

My data is tab delimited and looks like this:
Name Count Sample
Dog .0001 1
Dog .00003 1
Dog .0001 2
Cat .0004 1
Cat .0002 1
Cat .0003 2
Cat .0002 2
After I define my variables unid (the first column merged with the third column, e.g. Dog_1) and num (the Count for that line), I append each num to a dictionary under its unid (using Python 2.7), like so:
for line in K:
    sp = line.split("\t")
    name = sp[0]
    unid = sp[3][:-2] +"_"+ sp[0]
    num = int(Decimal(sp[1]))
    if not dict1.has_key(unid):
        dict1[unid] = []
    dict1[unid].append(num)
I try to sum it with this:
dictTot = sum(dict1.values())
But I get this error message:
TypeError: unsupported operand type(s) for +: 'int' and 'list'
How can I sum these values such that I can retrieve Cat_1: .0006, Cat_2: .0005 etc?
Sorry everyone, I know my question is not great. But as stated by Jacob below,
"dictTot = sum(sum(value) for value in dict1.values())" sums all of the sums, but what I am looking for is to sum each group of values under each key independently, so I can find out how many Cats there are in sample 1 and so on. Perhaps sum is not right for this? Sorry, as is evident, I am not a Python extraordinaire.
That isn't how sum works. You're trying to get an integer (or numeric value type) by "adding" a bunch of lists, so the built-in function freaks out. Try this instead:
dictTot = sum(sum(value) for value in dict1.values())
That will sum all the sums, which is what you want (I think).
EDIT
Apparently you want to sum the values stored under each key separately. For that purpose, you can use a dictionary comprehension:
dictTot = {key:sum(l_values) for key, l_values in dict1.items()}
I basically rewrote the whole thing...
K = "Dog .0001 1\n Dog .00003 1\n Dog .0001 2\n Cat .0004 1\n Cat .0002 1\n Cat .0003 2\n Cat .0002 2"
dict1 = {}
for line in K.split("\n"):
    sp = line.split()
    name = sp[0]
    unid = "_".join([sp[0], sp[2][-2:]])
    num = float(sp[1])
    # "not in" replaces has_key(), which no longer exists in Python 3
    if unid not in dict1:
        dict1[unid] = [num]
    else:
        dict1[unid].append(num)
print(dict1)
dictTot = sum([sum(x) for x in dict1.values()])
print(dictTot)
the final dict is
{'Dog_2': [0.0001],
'Dog_1': [0.0001, 3e-05],
'Cat_1': [0.0004, 0.0002],
'Cat_2': [0.0003, 0.0002]}
the sum is
0.00133
The values are lists, so you need to loop over them and sum each one individually.
EDIT
Apparently you now want "Cat_1: .0006, Cat_2: .0005 etc", so starting from dict1 you can do:
for key in dict1:
    dict1[key] = sum(dict1[key])
now dict1 becomes
{'Dog_2': 0.0001,
'Dog_1': 0.00013,
'Cat_1': 0.0006,
'Cat_2': 0.0005}
In order to sum all the values, you must first join all the lists together into one iterable that sum() can process. Here are two ways to do this:
dictTot = sum(sum(dict1.values(), []))
And the slightly more verbose, but more readable:
from itertools import chain
dictTot = sum(chain.from_iterable(dict1.values()))
sum() actually takes two arguments. The second argument, start, defaults to 0, hence the error message you're getting about adding an int to a list. In essence, it's doing this: 0 + [1, 2, 3] + [1, 2].... In my first example, I set the start value to an empty list, so the inner sum() concatenates everything into a single list. Now that all the values are in a single list, the outer sum() adds them up to obtain the answer.
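A small sketch of the start argument in action, using made-up integer data so the totals are exact. The dictionary contents and variable names are illustrative only.

```python
from itertools import chain

# Hypothetical data: each key maps to a list of numbers.
dict1 = {"Cat_1": [4, 2], "Cat_2": [3, 2], "Dog_1": [1]}

# With the default start of 0, sum() tries 0 + [4, 2] and fails.
try:
    sum(dict1.values())
except TypeError as exc:
    error = str(exc)
    print(error)

# With start=[], the lists are concatenated into one flat list first.
flattened = sum(dict1.values(), [])
total_a = sum(flattened)

# chain.from_iterable walks the lists lazily without building a new list.
total_b = sum(chain.from_iterable(dict1.values()))

print(flattened)         # [4, 2, 3, 2, 1]
print(total_a, total_b)  # 12 12
```

For many or long lists the chain version is usually the better choice, since repeated list concatenation copies the growing list on every step.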
EDIT
In response to your update:
You can do this with a dict comprehension:
dictTot = {key: sum(value) for key, value in dict1.items()}
or, if you are using Python older than 2.7, with a generator expression passed to dict():
dictTot = dict((key, sum(value)) for key, value in dict1.iteritems())
Answer:
dict((k,sum(v)) for k,v in dict1.iteritems())
Yes, and change the int(Decimal('.0001')), which truncates every count to 0, and use a defaultdict.
+1 for a question with downvotes and then four answers that missed the one-liner answer.
EDIT: oops, I missed that @Joel Cornett had it, so props there too.
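The defaultdict suggestion might look like the following Python 3 sketch. The rows list reproduces the question's sample data as tab-separated strings, and the names rows and totals are assumptions. Keeping running totals directly, rather than lists that are summed later, is a design choice of this sketch and not something the answer above prescribes.

```python
from collections import defaultdict
from decimal import Decimal

# Sample rows from the question, tab-delimited (header line omitted).
rows = [
    "Dog\t.0001\t1", "Dog\t.00003\t1", "Dog\t.0001\t2",
    "Cat\t.0004\t1", "Cat\t.0002\t1", "Cat\t.0003\t2", "Cat\t.0002\t2",
]

# Missing keys start at Decimal("0"), so no has_key() check is needed.
totals = defaultdict(Decimal)
for row in rows:
    name, count, sample = row.split("\t")
    # Keep the value as a Decimal; wrapping it in int() would truncate it to 0.
    totals[name + "_" + sample] += Decimal(count)

for unid in sorted(totals):
    print(unid, totals[unid])
```

Decimal avoids the small rounding differences that float arithmetic can introduce when adding values such as .0001 and .00003.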
This works:
import decimal

d={}
for line in K:
    sp = line.strip().split()
    unid = sp[0]+"_"+sp[-1]
    num = decimal.Decimal(sp[1])
    d.setdefault(unid,[]).append(num)
print({k:sum(v) for k, v in d.items()})
Prints:
{'Dog_1': Decimal('0.00013'),
'Cat_2': Decimal('0.0005'),
'Cat_1': Decimal('0.0006'),
'Dog_2': Decimal('0.0001')}
