dict.values() doesn't provide all the values in Python

dict.values() doesn't return all the values that I retrieve inside a for loop. I use the for loop to read values from a text file.
test = {}
# The file handle gets its own name so it does not shadow the dict
with open(input_file, "r") as infile:
    for line in infile:
        value = line.split()[5]
        value = int(value)
        test[value] = value
        print(value)
test_list = test.values()
print(str(test_list))
The printed values and test_list do not contain the same number of items.
The output is as follows:
From printing "value":
88
53
28
28
24
16
16
12
12
11
8
8
8
8
6
6
6
4
4
4
4
4
4
4
4
4
4
4
4
2
2
2
2
2
From printing test_list:
list values:dict_values([16, 24, 2, 4, 53, 8, 88, 12, 6, 11, 28])
Is there any way to include the duplicate values in the list too?

This line:
test[value] = value
doesn't add a new entry to test if value is already a key; it simply overwrites the existing entry. So any duplicates get removed. The values() call really is returning everything that remains in the dict.

Dictionary keys cannot contain duplicates. When you run test[value] = value, whatever was already stored under the key value is overwritten, so you end up with only the distinct values.
A sample test can be
>>> {1:10}
{1: 10}
>>> {1:10,1:20}
{1: 20}
Here you can see that the value stored under the duplicate key is overwritten by the new one.
POST COMMENT EDIT
As you said you want a list of values, you can have a statement l = [] at the start and have l.append(value) at the place where you have test[value] = value

This is because Python dictionaries cannot have duplicate keys. Every time you run test[value] = value, it replaces the existing entry for that key, or adds one if the key is not in the dictionary yet.
For example:
>>> d = {}
>>> d['a'] = 'b'
>>> d
{'a': 'b'}
>>> d['a'] = 'c'
>>> d
{'a': 'c'}
I'd suggest making this into a list, like:
output = []
with open(input_file, "r") as test:
    for line in test:
        value = line.split()[5]
        value = int(value)
        output.append(value)
        print(value)
print(str(output))
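To make the difference concrete, here is a minimal sketch that contrasts the three containers on the same data. The sample lines are made up for illustration (only the sixth field matters), and the use of collections.Counter is an assumption about what might be wanted if a dictionary is still needed, not something the question asked for.

```python
from collections import Counter

# Hypothetical sample lines; the sixth whitespace-separated field holds the value.
lines = [
    "a b c d e 88",
    "a b c d e 28",
    "a b c d e 28",
    "a b c d e 4",
    "a b c d e 4",
    "a b c d e 4",
]

# A list keeps every value, duplicates included, in file order.
values = [int(line.split()[5]) for line in lines]

# A dict keyed by the value keeps only one entry per distinct value.
as_dict = {v: v for v in values}

# A Counter is still a dict, but it records how often each value occurred.
counts = Counter(values)

print(values)
print(list(as_dict.values()))
print(counts)
```

The list is the direct answer to the question; the Counter is only useful if the number of occurrences matters more than the original order.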

Related

Sorting on multiple keys from heterogenous tuple in values of a python dictionary [duplicate]

This question already has answers here:
How do I sort a dictionary by value?
(34 answers)
Closed 1 year ago.
Input:
{'Thiem': (3, 0, 10, 104, 11, 106),
'Medvedev': (1, 2, 11, 106, 10, 104),
'Barty': (0, 2, 8, 74, 9, 76),
'Osaka': (0, 4, 9, 76, 8, 74)}
The expected output should be sorted by the dictionary's values, field by field in tuple order: first on the 1st field (descending), then, on ties, on the 2nd field (descending), and so on through the 4th field (descending), with the 5th and 6th fields sorted ascending. I tried using sorted() in a couple of ways.
output:
Thiem 3 0 10 104 11 106
Medvedev 1 2 11 106 10 104
Osaka 0 4 9 76 8 74
Barty 0 2 8 74 9 76
Kindly assist or suggest an approach.
Edit:
Updated description for more clarity. Below is the code i tried:
>>> results = []
>>> for (k, v) in d.items():
...     results.append(v)
...
>>> results.sort(key=lambda x: (x[4], x[5]))
>>> results.sort(key=lambda x: (x[0], x[1], x[2], x[3]), reverse=True)
I believe you are trying to compare the first element of every tuple with one another, so that the key with the greatest number goes on top (index 0). If the numbers are equal, you would instead compare the second element of every tuple, and so on, until you reach the final element of the tuple. If so, then:
def dic_sorted(dic):
    for r in range(len(dic) - 1):
        i = 0
        values = list(dic.values())
        while values[r][i] == values[r + 1][i]:
            i += 1
        if values[r][i] < values[r + 1][i]:
            key_shift(dic, values[r])
    return dic

def key_shift(dic, v1):
    keys = list(dic.keys())
    values = list(dic.values())
    temp_key = keys[values.index(v1)]
    del dic[temp_key]
    dic[temp_key] = v1

for i in range(5):  # Number of iterations depends on the complexity of your dictionary
    dic_sorted(data)
print(data)
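As an alternative to shifting keys by hand, the mixed ordering from the question can be sketched with a single sorted() call. This is not the approach of the answer above; it relies on the assumption that every field is numeric, so a descending field can be expressed by negating it in the sort key. The name sort_key is made up for this example.

```python
data = {
    'Thiem': (3, 0, 10, 104, 11, 106),
    'Medvedev': (1, 2, 11, 106, 10, 104),
    'Barty': (0, 2, 8, 74, 9, 76),
    'Osaka': (0, 4, 9, 76, 8, 74),
}

def sort_key(item):
    """Fields 1-4 descending (negated), fields 5-6 ascending."""
    _name, v = item
    return (-v[0], -v[1], -v[2], -v[3], v[4], v[5])

ordered = sorted(data.items(), key=sort_key)

for name, v in ordered:
    print(name, *v)
# Thiem 3 0 10 104 11 106
# Medvedev 1 2 11 106 10 104
# Osaka 0 4 9 76 8 74
# Barty 0 2 8 74 9 76
```

Since Python 3.7 dicts preserve insertion order, so dict(ordered) would give a dictionary in the sorted order if one is needed.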

How to find duplicates from a Pandas dataframe based upon the values in other columns?

I have a pandas DataFrame:
A =
period  store  item
1       32     'A'
1       34     'A'
1       32     'B'
1       34     'B'
2       42     'X'
2       44     'X'
2       42     'Y'
2       44     'Y'
I need to implement something like this:
If an item has the same set of stores as any other item for that particular period then those items are duplicate.
So in this case A and B are duplicates as they have the same stores for the respective periods.
I have tried converting this into a nested dictionary using this:
dicta = {p: g.groupby('item')['store'].apply(tuple).to_dict()
         for p, g in A.groupby('period')}
Which is returning me a dictionary like this:
dicta = {1: {'A': (32, 34),'B': (32, 34)}, 2: {'X': (42, 44),'Y': (42, 44)}}
...
So in the end I want a dictionary like this.
{1:(A,B),2:(X,Y)}
However, I cannot work out the logic for finding the duplicate items.
Is there any other method for finding them?
You can simply use .duplicated. Make sure to pass ['period', 'store'] as subset and set keep=False so that all the duplicated rows are returned.
print(A[A.duplicated(subset=['period', 'store'], keep=False)])
Outputs
   period  store item
0       1     32    A
1       1     34    A
2       1     32    B
3       1     34    B
4       2     42    X
5       2     44    X
6       2     42    Y
7       2     44    Y
Note that according to the logic you specified all the rows are duplicates.
EDIT After OP elaborated on the expected format, I suggest
duplicates = A[A.duplicated(subset=['period', 'store'], keep=False)]
output = {g: tuple(df['item'].unique()) for g, df in duplicates.groupby('period')}
Then output is {1: ('A', 'B'), 2: ('X', 'Y')}.
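The rule in the question ("same set of stores within a period") can also be checked literally. The following is a rough sketch in plain Python rather than pandas, under the assumption that the rows are available as (period, store, item) tuples; with the DataFrame from the question they could be obtained from A[['period', 'store', 'item']].itertuples(index=False, name=None). All variable names here are made up for the example.

```python
from collections import defaultdict

rows = [
    (1, 32, "A"), (1, 34, "A"), (1, 32, "B"), (1, 34, "B"),
    (2, 42, "X"), (2, 44, "X"), (2, 42, "Y"), (2, 44, "Y"),
]

# Step 1: collect the set of stores for every (period, item) pair.
stores_by_item = defaultdict(set)
for period, store, item in rows:
    stores_by_item[(period, item)].add(store)

# Step 2: group items that share exactly the same store set in a period.
items_by_store_set = defaultdict(list)
for (period, item), stores in stores_by_item.items():
    items_by_store_set[(period, frozenset(stores))].append(item)

# Step 3: keep only groups with more than one item.
duplicates = defaultdict(list)
for (period, _stores), items in items_by_store_set.items():
    if len(items) > 1:
        duplicates[period].append(tuple(items))
duplicates = dict(duplicates)

print(duplicates)  # {1: [('A', 'B')], 2: [('X', 'Y')]}
```

Each period maps to a list of tuples rather than a single tuple, because a period may contain more than one group of mutually duplicate items.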

Python: sort tab-separated key in dict by both columns

I have a dictionary with tab separated keys.
d = {}
d["1\t1"] = "abc"
d["10\t1"] = "def"
d["1\t10"] = "ghi"
d["2\t5"] = "xyz"
d["1\t4"] = 0
How can I sort these keys by the first column and then by the second?
I cannot use this
for s in sorted(d):
    print(s)
because my keys are strings, so they sort lexicographically rather than numerically.
I want to return this:
1 1
1 4
1 10
2 5
10 1
How can this be achieved? I am not even sure if dictionaries are the right data structures.
I assume the second column means the part after \t.
sorted(d.items(),key= lambda x: (int(x[0].split('\t')[0]),int(x[0].split('\t')[-1])))
output:
[('1\t1', 'abc'), ('1\t4', 0), ('1\t10', 'ghi'), ('2\t5', 'xyz'), ('10\t1', 'def')]
Printed out:
>>> for k, _ in sorted(d.items(), key=lambda x: (int(x[0].split('\t')[0]), int(x[0].split('\t')[-1]))):
...     print(k.split('\t')[0], k.split('\t')[1])
1 1
1 4
1 10
2 5
10 1
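The lambda above splits each key twice. A slightly more compact sketch of the same idea uses a named key function that converts every tab-separated column to int once; column_key is a made-up name for this example, and it assumes every column is an integer.

```python
d = {"1\t1": "abc", "10\t1": "def", "1\t10": "ghi", "2\t5": "xyz", "1\t4": 0}

def column_key(key):
    # Split on the tab and compare the columns as integers, left to right.
    return tuple(int(part) for part in key.split("\t"))

ordered_keys = sorted(d, key=column_key)

for k in ordered_keys:
    print(*k.split("\t"))
# 1 1
# 1 4
# 1 10
# 2 5
# 10 1
```

Because the key is a tuple, this also works unchanged if the keys have more than two columns.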
Alternative (and probably much easier solution) for my own problem (thanks Kevin):
d = {}
d[1,1] = "abc"
d[10,1] = "def"
d[1,10] = "ghi"
d[2,5] = "xyz"
d[1,4] = 0
for k in sorted(d):
    print(k)
This only requires a small revision of the input data: tuple keys instead of tab-separated strings.

How does this python nested for loops work?

Can anyone explain why the output of the following nested loop is {1:6, 2:6, 3:6}?
>>> {x:y for x in [1, 2, 3] for y in [4, 5, 6]}
{1:6, 2:6, 3:6}
my_dict = {x:y for x in [1,2,3] for y in [4,5,6]}
is the same as creating it as follows:
my_dict = {}
for x in [1,2,3]:
    for y in [4,5,6]:
        my_dict[x] = y
Which would look like this if you unroll the loops:
my_dict = {}
my_dict[1] = 4
my_dict[1] = 5
my_dict[1] = 6
my_dict[2] = 4
my_dict[2] = 5
my_dict[2] = 6
my_dict[3] = 4
my_dict[3] = 5
my_dict[3] = 6
You are effectively inserting nine key value pairs into the dictionary. However, each time you insert a pair with a key that already exists it overwrites the previous value. Thus you only ended up with the last insert for each key where the value was six.
The difference is you are making a dictionary vs a list. In your own example, you are effectively constructing a dictionary and because you set a different value for the same key 3 times, the last value sticks.
You are effectively doing:
dict[1] = 4
dict[1] = 5
dict[1] = 6
...
dict[3] = 4
dict[3] = 5
dict[3] = 6
So the last value sticks.
If the expectation was to create {1:4, 2:5, 3:6}, try this:
{x[0]:x[1] for x in zip([1,2,3], [4,5,6])}
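A short sketch that puts the two behaviours side by side may help. It also shows dict(zip(...)), which is an equivalent and somewhat shorter spelling of the zip-based comprehension above; the variable names are made up for the example.

```python
keys = [1, 2, 3]
vals = [4, 5, 6]

# Nested loops: every key is paired with every value, and the last value wins.
nested = {x: y for x in keys for y in vals}

# zip pairs the two lists element by element instead.
paired = dict(zip(keys, vals))

print(nested)  # {1: 6, 2: 6, 3: 6}
print(paired)  # {1: 4, 2: 5, 3: 6}
```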

How to copy unique keys and values from another dictionary in Python

I have a dataframe df with transactions where the values in the column Col can be repeated. I use Counter dictionary1 to count the frequency for each Col value, then I would like to run a for loop on a subset of the data and obtain a value pit. I want to create a new dictionary dict1 where the key is the key from dictionary1 and the value is the value of pit. This is the code I have so far:
dictionary1 = Counter(df['Col'])
dict1 = defaultdict(int)
for i in range(len(dictionary1)):
    temp = df[df['Col'] == dictionary1.keys()[i]]
    b = temp['IsBuy'].sum()
    n = temp['IsBuy'].count()
    pit = b/n
    dict1[dictionary1.keys()[i]] = pit
My question is: how can I assign the key and value for dict1 based on the key of dictionary1 and the value obtained from the calculation of pit? In other words, what is the correct way to write the last line of code in the above script?
Thank you.
Since you're using pandas, I should point out that the problem you're facing is common enough that there's a built-in way to do it. We call collecting "similar" data into groups and then performing operations on them a groupby operation. It's probably worthwhile reading the tutorial section on the groupby split-apply-combine idiom -- there are lots of neat things you can do!
The pandorable way to compute the pit values would be something like
df.groupby("Col")["IsBuy"].mean()
For example:
>>> import numpy as np
>>> import pandas as pd
>>> # make dummy data
>>> N = 10**4
>>> df = pd.DataFrame({"Col": np.random.randint(1, 10, N), "IsBuy": np.random.choice([True, False], N)})
>>> df.head()
   Col  IsBuy
0    3  False
1    6   True
2    6   True
3    1   True
4    5   True
>>> df.groupby("Col")["IsBuy"].mean()
Col
1 0.511709
2 0.495697
3 0.489796
4 0.510658
5 0.507491
6 0.513183
7 0.522936
8 0.488688
9 0.490498
Name: IsBuy, dtype: float64
which you could turn into a dictionary from a Series if you insisted:
>>> df.groupby("Col")["IsBuy"].mean().to_dict()
{1: 0.51170858629661753, 2: 0.49569707401032703, 3: 0.48979591836734693, 4: 0.51065801668211308, 5: 0.50749063670411987, 6: 0.51318267419962338, 7: 0.52293577981651373, 8: 0.48868778280542985, 9: 0.49049773755656106}
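For comparison, the same split-apply-combine idea can be sketched in plain Python, which also shows how the dictionary from the question can be filled key by key without indexing into keys(). The rows below are made-up sample data standing in for the Col and IsBuy columns, and buys and counts are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical (Col, IsBuy) pairs standing in for the DataFrame rows.
rows = [("a", True), ("b", False), ("a", False), ("a", True), ("b", False)]

buys = defaultdict(int)    # number of True values per key
counts = defaultdict(int)  # number of rows per key
for key, is_buy in rows:
    buys[key] += int(is_buy)
    counts[key] += 1

# pit for each key is the fraction of rows where IsBuy is True.
dict1 = {key: buys[key] / counts[key] for key in counts}

print(dict1)
```

With pandas available, the groupby version above is likely the better choice; this sketch only illustrates what it computes.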
