Separating values into dictionaries and counting them in Python - python

I’m trying to take a list of numbers, separate the values into dictionaries based on the int and float types, and then count the number of their occurrences. I’m having issues with the logic.
The ideal output would look like this:
'int' : [1 : 3, 2 : 4, 5 : 1, 6 : 2],
'float' : [1.0 : 2, 2.3 : 4, 3.4 : 4]
This is what I have so far, and I keep pounding my head:
values = [1, 2.0, 3.0, 1, 1, 3, 4.0, 2, 1.0]
types = {
    'int': [],
    'float': []
}
for obj in values:
    if isinstance(obj, int):
        types['int'].append(obj)
    elif isinstance(obj, float):
        types['float'].append(obj)
for v in types:
    if v not in types['int']:
        counts = 0
        counts[v] += 1
    elif v not in types['float']:
        counts = 0
        counts[v] += 1
print(counts)

The first half seems fine, but the second half can be improved.
Try:
for k, v in types.items():
    print(k, len(v))
Sample input:
values = [1.1,2,3.1]
Gives this output:
int 1
float 2

How about something like this:
This uses the type function to derive the keys of the resulting dictionary ('int' and 'float') without having to spell out those strings.
numbers = [1,1,1,2.1,2.1,2.2, 3,3.1]
def tally_by_type(numbers):
    result = {}
    for v in numbers:
        k = type(v).__name__
        result[k] = result.get(k, {})
        result[k][v] = result[k].get(v,0)
        result[k][v] = result[k][v] + 1
    return result
tally_by_type(numbers)
{'int': {1: 3, 3: 1}, 'float': {2.1: 2, 2.2: 1, 3.1: 1}}
Interestingly, this also works if you have strings in there
tally_by_type([1,2,3,3,3,'a','a','b'])
{'int': {1: 1, 2: 1, 3: 3}, 'str': {'a': 2, 'b': 1}}
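The same tally can also be sketched with collections.defaultdict and collections.Counter from the standard library, which removes the manual get calls in tally_by_type above. The name tally_by_type_counter is made up for this example. Note that type() is exact, so bool values would land under their own 'bool' key rather than under 'int'.

```python
from collections import Counter, defaultdict


def tally_by_type_counter(values):
    # Hypothetical helper: keep one Counter per exact type name.
    result = defaultdict(Counter)
    for v in values:
        result[type(v).__name__][v] += 1
    # Convert to plain dicts so the result prints like an ordinary nested dict.
    return {name: dict(counter) for name, counter in result.items()}


tally = tally_by_type_counter([1, 1, 1, 2.1, 2.1, 2.2, 3, 3.1])
print(tally)
```

For the sample list this should produce the same nested dictionary as tally_by_type.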

You can try something like this:
ints = {}
floats = {}
# Avoid naming this variable "list", which would shadow the built-in.
values = [3.5, 27, 87.8, 1.02, 66]
for val in values:
    if isinstance(val, int):
        ints[str(val)] = val
    elif isinstance(val, float):
        floats[str(val)] = val
print("ints dictionary\n", ints, "number of instances", len(ints))
print("floats dictionary\n", floats, "number of instances", len(floats))
which prints:
ints dictionary
{'27': 27, '66': 66} number of instances 2
floats dictionary
{'3.5': 3.5, '87.8': 87.8, '1.02': 1.02} number of instances 3
I did not quite understand which dictionary keys you want to use, so I assumed you don't really need them.

Related

Python - Is there more efficient way to convert the string values in Dict to get unique numbers for each str

I have data consisting of 2 million records. I am trying to convert the string values to numbers and save the mapping in a dictionary, then use that dictionary as a lookup, all to reduce the size of the file.
The data looks like this:
[{
'a' : ['one','two'],
'b' : 'fine',
'c' : ['help']},
{
'a' : ['four','hen'],
'b' : 'happy',
'c' : ['mouse']},
{
'a' : ['two','hen'],
'b' : 'fine'}.......]
def convertDataToNumber(newdata):
    dataR = []
    dataRD = {}
    result=[]
    cin = 0
    ctr = 1
    # all_keys = {k for d in newdata for k in d.keys()}
    for d in newdata:
        for key,val in d.items():
            if isinstance(val,type([])):
                for l in val:
                    if l not in dataR:
                        dataR.append(l)
                        dataRD[(dataR[cin])] = ctr
                        ctr = ctr + 1
                        cin = cin + 1
                d[key] = [dataRD.get(x,x) for x in d[key]]
            if isinstance(val,str):
                if val not in dataR:
                    dataR.append(val)
                    dataRD[(dataR[cin])] = ctr
                    ctr = ctr + 1
                    cin = cin + 1
                d[key] = [dataRD.get(x,x) for x in [d[key]]]
    return dataRD,newdata
Is there a better way to convert the values to numbers?
Currently it takes around 1 hour to execute this operation.
Output:
[{'a' : [1,2],
'b':[3],
'c':[4]},
{'a' : [5,6],
'b':[7],
'c':[8]},
{'a':[2,6],
'b':[3]}]
You can keep a dict that maps each string to a unique ID. With this approach we iterate over the data only once: for each string we see, if it already exists in the dict we reuse its number, and if not we store a new number for it (the current size of the dict plus one).
def cnvrt_num(v, mem_cat):
    if isinstance(v, list):
        res = []
        for i in v:
            mem_cat[i] = mem_cat.get(i, len(mem_cat)+1)
            res.append(mem_cat[i])
    else:
        mem_cat[v] = mem_cat.get(v, len(mem_cat)+1)
        res = [mem_cat[v]]
    return res
mem_cat = {}
for dct in lst:
    for k,v in dct.items():
        dct[k] = cnvrt_num(v, mem_cat)
print(mem_cat)
# {'one': 1, 'two': 2, 'fine': 3, 'help': 4, 'four': 5, 'hen': 6, 'happy': 7, 'mouse': 8}
print(lst)
[
{'a': [1, 2], 'b': [3], 'c': [4]},
{'a': [5, 6], 'b': [7], 'c': [8]},
{'a': [2, 6], 'b': [3]}
]
Input:
lst = [
{'a' : ['one','two'],'b' : 'fine','c' : ['help']},
{'a' : ['four','hen'],'b' : 'happy','c' : ['mouse']},
{'a' : ['two','hen'],'b' : 'fine'}]
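A likely reason the function in the question is slow is the "l not in dataR" test, which scans a list on every lookup, while a dict lookup takes roughly constant time. As a minimal sketch of the same idea using dict.setdefault, and returning new records instead of modifying the input (the names encode_records and string_ids are made up here):

```python
def encode_records(records):
    # Hypothetical helper: map every distinct string to 1, 2, 3, ... in
    # first-seen order and return (mapping, encoded copy of the records).
    string_ids = {}
    encoded = []
    for record in records:
        new_record = {}
        for key, value in record.items():
            items = value if isinstance(value, list) else [value]
            # len(string_ids) + 1 is evaluated before setdefault inserts,
            # so an unseen string gets the next unused number.
            new_record[key] = [
                string_ids.setdefault(s, len(string_ids) + 1) for s in items
            ]
        encoded.append(new_record)
    return string_ids, encoded


sample = [
    {'a': ['one', 'two'], 'b': 'fine', 'c': ['help']},
    {'a': ['four', 'hen'], 'b': 'happy', 'c': ['mouse']},
    {'a': ['two', 'hen'], 'b': 'fine'},
]
string_ids, encoded = encode_records(sample)
print(string_ids)
print(encoded)
```

Keeping the input untouched costs extra memory for 2 million records, so updating in place as in the answer above may be the better trade-off if memory is tight.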

Set tuple pair of (x, y) coordinates into dict as key with id value

The data looks like this:
d = {'location_id': [1, 2, 3, 4, 5], 'x': [47.43715, 48.213889, 46.631111, 46.551111, 47.356628], 'y': [11.880689, 14.274444, 14.371, 13.665556, 11.705181]}
df = pd.DataFrame(data=d)
print(df)
   location_id          x          y
0            1   47.43715  11.880689
1            2  48.213889  14.274444
2            3  46.631111     14.371
3            4  46.551111  13.665556
4            5  47.356628  11.705181
Expected output:
{(47.43715, 11.880689): 1, (48.213889, 14.274444): 2, (46.631111, 14.371): 3, ...}
That way I can simply look up the ID by providing the point coordinates.
What I have tried:
dict(zip(df['x'].astype('float'), df['y'].astype('float'), zip(df['location_id'])))
Error: ValueError: dictionary update sequence element #0 has length 3; 2 is required
or
dict(zip(tuple(df['x'].astype('float'), df['y'].astype('float')), zip(df['location_id'])))
TypeError: tuple expected at most 1 arguments, got 2
I have Googled for it a while, but I am not very clear about it. Thank you for any assistance.
I think this
result = dict(zip(zip(df['x'], df['y']), df['location_id']))
should give you what you want? Result:
{(47.43715, 11.880689): 1,
(48.213889, 14.274444): 2,
(46.631111, 14.371): 3,
(46.551111, 13.665556): 4,
(47.356628, 11.705181): 5}
I didn't use a DataFrame. Is this what you wanted?
my_dict = {}
d = {'location_id': [1, 2, 3, 4, 5], 'x': [47.43715, 48.213889, 46.631111, 46.551111, 47.356628], 'y': [11.880689, 14.274444, 14.371, 13.665556, 11.705181]}
for i in range(len(d['location_id'])):
    my_dict[(d['x'][i], d['y'][i])] = d['location_id'][i]
You can set the x and y columns as the index, then export the location_id column to a dictionary:
d = df.set_index(['x', 'y'])['location_id'].to_dict()
print(d)
{(47.43715, 11.880689): 1, (48.213889, 14.274444): 2, (46.631111, 14.371): 3, (46.551111, 13.665556): 4, (47.356628, 11.705181): 5}
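Once the dict exists, a lookup is a plain subscript with a tuple key. One caveat worth keeping in mind is that float keys match only on exact equality, so coordinates that were computed or rounded elsewhere may miss. A minimal sketch in plain Python (no pandas), where the name find_location and the choice of rounding to 6 decimal places are assumptions for illustration:

```python
coords_to_id = {
    (47.43715, 11.880689): 1,
    (48.213889, 14.274444): 2,
    (46.631111, 14.371): 3,
}

# Exact lookup: works when the floats are identical to the stored ones.
exact_id = coords_to_id[(48.213889, 14.274444)]
print(exact_id)


def find_location(x, y, lookup, ndigits=6):
    # Hypothetical helper: round both the query and the stored keys so that
    # tiny floating-point differences do not cause a miss. The rounded dict
    # is rebuilt on every call, which is fine for a sketch but not for
    # large tables.
    rounded = {
        (round(kx, ndigits), round(ky, ndigits)): v
        for (kx, ky), v in lookup.items()
    }
    return rounded.get((round(x, ndigits), round(y, ndigits)))


print(find_location(46.6311110000001, 14.371, coords_to_id))
print(find_location(0.0, 0.0, coords_to_id))
```

The last call returns None because dict.get falls back to None when no key matches.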

Return key with highest value

I have the following graph:
graph = {0 : {5:6, 4:8},
1 : {4:11},
2 : {3: 9, 0:12},
3 : {},
4 : {5:3},
5 : {2: 7, 3:4}}
I am trying to return the key that has the highest value in this graph. The expected output in this case would be 2 as key 2 has the highest value of 12.
Any help on how I can achieve this would be greatly appreciated.
Find the key whose maximum value is maximal:
max((k for k in graph), key=lambda k: max(graph[k].values(), default=float("-inf")))
Empty sub-dictionaries are disqualified because their maximum defaults to negative infinity. Alternatively, you can pre-filter such keys:
max((k for k in graph if graph[k]), key=lambda k: max(graph[k].values()))
Assuming all the values are positive numbers:
graph = {0 : {5:6, 4:8},
         1 : {4:11},
         2 : {3: 9, 0:12},
         3 : {},
         4 : {5:3},
         5 : {2: 7, 3:4}}
highestKey = 0
# Named max_value rather than max so the built-in max is not shadowed.
max_value = 0
for key, value in graph.items():
    for key2, value2 in value.items():
        if max_value < value2:
            max_value = value2
            highestKey = key
print(highestKey)
You can also create (max_weight, key) tuples for each key and get the max of those:
max_val = max((max(e.values()), k) for k, e in graph.items() if e)
# (12, 2)
print(max_val[1])
# 2
Note that we don't need a custom key function for max here because the first value in the tuple is the one we want max to consider.
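To see why the tuple trick works: Python compares tuples element by element, so the weight decides first and the key only breaks ties. A small sketch with a made-up graph that contains a tie (tied_graph is hypothetical data, not from the question):

```python
tied_graph = {
    0: {5: 6, 4: 12},
    1: {},
    2: {3: 9, 0: 12},
}

# Build (max_weight, key) pairs for every non-empty node.
pairs = [(max(edges.values()), node) for node, edges in tied_graph.items() if edges]
best_weight, best_node = max(pairs)
print(pairs)
print(best_weight, best_node)
```

Both node 0 and node 2 have a maximum weight of 12, and max picks node 2 because the second tuple element is compared when the first is equal. If you would rather keep the first node on a tie, a key function that looks only at the weight is the safer choice.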
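The recursive solution is below. It makes no assumptions about the depth of your tree; it only assumes that every value is an int, a float, or a dict.
def getLargest(d):
    largest = None
    def getLargestRecursive(node):
        nonlocal largest
        if isinstance(node, dict):
            # Recurse into every value of a nested dict.
            for child in node.values():
                getLargestRecursive(child)
        elif largest is None or node > largest:
            largest = node
    getLargestRecursive(d)
    return largest
# Largest value under each top-level key (None for an empty sub-dict).
largestValues = {k: getLargest(v) for k, v in graph.items()}
answer = max((k for k, v in largestValues.items() if v is not None), key=largestValues.get)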
You can also use a dict comprehension to flatten the dictionary to each key's largest value, and then print the key with the maximum:
graph = {0 : {5:6, 4:8},
         1 : {4:11},
         2 : {3: 9, 0:12},
         3 : {},
         4 : {5:3},
         5 : {2: 7, 3:4}}
# Keep the largest weight per key; empty sub-dicts are skipped.
flat_dict = {k: max(v.values()) for k, v in graph.items() if v}
print(max(flat_dict.keys(), key=(lambda k: flat_dict[k])))
# output:
2
You can also try flattening your dictionary into (key, value) tuples and then taking the tuple with the highest second element:
from operator import itemgetter
graph = {
    0: {5: 6, 4: 8},
    1: {4: 11},
    2: {3: 9, 0: 12},
    3: {},
    4: {5: 3},
    5: {2: 7, 3: 4},
}
result = max(((k, v) for k in graph for v in graph[k].values()), key=itemgetter(1))
print(result)
# (2, 12)
print(result[0])
# 2

Dictionary Containing list data, filter based on value in list

I have test data which is gathered based on multiple inputs, and results in a single output. I'm currently storing this data in a dictionary whose keys are my parameter/ results labels, and whose values are the test conditions and results. I would like to be able to filter the data so I can generate plots based on isolated conditions.
In my example below, my test conditions would be 'a' and 'b', and the result of the experiment would be 'c'. I want to filter my data so I get a dictionary with the same key, value structure and only my filtered results. However my current dictionary comprehension returns an empty dictionary. Any advice to get the desired result?
Current Code:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered_data = {k:v for k,v in data.iteritems() if v in data['b'] >= 20}
Desired Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Current Result:
{}
Also, is this dictionary of lists a good schema to store data of this type, given that I'm going to want to filter the results, or is there a better way to accomplish this?
Use this:
filtered_data = {k: [x for i, x in enumerate(v) if data['b'][i] >= 20] for k, v in data.items()}
Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Consider using the pandas module for this type of work.
import pandas as pd
df = pd.DataFrame(data)
df = df[df["b"] >= 20]
print(df)
This appears to give you what you want. The dictionary keys are the column names and each list holds the values of one column, so the data maps naturally onto a DataFrame.
Result:
   a   b    c
3  0  20  2.3
4  1  20  2.9
5  2  20  3.4
Are all of the dictionary value lists in matching orders? If so, you could just look at whichever list you want to filter by, say 'b' in this case, find the indices of the values you want, and then use those indices on the other lists in the dictionary.
For example:
matching_indices = []
# enumerate yields the position of each value, which is what we need to index the other lists.
for i, value in enumerate(data['b']):
    if value >= 20:
        matching_indices.append(i)
new_dict = {}
for key in data:
    new_dict[key] = [data[key][i] for i in matching_indices]
You could probably write a dictionary comprehension for it if you wanted. Hopefully this is clear.
You can change this into a function, which would give it more flexibility. Your current logic means that datasets a and c are neglected because they contain no values greater than or equal to 20:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filter_vals = ['a', 'b']
new_d = {}
for k, v in data.items():
    if k in filter_vals:
        new_d[k] = [i for i in v if i >= 20]
print(new_d)
Now I'm not a big fan of many if statements, but something like this is straightforward and can be called many times:
def my_filter(operator, condition, filter_vals, my_dict):
    new_d = {}
    for k, v in my_dict.items():
        if k in filter_vals:
            if operator == '>':
                new_d[k] = [i for i in v if i > condition]
            elif operator == '<':
                new_d[k] = [i for i in v if i < condition]
            elif operator == '<=':
                new_d[k] = [i for i in v if i <= condition]
            elif operator == '>=':
                new_d[k] = [i for i in v if i >= condition]
    return new_d
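If the if/elif chain in my_filter bothers you, one possible alternative is to look the comparison up in a dict built from the standard operator module. This is only a sketch: the names OPERATORS and my_filter_ops are made up, and like my_filter it filters each list on its own values rather than row by row.

```python
import operator

# Map the operator strings used by my_filter to functions from the operator module.
OPERATORS = {
    '>': operator.gt,
    '<': operator.lt,
    '>=': operator.ge,
    '<=': operator.le,
}


def my_filter_ops(op, condition, filter_vals, my_dict):
    # Hypothetical variant of my_filter without the if/elif chain.
    # Unlike my_filter, an unsupported operator string raises KeyError here.
    compare = OPERATORS[op]
    return {
        k: [i for i in v if compare(i, condition)]
        for k, v in my_dict.items()
        if k in filter_vals
    }


data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered = my_filter_ops('>=', 20, ['a', 'b'], data)
print(filtered)
```

With the sample data, 'a' comes back as an empty list because none of its own values reach 20, which is the same behaviour as the loop above.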
I agree with the pandas approach above.
If for some reason you hate pandas or are an old-school computer scientist, tuples are a good way to store relational data. In your example, the a, b, and c lists are columns rather than rows. For tuples, you would want to store the rows as:
data = {'a':(0,10,1.3),'b':(1,10,1.9),'c':(2,10,2.3),'d':(0,20,2.3),'e':(1,20,2.9),'f':(2,20,3.4)}
where the tuples are stored in the (condition1, condition2, outcome) format you described and you can call a single test or filter a set as you describe. From there you can get a filtered set of results as follows:
filtered_data = {k:v for k,v in data.items() if v[1]>=20}
which returns:
{'d': (0, 20, 2.3), 'e': (1, 20, 2.9), 'f': (2, 20, 3.4)}

Python: Sum values in a dictionary based on condition

I have a dictionary of key: value pairs.
The values are integers. I would like to get a sum of the values based on a condition, for example all values > 0.
I've tried a few variations, but unfortunately nothing seems to work.
Try using the values method on the dictionary (which returns a view object in Python 3.x), iterating through each value and summing it if it is greater than 0 (or whatever your condition is):
In [1]: d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}
In [2]: sum(v for v in d.values() if v > 0)
Out[2]: 23
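If the condition changes often, the generator expression can be wrapped in a small function that takes the condition as an argument. This is just a sketch, and the name sum_if is made up for the example:

```python
def sum_if(mapping, predicate):
    # Hypothetical helper: sum only the values for which predicate(value) is true.
    return sum(v for v in mapping.values() if predicate(v))


d = {'one': 1, 'two': 2, 'twenty': 20, 'negative 4': -4}
positive_total = sum_if(d, lambda v: v > 0)
even_total = sum_if(d, lambda v: v % 2 == 0)
print(positive_total, even_total)
```

An empty dictionary, or one where no value passes the test, gives 0 because sum of an empty sequence is 0.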
>>> a = {'a' : 5, 'b': 8}
>>> sum(value for _, value in a.items() if value > 0)
