Extend dictionary values by same key for SVM training data - python

Hi, I'm quite new to Python and machine learning. I want to extract an SVM's x and y from two dictionaries.
The two dictionaries look like:
DIC_01
{'A': ['Low'],
'B': ['High']}
DIC_02
{'A': [2623.83740234375,
-1392.9608154296875,
416.20831298828125],
'B': [1231.1268310546875,
-963.231201171875,
1823.742431640625]}
About the data: the keys of the dictionaries are my 'keywords'. DIC_01 was converted from a dataframe; its values are each keyword's probability of sales. DIC_02 holds the vector that represents each keyword.
I want to organise these dictionaries into SVM training data format: x is the value from DIC_02, y is the value from DIC_01.
I don't know the most efficient way to do this. At the moment I'm thinking...
step 1: merge values with the same keys
{'A': [2623.83740234375,
-1392.9608154296875,
416.20831298828125],['Low'],
'B': [1231.1268310546875,
-963.231201171875,
1823.742431640625],['High']}
step 2: extract the first and second value as SVM's x and y then train the model.
Thank you!

Hi, is this what you want to do?
DIC_01 = {'A': ['Low'],
'B': ['High']}
DIC_02 = {'A': [2623.83740234375,
-1392.9608154296875,
416.20831298828125],
'B': [1231.1268310546875,
-963.231201171875,
1823.742431640625]}
svm_X = []
svm_Y = []
for e in DIC_01:
    svm_X.append(DIC_02[e])     # feature vector for this keyword
    svm_Y.append(DIC_01[e][0])  # label is stored as a one-element list
print(svm_X) # [[2623.83740234375, -1392.9608154296875, 416.20831298828125], [1231.1268310546875, -963.231201171875, 1823.742431640625]]
print(svm_Y) # ['Low', 'High']
# You can also iterate over keys and values together:
for k,v in DIC_01.items():
    # k = key
    # v = value
    print(k, v)
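A slightly more defensive variant of the loop above, as a sketch only: it keeps just the keys present in both dictionaries and walks them in sorted order so that X and y stay aligned. The helper name build_training_data is made up for this illustration. The resulting lists have the shape that fit(X, y) style training methods usually expect, but no SVM library is called here.

```python
def build_training_data(labels, vectors):
    """Pair each keyword's vector with its label (hypothetical helper)."""
    # Only keep keywords that appear in both dictionaries.
    shared_keys = sorted(labels.keys() & vectors.keys())
    X = [vectors[key] for key in shared_keys]
    # Each label is stored as a one-element list, so take its first item.
    y = [labels[key][0] for key in shared_keys]
    return X, y


DIC_01 = {'A': ['Low'], 'B': ['High']}
DIC_02 = {'A': [2623.83740234375, -1392.9608154296875, 416.20831298828125],
          'B': [1231.1268310546875, -963.231201171875, 1823.742431640625]}

X, y = build_training_data(DIC_01, DIC_02)
print(X)
print(y)  # ['Low', 'High']
```

Sorting the shared keys is a design choice for reproducibility; any fixed order works as long as X and y are built from the same sequence of keys.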

Related

How do I access list values in a dictionary based on key matches in Python?

{'A': [5.0, 6.20], 'B': [1.92, 3.35], 'C': [3.21, 7.0], 'D': [2.18, 9.90]}
I will then manipulate the numbers according to the key matches.
So for example, for A, I'd take those numbers and plug them into an equation accordingly.
x/100 * y/100 = 5.0/100 * 6.20/100
Note that this is part of a function that returns values.
You can use a dict comprehension to do this for each key.
{k:(x/100) * (y/100) for k,(x,y) in d.items()}
{'A': 0.0031000000000000003,
'B': 0.0006432,
'C': 0.002247,
'D': 0.0021582000000000003}
Accessing a single key's value in a dictionary is as simple as just d['A'] or d.get('A')
Read more about dict comprehensions here.
EDIT: Thanks for the cleaner code suggestion @Olvin Roght
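Since the question mentions that this is part of a function that returns values, here is a minimal self-contained sketch of the comprehension wrapped in a function. The name scaled_products is an assumption for illustration, and the code assumes every value is a list of exactly two numbers.

```python
def scaled_products(d):
    """Return {key: (x / 100) * (y / 100)} for each two-element list in d."""
    # Tuple unpacking in the comprehension splits each list into x and y.
    return {k: (x / 100) * (y / 100) for k, (x, y) in d.items()}


d = {'A': [5.0, 6.20], 'B': [1.92, 3.35], 'C': [3.21, 7.0], 'D': [2.18, 9.90]}
result = scaled_products(d)
print(result['A'])
```

If a list has more or fewer than two items, the unpacking raises a ValueError, which is usually preferable to silently using the wrong numbers.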
After accessing the dictionary value by its key (in this example 'A'), access the list items by index. For example, index [0] of the list holds 'val1', which you can then assign to a variable and use in your math.
dict
Code
dic = {'A':['val1', 'val2']}
x = dic['A'][0]
y = dic['A'][1]
print(x, y)
Result
val1 val2

Compare the values of two dictionaries in Python and return the key for the values with a difference bigger than 2

So, I have two dictionary files in binary, and I need to compare the values between them.
The keys have the same name, but the values differ.
I managed to read them and turn the values into integers, but now I want to compare each value from the first dictionary with the value from the second dictionary and only print or return the keys whose values differ by at least 2.
I tried using this code, but it returns every key whose values differ; I don't want the ones with a difference lower than 2.
for key in primary:
    if (key in secondary and primary[key] != secondary[key]):
        faulty_sensors_values[key] = primary[key]
print(faulty_sensors_values)
# Compare the absolute difference instead of testing for inequality.
faulty_sensors_values = {}
for key in primary:
    if (key in secondary and abs(primary[key]-secondary[key]) >= 2):
        faulty_sensors_values[key] = primary[key]
print(faulty_sensors_values)
Try like this:
a = {'a': 2, 'b': 4, 'c': 8}
b = {'a': 1, 'b': 1, 'c': 3}
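for x in a:
    # Look the key up in b directly; zip(a, b) would only pair matching keys if both dicts share the same order.
    if x in b and abs(a[x] - b[x]) >= 2:
        print(x)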
Will print:
b
c
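The same comparison can be sketched as a small reusable function. The name keys_differing_by and the threshold parameter are assumptions for illustration; the logic is the absolute-difference check from the answers above, applied only to keys present in both dictionaries.

```python
def keys_differing_by(primary, secondary, threshold=2):
    """Return keys present in both dicts whose values differ by at least threshold."""
    return [key for key in primary
            if key in secondary
            and abs(primary[key] - secondary[key]) >= threshold]


a = {'a': 2, 'b': 4, 'c': 8}
b = {'a': 1, 'b': 1, 'c': 3}
print(keys_differing_by(a, b))  # ['b', 'c']
```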

Is there a way to randomly shuffle keys and values in a Python Dictionary, but the result can't have any of the original key value pairs?

I would like to shuffle the key value pairs in this dictionary so that the outcome has no original key value pairs. Starting dictionary:
my_dict = {'A':'a',
'K':'k',
'P':'p',
'Z':'z'}
Example of unwanted outcome:
my_dict_shuffled = {'Z':'a',
'K':'k', <-- Original key value pair
'A':'p',
'P':'z'}
Example of wanted outcome:
my_dict_shuffled = {'Z':'a',
'A':'k',
'K':'p',
'P':'z'}
I have tried while loops and for loops with no luck. Please help! Thanks in advance.
Here's a fool-proof algorithm I learned from a Numberphile video :)
import itertools
import random
my_dict = {'A': 'a',
'K': 'k',
'P': 'p',
'Z': 'z'}
# Shuffle the keys and values.
my_dict_items = list(my_dict.items())
random.shuffle(my_dict_items)
shuffled_keys, shuffled_values = zip(*my_dict_items)
# Offset the shuffled values by one.
shuffled_values = itertools.cycle(shuffled_values)
next(shuffled_values, None) # Offset the values by one.
# Guaranteed to have each value paired to a random different key!
my_random_dict = dict(zip(shuffled_keys, shuffled_values))
Disclaimer (thanks for mentioning, @jf328): this will not generate all possible permutations! It will only generate permutations with exactly one "cycle". Put simply, the algorithm will never give you the following outcome:
{'A': 'k',
'K': 'a',
'P': 'z',
'Z': 'p'}
However, I imagine you can extend this solution by building a random list of sub-cycles:
(2, 2, 3) => concat(zip(*items[0:2]), zip(*items[2:4]), zip(*items[4:7]))
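For reference, the single-cycle idea described above can be sketched as a function and checked. The name cyclic_shuffle is made up for this illustration, and the sketch assumes the dictionary has at least two items and that all values are distinct (with duplicate values, a key could receive a value equal to its original one).

```python
import random


def cyclic_shuffle(d):
    """Return a new dict in which every value moves to a different key (one cycle)."""
    if len(d) < 2:
        raise ValueError('need at least two items')
    items = list(d.items())
    random.shuffle(items)
    keys = [k for k, _ in items]
    values = [v for _, v in items]
    # Rotate the values by one position so no key keeps its own value.
    rotated = values[1:] + values[:1]
    return dict(zip(keys, rotated))


my_dict = {'A': 'a', 'K': 'k', 'P': 'p', 'Z': 'z'}
shuffled = cyclic_shuffle(my_dict)
print(shuffled)
# No original key/value pair survives.
print(all(shuffled[k] != my_dict[k] for k in my_dict))  # True
```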
A shuffle which doesn't leave any element in the same place is called a derangement. Essentially, there are two parts to this problem: first to generate a derangement of the keys, and then to build the new dictionary.
We can randomly generate a derangement by shuffling until we get one; on average it should only take 2-3 tries even for large dictionaries, but this is a Las Vegas algorithm in the sense that there's a tiny probability it could take a much longer time to run than expected. The upside is that this trivially guarantees that all derangements are equally likely.
from random import shuffle
def derangement(keys):
    if len(keys) == 1:
        raise ValueError('No derangement is possible')
    new_keys = list(keys)
    while any(x == y for x, y in zip(keys, new_keys)):
        shuffle(new_keys)
    return new_keys
def shuffle_dict(d):
    return { x: d[y] for x, y in zip(d, derangement(d)) }
Usage (the result is random; one possible output is shown):
>>> shuffle_dict({ 'a': 1, 'b': 2, 'c': 3 })
{'a': 2, 'b': 3, 'c': 1}
theonewhocodes, does this work? If it isn't the right answer, can you update your question with a second use case?
import random
my_dict = {'A':'a',
'K':'k',
'P':'p',
'Z':'z'}
# Reshuffle until no original key/value pair survives.
while True:
    new_dict = dict(zip(list(my_dict.keys()), random.sample(list(my_dict.values()),len(my_dict))))
    if new_dict.items() & my_dict.items():
        continue
    else:
        break
print(my_dict)
print(new_dict)

Count of values of all categorical variable using Python

I have a dataset with a large number of columns. How do I calculate the frequency of values of all categorical variables in Python? I don't want the frequency for one or two specific columns; rather, I need the frequency for all variables with type="category".
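Use select_dtypes() to select the columns with dtype category, then call value_counts() on each selected column to get the frequency of every value (sum() is not a frequency count, and pandas does not define it for categorical columns):
for col in df.select_dtypes(include='category'):
    print(df[col].value_counts())
This prints one table of value frequencies per categorical column (for example, one for col_cat1 and one for col_cat2).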
Not entirely sure I know what you mean, but if you just want to keep a running count of frequencies, dictionaries are a great way to do this.
E.g. if we use the dummy data ['A', 'A', 'B', 'A', 'C', 'C']
categories = ['A', 'A', 'B', 'A', 'C', 'C']
category_counts = {}
for category in categories:
    try:
        category_counts[category] += 1
    except KeyError:
        # First time this category is seen.
        category_counts[category] = 1
print(category_counts)
returns:
{'A': 3, 'B': 1, 'C': 2}
EDIT: so if you want a count of the categories of each column the code only changes slightly to:
table = [['Male/Female','M','M','F','M',"F"],['Age','10-20','30-40','10-20','20-30','10-20']]
category_counts = {}
for column in table:
    # The first item of each column is its name; the rest are the values.
    category_counts[column[0]] = {}
    for data in column[1:]:
        try:
            category_counts[column[0]][data] += 1
        except KeyError:
            category_counts[column[0]][data] = 1
print(category_counts)
Which prints:
{'Male/Female': {'M': 3, 'F': 2}, 'Age': {'10-20': 3, '30-40': 1, '20-30': 1}}
But I'm unsure how you're currently storing your data
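The same running counts can also be sketched with collections.Counter from the standard library, which avoids the try/except bookkeeping. The data below is the dummy data from this answer; the variable name column_counts is an assumption for illustration.

```python
from collections import Counter

categories = ['A', 'A', 'B', 'A', 'C', 'C']
category_counts = Counter(categories)
print(dict(category_counts))  # {'A': 3, 'B': 1, 'C': 2}

table = [['Male/Female', 'M', 'M', 'F', 'M', 'F'],
         ['Age', '10-20', '30-40', '10-20', '20-30', '10-20']]
# The first item of each column is its name; the rest are the values.
column_counts = {column[0]: dict(Counter(column[1:])) for column in table}
print(column_counts)
```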

python dictionary, assign random values the sum of which is a certain value

I am making a python program in which random values are generated n times, to be used as parameter values for model simulation.
I have a dictionary defining the boundaries for each parameter, for example:
parameters = {'A': random.uniform(1,10), 'B': random.uniform(20,40)}
I want to add some parameters the sum of which has to be 1, something like:
params = {'C1': random.uniform(0.0,1.0), 'C2': 1 - params['C1']}
This latter obviously doesn't work producing the KeyError: 'C1'
I also tried something like:
params = {'A': random.uniform(1,10), 'B': random.uniform(20,40), 'C': {'C1': None,'C2': None}}
def class_fractions():
    for key in params['C']:
        if key == 'C1':
            params['C'][key] = random.uniform(0.0,1.0)
        if key == 'C2':
            params['C'][key] = 1.0 - params['C'][key]
but after calling the function I get the TypeError
TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'
Any suggestion?
One issue with your code is that dicts were not guaranteed to keep insertion order before Python 3.7. When you do:
for key in params['C']
you could get the C2 key before C1. More directly, the C2 branch computes 1.0 - params['C'][key], where key is 'C2' itself and its value is still None, hence the TypeError. Actually, you do not even need to iterate over the dict; simply set the values like:
def class_fractions():
    params['C']['C1'] = random.uniform(0.0,1.0)
    params['C']['C2'] = 1.0 - params['C']['C1']
You do not even need a separate function, since it only updates the same dict; you can simply do:
params = {'A': random.uniform(1,10), 'B': random.uniform(20,40)} # create partial dict
c_dict = {'C1': random.uniform(0.0,1.0)} # create sub-dict to store a random fraction between 0 and 1
c_dict['C2'] = 1 - c_dict['C1'] # the remainder, so that C1 + C2 == 1
params['C'] = c_dict # add entry into parent dict
From your use of the for loop it looks like you may really have many more parameters than just two. In that case you can generate a list of values filled with random numbers, and then scale it to sum to one, as described in this other answer. Then iterate over a zipped view of your dictionary keys and the list items and assign the items to the dictionary.
Or, operating directly on the dictionary:
params = {k: random.uniform(0, 1) for k in ('C1', 'C2', 'C3')}
total = sum(params.values())
params = {k: (v / total) for k, v in params.items()}
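The normalisation above can be sketched as a function for any number of parameter names. The name random_fractions is an assumption for illustration. Note that scaling independent uniform draws this way makes the fractions sum to 1, but it does not make every combination of fractions equally likely.

```python
import random


def random_fractions(names):
    """Return {name: fraction} with random non-negative fractions that sum to 1."""
    raw = {name: random.uniform(0.0, 1.0) for name in names}
    total = sum(raw.values())
    # Divide each draw by the total so the fractions add up to 1.
    return {name: value / total for name, value in raw.items()}


fractions = random_fractions(['C1', 'C2', 'C3'])
print(fractions)
print(sum(fractions.values()))
```

The printed sum is 1 up to floating-point rounding, so compare with a tolerance rather than with == if you check it in code.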
If you really want to use your class_fractions and initialize params at once, then you need to use OrderedDict (from Python 3.7 on, a plain dict also preserves insertion order):
import random
from collections import OrderedDict
params = {'A': random.uniform(1,10), 'B': random.uniform(20,40), 'C':OrderedDict([('C1', None),('C2', None)])}
def class_fractions():
    for key in params['C']:
        if key == 'C1':
            params['C'][key] = random.uniform(0.0,1.0)
        if key == 'C2':
            # Subtract C1's value, not C2's own (which is still None).
            params['C'][key] = 1.0 - params['C']['C1']
class_fractions()
print(params)
Results in:
{'B': 37.3088618142464, 'A': 2.274415152225316, 'C': OrderedDict([('C1', 0.12703200100786471), ('C2', 0.8729679989921353)])}
