How to recode numerical to categorical data - python

I am new to Python and am studying it for data science purposes. Right now, I am trying to recode some numerical data (1, 2, 3, etc.) into categories. It requires a small loop at the end, but I can't get it right: it raises KeyError: 3.
The dataset has 21 columns.
Can anyone help?
Thanks!!
for col_dic in code_list:
    col = col_dic[0]
    dic = col_dic[1]
    values[col] = [dic[x] for x in values[col]]

It is hard to tell exactly what result you want, but the cause of the error is clear:
You are iterating through a list of lists. Every col_dic contains col = col_dic[0] (a string like 'property_type') and dic = col_dic[1] (a dictionary). In the last line you write the result into a values dict (which I assume you created earlier). The error appears because dic doesn't contain one of the keys found in values[col]. For example:
Suppose values[col] is {1: [], 2: [], 3: []} and dic is {1: 'One', 2: 'Two'}. When you iterate through values[col], you look up the key 3 in dic. dic has no key 3, so a KeyError is raised. You should check that dic contains the key first, like this:
values_list = []
for v in values[col]:
    if v in dic:
        values_list.append(dic[v])
values[col] = values_list
Also note that your keys are strings that represent integers. The same error can appear when you look up a key '3' (string) in a dict whose keys are integers like 3. So I suggest converting your keys to strings: str(key).
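If you would rather keep the unmapped rows than drop them, dict.get with a fallback value is one option. This is only a sketch: the recode helper, the sample values and code_list data, and the "Unknown" label are all made up for illustration, not taken from the question.

```python
def recode(values, code_list, default=None):
    """Return a copy of `values` with each listed column mapped through its dict.

    Codes that are missing from a mapping become `default` instead of
    raising KeyError.
    """
    recoded = dict(values)  # shallow copy, so the input dict is not modified
    for col, mapping in code_list:
        recoded[col] = [mapping.get(x, default) for x in values[col]]
    return recoded


# Hypothetical sample data: code 3 has no label in the mapping.
values = {"property_type": [1, 2, 3, 1]}
code_list = [("property_type", {1: "House", 2: "Flat"})]

result = recode(values, code_list, default="Unknown")
print(result)
```

Unlike the `if v in dic` check above, this keeps the column the same length as the original, which matters if the columns have to stay aligned row by row.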

Related

Getting data from a dictionary Python

I'm using Python dictionaries to count the repeated objects in an array.
I use a function obtained from this forum to count the objects, and the result has the following format: {object: number of elements, ...}.
My issue is that the function doesn't return the dictionary objects differentiated by keys, and I don't know how to get at the objects.
count_assistence_issues = {x:list(assistances_issues).count(x) for x in list(assistances_issues)}
count_positive_issues = {x:list(positive_issues).count(x) for x in list(positive_issues)}
count_negative_issues = {x:list(negative_issues).count(x) for x in list(negative_issues)}
print(count_assistence_issues)
print(count_positive_issues)
print(count_negative_issues)
This is the output obtained:
{school.issue(10,): 1, school.issue(13,): 1}
{school.issue(12,): 1}
{school.issue(14,): 2}
And this is the output I need to obtain:
{{issue: school.issue(10,), count: 1},
{issue: school.issue(13,), count: 1}}
{{issue: school.issue(12,), count: 1}}
{{issue: school.issue(14,), count: 2}}
Does anyone know how to differentiate the elements of the array by keys using this function?
Or is there another function for counting the repeated objects that produces a dictionary with the format {'issue': issue, 'count': count}?
Thanks for reading!
Given your input and output, I would consider the following.
1) Merge all your counts into a single dictionary
# assuming that what differentiates your issues is a unique ID/key/value etc.,
# meaning that no issues are subsets of the others. If they are, this will need some tweaking
issue_count = {}
issue_count.update(count_assistence_issues)
issue_count.update(count_positive_issues)
issue_count.update(count_negative_issues)
Getting the counts is then simply:
issue_count[school.issue(n,)]
The key is the object from your array. As an alternative, you could make a list or dict of your keys, and make it as verbose as you want.
key_issues = {"issue1":school.issue(1,),"issue2":school.issue(2,)....}
This then allows you to call your counts by:
issue_count[key_issues["issue1"]]
If you want to use the "count" field, you would need to change your counter so that it gives you a dict for each issue with a count field, but that's another question.
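For the {'issue': issue, 'count': count} shape asked for in the question, one possible approach is collections.Counter followed by a list comprehension. This is a sketch only: the strings below are hypothetical stand-ins for the school.issue objects, and it assumes those objects are hashable (they already work as dict keys in the question, so that seems likely).

```python
from collections import Counter


def count_records(items):
    """Count repeated items and return one {'issue', 'count'} dict per distinct item."""
    counts = Counter(items)  # counts every item in a single pass
    return [{"issue": issue, "count": count} for issue, count in counts.items()]


# Hypothetical stand-ins for the school.issue(...) objects in the question.
assistances_issues = ["issue_10", "issue_13"]
negative_issues = ["issue_14", "issue_14"]

print(count_records(assistances_issues))
print(count_records(negative_issues))
```

The result is a list of dicts rather than a set of dicts, because dicts are not hashable and so cannot be placed inside {...} as set members.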

Iterate through a dict, store the key and value, then iterate again to look for a similar word in the dict and delete it from the dict, e.g. (Light1on, Light1off), in Python

[I have a problem working out how to iterate through a dict to find a pair of similar words, output it, and then delete it from the dict.]
My intention is to generate random output labels and store them in a dictionary. I then want to take the first key in the dictionary and iterate through the dictionary to search for a similar key (e.g. Light1on and Light1off both contain Light1), and store the values of both keys in a table, each in its respective column.
For example:
Dict = {Light1on,Light2on,Light1off...}
Take the value for Light1on, then iterate through the dictionary to find e.g. Light1off, and store Light1on:value1 and Light1off:value2 in a table or DataFrame with the columns On: value1 and Off: value2.
As I don't know how to insert the code as code, I can only provide an image. Sorry for the trouble; it's my first time asking a question here. Thanks.
from collections import defaultdict
import difflib, random
olist = []
input = 10
olist1 = ['Light1on','Light2on','Fan1on','Kettle1on','Heater1on']
olist2 = ['Light2off','Kettle1off','Light1off','Fan1off','Heater1off']
events = list(range(input + 1))
for i in range(len(olist1)):
    output1 = random.choice(olist1)
    print(output1,'1')
    olist1.remove(output1)
    output2 = random.choice(olist2)
    print(output2,'2')
    olist2.remove(output2)
    olist.append(output1)
    olist.append(output2)
print(olist,'3')
outputList = {olist[i]:events[i] for i in range(10)}
print (str(outputList),'4')
# Iterating through the keys finding a pair match
for s in range(5):
    for i in outputList:
        if i == list(outputList)[0]:
            skeys = difflib.get_close_matches(i, outputList, n=2, cutoff=0.75)
            print(skeys,'5')
            del outputList[skeys]
# Modified Dictionary
difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animaltion'])
['animal']
Update: I was unable to delete the pair of similar keys from the dictionary after finding the pair.
You're probably getting an error about a dictionary changing size during iteration. That's because you're deleting keys from a dictionary you're iterating over, and Python doesn't like that:
d = {1:2, 3:4}
for i in d:
    del d[i]
That will throw:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
To work around that, one solution is to store a list of the keys you want to delete, then delete all those keys after you've finished iterating:
keys_to_delete = []
d = {1:2, 3:4}
for i in d:
    if i%2 == 1:
        keys_to_delete.append(i)
for i in keys_to_delete:
    del d[i]
Ta-da! Same effect, but this way avoids the error.
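Also, your code above doesn't call the difflib.get_close_matches function properly. You can use help(difflib.get_close_matches) to see how you are meant to call that function. You need to provide a second argument that indicates the items to which you wish to compare your first argument for possible matches. Note too that the function returns a list of matches, and a list cannot be used as a dictionary key, so del outputList[skeys] will fail; you would have to delete each matched key separately.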
All of that said, I have a feeling that you can accomplish your fundamental goals much more simply. If you spend a few minutes describing what you're really trying to do (this shouldn't involve any references to data types, it should just involve a description of your data and your goals), then I bet someone on this site can help you solve that problem much more simply!
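For what it's worth, if the labels always end in "on" or "off" (an assumption on my part, based on the sample lists in the question), the pairing can be sketched without fuzzy matching and without deleting anything during iteration. The pair_on_off function and the readings data are hypothetical names and values for illustration.

```python
def pair_on_off(readings):
    """Group 'Xon'/'Xoff' keys by device name X without mutating `readings`."""
    table = {}
    for label, value in readings.items():
        if label.endswith("on"):
            device, state = label[:-2], "On"
        elif label.endswith("off"):
            device, state = label[:-3], "Off"
        else:
            continue  # label does not follow the assumed naming scheme
        # Create the row for this device on first sight, then fill its column.
        table.setdefault(device, {})[state] = value
    return table


# Hypothetical event numbers keyed by label.
readings = {"Light1on": 0, "Fan1off": 1, "Light1off": 2, "Fan1on": 3}
pairs = pair_on_off(readings)
print(pairs)
```

Each entry of the result holds one device's On and Off values, which maps directly onto a table with On and Off columns.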

python: badly behaving dict inside a function- erroneous TypeError

I have dicts that I need to clean, e.g.
dict = {
    'sensor1': [list of numbers from sensor 1 pertaining to measurements on different days],
    'sensor2': [list of numbers from sensor 2 pertaining to measurements from different days],
    etc. }
Some days have bad values, and I would like to generate a new dict in which all the sensor values from a bad day are erased, by applying an upper limit to the values of one of the keys:
def clean_high(dict_name,key_string,limit):
    '''clean all the keys to eliminate the bad values from the arrays'''
    new_dict = dict_name
    for key in new_dict: new_dict[key] = new_dict[key][new_dict[key_string]<limit]
    return new_dict
If I run all the lines separately in IPython, it works. The bad days are eliminated, and the good ones are kept. These are both type numpy.ndarray: new_dict[key] and new_dict[key][new_dict[key_string]<limit]
But, when I run clean_high(), I get the error:
TypeError: only integer arrays with one element can be converted to an index
What?
Inside of clean_high(), the type for new_dict[key] is a string, not an array.
Why would the type change? Is there a better way to modify my dictionary?
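Do not modify a dictionary while iterating over it. According to the Python documentation: "Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries". Note also that new_dict = dict_name does not copy anything: both names refer to the same dictionary, so every assignment to new_dict[key] also changes the input, including the dict_name[key_string] array used to build the mask. Instead, create a new dictionary and fill it while iterating over the old one.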
def clean_high(dict_name,key_string,limit):
    '''clean all the keys to eliminate the bad values from the arrays'''
    new_dict = {}
    for key in dict_name:
        new_dict[key] = dict_name[key][dict_name[key_string]<limit]
    return new_dict
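To illustrate the same idea in a form that runs without NumPy, here is a sketch that uses plain lists and computes the mask once, before any values are filtered. The sensor readings are made up, and the list-based filtering is my substitution for the boolean-array indexing used above.

```python
def clean_high(data, key_string, limit):
    """Return a new dict keeping only positions where data[key_string] < limit."""
    # Build the mask once, from the untouched input, so that filtering one
    # key can never change the mask applied to the others.
    mask = [value < limit for value in data[key_string]]
    return {
        key: [value for value, keep in zip(values, mask) if keep]
        for key, values in data.items()
    }


# Hypothetical readings: day 2 is a bad day for sensor1.
readings = {"sensor1": [1.0, 9.5, 2.0], "sensor2": [10, 20, 30]}
cleaned = clean_high(readings, "sensor1", limit=5.0)
print(cleaned)
```

The input dict is left untouched, so the function can be called repeatedly with different limits on the same data.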

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13-letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
    Accession_13mers[group[0]] = []
    for item in group[1].iteritems():
        Accession_13mers[group[0]].append(item[1])
However, I would now like to go back and iterate through the 13mers stored under each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer), which finds the 13mer in a reference sequence and returns its position. I would then like to store the position as the value, with the 13mer as the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin
I would suggest creating a new dictionary whose values are themselves dictionaries: essentially a nested dictionary.
position_nmers = {}
for key in Accession_13mers:
    position_nmers[key] = {}  # replicate each key in the new dictionary, with a dictionary as its value
    for value in Accession_13mers[key]:
        # store the position returned by your find_match_position function
        position_nmers[key][value] = find_match_position(reference_sequence, value)
To introspect the dictionary and make sure it's okay:
print(position_nmers)
You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
    d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
    d2[val] = pos
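Putting the pieces together, a nested {accession: {13mer: position}} dictionary could be built in one pass. This is a sketch without pandas: the find_match_position body below is a hypothetical stand-in based on str.find (the question doesn't show the real one), and the rows, accession codes, and reference sequence are invented, with short k-mers instead of real 13mers to keep it readable.

```python
def find_match_position(reference_sequence, kmer):
    """Hypothetical stand-in: 0-based index of kmer in the reference, or -1 if absent."""
    return reference_sequence.find(kmer)


def positions_by_accession(rows, reference_sequence):
    """Build {accession: {kmer: position}} from (accession, kmer) pairs."""
    nested = {}
    for accession, kmer in rows:
        # setdefault creates the inner dict the first time an accession is seen.
        nested.setdefault(accession, {})[kmer] = find_match_position(reference_sequence, kmer)
    return nested


# Invented data for illustration only.
reference_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
rows = [("A1", "KTAY"), ("A1", "QRQI"), ("B2", "SHFS"), ("B2", "XXXX")]

positions = positions_by_accession(rows, reference_sequence)
print(positions)
```

With a DataFrame, the (accession, kmer) pairs could come from something like zip(df['Accession'], df['13mer']).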

Adding Multiple Values to a Single Key in Python Dictionary

Python dictionaries really have me stumped today. I've been poring over Stack Overflow, trying to find a way to do a simple append of a new value to an existing key in a Python dictionary, and I'm failing at every attempt, even though I'm using the same syntax I see on here.
This is what I am trying to do:
# cursor search of an xls file
definitionQuery_Dict = {}
for row in arcpy.SearchCursor(xls):
    # set some source paths from strings in the xls file
    dataSourcePath = str(row.getValue("workspace_path")) + "\\" + str(row.getValue("dataSource"))
    dataSource = row.getValue("dataSource")
    # add items to dictionary. The keys are the data source table and the values will be definition (SQL) queries. First test is to see if a definition query exists in the row and if it does, we want to add the key, value pair to a dictionary.
    if row.getValue("Definition_Query") is not None:
        # if key already exists, then append a new value to the value list
        if row.getValue("dataSource") in definitionQuery_Dict:
            definitionQuery_Dict[row.getValue("dataSource")].append(row.getValue("Definition_Query"))
        else:
            # otherwise, add a new key, value pair
            definitionQuery_Dict[row.getValue("dataSource")] = row.getValue("Definition_Query")
I get an attribute error:
AttributeError: 'unicode' object has no attribute 'append'
But I believe I am doing the same as the answer provided here
I've tried various other methods with no luck, getting various other error messages. I know this is probably simple and maybe I just couldn't find the right source on the web, but I'm stuck. Anyone care to help?
Thanks,
Mike
The issue is that you're originally setting the value to be a string (i.e. the result of row.getValue) but then trying to append to it if the key already exists. You need to set the original value to a list containing a single string. Change the last line to this:
definitionQuery_Dict[row.getValue("dataSource")] = [row.getValue("Definition_Query")]
(notice the brackets round the value).
ndpu has a good point with the use of defaultdict: but if you're using that, you should always do append - ie replace the whole if/else statement with the append you're currently doing in the if clause.
Your dictionary has keys and values. If you want to add to the values as you go, then each value has to be a type that can be extended/expanded, like a list or another dictionary. Currently each value in your dictionary is a string, where what you want instead is a list containing strings. If you use lists, you can do something like:
mydict = {}
records = [('a', 2), ('b', 3), ('a', 4)]
for key, data in records:
    # If this is a new key, create a list to store
    # the values
    if key not in mydict:
        mydict[key] = []
    mydict[key].append(data)
Output:
mydict
Out[4]: {'a': [2, 4], 'b': [3]}
Note that even though 'b' only has one value, that single value still has to be put in a list, so that it can be added to later on.
Use collections.defaultdict:
from collections import defaultdict
definitionQuery_Dict = defaultdict(list)
# ...
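To flesh out the elided part, here is a sketch of the loop with defaultdict(list). It uses plain (data_source, query) tuples as a stand-in for the arcpy cursor rows, so the rows data and the group_queries name are assumptions for illustration, not arcpy API.

```python
from collections import defaultdict


def group_queries(rows):
    """Collect every non-empty definition query under its data source."""
    grouped = defaultdict(list)  # a missing key starts out as an empty list
    for data_source, query in rows:
        if query is not None:
            # No if/else needed: append works for new and existing keys alike.
            grouped[data_source].append(query)
    return dict(grouped)


# Hypothetical rows standing in for the values read from the xls file.
rows = [("roads", "TYPE = 'A'"), ("parcels", None), ("roads", "TYPE = 'B'")]
queries = group_queries(rows)
print(queries)
```

Converting back to a plain dict at the end is optional; it just avoids later lookups of unknown keys silently creating empty lists.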
