import pandas as pd

d = {'col': ['ana', 'ben', 'carl', 'dennis', 'earl', ...]}
df = pd.DataFrame(data=d)
I have an example dataframe here. Usually, if there are more than 5 unique values, one-hot encoding (OHE) is not used (correct me if I'm wrong).
Instead, mapping using a dictionary is used.
An example dictionary would be
dict = {'ana': 1, 'ben': 2, 'carl': 3, ...}
Is there a library or any way to make this automatic (though manual mapping may be better as you know which values are mapped to which number)?
EDIT 1
Using ascii_lowercase, I am able to map single letter strings to integers. But as shown above, what if my strings are not single letters?
original question
You can generate the dictionary programmatically using string.ascii_lowercase and enumerate in a dictionary comprehension:
from string import ascii_lowercase
dic = {k:v for v,k in enumerate(ascii_lowercase, start=1)}
Output:
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
Then you can just map:
df['col'].map(dic)
edit: dictionary from an arbitrary Series of values
You can use pandas.factorize:
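codes, uniques = pd.factorize(df['col'])
# number the unique values directly; zipping them with the per-row codes breaks when values repeat
dic = {k: i for i, k in enumerate(uniques, start=1)}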
Output: {'ana': 1, 'ben': 2, 'carl': 3, 'dennis': 4, 'earl': 5}
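For comparison, the same first-appearance numbering can be sketched with the standard library alone. This is a minimal illustration, not how pandas implements factorize. The sample list `names` and the helper `build_mapping` are hypothetical names that are not part of the question, and the sketch assumes Python 3.7+ so that dict order follows insertion order.

```python
def build_mapping(values, start=1):
    """Map each distinct value to an integer, in order of first appearance."""
    # dict.fromkeys keeps one entry per distinct value, in first-seen order
    uniques = dict.fromkeys(values)
    return {value: code for code, value in enumerate(uniques, start=start)}


# hypothetical sample data with repeated values
names = ['ana', 'ben', 'carl', 'ana', 'dennis', 'earl', 'ben']
mapping = build_mapping(names)
encoded = [mapping[name] for name in names]
print(mapping)
print(encoded)
```

A dictionary built this way can be passed to df['col'].map(...) in the same way as the dictionaries above.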
The number of dictionaries in the list should be random, from 2 to 15.
The length of each dictionary should be random too.
for example: [{'d': 15, 'c': 9, 'g': 18, 'm': 33, 's': 10}, {'a': 9, 'h': 50, 'r': 15}]
I would like to use list comprehension, and I started from this:
import string
letter_count = dict((key, 0) for key in string.ascii_lowercase)
print(letter_count)
number_of_dictionaries = 3 # should be random
list_of_dictionaries = [dict() for number in range(number_of_dictionaries)]
I have no idea how to pick random keys so that the letters are not in order.
Since the letters will be the keys, you need to select non-repeating random letters (which you can do using random.sample). The dictionary can then be created by pairing up the letters with a sequence of random numbers using zip():
import string
import random
size = 10
keys = random.sample(string.ascii_lowercase,size)
values = (random.randint(1,50) for _ in range(size))
result = dict(zip(keys,values))
print(result)
{'r': 43, 't': 40, 'o': 18, 'm': 8, 'f': 47, 'a': 21, 'y': 50, 'b': 44,
'i': 42, 'w': 31}
To get multiple dictionaries in a list, you can combine this approach in a loop selecting a random size for each (in range 1-26).
dictList = []
for _ in range(random.randint(2,15)):                       # random number of dictionaries
    size = random.randint(1,26)                             # random dictionary size
    keys = random.sample(string.ascii_lowercase,size)       # random letters
    values = (random.randint(1,50) for _ in range(size))    # random numbers
    oneDict = dict(zip(keys,values))                        # assemble dict.
    dictList.append(oneDict)                                # add it to the list
print(dictList)
[{'u': 14, 'j': 49},
{'y': 32},
{'y': 7, 'c': 26},
{'p': 11, 'k': 20, 'n': 6},
{'h': 4, 'f': 35, 'w': 19, 'n': 19, 'g': 25, 'p': 4, 'k': 36},
{'h': 47}]
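Since the question asks for a list comprehension, the loop above can also be folded into one by moving the per-dictionary work into a helper. This is only a sketch: the helper name `random_letter_dict` and its `max_value` parameter are hypothetical, chosen here for illustration.

```python
import random
import string


def random_letter_dict(max_value=50):
    """Build one dict with a random number of distinct lowercase-letter keys."""
    size = random.randint(1, len(string.ascii_lowercase))  # 1 to 26 keys
    keys = random.sample(string.ascii_lowercase, size)     # distinct letters, random order
    return {key: random.randint(1, max_value) for key in keys}


# a random number of dictionaries, from 2 to 15
dict_list = [random_letter_dict() for _ in range(random.randint(2, 15))]
print(dict_list)
```

The output differs on every run; calling random.seed() with a fixed value first makes it repeatable.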
I want to sum values of the same key: H, C, O, N, S according to dictionary composition for the string input which is a combination of letter A, C, D, E.
composition = {
'A': {'H': 5, 'C': 3, 'O': 1, 'N': 1},
'C': {'H': 5, 'C': 3, 'O': 1, 'N': 1, 'S': 1},
'D': {'H': 5, 'C': 4, 'O': 3, 'N': 1},
'E': {'H': 7, 'C': 5, 'O': 3, 'N': 1},
}
string_input = ['ACDE', 'CCCDA']
The expected result should be
out = {
'ACDE' : {'H': 22, 'C': 15, 'O': 8, 'N': 4, 'S': 1},
'CCCDA' : {'H': 25, 'C': 16, 'O': 7, 'N': 5, 'S': 3},
}
I am trying to use Counter, but I am stuck at the error unsupported operand type(s) for +: 'int' and 'Counter'
from collections import Counter
for each in string_input:
    out = sum(Counter(composition[aa]) for aa in each)
sum() has a starting value, from which it starts the sum. This also provides a default if there are no values to sum in the first argument. That starting value is 0, an integer.
From the sum() function documentation:
sum(iterable[, start])
Sums start and the items of an iterable from left to right and returns the total. start defaults to 0.
When summing Counter objects, give it an empty Counter() to start off with:
sum((Counter(composition[aa]) for aa in each), Counter())
If you then assign the result to a key in a dictionary assigned to out you get your expected result as Counter instances:
>>> out = {}
>>> for each in string_input:
...     out[each] = sum((Counter(composition[aa]) for aa in each), Counter())
...
>>> out
{'ACDE': Counter({'H': 22, 'C': 15, 'O': 8, 'N': 4, 'S': 1}), 'CCCDA': Counter({'H': 25, 'C': 16, 'O': 7, 'N': 5, 'S': 3})}
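If plain dicts are preferred over Counter instances, one possible variation accumulates into a single Counter with update() and converts it at the end. The helper name `total_composition` is hypothetical; the data is copied from the question.

```python
from collections import Counter

composition = {
    'A': {'H': 5, 'C': 3, 'O': 1, 'N': 1},
    'C': {'H': 5, 'C': 3, 'O': 1, 'N': 1, 'S': 1},
    'D': {'H': 5, 'C': 4, 'O': 3, 'N': 1},
    'E': {'H': 7, 'C': 5, 'O': 3, 'N': 1},
}
string_input = ['ACDE', 'CCCDA']


def total_composition(sequence, table):
    """Sum the per-letter element counts for every letter in sequence."""
    total = Counter()
    for letter in sequence:
        # Counter.update adds counts to the existing ones instead of replacing them
        total.update(table[letter])
    return dict(total)


out = {seq: total_composition(seq, composition) for seq in string_input}
print(out)
```

Compared with sum(), this avoids building a new Counter for every addition, which may matter for long input strings.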
Three nested for loops should do the work.
out = {}
for x in string_input:  # for each string in the list
    current = out[x] = {}
    for char in x:  # for each character in the string
        cur_composition = composition[char]
        for val in cur_composition:  # for each element in that character's composition
            # accumulate per element (val), not per character
            current[val] = current.get(val, 0) + cur_composition[val]
I am looking for a simpler way to create this python dictionary. May I know if enumerate function can help?
a_dict = {'a':0, 'b':1, 'c':2, ....}
Using enumerate, you can pass a generator expression over string.ascii_lowercase to dict:
>>> import string
>>> dict((j,i) for i,j in enumerate(string.ascii_lowercase))
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}
As enumerate yields tuples of the form (index, element), you can loop over them, swap the index and the element, and then convert the result to a dict.
I would use an infinite number generator like itertools.count to generate numbers from 0. Then zip the ascii characters with count and create the tuples needed for the dictionary generation.
>>> from itertools import count, izip
>>> import string
>>>
>>> dict(izip(string.ascii_lowercase, count()))
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}
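Note that itertools.izip exists only in Python 2. In Python 3 the built-in zip is already lazy and stops at the shorter input, so it can be paired with the infinite count() directly. A minimal Python 3 sketch of the same idea (the variable name `letter_index` is chosen here for illustration):

```python
import string
from itertools import count

# zip stops when the 26 letters run out, so the infinite counter is safe here
letter_index = dict(zip(string.ascii_lowercase, count()))
print(letter_index['a'], letter_index['z'])
```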
You may simply use string.ascii_lowercase, which is a string of all the lowercase letters, then zip the letters with a range of numbers from 0 to len(string.ascii_lowercase) - 1 and convert the pairs to a dict.
However, you may want to use some other set of characters, such as string.ascii_letters, string.ascii_uppercase, string.letters (Python 2 only), string.punctuation, etc.
You can easily choose the keys that you want in your dictionary, either by concatenating the strings mentioned above (string.ascii_lowercase+string.ascii_uppercase gives a string containing first the 26 lowercase letters and then the 26 uppercase letters) or by slicing to get the desired set of characters (string.ascii_lowercase[0:15] gives "abcdefghijklmno").
import string
alphabets = string.ascii_lowercase
print(dict(zip(alphabets, range(len(alphabets)))))
>>> {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': 6, 'f': 5, 'i': 8, 'h': 7, 'k': 10, 'j': 9, 'm': 12, 'l': 11, 'o': 14, 'n': 13, 'q': 16, 'p': 15, 's': 18, 'r': 17, 'u': 20, 't': 19, 'w': 22, 'v': 21, 'y': 24, 'x': 23, 'z': 25}
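The concatenation and slicing options described above might look like this in practice. It is a small sketch, and the variable names (`both_cases`, `first_fifteen`, and so on) are made up for the example.

```python
import string

# concatenation: 26 lowercase letters followed by 26 uppercase letters
both_cases = string.ascii_lowercase + string.ascii_uppercase
both_index = dict(zip(both_cases, range(len(both_cases))))

# slicing: only the first 15 lowercase letters
first_fifteen = string.ascii_lowercase[0:15]
prefix_index = {letter: i for i, letter in enumerate(first_fifteen)}

print(first_fifteen)
print(both_index['A'])
```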
Assuming the .... means you want the alphabet:
{chr(i+97): i for i in range(26)}
I am trying to create a python dictionary with the following values in python 2.7.3:
'A':1
'B':2
'C':3
.
.
.
.
'Z':26
using either of the following lines:
theDict = {x:y for x in map(chr,range(65,91)) for y in range(1,27)}
or
theDict = {x:y for x in map(chr,range(65,91)) for y in list(range(1,27))}
In both cases, I get the following result:
'A':26
'B':26
'C':26
.
.
.
.
'Z':26
I don't understand why the second for is not generating the numbers 1-26. Maybe it is, but if so, I don't understand why I am only getting 26 for the value of each key. If I don't create a dictionary (i.e. change x:y with just x or y), x = capital letters and y = 1-26.
Can someone explain what I am doing wrong and suggest a possible approach to get the result that I want.
Why it's wrong: Your dict comprehension is nested. It's effectively something like this:
d = {}
for x in map(chr, range(65, 91)):
    for y in range(1,27):
        d[x] = y
As you can see, this isn't what you want. What it does is take 'A', then walk through the numbers 1 to 26, setting d['A'] to each in turn, so 'A' ends up as 26. Then it does the same for 'B', 'C', all the way to 'Z'. Since it's a dict, later settings overwrite earlier settings, and you see your result.
There are several options here, but in general, the solution to iterate over multiple companion lists is a pattern more like this:
[some_expr(a,b,c) for a,b,c in zip((a, list, of, values), (b, list, of, values), (c, list, of, values))]
The zip pulls one value from each of the sublists and makes it into a tuple for each iteration. In other words, it converts 3 lists of 4 items each, into 4 lists of 3 items each (in the above). In your example, you have 2 lists of 26 items, when you want 26 pairs; zip will do that for you.
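To make the pattern concrete, here is a small sketch with three companion lists of four items each. The list names and contents are invented purely for illustration.

```python
# three hypothetical companion lists, four items each
letters = ['A', 'B', 'C', 'D']
numbers = [1, 2, 3, 4]
flags = [True, False, True, False]

# zip yields one tuple per position: four tuples of three items
rows = [(a, b, c) for a, b, c in zip(letters, numbers, flags)]
print(rows)

# the same idea with two sequences gives the letter-to-number pairs
pairs = {letter: number for letter, number in zip(letters, numbers)}
print(pairs)
```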
Try:
>>> {chr(k):k-64 for k in range(65,91)}
{'A': 1, 'C': 3, 'B': 2, 'E': 5, 'D': 4, 'G': 7, 'F': 6, 'I': 9, 'H': 8, 'K': 11, 'J': 10, 'M': 13, 'L': 12, 'O': 15, 'N': 14, 'Q': 17, 'P': 16, 'S': 19, 'R': 18, 'U': 21, 'T': 20, 'W': 23, 'V': 22, 'Y': 25, 'X': 24, 'Z': 26}
Or, if you want to keep your approach, use zip:
>>> {x:y for x,y in zip(map(chr,range(65,91)),range(1,27))}
{'A': 1, 'C': 3, 'B': 2, 'E': 5, 'D': 4, 'G': 7, 'F': 6, 'I': 9, 'H': 8, 'K': 11, 'J': 10, 'M': 13, 'L': 12, 'O': 15, 'N': 14, 'Q': 17, 'P': 16, 'S': 19, 'R': 18, 'U': 21, 'T': 20, 'W': 23, 'V': 22, 'Y': 25, 'X': 24, 'Z': 26}
The reason yours is not working is that your comprehension runs the inner loop in full for every iteration of the outer loop. I.e., try this in the shell:
>>> [(chr(outter), inner) for outter in range(65,91) for inner in range(1,27)]
[('A', 1), ('A', 2), ('A', 3), ('A', 4),... ('A', 26),
...
...
('Z', 1), ('Z', 2), ('Z', 3), ('Z', 4), ..., ('Z', 26)]
So if you do:
>>> len([(chr(outter), inner) for outter in range(65,91) for inner in range(1,27)])
676
You can see that it is executing 26x26 times (26x26=676)
Since a dict will just update with the new value, the last value for each letter is used:
>>> dict([(chr(outter), inner) for outter in range(65,91) for inner in range(1,27)])
{'A': 26, 'C': 26, 'B': 26, 'E': 26, 'D': 26, 'G': 26, 'F': 26, 'I': 26, 'H': 26, 'K': 26, 'J': 26, 'M': 26, 'L': 26, 'O': 26, 'N': 26, 'Q': 26, 'P': 26, 'S': 26, 'R': 26, 'U': 26, 'T': 26, 'W': 26, 'V': 26, 'Y': 26, 'X': 26, 'Z': 26}
Which shows why you are getting what you are getting.
You can try the following:
theDict = {chr(y):y - 64 for y in range(65, 91)}
print theDict
Output:
{'A': 1, 'C': 3, 'B': 2, 'E': 5, 'D': 4, 'G': 7, 'F': 6, 'I': 9, 'H': 8, 'K': 11, 'J': 10, 'M': 13, 'L': 12, 'O': 15, 'N': 14, 'Q': 17, 'P': 16, 'S': 19, 'R': 18, 'U': 21, 'T': 20, 'W': 23, 'V': 22, 'Y': 25, 'X': 24, 'Z': 26}
I'm looking to create a dictionary in Python
whose keys are the chars '0' to '9', followed by the keys 'a' to 'z',
and whose values should be a counter from 0 to 35,
like this:
dict = {'0':0, '1':1, '2':2, ....., '9':9, 'a':10, .... , 'x':33, 'y':34, 'z':35}
I managed to write this:
dict = {}
for i in range(10):
    dict[str(i)] = i
ord_a = ord('a')
for i in range(0,26):
    dict[chr(ord_a + i)] = i+10
Can you help me with a better way to implement it?
And one more thing, print(dict) returns an unsorted object:
{'d': 13, 'e': 14, 'f': 15, 'g': 16, 'r': 27, 'a': 10, 'b': 11,
'c': 12, 'l': 21, 'm': 22, 'n': 23, 'o': 24, 'h': 17, 'i': 18,
'j': 19, 'k': 20, '4': 4, '5': 5, '6': 6, '7': 7, '0': 0, '1': 1,
'2': 2, '3': 3, '8': 8, '9': 9, 'z': 35, 't': 29, 'u': 30,
'x': 33, 'v': 31, 'y': 34, 'w': 32, 's': 28, 'p': 25, 'q': 26}
Why's that? I actually initialize it quite sorted, no?
import string
keys = string.digits+string.ascii_lowercase
values = range(len(keys))
d = dict(zip(keys,values))
In Python versions before 3.7, dicts do not guarantee key order; from 3.7 on, a dict preserves insertion order. To have ordered keys on older versions, use a collections.OrderedDict. (Also, never name a variable dict or list, etc., since this prevents you from easily accessing the Python built-in of the same name. The built-in is useful, as you can see above.)
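As a sketch of the OrderedDict suggestion, the same zip can be fed to collections.OrderedDict, which keeps keys in insertion order regardless of the Python version. The variable name `ordered` is chosen here for illustration.

```python
import string
from collections import OrderedDict

keys = string.digits + string.ascii_lowercase
# OrderedDict remembers the order in which the pairs were inserted
ordered = OrderedDict(zip(keys, range(len(keys))))

print(list(ordered)[:12])
print(ordered['z'])
```

On Python 3.7 and later a plain dict built from the same zip iterates in the same order, so OrderedDict mainly matters for older interpreters.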