Python dictionary comprehension with Pandas - python

I am trying to create a dictionary from two columns of a DataFrame (df):
mydict={x :y for x in df['Names'] for y in df['Births']}
But all of the values are the same (the last value in the column)!
{'Bob': 973, 'Jessica': 973, 'John': 973, 'Mary': 973, 'Mel': 973}
I checked the column and it has many other values, what am I doing wrong?

I think Abdou hit the nail on the head with dict(zip(df['Names'], df['Births'])), but if you want to do it with a dict comprehension you can do this:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(
...: [{'Births': 971, 'Names': 'Bob'},
...: {'Births': 972, 'Names': 'Jessica'},
...: {'Births': 973, 'Names': 'John'},
...: {'Births': 974, 'Names': 'Mary'},
...: {'Births': 975, 'Names': 'Mel'}])
In [3]: {d['Names']: d['Births'] for d in df.to_dict(orient='records')}
Out[3]: {'Bob': 971, 'Jessica': 972, 'John': 973, 'Mary': 974, 'Mel': 975}

Try:
my_dict = {row.Names: row.Births for (index, row) in df.iterrows()}
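As for why the original comprehension repeats one value: two for clauses in a comprehension behave like nested loops, so every name is paired with every birth in turn and each key keeps only the last one. The following sketch uses plain lists in place of the DataFrame columns (the sample values are taken from the answer above) to contrast that with zip, which walks both sequences in lockstep.

```python
names = ['Bob', 'Jessica', 'John', 'Mary', 'Mel']
births = [971, 972, 973, 974, 975]

# Two `for` clauses nest: each name is assigned every birth in turn,
# so every key ends up holding the last birth value.
nested = {x: y for x in names for y in births}

# zip pairs the i-th name with the i-th birth, one value per key.
paired = {x: y for x, y in zip(names, births)}

print(nested)
print(paired)
```

The same zip form works directly on the columns, as in dict(zip(df['Names'], df['Births'])).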

Related

Sum of two specific values in two different dicts with the same keys

I am a beginner in Python and have the following problem:
I have two loops, that produce two dictionaries that look like this (of course, much longer):
dict1 = {'Manubar': ['string', 'string2', 'string3', 222, 23, 45], 'Schorsch': ['string', 'string2', 'string3', 122, 65, 44]}
dict2 = {'Manubar': ['string', 'string2', 543, 21, 34], 'Schorsch': ['string', 'string2', 354, 10, 65]}
I would now like to sum / multiply the last number of each list for the same key in dict1 and dict2 and create a new dict that should look like this (only the last number of each list changes):
dict3 = {'Manubar': ['string', 'string2', 'string3', 222, 23, 79], 'Schorsch': ['string', 'string2', 'string3', 122, 65, 109]}
I tried to merge the two dicts, but that simply overwrites the values.
How do I get to sum the last digits and then put them in a new dict?
Slightly more verbose, but a bit more efficient than the one-liner proposed in the comments:
dict3 = {}
for k in dict1:
    L = dict1[k].copy()
    L[-1] += dict2[k][-1]
    dict3[k] = L
Which gives:
{'Manubar': ['string', 'string2', 'string3', 222, 23, 79], 'Schorsch': ['string', 'string2', 'string3', 122, 65, 109]}
In case you are curious, here are the performance comparison results on my machine, using timeit:
long way: 0.275188707979396
comprehension way: 0.3881134579423815
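The one-liner from the comments is not reproduced in this thread. As an assumption about what it might have looked like, a dict comprehension that rebuilds each list with the last elements summed could be sketched as follows. It is illustrative only and is not necessarily the code that was timed above.

```python
dict1 = {'Manubar': ['string', 'string2', 'string3', 222, 23, 45],
         'Schorsch': ['string', 'string2', 'string3', 122, 65, 44]}
dict2 = {'Manubar': ['string', 'string2', 543, 21, 34],
         'Schorsch': ['string', 'string2', 354, 10, 65]}

# Keep everything but the last element of dict1's list, then append
# the sum of the last elements from both dicts. dict1 is left unchanged.
dict3 = {k: v[:-1] + [v[-1] + dict2[k][-1]] for k, v in dict1.items()}

print(dict3)
```

This assumes every key of dict1 also exists in dict2; a missing key would raise a KeyError.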
An alternative:
import pandas as pd
dict1 = {'Manubar': ['string', 'string2', 'string3', 222, 23, 45], 'Schorsch': ['string', 'string2', 'string3', 122, 65, 44]}
dict2 = {'Manubar': ['string', 'string2', 543, 21, 34], 'Schorsch': ['string', 'string2', 354, 10, 65]}
df1 = pd.DataFrame.from_dict(dict1, orient='index')
df2 = pd.DataFrame.from_dict(dict2, orient='index')
df1.loc[:,5] += df2.loc[:,4]
{i: list(v.values()) for i, v in df1.to_dict('index').items()}
While writing this question, I came up with this:
dict3 = {}
keylist = list(dict1.keys())
xxx = 0
for _ in keylist:
    key = keylist[xxx]
    dict3.update({key: [dict1[key][5] + dict2[key][4]]})
    xxx += 1
print(dict3)
But it feels a bit sluggish.
How do I improve my code?

make a matrix from a list of dictionaries in python3

I have a list of dictionaries like this example:
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140}, {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]
For every dictionary in the list, I would like to sort the items by key, from A to F, and then build a list of lists that contains only the values of each sorted dictionary. Here is the expected output:
res = [[38799, 12953, 3742, 848, 140, 66], [23551, 8192, 2319, 568, 87, 33]]
To do so I wrote the following code in Python:
res = []
for i in range(len(a)):
    for e in sorted(a[i].keys()):
        res.append(a[i][e])
But it does not return what I want. Do you know how to fix it?
You need to collect the values from each dictionary into its own list before appending that list to the final result:
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140}, {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]
res = []
for i in range(len(a)):
    sub_res = []
    for e in sorted(a[i].keys()):
        sub_res.append(a[i][e])
    res.append(sub_res)
A shorter version of this would be:
res = [ [i[e] for e in sorted(i.keys())] for i in a ]
Use a list comprehension instead of explicit loops:
y = [[i[key] for key in sorted(i.keys())] for i in a]
To sort items you can use built-in function sorted():
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140}, {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]
b = [[i[k] for k in sorted(i)] for i in a]
Append one list per dictionary instead of appending individual elements to res:
res = []
for i in range(len(a)):
    temp = []
    for e in sorted(a[i].keys()):
        temp.append(a[i][e])
    res.append(temp)
Here is another method using the function items of dict:
>>> [[i[1] for i in sorted(e.items())] for e in a]
[[38799, 12953, 3742, 848, 140, 66], [23551, 8192, 2319, 568, 87, 33]]
It sorts the values by keys.
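All of the answers above sort each dictionary's own keys, which yields aligned columns only if every dictionary has exactly the same keys. A minimal sketch that fixes the column order once for the whole matrix, assuming a missing key should become None (the question does not say what should happen in that case):

```python
a = [{'C': 3742, 'A': 38799, 'F': 66, 'D': 848, 'B': 12953, 'E': 140},
     {'C': 2319, 'A': 23551, 'F': 33, 'D': 568, 'B': 8192, 'E': 87}]

# Collect every key that appears in any dictionary and sort them once,
# so that each row of the matrix uses the same column order.
columns = sorted(set().union(*a))

# d.get(k) returns None for a key that a dictionary does not have.
res = [[d.get(k) for k in columns] for d in a]

print(columns)
print(res)
```

For the sample data this gives the same result as the comprehensions above, since both dictionaries share the keys A to F.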

How to get a mapping of country codes to international number prefixes in Python? [duplicate]

This question already has answers here:
Get country name from Country code in python?
(3 answers)
Closed 4 years ago.
I'm interested in getting a mapping of country codes to international phone number prefixes, like so:
{'US': '+1', 'GB': '+44', 'DE': '+49', ...}
One library that probably contains this information is python-phonenumbers. However, after a quick perusal of the source code I wasn't able to find where this information is stored. For example, the shortdata/region_DE.py module looks like this:
"""Auto-generated file, do not edit by hand. DE metadata"""
from ..phonemetadata import NumberFormat, PhoneNumberDesc, PhoneMetadata
PHONE_METADATA_DE = PhoneMetadata(id='DE', country_code=None, international_prefix=None,
    general_desc=PhoneNumberDesc(national_number_pattern='1\\d{2,5}', possible_length=(3, 6)),
    toll_free=PhoneNumberDesc(national_number_pattern='116\\d{3}', example_number='116000', possible_length=(6,)),
    emergency=PhoneNumberDesc(national_number_pattern='11[02]', example_number='112', possible_length=(3,)),
    short_code=PhoneNumberDesc(national_number_pattern='11(?:[025]|6(?:00[06]|1(?:1[17]|23)))', example_number='115', possible_length=(3, 6)),
    short_data=True)
It seems like the country_code and international_prefix fields are None. How can I get such a mapping (possibly with a different library)?
You can get the mapping you want using pycountry and phonenumbers, along with a simple dictionary comprehension:
import phonenumbers as pn
import pycountry
dct = {c.alpha_2: pn.country_code_for_region(c.alpha_2) for c in pycountry.countries}
print(dct)
Output:
{'SK': 421, 'KI': 686, 'LV': 371, 'GH': 233, 'JP': 81, 'SA': 966, 'TD': 235, 'SX': 1, 'CY': 357, 'CH': 41, 'EG': 20, 'PA': 507, 'KP': 850, 'CO': 57, 'GW': 245, 'KG': 996, 'AW': 297, 'FM': 691, 'SB': 677, 'HR': 385, 'PY': 595, 'BG': 359, 'IQ': 964, 'ID': 62, 'GQ': 240, 'CA': 1, 'CG': 242, 'MO': 853, 'SL': 232, 'LA': 856, 'OM': 968, 'MP': 1, 'DK': 45, 'FI': 358, 'DO': 1, 'BM': 1, 'GN': 224, 'NE': 227, 'ER': 291, 'DE': 49, 'UM': 0, 'CM': 237, 'PR': 1, 'RO': 40, 'AZ': 994, 'DZ': 213, 'BW': 267, 'MK': 389, 'HN': 504, 'IS': 354, 'SJ': 47, 'ME': 382, 'NR': 674, 'AD': 376, 'BY': 375, 'RE': 262, 'PG': 675, 'SO': 252, 'NO': 47, 'CC': 61, 'EE': 372, 'BN': 673, 'AU': 61, 'HM': 0, 'ML': 223, 'BD': 880, 'GE': 995, 'US': 1, 'UY': 598, 'SM': 378, 'NG': 234, 'BE': 32, 'KY': 1, 'AR': 54, 'CR': 506, 'VA': 39, 'YE': 967, 'TR': 90, 'CV': 238, 'DM': 1, 'ZM': 260, 'BR': 55, 'MG': 261, 'BL': 590, 'FJ': 679, 'SH': 290, 'KN': 1, 'ZA': 27, 'CF': 236, 'ZW': 263, 'PL': 48, 'SV': 503, 'QA': 974, 'MN': 976, 'SE': 46, 'JE': 44, 'PS': 970, 'MZ': 258, 'TK': 690, 'PM': 508, 'CW': 599, 'HK': 852, 'LB': 961, 'SY': 963, 'LC': 1, 'IE': 353, 'RW': 250, 'NL': 31, 'MA': 212, 'GM': 220, 'IR': 98, 'AT': 43, 'SZ': 268, 'GT': 502, 'MT': 356, 'BQ': 599, 'MX': 52, 'NC': 687, 'CK': 682, 'SI': 386, 'VE': 58, 'IM': 44, 'AM': 374, 'SD': 249, 'LY': 218, 'LI': 423, 'TN': 216, 'UG': 256, 'RU': 7, 'DJ': 253, 'IL': 972, 'TM': 993, 'BF': 226, 'GF': 594, 'TO': 676, 'GI': 350, 'MH': 692, 'UZ': 998, 'PF': 689, 'KZ': 7, 'GA': 241, 'PE': 51, 'TV': 688, 'BT': 975, 'MQ': 596, 'MF': 590, 'AF': 93, 'IN': 91, 'AX': 358, 'BH': 973, 'JM': 1, 'MY': 60, 'BO': 591, 'AI': 1, 'SR': 597, 'ET': 251, 'ES': 34, 'TF': 0, 'GU': 1, 'BJ': 229, 'SS': 211, 'KE': 254, 'BZ': 501, 'IO': 246, 'MU': 230, 'CL': 56, 'MD': 373, 'LU': 352, 'TJ': 992, 'EC': 593, 'VG': 1, 'NZ': 64, 'VU': 678, 'FO': 298, 'LR': 231, 'AL': 355, 'GB': 44, 'AS': 1, 'IT': 39, 'TC': 1, 'TW': 886, 'BI': 257, 'HU': 36, 'TL': 670, 'GG': 44, 'PN': 0, 'SG': 65, 'LS': 266, 'KH': 855, 
'FR': 33, 'BV': 0, 'CX': 61, 'AE': 971, 'LT': 370, 'PT': 351, 'KR': 82, 'BB': 1, 'TG': 228, 'AQ': 0, 'EH': 212, 'AG': 1, 'VN': 84, 'CI': 225, 'BS': 1, 'GL': 299, 'MW': 265, 'NU': 683, 'NF': 672, 'LK': 94, 'MS': 1, 'GP': 590, 'NP': 977, 'PW': 680, 'PK': 92, 'WF': 681, 'BA': 387, 'KM': 269, 'JO': 962, 'CU': 53, 'GR': 30, 'YT': 262, 'RS': 381, 'NA': 264, 'ST': 239, 'SC': 248, 'CN': 86, 'CD': 243, 'GS': 0, 'KW': 965, 'MM': 95, 'AO': 244, 'MV': 960, 'UA': 380, 'TT': 1, 'FK': 500, 'WS': 685, 'CZ': 420, 'PH': 63, 'VI': 1, 'TZ': 255, 'MR': 222, 'MC': 377, 'SN': 221, 'HT': 509, 'VC': 1, 'NI': 505, 'GD': 1, 'GY': 592, 'TH': 66}
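The question asked for string prefixes such as '+49', while the output above holds integers, with 0 apparently standing for regions that have no calling code in this data (for example 'AQ'). A small sketch of the conversion follows. It uses a hand-copied sample of the output above in place of the full dct, so that it runs without either library; the name sample is only illustrative.

```python
# A few entries copied from the output above; in practice this would be
# the full `dct` built with pycountry and phonenumbers.
sample = {'US': 1, 'GB': 44, 'DE': 49, 'AQ': 0}

# Format each code as a '+'-prefixed string and skip entries whose code is 0.
prefixes = {region: f'+{code}' for region, code in sample.items() if code}

print(prefixes)  # {'US': '+1', 'GB': '+44', 'DE': '+49'}
```

Whether entries with code 0 should be dropped or kept with some placeholder is a design choice that the question leaves open.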
I have just found a Python library that should fit your problem well.
It's called PhoneISO3166.
It is available on GitHub as phoneiso3166.

Highlighting multiple cells in different colors with Pandas

Imagine we have a dataframe and I want to color different cells:
Cells ['Arizona','company'](1st), ['Texas','size'](1099) as green.
Cells ['Florida','veterans'](26), ['Maine','armored'](0) as red.
What's a good way to do it?
import pandas as pd
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
            'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
            'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters', 'origin'])
df = df.set_index('origin')
df.head()
(http://chrisalbon.com/python/pandas_indexing_selecting.html)
You can use Styler.applymap with the subset parameter and pd.IndexSlice for elementwise styles (in pandas 2.1 and later, Styler.applymap is deprecated in favor of Styler.map). Run the code in a Jupyter notebook:
import pandas as pd
import numpy as np
def red(val):
    color = 'red'
    return 'background-color: %s' % color
def green(val):
    color = 'green'
    return 'background-color: %s' % color
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
            'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
            'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
            'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
            'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
            'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
            'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
            'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size', 'veterans', 'readiness', 'armored', 'deserters', 'origin'])
df = df.set_index('origin')
print(df)
# the parentheses let the method chain continue on the next line
(df.style.applymap(green, subset=pd.IndexSlice['Arizona':'Texas', 'company':'size'])
   .applymap(red, subset=pd.IndexSlice['Florida':'Maine', 'veterans':'armored']))
If you need to change only some cells in the DataFrame, you can use Styler.apply with axis=None for tablewise styles. The function must return a DataFrame with the same index and column labels:
def create_colors(x):
    # build a frame of empty styles with the same labels as x;
    # this avoids writing strings into numeric columns and leaves the data unchanged
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # overwrite the selected cells with green and red
    df1.loc['Arizona', 'company'] = 'background-color: green'
    df1.loc['Texas', 'size'] = 'background-color: green'
    df1.loc['Florida', 'veterans'] = 'background-color: red'
    df1.loc['Maine', 'armored'] = 'background-color: red'
    # return the frame of styles
    return df1
df.style.apply(create_colors, axis=None)
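The same idea can be generalized so that the cells and colors are passed in as data rather than hard-coded. The sketch below is one possible shape, not part of the original answer: the helper name color_cells and the cell_colors mapping are hypothetical, and the small frame only stands in for the full regiment data.

```python
import pandas as pd

def color_cells(data, cell_colors):
    """Return a frame of CSS strings shaped like `data`.

    `cell_colors` maps (row label, column label) to a color name.
    """
    # Start with empty styles everywhere; the input data is not modified.
    styles = pd.DataFrame('', index=data.index, columns=data.columns)
    for (row, col), color in cell_colors.items():
        styles.loc[row, col] = f'background-color: {color}'
    return styles

# A reduced stand-in for the DataFrame from the question.
df = pd.DataFrame({'company': ['1st', '2nd'], 'size': [1045, 1099]},
                  index=['Arizona', 'Texas'])
cell_colors = {('Arizona', 'company'): 'green', ('Texas', 'size'): 'green'}

styles = color_cells(df, cell_colors)
print(styles)

# In a notebook, the helper would be handed to the Styler like this:
# df.style.apply(color_cells, cell_colors=cell_colors, axis=None)
```

Styler.apply forwards extra keyword arguments to the function, which is what allows cell_colors to be passed through in the commented call.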
Try this example from http://melissagymrek.com/python/2014/01/12/ipython-tables.html
from ipywidgets import *
import pandas as pd
df = pd.DataFrame({"x":[1,2,3], "y":[6,4,3], "z":["testing","pretty","tables"], "f":[0.023432, 0.234321,0.5555]})
pt = PrettyTable(df)
pt
# Set cell style using a CellStyle object
pt = PrettyTable(df, tstyle=TableStyle(theme="theme1"), center=True)
cs = CellStyle()
cs.set("background-color", "red")
cs.set("color", "white")
pt.set_cell_style(style=cs)
pt

Converting some columns from pandas dataframe to list of lists

I have a dataframe. I would like some of the data to be converted to a list of list. The columns I'm interested in are the index, Name, and Births. My code works, but it seems inefficient and for some reason the letter L is added to the end of each index.
My code:
import pandas as pd
data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'], ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data = data, columns=headers)
indexes = df.index.values.tolist()
mylist = [[x] for x in indexes]
for x in mylist:
    x.extend([df.ix[x[0],'Names'], df.ix[x[0],'Births']])
print mylist
Desired Output:
[[0, 'Bob', 968], [1, 'Jessica', 341], [2, 'Mary', 77], [3, 'John', 578], [4, 'Mel', 434]]
Why not just use .values.tolist() as you mentioned?
import pandas as pd
# your data
# =================================================
data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'], ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data = data, columns=headers)
# nested list
# ============================
df.reset_index()[['index', 'Names', 'Births']].values.tolist()
Out[46]:
[[0, 'Bob', 968],
[1, 'Jessica', 341],
[2, 'Mary', 77],
[3, 'John', 578],
[4, 'Mel', 434]]
Ok, this works (based on Jianxun Li's answer and comments):
import pandas as pd
# Data
data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'], ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data = data, columns=headers)
# Output
print(df.reset_index()[['index', 'Names', 'Births']].values.astype(str).tolist())
Thank you Jianxun Li, this also helped me :-)
In general, one can use the following to transform the complete dataframe into a list of lists (which is what I needed):
df.values.astype(str).tolist()
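Note that astype(str) turns every value into a string, so the index and Births come back as '0' and '968' rather than as numbers. If the numbers should stay numbers, as in the desired output, one alternative sketch is to iterate over the rows with itertuples. This assumes the default integer index from the question.

```python
import pandas as pd

data = [['Bob', 968, 'Male'], ['Jessica', 341, 'Female'], ['Mary', 77, 'Female'],
        ['John', 578, 'Male'], ['Mel', 434, 'Female']]
headers = ['Names', 'Births', 'Gender']
df = pd.DataFrame(data=data, columns=headers)

# itertuples yields one tuple per row, with the index first,
# followed by the selected columns in order.
rows = [list(t) for t in df[['Names', 'Births']].itertuples()]

print(rows)
```

Each inner list holds the index, the name and the birth count, matching the layout of the desired output in the question.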
