Dictionary of dictionaries from list of lists with year keys - python

I have a list of lists like this:
[['2014', 'MALE', 'WHITE NON HISPANIC', 'Zachary', '90', '39'],
['2014', 'MALE', 'WHITE NON HISPANIC', 'Zev', '49', '65']]
I want to convert it into a dictionary like this:
{2012: {1: 'David',
2: 'Joseph',
3: 'Michael',
4: 'Moshe'},
2013: {1: 'David',
2: 'Joseph',
3: 'Michael',
4: 'Moshe'},
I'm trying a dict comprehension like this:
boy_names = {row[0]:{i:row[3]} for i,row in enumerate(records) if row[1]=='MALE'}
But the result I'm getting is like:
{'2011': {7851: 'Zev'}, '2012': {9855: 'Zev'},
'2013': {11886: 'Zev'}, '2014': {13961: 'Zev'}}
If I'm right, each year key is being overwritten, so for every year I only keep the last name and its row number from enumerate, but I have no idea how to fix it.

You can use the length of the sub-dict under the year key to calculate the next incremental numeric key for the sub-dict under the current year. Use the dict.setdefault method to default the value of a new key to an empty dict:
boy_names = {}
for year, _, _, name, _, _ in records:
    record = boy_names.setdefault(int(year), {})
    record[len(record) + 1] = name
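A self-contained sketch of the same idea, with the gender filter from the question added. The sample rows are made up for illustration (the 2013 row and the FEMALE row are not from the original data), and the numbering simply follows the order of the rows, which is assumed to be the order you want:

```python
# Hypothetical sample data in the same six-column layout as the question.
records = [
    ['2013', 'MALE', 'WHITE NON HISPANIC', 'David', '304', '1'],
    ['2014', 'MALE', 'WHITE NON HISPANIC', 'Zachary', '90', '39'],
    ['2014', 'FEMALE', 'WHITE NON HISPANIC', 'Zoe', '81', '40'],
    ['2014', 'MALE', 'WHITE NON HISPANIC', 'Zev', '49', '65'],
]

boy_names = {}
for year, gender, _, name, _, _ in records:
    if gender != 'MALE':
        continue  # skip rows that are not boys' names
    # Fetch (or create) the sub-dict for this year.
    names_for_year = boy_names.setdefault(int(year), {})
    # The next key is one more than the number of names stored so far.
    names_for_year[len(names_for_year) + 1] = name

print(boy_names)
# {2013: {1: 'David'}, 2014: {1: 'Zachary', 2: 'Zev'}}
```

Because the counter is taken from the length of each year's own sub-dict, the numbering restarts at 1 for every year instead of following the global row index from enumerate.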

I believe you need
data = [['2014', 'MALE', 'WHITE NON HISPANIC', 'Zachary', '90', '39'],
['2014', 'MALE', 'WHITE NON HISPANIC', 'Zev', '49', '65']]
result = {}
for i in data:  # Iterate over the sub-lists
    result.setdefault(i[0], []).append(i[3])  # Collect the names per year
result = {k: dict(enumerate(v, 1)) for k, v in result.items()}  # Use enumerate to number the names from 1
print(result)
# {'2014': {1: 'Zachary', 2: 'Zev'}}

Related

create JSON from list

I have a list:
li = ['Peter', '22', 'DE']
and I want to create JSON from this list, so I need to add a name to every item in the list. The output should be something like this:
li = ['name':'Peter', 'age':'22', 'nationality':'DE']
i = 0
while i < len(li):
    li[i].insert(0,'name:')
    i += 1
print(li)
This puts a comma after every name I add. How can I add to this list without the comma?
The output from this is:
['name','Peter', '22', 'DE']
li is the list object you are iterating over, so any operation on it modifies that same list.
insert() adds a new, separate element to the existing list, which is why the name shows up as its own comma-separated item.
You can use zip() with dict() after creating the keys for the dictionary to get the desired output:
li = ['Peter', '22', 'DE']
keys = ['name', 'age', 'nationality']
di = dict(zip(keys, li))
Update: You can use list comprehension for list of lists:
li = [ ['Peter', '22', 'DE'], ['John', '28', 'GB'] ]
keys = ['name', 'age', 'nationality']
di = [dict(zip(keys, l)) for l in li]
print(di)
You could use zip, to combine all values of both lists and convert it to a dict:
li = ['Peter', '22', 'DE']
keys = ['name', 'age', 'nationality']
print(dict(zip(keys, li)))
Out:
{'name': 'Peter', 'age': '22', 'nationality': 'DE'}
Generate the keys, map them to the values in the list to create a dictionary, then convert the dictionary to JSON:
ky = ['Name', 'Age', 'Nationality']
li = ['Peter', '22', 'DE']
data = {k:v for (k,v) in zip(ky, li)}
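The conversion step itself is not shown above. A minimal sketch using the standard json module, assuming the key names from the question:

```python
import json

keys = ['name', 'age', 'nationality']  # key names assumed from the question
li = ['Peter', '22', 'DE']

# Pair each key with its value, then serialize the dict to a JSON string.
data = dict(zip(keys, li))
json_text = json.dumps(data)

print(json_text)  # {"name": "Peter", "age": "22", "nationality": "DE"}
```

json.dumps returns a string; to write straight to an open file object you could use json.dump instead.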
What you need is not a list, but a dictionary. Dictionaries are collections of key-value pairs:
my_dict = {"name" : "Peter", "age" : 22, "nationality" : "DE"}
If you need to build a dictionary from a list, you can do it with the zip() function:
li = ['Peter', '22', 'DE']
keys = ["name", 'age', 'nationality']
dictionary = dict(zip(keys, li)) # {'name': 'Peter', 'age': '22', 'nationality': 'DE'}
If you really need to work with JSON files, then I suggest looking into this link for a clear explanation.

Replace empty values of a dictionary with NaN

I have a dictionary with missing values (the key is there, but the associated value is empty). For example I want the dictionary below:
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
to be changed to this form:
dct = {'ID':NaN, 'gender':'male', 'age':'20', 'weight':NaN, 'height':'5.7'}
How can I write that in the most time-efficient way?
You can use a dictionary comprehension. Also, as was noted in the comments, naming a variable dict in Python is not good practice, because it shadows the built-in type:
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
dct = {k: None if not v else v for k, v in dct.items() }
print(dct)
Output:
{'ID': None, 'gender': 'male', 'age': '20', 'weight': None, 'height': '5.7'}
Just replace None with whatever you want it to default to.
In your question, you want to replace with NaN.
You can use any of the following:
float('nan'), which works on any version, including Python 2.x and Python <3.5
math.nan for Python 3.5+
numpy.nan using numpy
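As a rough illustration of the math.nan option (Python 3.5+), this sketch replaces only empty strings and then shows how to find the replaced keys. The names cleaned and missing are placeholders, not part of the original answer:

```python
import math

dct = {'ID': '', 'gender': 'male', 'age': '20', 'weight': '', 'height': '5.7'}

# Replace only empty strings; every other value is kept unchanged.
cleaned = {k: math.nan if v == '' else v for k, v in dct.items()}
print(cleaned)

# NaN never compares equal to anything, itself included, so test for it
# with math.isnan() rather than with ==.
missing = [k for k, v in cleaned.items() if isinstance(v, float) and math.isnan(v)]
print(missing)  # ['ID', 'weight']
```

The isinstance check is there because math.isnan raises a TypeError when it is given a string such as 'male'.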
You can use implicit syntax with boolean or expression:
In [1]: dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
In [2]: {k: v or None for k, v in dct.items()}
Out[2]: {'ID': None, 'age': '20', 'gender': 'male', 'height': '5.7', 'weight': None}
But be aware that in The Zen of Python it's said:
Explicit is better than implicit.
You can create a class object to represent NaN:
class NaN:
    def __init__(self, default=None):
        self.val = default
    def __repr__(self):
        return 'NaN'
dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'}
new_d = {a:NaN() if not b else b for a, b in dct.items()}
Output:
{'gender': 'male', 'age': '20', 'ID': NaN, 'weight': NaN, 'height': '5.7'}
You can use a for loop to iterate over all of the keys and values in the dictionary.
dct = {'ID': '', 'gender': 'male', 'age': '20', 'weight': '', 'height': '5.7'}
for key, value in dct.items():
    if value == '':
        dct[key] = 'NaN'
print(dct)
You created your dictionary with a series of key-value pairs.
I used a for loop and the .items() method to iterate over each key-value pair in your dictionary.
If the value of a pair is an empty string, we change that particular value to 'NaN' and leave the rest unchanged.
When we print the new dictionary we get this output:
{'ID': 'NaN', 'gender': 'male', 'age': '20', 'weight': 'NaN', 'height': '5.7'}
This is time efficient because it is a single pass over the dictionary, so long as you are okay with the 'NaN' values being strings. I am not sure whether you want them to be strings; if not, you can simply change 'NaN' to None or float('nan').

Grouping similar values in a dictionary

I'm new to programming and would appreciate if someone can help with the following in Python/Pandas.
I have a dictionary that has lists as its values. I'd like to group together keys that have similar values. I've seen similar questions on here, but the catch in this case is that I want to disregard the order of the values. For example:
classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
jack and charles have the same values but in different order. I'd like an output that will give the value irrespective of order. In this case, the output would be written to a csv as
['20','male','soccer']: jack, charles
['26','male','tennis']: brian
['19','basketball','male']: zulu
Using frozensets, apply, groupby + agg:
s = pd.DataFrame(classmates).T.apply(frozenset, 1)
s2 = pd.Series(s.index.values, index=s)\
.groupby(level=0).agg(lambda x: list(x))
s2
(soccer, 20, male) [charles, jack]
(26, male, tennis) [brian]
(basketball, male, 19) [zulu]
dtype: object
You can invert the dictionary in the way you want with the following code:
classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
out_dict = {}
for key, value in classmates.items():
    current_list = out_dict.get(tuple(sorted(value)), [])
    current_list.append(key)
    out_dict[tuple(sorted(value))] = current_list
print(out_dict)
This prints
{('20', 'male', 'soccer'): ['charles', 'jack'], ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']}
from collections import defaultdict
ans = defaultdict(list)
classmates={'jack':['20','male','soccer'],
            'brian':['26','male','tennis'],
            'charles':['male','soccer','20'],
            'zulu':['19','basketball','male']
            }
for k, v in classmates.items():
    sorted_tuple = tuple(sorted(v))
    ans[sorted_tuple].append(k)
# ans is: a dict you desired
# defaultdict(<class 'list'>, {('20', 'male', 'soccer'): ['jack','charles'],
# ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']})
for k, v in ans.items():
    print(k, ':', v)
# output:
# ('20', 'male', 'soccer') : ['jack', 'charles']
# ('26', 'male', 'tennis') : ['brian']
# ('19', 'basketball', 'male') : ['zulu']
First of all convert your dictionary to a pandas dataframe.
df= pd.DataFrame.from_dict(classmates,orient='index')
Then sort it in ascending order by age.
df=df.sort_values(by=0,ascending=True)
Here 0 is the default column name. You can rename this column.
You could do this in one line:
print({tuple(sorted(v)): [k for k, vv in classmates.items() if sorted(vv) == sorted(v)] for v in classmates.values()})
Or here is a more detailed solution:
dict_1 = {'jack': ['20', 'male', 'soccer'], 'brian': ['26', 'male', 'tennis'], 'charles': ['male', 'soccer', '20'],
'zulu': ['19', 'basketball', 'male']}
sorted_dict = {}
for key,value in dict_1.items():
    sorted_1 = sorted(value)
    sorted_dict[key] = sorted_1
tracking_of_duplicate = []
final_dict = {}
for key1,value1 in sorted_dict.items():
    if value1 not in tracking_of_duplicate:
        tracking_of_duplicate.append(value1)
        final_dict[tuple(value1)] = [key1]
    else:
        final_dict[tuple(value1)].append(key1)
print(final_dict)
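The question also asks for the groups to be written to a CSV, which none of the answers above show. One possible sketch with the standard csv module follows; the column headers and the choice to join values with spaces are assumptions, and io.StringIO stands in for a real file:

```python
import csv
import io
from collections import defaultdict

classmates = {'jack': ['20', 'male', 'soccer'], 'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'], 'zulu': ['19', 'basketball', 'male']}

# Group names under an order-independent key (the sorted values as a tuple).
groups = defaultdict(list)
for name, values in classmates.items():
    groups[tuple(sorted(values))].append(name)

# Write one row per group. With a real file you would use
# open('groups.csv', 'w', newline='') in place of the StringIO buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['values', 'names'])  # hypothetical header names
for values, names in groups.items():
    writer.writerow([' '.join(values), ', '.join(names)])

print(buffer.getvalue())
```

The csv writer quotes the names field automatically whenever it contains a comma, so a group with several names still occupies a single column.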

Matching and Appending

I'm trying to figure out how to run one list through another list and, whenever the first names match, append the matching entry to a new list.
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
So it looks something like this.
list3 = [["Ryan","Canada","10"],["John","United States","30"],["Jake","Spain","15"]]
So far I haven't really been able to even come close, so even the smallest guidance would be much appreciated. Thanks.
You could transform them into dictionaries and then use a list comprehension:
dic1 = dict(list1)
dic2 = dict(list2)
list3 = [[k,dic2[k],dic1[k]] for k in dic2 if k in dic1]
If ordering isn't a concern, the most straightforward way is to convert the lists into more suitable data structures: dictionaries.
ages = dict(list1)
countries = dict(list2)
That'll make it a cinch to combine the pieces of data:
>>> {name: [ages[name], countries[name]] for name in ages.keys() & countries.keys()}
{'Ryan': ['10', 'Canada'], 'Jake': ['15', 'Spain'], 'John': ['30', 'United States']}
Or even better, use nested dicts:
>>> {name: {'age': ages[name], 'country': countries[name]} for name in ages.keys() & countries.keys()}
{'Ryan': {'country': 'Canada', 'age': '10'},
'Jake': {'country': 'Spain', 'age': '15'},
'John': {'country': 'United States', 'age': '30'}}
If the names are unique you can make list1 into a dictionary and then loop through list2 adding items from this dictionary.
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
list1_dict = dict(list1)
output = [item + [list1_dict[item[0]]] for item in list2]
If not, then you need to decide how to deal with cases of duplicate names.
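One possible way to handle duplicates, sketched under the assumption that you want to keep every age recorded for a name rather than only the last one. The second "Ryan" entry is invented for illustration:

```python
from collections import defaultdict

list1 = [["Ryan", "10"], ["James", "40"], ["John", "30"], ["Ryan", "12"]]  # second "Ryan" is made up
list2 = [["Ryan", "Canada"], ["John", "United States"], ["Jake", "Spain"]]

# Collect every age seen for each name instead of letting later rows
# overwrite earlier ones, which is what dict(list1) would do.
ages_by_name = defaultdict(list)
for name, age in list1:
    ages_by_name[name].append(age)

# Emit one row per (name, age) pair; names missing from list1 are skipped.
output = [
    [name, country, age]
    for name, country in list2
    for age in ages_by_name.get(name, [])
]
print(output)
# [['Ryan', 'Canada', '10'], ['Ryan', 'Canada', '12'], ['John', 'United States', '30']]
```

Using .get() with an empty-list default also avoids the KeyError that a plain dictionary lookup raises for a name such as "Jake" that is absent from list1.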
You can use a set and an OrderedDict to combine the common names and keep order:
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
from collections import OrderedDict
# get set of names from list2
names = set(name for name,_ in list2)
# create an OrderedDict using name as key and full sublist as value
# filtering out names that are not also in list2
d = OrderedDict((sub[0], sub) for sub in list1 if sub[0] in names)
for name, country in list2:
    if name in d:
        # add country from each sublist with common name
        d[name].append(country)
print(list(d.values()))  # list() is needed on Python 3, where values() returns a view
[['Ryan', '10', 'Canada'], ['John', '30', 'United States'], ['Jake', '15', 'Spain']]
If every name in list2 also appears in list1, you can remove the "if name in d:" check.

Use of dictionary in Python

I'm writing a concept learning program, where I need to convert from an index to the name of a category.
For example:
# binary concept learning
# candidate elimination learning algorithm
import numpy as np
import csv
def main():
    d1={0:'0', 1:'Japan', 2: 'USA', 3: 'Korea', 4: 'Germany', 5:'?'}
    d2={0:'0', 1:'Honda', 2: 'Chrysler', 3: 'Toyota', 4:'?'}
    d3={0:'0', 1:'Blue', 2:'Green', 3: 'Red', 4:'White', 5:'?'}
    d4={0:'0', 1:1970,2:1980, 3:1990, 4:2000, 5:'?'}
    d5={0:'0', 1:'Economy', 2:'Sports', 3:'SUV', 4:'?'}
    a=[0,1,2,3,4]
    print(a)
if __name__=="__main__":
    main()
So [0,1,2,3,4] should convert to ['0', 'Honda', 'Green', '1990', '?']. What is the most pythonic way to do this?
I think you need a basic dictionary crash course.
This is a proper dictionary:
>>>d1 = { 'tires' : 'yoko', 'manufacturer': 'honda', 'vtec' : 'no' }
You can access individual items in the dictionary easily:
>>>d1['tires']
'yoko'
>>>d1['vtec'] = 'yes' #mad vtec yo
>>>d1['vtec']
'yes'
Dictionaries are broken up into two different sections, the key and the value
testDict = {'key':'value'}
You were using a dictionary the exact same way as a list:
>>>test = {0:"thing0", 1:"thing1"} #dictionary
>>>test[0]
'thing0'
which is pretty much the exact same as saying
>>>test = ['thing0','thing1'] #list
>>>test[0]
'thing0'
In your particular case, you may want to format your dictionaries differently. I would suggest something like masterdictionary = {'country': ['germany','france','USA','japan'], 'manufacturer': ['honda','ferrari','hoopty'] } and so on, because you could then access each individual item a lot more easily.
With that same dictionary:
>>>masterdictionary['country'][1]
'france'
which is
dictionaryName['key'][iteminlistindex]
of course there is nothing preventing you from putting dictionaries as values inside of dictionaries.... inside values of other dictionaries...
You can do:
data = [d1,d2,d3,d4,d5]
print([d[key] for key, d in zip(a, data)])
The function zip() can be used to combine two iterables; lists in this case.
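Put together as a runnable sketch with the dictionaries from the question (the name names is a placeholder):

```python
d1 = {0: '0', 1: 'Japan', 2: 'USA', 3: 'Korea', 4: 'Germany', 5: '?'}
d2 = {0: '0', 1: 'Honda', 2: 'Chrysler', 3: 'Toyota', 4: '?'}
d3 = {0: '0', 1: 'Blue', 2: 'Green', 3: 'Red', 4: 'White', 5: '?'}
d4 = {0: '0', 1: 1970, 2: 1980, 3: 1990, 4: 2000, 5: '?'}
d5 = {0: '0', 1: 'Economy', 2: 'Sports', 3: 'SUV', 4: '?'}
a = [0, 1, 2, 3, 4]

data = [d1, d2, d3, d4, d5]
# zip pairs the first index with d1, the second with d2, and so on.
names = [d[key] for key, d in zip(a, data)]
print(names)  # ['0', 'Honda', 'Green', 1990, '?']
```

Note that d4 stores its years as integers, so the result contains 1990 rather than '1990'; wrap the lookup in str() if you need every entry to be a string.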
You've already got the answer to your direct question, but you may wish to consider re-structuring the data. To me, the following makes a lot more sense, and will enable you to more easily index into it for what you asked, and for any possible later queries:
from pprint import pprint
items = [[el.get(i, '?') for el in (d1,d2,d3,d4,d5)] for i in range(6)]
pprint(items)
[['0', '0', '0', '0', '0'],
['Japan', 'Honda', 'Blue', 1970, 'Economy'],
['USA', 'Chrysler', 'Green', 1980, 'Sports'],
['Korea', 'Toyota', 'Red', 1990, 'SUV'],
['Germany', '?', 'White', 2000, '?'],
['?', '?', '?', '?', '?']]
I would use a list of dicts d = [d1, d2, d3, d4, d5], and then a list comprehension:
[d[i][key] for i, key in enumerate(a)]
To make the whole thing more readable, use nested dictionaries - each of your dictionaries seems to represent something you could give a more descriptive name than d1 or d2:
data = {'country': {0: 'Japan', 1: 'USA' ... }, 'brand': {0: 'Honda', ...}, ...}
car = {'country': 1, 'brand': 2 ... }
[data[attribute][key] for attribute, key in car.items()]
Note that on Python versions before 3.7 this would not necessarily be in order, if that is important; collections.OrderedDict guarantees insertion order there.
As suggested by the comment, a dictionary with contiguous integers as keys can be replaced by a list:
data = {'country': ['Japan', 'USA', ...], 'brand': ['Honda', ...], ...}
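A filled-in sketch of this layout, using three of the option lists from the question. The attribute names 'country', 'brand' and 'color' and the car example are illustrative assumptions:

```python
# Option lists keyed by a descriptive attribute name instead of d1, d2, ...
data = {
    'country': ['0', 'Japan', 'USA', 'Korea', 'Germany', '?'],
    'brand': ['0', 'Honda', 'Chrysler', 'Toyota', '?'],
    'color': ['0', 'Blue', 'Green', 'Red', 'White', '?'],
}

# A car is described by one index per attribute (hypothetical example).
car = {'country': 1, 'brand': 2, 'color': 3}

# Translate each index into its category name.
decoded = {attribute: data[attribute][index] for attribute, index in car.items()}
print(decoded)  # {'country': 'Japan', 'brand': 'Chrysler', 'color': 'Red'}
```

Returning a dict rather than a list keeps each decoded value attached to its attribute name, so the result does not depend on ordering.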
If you need to keep d1, d2, etc. as is:
newA = [locals()["d%d"%(i+1)][a_value] for i,a_value in enumerate(a)]
Pretty ugly, and fragile, but it should work with your existing code.
You don't need a dictionary for this at all. Lists in Python automatically support indexing.
def main():
    d1=['0','Japan','USA','Korea','Germany',"?"]
    d2=['0','Honda','Chrysler','Toyota','?']
    d3=['0','Blue','Green','Red','White','?']
    d4=['0', 1970,1980,1990,2000,'?']
    d5=['0','Economy','Sports','SUV','?']
    ds = [d1, d2, d3, d4, d5] #This holds all your lists
    #This is what range is for
    a=range(5)
    #Find the nth index from the nth list, seems to be what you want
    print([ds[n][n] for n in a]) #This is a list comprehension, look it up.
