I have a CSV with two columns: the first column is the team assigned to a particular building in our project,
and the second column is the building number.
What I am looking for is a dictionary keyed by the first column, with the value being a list of the buildings that belong to that team.
I have tried various forms of csv.reader and csv.DictReader along with different for loops to rewrite the data into another dictionary, but I cannot get the structure I want.
CSV:
team,bldg,
3,204,
3,250,
3,1437,
2,1440,
1,1450,
The structure of the dictionary would be as follows:
dict["1"] = ["1450"]
dict["2"] = ["1440"]
dict["3"] = ["204", "250", "1437"]
This works:
import csv

result = {}
with open('/tmp/test.csv', 'r') as f:
    reader = csv.DictReader(f)
    for d in reader:
        result.setdefault(d['team'], []).append(d['bldg'])

# result == {'1': ['1450'], '3': ['204', '250', '1437'], '2': ['1440']}
The useful collections.defaultdict in the standard library makes short work of this task:
import csv
import collections as co

dd = co.defaultdict(list)
with open('/path/to/your.csv', 'rb') as fin:   # 'rb' is the Python 2 csv convention; use newline='' on Python 3
    dr = csv.DictReader(fin)
    for line in dr:
        dd[line['team']].append(line['bldg'])

# defaultdict(<type 'list'>, {'1': ['1450'], '3': ['204', '250', '1437'], '2': ['1440']})
http://docs.python.org/2/library/collections.html#collections.defaultdict
The first argument provides the initial value for the default_factory
attribute; it defaults to None.
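For instance, a minimal sketch of how the default_factory kicks in (toy values, unrelated to the CSV above):

from collections import defaultdict

dd = defaultdict(list)     # default_factory is list
dd['3'].append('204')      # '3' is missing, so list() supplies [] first
dd['3'].append('250')
dd['3']                    # ['204', '250']
dd['9']                    # [] -- even a plain lookup inserts the default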
Related
For example, for the txt file of
Math, Calculus, 5
Math, Vector, 3
Language, English, 4
Language, Spanish, 4
into the dictionary of:
data = {'Math': {'name': ['Calculus', 'Vector'], 'score': [5, 3]}, 'Language': {'name': ['English', 'Spanish'], 'score': [4, 4]}}
I am having trouble appending values to build the lists inside the inner dicts. I'm very new to this and don't really understand import statements yet. Thank you so much for all your help!
For each line, find the three values, then add them to the dict structure:
from pathlib import Path

result = {}
for row in Path("test.txt").read_text().splitlines():
    subject_type, subject, score = row.split(", ")
    if subject_type not in result:
        result[subject_type] = {'name': [], 'score': []}
    result[subject_type]['name'].append(subject)
    result[subject_type]['score'].append(int(score))
You can simplify it with a defaultdict, which creates the inner mapping when the key isn't already present:
from collections import defaultdict
from pathlib import Path

result = defaultdict(lambda: {'name': [], 'score': []})
for row in Path("test.txt").read_text().splitlines():
    subject_type, subject, score = row.split(", ")
    result[subject_type]['name'].append(subject)
    result[subject_type]['score'].append(int(score))
With pandas you can read the formatted data directly and output the structure you want:
import pandas as pd
df = pd.read_csv("test.txt", sep=", ", engine="python", names=['key', 'name', 'score'])
df = df.groupby('key').agg(list)
result = df.to_dict(orient='index')
From your data:
data={'Math':{'name':['Calculus', 'Vector'], 'score':[5,3]},
'Language':{'name':['English', 'Spanish'], 'score':[4,4]}}
If you want to append to the list inside your dictionary, you can do:
data['Math']['name'].append('Algebra')
data['Math']['score'].append(4)
If you want to add a new dictionary, you can do:
data['Science'] = {'name':['Chemistry', 'Biology'], 'score':[2,3]}
I am not sure if that is what you wanted but I hope it helps!
I'm trying to create an event manager in which a dictionary stores the events like this
my_dict = {'2020':
{'9': {'8': ['School ']},
'11': {'13': ['Doctors ']},
'8': {'31': ['Interview']}
},
'2021': {}}
The outer key is the year, the middle key is the month, and the innermost key is the day, which maps to a list of events.
I'm trying to sort it so that the months are in order, and then sort it again so that the days are in order. Thanks in advance.
Use-case
DevOrangeCrush wishes to sort on keys in a nested dictionary where the nesting occurs on multiple levels
Solution
Normalize the data so that the dates match ISO8601 format, for easier sorting
In plain English, this means make sure you always use two digits for month and date, and always use four digits for year
Re-normalize the original dictionary data structure into a single list of dictionaries, where each dictionary represents a row, and the list represents an outer containing table
this is known as an Array of Hashes in perl-speak
this is known as a list of objects in JSON-speak
Once your data is restructured you are solving a much more well-known, well-documented, and more obvious problem, how to sort a simple list of dictionaries (which is already documented in the See also section of this answer).
Example
import pprint

## original data is formatted as a nested dictionary, which is clumsy
my_dict = {'2020': {'9': {'8': ['School ']},
                    '11': {'13': ['Doctors ']},
                    '8': {'31': ['Interview']}},
           '2021': {}}

## we want the data formatted as a standard table (aka list of dictionary)
## this is the most common format for this kind of data as you would see in
## databases and spreadsheets
mydata_table = []
ddtemp = dict()
for year in my_dict:
    for month in my_dict[year].keys():
        ddtemp['month'] = '{0:02d}'.format(*[int(month)])
        ddtemp['year'] = year
        for day in my_dict[year][month].keys():
            ddtemp['day'] = '{0:02d}'.format(*[int(day)])
            mydata_row = dict()
            mydata_row['year'] = '{year}'.format(**ddtemp)
            mydata_row['month'] = '{month}'.format(**ddtemp)
            mydata_row['day'] = '{day}'.format(**ddtemp)
            mydata_row['task_list'] = my_dict[year][month][day]
            mydata_row['date'] = '{year}-{month}-{day}'.format(**ddtemp)
            mydata_table.append(mydata_row)
            pass
        pass
    pass

## output result is now easily sorted and there is no data loss
## you will have to modify this if you want to deal with years that
## do not have any associated task_list data
pprint.pprint(mydata_table)
'''
## now we have something that can be sorted using well-known python idioms
## and easily manipulated using data-table semantics
## (search, sort, filter-by, group-by, select, project ... etc)
[
{'date': '2020-09-08','day': '08',
'month': '09','task_list': ['School '],'year': '2020'},
{'date': '2020-11-13','day': '13',
'month': '11','task_list': ['Doctors '],'year': '2020'},
{'date': '2020-08-31','day': '31',
'month': '08','task_list': ['Interview'],'year': '2020'},
]
'''
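With the rows in a flat table, the sort itself becomes a one-liner; a minimal sketch (itemgetter is one option here, a lambda would work just as well):

from operator import itemgetter

## the zero-padded ISO8601 'date' strings sort correctly as plain text
sorted_table = sorted(mydata_table, key=itemgetter('date'))
pprint.pprint(sorted_table)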
See also
How to sort a python list-of-dictionary
How to sort objects by multiple keys
Why you should use ISO8601 date format
ISO8601 vs timestamp
To get sorted events data, you can do something like this:
from collections import OrderedDict

def sort_events(my_dict):
    new_events_data = dict()
    for year, month_data in my_dict.items():
        new_month_data = dict()
        for month, day_data in month_data.items():
            sorted_day_data = sorted(day_data.items(), key=lambda kv: int(kv[0]))
            new_month_data[month] = OrderedDict(sorted_day_data)
        sorted_months_data = sorted(new_month_data.items(), key=lambda kv: int(kv[0]))
        new_events_data[year] = OrderedDict(sorted_months_data)
    return new_events_data
Output:
{'2020': OrderedDict([('8', OrderedDict([('31', ['Interview'])])),
('9', OrderedDict([('8', ['School '])])),
('11', OrderedDict([('13', ['Doctors '])]))]),
'2021': OrderedDict()}
A plain dict doesn't keep itself sorted. You could use an OrderedDict, but if you simply need to iterate over it in sorted order, do it like this:
for year in sorted(map(int, my_dict)):
    year_dict = my_dict[str(year)]
    for month in sorted(map(int, year_dict)):
        month_dict = year_dict[str(month)]
        for day in sorted(map(int, month_dict)):
            events = month_dict[str(day)]
            for event in events:
                print(year, month, day, event)
The conversion to int ensures correct numeric ordering; without it you'd get 1, 10, 11, .., 2, 20, 21.
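For instance:

sorted(['1', '2', '10', '11', '20'])            # ['1', '10', '11', '2', '20']  (string order)
sorted(map(int, ['1', '2', '10', '11', '20']))  # [1, 2, 10, 11, 20]            (numeric order)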
A dictionary in Python is not kept in sorted order; you might want to try the OrderedDict class from the collections module, which remembers insertion order.
Of course, you would have to sort and reinsert the elements whenever you insert a new element that should come before any of the existing ones.
If you care about order, maybe a different data structure works better, for example a list of lists.
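A minimal sketch of that idea, flattening the nested dict from the question into a sorted list of lists (the variable names are illustrative):

rows = []
for year, months in my_dict.items():
    for month, days in months.items():
        for day, events in days.items():
            rows.append([int(year), int(month), int(day), events])

rows.sort()   # lists compare element-wise, so this orders by year, then month, then day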
import csv

def partytoyear():
    party_in_power = {}
    with open("presidents.txt") as f:
        reader = csv.reader(f)
        for row in reader:
            party = row[1]
            for year in row[2:]:
                party_in_power[year] = party
    print(party_in_power)
    return party_in_power

partytoyear()

def statistics():
    with open("BLS_private.csv") as f:
        statistics = {}
        reader = csv.DictReader(f)
        for row in reader:
            statistics = row
    print(statistics)
    return statistics

statistics()
These two functions return two dictionaries.
Here is a sample of the first dictionary:
'Democrat', '1981': 'Republican', '1982': 'Republican', '1983'
Sample of the second dictionary:
'2012', '110470', '110724', '110871', '110956', '111072', '111135', '111298', '111432', '111560', '111744'
The first dictionary associates a year and the political party. The next dictionary associates the year with job statistics.
I need to combine these two dictionaries, so I can have the party inside the dictionary with the job statistics.
I would like the dictionary to look like this:
'Democrat, '2012','110470', '110724', '110871', '110956', '111072', '111135', '111298', '111432', '111560', '111744'
How would I go about doing this? I've looked at the syntax for update() but that didn't work for my program
You can't have a dictionary in that manner in Python; it's syntactically wrong. But you can have each value be a collection such as a list. Here's a comprehension that does just that using dict lookups:
first_dict = {'1981': 'Republican', '1982': 'Republican', '1983': 'Republican', ...}   # year -> party
second_dict = {'2012': ['110470', '110724', '110871', '110956', '111072', '111135', '111298', '111432', '111560', '111744'], ...}   # year -> stats
result = {party: [year, *second_dict[year]] for year, party in first_dict.items()}
Pseudo result dict structure:
{'Party Name': [year, stats, ...], ...}
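A runnable sketch of that comprehension with toy data (the values are made up, and each year is assumed to exist in both dicts):

first_dict = {'2011': 'Republican', '2012': 'Democrat'}     # year -> party
second_dict = {'2011': ['108000', '108200'],                # year -> job statistics
               '2012': ['110470', '110724']}

result = {party: [year, *second_dict[year]] for year, party in first_dict.items()}
# {'Republican': ['2011', '108000', '108200'], 'Democrat': ['2012', '110470', '110724']}

Note that because the party is the key, two years that share a party will overwrite each other; collect the years into a list per party if you need all of them.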
Currently I have a question about Python pandas. I want to filter a DataFrame dynamically using a URL query string.
For example:
CSV:
url: http://example.com/filter?Name=Sam&Age=21&Gender=male
Hardcoded:
filtered_data = data[
(data['Name'] == 'Sam') &
(data['Age'] == 21) &
(data['Gender'] == 'male')
];
I don't want to hard-code the filter keys as above, because the CSV file can change at any time with different column headers.
Any suggestions?
The easiest way to create this filter dynamically is probably to use np.all.
For example:
import numpy as np
query = {'Name': 'Sam', 'Age': 21, 'Gender': 'male'}
filters = [data[k] == v for k, v in query.items()]
filter_data = data[np.all(filters, axis=0)]
Use df.query. For example:
df = pd.read_csv(url)
conditions = "Name == 'Sam' and Age == 21 and Gender == 'Male'"
filtered_data = df.query(conditions)
You can build the conditions string dynamically using string formatting, like:
conditions = " and ".join("{} == {}".format(col, val)
                          for col, val in zip(df.columns, values))
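One caveat: string values need quotes inside the query expression, while numbers do not. A minimal sketch, assuming the filters arrive as a plain dict (the query dict and toy DataFrame below are illustrative), lets !r add the quoting:

import pandas as pd

df = pd.DataFrame({'Name': ['Sam', 'Ben'], 'Age': [21, 30], 'Gender': ['male', 'male']})
query = {'Name': 'Sam', 'Age': 21, 'Gender': 'male'}
conditions = " and ".join("{} == {!r}".format(col, val) for col, val in query.items())
# "Name == 'Sam' and Age == 21 and Gender == 'male'"
filtered_data = df.query(conditions)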
Typically, your web framework will return the arguments in a dict-like structure. Let's say your args are like this:
args = {
'Name': ['Sam'],
'Age': ['21'], # Note that Age is a string
'Gender': ['male']
}
You can filter your dataset successively like this:
for key, values in args.items():
    data = data[data[key].isin(values)]
However, this is likely not to match any data for Age, which may have been loaded as an integer. In that case, you could load the CSV file as a string via pd.read_csv(filename, dtype=object), or convert to string before comparison:
for key, values in args.items():
    data = data[data[key].astype(str).isin(values)]
Incidentally, this will also match multiple values. For example, take the URL http://example.com/filter?Name=Sam&Name=Ben&Age=21&Gender=male -- which leads to the structure:
args = {
'Name': ['Sam', 'Ben'], # There are 2 names
'Age': ['21'],
'Gender': ['male']
}
In this case, both Ben and Sam will be matched, since we're using .isin to match.
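If you are not going through a web framework, the same list-valued args structure can be built from the raw URL with the standard library; a minimal sketch (Python 3):

from urllib.parse import urlparse, parse_qs

url = 'http://example.com/filter?Name=Sam&Name=Ben&Age=21&Gender=male'
args = parse_qs(urlparse(url).query)
# {'Name': ['Sam', 'Ben'], 'Age': ['21'], 'Gender': ['male']}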
I have a big list that I pulled in from a .csv:
CSV_PATH = 'myfile.csv'
CSV_OBJ = csv.DictReader(open(CSV_PATH, 'r'))
CSV_LIST = list(CSV_OBJ)
And I only want to keep some of the columns in it:
KEEP_COLS = ['Name', 'Year', 'Total Allocations', 'Enrollment']
It seems from Removing multiple keys from a dictionary safely like this ought to work:
BETTER = {k: v for k, v in CSV_LIST if k not in KEEP_COLS}
But I get an error: ValueError: too many values to unpack. What am I missing here? I could write a loop that runs through CSV_LIST and produces BETTER by keeping only what I want, but I suspect that using a comprehension is more pythonic.
As requested, a chunk of CSV_LIST
{'EIN': '77-0000091',
'FR': '28.4',
'Name': 'Org A',
'Enrollment': '506',
'Total Allocations': '$34214',
'geo_latitude': '37.9381775755',
'geo_longitude': '-122.3146910612',
'Year': '2009'},
{'EIN': '77-0000091',
'FR': '28.4',
'Name': 'Org A',
'Enrollment': '506',
'Total Allocations': '$34214',
'geo_latitude': '37.9381775755',
'geo_longitude': '-122.3146910612',
'Year': '2010'}
At the commandline I can do csvcut -c 'Name','Year','Total Allocations','Enrollment' myfile.csv > better_myfile.csv but that's definitely not pythonic.
Your dictionary comprehension is fine, but since you have a list of dictionaries, you have to create a list comprehension using that dictionary comprehension for the individual list items. Also, since you want to keep those columns, I guess you should drop that not. Try this:
[{k: v for k, v in d.items() if k in KEEP_COLS} for d in CSV_LIST]
An alternative is to use
CSV_LIST = map(operator.itemgetter(*KEEP_COLS), CSV_OBJ)   # needs: import operator
This will give you tuples with the desired columns (on Python 3, wrap the map in list() to get a list).
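For example, a small sketch of what itemgetter does to one row (the row literal below is adapted from the sample chunk above):

import operator

getter = operator.itemgetter('Name', 'Year', 'Total Allocations', 'Enrollment')
row = {'EIN': '77-0000091', 'Name': 'Org A', 'Year': '2009',
       'Total Allocations': '$34214', 'Enrollment': '506'}
print(getter(row))   # ('Org A', '2009', '$34214', '506')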
The issue is that CSV_LIST is a list, not a single dict. #tobias explained how to unpack it correctly.
However, if you're worried about being Pythonic, why are you processing a DictReader into a list of dictionaries and then filtering out all but a few keys? Without knowing your use case I can't be sure, but it's likely that it would be cleaner and simpler to just use the DictReader row-by-row the way it was intended to be used:
with open(CSV_PATH, 'r') as f:
    for row in csv.DictReader(f):
        process(row['Name'], row['Year'], row['Total Allocations'], row['Enrollment'])