Simplify the use of multiple hashings in Python - python

I have a CSV file with about 700 rows and 3 columns, containing label, rgb and string information, e.g.:
str; rgb; label; color
bones; "['255','255','255']"; 2; (241,214,145)
Aorta; "['255','0','0']"; 17; (216,101,79)
VenaCava; "['0','0','255']"; 16; (0,151,206)
I'd like to create a simple method to convert one unique input to one unique output.
One solution would be to hash all ROIDisplayColor entries with corresponding label entries as dictionary e.g. rgb2label:
import csv

with open(r"c:\my_file.csv") as csv_file:
    rgb2label, label2rgb = {}, {}  # rgb2str, label2str, str2label...
    for row in csv.reader(csv_file):
        rgb2label[row[1]] = row[2]
        label2rgb[row[2]] = row[1]
This could simply be used as follows:
>>> rgb2label[ "['255','255','255']"]
'2'
>>> label2rgb['2']
"['255','255','255']"
The application is simple but requires a unique dictionary for every relation (rgb2label, rgb2str, str2rgb, str2label, etc.).
Does a more compact solution with the same ease of use exist?

Here you're limiting yourself to one-to-one dictionaries, so you end up with loads of them (one per ordered pair of distinct columns, i.e. 4 × 3 = 12 here).
You could instead use one-to-many dictionaries, so you'll have only 4:
rgb, label = {}, {}
for row in csv.reader(csv_file):
    rgb[row[1]] = row
    label[row[2]] = row
That you would use like this:
>>> rgb[ "['255','255','255']"][2]
'2'
>>> label['2'][1]
"['255','255','255']"
You could make this clearer by turning your row into a dict as well:
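rgb, label = {}, {}
for row in csv.reader(csv_file):
    # use distinct names so the rgb and label dicts are not overwritten
    name, rgb_value, label_value, color = row
    d = {"rgb": rgb_value, "label": label_value}
    rgb[rgb_value] = d
    label[label_value] = d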
That you would use like this:
>>> rgb[ "['255','255','255']"]["label"]
'2'
>>> label['2']["rgb"]
"['255','255','255']"

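The same idea can be pushed one step further so that a single helper builds one lookup per column. This is only a rough sketch under assumptions: build_indexes is a made-up helper name, the inline SAMPLE stands in for the real file, and reading it with csv.DictReader using delimiter=";" and skipinitialspace=True is my guess at how the "; "-separated sample would need to be parsed.

```python
import csv
import io

# Inline sample standing in for the real file (assumed layout).
SAMPLE = '''str; rgb; label; color
bones; "['255','255','255']"; 2; (241,214,145)
Aorta; "['255','0','0']"; 17; (216,101,79)
VenaCava; "['0','0','255']"; 16; (0,151,206)
'''


def build_indexes(rows, columns):
    """Return {column: {value_in_that_column: row_dict}} for each column.

    Hypothetical helper: every value is assumed to be unique in its column.
    """
    indexes = {column: {} for column in columns}
    for row in rows:
        for column in columns:
            indexes[column][row[column]] = row
    return indexes


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";", skipinitialspace=True)
lookup = build_indexes(reader, ["str", "rgb", "label", "color"])

# Any column can be used as the key, and any other column read from the row.
print(lookup["rgb"]["['255','255','255']"]["label"])
print(lookup["label"]["17"]["str"])
```

With this layout you still only hold one dictionary per column, and every row dict is shared between them rather than copied.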
Related

add multiple values to one key, but defaultdict only allows 2

In the CSV I'm reading from, there are multiple rows for each ID:
ID,timestamp,name,text
444,2022-03-01T11:05:00.000Z,Amrita Patel,Hello
444,2022-03-01T11:06:00.000Z,Amrita Patel,Nice to meet you
555,2022-03-01T12:05:00.000Z,Zach Do,Good afternoon
555,2022-03-01T11:06:00.000Z,Zach Do,I like oranges
555,2022-03-01T11:07:00.000Z,Zach Do,definitely
I need to extract each such that I will have one file per ID, with the timestamp, name, and text in that file. For example, for ID 444, it will have 2 timestamps and 2 different texts in it, along with the name.
I'm able to get the text designated to the proper ID, using this code:
from collections import defaultdict
d = {}
l = []
list_of_lists = []
for k in csv_file:
    l.append([k['ID'],k['text']])
    list_of_lists.append(l)
for key, val in list_of_lists[0]:
    d.setdefault(key, []).append(val)
The problem is that this isn't enough, I need to add in the other values to the one ID key. If I try:
l.append([k['ID'],[k['text'],k['name']]])
I get
ValueError: too many values to unpack
Just use a list for value instead,
{key: [value1, value2], ...}
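To make that concrete, here is a minimal sketch that groups the rows by ID and keeps timestamp, name and text together for each message. The inline SAMPLE and the name messages_by_id are my own assumptions for illustration; the real data would come from your file.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the question, standing in for the real file.
SAMPLE = """ID,timestamp,name,text
444,2022-03-01T11:05:00.000Z,Amrita Patel,Hello
444,2022-03-01T11:06:00.000Z,Amrita Patel,Nice to meet you
555,2022-03-01T12:05:00.000Z,Zach Do,Good afternoon
"""

# Each ID maps to a list of (timestamp, name, text) tuples.
messages_by_id = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    messages_by_id[row["ID"]].append((row["timestamp"], row["name"], row["text"]))

for message_id, messages in messages_by_id.items():
    print(message_id, len(messages))
```

From there, writing one file per ID is a matter of looping over messages_by_id.items() and writing each list out.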

assign var in for loop

I have created the for loop to "extract" column from list of lists, now I want to assign this list to a variable.
How to do it?
I have the following for loop:
j = 1
for i in range(len(table)):
    row = table[i]
    print(row[j])
and the table looks like:
NAME
Bart First
Maria Great
Theresa Green
I would like to do some other "operations" on that column and I guess would be easy to use functions if that column is assign to variable...but I have no idea how to do it. (I must not use numpy or pandas for this).
Solution with minimal change to the original: create a list before the loop and append to it, i.e.:
j = 1
list1 = []
for i in range(len(table)):
    row = table[i]
    list1.append(row[j])
print(list1)
Note that you can use for to access each element directly rather than going through an index, i.e. the loop might be replaced with
for row in table:
    list1.append(row[j])
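If the column is needed in several places, the loop can also be wrapped in a small function. This is just a sketch: get_column is a hypothetical name, and the two-column table below is made-up sample data since the question only shows the NAME column.

```python
def get_column(table, index):
    """Return the values at position `index` from every row of `table`."""
    return [row[index] for row in table]


# Made-up table: an id in column 0 and the name in column 1.
table = [[1, "Bart First"], [2, "Maria Great"], [3, "Theresa Green"]]

names = get_column(table, 1)
print(names)

# The column is now an ordinary list, so built-in functions work on it.
print(sorted(names, key=len))
```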
Wrt "I guess would be easy to use functions if that column is assign to variable": you could use a namedtuple, which is part of Python's standard library (the collections module), to give these things "names". Essentially, you would convert each row into a tuple (instead of a list), and each part of that tuple would be accessible by name as well as by index.
For that you'd first need to assign each row of table to a namedtuple. Without more details in your post about what table contains, apart from NAME, I'll make assumptions about your data:
import collections
# example `table` data
table = [[1, 'Bart First', 30, 'UK'],
         [2, 'Maria Great', 25, 'US'],
         [3, 'Theresa Green', 20, 'PL']]
# the namedtuple "structure" which will hold each record:
Person = collections.namedtuple('Person', 'id name age country')
people = []  # list of records
for row in table:
    person = Person(*row)  # convert it to a namedtuple
    people.append(person)
# or the four lines above in one line:
people = [Person(*row) for row in table]
# or assign it back to `table`:
# table = [Person(*row) for row in table]
people would now look like:
[Person(id=1, name='Bart First', age=30, country='UK'),
Person(id=2, name='Maria Great', age=25, country='US'),
Person(id=3, name='Theresa Green', age=20, country='PL')]
Next, to get just the names:
all_names = []
for person in people:
    all_names.append(person.name)
print(all_names)
# output:
# ['Bart First', 'Maria Great', 'Theresa Green']
# or as a list comprehension:
all_names = [person.name for person in people]
Since you mentioned you can't use pandas or numpy, that would prevent you from doing certain things like sum(people.age) or people.age.sum() but you could instead do
>>> sum(person.age for person in people)
75
And if you still need to use the index, then you can get the country data (4th column, index=3):
>>> col = 3
>>> for person in people:
... print(person[col])
...
UK
US
PL

Trying to Access keys in Dict from their values

I'm importing a CSV into a dictionary, where the columns are a number of labelled houses (i.e. 1A, 1B, ...).
Each row is labelled with an item such as 'coffee'. The table holds data indicating how much of each item each household needs.
Excel screenshot
What I am trying to do is check the values of the key-value pairs in the dictionary for anything that isn't blank (containing either 1 or 2), and then take the key-value pair and the 'PRODUCT NUMBER' (from the CSV) and append those to a new list.
I want to create a shopping list that will contain which item I need, in what quantity, for which household.
The column containing 'Week' is not important for this.
I import the CSV into python as a dictionary like this:
import csv
import pprint
from typing import List, Dict
input_file_1 = csv.DictReader(open("DATA CWK SHOPPING DATA WEEK 1 FILE B.xlsb.csv"))
table: List[Dict[str, str]] = []  # list of rows
for row in input_file_1:
    string_row: Dict[str, str] = {}  # one dictionary per row; DictReader values are strings
    for column in row:
        string_row[column] = row[column]
    table.append(string_row)
I found on 'geeksforgeeks' how to access a pair by its value. However, when I try this on my dictionary, it only seems to be able to search the last row.
# creating a new dictionary
my_dict ={"java":100, "python":112, "c":11}
# list out keys and values separately
key_list = list(my_dict.keys())
val_list = list(my_dict.values())
# print key with val 100
position = val_list.index(100)
print(key_list[position])
I also tried to do a for in range loop, but that didn't seem to work either:
for row in table:
    if row["PRODUCT NUMBER"] == '1' and row["Week"] == '1':
        for i in range(8):
            if string_row.values() != ' ':
                print(row[i])
Please, if I am unclear anywhere, please let me know and I will clear it up!!
Here is a loop I made that should help. Note that it assumes table is a single dictionary (one row) and collects the entries whose value is blank:
values = list(table.values())
keys = list(table.keys())
new_table = {}
index = -1
for i in range(values.count("")):
    index = values.index("", index + 1)
    new_table[keys[index]] = values[index]
If you want to remove those blank values from the original dict, so that only the filled-in entries remain, you can just add
table.pop(keys[index]) into the loop.
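Since table in the question is a list of row dictionaries rather than one dict, the shopping list could also be built by looping over every row directly. The following is a rough sketch, not a tested solution for your file: build_shopping_list is a hypothetical helper, the sample rows are invented, and it assumes every column other than 'PRODUCT NUMBER' and 'Week' is a house whose cells are blank or a whole number. Any other non-house columns (an item name column, for example) would need adding to ignored.

```python
def build_shopping_list(rows, product_column="PRODUCT NUMBER", ignored=("Week",)):
    """Collect (house, product, quantity) for every non-blank house cell."""
    shopping_list = []
    for row in rows:
        product = row[product_column]
        for column, value in row.items():
            # Skip the columns that are not houses.
            if column == product_column or column in ignored:
                continue
            # Treat None, "" and whitespace-only cells as blank.
            if value is not None and value.strip():
                shopping_list.append((column, product, int(value)))
    return shopping_list


# Invented rows shaped like csv.DictReader output.
table = [
    {"PRODUCT NUMBER": "1", "Week": "1", "1A": "2", "1B": "", "2A": "1"},
    {"PRODUCT NUMBER": "2", "Week": "1", "1A": "", "1B": "1", "2A": " "},
]

shopping = build_shopping_list(table)
for house, product, quantity in shopping:
    print(house, product, quantity)
```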

Extract value from key-value pair of dictionary

I have a CSV file with column name (in first row) and values (rest of the row). I wanted to create variables to store these values for every row in a loop. So I started off by creating a dictionary with the CSV file and I got a list of the records with a key-value pair. So now I wanted to create variables to store the "value" extracted from the "key" of each item and within a loop for every record. I am not sure if I am setting this correctly.
Here is the dictionary I have.
my_dict = [{'value id':'value1', 'name':'name1','info':'info1'},
           {'value id':'value2', 'name':'name2','info':'info2'},
           {'value id':'value3', 'name':'name3','info':'info3'},
          ]
for i in len(my_dict):
    item[value id] = value1
    item[name] = name1
    item[info] = info1
The value id and name will be unique and are the identifiers in the list. Ultimately, I want to create an item object, i.e. item[info] = info1, so that I can add other code to modify item[info].
Try this:
my_dict = [{'value':'value1', 'name':'name1','info':'info1'},
           {'value':'value2', 'name':'name2','info':'info2'},
           {'value':'value3', 'name':'name3','info':'info3'}]
for obj in my_dict:
    value = obj['value']
    name = obj['name']
    info = obj['info']
To expand on @aws_apprentice's point, you can capture the data by creating some additional variables:
my_dict = [{'value':'value1', 'name':'name1','info':'info1'},
           {'value':'value2', 'name':'name2','info':'info2'},
           {'value':'value3', 'name':'name3','info':'info3'}]
values = []
names = []
info = []
for obj in my_dict:
    values.append(obj['value'])
    names.append(obj['name'])
    info.append(obj['info'])
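If the set of keys is not known in advance, the per-key lists can be collected in one dictionary instead of separate variables. A small sketch, assuming the records keep the original 'value id' key from the question (keys containing spaces are fine as dictionary keys); the names records and columns are mine:

```python
from collections import defaultdict

records = [
    {"value id": "value1", "name": "name1", "info": "info1"},
    {"value id": "value2", "name": "name2", "info": "info2"},
    {"value id": "value3", "name": "name3", "info": "info3"},
]

# One list per key, filled in record order.
columns = defaultdict(list)
for record in records:
    for key, value in record.items():
        columns[key].append(value)

print(columns["value id"])
print(columns["info"])
```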

create a filtered list of dictionaries based on existing list of dictionaries

I have a list of dictionaries read in from csv DictReader that represent rows of a csv file:
rows = [{"id":"123","date":"1/1/18","foo":"bar"},
{"id":"123","date":"2/2/18", "foo":"baz"}]
I would like to create a new dictionary, where only unique ID's are stored. But I would like to only keep the row entry with the most recent date. Based on the above example, it would keep the row with date 2/2/18.
I was thinking of doing something like this, but having trouble translating the pseudocode in the else statement into actual python.
I can figure out the part of checking the two dates for which is more recent, but having the most trouble figuring out how I check the new list for the dictionary that contains the same id and then retrieving the date from that row.
Note: Unfortunately, due to resource constraints on our platform I am unable to use pandas for this project.
new_data = []
for row in rows:
    if row['id'] not in new_data:
        new_data.append(row)
    else:
        check the element in new_data with the same id as row['id']
        if that element's date value is less recent:
            replace it with the current row
        else:
            continue to next row in rows
You'll need a function to convert your date (as string) to a date (as date).
import datetime
def to_date(date_str):
    d1, m1, y1 = [int(s) for s in date_str.split('/')]
    return datetime.date(y1, m1, d1)
I assumed your date format is d/m/yy. Consider using datetime.strptime to parse your dates, as illustrated by Alex Hall's answer.
Then, the idea is to loop over your rows and store them in a new structure (here, a dict whose keys are the IDs). If a key already exists, compare its date with the current row, and take the right one. Following your pseudo-code, this leads to:
rows = [{"id":"123","date":"1/1/18","foo":"bar"},
        {"id":"123","date":"2/2/18", "foo":"baz"}]
new_data = dict()
for row in rows:
    existing = new_data.get(row['id'], None)
    if existing is None or to_date(existing['date']) < to_date(row['date']):
        new_data[row['id']] = row
If you want your new_data variable to be a list, use new_data = list(new_data.values()).
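Putting the pieces together with datetime.strptime, a self-contained version might look like the sketch below. latest_per_id is a hypothetical helper name, the extra "456" rows are invented test data, and the "%d/%m/%y" format is the same day/month/two-digit-year assumption as above, so adjust it if your dates are month-first.

```python
from datetime import datetime


def latest_per_id(rows, date_format="%d/%m/%y"):
    """Return one row per id, keeping the row with the most recent date."""
    latest = {}  # id -> (parsed_date, row)
    for row in rows:
        row_date = datetime.strptime(row["date"], date_format).date()
        kept = latest.get(row["id"])
        if kept is None or row_date > kept[0]:
            latest[row["id"]] = (row_date, row)
    return [row for _, row in latest.values()]


rows = [
    {"id": "123", "date": "1/1/18", "foo": "bar"},
    {"id": "123", "date": "2/2/18", "foo": "baz"},
    {"id": "456", "date": "3/3/18", "foo": "bar"},
    {"id": "456", "date": "1/1/18", "foo": "qux"},
]

for row in latest_per_id(rows):
    print(row)
```

Storing the parsed date next to the row means each date string is parsed only once, and the original rows are left unmodified.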
import datetime
rows = [{"id":"123","date":"1/1/18","foo":"bar"},
        {"id":"123","date":"2/2/18", "foo":"baz"}]
def parse_date(d):
    return datetime.datetime.strptime(d, "%d/%m/%y").date()
tmp_dict = {}
for row in rows:
    if row['id'] not in tmp_dict:
        tmp_dict[row['id']] = row
    else:
        # compare against the date of the row already stored for this id
        if parse_date(row['date']) > parse_date(tmp_dict[row['id']]['date']):
            tmp_dict[row['id']] = row
print(list(tmp_dict.values()))
output
[{'id': '123', 'date': '2/2/18', 'foo': 'baz'}]
Note: you can merge the two if statements into if row['id'] not in tmp_dict or parse_date(row['date']) > parse_date(tmp_dict[row['id']]['date']) for cleaner and shorter code
Firstly, work with proper date objects, not strings. Here is how to parse them:
from datetime import datetime, date
rows = [{"id": "123", "date": "1/1/18", "foo": "bar"},
        {"id": "123", "date": "2/2/18", "foo": "baz"}]
for row in rows:
    row['date'] = datetime.strptime(row['date'], '%d/%m/%y').date()
(check if the format is correct)
Then for the actual task:
new_data = {}
for row in rows:
    # keep whichever row has the later date for this id
    new_data[row['id']] = max(new_data.get(row['id'], row), row,
                              key=lambda r: r['date'])
print(list(new_data.values()))
Alternatively:
Here are some generic utility functions that work well here which I use in many places:
from collections import defaultdict
def group_by_key_func(iterable, key_func):
    """
    Create a dictionary from an iterable such that the keys are the result of evaluating a key function on elements
    of the iterable and the values are lists of elements all of which correspond to the key.
    """
    result = defaultdict(list)
    for item in iterable:
        result[key_func(item)].append(item)
    return result
def group_by_key(iterable, key):
    return group_by_key_func(iterable, lambda x: x[key])
Then the solution can be written as:
by_id = group_by_key(rows, 'id')
for id_num, group in list(by_id.items()):
    by_id[id_num] = max(group, key=lambda r: r['date'])
print(by_id.values())
This is less efficient than the first solution because it creates lists along the way that are discarded, but I use the general principles in many places and I thought of it first, so here it is.
If you like to utilize classes as much as I do, then you could make your own class to do this:
from datetime import date
rows = [
    {"id":"123","date":"1/1/18","foo":"bar"},
    {"id":"123","date":"2/2/18", "foo":"baz"},
    {"id":"456","date":"3/3/18","foo":"bar"},
    {"id":"456","date":"1/1/18","foo":"bar"}
]
class unique(dict):
    def __setitem__(self, key, value):
        # Add key if missing or replace key if date is newer
        if key not in self or self[key]["date"] < value["date"]:
            dict.__setitem__(self, key, value)
data = unique()  # Initialize new class based on dict
for row in rows:
    d, m, y = map(int, row["date"].split('/'))  # Split date into parts
    row["date"] = date(y, m, d)  # Replace date value (the two-digit year is used as-is)
    data[row["id"]] = row  # Set new data. Will overwrite same ids with more recent
print(list(data.values()))
Outputs:
[
{'id': '123', 'date': datetime.date(18, 2, 2), 'foo': 'baz'},
{'id': '456', 'date': datetime.date(18, 3, 3), 'foo': 'bar'}
]
Keep in mind that data is an instance of a dict subclass that overrides the __setitem__ method and uses IDs as keys. The dates are date objects, so they can be compared easily.
