I'm reading data from a file into a series of lists as follows:
sourceData = [[source, topic, score],[source, topic, score],[source, topic, score]...]
wherein the sources and topics in each list may be the same or different.
What I am trying to achieve is a dictionary which groups the topics associated with each source, and their associated scores (the scores will then be averaged, but for the purpose of this question let's just list them as values of the topic (key)).
The results would ideally look like a list of nested dicts as follows:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I think the best way to do this would be to create a Counter of the sources, then a dict of topics for each source, and save each dict as the value for its corresponding source. However, I am having trouble iterating properly to get the desired result.
Here's what I have so far:
sourceDict = {}
sourceDictList = []
sourceList = []
for row in sourceData:
    source = row[0]
    topic = row[1]
    score = row[2]
    sourceDict = [source, {topic: score}]
    sourceDictList.append(sourceDict)
    sourceList.append(source)
wherein sourceDictList results in the following: [[source, {topic: score}]...] (essentially reformatting the data from the original list of lists), and sourceList is just a list of all the sources (some repeating).
Then I initialize a counter and match the source from the counter with the source from sourceDictList and, if they match, save the topic:score dict as the value:
from collections import Counter
sourceCounter = Counter(sourceList)
for key, val in sourceCounter.items():
    for dictitem in sourceDictList:
        if dictitem[0] == key:
            sourceCounter[key] = dictitem[1]
But the output is only saving the last topic:score dict to each source. So instead of the desired:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I am only getting:
Counter({SOURCE1: {TOPIC_n: 'SCORE_n'}, SOURCE2: {TOPIC_n: 'SCORE_n'}, SOURCE3: {TOPIC_n: 'SCORE_n'}})
I am under the impression that if there is a unique key saved to a dict, it will append that key:value pair without overwriting previous ones. Am I missing something?
Appreciate any help on this.
We can simply do:
sourceData = [
    ['source1', 'topic1', 'score1'],
    ['source1', 'topic2', 'score1'],
    ['source1', 'topic1', 'score2'],
    ['source2', 'topic1', 'score1'],
    ['source2', 'topic2', 'score2'],
    ['source2', 'topic1', 'score3'],
]
sourceDict = {}
for row in sourceData:
    source = row[0]
    topic = row[1]
    score = row[2]
    if source not in sourceDict:
        # This will be executed when the source
        # comes for the first time.
        sourceDict[source] = {}
    if topic not in sourceDict[source]:
        # This will be executed when the topic
        # inside that source comes for the first time.
        sourceDict[source][topic] = []
    sourceDict[source][topic].append(score)
print(sourceDict)
You can simply use defaultdict from the collections module:
from collections import defaultdict
sourceData = [['source', 'topic', 2], ['source', 'topic', 3], ['source', 'topic2', 3], ['source2', 'topic', 4]]
sourceDict = defaultdict(dict)
for source, topic, score in sourceData:
    topicScoreDict = sourceDict[source]
    topicScoreDict[topic] = topicScoreDict.get(topic, []) + [score]
>>> print(sourceDict)
defaultdict(<class 'dict'>, {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}})
>>> print(dict(sourceDict))
{'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}}
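Since the question mentions that the scores will eventually be averaged, the grouping and the averaging can be done together with a nested defaultdict. This is only a minimal sketch, assuming the scores are numeric; the names `grouped` and `averages` are illustrative and not from the original post.

```python
from collections import defaultdict
from statistics import mean

source_data = [
    ['source1', 'topic1', 2],
    ['source1', 'topic2', 3],
    ['source1', 'topic1', 4],
    ['source2', 'topic1', 5],
]

# Outer default: a new inner defaultdict per source.
# Inner default: an empty list per topic.
grouped = defaultdict(lambda: defaultdict(list))
for source, topic, score in source_data:
    grouped[source][topic].append(score)

# Collapse each score list to its mean and convert to plain dicts.
averages = {
    source: {topic: mean(scores) for topic, scores in topics.items()}
    for source, topics in grouped.items()
}
print(averages)
```

Because both levels create their entries on first access, no "if key not in dict" checks are needed inside the loop.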
Related
I have a dictionary (table) defined like this:
table = {"id": [1, 2, 3]}, {"file": ['good1.txt', 'bad2.txt', 'good3.txt']}
and I have a list of bad candidates that should be removed:
to_exclude = ['bad0.txt', 'bad1.txt', 'bad2.txt']
I hope to filter the table based on if the file in a row of my table can be found inside to_exclude.
filtered = {"id": [1, 3]}, {"file": ['good1.txt', 'good3.txt']}
I guess I could use a for loop to check the entries one by one, but I was wondering what's the most python-efficient manner to solve this problem.
Could someone provide some guidance on this? Thanks.
I'm assuming you miswrote your data structure. As written, it is a tuple of two separate dictionaries rather than a single table. I'm hoping your actual data is:
data = {"id": [1, 2, 3], "file": [.......]}
a dictionary with two keys.
So for me, the simplest would be:
# Create a set for faster testing
to_exclude_set = set(to_exclude)
# Create (id, file) pairs for the pairs we want to keep
pairs = [(id_, file) for id_, file in zip(data["id"], data["file"])
         if file not in to_exclude_set]
# Recreate the data structure
result = {'id': [id_ for id_, _ in pairs],
          'file': [file for _, file in pairs]}
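If the table has more than two columns, the same idea can be expressed as a boolean mask applied to every column. The following is a sketch under the assumption that all columns have the same length; `mask` and `filtered` are illustrative names.

```python
from itertools import compress

data = {"id": [1, 2, 3], "file": ["good1.txt", "bad2.txt", "good3.txt"]}
to_exclude = ["bad0.txt", "bad1.txt", "bad2.txt"]

to_exclude_set = set(to_exclude)
# True for rows to keep, False for rows whose file is excluded.
mask = [name not in to_exclude_set for name in data["file"]]
# Apply the same mask to every column so the rows stay aligned.
filtered = {column: list(compress(values, mask)) for column, values in data.items()}
print(filtered)
```

The mask is computed once from the "file" column, so adding further columns to `data` requires no other changes.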
For example, for the txt file of
Math, Calculus, 5
Math, Vector, 3
Language, English, 4
Language, Spanish, 4
into the dictionary of:
data={'Math':{'name':['Calculus', 'Vector'], 'score':[5,3]}, 'Language':{'name':['English', 'Spanish'], 'score':[4,4]}}
I am having trouble appending values to build the lists inside the inner dict. I'm very new to this and I don't yet understand import statements. Thank you so much for all your help!
For each line, find the 3 values, then add them to a dict structure
from pathlib import Path
result = {}
for row in Path("test.txt").read_text().splitlines():
    subject_type, subject, score = row.split(", ")
    if subject_type not in result:
        result[subject_type] = {'name': [], 'score': []}
    result[subject_type]['name'].append(subject)
    result[subject_type]['score'].append(int(score))
You can simplify it with a defaultdict, which creates the mapping if the key isn't already present:
from collections import defaultdict
from pathlib import Path
result = defaultdict(lambda: {'name': [], 'score': []})
for row in Path("test.txt").read_text().splitlines():
    subject_type, subject, score = row.split(", ")
    result[subject_type]['name'].append(subject)
    result[subject_type]['score'].append(int(score))
With pandas.DataFrame you can directly read the formatted data and output the format you want:
import pandas as pd
df = pd.read_csv("test.txt", sep=", ", engine="python", names=['key', 'name', 'score'])
df = df.groupby('key').agg(list)
result = df.to_dict(orient='index')
From your data:
data={'Math':{'name':['Calculus', 'Vector'], 'score':[5,3]},
'Language':{'name':['English', 'Spanish'], 'score':[4,4]}}
If you want to append to the list inside your dictionary, you can do:
data['Math']['name'].append('Algebra')
data['Math']['score'].append(4)
If you want to add a new dictionary, you can do:
data['Science'] = {'name':['Chemistry', 'Biology'], 'score':[2,3]}
I am not sure if that is what you wanted but I hope it helps!
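For a version that needs no imports at all, dict.setdefault can create the inner dictionary the first time a subject type is seen. This is a minimal sketch: the `lines` list is a hypothetical stand-in for the lines read from the text file.

```python
# Hypothetical stand-in for the lines of the text file.
lines = [
    "Math, Calculus, 5",
    "Math, Vector, 3",
    "Language, English, 4",
    "Language, Spanish, 4",
]

data = {}
for line in lines:
    # Split on commas and trim the surrounding spaces from each part.
    subject_type, name, score = [part.strip() for part in line.split(",")]
    # setdefault returns the existing entry, or stores and returns the new one.
    entry = data.setdefault(subject_type, {"name": [], "score": []})
    entry["name"].append(name)
    entry["score"].append(int(score))
print(data)
```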
I'm new to this forum, kindly excuse if the question format is not very good.
I'm trying to fetch rows from a MySQL database table and print them after processing the columns (one of the columns contains JSON which needs to be expanded). Below are the source and the expected output. It would be great if someone could suggest an easier way to manage this data.
Note: I have achieved this with lots of looping and parsing, but the challenges are:
1) There is no connection between col_names and data, so when I print the data I don't know the order of the data in the result set, and the column titles I print don't match the data. Is there any way to keep these in sync?
2) I would like to have the flexibility of changing the order of the columns without much rework.
What is the best possible way to achieve this? I have not explored the pandas library as I was not sure if it is really necessary.
Using python 3.6
Sample Data in the table
id, student_name, personal_details, university
1, Sam, {"age":"25","DOL":"2015","Address":{"country":"Poland","city":"Warsaw"},"DegreeStatus":"Granted"},UAW
2, Michael, {"age":"24","DOL":"2016","Address":{"country":"Poland","city":"Toruń"},"DegreeStatus":"Granted"},NCU
I'm querying the database using MySQLdb.connect object, steps below
query = "select * from student_details"
cur.execute(query)
res = cur.fetchall() # get a collection of tuples
db_fields = [z[0] for z in cur.description] # generate list of col_names
Data in variables:
>>>db_fields
['id', 'student_name', 'personal_details', 'university']
>>>res
((1, 'Sam', '{"age":"25","DOL":"2015","Address":{"country":"Poland","city":"Warsaw"},"DegreeStatus":"Granted"}','UAW'),
(2, 'Michael', '{"age":"24","DOL":"2016","Address":{"country":"Poland","city":"Toruń"},"DegreeStatus":"Granted"}','NCU'))
Desired Output:
id, student_name, age, DOL, country, city, DegreeStatus, University
1, 'Sam', 25, 2015, 'Poland', 'Warsaw', 'Granted', 'UAW'
2, 'Michael', 24, 2016, 'Poland', 'Toruń', 'Granted', 'NCU'
A not-too-pythonic but easy-to-understand way (and maybe you can write a more pythonic solution) might be:
import json
def unwrap_dict(_input):
    res = dict()
    for k, v in _input.items():
        # Assuming you know there's only one nested level
        if isinstance(v, dict):
            for _k, _v in v.items():
                res[_k] = _v
            continue
        res[k] = v
    return res
all_data = list()
for row in res:  # res is the collection of tuples from cur.fetchall()
    record = dict()
    for field, data in zip(db_fields, row):
        # Assuming you know personal_details is the only JSON column
        if field == 'personal_details':
            data = json.loads(data)
        if isinstance(data, dict):
            extra = unwrap_dict(data)
            record.update(extra)
            continue
        record[field] = data
    all_data.append(record)
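To address the second point in the question (changing the column order without rework), one option is to keep the desired order in a single list and let csv.DictWriter do the alignment. This is a sketch under assumptions: the sample rows are copied from the question, while `output_columns` and `flatten` are illustrative names, and the JSON is assumed to always contain an "Address" object.

```python
import csv
import io
import json

db_fields = ['id', 'student_name', 'personal_details', 'university']
rows = (
    (1, 'Sam', '{"age":"25","DOL":"2015","Address":{"country":"Poland","city":"Warsaw"},"DegreeStatus":"Granted"}', 'UAW'),
    (2, 'Michael', '{"age":"24","DOL":"2016","Address":{"country":"Poland","city":"Toruń"},"DegreeStatus":"Granted"}', 'NCU'),
)

# Change this list to reorder or drop output columns.
output_columns = ['id', 'student_name', 'age', 'DOL', 'country', 'city',
                  'DegreeStatus', 'university']

def flatten(row):
    """Pair each value with its column name and expand the JSON column one level."""
    record = dict(zip(db_fields, row))
    details = json.loads(record.pop('personal_details'))
    record.update(details.pop('Address'))  # country, city
    record.update(details)                 # age, DOL, DegreeStatus
    return record

buffer = io.StringIO()
# extrasaction='ignore' lets output_columns omit keys without raising an error.
writer = csv.DictWriter(buffer, fieldnames=output_columns, extrasaction='ignore')
writer.writeheader()
for row in rows:
    writer.writerow(flatten(row))
print(buffer.getvalue())
```

Because every value is looked up by name, the header and the data cannot drift out of sync, whatever order the query returns the columns in.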
import csv
def partytoyear():
    party_in_power = {}
    with open("presidents.txt") as f:
        reader = csv.reader(f)
        for row in reader:
            party = row[1]
            for year in row[2:]:
                party_in_power[year] = party
    print(party_in_power)
    return party_in_power
partytoyear()
def statistics():
    with open("BLS_private.csv") as f:
        statistics = {}
        reader = csv.DictReader(f)
        for row in reader:
            statistics = row
    print(statistics)
    return statistics
statistics()
These two functions return two dictionaries.
Here is a sample of the first dictionary:
'Democrat', '1981': 'Republican', '1982': 'Republican', '1983'
Sample of the second dictionary:
'2012', '110470', '110724', '110871', '110956', '111072', '111135', '111298', '111432', '111560', '111744'
The first dictionary associates a year and the political party. The next dictionary associates the year with job statistics.
I need to combine these two dictionaries, so I can have the party inside the dictionary with the job statistics.
I would like the dictionary to look like this:
'Democrat', '2012', '110470', '110724', '110871', '110956', '111072', '111135', '111298', '111432', '111560', '111744'
How would I go about doing this? I've looked at the syntax for update() but that didn't work for my program
You can't have a dictionary in that form in Python; it's syntactically invalid. You can, however, have each value be a collection such as a list. Here's a comprehension that does just that using dict lookups:
first_dict = {'1981': 'Republican', '1982': 'Republican', ...}
second_dict = {'2012': ['110470', '110724', '110871', '110956', '111072', '111135', '111298', '111432', '111560', '111744'], ...}
result = {party: [year, *second_dict[year]] for year, party in first_dict.items()}
Pseudo result dict structure:
{'Party Name': [year, stats, ...], ...}
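Since several years can share the same party, an alternative is to key the combined dictionary by year and store the party next to the statistics. This is only a sketch: the sample values below are hypothetical stand-ins in the shape described in the question, and `combined` is an illustrative name.

```python
# Hypothetical sample values in the shape described in the question.
party_in_power = {'2011': 'Democrat', '2012': 'Democrat'}
job_stats = {'2012': ['110470', '110724', '110871']}

# Key by year; skip years that are missing from either dictionary.
combined = {
    year: {'party': party_in_power[year], 'stats': stats}
    for year, stats in job_stats.items()
    if year in party_in_power
}
print(combined)
```

Keying by year avoids one year overwriting another when both belong to the same party.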
I'm trying to create a dictionary of dictionaries like this:
food = {"Broccoli": {"Taste": "Bad", "Smell": "Bad"},
"Strawberry": {"Taste": "Good", "Smell": "Good"}}
But I am populating it from an SQL table. So I've pulled the SQL table into an SQL object called "result". And then I got the column names like this:
nutCol = [i[0] for i in result.description]
The table has about 40 characteristics, so it is quite long.
I can do this...
foodList = {}
for id, food in enumerate(result):
    addMe = {str(food[1]): {nutCol[id + 2]: food[2], nutCol[id + 3]:
                            food[3] ...}}
    foodList.update(addMe)
But this of course would look horrible and take a while to write. And I'm still working out how I want to build this whole thing so it's possible I'll need to change it a few times...which could get extremely tedious.
Is there a DRY way of doing this?
To make the solution position-independent, you can use dict1.update(dict2), which simply merges dict2 into dict1.
In our case, since we have a dict of dicts, we can use dict['key'] as dict1 and add any additional key:value pairs as dict2.
Here is an example.
food = {"Broccoli": {"Taste": "Bad", "Smell": "Bad"},
"Strawberry": {"Taste": "Good", "Smell": "Good"}}
addthis = {'foo':'bar'}
Suppose you want to add the addthis dict to food["Strawberry"]; we can simply use:
food["Strawberry"].update(addthis)
Getting result:
>>> food
{'Strawberry': {'Taste': 'Good', 'foo': 'bar', 'Smell': 'Good'},'Broccoli': {'Taste': 'Bad', 'Smell': 'Bad'}}
>>>
Assuming that column 0 is what you wish to use as your key, and you do wish to build a dictionary of dictionaries, then it's:
detail_names = [col[0] for col in result.description[1:]]
foodList = {row[0]: dict(zip(detail_names, row[1:]))
            for row in result}
Generalising, if column k is your identity then it's:
foodList = {row[k]: {col[0]: row[i]
                     for i, col in enumerate(result.description) if i != k}
            for row in result}
(Here each sub dictionary is all columns other than column k)
addMe = {str(food[1]):dict(zip(nutCol[2:],food[2:]))}
zip will take two (or more) lists of items and pair the elements, then you can pass the result to dict to turn the pairs into a dictionary.
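The dict(zip(...)) approach can be tried end to end against an in-memory SQLite table. This is a sketch under assumptions: the table layout and column names are made up for illustration, and sqlite3 stands in for whatever database driver the question actually uses.

```python
import sqlite3

# In-memory stand-in for the real table; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (id INTEGER, name TEXT, taste TEXT, smell TEXT)")
conn.executemany(
    "INSERT INTO food VALUES (?, ?, ?, ?)",
    [(1, "Broccoli", "Bad", "Bad"), (2, "Strawberry", "Good", "Good")],
)

result = conn.execute("SELECT * FROM food")
# The first item of each description entry is the column name.
nutCol = [col[0] for col in result.description]

# Column 1 (name) is the key; columns 2 onward become the inner dictionary.
foodList = {str(row[1]): dict(zip(nutCol[2:], row[2:])) for row in result}
conn.close()
print(foodList)
```

Adding a column to the table changes nothing in the code, because the inner keys come from the cursor description rather than from hard-coded positions.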