Trying to iterate over "subdictionaries" in pymongo - python

I am having a lot of trouble trying to iterate over a Mongo collection whose documents have other "subdictionaries" in them.
Basically, each document in my collection has an object that holds other strings:
For instance:
I am trying to get the stats over this collection:
This is the code I am using for now:
cursor = mycol.find(
    {}, {'_id': 1, 'stats.total': 1, 'stats.additions': 1, 'stats.deletions': 1})
with open('commits.csv', 'w') as outfile:
    fields = ['id', 'stats.total', 'stats.additions', 'stats.deletions']
    write = csv.DictWriter(outfile, fieldnames=fields)
    write.writeheader()
    for commits in cursor:
        id = commits['_id']
        for stats in commits['stats']:
            flattened_record = {
                '_id': id,
                'stats.total': stats['total'],
                'stats.additions': stats['additions'],
                'stats.deletions': stats['deletions']
            }
            write.writerow(flattened_record)
I keep getting "TypeError: string indices must be integers", and the type of the stats object seems to be unicode.
Does anyone know how to fix it?
Thanks for your time,

I have solved this by iterating over the stats first and then checking the total, additions and deletions separately:
if 'stats' in commits:
    stats = commits['stats']
    for key in stats:
        if key == 'deletions':
            deletions = stats[key]
        elif key == 'additions':
            additions = stats[key]
        else:
            total = stats[key]
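Because stats is a plain dictionary on each document, its fields can also be read directly by key, with no inner loop. The following is only a minimal sketch under assumptions: sample_commits is a made-up stand-in for the pymongo cursor, and flatten_commit and write_commits are hypothetical helper names. Note that the CSV header names have to match the keys of each row passed to DictWriter.

```python
import csv
import io

# Hypothetical stand-in for the pymongo cursor: documents shaped like those in the question.
sample_commits = [
    {'_id': 'a1', 'stats': {'total': 5, 'additions': 3, 'deletions': 2}},
    {'_id': 'b2'},  # a document without a 'stats' subdocument
]

# Header names must match the keys of each row dictionary.
FIELDS = ['_id', 'stats.total', 'stats.additions', 'stats.deletions']


def flatten_commit(commit):
    """Return one flat CSV row for a commit document."""
    stats = commit.get('stats', {})  # 'stats' is a dict, so index it by key
    return {
        '_id': commit['_id'],
        'stats.total': stats.get('total'),
        'stats.additions': stats.get('additions'),
        'stats.deletions': stats.get('deletions'),
    }


def write_commits(commits, outfile):
    """Write a header line and one flattened row per commit."""
    writer = csv.DictWriter(outfile, fieldnames=FIELDS)
    writer.writeheader()
    for commit in commits:
        writer.writerow(flatten_commit(commit))


# An in-memory buffer is used here instead of commits.csv to keep the sketch self-contained.
buffer = io.StringIO()
write_commits(sample_commits, buffer)
print(buffer.getvalue())
```

With a real collection, the cursor returned by mycol.find(...) would take the place of sample_commits, and an open file would take the place of the buffer.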

Related

Wrong sum of values from list of dict

From a JSON file containing a list of dicts, I'm trying to get the data for each user and use it to create a new dict. Since each user has multiple objects, I'm iterating through them with a for loop. The starting JSON looks like this:
[{
    "obj_id": "bfyguwc9971",
    "user_id": 3,
    "investment": 34,
    "profit": 67
},
{
    "obj_id": "sdklgnv820",
    "user_id": 12,
    "investment": 71,
    "profit": 43
}]
The JSON contains hundreds of these dictionaries. What I'm trying to achieve is a final dict with the average investment and profit for each user.
Here is the code I'm running:
import json
with open('user_data.json','r') as f:
    user_data = json.load(f)
users_id = []
users_dict = {}
obj_count = 0
investment_list = []
profit_list = []
for i in user_data:
    if i['user_id'] not in users_id:
        users_id.append(i['user_id'])
for a in users_id:
    users_dict[a] = {}
    for i in user_data:
        if i['user_id'] == a:
            obj_count += 1
            users_dict[a]['Objects'] = obj_count
            investment_list.append(i['investment'])
            profit_list.append(i['profit'])
            avg_investment = (sum(investment_list)/len(investment_list))
            users_dict[a]['avg_investment'] = avg_investment
            avg_profit = (sum(profit_list)/len(profit_list))
            users_dict[a]['avg_profit'] = avg_profit
print(users_dict)
The problem shows up in the output: instead of giving me the right values for each user, it gets the first user's values right and then keeps adding values for the following users, so the last user ends up with all the values calculated for the previous users. What am I doing wrong? Thanks in advance for helping me!
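In the code above, obj_count, investment_list and profit_list are created once, before the loop over users, so they are never reset and each user inherits the previous users' values. One possible way around this, shown here only as a sketch, is to keep a separate list per user. The sample records are made up, and summarize is a hypothetical helper name.

```python
from collections import defaultdict

# Sample records shaped like the JSON in the question (values are made up).
user_data = [
    {"obj_id": "a", "user_id": 3, "investment": 34, "profit": 67},
    {"obj_id": "b", "user_id": 12, "investment": 71, "profit": 43},
    {"obj_id": "c", "user_id": 3, "investment": 66, "profit": 33},
]


def summarize(records):
    """Return per-user object counts and average investment and profit."""
    investments = defaultdict(list)  # user_id -> that user's investments only
    profits = defaultdict(list)      # user_id -> that user's profits only
    for rec in records:
        investments[rec["user_id"]].append(rec["investment"])
        profits[rec["user_id"]].append(rec["profit"])
    return {
        uid: {
            "Objects": len(values),
            "avg_investment": sum(values) / len(values),
            "avg_profit": sum(profits[uid]) / len(profits[uid]),
        }
        for uid, values in investments.items()
    }


users_dict = summarize(user_data)
print(users_dict)
```

Resetting the three accumulators at the top of the outer loop in the original code would have the same effect; grouping first simply avoids scanning the whole list once per user.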

parse github api.. getting string indices must be integers error

I need to loop through commits and get name, date, and messages info from
GitHub API.
https://api.github.com/repos/droptable461/Project-Project-Management/commits
I have tried many different things, but I keep getting stuck at the "string indices must be integers" error:
def git():
    #name , date , message
    #https://api.github.com/repos/droptable461/Project-Project-Management/commits
    #commit { author { name and date
    #commit { message
    #with urlopen('https://api.github.com/repos/droptable461/Project Project-Management/commits') as response:
    #source = response.read()
    #data = json.loads(source)
    #state = []
    #for state in data['committer']:
    #state.append(state['name'])
    #print(state)
    link = 'https://api.github.com/repos/droptable461/Project-Project-Management/events'
    r = requests.get('https://api.github.com/repos/droptable461/Project-Project-Management/commits')
    #print(r)
    #one = r['commit']
    #print(one)
    for item in r.json():
        for c in item['commit']['committer']:
            print(c['name'],c['date'])
    return 'suc'
I need to get the person who made the commit, the date, and their message.
item['commit']['committer'] is a dictionary object, so the line
for c in item['commit']['committer']: iterates over the dictionary's keys.
Since you are indexing a string (the dictionary key) with another string, you get the error.
Instead, the code should look more like:
import requests

def git():
    link = 'https://api.github.com/repos/droptable461/Project-Project-Management/events'
    r = requests.get('https://api.github.com/repos/droptable461/Project-Project-Management/commits')
    for item in r.json():
        # 'committer' is a dict: index it by key instead of looping over its keys
        committer = item['commit']['committer']
        print(committer['name'])
        print(committer['date'])
        print(item['commit']['message'])
    return 'suc'
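The cause of the error can be reproduced without calling the API. The sketch below uses a made-up item that only mirrors the nesting discussed above (commit, committer, name, date, message); the values are placeholders, not real API output.

```python
# Hypothetical sample mirroring the nesting of one commit item; values are placeholders.
item = {
    'commit': {
        'committer': {
            'name': 'Jane Doe',
            'email': 'jane@example.com',
            'date': '2018-01-01T00:00:00Z',
        },
        'message': 'Initial commit',
    }
}

committer = item['commit']['committer']

# Iterating over a dict yields its keys, which here are strings.
keys = list(committer)

try:
    for c in committer:
        c['name']  # c is a key such as 'name', so this indexes a string with a string
    error = None
except TypeError as exc:
    error = str(exc)

# Indexing the dict itself by key gives the values the question asks for.
summary = (committer['name'], committer['date'], item['commit']['message'])
print(keys)
print(error)
print(summary)
```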

Is it possible to treat dictionary values as objects?

Here is a class that analyses data:
class TopFive:
    def __init__(self, catalog_data, sales_data, query, **kwargs):
        self.catalog_data = catalog_data
        self.sales_data = sales_data
        self.query = query
    def analyse(self):
        CATALOG_DATA = self.catalog_data
        SALES_DATA = self.sales_data
        query = self.query
        products = {}
        # Creating a dict with ID, city or hour ( depending on query ) as keys and their income as values.
        for row in SALES_DATA:
            QUERIES = {
                'category': row[0],
                'city': row[2],
                'hour': row[3]
            }
            if QUERIES[query] in products:
                products[QUERIES[query]] += float(row[4])
                products[QUERIES[query]] = round(products[QUERIES[query]], 2)
            else:
                products[QUERIES[query]] = float(row[4])
        if query == 'category':
            top_five = {}
            top_five_items = sorted(products, key=products.get, reverse=True)[:5] # Getting top 5 categories.
            for key in top_five_items:
                for row in CATALOG_DATA:
                    if key == row[0]:
                        key_string = row[5] + ', ' + row[4]
                        top_five[key_string] = products[key]
            return top_five
        else:
            return products
It is being called like so:
holder = TopFive(catalog_data=catalog_data, sales_data=sales_data, query='hour')
top_hour = holder.analyse()
What I want to do now is work with the dates. They come in from an input csv file looking like this:
2015-12-11T17:14:05+01:00
Now I need to convert them to the UTC time zone. I thought of using:
.astimezone(pytz.utc)
And now to my question: Can I somehow do so in my QUERIES dictionary, so that when the 'hour' argument is passed to the class I can then execute the program, without changing the following code's structure:
if QUERIES[query] in products:
    products[QUERIES[query]] += float(row[4])
    products[QUERIES[query]] = round(products[QUERIES[query]], 2)
else:
    products[QUERIES[query]] = float(row[4])
and without adding more conditions.
I am thinking of something like:
'hour': row[3].astimezone(pytz.utc)
But this is not working. I can understand why, I am just wondering if there is a similar approach that works. Otherwise I would have to add yet another condition with separate return value and work there.
Got it! The answer to my question is yes: you can call methods inside a dictionary literal, just as I tried:
QUERIES = {
    'category': row[0],
    'city': row[2],
    'hour': hour.astimezone(pytz.utc)
}
What I had missed was parsing the CSV input into datetime objects, so calling .astimezone on a string obviously raised an error. Sorry for the long post; I'm still very new to OOP, and it's quite difficult keeping track of all the files, instances and so on. Thanks!
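For illustration, the missing parsing step might look like the sketch below. It is an assumption-laden example rather than the poster's code: it uses the standard library's datetime.fromisoformat (Python 3.7+) and timezone.utc in place of pytz.utc, and to_utc is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc(raw):
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC."""
    local_dt = datetime.fromisoformat(raw)      # the string becomes an aware datetime
    return local_dt.astimezone(timezone.utc)    # astimezone works on datetimes, not strings


raw = '2015-12-11T17:14:05+01:00'  # the sample value from the question
utc_dt = to_utc(raw)
print(utc_dt.isoformat())  # 2015-12-11T16:14:05+00:00
print(utc_dt.hour)         # 16
```

In the QUERIES dictionary, the 'hour' entry could then be built from a parsed value such as to_utc(row[3]), assuming row[3] holds a timestamp string in this format.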

populating sqlalchemy table with csv file having foreign key saves null to database

I am trying to populate my table from a CSV file with foreign key constraints. The problem is that everything is saved as None in the database. After reading the CSV file, I convert each row from a list to a dictionary (song_params). I don't know where I could have gone wrong, because everything seems to work the way I wanted.
header = ['artist', 'album', 'genre', 'song', 'price', 'download_link', 'duration']
for row in csv_file:
    song_params = dict(zip(header, row))
    song_values = {}
    dbsession = DBSession()
    for key, value in song_params.iteritems():
        if key == 'artist':
            martist = dbsession.query(Artist).filter_by(artist = value).first()
            song_values['artist'] = martist
        else:
            song_values[key] = value
        if key == 'album':
            malbum =dbsession.query(Album).filter_by(album_name = value).first()
            song_values['album'] = malbum
        else:
            song_values[key] = value
        if key == 'genre':
            mgenre = dbsession.query(Genre).filter_by(genre = value).first()
            song_values['genre'] = mgenre
        else:
            song_values[key] = value
    song = Song(song_values)
    dbsession.add(song)
Try Song(**song_values), which unpacks the dict into keyword arguments.
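The difference between the two calls can be seen with a plain Python class. The Song class below is a hypothetical stand-in with a made-up constructor, not the SQLAlchemy model from the question, so this only illustrates how ** unpacking behaves in general.

```python
class Song:
    """Hypothetical stand-in for the mapped class; the real model is not shown in the question."""

    def __init__(self, artist=None, album=None, genre=None, song=None):
        self.artist = artist
        self.album = album
        self.genre = genre
        self.song = song


song_values = {'artist': 'Some Artist', 'album': 'Some Album', 'song': 'Some Song'}

# Passing the dict itself binds the whole dict to the first parameter;
# every other attribute keeps its default of None.
positional = Song(song_values)

# ** unpacks the dict so that each key becomes a keyword argument.
unpacked = Song(**song_values)

print(positional.artist, positional.album)
print(unpacked.artist, unpacked.album)
```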

How do I get an associated value in a json variable using python?

How do I look up the 'id' associated with a person's 'name' when the two are in a dictionary?
user = 'PersonA'
id = ? #How do I retrieve the 'id' from the user_stream json variable?
The JSON, stored in a variable named "user_stream":
[
    {
        'name': 'PersonA',
        'id': '135963'
    },
    {
        'name': 'PersonB',
        'id': '152265'
    },
]
You'll have to decode the JSON structure and loop through all the dictionaries until you find a match:
for person in json.loads(user_stream):
    if person['name'] == user:
        id = person['id']
        break
else:
    # The else branch is only ever reached if no match was found
    raise ValueError('No such person')
If you need to make multiple lookups, you probably want to transform this structure to a dict to ease lookups:
name_to_id = {p['name']: p['id'] for p in json.loads(user_stream)}
then look up the id directly:
id = name_to_id.get(name) # if name is not found, id will be None
The above example assumes that names are unique. If they are not, use:
from collections import defaultdict
name_to_id = defaultdict(list)
for person in json.loads(user_stream):
    name_to_id[person['name']].append(person['id'])
# lookup
ids = name_to_id.get(name, []) # list of ids, defaults to empty
This is as always a trade-off, you trade memory for speed.
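Put together as a runnable sketch, the dictionary approach could look like this. It assumes user_stream is a JSON string, which requires double-quoted keys and values; if user_stream is already a Python list, the json.loads call would be dropped. The sample data is taken from the question, and 'PersonZ' is a made-up name used to show the missing case.

```python
import json

# Assumption: user_stream is a JSON string, so keys and values are double-quoted.
user_stream = '[{"name": "PersonA", "id": "135963"}, {"name": "PersonB", "id": "152265"}]'

people = json.loads(user_stream)  # decode once, then reuse the result

# Build the lookup table a single time; later lookups do not rescan the list.
name_to_id = {p['name']: p['id'] for p in people}

user = 'PersonA'
user_id = name_to_id.get(user)          # '135963'
missing_id = name_to_id.get('PersonZ')  # None, because there is no such name

print(user_id)
print(missing_id)
```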
Martijn Pieters's solution is correct, but if you intend to make many such look-ups it's better to load the json and iterate over it just once, and not for every look-up.
name_id = {}
for person in json.loads(user_stream):
    name = person['name']
    id = person['id']
    name_id[name] = id
user = 'PersonA'
print(name_id[user])
persons = json.loads(...)
# A list comprehension returns a list on both Python 2 and 3, so results[0] works
results = [p for p in persons if p['name'] == 'avi']
if results:
    id = results[0]["id"]
There can be more than one result, of course.
