Using PyMongo insert_many and empty/null values - python

I'm in the process of populating a MongoDB database and am not sure what to do with null values when using the insert_many statement (I should state at this point that I'm new to both Python and MongoDB).
The data I am inserting is two-dimensional, traditional SQL-like data obtained from a text file. It looks something like this:
emp = [
[7839, 'KING', 'PRESIDENT', null],
[7698, 'BLAKE', 'MANAGER', 7839],
[7782, 'CLARK', 'MANAGER', 7839],
[7566, 'JONES', 'MANAGER', 7839],
[7788, 'SCOTT', 'ANALYST', 7566],
[7902, 'FORD', 'ANALYST', 7566],
[7369, 'SMITH', 'CLERK', 7902],
[7499, 'ALLEN', 'SALESMAN', 7698],
[7521, 'WARD', 'SALESMAN', 7698],
[7654, 'MARTIN', 'SALESMAN', 7698],
[7844, 'TURNER', 'SALESMAN', 7698],
[7876, 'ADAMS', 'CLERK', 7788],
[7900, 'JAMES', 'CLERK', 7698],
[7934, 'MILLER', 'CLERK', 778]
]
And my database population looks like this
employee.insert_many([{
    "_id": emp[i][0],
    "Name": emp[i][1],
    "Role": emp[i][2],
    "Boss": emp[i][3]
    }
    for i in range(len(emp))
], False)
Ideally I would like "KING", the president, to not have the "Boss" field but I'm not sure how to achieve this. Could anyone point me in the right direction?
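One possible approach, sketched here under the assumption that the missing value is represented as Python's None (the bare null in the sample is not valid Python), is to build each document from field/value pairs and skip any pair whose value is None. The names FIELDS and build_docs are made up for this illustration, and the insert_many call is left commented out because it needs a live collection.

```python
# Sample rows; None stands in for the missing (null) boss value.
emp = [
    [7839, 'KING', 'PRESIDENT', None],
    [7698, 'BLAKE', 'MANAGER', 7839],
    [7369, 'SMITH', 'CLERK', 7902],
]

# Hypothetical helper names, not part of PyMongo.
FIELDS = ("_id", "Name", "Role", "Boss")


def build_docs(rows):
    """Pair each row with the field names, dropping fields whose value is None."""
    return [
        {field: value for field, value in zip(FIELDS, row) if value is not None}
        for row in rows
    ]


docs = build_docs(emp)
print(docs[0])  # KING's document has no "Boss" key

# With a real collection (the `employee` object from the question):
# employee.insert_many(docs, ordered=False)
```

Documents in one MongoDB collection do not need identical fields, so leaving "Boss" out of a single document is allowed.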


Nested Python Object to CSV

I looked up "nested dict" and "nested list" but neither method worked.
I have a python object with the following structure:
[{
'id': 'productID1', 'name': 'productname A',
'option': {
'size': {
'type': 'list',
'name': 'size',
'choices': [
{'value': 'M'},
]}},
'variant': [{
'id': 'variantID1',
'choices':
{'size': 'M'},
'attributes':
{'currency': 'USD', 'price': 1}}]
}]
What I need to output is a CSV file with the following flattened structure:
id, productname, variantid, size, currency, price
productID1, productname A, variantID1, M, USD, 1
productID1, productname A, variantID2, L, USD, 2
productID2, productname A, variantID3, XL, USD, 3
I tried this solution: Python: Writing Nested Dictionary to CSV
or this one: From Nested Dictionary to CSV File
I got rid of the [] around and within the data, then adapted the code snippet from the second link to my needs. In real life I can't get rid of the [] because that's simply the format I get when calling the API.
with open('productdata.csv', 'w', newline='', encoding='utf-8') as output:
    writer = csv.writer(output, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    for key in sorted(data):
        value = data[key]
        if len(value) > 0:
            writer.writerow([key, value])
        else:
            for i in value:
                writer.writerow([key, i, value])
but the output is like this:
"id";"productID1"
"name";"productname A"
"option";"{'size': {'type': 'list', 'name': 'size', 'choices': {'value': 'M'}}}"
"variant";"{'id': 'variantID1', 'choices': {'size': 'M'}, 'attributes': {'currency': 'USD', 'price': 1}}"
Can anyone help me out, please?
thanks in advance
list indices must be integers not strings
The following presents a visual example of a python list:
0 carrot.
1 broccoli.
2 asparagus.
3 cauliflower.
4 corn.
5 cucumber.
6 eggplant.
7 bell pepper
0, 1, 2 are all "indices".
"carrot", "broccoli", etc... are all said to be "values"
Essentially, a python list is a machine which has integer inputs and arbitrary outputs.
Think of a python list as a black-box:
A number, such as 5, goes into the box.
You turn a crank handle attached to the box.
Maybe the string "cucumber" comes out of the box.
You got an error: TypeError: list indices must be integers or slices, not str
There are various solutions.
Convert Strings into Integers
Convert the string into an integer.
listy_the_list = ["carrot", "broccoli", "asparagus", "cauliflower"]
string_index = "2"
integer_index = int(string_index)
element = listy_the_list[integer_index]
So yeah, that works as long as your string indices look like integers (e.g. "456" or "7").
The integer class constructor, int(), is not very smart.
It tolerates leading and trailing whitespace, so x = int("3 ") returns 3, but anything that is not a plain integer literal, such as int("3.0") or int("three"), raises a ValueError.
Calling x = int(string_index.strip()) first is harmless, but it is not required for whitespace alone.
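Since a string index may not always look like a number, here is a minimal sketch of guarding the conversion. The helper name to_index is made up for this illustration, and returning None on failure is just one possible convention.

```python
def to_index(text):
    """Return text converted to int, or None when it is not an integer literal."""
    try:
        return int(text)
    except ValueError:
        return None


vegetables = ["carrot", "broccoli", "asparagus", "cauliflower"]

for raw in ["2", "two", "1.5"]:
    index = to_index(raw)
    if index is None:
        print(f"{raw!r} cannot be used as a list index")
    else:
        print(f"{raw!r} -> {vegetables[index]}")
```

A valid integer can still fall outside the list, so an IndexError check may also be needed in real code.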
Use a Container which Allows Keys to be Strings
Long ago, before electronic computers existed, there were various types of containers in the world:
cookie jars
muffin tins
cardboard boxes
glass jars
steel cans.
back-packs
duffel bags
closets/wardrobes
brief-cases
In computer programming there are also various types of "containers"
You do not have to use a list as your container, if you do not want to.
There are containers where the keys (AKA indices) are allowed to be strings, instead of integers.
In Python, the standard container which is like a list, but where the keys/indices can be strings, is a dictionary:
thisdict = {
"make": "Ford",
"model": "Mustang",
"year": 1964
}
thisdict["make"] == "Ford"
If you want to index into a container using strings, instead of integers, then use a dict, instead of a list
The following is an example of a Python dict which has state names as input and state abbreviations as output:
us_state_abbrev = {
'Alabama': 'AL',
'Alaska': 'AK',
'American Samoa': 'AS',
'Arizona': 'AZ',
'Arkansas': 'AR',
'California': 'CA',
'Colorado': 'CO',
'Connecticut': 'CT',
'Delaware': 'DE',
'District of Columbia': 'DC',
'Florida': 'FL',
'Georgia': 'GA',
'Guam': 'GU',
'Hawaii': 'HI',
'Idaho': 'ID',
'Illinois': 'IL',
'Indiana': 'IN',
'Iowa': 'IA',
'Kansas': 'KS',
'Kentucky': 'KY',
'Louisiana': 'LA',
'Maine': 'ME',
'Maryland': 'MD',
'Massachusetts': 'MA',
'Michigan': 'MI',
'Minnesota': 'MN',
'Mississippi': 'MS',
'Missouri': 'MO',
'Montana': 'MT',
'Nebraska': 'NE',
'Nevada': 'NV',
'New Hampshire': 'NH',
'New Jersey': 'NJ',
'New Mexico': 'NM',
'New York': 'NY',
'North Carolina': 'NC',
'North Dakota': 'ND',
'Northern Mariana Islands':'MP',
'Ohio': 'OH',
'Oklahoma': 'OK',
'Oregon': 'OR',
'Pennsylvania': 'PA',
'Puerto Rico': 'PR',
'Rhode Island': 'RI',
'South Carolina': 'SC',
'South Dakota': 'SD',
'Tennessee': 'TN',
'Texas': 'TX',
'Utah': 'UT',
'Vermont': 'VT',
'Virgin Islands': 'VI',
'Virginia': 'VA',
'Washington': 'WA',
'West Virginia': 'WV',
'Wisconsin': 'WI',
'Wyoming': 'WY'
}
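Applied to the product data from the question, the same idea might look like the sketch below: the outer object is a list, so it takes an integer index (or a loop), and only the dicts inside it take string keys. The data here is a trimmed copy of the structure shown in the question.

```python
data = [{
    'id': 'productID1', 'name': 'productname A',
    'variant': [{
        'id': 'variantID1',
        'choices': {'size': 'M'},
        'attributes': {'currency': 'USD', 'price': 1}}]
}]

try:
    data['id']              # data is a list, so a string index raises TypeError
except TypeError as err:
    print(err)

first = data[0]             # an integer index picks the dict out of the list
print(first['id'])          # a string key works on the dict

# 'variant' is again a list, so it needs an integer index before the next key
price = first['variant'][0]['attributes']['price']
print(price)
```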
I could actually iterate over this list and create my own sublist, e.g. a list of variants:
data = [{
'id': 'productID1', 'name': 'productname A',
'option': {
'size': {
'type': 'list',
'name': 'size',
'choices': [
{'value': 'M'},
]}},
'variant': [{
'id': 'variantID1',
'choices':
{'size': 'M'},
'attributes':
{'currency': 'USD', 'price': 1}}]
},
{'id': 'productID2', 'name': 'productname B',
'option': {
'size': {
'type': 'list',
'name': 'size',
'choices': [
{'value': 'XL', 'salue':'XXL'},
]}},
'variant': [{
'id': 'variantID2',
'choices':
{'size': 'XL', 'size2':'XXL'},
'attributes':
{'currency': 'USD', 'price': 2}}]
}
]
new_list = {}
for item in data:
    new_list.update(id=item['id'])
    new_list.update(name=item['name'])
    for variant in item['variant']:
        new_list.update(varid=variant['id'])
        for vchoice in variant['choices']:
            new_list.update(vsize=variant['choices'][vchoice])
        for attribute in variant['attributes']:
            new_list.update(vprice=variant['attributes'][attribute])
    for option in item['option']['size']['choices']:
        new_list.update(osize=option['value'])
print(new_list)
but the output is always the last item of the iteration, because i always overwrite new_list with update().
{'id': 'productID2', 'name': 'productname B', 'varid': 'variantID2', 'vsize': 'XXL', 'vprice': 2, 'osize': 'XL'}
here's the final solution which worked for me:
data = [{
'id': 'productID1', 'name': 'productname A',
'variant': [{
'id': 'variantID1',
'choices':
{'size': 'M'},
'attributes':
{'currency': 'USD', 'price': 1}},
{'id':'variantID2',
'choices':
{'size': 'L'},
'attributes':
{'currency':'USD', 'price':2}}
]
},
{
'id': 'productID2', 'name': 'productname B',
'variant': [{
'id': 'variantID3',
'choices':
{'size': 'XL'},
'attributes':
{'currency': 'USD', 'price': 3}},
{'id':'variantID4',
'choices':
{'size': 'XXL'},
'attributes':
{'currency':'USD', 'price':4}}
]
}
]
import csv

products = []  # one flat dict per (product, variant) pair
for item in data:
    for variant in item['variant']:
        dic = {}
        dic.update(ProductID=item['id'])
        dic.update(Name=item['name'].title())
        dic.update(ID=variant['id'])
        dic.update(size=variant['choices']['size'])
        dic.update(Price=variant['attributes']['price'])
        products.append(dic)
keys = products[0].keys()
with open('productdata.csv', 'w', newline='', encoding='utf-8') as output_file:
    dict_writer = csv.DictWriter(output_file, keys, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    dict_writer.writeheader()
    dict_writer.writerows(products)
with the following output:
"ProductID";"Name";"ID";"size";"Price"
"productID1";"Productname A";"variantID1";"M";1
"productID1";"Productname A";"variantID2";"L";2
"productID2";"Productname B";"variantID3";"XL";3
"productID2";"Productname B";"variantID4";"XXL";4
which is exactly what i wanted.

Array into json formatting

I am trying to format a list of cities into json to put it on a firebase database.
I am really new to coding and very lost. Working in python but just trying to get this text formatted.
My list of cities
cities = ['Abu Dhabi', 'Albuquerque', 'Amsterdam', 'Anchorage', 'Antalya', 'Aspen', 'Athens', 'Atlanta', 'Austin', 'Bali', 'Baltimore', 'Bangalore', 'Bangkok', 'Barcelona', 'Beijing', 'Berlin', 'Berlin', 'Bogota', 'Bora Bora', 'Boston', 'Brisbane', 'Brussels', 'Buffalo', 'Burbank', 'Cairo', 'Cancun', 'Cape Town', 'Changcha', 'Charlotte', 'Chengdu', 'Chicago', 'Chongqing', 'Cincinnati']
I need to format them like this
},
"Seattle" : {
"city_name" : "Seattle"
},
"Houston" : {
"city_name" : "Houston"
}
What is the best way to go about doing this?
You can use a simple dict comprehension:
cities = ['Chicago', 'Charlotte', 'Barcelona']
print({city: {'city_name': city} for city in cities})
Which prints:
{'Chicago': {'city_name': 'Chicago'}, 'Charlotte': {'city_name': 'Charlotte'}, 'Barcelona': {'city_name': 'Barcelona'}}
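The print above shows a Python dict repr with single quotes. If actual JSON text is needed, for example to paste into a file for import, a small sketch using the standard json module could look like this (the variable names payload and text are only for illustration):

```python
import json

cities = ['Chicago', 'Charlotte', 'Barcelona']

# Same dict comprehension as above, kept in a variable
payload = {city: {'city_name': city} for city in cities}

# json.dumps produces double-quoted JSON; indent=2 makes it readable
text = json.dumps(payload, indent=2)
print(text)
```

Because dict keys are unique, a city that appears twice in the list (such as 'Berlin' in the question) ends up as a single entry.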

Create JSON from another JSON with duplicate values in Python

I have a JSON:
[{'job': 'fireman', 'salary': 30000, 'country': 'USA'}, {'job': 'doctor', 'salary': 50000, 'country': 'Canada'}, {'job': 'fireman', 'salary': 60000, 'country': 'France'}, {'job': 'Engineer', 'salary': 45000, 'country': 'Mexico'}]
I want to combine the duplicate values and create a JSON like:
[
{"job": "fireman",
"summary": [{"country": "USA", "salary": 30000}, {"country": "France", "salary": 60000}],
"total": 90000},
{"job": "doctor",
"summary": [{"country": "Canada", "salary": 50000}],
"total": 50000},
....
]
Try this:
non_summarized = [{'job': 'fireman', 'salary': 30000, 'country':'USA'}, {'job': 'doctor', 'salary': 50000, 'country': 'Canada'},{'job': 'fireman', 'salary': 60000, 'country':'France'}, {'job': 'Engineer', 'salary': 45000, 'country':'Mexico'}]
# sort the list of dictionaries by job, so entries with the same job are adjacent
non_summarized = sorted(non_summarized, key=lambda i: i['job'])
summarized = list()
last_value = dict()
for d in non_summarized:
    # check whether the previous entry has the same job;
    # if not, start a new summary dict for this job
    if last_value.get('job') != d.get('job'):
        last_value = {
            'job': d.get('job'),
            'total': 0,
            'summary': list()
        }
        summarized.append(last_value)
    last_value['total'] += d.get('salary', 0)
    last_value['summary'].append({
        'country': d.get('country'),
        'salary': d.get('salary')
    })
print(summarized)
Please let me know if you need any clarification.
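As an alternative sketch, not the approach above, the same grouping can be done without sorting by keeping one summary dict per job in a lookup dict. The names records and by_job are made up here, and the output keeps jobs in first-seen order (guaranteed for plain dicts in Python 3.7+).

```python
records = [
    {'job': 'fireman', 'salary': 30000, 'country': 'USA'},
    {'job': 'doctor', 'salary': 50000, 'country': 'Canada'},
    {'job': 'fireman', 'salary': 60000, 'country': 'France'},
    {'job': 'Engineer', 'salary': 45000, 'country': 'Mexico'},
]

by_job = {}  # job name -> summary dict for that job
for rec in records:
    # setdefault returns the existing entry, or stores and returns a new one
    entry = by_job.setdefault(
        rec['job'], {'job': rec['job'], 'summary': [], 'total': 0}
    )
    entry['summary'].append({'country': rec['country'], 'salary': rec['salary']})
    entry['total'] += rec['salary']

summarized = list(by_job.values())
print(summarized)
```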

How to access specific text from json data? [python]

I need to access the text attribute from this JSON data:
{'description': {'tags': ['outdoor', 'building', 'street', 'city', 'busy', 'people', 'filled', 'traffic', 'many', 'table', 'car', 'group', 'walking', 'bunch', 'crowded', 'large', 'night', 'light', 'standing', 'man', 'tall', 'umbrella', 'riding', 'sign', 'crowd'], 'captions': [{'text': 'a group of people on a city street filled with traffic at night', 'confidence': 0.8241405091548035}]}, 'requestId': '12fd327f-9b9c-4820-9feb-357a776211d3', 'metadata': {'width': 1826, 'height': 2436, 'format': 'Jpeg'}}
so that I end up with:
text = "The Text"
I have tried parsed['captions']['text'], but this didn't work. Please let me know if you can help. Thanks!
Two problems here. First, captions is under description, and second, text is a key of a dictionary inside a list (first and only item):
>>> import pprint
>>> pprint.pprint(parsed)
{'description': {'captions': [{'confidence': 0.8241405091548035,
'text': 'a group of people on a city street filled with traffic at night'}],
...
So, you could extract the text like this:
>>> parsed['description']['captions'][0]['text']
'a group of people on a city street filled with traffic at night'
Another option could be to use a 3rd-party library that simplifies traversing such JSON structures, for example plucky (full disclosure: I'm the author). With plucky, you can say:
>>> from plucky import pluckable
>>> pluckable(parsed).description.captions.text
['a group of people on a city street filled with traffic at night']
and not worry about dictionaries inside lists.
You can use Python's json library here, provided the input is valid JSON with double-quoted strings (json.loads rejects the single-quoted form shown in the question), like below:
import json
your_json_string = '{"description": {"tags": ["outdoor", "building", "street", "city", "busy", "people", "filled", "traffic", "many", "table", "car", "group", "walking", "bunch", "crowded", "large", "night", "light", "standing", "man", "tall", "umbrella", "riding", "sign", "crowd"], "captions": [{"text": "a group of people on a city street filled with traffic at night", "confidence": 0.8241405091548035}]}, "requestId": "12fd327f-9b9c-4820-9feb-357a776211d3", "metadata": {"width": 1826, "height": 2436, "format": "Jpeg"}}'
data_dict = json.loads(your_json_string)
print(data_dict['description']['captions'][0]['text'])
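If all you have is the single-quoted text exactly as printed in the question, that is a Python literal rather than JSON. A sketch of one way to read it, assuming the text comes from a trusted source, uses ast.literal_eval from the standard library (the string below is shortened from the question's data):

```python
import ast

# Shortened copy of the question's data, written as a Python literal string
raw = (
    "{'description': {'tags': ['outdoor', 'building'], "
    "'captions': [{'text': 'a group of people on a city street filled "
    "with traffic at night', 'confidence': 0.8241405091548035}]}, "
    "'metadata': {'width': 1826, 'height': 2436, 'format': 'Jpeg'}}"
)

# literal_eval only evaluates literals (dicts, lists, strings, numbers, ...)
parsed = ast.literal_eval(raw)
text = parsed['description']['captions'][0]['text']
print(text)
```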

Django: replace columns in ValueQuerySet

My django project database is organized like this:
Student.objects.values('name', 'schoolId') = [ {'name': 'Bob', 'schoolId': 5},
{'name': 'Alice', 'schoolId': 2} ]
School.objects.values('schoolId', 'name') = [ {'schoolId': 2, 'name': 'East High'},
{'schoolId': 5, 'name': 'West High'} ]
I want to generate something like this:
foo = [ {'name': 'Bob', 'school': 'West High'},
{'name': 'Alice', 'school': 'East High'} ]
Basically, it iterates through the first ValueQuerySet and replaces every ('schoolId', int) pair with ('school', schoolNameStr). Logically, this can be achieved by searching for the schoolId value in the School table and returning the name attribute of the matching entry.
I know I can use a for loop and filter to do this, but is there a faster way to implement it?
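One way to avoid a filter call per student, sketched here on plain lists of dicts that mirror the two values() results above, is to build a schoolId-to-name lookup dict once and then map each student through it. The variable names are made up for illustration. If Student actually has a ForeignKey to School, the join could probably be pushed into the database instead, but the question does not show the models, so this sketch stays in Python.

```python
# Stand-ins for Student.objects.values('name', 'schoolId')
# and School.objects.values('schoolId', 'name')
students = [{'name': 'Bob', 'schoolId': 5},
            {'name': 'Alice', 'schoolId': 2}]
schools = [{'schoolId': 2, 'name': 'East High'},
           {'schoolId': 5, 'name': 'West High'}]

# Build the lookup once: one pass over schools instead of one query per student
school_names = {s['schoolId']: s['name'] for s in schools}

foo = [
    {'name': st['name'], 'school': school_names[st['schoolId']]}
    for st in students
]
print(foo)
```

A student whose schoolId has no matching school would raise KeyError here; school_names.get(...) is an option if that can happen.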
