(Python/Selenium) Dictionary is missing elements - python

I'm web scraping wellrx.com. I'm trying to create a dictionary for each page that I scrape and add that page's data to a CSV file.
I am getting the name and price on one page, clicking a link to get to the alternative drug name/price and getting the name and price for that page.
Ex: Dictionaries should be
{'drug name': 'ARIPIPRAZOLE', 'price': '$16.45', 'other name': 'ABILIFY', 'price1': '$892.59'}
{'drug name': 'PIOGLITAZONE HCL', 'price': '$9.00', 'other name': 'ACTOS', 'price1': '$392.11'}
but I just get
{'drug name': 'PIOGLITAZONE HCL', 'price': '$9.00', 'other name': 'ACTOS', 'price1': '$392.11'}
# get drug name and price
pages_dict = {}
try:
    drug_name = driver.find_element_by_xpath('//h4[@class="displayName skel skel-displayName"]').text
    price = driver.find_element_by_xpath('//p[@class="right pr2"]').text
    pages_dict['drug name'] = drug_name
    pages_dict['price'] = price
    print(pages_dict)
except:
    continue
# click the alternative name
try:
    name = driver.find_element_by_xpath('//h5[@id="OtherName"]').text
    name1 = re.findall(": (.*)", name)[0]
    driver.find_element_by_link_text(name1).click()
    time.sleep(5)
    # other drug name
    other_name = driver.find_element_by_xpath('//h4[@class="displayName skel skel-displayName"]').text
    price1 = driver.find_element_by_xpath('//p[@class="right pr2"]').text
    pages_dict['other name'] = other_name
    pages_dict['price1'] = price1
    print(pages_dict)
except:
    continue

I think you first have to learn about dictionaries, because you are always overwriting the same keys with:
pages_dict['drug name'] = drug_name
pages_dict['price'] = price
just try:
pages_dict[drug_name] = drug_name
pages_dict[price] = price
and you will see how different keys are stored with different values.
If you wish to store the price for every drug, it would be more reasonable to do something like this:
pages_dict[drug_name] = price
If you instead want the same kind of dict with fixed keys, each dict representing one instance, I suggest creating a list of dictionaries:
list_of_dicts = []
pages_dict = {}
# Put the block of code that creates pages_dict here
list_of_dicts.append(pages_dict)
# Put the block of code that creates pages_dict here
list_of_dicts.append(pages_dict)
# Put the block of code that creates pages_dict here
list_of_dicts.append(pages_dict)
for pages in list_of_dicts:
    print(pages)
But this is obviously not a nice way to implement it. You should iterate through the pages with a for or while loop. The code you posted needs a lot of revision; it is quite unreadable.
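As a rough sketch of the list-of-dictionaries idea combined with the CSV goal from the question: the loop below builds a fresh dictionary per page and writes them all with csv.DictWriter. The scraped values are hard-coded stand-ins taken from the question's expected output, and the names FIELDS, rows_to_csv and scraped are assumptions made for this illustration, not part of the original code.

```python
import csv
import io

# Column order for the CSV; these keys match the question's dictionaries.
FIELDS = ["drug name", "price", "other name", "price1"]


def rows_to_csv(rows, out):
    """Write one CSV line per dictionary; missing keys become empty cells."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical stand-in for the Selenium lookups (values from the question).
scraped = [
    ("ARIPIPRAZOLE", "$16.45", "ABILIFY", "$892.59"),
    ("PIOGLITAZONE HCL", "$9.00", "ACTOS", "$392.11"),
]

list_of_dicts = []
for drug_name, price, other_name, price1 in scraped:
    pages_dict = {}  # a new dictionary for every page, so nothing is overwritten
    pages_dict["drug name"] = drug_name
    pages_dict["price"] = price
    pages_dict["other name"] = other_name
    pages_dict["price1"] = price1
    list_of_dicts.append(pages_dict)

# An in-memory buffer keeps the sketch self-contained; a real file opened
# with open(path, "w", newline="") could be passed instead.
buffer = io.StringIO()
rows_to_csv(list_of_dicts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Each page ends up as its own row, so both drugs from the question survive instead of only the last one.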

Related

How to handle multiple missing keys in a dict?

I'm using an API to get basic information about shops in my area, name of shop, address, postcode, phone number etc… The API returns back a long list about each shop, but I only want some of the data from each shop.
I created a for loop that just takes the information that I want for every shop that the API has returned. This all works fine.
Problem is not all shops have a phone number or a website, so I get a KeyError because the key website does not exist in every return of a shop. I tried to use try and except which works but only if I only handle one thing, but a shop might not have a phone number and a website, which leads to a second KeyError.
What can I do to check for every key in my for loop and if a key is found missing to just add the value "none"?
My code:
import requests
import geocoder
import pprint
g = geocoder.ip('me')
print(g.latlng)
latitude, longitude = g.latlng
URL = "https://discover.search.hereapi.com/v1/discover"
latitude = xxxx
longitude = xxxx
api_key = 'xxxxx' # Acquire from developer.here.com
query = 'food'
limit = 12
PARAMS = {
    'apikey': api_key,
    'q': query,
    'limit': limit,
    'at': '{},{}'.format(latitude, longitude)
}
# sending get request and saving the response as response object
r = requests.get(url = URL, params = PARAMS)
data = r.json()
#print(data)
for x in data['items']:
    title = x['title']
    address = x['address']['label']
    street = x['address']['street']
    postalCode = x['address']['postalCode']
    position = x['position']
    access = x['access']
    typeOfBusiness = x['categories'][0]['name']
    contacts = x['contacts'][0]['phone'][0]['value']
    try:
        website = x['contacts'][0]['www'][0]['value']
    except KeyError:
        website = "none"
    resultList = {
        'BUSINESS NAME:': title,
        'ADDRESS:': address,
        'STREET NAME:': street,
        'POSTCODE:': postalCode,
        'POSITION:': position,
        'POSITSION2:': access,
        'TYPE:': typeOfBusiness,
        'PHONE:': contacts,
        'WEBSITE:': website
    }
    print("--"*80)
    pprint.pprint(resultList)
I think a good way to handle it would be to use operator.itemgetter() to create a callable that will attempt to retrieve all the keys at once; if any aren't found, it will raise a KeyError.
A short demonstration of what I mean:
from operator import itemgetter
test_dict = dict(name="The Shop", phone='123-45-6789', zipcode=90210)
keys = itemgetter('name', 'phone', 'zipcode')(test_dict)
print(keys) # -> ('The Shop', '123-45-6789', 90210)
keys = itemgetter('name', 'address', 'phone', 'zipcode')(test_dict)
# -> KeyError: 'address'
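A different approach, not taken from the answer above, is a small helper that walks a nested path and falls back to a default whenever any step is missing. This is only a sketch: the helper name dig and the sample shop item are hypothetical, and the item is merely shaped like the nested structure the question's code indexes into.

```python
def dig(obj, *path, default="none"):
    """Follow the keys/indexes in `path`; return `default` if any step is missing."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            # KeyError: missing dict key, IndexError: empty list,
            # TypeError: the value at this level is not subscriptable.
            return default
    return obj


# Hypothetical item with neither a phone number nor a website.
shop = {
    "title": "The Shop",
    "address": {"label": "1 Main St"},
    "contacts": [{}],
}

phone = dig(shop, "contacts", 0, "phone", 0, "value")
website = dig(shop, "contacts", 0, "www", 0, "value")
label = dig(shop, "address", "label")
print(phone, website, label)
```

Because every lookup goes through the same helper, any number of keys can be missing without a separate try/except per field.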

Python - How to compare CSV with duplicate Names and Increment values

I'm trying to take the First Column (Name) and the fourth column (is active) from a CSV file and do the following:
Create a single entry for the Company Name
If 'is active' = yes then increment the value and output the final result.
If 'is active' = NO, then increment that number and give me a 'is active', 'is not active' list with a value at the end.
Data1 and Data2 fields are other columns that I don't care about at this time.
csv =
Name,Data1,Data2, Is Active:
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,No
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 3,Data1,Data2,No
Company 3,Data1,Data2,No
Ideal result would be in the format of:
Company name, Yes-count, no-count
I've started with csvreader to read the columns and I can put them into lists, but I'm unsure how to compare and consolidate names and counts after that.
Any help would be greatly appreciated.
One way to do it:
import csv

with open("your_csv_file", "r") as file:
    reader = csv.reader(file)
    _ = next(reader)  # skip header
    consolidated = {}
    for line in reader:
        company_name = line[0]
        is_active = line[3]
        # create the counters the first time a company is seen
        if company_name not in consolidated:
            consolidated[company_name] = {"yes_count": 0, "no_count": 0}
        if is_active == "Yes":
            consolidated[company_name]["yes_count"] += 1
        else:
            consolidated[company_name]["no_count"] += 1
Sample Output:
>>> print(consolidated)
{
'Company 1': {'yes_count': 3, 'no_count': 0},
'Company 2': {'yes_count': 3, 'no_count': 1},
'Company 3': {'yes_count': 0, 'no_count': 2}
}
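To get the "Company name, Yes-count, no-count" layout the question asks for, the same counting can be sketched against the sample data held in a string. The names SAMPLE and count_active are assumptions for this illustration, and the case-insensitive comparison of the last column is a choice made here, not something stated in the question.

```python
import csv
import io

# Sample data from the question, kept in a string instead of a file.
SAMPLE = """Name,Data1,Data2, Is Active:
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,No
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 3,Data1,Data2,No
Company 3,Data1,Data2,No
"""


def count_active(lines):
    """Return (name, yes_count, no_count) tuples in first-seen order."""
    counts = {}
    reader = csv.reader(lines)
    next(reader)  # skip header
    for row in reader:
        if not row:
            continue  # ignore blank lines
        tally = counts.setdefault(row[0], [0, 0])
        if row[3].strip().lower() == "yes":
            tally[0] += 1
        else:
            tally[1] += 1
    return [(name, yes, no) for name, (yes, no) in counts.items()]


summary = count_active(io.StringIO(SAMPLE))
for name, yes, no in summary:
    print(f"{name}, {yes}, {no}")
```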

Dictionary Getting Overwritten in While Loop

def get_list_expenses():
    expense_list = {}
    print('Please type the name of the expense followed by the price of the expense')
    while True:
        name = input('Name of expense: ')
        price = int(input('Price of expense: '))
        expense_list.update({
            'name': name,
            'price': price,
        })
        cont = input('Want to add another? [y/n] ').lower()
        if cont == 'n':
            break
    print(type(expense_list))
    print(expense_list)
    return expense_list
Input ==========================
Please type the name of the expense followed by the price of the expense
Name of expense: Food
Price of expense: 100
Want to add another? [y/n] y
Name of expense: Car Insurance
Price of expense: 200
Want to add another? [y/n] n
Output =========================
<class 'dict'>
{'name': 'Car Insurance', 'price': 200}
I'm new to Python and wanted to try to make a budget application to save me time manually entering information into Excel. My idea was to create a loop that would take in the name of an expense and its price per month. I wanted to put this into a dictionary so I could .get the information whenever I needed it. However, my dictionary keeps getting overwritten. I've tried a few different solutions I found online, but nothing worked. Thanks in advance.
Because update is called with the same two keys ('name' and 'price') on every iteration, each call overwrites the values stored by the previous one. For this reason you see a single entry at the end (the last one).
I would suggest creating an empty list and appending a new dictionary of values at every iteration:
def get_list_expenses():
    expense_list = []
    print('Please type the name of the expense followed by the price of the expense')
    while True:
        name = input('Name of expense: ')
        price = int(input('Price of expense: '))
        expense_list.append({
            'name': name,
            'price': price,
        })
        cont = input('Want to add another? [y/n] ').lower()
        if cont == 'n':
            break
    print(type(expense_list))
    print(expense_list)
    return expense_list
expense_list.update({
    'name': name,
    'price': price,
})
Should be:
expense_list.update({name: price})
A dictionary is a collection of key-value pairs. In your case the key should be the name of the expense and the value should be its price. The way you are creating it, the dictionary only ever has two keys: the first is 'name' and the second is 'price'.
You can simply do:
expense_list[name] = price
If name already exists, its value is updated; otherwise a new entry is added.
Make expense_list an actual list:
expense_list = []
and then append to it:
expense_list.append({
    'name': name,
    'price': price,
})
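The two suggestions above behave differently when the same expense name is entered twice, which a non-interactive sketch can show. The function names and the sample entries are assumptions made for this illustration; input() is replaced by a list of (name, price) tuples so the code runs without prompts.

```python
def expenses_by_name(entries):
    """Dictionary keyed by expense name: a repeated name replaces the old price."""
    result = {}
    for name, price in entries:
        result[name] = price
    return result


def expenses_as_list(entries):
    """List of small dictionaries: every entry is kept, even repeated names."""
    return [{"name": name, "price": price} for name, price in entries]


# Hypothetical input; "Food" appears twice on purpose.
entries = [("Food", 100), ("Car Insurance", 200), ("Food", 50)]

by_name = expenses_by_name(entries)
as_list = expenses_as_list(entries)

print(by_name)
print(as_list)
```

The dictionary is convenient for by_name.get("Food"), while the list preserves every entry in the order it was typed.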

save two list in one json file

I'm getting data as two lists, and I want to save both of them in a single JSON file. Can someone help me?
I'm using selenium
def get_name(self):
    name = []
    name = self.find_elements_by_class_name('item-desc')
    price = []
    price = self.find_elements_by_class_name('item-goodPrice')
    for names in name:
        names = (names.text)
        # print names
    for prices in price:
        prices = (prices.text)
        # print price
I would create a dictionary and then serialize it with json.dumps.
An example could be:
import json

def get_name(self):
    names = [name.text for name in self.find_elements_by_class_name('item-desc')]
    prices = [price.text for price in self.find_elements_by_class_name('item-goodPrice')]
    with open('output-file-name.json', 'w') as f:
        f.write(json.dumps({'names': names, 'prices': prices}))
EDIT: In the first version of the answer I was only creating the JSON. If you want to create a file as well, you should include what was suggested in @Andersson's comment.
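If each name belongs to the price at the same position, a variation is to pair them into one record per item before writing. This is a sketch under that assumption; save_items, the sample values and the temporary file path are hypothetical, and zip silently stops at the shorter list if the lengths differ.

```python
import json
import os
import tempfile


def save_items(names, prices, path):
    """Pair names with prices by position and write them to one JSON file."""
    records = [{"name": n, "price": p} for n, p in zip(names, prices)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return records


# Hypothetical values standing in for the .text of the Selenium elements.
names = ["Widget", "Gadget"]
prices = ["$1.00", "$2.50"]

path = os.path.join(tempfile.mkdtemp(), "items.json")
saved = save_items(names, prices, path)

# Read the file back to confirm that it contains what was written.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded)
```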

How do I look up an associated value in a JSON variable using Python?

How do I look up the 'id' associated with a person's 'name' when the two are in a dictionary?
user = 'PersonA'
id = ? #How do I retrieve the 'id' from the user_stream json variable?
json, stored in a variable named "user_stream"
[
    {
        'name': 'PersonA',
        'id': '135963'
    },
    {
        'name': 'PersonB',
        'id': '152265'
    },
]
You'll have to decode the JSON structure and loop through all the dictionaries until you find a match:
for person in json.loads(user_stream):
    if person['name'] == user:
        id = person['id']
        break
else:
    # The else branch is only ever reached if no match was found
    raise ValueError('No such person')
If you need to make multiple lookups, you probably want to transform this structure to a dict to ease lookups:
name_to_id = {p['name']: p['id'] for p in json.loads(user_stream)}
then look up the id directly:
id = name_to_id.get(name) # if name is not found, id will be None
The above example assumes that names are unique. If they are not, use:
from collections import defaultdict
name_to_id = defaultdict(list)
for person in json.loads(user_stream):
    name_to_id[person['name']].append(person['id'])
# lookup
ids = name_to_id.get(name, [])  # list of ids, defaults to empty
This is, as always, a trade-off: you trade memory for speed.
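The scan and the index can be compared side by side in a self-contained sketch. Note that json.loads requires double-quoted strings, so the payload below is a hypothetical rewrite of the question's data as valid JSON, with one invented duplicate entry ("999999") added to exercise the non-unique case; find_first_id and build_index are names chosen for this illustration.

```python
import json
from collections import defaultdict

# Hypothetical payload: the question's data as valid JSON, plus one duplicate name.
user_stream = """[
  {"name": "PersonA", "id": "135963"},
  {"name": "PersonB", "id": "152265"},
  {"name": "PersonA", "id": "999999"}
]"""

people = json.loads(user_stream)  # decode once, reuse for every lookup


def find_first_id(people, user):
    """Linear scan: no extra memory, but every lookup walks the list."""
    return next((p["id"] for p in people if p["name"] == user), None)


def build_index(people):
    """One pass to build name -> list of ids; later lookups are dict accesses."""
    index = defaultdict(list)
    for p in people:
        index[p["name"]].append(p["id"])
    return index


index = build_index(people)
first_id = find_first_id(people, "PersonA")
all_ids = index.get("PersonA", [])
missing = index.get("Nobody", [])  # .get does not create an entry
print(first_id, all_ids, missing)
```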
Martijn Pieters's solution is correct, but if you intend to make many such lookups, it's better to load the JSON and iterate over it just once rather than for every lookup.
name_id = {}
for person in json.loads(user_stream):
    name = person['name']
    id = person['id']
    name_id[name] = id
user = 'PersonA'
print(name_id[user])
persons = json.loads(...)
# wrap in list() so the result can be tested and indexed in Python 3
results = list(filter(lambda p: p['name'] == 'avi', persons))
if results:
    id = results[0]["id"]
Of course, results can contain more than one match.
