When I run my code, I get the price of the hotel defined in the URL, followed by the prices of all the other hotels that appear as suggestions. To subset the output and pick just the first result, I need to store the for-loop output in a single variable or a list. How do I do that?
I am using Python 3.6.5 on Windows 7 Professional.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

chrome_path = r"C:\Users\Downloads\chromedriver_win32\chromedriver.exe"
dr = webdriver.Chrome(chrome_path)
dr.get("url")

hoteltrial = dr.find_elements_by_class_name("hotel-info")
for hoteltrial1 in hoteltrial:
    nametrial = hoteltrial1.find_element_by_class_name("hotel-name")
    print(nametrial.text + " - ")
    try:
        pricetrial = hoteltrial1.find_element_by_class_name("c-price")
        price = pricetrial.find_element_by_css_selector("span.price-num")
        currency = pricetrial.find_element_by_class_name("price-currency")
        print(currency.text + price.text)
    except NoSuchElementException:
        print("sold")
The actual output looks something like this, and I need the price of only the Langham:
The Langham Hong Kong -
$272
Cordis Hong Kong -
$206
Island Shangri-La -
$881
What you are doing is overwriting the variables in your for-loop. On every iteration, the newly found value is assigned to the same variable.
for i in range(5):
    x = i
When you run this example and look at the value assigned to x after the for-loop, you'll see that the value is 4. You are doing the same in your code.
To solve this you can define a list outside of the for-loop and append the results to this list.
hotel = []
for i in range(5):
    hotel.append(i)
After running the above code you will see that this results in a list.
hotel
[0,1,2,3,4]
You should do the same in your code.
hotellist = []
for hoteltrial1 in hoteltrial:
    nametrial = hoteltrial1.find_element_by_class_name("hotel-name")
    hName = nametrial.text + " - "
    try:
        pricetrial = hoteltrial1.find_element_by_class_name("c-price")
        price = pricetrial.find_element_by_css_selector("span.price-num")
        currency = pricetrial.find_element_by_class_name("price-currency")
        result = hName + currency.text + price.text
        hotellist.append(result)
    except NoSuchElementException:
        result = hName + "Sold"
        hotellist.append(result)
After running this for-loop you will have a list with all the results found in each iteration. You could use a dictionary instead, so you can look up each hotel's price by its name as the key.
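Since you only wanted the first result (the hotel defined in your URL comes first), you can index the finished list. A minimal sketch, with sample data standing in for the scraped results:

```python
# Sample data standing in for what the loop above collects
hotellist = ["The Langham Hong Kong - $272",
             "Cordis Hong Kong - $206",
             "Island Shangri-La - $881"]

first = hotellist[0]  # the hotel from the URL is the first entry
print(first)          # The Langham Hong Kong - $272
```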
Use dict:
hoteldict = {}
for hoteltrial1 in hoteltrial:
    nametrial = hoteltrial1.find_element_by_class_name("hotel-name")
    try:
        pricetrial = hoteltrial1.find_element_by_class_name("c-price")
        price = pricetrial.find_element_by_css_selector("span.price-num")
        currency = pricetrial.find_element_by_class_name("price-currency")
        hoteldict.update({nametrial.text: currency.text + price.text})
    except NoSuchElementException:
        hoteldict.update({nametrial.text: "Sold"})
For a dictionary, use update (or plain key assignment) instead of append.
Access your hoteldict:
hoteldict["The Langham Hong Kong"] #Will return $272
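If a hotel name might be missing from the dictionary, dict.get with a default avoids a KeyError. A small sketch with placeholder data:

```python
# Placeholder data in the same shape the loop above builds
hoteldict = {"The Langham Hong Kong": "$272", "Cordis Hong Kong": "$206"}

print(hoteldict.get("The Langham Hong Kong", "not found"))  # $272
print(hoteldict.get("Island Shangri-La", "not found"))      # not found
```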
I hope this helped you.
Kind regards,
Sam
Related
I'm trying to scrape a website and collect every "meal_box meal_container row" element in a list with driver.find_elements, but for some reason I couldn't do it. I tried By.CLASS_NAME, because it seemed the logical choice, but the length of my list was 0. Then I tried By.XPATH, and the length was 1 (I understand why). I think I could use XPath to get them one by one, but I'd rather handle it in a for loop.
I don't know why find_elements(By.CLASS_NAME, 'print_name') works but find_elements(By.CLASS_NAME, "meal_box meal_container row") does not.
I'm new to both web scraping and Stack Overflow, so if any other details are needed I can add them.
Here is my code:
meals = driver.find_elements(By.CLASS_NAME, "meal_box meal_container row")
print(len(meals))
for index, meal in enumerate(meals):
    foods = meal.find_elements(By.CLASS_NAME, 'print_name')
    print(len(foods))
    if index == 0:
        mealName = "Breakfast"
    elif index == 1:
        mealName = "Lunch"
    elif index == 2:
        mealName = "Dinner"
    else:
        mealName = "Snack"
    for index, title in enumerate(foods):
        recipe = {}
        print(title.text)
        print(mealName + "\n")
        recipe["name"] = title.text
        recipe["meal"] = mealName
Here is the screenshot of the HTML:
The locator itself looks OK, but By.CLASS_NAME takes a single class name, and "meal_box meal_container row" is actually three classes. Join them with dots, like "meal_box.meal_container.row", and try this:

    meals = driver.find_elements(By.CLASS_NAME, "meal_box.meal_container.row")
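Under the hood, class="meal_box meal_container row" means the element carries three separate classes, and it matches when it has all of them. That matching logic can be sketched without a browser using only the standard library (the sample HTML below is made up):

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect tags that carry ALL of the wanted classes."""
    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # class="a b c" is a space-separated list of three classes
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:  # element has every wanted class
            self.matches.append(tag)

html = """
<div class="meal_box meal_container row">breakfast</div>
<div class="meal_box">not a match</div>
<div class="row meal_container meal_box extra">lunch</div>
"""

parser = ClassCollector(["meal_box", "meal_container", "row"])
parser.feed(html)
print(len(parser.matches))  # 2 -- only the elements with all three classes
```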
Alternatively, try driver.find_element_by_css_selector with the selector ".meal_box.meal_container.row".
It may also be that the "meal_box meal_container row" element is nested inside another element, so you could try finding the enclosing element first and searching inside it (again with the dotted form, since By.CLASS_NAME does not accept spaces):

    root = driver.find_element(By.CLASS_NAME, "row")
    meals = root.find_elements(By.CLASS_NAME, "meal_box.meal_container.row")
I am trying to write a Python script using Selenium for a horse racing software site.
The site shows a table in which horses' names and odds appear when they become an 'arb'.
When a horse is an 'arb' it shows up in the table, and when it is no longer an arb it disappears.
The script needs to build a list of all the horses that have come up throughout the day, with their names and odds.
So far I have managed to get it to write out the selected 'name' and 'odds' values when I first run the script.
However, I am unsure how to code it to iterate over time, updating and adding to the list.
Any help/advice would be greatly appreciated.
from selenium import webdriver
import time

# Start driver
driver = webdriver.Chrome("/Users/username/PycharmProjects/SoftwareBot/drivers/chromedriver")
driver.get("https://software.com/members/user/software2")

# Get title
title = driver.title
print(title)

# Log in to the page
driver.find_element_by_name("email").send_keys("johndoe@hotmail.com")
driver.find_element_by_name("password").send_keys("Pass123")
driver.find_element_by_xpath('//button[text()="Continue"]').click()

# Sleep
time.sleep(1.5)

# Find number of rows
rows = len(driver.find_elements_by_xpath('//*[@id="data_body"]/tr'))
print(rows)

# Find number of columns
cols = len(driver.find_elements_by_xpath('//*[@id="data_body"]/tr[1]/td'))
print(cols)

# Open text file
f = open("horses.txt", "w+")

# Write out the needed table values
for r in range(1, rows + 1):
    for c in range(3, 7, 3):
        value = driver.find_element_by_xpath('//*[@id="data_body"]/tr[' + str(r) + ']/td[' + str(c) + ']').text
        print(value)
        f.write(value)
        f.write("\n")
    time.sleep(1.5)
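To iterate over time, you can wrap the scrape in a polling loop and keep a set of rows you have already recorded, so only new (horse, odds) pairs get appended. A minimal sketch, with a stand-in fetch_rows function in place of the Selenium table reads (the function name and its data are assumptions for illustration):

```python
import time

def fetch_rows():
    """Stand-in for the Selenium table scrape; returns (name, odds) pairs."""
    return [("Red Rum", "5/1"), ("Shergar", "3/1")]

seen = set()     # every (name, odds) pair recorded so far
history = []     # ordered log of everything seen during the day

def poll_once():
    for row in fetch_rows():
        if row not in seen:  # only record rows we have not seen yet
            seen.add(row)
            history.append(row)

# In the real script this would run indefinitely with a delay, e.g.:
# while True:
#     poll_once()
#     time.sleep(30)
poll_once()
poll_once()  # duplicates on later polls are ignored
print(history)
```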
I am working on a project for school where I am creating a nutrition plan based on our school's nutrition menu. I am trying to create a dictionary with every item and its calorie content, but for some reason the loop I'm using gets stuck at 7 and never advances through the rest of the list to add to my dictionary. So when I search for a known key ('Sour Cream') it throws an error, because it is never added to the dictionary. I have also noticed it prints several numbers twice in a row and double-adds them to the dictionary.
Edit: I have discovered the double printing was from the print statement I had; still wondering about the 7, however.
from bs4 import BeautifulSoup
import urllib3
import requests

url = "https://menus.sodexomyway.com/BiteMenu/Menu?menuId=14756&locationId=11870001&whereami=http://mnsu.sodexomyway.com/dining-near-me/university-dining-center"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
allFood = soup.findAll('a', attrs={'class': 'get-nutritioncalculator primary-textcolor'})
allCals = soup.findAll('a', attrs={'class': 'get-nutrition primary-textcolor'})
nums = '0123456789'

def printData(charIndex):
    for char in allFood[charIndex].contents:
        print(char)
    for char in allCals[charIndex].contents:
        print(char)

def getGoals():
    userCalories = int(input("Please input calorie goal for the day (kC): "))

# Display info (Text/RsbPi)
fullList = {}

def compileFood():
    foodCount = 0
    for food in allFood:
        print(foodCount)
        for foodName in allFood[foodCount].contents:
            fullList[foodName] = 0
            foodCount += 1
            print(foodCount)

compileFood()
print(fullList['Sour Cream'])
Any help would be great. Thanks!
OK, first, why is this happening:
The reason is that the food at index 7 has empty contents. Because it's empty, the inner for loop never runs, and therefore foodCount is never increased => it stays stuck at 7 forever.
So if you moved the index increment outside of the inner for loop, it would work without a problem.
But you're doing something clumsy here: you already iterate over the food items and still use an additional counter variable.
You could solve it more cleanly this way:
def compileFood():
    for food in allFood:
        for foodName in food.contents:
            fullList[foodName] = 0
With this you don't need to care about an additional variable at all.
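The stall is easy to reproduce with plain lists; assuming an empty entry in the middle, the counter stops advancing there (the data below is made up):

```python
allFood = [["Eggs"], ["Toast"], [], ["Sour Cream"]]  # index 2 is empty

fullList = {}
foodCount = 0
for food in allFood:
    for foodName in allFood[foodCount]:
        fullList[foodName] = 0
        foodCount += 1   # never runs for the empty entry at index 2

print(foodCount)   # 2 -- stuck at the empty entry, like the 7 in the question
print(fullList)    # {'Eggs': 0, 'Toast': 0} -- 'Sour Cream' is never added
```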
I'm trying to do an accumulation in the XPath; is that possible?
Here is my code so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get('http://www.imdb.com/user/ur33778891/watchlist?ref_=wt_nv_wl_all_0')
wait = WebDriverWait(driver, 10)
x = 1
while True:
    try:
        film = driver.find_element(By.XPATH, "(//h3[@class='lister-item-header']/a)[%d]" % x).text
    except NoSuchElementException:
        break  # ran past the last film
    x = x + 1
    print(film)
The thing is, I'm trying to go to the IMDB website, inside a user's watchlist, and get the films name by name with this method, but it seems it's not possible to use %d for this purpose. Is there anything in Selenium that could do the job?
PS: Neither a string nor a number is working.
The thing is: if you open the IMDB watchlist, there will be a list of films, and the XPath of each is the same, //h3[@class='lister-item-header']/a, but I was thinking about how to select them individually. I know it can be done like this:

    (//h3[@class='lister-item-header']/a)[1] #this will select the first movie
    (//h3[@class='lister-item-header']/a)[2] #this will select the second movie

Now, is there a way to do this automatically? I was thinking of an accumulator: instead of the [1] I would put [x] and set x = x + 1.
Use the following code for that:

    xpath = "(//h3[@class='lister-item-header']/a)[{}]".format(x)

After that, use it like this:

    film = driver.find_element(By.XPATH, xpath).text

Hope it helps you!
As you are starting with x = 1, you can also index into find_elements (which returns a zero-based list, hence x - 1) as follows:

    film = driver.find_elements_by_xpath("//h3[@class='lister-item-header']/a")[x - 1].get_attribute("innerHTML")
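The accumulation itself is plain Python string formatting plus a loop that stops when the lookup fails. A browser-free sketch, with a stub standing in for driver.find_element (the stub and its film list are assumptions for illustration):

```python
# Stub standing in for driver.find_element(By.XPATH, ...).text
FILMS = ["The Godfather", "Casablanca", "Chinatown"]

def find_film_by_xpath(xpath):
    # Pull the 1-based positional index back out of the XPath string
    index = int(xpath.rsplit("[", 1)[1].rstrip("]"))
    if index > len(FILMS):
        raise LookupError("no such element")  # Selenium raises NoSuchElementException
    return FILMS[index - 1]

collected = []
x = 1
while True:
    xpath = "(//h3[@class='lister-item-header']/a)[{}]".format(x)
    try:
        collected.append(find_film_by_xpath(xpath))
    except LookupError:
        break  # ran past the last film
    x += 1

print(collected)  # ['The Godfather', 'Casablanca', 'Chinatown']
```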
(Code below)
I'm scraping a website and the data I'm getting back is in two multi-dimensional arrays. I want everything in a JSON format, because I want to save it and load it again later when I add "tags".
So, to be less vague: I'm writing a program which takes in data like what characters you have and what missions require (you can complete multiple at once if the attributes align), then checks that against a list of attributes each character fulfills and returns a sorted list of the best characters for the context.
Right now I'm only scraping character data, but I've already "got" the attribute data per character. The problem there was that it wasn't sorted by name, so it was just a randomly repeating list that I needed to be able to look up. I still haven't quite figured that one out.
Right now I have two arrays, one for the headers of the table and one for the rows of the table. The rows contain the "answers" for the headers' "questions"/"titles"; e.g. Maximum Level, 50.
This is true for everything but the first entry, which is Name, Pronunciation (and I just want to store the name, of course).
So:

    Iterations = 0
    while Iterations <= len(RowArray) / 9:
        HeaderArray[0] gives me the name
        RowArray[Iterations + 1] gives me data type 2
        RowArray[Iterations + 2] gives me data type 3
        ... repeat up to RowArray[Iterations + 8]
        Iterations += 9
So I'm going through and appending these to separate lists: single arrays like CharName[] and CharMaxLevel[] and so on.
But I'm actually not sure if that's going to make this easier or not. My end goal here is to send "CharacterName" and get stuff back based on that, AND to be able to send in "DesiredTraits" and get "CharacterNames who fit that trait" back. Which means I also need to figure out how to store that category data semi-efficiently. There are over 80 possible categories and most characters only fit into about 10. I don't know how I'm going to store or load that data.
I'm assuming JSON is the best way? And I'm trying to keep it all in one file for performance and code readability reasons; I don't want a file for each character.
CODE: (Forgive me, I've never scraped anything before, and I'm actually somewhat new to Python; I just got it four or so days ago.)
https://pastebin.com/yh3Z535h
^ In the event anyone wants to run this and this somehow makes it easier to grab the raw code (:
import time
import requests, bs4, re
from urllib.parse import urljoin
import json
import os

target_dir = r"D:\00Coding\Js\WebScraper"  #Yes, I do know that storing this in my Javascript folder is filthy
fullname = os.path.join(target_dir, 'TsumData.txt')

StartURL = 'http://disneytsumtsum.wikia.com/wiki/Skill_Upgrade_Chart'
URLPrefix = 'http://disneytsumtsum.wikia.com'

def make_soup(url):
    r = requests.get(url)
    soup = bs4.BeautifulSoup(r.text, 'lxml')
    return soup

def get_links(url):
    soup = make_soup(url)
    a_tags = soup.find_all('a', href=re.compile(r"^/wiki/"))
    links = [urljoin(URLPrefix, a['href']) for a in a_tags]  # convert relative url to absolute url
    return links

def get_tds(link):
    soup = make_soup(link)
    #tds = soup.find_all('li', class_="category normal")  #This will give me the attributes / tags of each character
    tds = soup.find_all('table', class_="wikia-infobox")
    RowArray = []
    HeaderArray = []
    if tds:
        for td in tds:
            #print(td.text.strip())  #This is everything
            rows = td.findChildren('tr')  #[0]
            headers = td.findChildren('th')  #[0]
            for row in rows:
                cells = row.findChildren('td')
                for cell in cells:
                    cell_content = cell.getText()
                    clean_content = re.sub(r'\s+', ' ', cell_content).strip()
                    if clean_content:
                        RowArray.append(clean_content)
            for row in rows:
                cells = row.findChildren('th')
                for cell in cells:
                    cell_content = cell.getText()
                    clean_content = re.sub(r'\s+', ' ', cell_content).strip()
                    if clean_content:
                        HeaderArray.append(clean_content)
    print(HeaderArray)
    print(RowArray)
    return (RowArray, HeaderArray)
    #Output = json.dumps([dict(zip(RowArray, row_2)) for row_2 in HeaderArray], indent=1)
    #print(json.dumps([dict(zip(RowArray, row_2)) for row_2 in HeaderArray], indent=1))
    #TempFile = open(fullname, 'w')  #Read only, Write Only, Append
    #TempFile.write("EHLLO")
    #TempFile.close()
    #print(td.tbody.Series)
    #print(td.tbody[Series])
    #print(td.tbody["Series"])
    #print(td.data-name)
    #time.sleep(1)

if __name__ == '__main__':
    links = get_links(StartURL)
    MainHeaderArray = []
    MainRowArray = []
    MaxIterations = 60
    Iterations = 0
    for link in links:  #Specifically I'll need to return and append the arrays here because they're being cleared repeatedly.
        #print("Getting tds calling")
        if Iterations > 38:  #There are this many webpages it'll first look at that don't have the data I need
            TempRA, TempHA = get_tds(link)
            MainHeaderArray.append(TempHA)
            MainRowArray.append(TempRA)
            MaxIterations -= 1
        Iterations += 1
        #print(MaxIterations)
        if MaxIterations <= 0:  #I don't want to scrape the entire website for a prototype
            break
    #print("This is the end ??")
    #time.sleep(3)
    #jsonized = map(lambda item: {'Name':item[0], 'Series':item[1]}, zip())
    print(MainHeaderArray)
    #time.sleep(2.5)
    #print(MainRowArray)
    #time.sleep(2.5)
    #print(zip())
    TsumName = []
    TsumSeries = []
    TsumBoxType = []
    TsumSkillDescription = []
    TsumFullCharge = []
    TsumMinScore = []
    TsumScoreIncreasePerLevel = []
    TsumMaxScore = []
    TsumFullUpgrade = []
    Iterations = 0
    MaxIterations = len(MainRowArray)
    while Iterations <= MaxIterations:  #This will fire 1 time per Tsum
        print(Iterations)
        print(MainHeaderArray[Iterations][0])  #Holy this gives us Mickey ;
        print(MainHeaderArray[Iterations+1][0])
        print(MainHeaderArray[Iterations+2][0])
        print(MainHeaderArray[Iterations+3][0])
        TsumName.append(MainHeaderArray[Iterations][0])
        print(MainRowArray[Iterations][1])
        #At this point it will, of course, crash - that's because I only just realized I needed to append AND I just realized that everything
        #isn't stored in a list as I thought, but rather a multi-dimensional array (as you can see below I didn't know this)
        TsumSeries[Iterations] = MainRowArray[Iterations+1]
        TsumBoxType[Iterations] = MainRowArray[Iterations+2]
        TsumSkillDescription[Iterations] = MainRowArray[Iterations+3]
        TsumFullCharge[Iterations] = MainRowArray[Iterations+4]
        TsumMinScore[Iterations] = MainRowArray[Iterations+5]
        TsumScoreIncreasePerLevel[Iterations] = MainRowArray[Iterations+6]
        TsumMaxScore[Iterations] = MainRowArray[Iterations+7]
        TsumFullUpgrade[Iterations] = MainRowArray[Iterations+8]
        Iterations += 9
    print(Iterations)
    print("It's Over")
    time.sleep(3)
    print(TsumName)
    print(TsumSkillDescription)
Edit:
tl;dr my goal here is to be like
"For this Mission Card I need a Blue Tsum with high score potential, a Monster's Inc Tsum for a bunch of games, and a Male Tsum for a long chain.. what's the best Tsum given those?" and it'll be like "SULLY!" and automatically select it or at the very least give you a list of Tsums. Like "These ones match all of them, these ones match 2, and these match 1"
Edit 2:
Here's the command Line Output for the code above:
https://pastebin.com/vpRsX8ni
Edit 3: Alright, just got back from a short break. With some minor looking-over I see what happened: my append code is saying "append this list to the array", meaning I've got a list of lists for both the header and row arrays that I'm storing. So I can confirm (for myself at least) that these aren't nested lists per se, but they are definitely two lists, each containing a single list at every entry. Definitely not a dictionary or anything "special case", at least. This should help me find an answer quickly, now that I'm not throwing "multi-dimensional list" around my Google searches or wondering why the list operations aren't working (since they expect one value and get a list instead).
Edit 4:
I need to simply add another list, but super-nested.
It'll just store the categories that the Tsum has, as strings.
So Array[10] = ArrayOfCategories[Tsum] (which contains every attribute, in string form, that the Tsum has).
That'll be e.g. TsumArray[10] = ["Black", "White Gloves", "Mickey & Friends"].
And then I can just use the "switch" that I've already made in order to check them. Possibly. Not feeling too well and haven't gotten that far yet.
Just use with open(...) as json_file and json write/read (super easy).
I ultimately stored 3 JSON files. No big deal; much easier than appending everything into one big file.
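A minimal sketch of that save/load pattern with the standard json module (the filename and data below are placeholders):

```python
import json

# Placeholder data in the shape described above: name -> attribute tags
tsums = {"Sully": ["Blue", "Monsters Inc", "Male"],
         "Mickey": ["Black", "White Gloves", "Mickey & Friends"]}

# Write the whole structure out in one call
with open("TsumData.json", "w") as json_file:
    json.dump(tsums, json_file, indent=1)

# Read it back later (e.g. when adding tags)
with open("TsumData.json") as json_file:
    loaded = json.load(json_file)

print(loaded["Sully"])  # ['Blue', 'Monsters Inc', 'Male']
```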