I have a scenario file that describes logging in to an account:
Scenario: Failed login with blank login details
Given I go to "BBC Login Page"
When I fill in the username textfield with ""
And I fill in the password textfield with ""
And I press "Log in"
Then I should see "In order to login, you must enter a valid user name."
My step definition (Python, using Lettuce) fails unless I pass the real URL in my scenario (bad BDD):
@step('I go to "(.*?)"$')
def go_to(step, url):
    with AssertContextManager(step):
        world.browser.get(url)
Instead, I want to build in a little logic that substitutes the page name for the real URL:
msr_login_page = "https://www.bbc.co.uk/login"

@step('I go to "(.*?)"$')
def go_to(step, url):
    if url == "BBC Login page":
        urladjusted = msr_login_page
    with AssertContextManager(step):
        world.browser.get(urladjusted)
This fails with an error, and I don't appear to be able to set the URL variable no matter how I try.
Thanks in advance for any help
The comparison never matches: the scenario says "BBC Login Page" but the code checks for "BBC Login page", so the if block is skipped, urladjusted is never assigned, and world.browser.get(urladjusted) raises a NameError. Rather than chaining if statements, why not use a dictionary mapping your page URLs by name?
urls = {"BBC Login page": "https://www.bbc.co.uk/login"}
#step('I go to "(.*?)"$')
def go_to(step, page):
with AssertContextManager(step):
world.browser.get(urls[page])
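Dictionary lookups are case-sensitive too, so the key must match the quoted scenario text exactly ("BBC Login Page" above). If a scenario names a page that isn't registered, the bare KeyError can be cryptic; a small optional variant that fails with a clearer message:

urls = {"BBC Login Page": "https://www.bbc.co.uk/login"}

@step('I go to "(.*?)"$')
def go_to(step, page):
    with AssertContextManager(step):
        if page not in urls:
            raise AssertionError('No URL registered for page "%s"' % page)
        world.browser.get(urls[page])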
I have a login page for a Flask app with a cloud database, and I want to test the results after logging in; specifically, I want to test the HTML elements after logging in. I have seen people test the return status code or use assertIn to check whether data exists.
Is there a way to target a specific HTML tag, like <h1 id="userTitle"></h1>, from the rendered template after POSTing the username and password to the route function login()?
def test_users_login(self):
    result = self.app.post('/login', data=dict(username='Nicole', password='abc123'), follow_redirects=True)
    # I want to check the HTML tag's text value after logging in
    self.assertEqual(result.data.getTag("h1"), b"Nicole")  # what I imagined using <h1>
    self.assertEqual(result.data.getId("user"), b"Nicole")  # what I imagined using the id
    # this returns true, which is okay, because 'Nicole' exists somewhere in the whole page
    self.assertIn(b'Nicole', result.data)
In my rendered Jinja2 template, this is what appears after logging in:
<h1 id="userTitle">{{ session['username'] }},Welcome!</h1>
I guess assertIn works well, but I just want to know how to test an HTML tag without running a browser test.
Although I didn't get a complete answer here, I managed to do the unit test with just assertIn, checking the contents of the page.
Thanks everyone
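For anyone who still wants the tag-level check: one browserless approach is to parse the response body with BeautifulSoup and assert on the specific element. A minimal sketch, assuming BeautifulSoup is installed and the template above:

from bs4 import BeautifulSoup

def test_users_login(self):
    result = self.app.post('/login',
                           data=dict(username='Nicole', password='abc123'),
                           follow_redirects=True)
    soup = BeautifulSoup(result.data, 'html.parser')
    # target the exact element instead of searching the whole page
    title = soup.find('h1', id='userTitle')
    self.assertIsNotNone(title)
    self.assertEqual(title.get_text(), 'Nicole,Welcome!')

This keeps the test at the unit level: no browser, just the rendered HTML from the Flask test client.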
I have been working on this for a couple of days now and cannot really figure out how to make it work. I am fairly new to ASPX websites and to fetching information out of them.
I am trying to log in/authenticate on a website that uses ASPX pages, so I followed this thread, which really helped me get things moving. (Last answer.)
Following those directions, I write:
url = "http://samplewebsite/Main/Index.aspx" # Logon page
username = "user"
password = "password"
browser = RoboBrowser(history=True)
# This retrieves __VIEWSTATE and friends
browser.open(url)
signin = browser.get_form(id='form1')
print(signin)
This is the outcome of that print statement:
<RoboForm __VIEWSTATE=/wEPDwULLTE5ODM2NTU1MzJkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBQlidG5TdWJtaXRriD1xvrfrHuJ/0xbQM08yEjyoUg==, __VIEWSTATEGENERATOR=E78488FE, adminid=, btnSubmit=, pswd=>
So it is obvious that I am retrieving the information correctly. Now I have 3 input fields:
adminid
btnSubmit
pswd
Which I can use in the following manner:
signin["adminid"].value = username
signin["pswd"].value = password
signin["btnSubmit"].value = "btnSubmit.x=29&btnSubmit.y=22"
My only problem is the last field, btnSubmit, which I do not know how to set a value for, since it is of the following type:
<input type="image" name="btnSubmit" id="btnSubmit" tabindex="3" src="../image/login_btn.gif" style="height:41px;width:57px;border-width:0px;" />
When I submit on the website, Chrome's developer tools show the following form data:
__VIEWSTATE:/wEPDwULLTE5ODM2NTU1MzJkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBQlidG5TdWJtaXRriD1xvrfrHuJ/0xbQM08yEjyoUg==
__VIEWSTATEGENERATOR:E78488FE
adminid:user
btnSubmit.x:23
btnSubmit.y:15
pswd:password
The x,y values are basically the position where I clicked on the image. I really do not know how to make this request through Python; I used this to no avail.
When you click on an input object of type image, two form values are set: the button name plus .x for the horizontal click coordinate, and the button name plus .y for the vertical one.
However, pressing Enter in a regular text input field will also submit a form, so you don't have to click on a submit button. I'd just leave the value empty altogether.
There is not much flexibility in the way robobrowser handles form submits; to avoid using the submit button you'd have to delete it from the form outright:
del signin.fields['btnSubmit']
before submitting.
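Putting that together, a minimal sketch of the no-button route (reusing the signin form from above):

del signin.fields['btnSubmit']   # drop the image button from the form
browser.submit_form(signin)      # POSTs adminid, pswd and the ASP.NET hidden fields

Whether the server accepts the POST without the btnSubmit.x/btnSubmit.y coordinates is site-dependent, but many handlers only care about the other fields.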
If you must submit using the image button, then you'll have to teach Robobrowser how to handle that type; currently it has no handling for these. The following adds that:
from functools import wraps
from robobrowser.forms import form
from robobrowser.forms.fields import Submit, Input

class ImageSubmit(Submit):
    def serialize(self):
        return {self.name + '.x': '0', self.name + '.y': '0'}

def include_image_submit(parse_field):
    @wraps(parse_field)
    def wrapper(tag, tags):
        field = parse_field(tag, tags)
        if type(field) is Input:  # not a subclass, exactly this class
            if field._parsed.get('type') == 'image':
                field = ImageSubmit(field._parsed)
        return field
    return wrapper

form._parse_field = include_image_submit(form._parse_field)
at which point you can use browser.submit_form(signin, signin['btnSubmit']) to submit the form and the correct fields will be included.
I've submitted a pull request to the robobrowser project to add image submit support.
I am trying to develop a Python script to scrape some information from a specific website, for learning purposes.
I went over a lot of different tutorials and posts trying to gather insights from them; they are very useful, but they still didn't help me find a way to log in to the website and run searches with different keywords.
I tried different libraries, such as requests and urllib; maybe I just didn't find the right way to use them.
The steps are as follows:
set up the login information
send the login information to the website and keep the response for later use
set up the keywords
import the headers
set up the cookie jar
run the search using the login response
When I run it, it only works some of the time. Here is the code:
import getpass

# marvin
# date: 2018/2/7
# login stage preparation
def login_values():
    login = "https://www.****.com/login"
    username = input("Please insert your username: ")
    password = getpass.getpass("Please type in your password: ")
    host = "www.****.com"
    # store login secrets
    data = {
        "username": username,
        "password": password,
    }
    return login, host, data
The following gets the HTML file from the website:
import requests
import random
import http.cookiejar
import socket

# Set up web scraping function to output the html text file
def webscrape(login_url, host_url, login_data, target_url):
    # static values preparation
    ## import header
    user_agents = [
        ***
    ]
    agent = random.choice(user_agents)
    headers = {'User-agent': agent,
               'Accept': '*/*',
               'Accept-Language': 'en-US,en;q=0.9;zh-cmn-Hans',
               'Host': host_url,
               'charset': 'utf-8',
               }
    ## set up cookie jar
    cj = http.cookiejar.CookieJar()

    # get the html file
    socket.setdefaulttimeout(20)
    s = requests.Session()
    req = s.post(login_url, data=login_data)
    res = s.get(target_url, cookies=cj, headers=headers)
    html = res.text
    return html
Here is the code to get each link from the HTML:
from bs4 import BeautifulSoup

# set up html parsing function for parsing all the list links
def getlist(keyword, loginurl, hosturl, valuesurl, html_lists):
    page = 1
    pagenum = 10  # set up maximum page num
    links = []
    soup = BeautifulSoup(html_lists, "lxml")
    try:
        for li in soup.find("div", class_="search_pager human_pager in-block").ul.find_all('li'):
            target_part = soup.find_all("div", class_="search_result_single search-2017 pb25 pt25 pl30 pr30 ")
            [links.append(link.find('a')['href']) for link in target_part]
            page += 1
            if page <= pagenum:
                try:
                    nexturl = soup.find('div', class_='search_pager human_pager in-block').ul.find('li', class_='pagination-next ng-scope ').a['href']  # next page
                except AttributeError:
                    print("{}'s links are all stored!".format(keyword))
                    return links
                else:
                    chs_html = webscrape(loginurl, hosturl, valuesurl, nexturl)
                    soup = BeautifulSoup(chs_html, "lxml")
    except AttributeError:
        target_part = soup.find_all("div", class_="search_result_single search-2017 pb25 pt25 pl30 pr30 ")
        [links.append(link.find('a')['href']) for link in target_part]
        print("There is only one page")
        return links
The test code is (login, host and values come from login_values()):
keyword="****"
myurl="https://www.****.com/search/os2?key={}".format(keyword)
chs_html=webscrape(login,host,values,myurl)
chs_links=getlist(keyword,login,host,values,chs_html)
targethtml=webscrape(login,host,values,chs_links[1])
There are 22 links in total and one page contains 19 links, so there should be more than one page; if "There is only one page" is printed, it indicates a failure.
Problems:
The login_values function is meant to protect my login information by wrapping everything into one final function, but apparently the username and password are still easy to reveal with a simple print() call.
This is the main problem! As I mentioned before, this method only works some of the time. By "not working" I mean that the returned HTML file is only the login page instead of the search results. I want better control so that it works most of the time. I printed the user agent each time to check whether it was the cause, and it was not. I cleared cookies, suspecting the storage was full, and that was not it either.
Sometimes I face a max-retries error or an OS error; I guess these come from the server I am trying to reach. Is there a way to set up a wait timer to prevent these errors from happening?
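On the last point: a requests Session already persists cookies between requests (so the manual cookie jar above is redundant), and retries with a wait can be mounted directly on the session. A minimal sketch; the retry counts and status codes are assumptions to tune:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

s = requests.Session()  # keeps cookies across the login POST and later GETs

# retry up to 5 times, sleeping roughly 1s, 2s, 4s, ... between attempts
retry = Retry(total=5, backoff_factor=1,
              status_forcelist=[429, 500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
s.mount("http://", adapter)
s.mount("https://", adapter)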
I am trying to use requests (python) to grab some pages from a website that requires me to be logged in.
I inspected the login page to check the names of the username and password fields, and found that they are not the standard 'username' and 'password' used by most sites; they look like ctl00$MainContent$txtEmailID and ctl00$MainContent$txtPassword.
I used them that way in my Python script, but each time I get a syntax error; Sublime Text even highlights part of the name in orange.
From this I know there must be some problem with the names, but trying to escape the $ signs did not help.
Even the login.aspx request disappears before Google Chrome can register it on the Network tab.
The site is www dot bncnetwork dot net
I'd be happy if someone could help me figure out what to do about this.
Here is the code:
import requests

def get_project_page(seed_page):
    username = "*******************"
    password = "*******************"
    bnc_login = dict(ctl00$MainContent$txtEmailID=username, ctl00$MainContent$txtPassword=password)
    sess_req = requests.Session()
    sess_req.get(seed_page)
    sess_req.post(seed_page, data=bnc_login, headers={"Referer": "http://www.bncnetwork.net/MyBNC.aspx"})
    page = sess_req.get(seed_page)
    return page.text
You need to use strings for the keys, the $ will cause a syntax error if you don't:
data = {"ctl00$MainContent$txtPassword":password, "ctl00$MainContent$txtEmailID":email}
There are even validation fields etc. to be filled in as well; follow the logic from this answer to fill them out. All the fields can be seen in Chrome's developer tools.
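For example, a hedged sketch of that logic with requests and BeautifulSoup (the login URL here is an assumption):

import requests
from bs4 import BeautifulSoup

LOGIN_URL = "http://www.bncnetwork.net/MyBNC.aspx"  # assumed login page

sess = requests.Session()
soup = BeautifulSoup(sess.get(LOGIN_URL).text, "html.parser")

# echo back every hidden field (__VIEWSTATE, __EVENTVALIDATION, ...) that ASP.NET expects
data = {tag["name"]: tag.get("value", "")
        for tag in soup.find_all("input", type="hidden") if tag.get("name")}
data["ctl00$MainContent$txtEmailID"] = "you@example.com"
data["ctl00$MainContent$txtPassword"] = "yourpassword"

resp = sess.post(LOGIN_URL, data=data, headers={"Referer": LOGIN_URL})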
I've been writing a program to log in to Facebook and update the status as a side project. I managed to get the program to log in. However, I'm having trouble selecting the textarea that is the "Enter your status here" box. Using "Inspect Element" in Chrome, I can see the form it's located under, but listing the forms in the program doesn't seem to list that form...
import mechanize
import re

br = mechanize.Browser()
usernamecorrect = 0
while usernamecorrect == 0:
    username = raw_input("What is the username for your Facebook Account? ")
    matchmail = re.search(r'[\w.-]+@[\w.-]+', username)
    if matchmail:
        print matchmail.group()
        usernamecorrect = 1
    else:
        print "That is not a valid username; please enter the e-mail address registered with your account.\n"
password = raw_input("What is the password for your account?")
print "Logging in..."
br.set_handle_robots(False)
br.open("https://www.facebook.com/")
br.select_form(nr = 0)
br['email'] = username
br['pass'] = password
br.submit()
raw_input("Login successful!")
print "Forms: \n"
for f in br.forms():
    print f.name
The full output is as follows:
What is the username for your Facebook Account? myemail@website.com
What is the password for your account? thisisapassword
Logging in...
Login successful!
Forms:
navSearch
None
I took another look through the source of Facebook via Inspect Element, and "navSearch" is the "Find people, things, etc." search bar, while the unnamed form appears to belong to the logout button. However, Inspect Element shows at least two more forms, one of which holds the status update box. I haven't been able to determine whether JavaScript is responsible (the status update box is encapsulated the same way as the navSearch and logout forms are). The most relevant thing I've found is that navSearch and the logout forms are in a separate div, but I somehow feel that shouldn't be much of a problem for mechanize. Is there just something wrong with my code, or is it something else entirely?
Is there just something wrong with my code, or is it something else entirely?
Your whole approach is wrong:
I've been writing a program to log in to Facebook and update the status
That’s what the Graph API is for.
Scraping FB pages and trying to act as a “browser” is not the way to go. Apart from the fact that FB policies do not allow it, you can see how difficult it gets on a page that uses JavaScript/AJAX so heavily.
Go with the API, it’s the easy way.
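For illustration, a minimal sketch of posting a status through the Graph API with requests; the token is a placeholder, and you need a user access token with the publish permission:

import requests

ACCESS_TOKEN = "..."  # placeholder: user access token with publish permission

resp = requests.post(
    "https://graph.facebook.com/me/feed",
    data={"message": "Hello from the Graph API", "access_token": ACCESS_TOKEN},
)
print(resp.json())  # contains the new post's id on success

No HTML scraping, no form hunting: the API returns plain JSON you can assert on.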