I am new in Python and web scraping, but I keep learning. I have managed to get some exciting results using BeautifulSoup and Requests libraries and my next goal is to log into a website that allows remote access to my heating system to do some web scraping and maybe extend its capabilities further.
Unfortunately, I got stuck. I have used Mozilla's Web Dev Tools to see the url that the form posts to, and the name attributes of the username and password fields. The webpage url is https://emodul.pl/login and the Request payload looks as follows:
{"username":"my_username","password":"my_password","rememberMe":false,"languageId":"en","remote":false}
I am using requests.Session() instance to make a post request to the login url and using the above-mentioned payload:
import requests
url = 'https://emodul.pl/login'
payload = {'username':'my_username','password':'my_password','rememberMe':False,'languageId':'en','remote':False}
with requests.Session() as s:
p = s.post(url, data=payload)
print(p.text)
Apparently I'm doing something wrong because I'm getting the "error":"Sorry, something went wrong. Try again." response.
Any advice will be much appreciated.
Related
Trying to log into the website kiphideways.com using Requesrs andI am having trouble logging in.
Without needing an account or password, is there any way to tell if I am missing anything from the payload?
LOGIN_URL = 'https://www.kiphideaways.com/login'
URL = 'https://www.kiphideaways.com/my-kip/account/'
I set the following for payload
payload = {'log':"myemail", 'pwd':"mypass"}
I then go do
with requests.Session() as s:
p = s.post(LOGIN_URL, data=payload)
r = s.get(URL)
I can’t log in as the my account page is not populated with my information.
Is there anything wrong with my payload?
By analyzing the POST request in Chrome tools, I see that the complete payload when trying to login through the website forms is:
log=test&pwd=test&rememberme=forever&wp-submit=Log+In&redirect_to=https%3A%2F%2Fwww.kiphideaways.com%2Fmy-kip%2F&mepr_process_login_form=true&mepr_is_login_page=true
Besides it, there are some cookies caught from the browser session. If you want to do the request externally, you should provide all of that.
Though, I can't replicate what happens when the account is good because the account creation seems to be paid :/
HELLO I'm now trying to get information from the website that needs log in.
But I already get 200 response in the reqeustURL where I should POST some ID, passwords and requests.
headers dict have requests_headers that can be seen in the chrome developer network tap. form data dict have the ID and passwords.
login_site = requests.post(requestUrl, headers=headers, data=form_data)
status_code = login_site.status_code print(status_code)
I got 200
The code below is the way I've tried.
1. Session.
when I tried to set cookies with session, I failed. I've heard that session could set the cookies when I scrape other pages that need log-in.
session = requests.Session()
session.post(requestUrl, headers=headers, data=form_data)
test = session.get('~~') #the website that I want to scrape
print(test.status_code)
I got 403
2. Manually set cookie
I manually made the cookie dict that I can get
cookies = {'wcs_bt':'...','_production_session_id':'...'}
r = requests.post('http://engoo.co.kr/dashboard', cookies = cookies)
print(r.status_code)
I also got 403
Actually, I don't know what should I write in the cookies dict. when I get,'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;',should I change it to dict {'wcs_bt':'AAA'.. }?
When I get cookies
login_site = requests.post(requestUrl, headers=headers, data=form_data)
print(login_site.cookies)
in this code, I only can get
RequestsCookieJar[Cookie _production_session_id=BBB]
Somehow, I failed it also.
How can I scrape it with the cookie?
Scraping a modern (circa 2017 or later) Web site that requires a login can be very tricky, because it's likely that some important portion of the login process is implemented in Javascript.
Unless you execute that Javascript exactly as a browser would, you won't be able to complete the login. Unfortunately, the basic Python libraries won't help.
Consider Selenium with Python, which is used for testing Web sites but can be used to automate any interaction with a Web site.
I'm trying to scrape multiple financial websites (Wells Fargo, etc.) to pull my transaction history for data analysis purposes. I can do the scraping part once I get to the page I need; the problem I'm having is getting there. I don't know how to pass my username and password and then navigate from there. I would like to do this without actually opening a browser.
I found Michael Foord's article "HOWTO Fetch Internet Resources Using The urllib Package" and tried to adapt one of the examples to meet my needs but can't get it to work (I've tried adapting to several other search results as well). Here's my code:
import bs4
import urllib.request
import urllib.parse
##Navigate to the website.
url = 'https://www.wellsfargo.com/'
values = {'j_username':'USERNAME', 'j_password':'PASSWORD'}
data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
the_page = response.read()
soup = bs4.BeautifulSoup(the_page,"html.parser")
The 'j_username' and 'j_password' both come from inspecting the text boxes on the login page.
I just don't think I'm pointing to the right place or passing my credentials correctly. The URL I'm using is just the login page so is it actually logging me in? When I print the URL from response it returns https://wellsfargo.com/. If I'm ever able to successfully login, it just takes me to a summary page of my accounts. I would then need to follow another link to my checking, savings, etc.
I really appreciate any help you can offer.
I am planning to write a website crawler in Python using Requests and PyQuery.
However, the site I am targeting requires me to be signed into my account. Using Requests, is it possible for me to establish a session with the server (using my credentials for the site), and use this session to crawl sites that I have access to only when logged in?
I hope this question is clear, thank you.
Yes it is possible.
I don't know about PyQuery but I've made crawlers that log in to sites using urllib2.
All you need is to use cookiejar to handle cookies and send the login form using a request.
If you ask something more specific I will try to be more explicit too.
LE:
urllib2 is not a mess. It's the best library for such things in my opinion.
Here's a code snipet that will log in to a site (after that you can just parse the site normally):
import urllib
import urllib2
import cookielib
"""Adding cookie support"""
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
"""Next we will log in to the site. The actual url will be different and also the data.
You should check the log in form to see what parameters it takes and what values.
"""
data = {'username' : 'foo',
'password' : 'bar'
}
data = urllib.urlencode(data)
urllib2.urlopen('http://www.siteyouwanttoparse.com/login', data) #this should log us in
"""Now you can parse the site"""
html = urllib2.urlopen('http://www.siteyoutwanttoparse.com').read()
print html
trying to authenticate to a website and fill out a form using requests lib
import requests
payload = {"name":"someone", "password":"somepass", "submit":"Submit"}
s = requests.Session()
s.post("https://someurl.com", data=payload)
next_payload = {"item1":"something", "item2":"something", "submit":"Submit"}
r = s.post("https://someurl.com", data=next_payload)
print r.text
authentication works and i verified that i can post to forms but this one i am having problem with gives The action could not be completed, perhaps because your session had expired. Please try again
Attempted in urllib2 and same result -- dont think its an issue with a cookie.
I am wondering if javascript on this page has something to do with giving session error? other form page doesnt have any javascripts.
Thanks for your input...