scraping data from webpage with python 3, need to log in first

scraping data from webpage with python 3, need to log in first - python

I checked this question but it only has one answer and it's a little over my head (just started with Python). I'm using Python 3.
I'm trying to scrape data from this page, but if you have a BP account, the page is a lot different/more useful. I need my program to log me in before I have BeautifulSoup get the data for me.
So far I have
from bs4 import BeautifulSoup
import urllib.request
import requests
username = 'myUsername'
password = 'myPassword'
from requests import session
payload = {'action': 'Log in',
'Username: ': username,
'Password: ': password}
# the next 3 lines are pretty much copied from a different StackOverflow
# question. I don't really understand what they're doing, and obviously these
# are where the problem is.
with session() as c:
c.post('https://www.baseballprospectus.com/manageprofile.php', data=payload)
response = c.get('http://www.baseballprospectus.com/sortable/index.php?cid=1820315')
soup = BeautifulSoup(response.content, "lxml")
for row in soup.find_all('tr')[7:]:
cells = row.find_all('td')
name = cells[1].text
print(name)
The script does work, it just pulls the data from the site before it's logged in, so its not the data I want.

Conceptually, there is no problem with your code. You're using a session object to send a login request, then with the same session you're sending a request for the desired page. This means that the cookies set by the login request should be kept for the second request. If you want to read more about the workings of the Session object, here's the relevant Requests documentation.
Since I don't have a valid login for Baseball Prospectus, I'll have to guess that something is wrong with the data you're sending to the login page. A quick inspection using the 'Network' tab in Chrome's Developer Tools, shows that the login page, manageprofile.php, accepts four POST parameters:
username: myUsername
password: myPassword
action: muffinklezmer
nocache: some long number, e.g. 2417395155
However you're sending a different set of parameters, and specifying a different value for the 'action' parameter. Note that the parameter names have to match the original request exactly, otherwise manageprofile.php will not accept the login.
Try replacing the payload dictionary with this version:
payload = {
'action': 'muffinklezmer',
'username': username,
'password': password}
If this doesn't work, try adding the 'nocache' parameter too, e.g.:
'nocache': '1437955145'

Related

how to use python requests to login to website

Im trying to login and scrape a job site and send me notification when ever certain key words are found.I think i have correctly traced the xpath for the value of feild "login[iovation]" but i cannot extract the value, here is what i have done so far to login
import requests
from lxml import html
header = {"User-Agent":"Mozilla/4.0 (compatible; MSIE 5.5;Windows NT)"}
login_url = 'https://www.upwork.com/ab/account-security/login'
session_requests = requests.session()
#get csrf
result = session_requests.get(login_url)
tree=html.fromstring(result.text)
auth_token = list(set(tree.xpath('//*[#name="login[_token]"]/#value')))
auth_iovat = list(set(tree.xpath('//*[#name="login[iovation]"]/#value')))
# create payload
payload = {
"login[username]": "myemail#gmail.com",
"login[password]": "pa$$w0rD",
"login[_token]": auth_token,
"login[iovation]": auth_iovation,
"login[redir]": "/home"
}
#perform login
scrapeurl='https://www.upwork.com/ab/find-work/'
result=session_requests.post(login_url, data = payload, headers = dict(referer = login_url))
#test the result
print result.text
This is screen shot of form data when i login successfully

This is because upworks uses something called iOvation (https://www.iovation.com/) to reduce fraud. iOvation uses digital fingerprint of your device/browser, which are sent via login[iovation] parameter.
If you look at the javascripts loaded on your site, you will find two javascript being loaded from iesnare.com domain. This domain and many others are owned by iOvaiton to drop third party javascript to identify your device/browser.
I think if you copy the string from the successful login and send it over along with all the http headers as is including the browser agent in python code, you should be okie.

Are you sure that result is fetching 2XX code
When I am this code result = session_requests.get(login_url)..its fetching me a 403 status code, which means I am not going to login_url itself

They have an official API now, no need for scraping, just register for API keys.

Logging in to site with Python

I'm trying to use Python to scrape a website, but I have to login first before I can get to the page with the data on it.
The URL for the login page is:
https://tunein.com/account/login/?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier
I have read numerous threads which seem to answer the question, but I'm struggling to relate it to my own situation.
The code I have (from a response in this thread) is:
import requests
# Fill in your details here to be posted to the login form.
payload = {
'Username': 'user',
'Password': 'password'
}
# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
p = s.post('https://tunein.com/account/login/?returnTo=https://amplifier.tunein.com/sessions/new&source=amplifier', data=payload)
# print the html returned or something more intelligent to see if it's a successful login page.
print p.text
I have looked at the source code to see what the name of the form fields are, hence the 'Username' and 'Password' attributes in the payload variable.
When I run the script, p.text just returns the HTML of the same page so it obviously isn't logging in correctly. Any suggestions? Is there a better way to do it?
Edit:
The "Form Data" headers once I log in are:
Username:user
Password:pass
Remember:true
Remember:false
btnLogin:Sign In
ReturnTo:https://amplifier.tunein.com/sessions/new
Source:amplifier
Does this mean I have to add all of these to my payload variable?

Trying to access a password protected url using python

I've had problems accessing www.bizi.si or more specifically
http://www.bizi.si/BALMAR-D-O-O/ for instance. If you look at it without registering you won't see any financial data. But if you use the free registration, I used username: Lukec, password: lukec12345, you can see some of the financial data. I've used this next code:
import urllib.parse
import urllib.request
import re
import csv
username = 'Lukec'
password = 'lukec12345'
url = 'http://www.bizi.si/BALMAR-D-O-O/'
values = {'username':username, 'password':password}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url,data,values)
resp = urllib.request.urlopen(req,data)
respData = resp.read()
paragraphs = re.findall('<tbody>(.*?)</tbody>',str(respData))
And my len(paragraphs) is zero. I would really be grateful if anyone of you would be able to tell me how to access the page correctly. I know that the length being zero isnt the best indicator, but also the len(respData) if I use values as stated in my code or if I take it out of my code is the same, so I know I have not accessed the page through username, password.
Thank you for you're help in advance and have a nice day.

There are two issues here:
You are not using POST, but GET for the request.
There is no <tbody> element in the HTML produced; any such tags have been automatically added by your browser, do not rely on them being there.
To create a POST request, use:
req = urllib.request.Request(url, data, method='POST')
resp = urllib.request.urlopen(req)
Note that I removed the values argument (those are not headers, the third positional argument to Request() and you don't pass in a data argument when using a Request object.
The resulting HTML returned does not necessarily include the same data that is sent to a browser; you probably need to maintain a session here, return the cookies that the site sets.
It is far easier to do this with better tools such as the requests library and BeautifulSoup (the latter lets you parse HTML without having to resort to regular expressions), which can be combined with the robobrowser project to help you fill and submit forms on websites.
Note however that the page forms and state are managed by ASP.NET JavaScript code and are not easily reverse-engineered even by robobrowser. When you log in with a browser (which has run the JavaScript code for you), the POST looks like this:
{'__EVENTTARGET': ['ctl00$ctl00$loginBoxPopup$loginBox1$ButtonLogin'],
'__VSTATE': ['H4sIAAAAAAAEAO18zXPbSJKvBYmkPixLdrfVs909EkbTPZa7JYrfH27bsyAJSRA/QJEUbXNmVg8kIAoSCNAASFmanXd7t+4XsaeNjY2NcEfsfU8bsZeek3V8t72+/+D9D/t+CYAU9WW7pydidzZMm0AhKzMrMysrqwqV4n+Mzf1s3Le4t5c1dNs0NKuivOypplI2LDsjtY7yysne3oLP92XL1kKhL9xrVZHM1gEn9yW9pcjhL1rNWqUidhXd9+CdaFnNsBTZt/JOxIxmtI6A+dXbMT1Iy1b7iu9X74Nb5n0PR/Fa3YOipOqDe9bQvij2NFutq8pxeG5O3pd9/Dvws0anK+knOcWWVM3KmGjxQLFyki2FL/Oa802vhRPRZCoejsfmFpjFOxsmTK7odjWfE3JW5P+O3Rq7devWf+BDd/pMUOF/Vk8sW+kE0Z6mQF1Dt4Kbiq6YaitYUC37f4R/8xsPRdDtaGSV7Vgtw9TU5ipbV0wLBE9iwRD9W2WzEKpnKk90pWebkrbKlntNTW2ht2vGkaI/aSaTUrwVT4TT0ZgSSqV/97txiODfU8He8u1Z6qkyudd3uQZu3ZqcnJxigDDmfefoYQLfSYdu5DOzwOzPkdr3N7maCQdTTM94NdXWFN9sRtI6ksnKQQP/ZMKWF27T5R7jI7pAC44Ka/n+bcxFXWW7pqGe9g1ZP5RWWdtsG31Vl1hVZy3bMFW7r3jcVtm8Kpvqm++UvsT2oK7ERmKZVTYaCoXYrKIdKkE2J/XffOdSdyRbdcpn39tKX9WOwL1rWJrR11Wq30crOhBUQGXJPnLuh4p9KLH9AaLSYSULnQOBe2xTPVWDlhqUGT80WWJ83485ra6ystfqSEvXtX4E1aUjW12FZpLds0CoHEp9HWN1lcWQWWVPpb5yKulKCyU2l6uvXpWS7KUeGDKVZKNJ5jgCa6mr2uQImmLrBnBN4813qmbIzDRZfaKmvLJ9k0FPBZmZIQ3GfX4C0PNt93nCNnuKzMy6TysHtt19tL5+fHw8oFxX9X3D7Egt9VBZ76rWkQGR1mXmjkvxr9OPrZapdm3WPukqT5ZtNLsOFSUXuvx0dprFRzZavQ5sGjxG/yqavrJ8gey3l+l+u8xaZgtwT6BTpQO791U5qEuHsiMX9F8vVRuDirVoMBwNRoOH1q/DwcRvl58+Xma/ZpfXXX5Plx9+MzvtytKHt3akLvuE1Xua9s05sH10FabCe69C4cUYBxfh+z3dGeSsdWAcF6Vu2YRYyvFKS9KhlSCvsi3DMOXn3v0F+t4wOqtsz0QnujTE9CH7e5ffiKhZhwWa+2LlwS8fQK0Bz4ffnOOq++zKEDdIA37lIfvkCRt6eI5DH1NBINGH1qDPOZmqIz5t1YoFNLa8/M0FlIvq0ueywRwxrhjMJb9osCuMlWN2pDNXrlMQmEFJlsuS3oDhvOlvpVTdK3OlhigW92ovynzk4RX5LrIObuZXLnbE5TYsxc7CVRVzpX3kdtJlM/9ipLveadyBQS6JIQAs2Kq18nsiwph/tC9pFkIL4VcV+9FyV9WX8ajLyqtHYfYPD6815yWurlCO4L93OD2iC+KGKbXbUlNTBq0cGJgM3IfrWJOh+T6sUHBiIVgutxB/jyDQ0M9XOgOHZY8hpXEcNLCiWHH8eXmvqUn6EUbde3LvGD1LIZlGWhhp4IuVZVnt/0aV/+bJA1q3FKQTzGV7zm2vjtnVMPcqhmGTV2CKV8wHv1t+GGwdqJpsQqaHQeXlSuhhULJttGXTxIV2lsumeiSd/VFlEbWvRphrhHcc0LOxJ9xV5eA/VYVmfEVeoeg6QPnD7PTjQTiSmXk3fs4goCOm9iUQyMxdZ8KsI0Cjqw5khnGR/m7smiDb1aDoMtvSJMt6stxqrUm6pJ3YastaftqV2oo3WQf3IE9dgto93VTaZHBTkaHbUJZ3BPBWSwqaiqCr9soIkcyMY4qfZMbouk9i+xxJfBOmccwuME6Fsxag5QOuiz+TerbRwtIBE5ayZilmX20pa/AWX+KaWcert1CgJWBQsjqv1rmsuyDEAqRj6DLaoIY/3lc1KJWVbKWN5YNikVdF/nn8yqrsNi1yfp3pWXAQyzpfZIXfd5FFwctZ2v3D2Hus7ThZatlSMJNTzeCg0eAzpRmsogCKoCfxyYYjvwB+q+xPlO5tyz4s7+J/gky/+R2ZDevF0YVhwHkYLnejfwJf4jojL/gdT/nC9ZcRr0HR8f3xOnYfTGAUjqHjX5ylxTotcjZURZN9E2JZqC7eIWBd0nqKC2WE3OLM3i8ImjF6utyW5xZuM2MMw4wzE4yP8TMBZpKZYqaZGeb2wu1539SmWOFL29U852NC6fa8bxIA4ew1PYfDeL6/vbsp1htbYq7EsdWCWOdLwjYhJwk5L1ZEFzkUxfO9klircMROXMtXOI9NCDV3xAxXKI1CI4DeFasVPlfa5l3GrhQpVEyVxVyFq7sAwpwsi8XdivtMYk1VubowkDsGwFwVJOBUFqsDugTRNbjhc7y9MLYvM1OXTbs06fOLmT2hyC9N+XxUyi1Nt+X5+/8+dus2t50T62ev33wrlDg/V+befMtPZvhCTShlBRRKfE7I1+5mhGqtImQ5VsywVbFWECYyBT7nyxTEPO/PiFsQdSoDW9VLQPJloGx2OlPhSG0wnMhUctxUpsI3xDrqJ1E6+0HI8oEsV8qLdc6X5Qvb/P0sX8njzqIjBr3GFycJSlz9TkGcphtsuc3lp7JCJb9b4Er8zJtvgX/2WqxnhXsogwP+e53HF6cIJBbRyGSOhyIlIR/ICXXoyvlyYmabn8a1AtOidT8VS9x9ukG0NQC3xDq7WeFyfHbKhYL+bk4ssOWKwBa2dzOFba4kEMSVma2J5QLUm8yJxbMfuAIfyIlgWeemqM9FYuXP7ZYLfP4jV0+OrXNVagl8+GnXQR1DorjN38GVcEC2KZa4GedRcCSa88plvlYh3ScIOLVZEatgv80HttC76Nc7WyJ6Ya1aEJz+8cPZYRjcKtu7hckt9FMNKt3ZqpSpH9byYoOcQchVMBIYYXNeKAjkmuzAC+Zgu9Kbb91+IjmFhljgJuHmGDzQe5tv8CAQp7Z3K2evyZP8ea6IJnx5rsQVpvPEGhLVxUAe46Yi5OguoG0MOIJv8748+qsUcK45Dk9lvvJxXqzWMPp4ciKngyt5cAaQLwCjQVQ0Omfz7hh1xOOm8jDnD3ydhIA0ebS5W9oUa2C62yhy/gINWNGPvuMqtUCBmqsDKtSg/NSgdwFAqSRO4laDTJVAQcT2mc/exn0N3bpJwNmCEyxYeAIMeNt5QueINTF/DyMBXZ0lscuiY3lfYZcGW2E374yKqSK3ffYaaub9RS4PXwoUYRkMqknc89S7gSKfq4s5VPClTf7sNe4YiXkOdxpO3KdFIV/g6mc/UCNueOGLLLnV7r2igNbXRg3oA6jETTtXONCbbyeLAjwEnna3KOYrYgmK1So8YgHULsKQNJLnncKIjweKcBY46awbvRAcMjDtRHEX9i1x+YIYKHENDggzJYwmz1+mURbZIoaiGKA4CAeZAjux5ninWCmKZz9MilWhQM+BMl91BkNZACau9Tw3g1gIaWqVGp+nUFrgt+BdiKG5Ol/Ic1OkMwZ3iQ+g1ODhmwibNXG7BG5QSSzkJnHH7hzWRaHOwfUmyrXd7UB5NytCnOkKBcK1jQpXDNDIx4NzR9Uc7hSlEAnJxoSKCFrYhoj38FziL4Qe1DYcB9zm71XAZhBDnegu3EGnUDsUFDBeAujvksPHvY84y3wF/uY4V7XAIThz0y7A8YuK64u+yu7Za36+yjt2Gwror/JFjLZAlXdGP+4ISggK7py07YW2e8MpajjM7w5APPSBtFl+ukrd5TjbZFUswMR17iOa4gSaQc5tEoCbYazxuBM691m1HmQRoshIbN2ZZoOeL2xNVet8jWMh0c+pJKDktHEZ76/cWoQUYZsaohY50kj4bLTmEtVttw5jFiwnz15zBbLYXchWRNDkK2v1Cvwiw6GKumK34hRqiEUzTmGbr5T4bReIBlxgZRfDtDZJcSOL4HCbChuI0JgNuWnnwXGHj85eF8n1nXlimy+g9a0BjJxjMIbmHVitJJJaZS4vfDQEODMMhSJhCn199ppGrx83TBu+GofgEKgJFGE5f00sFIXSZK2SIWfkAzR0MdJvY7YTaX1AU4yvVkFkRxV68M2307VdzIROxwUwbOAK/CzuCCgUM8hDnCeepQjpIoGQ3/bVhRxf9NeFMno/UBeciOJHYIIquG3T7FrHUCWHrldoYHKT9d0GTQ3cXINz5rTBGPA3uHrlzbf+Bo0N3u/MltlptFLgGyAUAijWAPNB4oowTVd3jTFz9sMu6ooZxEtax4yFx8Pp+FhkPBxL4Zsei46H46Gx2Fh8LIFSeCw5lhpLoxTBsg+LMyYcZcIxPAMvnWDCcSaMa5IJp5gwocWYSAi3OBMB43iCiYBzPMlEokwEVQADhMfUeCSEW5qJhpgoYaaYKGGCRQL0CYASESYaY6JxJppgokkmCow0EwsxMaqLMrEIE8M1xsTiTCzBxJJMLMXEiD7GxMEinUQxzsTDDEQnaRMJBtJBMkgFidBiPM0kQkwizKApcEzEIFSKAVECqieSDP3HM3imU0ySxMItzCQjTDLKJGGERJpJxscjYUieDDFJUKUBSTJJ4KWZVIhJhZlUhElFmVSMScWZFDDAIJVkUikmlWbSISYdZtJEHmHSUTSPRsA7Dd7JGJOO4xZnYOZ0koEIaUhCGCFYIASiELRCo5FQGl+CERnUDqGZEGhDVEavJvEFTjgM2jDwSN4waKkfk4CnQqAnGHQJkRSAp4CXIhj4h6ke/QCSFJGBbRgsqb/R2RFiSb1NXR1BfYTsCBEiEIE0iVAZNLABSYC+D0dAGkU5BRQyCrkA9b/TBFCjJAHwoqiPUj3B6JmkxTcGWvKDGPktaGLATxOMuoDgwCezpqMffz3+72Pt/4YfWWY+G2xYJs43LIyYw2aFcXcqcwtzN+3xmFnmDoP6ed9YCFsg2oqNhb091FjE21yNRb3921jc202hB50NE0qRkLeVwnDztn3knt5GjaKIt3Mkt/Z2e+MRgjr7x/G4A6VmJkBGZWpgujR43S/5MILcndn4ULuc+GfULjbQLj3ULj3QLpYeaJdOD7VzgK52DtTVjqCudjGCutqlnSI140un3Ycb1fv51R3/cMuvop+XBuo7e/tRA/jODbDgw143Z3SkN9+pgzMRKI02Z2q9w3MIiTRXVM6+l/QRPDJDoKSc6pJuwB60v8+7daylsLrCyoZmvPlOgtHI835xo0T+0S5Z8GPrnzN7Z39sKrrEapJl66ouuWLNNyRLaV4Ak2z3GpJMFBcqSLx5V+hRMFn0bs48+6PUvwCnHvx8eG4ksQZrNDX1SB2gKKQi1Pil+0IuIOpZeqfru0uvswtGW9WLoNRWHn4jMw+gKpCWZq6rBY+HczdXM1+/rXLtHZzD76iPv6M+/bbGv3lb5ZPRwHIj3uh7kokCX6Oh6d6dlyTjE5iUonSJ0CV8BYDQPI5A5o3wC1EsQG61VxWGnnWDCPP3ubvqaVOhszT2iK6meriwtraGtThWYlg0O/s2jgVosmoqsmX3pY9zhtZ2TgnhNJYH5EqK0UW5o9jKCJhOS+UhusLSGWYfdwwFiw5JWVM67SiHdFSbEkfoDUs3+pf46JJmOCOBBhRwO3RKCmdczI00MDj0VIboH49WQ8FDCY3dz5vw7MtKfDVQkV2xjnpd8KbDVEO2e5BQ6T+kp64J2x0a/gbYt5WlET7XNb54oX5wNDwUYz4HZc0RCVYu4L/FWp+LzT7CjWHZKgzM9lXT6HtclH7Aiz1zomPGYZxKVJRTxew7R7CXe6YrWTe1tTBqQWPY7icXZD2HX9ThLXw/JkcTGnmuwe5Wy/yW42YLZe+MmlVPPYNJ+qH06ZtvVdJ1cIKNXvG6QuGq5AxKpymxbB8WcI1ySv0zTBCAcciDFNk4MtBDEgk2zB1QPkGMbavnR/yDRubLA4ghO4Avq7ZpnH2P5pua1HbP6jFkJG0k6UDpzw6xZEWT5kW5q1rquWi3uY5h2uopdYO0cKlpr6GlnHc+z67DDu0edLtgjnsb54f7nrB3z0Eekzsebw9h1n30Kr/ISX2wB1c3FeDIUcGQyW8V8JCl/pE6sPogW4CVNNWT5+6wnwa5D8Wy0UdPQAJjkLpgoBtIaEVvKX06HAMuDOcYhtQ9YHsm24fvmJh2e7p1pHTwgNYOe5+Rc2xgh0kv/AQWTsIVaGdGPvJzqjuHkPOcvS6J1ZpTi1lcpnluYPBPLj07h3RSN+jnM0Itx91xb9R7XVj2Y9Ir6OE4oitt6Wfo0aDTm5cqPr8oSU6EFENBlhzrSbCeTvEMLXvjEPGlIvIPr9YPBjDVcw+/dpiL/IaQFbD7ZzdFehfPVdwGyiK9cBhE5082DYuma9Phc+4ol0yFfTV20UMJP9lSbRPhwVD2gyxm+L7quujncPwulgvkCFcqR2PaNdVfUYvVmlgmyQrOK156NUEtOt2ZpddprtCfV12be+sKDBln6DphNPhglA9Zte6KLZTqPO5DJp8OmLhzhsNB1fuKZQcfXTJgnisL2NXTi6WymK8INYehy1vYYumFUrXG14nrohdCg7AlTYwOW7QQdPooeJ+7dQuTcIqm41SMLnG6JOiSpEuKLmlc0iG6hOlC03aaKNJEkSaKNFGkiSJNFGlQYDFPlzBdIrfoGqVLjC5xuiTokqRLii5EEiaSMJHQ2iBMqwTsTulCFGGiCBNFmJAjhBwh5IjDP0LYEcKOEHaEsCOEHSH+MaeC6KJEFyW6KDUSJbIo1UaJLEpkUSKLElmUKGJEESOKGFHEaDfJ/Tl2buNvW339+h3ruvHhdmFipDzcOnjnV7LsLKwCo0u5CxuMycVxTbMOFsdty2gvTlBTeLAULfJ//oLz+AafucHD5MzAZCOqL90+WJptL905WJr7b6it73wTufhZAVFe2zDMjuXmYoBBVZe66G478v+Y61MDvsmop+pPSQv4Xz86LYCyHQaJvMGhzKvsT5DkHSkAX75H++9z5P/L9+DjHvFTGgtt7CcOHJek5MQDU9n3fXVd2oeXALpOySyQfM3JDl2XmYnhMF8gh/Z7I943tzCxOOVkrR3YHc23kjF0KG7TXqRFO5JDWtSzfUM7NFBwJ78joxNcmmi7wkAsN9lgYnFc0uxL+ayIDGbL8S3nzcSlZNfLEWhiGIdcvtNu5uxMSeqrbSw5d03N94trtP596A/rHeNQWaP11brsJbYs3fd9dJ6I86hvqPJKiPafNwRQxiNavqEFi95zdGBRc92NxQ72r27AHmx61hTdsBWimPAorksSdTSQbLWlG2vewlJZvxCBXCvNeglBsHagZhhaTe36vqoZLJYlIPJ6y01TMlVT0tUO2+saptR0e20xwOuUxSYfLExca4Wlu9fajIxzzYwx6eQqu17lukDA0Fs3vBO5kTW8aGLpZz+GwkcGgCXXr7FkhisUucpabk1cEy93gjuEprzMsarARkOJaCwUjQzTe6ejkWQilIqEQiEnqdd/gwyQd5uy9YpSt6ZaK/FIOJmMr7JYX4RTCOkPLvj5317JwmbZSIzN/K2TDP0A3Cba5wk7/3vssdWVdJbiDzZr3SfL6EtFsdckWTYVy1p+ei27zON1Inu6yl4mhxGwoltrGbKy/JRSvz3MK4ia0ULQsE+WnzqCeWiyF31GREy5lF7qXPfA0JXlET42ZgwX+BTrQDYWi6aHvD7yIhlzf5TpBMy80JFUzTYe9UwVG46/RrjVgrpiLzKmPtotwylqUuhIbSck/MPYNX7g1Dpzh4SYGJSs7qtf81w8Z9RfSFs7DelZ1nxVTm8YKTWX07ZKqV5L4Pu6JdXCm19G9kNWOpLrtyPdlxX7OHlUfN5/tnMQK0RCRyf5Qqa0vXWSNFMprcyVQ9Hq/g6vlEFUV7a04+dNM5o62Dmx0qEvo7Tu+mREUTcAHI6Iq9oqJRuvK6/sYaYeJC0dcx0uF+IEjjO4HEf3I46PcZXjTJPjd7iddqbB8RZXce5weO7ifYfL7HA8z3mv+/1e247/j8wmE+54xuM9X5GzaPrBADx0t/V9xXY2Uwr9FcCRBo+j12WIL26nstF0cDpLcwT+d1S9ZxvsEza8mk6z/G5lECDRzPlC0nGetbIX39gLg4TFPgg+zWIzo7CWQZvOfkt55E4dlIRGOW9Zo6fb+8P16vjAmZzlqhOjpofB4et3BYeReO5aI/CnzBpeNBkLUwwZybsbCw3/KmAs4maJ+b0o6oj9GTM28rIUgE99//S+fryfyti93ZIVfl7aifV3hC6c79gyT+qvpFNzu26VI1uyoZ4C2nhVFjp8vXGYKHOJZq2W2ai+iAKeOYjtb0RzTW278+zLSLNla6WdyE5DK5mb3czLaP7Fq3gx+ywXTbafx8yTrNl2fXn83HX8rhLMwsRFJf7+fZVISfqRHVeeZaXaYcnaiYXr9m7+RVnd2Hq2vxtN7lZ4K90/aeT6z05CnNnfCGdyiX6ef3lY2slkmvvlnePYfnkzcxp/GRcqpWoumkqWas3CK6hT3Xx28CpXTSbqWw11s7fVFa6Iz1wQ+h/fV+hSezPx/MR61T1uFnLF3Xi+3KxGQzCo8MyM67WadLCVr3R60mFms7gFScIHx+Uyt//ypVqt5NvPZCVVfl4vHhQSae2oEI9xG+rRSymhylrkJN8Bfr5eq2Or2+TkTG2n265YVwSfvsHu//KTVIhYfe7woFJoHh/WMulQNHsgmhnIk9isSRsHr7q70ecbcSu78aIC++7z7dP6zm69lTosV0PNciLXPmk8z0nJbVAkX+zU+6excvHgefKkqecytRehbvfVgcaVGr3Obo2M9bxsvTx6W3/8NCeSd7XnYvE01NjdN3dCfSUuHJe4rrHxMnwi5huV/ahy/PzFaTO0D3EbL3byL8KJTKOBEdHXzNSpnjN1e4vPp4uHuVOjpeVf9iKv7FziVepKX9CAvzOYv65dJlwfM8Z/NIXvR1MEfjTFzLlis+dFb/KnwuiLAgDv+r582w4EC/I1Aq5fWDxeeelwcfV//fng/P3Z8b6lnCcXD7OKr08nHuYRX5NAfClz+GrK8Hmu8DBJ+Dw7+HJa8DAf2JWQCaUp3yWUZEJRJhyixJdQignhCmCMCSWYUPzjr8dnL7zYuXhW7Teae2pHodRhKsmD1GFi/yF7+EP28Ifs4Q/Zwx+yhz9kD3/IHv6QPfwXmj1MS5kPCcQfEog/JBD/BX6c3crEyIbO3dz9Se/ORk4cB6dbo3++OrH0V6NvfhfHe6qsDM5RHHzfyPEIvRbZp/dSI69Krj3l9jun3LfoJzy8w5Thrvfi853B32Y77xiZQWFiUPAPtsa+C/tbel/J9Ns+f1pupiMx2X1T472CXZyrS5oqS3S2uWkave4QbfBmYXEE4rsCcd6TDA+E/AOLOW9w/wmBV9JYXeqw7m+ssDr9uTrbV958B+jZ9+cneMHHTXP96cXfXhh86oRpm4res3XV++EVesfr7fDpkKlpKqddTXJynkZOm87Pmt7G/y1VmXfxZTXp4Mh4dANjWe17hxIPmj1NU2zrwQ0NOeg97S21DoamPq0e0DHa6C/dWH3jkC0bpr2vaIf00yMOgmodSZp+nk7qvEM/VA9Yq2sa9BsJRhNGtDSFMu86Drp+2Hu8jibeLURdlVVW7kvDVEtpJEnZE6336C2GHf3cnNxGIju1yiB7ThlJOHyHrI/Xb7Tn43X0zNPhcP3grOwHZ/0v76wB9x0n+65jpJEshovh+Huf6OR0jnog5UBbNm6ukzgn9l3DVo4044OXf/Dy/9yQ/MFhPzjsf3mHff+w7HeXyg76j8qV8063Pr32Z5UoZ2Dd2wS4x/7+H/3bf4GLvyX4/wE1p0V+k1QAAA=='],
'ctl00$ctl00$SearchAdvanced1$ActivitiesAndProductsSearch1$RadioButtonList1': ['TSMEDIA '
'dejavnost'],
'ctl00$ctl00$SearchAdvanced1$DropDownListYearSelection': ['2013'],
'ctl00$ctl00$SearchAdvanced1$SteviloZaposlenihDo': ['do'],
'ctl00$ctl00$SearchAdvanced1$SteviloZaposlenihOd': ['od'],
'ctl00$ctl00$SearchAdvanced1$ddlLegalEvents': ['0'],
'ctl00$ctl00$loginBoxPopup$loginBox1$Password': ['lukec12345'],
'ctl00$ctl00$loginBoxPopup$loginBox1$UserName': ['Lukec'],
'ctl00_ctl00_ScriptManager1_HiddenField': [';;AjaxControlToolkit, '
'Version=3.5.40412.0, '
'Culture=neutral, '
'PublicKeyToken=28f01b0e84b6d53e:sl:1547e793-5b7e-48fe-8490-03a375b13a33:475a4ef5:effe2a26:3ac3e789:5546a2b:d2e10b12:37e2e5c9:5a682656:12bbc599:1d3ed089:497ef277:a43b07eb:751cdd15:dfad98a5:3cf12cf1'],
'hiddenInputToUpdateATBuffer_CommonToolkitScripts': ['1']}
That's a lot more information than a simple username / password combination.
See post request using python to asp.net page for approaches on how to handle such pages instead.

Logging with python into a subpage (from another post)

I came across this question: How to use Python to login to a webpage and retrieve cookies for later usage?
So, I'm trying to log into a page, (using the request method, second answer).
When I print the HTML using
print request.text
It will print the HTML of the login page, but not the subpage that I put on request.
Is there a problem with the code (which I don't think) or is it mine?
The code is similar to the one on that question, with different pages and usernames.
Thanks!
from requests import session
USERNAME = 'myuser'
PASSWORD = 'mypwd'
payload = {
'action': 'login',
'username': USERNAME,
'password': PASSWORD
}
with session() as c:
c.post('https://www.bricklink.com/login.asp', data=payload) #Login page
request = c.get('http://www.bricklink.com/orderExcelFinal.asp?') #Page I want to access
print request.headers
print request.text
Output
HTML code for the Login page, but not the page I want to access

Your code isn't sending the correct data on the login request.
Each web page is different, and sends different data in order to log in. Yours should be structured like this:
from requests import session
USERNAME = 'myuser'
PASSWORD = 'mypwd'
query = {
'logInTo': '',
'logFolder': 'p',
'logSub': 'w',
}
payload = {
'a': 'a',
'logFrmFlag': 'Y',
'frmUsername': USERNAME,
'frmPassword': PASSWORD,
}
with session() as c:
c.post('https://www.bricklink.com/login.asp', params=query, data=payload) #Login page
request = c.get('http://www.bricklink.com/orderExcelFinal.asp') #Page I want to access
print request.headers
print request.text
In the future, when you need to work out what data needs to be send on an attempt to submit a form, you should use Chrome or Firefox's Developer Tools. Use these to record your login attempt, and then structure the data accordingly. Getting started with using Chrome's developer tools is a bit beyond the scope of this answer, but there are lots of good resources on the web for finding out how you get this information.

How to "log in" to a website using Python's Requests module?

I am trying to post a request to log in to a website using the Requests module in Python but its not really working. I'm new to this...so I can't figure out if I should make my Username and Password cookies or some type of HTTP authorization thing I found (??).
from pyquery import PyQuery
import requests
url = 'http://www.locationary.com/home/index2.jsp'
So now, I think I'm supposed to use "post" and cookies....
ck = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
r = requests.post(url, cookies=ck)
content = r.text
q = PyQuery(content)
title = q("title").text()
print title
I have a feeling that I'm doing the cookies thing wrong...I don't know.
If it doesn't log in correctly, the title of the home page should come out to "Locationary.com" and if it does, it should be "Home Page."
If you could maybe explain a few things about requests and cookies to me and help me out with this, I would greatly appreciate it. :D
Thanks.
...It still didn't really work yet. Okay...so this is what the home page HTML says before you log in:
</td><td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_email.gif"> </td>
<td><input class="Data_Entry_Field_Login" type="text" name="inUserName" id="inUserName" size="25"></td>
<td><img src="http://www.locationary.com/img/LocationaryImgs/icons/txt_password.gif"> </td>
<td><input class="Data_Entry_Field_Login" type="password" name="inUserPass" id="inUserPass"></td>
So I think I'm doing it right, but the output is still "Locationary.com"
2nd EDIT:
I want to be able to stay logged in for a long time and whenever I request a page under that domain, I want the content to show up as if I were logged in.

I know you've found another solution, but for those like me who find this question, looking for the same thing, it can be achieved with requests as follows:
Firstly, as Marcus did, check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields. In his example, they are inUserName and inUserPass.
Once you've got that, you can use a requests.Session() instance to make a post request to the login url with your login details as a payload. Making requests from a session instance is essentially the same as using requests normally, it simply adds persistence, allowing you to store and use cookies etc.
Assuming your login attempt was successful, you can simply use the session instance to make further requests to the site. The cookie that identifies you will be used to authorise the requests.
Example
import requests
# Fill in your details here to be posted to the login form.
payload = {
'inUserName': 'username',
'inUserPass': 'password'
}
# Use 'with' to ensure the session context is closed after use.
with requests.Session() as s:
p = s.post('LOGIN_URL', data=payload)
# print the html returned or something more intelligent to see if it's a successful login page.
print p.text
# An authorised request.
r = s.get('A protected web page url')
print r.text
# etc...

If the information you want is on the page you are directed to immediately after login...
Lets call your ck variable payload instead, like in the python-requests docs:
payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'
requests.post(url, data=payload)
Otherwise...
See https://stackoverflow.com/a/17633072/111362 below.

Let me try to make it simple, suppose URL of the site is http://example.com/ and let's suppose you need to sign up by filling username and password, so we go to the login page say http://example.com/login.php now and view it's source code and search for the action URL it will be in form tag something like
<form name="loginform" method="post" action="userinfo.php">
now take userinfo.php to make absolute URL which will be 'http://example.com/userinfo.php', now run a simple python script
import requests
url = 'http://example.com/userinfo.php'
values = {'username': 'user',
'password': 'pass'}
r = requests.post(url, data=values)
print r.content
I Hope that this helps someone somewhere someday.

The requests.Session() solution assisted with logging into a form with CSRF Protection (as used in Flask-WTF forms). Check if a csrf_token is required as a hidden field and add it to the payload with the username and password:
import requests
from bs4 import BeautifulSoup
payload = {
'email': 'email#example.com',
'password': 'passw0rd'
}
with requests.Session() as sess:
res = sess.get(server_name + '/signin')
signin = BeautifulSoup(res._content, 'html.parser')
payload['csrf_token'] = signin.find('input', id='csrf_token')['value']
res = sess.post(server_name + '/auth/login', data=payload)

Find out the name of the inputs used on the websites form for usernames <...name=username.../> and passwords <...name=password../> and replace them in the script below. Also replace the URL to point at the desired site to log into.
login.py
#!/usr/bin/env python
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
payload = { 'username': 'user#email.com', 'password': 'blahblahsecretpassw0rd' }
url = 'https://website.com/login.html'
requests.post(url, data=payload, verify=False)
The use of disable_warnings(InsecureRequestWarning) will silence any output from the script when trying to log into sites with unverified SSL certificates.
Extra:
To run this script from the command line on a UNIX based system place it in a directory, i.e. home/scripts and add this directory to your path in ~/.bash_profile or a similar file used by the terminal.
# Custom scripts
export CUSTOM_SCRIPTS=home/scripts
export PATH=$CUSTOM_SCRIPTS:$PATH
Then create a link to this python script inside home/scripts/login.py
ln -s ~/home/scripts/login.py ~/home/scripts/login
Close your terminal, start a new one, run login

Some pages may require more than login/pass. There may even be hidden fields. The most reliable way is to use inspect tool and look at the network tab while logging in, to see what data is being passed on.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

scraping data from webpage with python 3, need to log in first - python

Related

how to use python requests to login to website

Logging in to site with Python

Trying to access a password protected url using python

Logging with python into a subpage (from another post)

How to "log in" to a website using Python's Requests module?

Categories

Resources