I am trying to create pages on my WordPress site automatically with a Python script. With mechanize, I log in and fill in the corresponding form on my dashboard, but I can't seem to publish. When submitting the form, I get an error page that I can't access manually. Its URL is http://example.com/wordpress/wp-admin/post.php?post=443&action=edit&message=10. However, the page seems to have been created, since it is in the drafts with the correct title and the correct content, and WordPress tells me that it is currently being edited by my bot's account. Here is the code I am using to fill and submit my form:
import mechanize

br = mechanize.Browser()
br.open("http://example.com/wordpress/wp-admin/post-new.php?post_type=page")
br.select_form(name="post")
br.form.set_value(title, name="post_title")  # title and content are set earlier in the script
br.form.set_value(content, name="content")
br.submit()
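For context, message=10 in that URL is the WordPress notice for "draft updated", which matches the page landing in the drafts: a bare submit() clicks the form's first submit control, which may well be "Save Draft" rather than "Publish". A minimal sketch of one thing to try, assuming the classic editor's publish button is a submit control named "publish" (an assumption, not verified against this WordPress version):

# Assumption: the classic editor exposes a submit control named "publish";
# clicking it explicitly should publish the page instead of saving a draft.
br.submit(name="publish")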
Thank you for your help,
Ryunos
I want to log in via a Python script to a website and do some operations there. The process will be:
Login to the website.
Press a specific button to get forwarded to the new page.
Get specific data from the forwarded page and do operations like putting values in fields and pressing the save button.
My problem is, I can't get access to the website.
The error message in PyCharm (IDE):
<div class="content-container"><fieldset>
<h2>401 - Unauthorized: Access denied due to invalid credentials.</h2>
<h3>The credentials provided do not authorize you to view this directory or page.</h3>
</fieldset></div>
I linked an image of the wanted website's login form: [screenshot: login window].
I am unsure whether I need an HTTP/S request or whether the login is done with JavaScript, because I have no knowledge of either.
I can reach some kind of success with this: [screenshot: result on the main page]. But it gives me only about 10% of the information I need, and since the raw output is hard to visualize, I can't really tell whether it is the page I expected.
I have used the requests module for this:
import requests
from requests.auth import HTTPBasicAuth

user_name = file[0]  # credentials read from a file earlier
password = file[1]
login_url = r"https://.../..."

response = requests.get(login_url,
                        auth=HTTPBasicAuth(user_name, password))
print(response.text)
What I have used:
PyCharm IDE
Python module Requests
The wanted website
I also tried to get it to work with the mechanize module, but I could not even log in to the website at all.
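That 401 page is IIS's standard "invalid credentials" response, so it is worth checking which authentication scheme the server actually asks for before assuming Basic auth is the right tool. A small sketch (the URL is a placeholder):

import requests

# Placeholder URL: an unauthenticated request reveals the scheme the
# server expects in its WWW-Authenticate response header.
resp = requests.get("https://example.com/protected")
print(resp.status_code)                      # expect 401
print(resp.headers.get("WWW-Authenticate"))  # e.g. "Basic", "NTLM", "Negotiate"

If the header names NTLM or Negotiate rather than Basic, HTTPBasicAuth will never succeed; a third-party helper such as requests-ntlm would be needed instead.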
I'm trying to write some code that downloads content from a Moodle website.
The first step was logging in, but from what I've tried so far, it seems as if I'm not actually being redirected to the post-login page (with the courses data etc.). Here's what I've tried:
import requests

user = 'my_username'
pas = 'my_password'
payload = {'username': user, 'password': pas}
login_site = "https://moodle2.cs.huji.ac.il/nu20/login/index.php?"  # actual login webpage
data_site = "https://moodle2.cs.huji.ac.il/nu20"  # should be the inner webpage with the courses etc.

with requests.Session() as session:
    post = session.post(login_site, data=payload)
    r = session.get(data_site)
    content = r.text
    print(content)  # doesn't actually contain the HTML of the main courses page (seems to be the login page)
Any idea why that might happen? Would appreciate your help ;)
It is difficult to help without knowing more about the specific site you are trying to log into.
One thing that's worth a try is changing
session.post(login_site, data=payload)
to
session.post(login_site, json=payload)
When the data parameter is used, the content-type header is not set to "application/json". Some sites will reject the POST based on this.
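A quick way to see the difference for yourself is to prepare both requests without sending them (example.com is a placeholder):

import requests

# Preparing a request without sending it shows which Content-Type
# header each parameter produces.
form_req = requests.Request("POST", "https://example.com", data={"a": 1}).prepare()
json_req = requests.Request("POST", "https://example.com", json={"a": 1}).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(json_req.headers["Content-Type"])  # application/json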
I've also run into sites which have protections against logins from scripts. They may require an additional token to be sent in the POST.
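Moodle is a case in point: recent versions embed a hidden logintoken input in the login form that has to be posted back along with the credentials. A sketch of scraping the hidden fields first (assumes beautifulsoup4 is installed; the field name comes from stock Moodle and may differ on a customized install):

import requests
from bs4 import BeautifulSoup

login_site = "https://moodle2.cs.huji.ac.il/nu20/login/index.php"

with requests.Session() as session:
    # Fetch the login page and collect every hidden input, which picks up
    # Moodle's "logintoken" (and anything else the form requires).
    login_page = session.get(login_site)
    soup = BeautifulSoup(login_page.text, "html.parser")
    payload = {i["name"]: i.get("value", "")
               for i in soup.select("form input[type=hidden]") if i.get("name")}
    payload.update({"username": "my_username", "password": "my_password"})
    session.post(login_site, data=payload)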
If all else fails, you could consider using selenium. Selenium allows you to control a browser instance programmatically. You can simply load the page and send text input to the username and password fields on the login page. This would also get you access to any content that is rendered client side via JavaScript. However, this may be overkill depending on your use case.
I am trying to create a program that will allow workers at a company to automatically add information to a digital noticeboard which is connected to a Raspberry Pi. They'll submit information on an online form and then a python-pptx enabled program will turn it into nicely designed PowerPoint slides.
I've managed to get a script that can enter the login information for my Microsoft forms account and print the session using:
import requests

print('starting')

# This URL will be the URL that your login form points to with the "action" tag.
POST_LOGIN_URL = '...'  # insert the URL for the Microsoft Forms login page with username

# This URL is the page you actually want to pull down with requests.
REQUEST_URL = '...'  # insert the URL you want on the Microsoft Forms page (responses)

payload = {
    'passwd': 'mypassword'  # 'passwd' is the Microsoft Forms variable name
}

with requests.Session() as session:
    post = session.post(POST_LOGIN_URL, data=payload)
    r = session.get(REQUEST_URL)
    print(type(r))
    print(r.text)
The types of r and r.text are:
print(type(r))
<class 'requests.models.Response'>
print(type(r.text))
<class 'str'>
Where REQUEST_URL is the URL of the results page for the form [screenshot: Microsoft Forms results page]. I then want to be able to automatically scrape the information from all the results, which is displayed on a page like this: [screenshot: results printed on the Microsoft Forms page].
My issue is then extracting the information from that URL. When I print r.text, I get information from the page, but it seems to be mostly HTML markup and encoded data (I could include the output of print(r.text), but it is several pages long and more confusing than anything).
I'm trying to find a way to reliably copy specific data from the Microsoft Forms webpage, but I currently don't know of a function that can do that. Does anyone have any experience with the Python requests library? Or has anyone tried anything like this before?
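For what it's worth, an HTML parser is the usual tool for pulling specific data out of r.text; a minimal sketch (assumes beautifulsoup4 is installed):

from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, "html.parser")
# Extract just the visible text from the markup. Microsoft Forms renders
# its results largely client side with JavaScript, so this may come back
# mostly empty; if so, a browser-driven tool such as selenium is needed.
print(soup.get_text(separator="\n", strip=True))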
Thanks,
Luke
I'm trying to use requests to query a website for the response to a form post, but the HTML I get back when using the site manually (filling out the form and clicking the submit button) is different from what I get in my response.text object when using requests to post to the site.
When posting the form manually, the site redirects back to the form page, with new text (some new <h#> and <ul> objects) showing below the form. However, with requests.post, my response.text object just gives me the content of the page as if I were doing a requests.get, suggesting to me the redirect when using the site manually is different from the redirect I get from requests.
Any idea how I can get my response.text to match up with what I see using the site manually? Or maybe the response object isn't even what I should use to get that text? My thoughts are maybe the website manually redirects to the same form page as a POST, and requests forces redirect as a GET, and I need to override this feature somehow?
Here's code I'm using:
import requests

get_resp = requests.get(url="https://example-site.com")
# It's a Django site so I need to get the csrftoken
csrf_token = get_resp.cookies['csrftoken']
post_resp = requests.post(
    url="https://example-site.com",
    data={"key1": "value1",
          "key2": "value2"},
    headers={"X-CSRFToken": csrf_token},
    cookies=get_resp.cookies,  # Django also checks the csrftoken cookie itself
)
print(post_resp.text)
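One way to test the redirect theory is to turn off automatic redirect following and look at what the server actually returns. Note that requests, like browsers, retries a POST as a GET after a 301/302 but preserves the method on 307/308. A sketch reusing the variables above:

# Sketch: disable automatic redirect handling to inspect the raw response.
post_resp = requests.post(
    url="https://example-site.com",
    data={"key1": "value1", "key2": "value2"},
    headers={"X-CSRFToken": csrf_token},
    cookies=get_resp.cookies,
    allow_redirects=False,
)
print(post_resp.status_code)              # 301/302 means the follow-up would be a GET
print(post_resp.headers.get("Location"))  # where the site redirects to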
Thank you for the help!
I'm using Python to scrape my school's webpage, but in order to do that I needed to simulate a user login first. Here is my code:
import requests, lxml.html

s = requests.session()
url = "https://my.emich.edu"
login = s.get(url)
login_html = lxml.html.fromstring(login.text)
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
form["username"] = "myusername"
form["password"] = "mypassword"
form["submit"] = "LOGIN"
response = s.post("https://netid.emich.edu/cas/login?service=https%3A%2F%2Fmy.emich.edu%2Fc%2Fportal%2Flogin", form)
response = s.get("http://my.emich.edu")
with open("result.html", "w") as f:
    f.write(response.text)
print(response.text)
I am expecting that response.text will give me my own student account page; instead, it gives me a login page. Can anyone help me with this issue?
BTW, this is not homework.
There are a few options here, and I think your requests approach can be made much easier by logging in manually and copying over the headers.
Use a python scripting package like http://wwwsearch.sourceforge.net/mechanize/ to scrape the site.
Use a browser emulator such as http://casperjs.org/. Using this you can basically do anything you'd be able to do in a browser.
My suggestion here would be to go to the website, log in, and then open the developer console and copy those headers/cookies into your requests headers/cookies. This way you can just hardcode the 'already-authenticated request' and it will work fine. Note that this method is the least reliable for doing robust, everyday scraping, but if you're looking for something that will be the quickest to implement and will work until the authentication runs out, use this method.
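A minimal sketch of that approach (every value here is a placeholder; copy the real cookie names, values, and headers from the browser's developer tools after logging in manually):

import requests

# Placeholders: grab the real values from the Network tab of the
# developer console once you are logged in.
cookies = {"JSESSIONID": "paste-session-cookie-value-here"}
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get("https://my.emich.edu", cookies=cookies, headers=headers)
print(response.text)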
Also, you need to request the logged-in homepage (again) after you successfully do the POST.