I'd like to modify a wiki page (Confluence by Atlassian - JIRA editors) programmatically (in Python). What I have tried so far is to simulate user behaviour:
1. click on the Edit button
2. change the content of the textarea input
3. submit the changes with the Save button
Step 1 is OK since I have the URL corresponding to an edit of the page, and step 2 (retrieval of the page and modification) is OK too, but I don't know how to achieve step 3... I'm using urllib2.
Thanks for your help!
EDIT: XML-RPC is indeed the solution; this example does exactly what I want!
# write to a confluence page
import xmlrpclib
CONFLUENCE_URL = "https://intranet.example.com/confluence/rpc/xmlrpc"
CONFLUENCE_LOGIN = "a confluence username here"
CONFLUENCE_PASSWORD = "confluence pwd for username"
# get this from the page url while editing
# e.g. ../editpage.action?pageId=132350005 <-- here
PAGE_ID = "132350005"
client = xmlrpclib.Server(CONFLUENCE_URL, verbose=0)
auth_token = client.confluence2.login(CONFLUENCE_LOGIN, CONFLUENCE_PASSWORD)
page = client.confluence2.getPage(auth_token, PAGE_ID)
# and write the new contents
page['content'] = "!!!your content here!!!"
result = client.confluence2.storePage(auth_token, page)
client.confluence2.logout(auth_token)
Note that Confluence modifies your HTML when you do this. It strips out scripts, styles, and sometimes title attributes on elements, for example. To get that content back in, you need to use their macro code.
The easiest way to handle this is to edit the page in Confluence until it looks the way you want, then grab the page and do a print page['content'] to see what magical new stuff the Atlassian people have decided to do to standard HTML.
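For instance, reusing the client and auth_token from the snippet above:
# fetch the page again after a manual edit in the Confluence editor
page = client.confluence2.getPage(auth_token, PAGE_ID)
print page['content']  # shows the markup Confluence actually stored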
This seems like absolutely the wrong way to go about it.
First off, Confluence has a plugin architecture which should allow you to manage content programmatically from the application itself without any kind of HTTP requests. Secondly, even if you don't want to, or can't, use the plugin API for some reason, the next obvious option is to use the SOAP/XML-RPC API.
There is no reason to actually mess with buttons and textareas unless you're writing some kind of end-to-end test that includes the GUI (e.g. automated cross-browser testing).
Related
Given a website like e.g. http://www.barchart.com/historicaldata.php, is there a way to fill in the text box and then click the submit button to download the data?
I'm used to using urllib to download entire pages, but I can't seem to figure out how to submit text into the text box and then click the button from my script.
There are two paths I can think of:
Selenium
It's possible to directly simulate filling in data and clicking the button using a great library called Selenium WebDriver.
Using Selenium, you can open up a programmatic browser session and do all manner of things that a user would do. Combined with a headless (ghost) browser, this can be done behind the scenes in a browser-independent way (useful if this is going to run on a server, which won't have Chrome installed).
While an awesome library (it's fantastic for testing web pages), Selenium requires learning quite a bit. It's the right tool if you specifically want to perform the actions of filling out and clicking, but I think there might be an easier way to accomplish what you're trying to do, using Python's requests.
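To give a flavour, a minimal Selenium sketch might look like this (the field name and submit selector are assumptions, not taken from the actual barchart.com page):
from selenium import webdriver

driver = webdriver.Firefox()  # any WebDriver-supported browser works
driver.get("http://www.barchart.com/historicaldata.php")
# 'start_date' and the submit selector are hypothetical placeholders
driver.find_element_by_name("start_date").send_keys("2014-01-01")
driver.find_element_by_css_selector("input[type=submit]").click()
print driver.page_source  # the page after the click
driver.quit()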
Requests
Python's requests library is a simpler tool for requesting data from pages. You can use it to submit a GET request (what the browser does when just visiting the page) or a POST request (what the browser sends your form data in after you click submit).
To know which fields you want to send off data to, look at the page's HTML for each form field, and grab the "name" attribute.
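For example, a quick sketch with BeautifulSoup (assuming the form is plain HTML rather than built by JavaScript):
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.barchart.com/historicaldata.php").text
soup = BeautifulSoup(html)
# list every form field's name attribute; these are the keys for your POST data
for field in soup.find_all(["input", "select", "textarea"]):
    print field.get("name")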
If it weren't for the fact that your content seems to be paywalled, you could accomplish this pretty easily. For example, let's say your form has three fields to fill in, with name attributes 'start_date', 'end_date', and 'type'. You could accomplish this with the following:
import requests

url = "http://www.barchart.com/historicaldata.php"
# the field names match the hypothetical form described above; the values are placeholders
r = requests.post(url, data={
    'start_date': '2014-01-01',
    'end_date': '2014-12-31',
    'type': 'daily',
    # ... plus any other fields the form requires
})
with open("~~DESIRED FILE LOCATION~~", "wb") as f:
    f.write(r.content)
Because of the paywall, you'll have to log in first, and retain that session data. I defer explanation of how to do that to this excellent answer
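The gist is that requests can persist cookies across calls with a Session object; a minimal sketch, assuming a hypothetical login endpoint and field names:
import requests

session = requests.Session()
# the login URL and field names here are assumptions, not barchart.com's real ones
session.post("http://www.barchart.com/login",
             data={'username': 'me', 'password': 'secret'})
# the session keeps the login cookies, so this request is authenticated
r = session.post("http://www.barchart.com/historicaldata.php",
                 data={'start_date': '2014-01-01', 'end_date': '2014-12-31', 'type': 'daily'})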
EDIT:
Possibly one more thing to note, regarding where you should submit your data. The URL you should submit your POST data to might be the same as the barchart URL you gave, but it also might not be. To find out, look at the "action" attribute of the HTML form element itself. Nine times out of ten, that's where the data is getting submitted. If the site does something wonky with JavaScript, you might have to open up a console and examine where exactly the data is sent upon submission. But that bridge can be crossed if/when needed.
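In the same vein, a short sketch for pulling out that action attribute (again assuming a plain HTML form):
import requests
from bs4 import BeautifulSoup

html = requests.get("http://www.barchart.com/historicaldata.php").text
soup = BeautifulSoup(html)
form = soup.find("form")
print form.get("action")  # POST your data here (it may be a relative URL)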
As we all know, in web applications we have the GET method and the POST method.
My problem is with POSTing data.
For example, I want my Python code to access a website's search bar, insert some values, submit them (click the website's button), and then check the resulting page.
What would that code look like, and is there any documentation about these Python concepts?
I am totally confused.
Note: I am just a beginner in Python.
If the website relies on JavaScript, you're going to need to use something like Selenium, which will emulate a typical browser and allow you to insert information onto a page and execute JavaScript commands.
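For example, a minimal sketch (the URL and the search bar's name attribute are placeholders):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://website")  # placeholder URL
box = driver.find_element_by_name("q")  # use the search bar's real name attribute
box.send_keys("value")
box.submit()  # submits the form the search bar belongs to
print driver.page_source  # the results page
driver.quit()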
If, however, the search bar simply posts data to a URL, you can determine that URL and then use requests to post the data and retrieve the result:
import requests
resp = requests.post('http://website/search', data={'term': 'value'})
I'm trying to use Ghost.py to do some web scraping. I'm trying to follow a link, but Ghost doesn't seem to actually evaluate the JavaScript and follow it. My problem is that I'm in an HTTPS session and cannot use redirection. I've also looked at other options (like Selenium), but I cannot install a browser on the machine that will run the script. There is also some JavaScript evaluation further on, so I cannot use mechanize.
Here's what I do...
## Open the website
page,resources = ghost.open('https://my.url.com/')
## Fill textboxes of the form (the form didn't have a name)
result, resources = ghost.set_field_value("input[name=UserName]", "myUser")
result, resources = ghost.set_field_value("input[name=Password]", "myPass")
## Submitting the form
result, resources = ghost.evaluate( "document.getElementsByClassName('loginform')[0].submit();", expect_loading=True)
## Print the link to make sure that's the one I want to follow
#result, resources = ghost.evaluate( "document.links[4].href")
## Click the link
result, resources = ghost.evaluate( "document.links[4].click()")
#print ghost.content
When I look at ghost.content, I'm still on the same page and result is empty. I noticed that when I add expect_loading=True while trying to evaluate the click, I get a timeout error.
When I try to run the JavaScript in a Chrome Developer Tools console, I get
event.returnValue is deprecated. Please use the standard
event.preventDefault() instead.
but the page does load the linked URL correctly.
Any ideas are welcome.
Charles
I think you are using the wrong methods for that.
If you want to submit the form, there's a special method for that:
page, resources = ghost.fire_on("loginform", "submit", expect_loading=True)
Also there's a special ghost.py method for performing a click:
ghost.click('#some-selector')
Another possibility, if you just want to open that link, could be:
link_url = ghost.evaluate("document.links[4].href")[0]
ghost.open(link_url)
You only have to find the right selectors.
I don't know which page you want to perform the task on, so I can't fix your code, but I hope this helps.
I come from a world of scientific computing and number crunching.
I am trying to interact with the internet to compile data so I don't have to. One task is to auto-fill searches on Marriott.com so I can see what the best deals are, all on my own.
I've attempted something simple like
import urllib
import urllib2
url = "http://marriott.com"
values = {'Location':'New York'}
data = urllib.urlencode(values)
website = urllib2.Request(url, data)
response = urllib2.urlopen(website)
stuff = response.read()
with open('test.html', 'w') as f:
    f.write(stuff)
My questions are the following:
How do you know how the website receives information?
How do I know a simple "Post" will work?
If it is simple, how do I know what the names of the dictionary should be for "Values?"
How do I check if it's working? The write lines at the end are an attempt to see if my inputs are working properly, but that is insufficient.
You may also have a look at splinter, for cases where urllib may not be useful (JS, AJAX, etc.).
For finding out the form parameters, Firebug could be useful.
You need to read and analyze the HTML code of the relevant site. Every browser has decent tools for introspecting the DOM of a site and analyzing the network traffic and requests.
Usually you would use the mechanize module for performing automated interactions with a website. There is no guarantee that this will work in every case. Nowadays many websites use AJAX or more complex client-side programming, making it hard to "emulate" a human user using Python.
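A minimal mechanize sketch (the form index and field name are placeholders you would have to determine from the page):
import mechanize

br = mechanize.Browser()
br.open("http://marriott.com")
br.select_form(nr=0)  # pick the right form by index (or by name)
br["fieldname"] = "New York"  # use the form field's real name attribute
response = br.submit()
print response.read()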
Apart from that: the marriott.com site does not contain an input field "Location"... so are you guessing URL parameters without having analyzed their forms and functionality?
What I do to check is use a web-debugging proxy to view the request you send.
First send a real request with your browser and compare that request to the request your script sends, then try to make the two requests match.
What I use for this is Charles Proxy.
Another way is to view the HTML file you saved (in this case test.html) in your browser and compare it to the actual response.
To find out what the dictionary should have in it, look at the page source and find the names of the form fields you're trying to fill. In your case, "Location" should actually be "destinationAddress.destination".
So look in the HTML code to get the names of the form fields; that is what should be in the dictionary. Both Google Chrome and Mozilla Firefox have tools to view the structure of the HTML (e.g. Inspect Element in Google Chrome).
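Putting that together with your original snippet, a corrected sketch might look like this (whether that one field is sufficient for the Marriott form is an assumption):
import urllib
import urllib2

url = "http://marriott.com"
values = {'destinationAddress.destination': 'New York'}  # name taken from the form's HTML
data = urllib.urlencode(values)
response = urllib2.urlopen(urllib2.Request(url, data))
with open('test.html', 'w') as f:
    f.write(response.read())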
For more info on urllib2, read here.
I really hope this helps :)
I have an app that uses HtmlWindow and would like to migrate it to the new WebView found in wx 2.9. However, I have learned there is no built-in method to pass a JavaScript variable from the web page back to the Python code. There is a RunScript method that allows one to send JavaScript to the page, but no method to retrieve an element id on a user click or any other user interaction.
My question is: is there any workaround for this? Is there any way to intercept, say, an alert call, or anything else, and get the data? The WebView display is not of much value if one cannot receive data from user interaction.
As far as I'm aware the only way to get a return value from RunScript() is to use the page title hack.
e.g. somewhere in RunScript you set document.title to the value you wish to retrieve and read it into Python with GetCurrentTitle(); if you wish, you can reset the title after you have retrieved the data.
So if self.html is the WebView:
self.html.RunScript("""
//javascript goes here
// variable we wish to retrieve is called return_value
document.title = return_value
""")
r = self.html.GetCurrentTitle()
If you want to initiate it from within the WebView, it can be done (as suggested in the link Robin posted) by handling wxEVT_COMMAND_WEB_VIEW_NAVIGATING so that when it receives a custom URL scheme, e.g. retrievedata://data/...., it retrieves the data from the URL and does whatever you want with it, making sure you call evt.Veto() at some point. You can then pass any data you wish by simply calling a suitable URL from within JavaScript.
Totally untested code below (just to give you an idea of how it can be done):
def OnPageNavigation(self, evt):
    url = evt.GetURL()
    if url.startswith("retrievedata://data/"):
        data = url[len("retrievedata://data/"):]
        evt.Veto()
        # do whatever you want with the data
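For completeness, the event binding and the JavaScript side might look like this (the binding uses wxPython's wx.html2 module, and the variable return_value is assumed to exist in the page):
import wx.html2

# somewhere in your setup code: route navigation events to the handler above
self.html.Bind(wx.html2.EVT_WEBVIEW_NAVIGATING, self.OnPageNavigation)

# later: hand a value to Python by navigating to the custom scheme
self.html.RunScript(
    "window.location = 'retrievedata://data/' + encodeURIComponent(return_value);")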
Alternatively, you could use a combination of the two ideas and create a single URL that, when accessed, triggers a GetCurrentTitle() call; just make sure you set document.title before calling the URL.
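A rough sketch of that combination (untested, same caveats as above):
def OnPageNavigation(self, evt):
    # the page's JavaScript sets document.title, then navigates to this trigger URL
    if evt.GetURL() == "retrievedata://titleready":
        evt.Veto()
        data = self.html.GetCurrentTitle()
        # do whatever you want with the data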
There was recently some discussion on the wx-users mail list and a suggestion for a workaround for things like this. I haven't tried it myself, but you may find it useful. See https://groups.google.com/d/topic/wx-users/Pdzl7AYPI4I/discussion