I'm trying to automate the process of creating an account for something, let's call it X, but I can't figure out what to do.
I saw this code somewhere,
import urllib
import urllib2
import webbrowser

# Build the query string and fetch the search-results page (Python 2)
data = urllib.urlencode({'q': 'Python'})
url = 'http://duckduckgo.com/html/'
full_url = url + '?' + data
response = urllib2.urlopen(full_url)

# Save the HTML to a file and open it in the default browser
with open("results.html", "w") as f:
    f.write(response.read())
webbrowser.open("results.html")
But I can't figure out how to modify it for my use.
I would highly recommend using Selenium WebDriver for this, since your task is UI- and browser-based. You can install Selenium with 'pip install selenium' in most cases. Here are a couple of good references to get started.
- http://selenium-python.readthedocs.io/
- https://pypi.python.org/pypi/selenium
Also, if this process needs to drive the browser headlessly, look into PhantomJS (via GhostDriver), which can be downloaded from the phantomjs.org website.
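As a rough illustration only: a sign-up flow with Selenium usually boils down to loading the page, filling the form fields, and submitting. The URL, field names, and locators below are placeholders that you would replace with the ones from X's actual sign-up page.

from selenium import webdriver

# All URLs and element names below are placeholders -- adjust them to the real sign-up form.
driver = webdriver.Firefox()  # or webdriver.PhantomJS() for a headless run
driver.get("https://example.com/signup")

# Fill in the form fields (the 'name' attributes are assumptions)
driver.find_element_by_name("username").send_keys("my_new_user")
driver.find_element_by_name("email").send_keys("me@example.com")
driver.find_element_by_name("password").send_keys("a-strong-password")

# Submit the form and close the browser
driver.find_element_by_name("signup").click()
driver.quit()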
Related
I want to scrape the comments from a fan page using this Python package, but something is going wrong.
Here is my code.
I thought it might be a proxy problem, so I tried a free proxy, but it still doesn't work.
import requests
from http import cookiejar
import facebook_scraper as fb

# Load the exported browser cookies and convert them to a plain dict
file = 'facebook.com_cookies.txt'
cookie = cookiejar.MozillaCookieJar()
cookie.load(file)
cookies = requests.utils.dict_from_cookiejar(cookie)
print(cookies)

# Point the scraper at a free proxy and hand it the cookies
fb.set_proxy('http://133.18.173.186:8080')
fb.set_cookies(cookies)

#post_url = ['https://www.facebook.com/animestory.animehk/photos/pcb.1035748480436588/1035747450436691/']
#for post in fb.get_posts('nintendo', options={'comments': True}):
#    print(post['text'][:50])
If I don't use facebook-scraper, what else can I do?
I have tried Selenium before, but the tags are very hard to locate.
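For reference, the commented-out loop above could be extended roughly like this to print the comments as well. This is only a sketch; it assumes the cookie/proxy set-up above has already run, and that each post dict exposes its comments under a 'comments_full' key when options={'comments': True} is passed -- check the facebook-scraper documentation for the exact field names in your version.

import facebook_scraper as fb

# Assumes fb.set_cookies(...) / fb.set_proxy(...) were already called as above.
# 'comments_full' and 'comment_text' are assumptions about the key names.
for post in fb.get_posts('nintendo', options={'comments': True}):
    print(post['text'][:50])
    for comment in post.get('comments_full') or []:
        print('  -', comment.get('comment_text', '')[:80])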
Good morning, everyone,
I want to create a script that automatically updates an issue on Redmine when someone makes a pull request on our GitHub, based on the pull request comment.
I wrote a Python script using Selenium and the Redmine REST API that retrieves the comment of a pull request on GitHub made by its requester, but I have to execute it manually.
Do you know if it is possible to execute a Python script automatically just after a pull request is made?
(Currently the script is stored on my computer, but ideally it will be stored on an external server so that my partners and I can use it more easily.)
I have looked at solutions based on webhooks or cron, but nothing seems to answer my problem.
I am using Python 2.7
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import test
# Xpath to retrieve number of the fix
DISCONNECTED_XPATH = "//div[4]/div/main/div[2]/div[1]/div/div[2]/div[3]/div[2]/div[1]/div[1]/div[2]/div/div[2]/task-lists/table/tbody/tr/td/p"
CONNECTED_XPATH = "//div[4]/div/main/div[2]/div[1]/div/div[1]/div[3]/div[2]/div[1]/div[1]/div[2]/div/div[2]/task-lists/table/tbody/tr/td/p"
PULL_URL = "https://github.com/MaxTeiger/TestCoopengo/pull/1"
# Init
print("Opening the browser...")
driver = webdriver.Firefox()
# Go to the specified pull
print("Reaching " + PULL_URL)
driver.get(PULL_URL)
assert "GitHub" in driver.title
print("Finding the pull comment...")
# retrieve the fix id
elem = driver.find_element_by_xpath(DISCONNECTED_XPATH)
issueID = elem.text
print("Closing driver")
driver.close()
issueID = int(issueID.split('#')[1])
print("Issue ID : " +str(issueID))
print("Updating ticket on RedMine...")
test.updateIssueOnRedMineFromGit(issueID, PULL_URL)
Thank you if you can help me or if you have a better solution to my problem
I finally found an answer to my problem and it turns out that the webhooks proposed by GitHub answer my problem (Repo > Settings > Webhooks).
Now, I just need to set up a server that calls my script when GitHub makes an HTTP POST request, but I don't know how to retrieve the URL of the wanted pull request.
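For what it's worth, a minimal webhook receiver could look like the sketch below. It assumes Flask is available and that the webhook is configured to deliver the pull_request event as JSON; GitHub then includes the pull request's link in the payload under pull_request -> html_url. The endpoint path and port are placeholders.

from flask import Flask, request

app = Flask(__name__)

# Placeholder endpoint -- point the GitHub webhook at this URL.
@app.route('/github-webhook', methods=['POST'])
def github_webhook():
    payload = request.get_json(force=True)
    # For 'pull_request' events GitHub puts the PR link in pull_request.html_url
    if payload and 'pull_request' in payload:
        pull_url = payload['pull_request']['html_url']
        print("Pull request received: " + pull_url)
        # ...call the existing update script with pull_url here...
    return '', 204

if __name__ == '__main__':
    app.run(port=8000)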
I'm having an issue when trying to run my script; any suggestions? When I run the script I receive a warning stating: "UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 13 of the file /home/maddawg/Scripts/lucky.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
soup = bs4.BeautifulSoup(res.text)" So when I try to run it from the command line it will not work; any advice?
#! /usr/bin/python3
# lucky.py - Opens several Google search results.
import requests
import sys
import webbrowser
import bs4

print('Googling...')  # display text while downloading the Google page
res = requests.get('http://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text)
LinkElems = soup.select('.r a')

# Open the first five results (or fewer) in browser tabs
numOpen = min(5, len(LinkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com/' + LinkElems[i].get('href'))
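As the warning itself suggests, naming the parser explicitly makes it go away; only the BeautifulSoup line needs to change:

# Explicitly name the parser, as the warning recommends
soup = bs4.BeautifulSoup(res.text, 'html.parser')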
I'm building a Django app and I'm using Spynner for web crawling. I have this problem and I hope someone can help me.
I have this function in the module "crawler.py":
import spynner

def crawling_js(url):
    br = spynner.Browser()
    br.load(url)
    text_page = br.html
    br.close  # (*)
    return text_page
(*) I tried with br.close() too
In another module (e.g. "import.py") I call the function in this way:
from crawler import crawling_js

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]
for url in l_url:
    mytextpage = crawling_js(url)
    # ... parse mytextpage ...
When I pass the first URL to the function everything works; when I pass the second URL, Python crashes. It crashes on this line: br.load(url). Can someone help me? Thanks a lot.
I have:
Django 1.3
Python 2.7
Spynner 1.1.0
PyQt4 4.9.1
Why do you need to instantiate br = spynner.Browser() and close it every time you call crawling_js()? In a loop this uses a lot of resources, which I think is the reason it crashes. Think of it like this: br is a browser instance, so you can make it browse any number of websites without closing and reopening it. Adjust your code this way:
import spynner

br = spynner.Browser()  # you open it only once

def crawling_js(url):
    br.load(url)
    text_page = br._get_html()  # _get_html() to make sure you get the updated html
    return text_page
Then, if you want to close br later, you simply do:
from crawler import crawling_js, br

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]
for url in l_url:
    mytextpage = crawling_js(url)
    # ... parse mytextpage ...

br.close()
Looking for a Python script that would simply connect to a web page (possibly with some query string parameters).
I am going to run this script as a batch job in Unix.
urllib2 will do what you want and it's pretty simple to use.
import urllib
import urllib2

params = {'param1': 'value1'}
# Note: passing the encoded params as the second argument makes this a POST request;
# for a plain GET, append them to the URL instead: "http://someurl?" + urllib.urlencode(params)
req = urllib2.Request("http://someurl", urllib.urlencode(params))
res = urllib2.urlopen(req)
data = res.read()
It's also nice because it's easy to modify the above code to do all sorts of other things like POST requests, Basic Authentication, etc.
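For instance, a Basic Authentication variant could look roughly like this (the URL and credentials are placeholders):

import urllib2

# Placeholder URL and credentials
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://someurl", "user", "secret")

opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
res = opener.open("http://someurl")
data = res.read()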
Try this:
import urllib2

aResp = urllib2.urlopen("http://google.com/")
print aResp.read()
If you need your script to actually function as a user of the site (clicking links, etc.) then you're probably looking for the python mechanize library.
Python Mechanize
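A rough sketch of what driving a site with mechanize looks like (the URL, form index, and field name are placeholders):

import mechanize

br = mechanize.Browser()
br.open("http://www.example.com/login")  # placeholder URL

# Select the first form on the page and fill in a field (the field name is an assumption)
br.select_form(nr=0)
br["username"] = "me"
response = br.submit()
print response.read()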
A simple wget called from a shell script might suffice.
In Python 2.7:
import urllib2

params = "key=val&key2=val2"  # make sure the params are in GET query-string format
url = "http://www.example.com"
html = urllib2.urlopen(url + "?" + params).read()
print html
More info at https://docs.python.org/2.7/library/urllib2.html
In Python 3.6:
from urllib.request import urlopen

params = "key=val&key2=val2"  # make sure the params are in GET query-string format
url = "http://www.example.com"
html = urlopen(url + "?" + params).read()
print(html)
More info at https://docs.python.org/3.6/library/urllib.request.html
To encode params into GET format:
def myEncode(dictionary):
    result = ""
    for k in dictionary:  # k is the key
        result += k + "=" + dictionary[k] + "&"
    return result[:-1]  # all but the last `&`
I'm pretty sure this should work in either Python 2 or Python 3...
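Note that the standard library already provides this: urllib.urlencode (Python 2) and urllib.parse.urlencode (Python 3) do the same thing and also escape special characters, which the hand-rolled version above does not:

# Python 2
import urllib
params = urllib.urlencode({'key': 'val', 'key2': 'val 2'})  # 'key=val&key2=val+2'

# Python 3
from urllib.parse import urlencode
params = urlencode({'key': 'val', 'key2': 'val 2'})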
What are you trying to do? If you're just trying to fetch a web page, cURL is a pre-existing (and very common) tool that does exactly that.
Basic usage is very simple:
curl www.example.com
You might want to simply use httplib from the standard library.
import httplib
myConnection = httplib.HTTPConnection('www.example.com')  # pass the host only, not a full URL
You can find the official reference here: http://docs.python.org/library/httplib.html
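A slightly fuller sketch of the request/response round trip with httplib (Python 2; the path and query string are placeholders):

import httplib

conn = httplib.HTTPConnection('www.example.com')
conn.request('GET', '/some/page?param1=value1')  # placeholder path and query string
resp = conn.getresponse()
print resp.status, resp.reason
data = resp.read()
conn.close()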