I was using the wikipedia module, which lets you fetch the information Wikipedia has on a topic. When I run the code it is unable to connect because of a proxy; when I connect the PC to a proxy-free network, it works. The same thing happened while using the Beautiful Soup module for scraping. I have tried setting an environment variable like http://username:password@proxy_url:port, but when I run the code in IDLE it doesn't work. Please help.
It worked:

import os

# format is http://user:pass@host:port
os.environ["HTTPS_PROXY"] = "http://user_id:pass@proxy:port"
If you don't want to store your password in the code file:

import os

pxuser = "your.corporate.domain\\your_username"
pxpass = input(f"Password for {pxuser}: ")  # getpass.getpass() would hide the typed password
env_px = f"http://{pxuser}:{pxpass}@your_proxy:port"
os.environ["HTTPS_PROXY"] = env_px
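To confirm the variable is actually picked up, a quick test request through the proxy should succeed. A minimal sketch using the requests library (an assumption; anything that honors HTTPS_PROXY works), with placeholder proxy details:

import os

import requests  # $ pip install requests

os.environ["HTTPS_PROXY"] = "http://user_id:pass@proxy_host:port"  # placeholder values

# requests reads HTTPS_PROXY from the environment automatically
response = requests.get("https://en.wikipedia.org/")
print(response.status_code)  # 200 means the request went through the proxy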
I have been working on a Python script intended to help moderate a subreddit using 'warnings'. However, I am unable to get it to work: when the script reaches the point where it is supposed to use the API, the Python window just closes. I have tried simpler scripts, but none of them work. Here is my simpler file (I used it for testing):
import praw

# Get credentials from DEFAULT instance in praw.ini
reddit = praw.Reddit()

# beginning of script
# login
reddit = praw.Reddit(client_id='this is where i put my user agent',
                     client_secret='this is where i put my secret',
                     password='this is where i put my password',
                     user_agent='PARS (Python-based Advanced Reddit Reprimand System) by u/veryinterestingnut',
                     username='PicoModBot')

print(reddit.user.me())
And here is my praw.ini file:
[DEFAULT]
# The URL prefix for regular requests.
reddit_url=https://www.reddit.com
# The URL prefix for short URLs.
short_url=https://redd.it
client_id=my id was here
client_secret=this is where i put my pass
user_agent=PARS (Python-based Advanced Reddit Reprimand System) by u/veryinterestingnut
username=PicoModBot
password=my account pass was here
What can I do to try and fix this? Any help would be much appreciated.
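A console window that closes instantly almost always means an uncaught exception. As a first diagnostic, a sketch like the one below (same placeholder credentials as above) catches the error and holds the window open so the traceback can be read:

import traceback

import praw

try:
    reddit = praw.Reddit(client_id='your client id here',
                         client_secret='your secret here',
                         password='your password here',
                         user_agent='PARS by u/veryinterestingnut',
                         username='PicoModBot')
    print(reddit.user.me())
except Exception:
    traceback.print_exc()            # show why the script died
    input("Press Enter to exit...")  # keep the console window open

Alternatively, running the script from an already-open Command Prompt leaves the traceback visible with no code changes.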
Good morning, everyone,
I want to create a script that automatically updates an issue on Redmine whenever someone makes a pull request on our GitHub, based on the pull request comment.
I wrote a script in Python using Selenium and the Redmine REST API that retrieves the comment of a pull request on GitHub made by its requester, but I have to execute it manually.
Do you know if it is possible to execute a Python script automatically just after a pull request?
(Currently the script is stored on my computer, but ideally it will be stored on an external server so that my partners and I can use it more easily.)
I have looked at solutions based on webhooks or cron, but nothing seems to answer my problem.
I am using Python 2.7.
from selenium import webdriver

import test  # local module with the Redmine REST API helpers

# XPaths to retrieve the number of the fix
DISCONNECTED_XPATH = "//div[4]/div/main/div[2]/div[1]/div/div[2]/div[3]/div[2]/div[1]/div[1]/div[2]/div/div[2]/task-lists/table/tbody/tr/td/p"
CONNECTED_XPATH = "//div[4]/div/main/div[2]/div[1]/div/div[1]/div[3]/div[2]/div[1]/div[1]/div[2]/div/div[2]/task-lists/table/tbody/tr/td/p"
PULL_URL = "https://github.com/MaxTeiger/TestCoopengo/pull/1"

# Init
print("Opening the browser...")
driver = webdriver.Firefox()

# Go to the specified pull request
print("Reaching " + PULL_URL)
driver.get(PULL_URL)
assert "GitHub" in driver.title

print("Finding the pull request comment...")
# Retrieve the fix id
elem = driver.find_element_by_xpath(DISCONNECTED_XPATH)
issueID = elem.text

print("Closing driver")
driver.close()

issueID = int(issueID.split('#')[1])
print("Issue ID: " + str(issueID))

print("Updating ticket on Redmine...")
test.updateIssueOnRedMineFromGit(issueID, PULL_URL)
Thank you if you can help me, or if you have a better solution to my problem.
I finally found an answer to my problem: it turns out that the webhooks offered by GitHub (Repo > Settings > Webhooks) do what I need.
Now I just need to set up a server that calls my script when it receives an HTTP POST request, but I don't know how to retrieve the URL of the relevant pull request.
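For what it's worth, GitHub's pull_request webhook event posts a JSON payload that already contains the pull request's URL and description. A minimal receiver sketch, assuming Flask as the server (any web framework would do) and reusing the test module from the script above:

from flask import Flask, request  # $ pip install flask

import test  # the module with updateIssueOnRedMineFromGit

app = Flask(__name__)

@app.route("/github-webhook", methods=["POST"])
def github_webhook():
    payload = request.get_json(silent=True)
    # pull_request events carry the PR itself, including its web URL
    if payload and "pull_request" in payload:
        pull_url = payload["pull_request"]["html_url"]
        body = payload["pull_request"]["body"] or ""
        if "#" in body:  # same '#<issue id>' convention as the Selenium script
            issue_id = int(body.split("#")[1].split()[0])
            test.updateIssueOnRedMineFromGit(issue_id, pull_url)
    return "", 204

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Point the webhook's Payload URL at this endpoint and select the "Pull requests" event; no Selenium or XPath scraping is needed at that point.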
I'm trying to automate the process of creating an account for something, let's call it X, but I can't figure out what to do.
I saw this code somewhere,
import urllib
import urllib2
import webbrowser

# Build the query string and fetch the results page (Python 2)
data = urllib.urlencode({'q': 'Python'})
url = 'http://duckduckgo.com/html/'
full_url = url + '?' + data
response = urllib2.urlopen(full_url)

with open("results.html", "w") as f:
    f.write(response.read())

webbrowser.open("results.html")
But I can't figure out how to modify it for my use.
I would highly recommend using Selenium with WebDriver for this, since your task appears to be UI- and browser-based. You can install Selenium via 'pip install selenium' in most cases. Here are a couple of good references to get started:
- http://selenium-python.readthedocs.io/
- https://pypi.python.org/pypi/selenium
Also, if this process needs to drive the browser headlessly, look into PhantomJS (driven via GhostDriver), which can be downloaded from the phantomjs.org website.
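As a rough illustration of the Selenium approach, a sign-up flow usually comes down to locating the form fields and submitting them. Everything below is hypothetical (the URL and element names are placeholders; inspect the real page to find yours):

from selenium import webdriver  # $ pip install selenium

driver = webdriver.Firefox()  # or webdriver.PhantomJS() for headless runs
driver.get("https://example.com/signup")  # placeholder sign-up page

# placeholder element names; check the page source for the real ones
driver.find_element_by_name("username").send_keys("my_new_account")
driver.find_element_by_name("email").send_keys("me@example.com")
driver.find_element_by_name("password").send_keys("a secret password")
driver.find_element_by_name("submit").click()  # submit the form

driver.quit()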
I'm extremely new to coding in general; I delved into this project in order to help my friend tag her fifteen-thousand-some-odd posts on Tumblr. We've finally finished, but she wants to be sure that we haven't missed anything...
So, I've scoured the internet trying to find a coding solution. I came across a script, found here, that allegedly does exactly what we need, so I downloaded Python, and... it doesn't work.
More specifically, when I click on the script, a black box appears for about half a second and then disappears. I haven't been able to screenshot the box to find out exactly what it says, but I believe it reports a syntax error. At first I tried with Python 2.4; it didn't seem to find the json module the creator uses, so I switched to Python 3.3, the most recent version for Windows, and this is where the syntax errors occur.
#!/usr/bin/python
# Python 2 script
import urllib2
import json

hostname = "(Redacted for Privacy)"
api_key = "(Redacted for Privacy)"
url = "http://api.tumblr.com/v2/blog/" + hostname + "/posts?api_key=" + api_key

def api_response(url):
    req = urllib2.urlopen(url)
    return json.loads(req.read())

jsonresponse = api_response(url)
post_count = jsonresponse["response"]["total_posts"]
increments = (post_count + 20) / 20

for i in range(0, increments):
    jsonresponse = api_response(url + "&offset=" + str(i * 20))
    posts = jsonresponse["response"]["posts"]
    for i in range(0, len(posts)):
        if not posts[i]["tags"]:
            print posts[i]["post_url"]

print("All finished!")
So, um, my question is this: if this code has a syntax error that could be fixed so it can find the untagged posts on Tumblr, what might that error be?
If this code is outdated (either via Tumblr or via Python updates), might someone with a little free time be willing to help create a new script to find untagged posts on Tumblr? Searching Tumblr, this seems to be a semi-common problem.
In case it matters, Python is installed in C:\Python33.
Thank you for your assistance.
"when I click on the script, a black box appears for about half a second and then disappears"
At the very least, you should be able to run a Python script from the command line; e.g., do Exercise 0 from "Learn Python The Hard Way".
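For example, in a Command Prompt (the script name is a placeholder for whatever you saved the file as):

C:\> C:\Python33\python.exe find_untagged.py

Run this way, any traceback stays on screen instead of vanishing with the window.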
"Finding Untagged Posts on Tumblr" blog post contains Python 2 script (look at import urllib2 in the source. urllib2 is renamed to urllib.request in Python 3). It is easy to port the script to Python 3:
#!/usr/bin/env python3
"""Find untagged tumblr posts.

Python 3 port of the script from
http://www.alexwlchan.net/2013/08/untagged-tumblr-posts/
"""
import json
from itertools import count
from urllib.request import urlopen

hostname, api_key = "(Redacted for Privacy)", "(Redacted for Privacy)"
url = "https://api.tumblr.com/v2/blog/{blog}/posts?api_key={key}".format(
    blog=hostname, key=api_key)

for offset in count(step=20):
    r = json.loads(urlopen(url + "&offset=" + str(offset)).read().decode())
    posts = r["response"]["posts"]
    if not posts:  # no more posts
        break
    for post in posts:
        if not post["tags"]:  # no tags
            print(post["post_url"])
Here's the same functionality implemented using the official Python Tumblr API v2 client (a Python 2-only library):
#!/usr/bin/env python
from itertools import count

import pytumblr  # $ pip install pytumblr

hostname, api_key = "(Redacted for Privacy)", "(Redacted for Privacy)"
client = pytumblr.TumblrRestClient(api_key, host="https://api.tumblr.com")

for offset in count(step=20):
    posts = client.posts(hostname, offset=offset)["posts"]
    if not posts:  # no more posts
        break
    for post in posts:
        if not post["tags"]:  # no tags
            print(post["post_url"])
Tumblr has an API. You probably would have much better success using it.
https://code.google.com/p/python-tumblr/
I'm building a Django app and I'm using Spynner for web crawling. I have this problem and I hope someone can help me.
I have this function in the module "crawler.py":
import spynner

def crawling_js(url):
    br = spynner.Browser()
    br.load(url)
    text_page = br.html
    br.close  # (*)
    return text_page
(*) I tried with br.close() too
In another module (e.g. "import.py") I call the function this way:
from crawler import crawling_js

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]

for url in l_url:
    mytextpage = crawling_js(url)
    # ... parse mytextpage ...
When I pass the first URL to the function, everything works; when I pass the second URL, Python crashes, on this line: br.load(url). Can someone help me? Thanks a lot.
I have:
Django 1.3
Python 2.7
Spynner 1.1.0
PyQt4 4.9.1
Why do you need to instantiate br = spynner.Browser() and close it every time you call crawling_js()? In a loop this consumes a lot of resources, which I think is the reason why it crashes. Think of it like this: br is a browser instance, so you can make it browse any number of websites without needing to close it and open it again. Adjust your code this way:
import spynner

br = spynner.Browser()  # you open it only once

def crawling_js(url):
    br.load(url)
    text_page = br._get_html()  # _get_html() to make sure you get the updated html
    return text_page
Then, if you want to close br later, you simply do:
from crawler import crawling_js, br

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]

for url in l_url:
    mytextpage = crawling_js(url)
    # ... parse mytextpage ...

br.close()
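One small refinement worth considering: if the parsing step can raise, a try/finally block guarantees the shared browser still gets closed (plain Python, nothing Spynner-specific; the URL list is the same as above):

from crawler import crawling_js, br

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/"]

try:
    for url in l_url:
        mytextpage = crawling_js(url)
        # ... parse mytextpage ...
finally:
    br.close()  # runs even if parsing raises mid-loop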