Python logging into a forum

I've written this to try and log onto a forum (phpBB3).
import urllib2, re
import urllib, re
logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
page = urllib.urlopen("http://www.woarl.com/board/ucp.php?mode=login"[logindata])
output = page.read()
However, when I run it, I get:
Traceback (most recent call last):
File "C:/Users/Mike/Documents/python/test urllib2", line 4, in <module>
page = urllib.urlopen("http://www.woarl.com/board/ucp.php?mode=login"[logindata])
TypeError: string indices must be integers
Any ideas on how to solve this?
Edit:
Adding a comma between the string and the data gives this error instead:
Traceback (most recent call last):
File "C:/Users/Mike/Documents/python/test urllib2", line 4, in <module>
page = urllib.urlopen("http://www.woarl.com/board/ucp.php?mode=login",[logindata])
File "C:\Python25\lib\urllib.py", line 84, in urlopen
return opener.open(url, data)
File "C:\Python25\lib\urllib.py", line 192, in open
return getattr(self, name)(url, data)
File "C:\Python25\lib\urllib.py", line 327, in open_http
h.send(data)
File "C:\Python25\lib\httplib.py", line 711, in send
self.sock.sendall(str)
File "<string>", line 1, in sendall
TypeError: sendall() argument 1 must be string or read-only buffer, not list
Edit 2:
I've changed the code to:
import urllib2, re
import urllib, re
logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
page = urllib2.urlopen("http://www.woarl.com/board/ucp.php?mode=login", logindata)
output = page.read()
This doesn't raise any errors; it just produces three blank lines. Is this because I'm reading from the login page, which disappears after logging in? If so, how do I get it to display the index page, which is what should appear after clicking "Log in"?

Your line
page = urllib.urlopen("http://www.woarl.com/board/ucp.php?mode=login"[logindata])
is valid syntax, but it doesn't do what you intend: [logindata] indexes the URL string itself, and string indices must be integers. Presumably you meant
page = urllib.urlopen("http://www.woarl.com/board/ucp.php?mode=login", [logindata])
which has a comma separating the arguments. However, what you actually want is simply
page = urllib2.urlopen("http://www.woarl.com/board/ucp.php?mode=login", logindata)
that is, pass logindata directly instead of wrapping it in a list, and use the more up-to-date urlopen from the urllib2 library.
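For Python 3, where urllib2's urlopen lives in urllib.request, a minimal sketch of the same POST might look like the following. It also touches on Edit 2: a login usually only "sticks" if the session cookie the server sets is sent back on later requests, so the sketch builds an opener with a cookie jar. The field names username/password are taken from the question; a real phpBB3 login form may require additional fields, which I haven't verified. build_login_request and make_cookie_opener are hypothetical helper names, and the network calls are left commented out.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://www.woarl.com/board/ucp.php?mode=login"

def build_login_request(url, username, password):
    """Build a POST request whose body is the URL-encoded form fields."""
    # urlencode() returns a str; in Python 3 the request body must be bytes.
    form = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(url, data=form.encode("ascii"))

def make_cookie_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

req = build_login_request(LOGIN_URL, "x", "y")
print(req.get_method())  # POST, because a body is attached
print(req.data)          # b'username=x&password=y'

# Network calls (not run here):
# opener, jar = make_cookie_opener()
# opener.open(req)   # any session cookie the server sets is stored in jar
# later opener.open(...) calls send that cookie back automatically
```

Using one opener for every request is the design point: plain urlopen calls don't share cookies, so each page would be fetched as a logged-out visitor.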

How about using a comma between the string "http:..." and the urlencoded data, [logindata]?

Your URL string shouldn't be
"http://www.woarl.com/board/ucp.php?mode=login"[logindata]
But
"http://www.woarl.com/board/ucp.php?mode=login", logindata
I think so because [] after a string means indexing, which requires an integer. I might be wrong, since I haven't done a lot of Python.

If you call type() on logindata, you can see that it is a string:
>>> import urllib
>>> logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
>>> type(logindata)
<type 'str'>
Putting it in brackets ([logindata]) wraps it in a list, which isn't what you want: the data argument should be the encoded string itself.
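Both failure modes from the question can be reproduced in Python 3 (where urlencode lives in urllib.parse) with a small check like this. index_error_message is a hypothetical helper; the exact TypeError wording varies between Python versions, so the comment only quotes the common prefix.

```python
from urllib.parse import urlencode

logindata = urlencode({"username": "x", "password": "y"})
url = "http://www.woarl.com/board/ucp.php?mode=login"

print(type(logindata).__name__)    # str
print(type([logindata]).__name__)  # list, not what urlopen expects as data

def index_error_message(s, key):
    """Return the TypeError message produced by s[key], or None if there is none."""
    try:
        s[key]
    except TypeError as exc:
        return str(exc)
    return None

# url[logindata] is what the original line evaluated: indexing a str with a str.
print(index_error_message(url, logindata))  # starts with "string indices must be integers"
```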

This would be easier with the high-level "mechanize" module.

Related

Regex errors utilizing Tweepy in python

I am having a problem with the code shown below. My original code worked when I was just pulling the tweet information, but once I edited it to extract the URL from the text, it started giving me problems. Nothing prints, and I get these errors:
Traceback (most recent call last):
File "C:\Users\Evan\PycharmProjects\DiscordBot1\main.py", line 22, in <module>
get_tweets(api, "cnn")
File "C:\Users\Evan\PycharmProjects\DiscordBot1\main.py", line 18, in get_tweets
url2 = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+',
text)
File "C:\Users\Evan\AppData\Local\Programs\Python\Python39\lib\re.py", line 241, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
I don't get any errors until I run it, so I'm confused about why this isn't working. It's probably something simple, as I'm new to both Tweepy and regex.
import tweepy
import re
TWITTER_APP_SECRET = 'hidden'
TWITTER_APP_KEY = 'hidden'
auth = tweepy.OAuthHandler(TWITTER_APP_KEY, TWITTER_APP_SECRET)
api = tweepy.API(auth)
def get_tweets(api, username):
page = 1
while True:
tweets = api.user_timeline(username, page=page)
for tweet in tweets:
text = tweet.text.encode("utf-8")
url2 = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
print(url2)
get_tweets(api, "cnn")
Let me know if you need more information. Any help is appreciated; thanks in advance.
You're getting that error because you're applying a str pattern (your regex) to a bytes object: encode() turned the tweet text into bytes, and re requires the pattern and the text to be the same type.
Try running your pattern directly against tweet.text without encoding it.
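Here is the mismatch in isolation, using the pattern from the question on a hypothetical tweet text. In Python 3, tweet.text should already be a str, so dropping .encode("utf-8") is usually all that's needed; the last line shows the alternative of making the pattern bytes as well.

```python
import re

# The URL pattern from the question, as a raw str.
URL_PATTERN = r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_#.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

text = "Breaking news https://cnn.it/abc123 more"  # hypothetical tweet text

# str pattern on str text: works.
print(re.findall(URL_PATTERN, text))  # ['https://cnn.it/abc123']

raw = text.encode("utf-8")
try:
    re.findall(URL_PATTERN, raw)  # str pattern on bytes: the TypeError from the question
except TypeError as exc:
    print(exc)

# If you really need to work on bytes, the pattern must be bytes too.
print(re.findall(URL_PATTERN.encode("ascii"), raw))  # [b'https://cnn.it/abc123']
```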

Troubles merging two json urls

First of all, I am getting the error below. When I try running
pip3 install --upgrade json
to resolve it, pip is unable to find a package by that name.
The code I am working with is below the error, but further guidance on the code itself would also be appreciated.
Error:
Traceback (most recent call last):
File "Chicago_cp.py", line 18, in <module>
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
TypeError: 'NoneType' object is not callable
Script:
#!/usr/bin/python
import json
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
# Define Merge
def Merge(dict1, dict2):
res = {**dict1, **dict2}
return res
# Open the URL and the screen name
StopWork__url = "someJsonUrl"
Violation_url = "anotherJsonUrl"
StopWork_response = http.request('GET', StopWork__url)
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
Violation_response = http.request('GET', Violation_url)
Violation_data = json.load(BeautifulSoup(Violation_response.data,'lxml'))
dict3 = Merge(StopWork_data,Violation_data)
print (dict1)
json.load expects a file object, or anything else with a read() method. A BeautifulSoup object has no read() method, but you can ask it for any attribute and it will look for a child tag with that name, i.e. a <read> tag in this case. When it doesn't find one, it returns None, and calling None is what causes the error. Here's a demo:
import json
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p>hi</p>", "html5lib")
assert soup.read is None
assert soup.blablabla is None
assert json.loads is not None
json.load(soup)
Output:
Traceback (most recent call last):
File "main.py", line 8, in <module>
json.load(soup)
File "/usr/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
TypeError: 'NoneType' object is not callable
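If the URL returns JSON, you don't need BeautifulSoup at all, since that's for parsing HTML and XML. Just use json.loads(response.data). (json is part of the standard library, so there's nothing to install with pip. Also, print (dict1) at the end should be print(dict3), because dict1 only exists inside Merge.)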
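A minimal sketch of the script without BeautifulSoup, with hypothetical byte strings standing in for StopWork_response.data and Violation_response.data so it runs offline. It assumes both endpoints return a JSON object; if they return JSON arrays instead, {**a, **b} won't work and you'd combine the lists (e.g. a + b) rather than merge dicts.

```python
import json

def merge(dict1, dict2):
    """Combine two dicts; on duplicate keys, dict2's value wins."""
    return {**dict1, **dict2}

# Hypothetical payloads standing in for the urllib3 response bodies (bytes).
stop_work_body = b'{"permit": "A1", "status": "stop"}'
violation_body = b'{"violation": "V9", "status": "open"}'

# json.loads accepts bytes directly (Python 3.6+), so no HTML parser is involved.
stop_work_data = json.loads(stop_work_body)
violation_data = json.loads(violation_body)

dict3 = merge(stop_work_data, violation_data)
print(dict3)  # {'permit': 'A1', 'status': 'open', 'violation': 'V9'}
```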

Python URLencoding Specific Results

I'm querying a database in Python 3.8.2.
I need the urlencoded results to be:
data = {"where":{"date":"03/30/20"}}
needed_results = ?where=%7B%22date%22%3A%20%2203%2F30%2F20%22%7D
I've tried the following
import urllib.parse
data = {"where":{"date":"03/30/20"}}
print(urllib.parse.quote_plus(data))
When I do that I get the following
Traceback (most recent call last):
File "C:\Users\Johnathan\Desktop\Python Snippets\test_func.py", line 17, in <module>
print(urllib.parse.quote_plus(data))
File "C:\Users\Johnathan\AppData\Local\Programs\Python\Python38-32\lib\urllib\parse.py", line 855, in quote_plus
string = quote(string, safe + space, encoding, errors)
File "C:\Users\Johnathan\AppData\Local\Programs\Python\Python38-32\lib\urllib\parse.py", line 839, in quote
return quote_from_bytes(string, safe)
File "C:\Users\Johnathan\AppData\Local\Programs\Python\Python38-32\lib\urllib\parse.py", line 864, in quote_from_bytes
raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes
I've tried a couple of other methods and received: ?where=%7B%27date%27%3A+%2703%2F30%2F20%27%7D
Long story short, I need to URL-encode the following:
data = {"where":{"date":"03/30/20"}}
needed_encoded_data = ?where=%7B%22date%22%3A%20%2203%2F30%2F20%22%7D
Thanks
data is a dictionary, which quote_plus can't URL-encode directly (hence the "expected bytes" error). You need to turn it into a string or bytes object first.
You can do that with json.dumps:
import json
import urllib.parse
data = {"where":{"date":"03/30/20"}}
print(urllib.parse.quote_plus(json.dumps(data)))
Output:
%7B%22where%22%3A+%7B%22date%22%3A+%2203%2F30%2F20%22%7D%7D
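That output still differs from the target in the question: quote_plus writes spaces as +, and the whole dict (including the "where" key) ends up inside the encoded value. Assuming the goal is a where parameter whose value is the JSON-encoded inner dict, this sketch reproduces the exact string from the question:

```python
import json
from urllib.parse import quote, urlencode

data = {"where": {"date": "03/30/20"}}

# JSON-encode only the inner dict: '{"date": "03/30/20"}'
where_value = json.dumps(data["where"])

# quote_via=quote encodes spaces as %20 (the default, quote_plus, uses '+');
# urlencode passes safe='' to it, so '/' becomes %2F as well.
query = urlencode({"where": where_value}, quote_via=quote)
print("?" + query)  # ?where=%7B%22date%22%3A%20%2203%2F30%2F20%22%7D
```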

File Create/Write Issue In Python

I'm trying to create and write to a file. I have the following code:
from urllib2 import urlopen
def crawler(seed_url):
to_crawl = [seed_url]
crawled=[]
while to_crawl:
page = to_crawl.pop()
page_source = urlopen(page)
s = page_source.read()
with open(str(page)+".txt","a+") as f:
f.write(s)
f.close()
return crawled
if __name__ == "__main__":
crawler('http://www.yelp.com/')
However, it returns the error:
Traceback (most recent call last):
File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 29, in <module>
crawler('http://www.yelp.com/')
File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 14, in crawler
with open("./"+str(page)+".txt","a+") as f:
IOError: [Errno 2] No such file or directory: 'http://www.yelp.com/.txt'
I thought that open(file,"a+") is supposed to create and write. What am I doing wrong?
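The slashes in the URL make open() treat http: and www.yelp.com as directories, and since those directories don't exist, "a+" never gets the chance to create the file. If you want to use the URL as the basis for the file name, you should encode the URL. That way, slashes (among other characters) are converted to character sequences that won't interfere with the file system or shell.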
The urllib library can help with this.
So, for example:
>>> import urllib
>>> urllib.quote_plus('http://www.yelp.com/')
'http%3A%2F%2Fwww.yelp.com%2F'
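The same idea in Python 3, where quote_plus moved to urllib.parse. filename_for and save_page are hypothetical helpers, and the demo writes into a temporary directory so it can run anywhere. Mode "ab" is used because a Python 3 response body is bytes; like "a+", it creates the file if it doesn't exist.

```python
import os
import tempfile
from urllib.parse import quote_plus

def filename_for(url):
    """Turn a URL into a single, slash-free file name."""
    # quote_plus percent-encodes ':' and '/', so open() no longer sees directories.
    return quote_plus(url) + ".txt"

def save_page(directory, url, content):
    """Append bytes to the file for `url`, creating it if needed."""
    path = os.path.join(directory, filename_for(url))
    with open(path, "ab") as f:  # the with block closes the file; no f.close() needed
        f.write(content)
    return path

print(filename_for("http://www.yelp.com/"))  # http%3A%2F%2Fwww.yelp.com%2F.txt

with tempfile.TemporaryDirectory() as tmp:
    path = save_page(tmp, "http://www.yelp.com/", b"<html></html>")
    print(os.path.exists(path))  # True
```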

ValueError: unknown url type

The title pretty much says it all. Here's my code:
from urllib2 import urlopen as getpage
print = getpage("www.radioreference.com/apps/audio/?ctid=5586")
and here's the traceback error I get:
Traceback (most recent call last):
File "C:/Users/**/Dropbox/Dev/ComServ/citetest.py", line 2, in <module>
contents = getpage("www.radioreference.com/apps/audio/?ctid=5586")
File "C:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "C:\Python25\lib\urllib2.py", line 366, in open
protocol = req.get_type()
File "C:\Python25\lib\urllib2.py", line 241, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.radioreference.com/apps/audio/?ctid=5586
My best guess is that urllib can't retrieve data from untidy PHP URLs. If that's the case, is there a workaround? If not, what am I doing wrong?
You should first try adding 'http://' in front of the URL. Also, don't assign the result to print: that rebinds the name print to a different (non-callable) object.
So this line should be:
page_contents = getpage("http://www.radioreference.com/apps/audio/?ctid=5586")
This returns a file-like object. To read its contents, use the usual file methods, for example:
for line in page_contents.readlines():
print line
You need to pass a full URL, i.e. one that begins with http://.
Simply use http://www.radioreference.com/apps/audio/?ctid=5586 and it'll work fine.
In [24]: from urllib2 import urlopen as getpage
In [26]: print getpage("http://www.radioreference.com/apps/audio/?ctid=5586")
<addinfourl at 173987116 whose fp = <socket._fileobject object at 0xa5eb6ac>>
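In Python 3 the same rule applies: urlopen builds a urllib.request.Request internally, and Request raises ValueError ("unknown url type") when the URL has no scheme. A sketch of a guard that prepends one; ensure_scheme is a hypothetical helper, and its check is simplistic (for example, a bare "host:port/path" would be misread as having a scheme).

```python
from urllib.parse import urlsplit
from urllib.request import Request

def ensure_scheme(url, default="http"):
    """Prepend default + '://' when the URL has no scheme (simplistic check)."""
    if urlsplit(url).scheme:
        return url
    return default + "://" + url

raw = "www.radioreference.com/apps/audio/?ctid=5586"

try:
    Request(raw)  # fails the same way urlopen(raw) would
except ValueError as exc:
    print(exc)  # unknown url type: ...

fixed = ensure_scheme(raw)
print(fixed)                # http://www.radioreference.com/apps/audio/?ctid=5586
print(Request(fixed).type)  # http
```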
