bdecode library in Python doesn't work

I'm trying to decode bencoded data using the bdecode function in Python. I have placed the bcode library in my Python folder and imported it. When I try to call bdecode, which is defined in that library, I get an error:
  File "C:\Python27\fit.py", line 21, in <module>
    decoded = bdecode(data)
NameError: name 'bdecode' is not defined
Any idea why this error is happening? I'm new to Python. If this is because of the bcode library, could anyone post a link to another bencode library?
This is the code I'm trying:
import bcode, urllib, urlparse, string
url = "http://update.utorrent.com/installoffer.php?"
url = url + "offer=conduit"
filename = "out_py.txt"
urllib.urlretrieve(url,filename)
with open("out_py.txt", "r") as myfile:
    data = myfile.readlines()
decoded = bdecode(data)

You can solve this in one of two ways. Either change your import statement:
from bcode import bdecode
import urllib, urlparse, string
Or change the line where you call the function:
decoded = bcode.bdecode(data)
The issue is that while you imported the bcode module, you did not import any of the symbols within it into the local namespace.
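To illustrate the difference, here is a minimal sketch that uses the standard-library json module as a stand-in for bcode (an assumption made only so the example runs without third-party packages). It shows that a plain import binds the module name only, and that the bare function name stays undefined until it is imported explicitly:

```python
import json  # binds only the name "json" in this namespace

# The function is reachable through the module object...
via_module = json.loads("[1, 2, 3]")

# ...but the bare name is not defined until it is imported explicitly.
try:
    loads("[1, 2, 3]")
    bare_name_defined = True
except NameError:
    bare_name_defined = False

from json import loads  # now "loads" is bound in this namespace too

via_name = loads("[1, 2, 3]")
print(via_module, bare_name_defined, via_name)
```

The same pattern should apply to bcode: either call bcode.bdecode(...) or import bdecode by name.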

Related

Decode HTML Entity on Python

I have a file that contains lines like this:
StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4
Corresponding to these lines, I have files on disk, saved under the decoded names:
StatsLearning_Lect1_2a_111213_v2_[2wLfFB_6SKI]_[tag22].mp4
I need to match each name in the list to the decoded name on disk and rename the file. To do that, I need to decode the HTML entities in the file name, so I did something like this:
import os
from html.parser import HTMLParser
fpListDwn = open('listDwn', 'r')
for lineNumberOnList, fileName in enumerate(fpListDwn):
    print(HTMLParser().unescape(fileName))
But this has no effect when I run it. The output is:
meysampg#freedom:~/Downloads/Practical Machine Learning$ python3 changeName.py
StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4
StatsLearning_Lect1_2b_111213_v2_%5BLvaTokhYnDw%5D_%5Btag22%5D.mp4
StatsLearning_Lect3_4a_110613_%5BWjyuiK5taS8%5D_%5Btag22%5D.mp4
StatsLearning_Lect3_4b_110613_%5BUvxHOkYQl8g%5D_%5Btag22%5D.mp4
StatsLearning_Lect3_4c_110613_%5BVusKAosxxyk%5D_%5Btag22%5D.mp4
How can I fix this?
I guess you should use urllib.parse instead of html.parser:
>>> f="StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4"
>>> import urllib.parse as parse
>>> f
'StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4'
>>> parse.unquote(f)
'StatsLearning_Lect1_2a_111213_v2_[2wLfFB_6SKI]_[tag22].mp4'
So your script should look like:
import os
import urllib.parse as parse
fpListDwn = open('listDwn', 'r')
for lineNumberOnList, fileName in enumerate(fpListDwn):
    print(parse.unquote(fileName))
This is actually "percent-encoding", not HTML encoding; see this question:
How to percent-encode URL parameters in Python?
Basically you want to use urllib.parse.unquote instead:
from urllib.parse import unquote
unquote('StatsLearning_Lect1_2a_111213_v2_%5B2wLfFB_6SKI%5D_%5Btag22%5D.mp4')
Out[192]: 'StatsLearning_Lect1_2a_111213_v2_[2wLfFB_6SKI]_[tag22].mp4'
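As a rough sketch of the difference between the two kinds of decoding (Python 3; the sample strings are made up for illustration): urllib.parse.unquote handles percent escapes such as %5B, while html.unescape handles HTML entities such as &amp;amp; and leaves percent escapes untouched.

```python
from html import unescape
from urllib.parse import unquote

percent_encoded = "Lect1_%5Btag22%5D.mp4"
html_encoded = "Tom &amp; Jerry &lt;1940&gt;"

# Percent escapes are decoded by unquote; html.unescape leaves them alone.
decoded_name = unquote(percent_encoded)
untouched_name = unescape(percent_encoded)

# HTML entities are decoded by html.unescape.
decoded_html = unescape(html_encoded)

print(decoded_name)    # Lect1_[tag22].mp4
print(untouched_name)  # Lect1_%5Btag22%5D.mp4
print(decoded_html)    # Tom & Jerry <1940>
```

When the names come from a file, each line still ends with a newline, so it is probably worth calling fileName.rstrip("\n") before passing the result to something like os.rename.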

Reading this type of Json with Python 3 Urllib

My JSON URL returns this:
{years=["2014","2015","2016"]}
How can I get these strings from the URL with Python 3? I know this method, but Python 3 has no urllib2 module.
import urllib2
import json
response = urllib2.urlopen('http://127.0.0.1/years.php')
data = json.load(response)
print (data)
ImportError: No module named 'urllib2'
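Try changing the import to urllib.request and calling urllib.request.urlopen instead; in Python 3, a plain "import urllib" does not reliably load the request submodule. For the reason behind the rename, please refer to this SO answer.
import urllib.request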
import json
response = urllib.request.urlopen('http://127.0.0.1/years.php')
data = json.load(response)
print (data)
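A minimal Python 3 sketch of the parsing step, assuming the server returns valid JSON such as {"years": ["2014", "2015", "2016"]} (the form {years=[...]} shown in the question is not valid JSON and the json module would reject it). The helper name load_json is hypothetical, and an io.BytesIO object stands in for the HTTP response so the example runs offline:

```python
import io
import json

def load_json(response, encoding="utf-8"):
    """Decode the raw bytes of a response-like object and parse them as JSON."""
    return json.loads(response.read().decode(encoding))

# Stand-in for urllib.request.urlopen(...): any object with a read() method
# that returns bytes behaves the same way here.
fake_response = io.BytesIO(b'{"years": ["2014", "2015", "2016"]}')
data = load_json(fake_response)
print(data["years"])
```

Against a real server, the same helper could be called as load_json(response) inside "with urllib.request.urlopen(url) as response:".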

Python : Function to pull a sound clip from URL and save it in local machine

I would like to create a function that pulls a sound clip from a given URL and saves it locally on my machine.
Use the urllib module:
import urllib
urllib.urlretrieve(url,sound_clip_name)
The file will be saved under the name you provide.
Alternatively, using urllib2:
import urllib2
file = urllib2.urlopen(url).read()
f = open('sound_clip', 'wb')  # binary mode, since audio data is not text
f.write(file)
f.close()
Don't forget to give your file an extension.
In Python 2.7, the urllib2 module is your friend; in Python 3, use urllib.request.
Example in 2.7:
import urllib2
f = urllib2.urlopen('http://www.python.org/')
with open(filename, 'wb') as fd:
    fd.write(f.read())
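For Python 3, one possible shape for such a function is sketched below. The names save_stream and download are hypothetical, and the check at the end uses an in-memory stream with made-up bytes in place of a real network response so that it runs offline:

```python
import io
import os
import shutil
import tempfile
import urllib.request

def save_stream(stream, dest_path):
    """Copy a binary file-like object to dest_path in chunks."""
    with open(dest_path, "wb") as out:  # "wb": audio data is binary
        shutil.copyfileobj(stream, out)
    return dest_path

def download(url, dest_path):
    """Fetch url and save the response body to dest_path."""
    with urllib.request.urlopen(url) as response:
        return save_stream(response, dest_path)

# Offline check with an in-memory stream standing in for a network response.
sample = b"RIFF\x00\x01fake-audio-bytes"
target = os.path.join(tempfile.mkdtemp(), "sound_clip.wav")
save_stream(io.BytesIO(sample), target)
with open(target, "rb") as fh:
    saved = fh.read()
print(saved == sample)
```

Copying in chunks with shutil.copyfileobj avoids holding the whole clip in memory, which may matter for large files.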

Working with a pdf from the web directly in Python?

I'm trying to use Python to read .pdf files from the web directly rather than save them all to my computer. All I need is the text from the .pdf and I'm going to be reading a lot (~60k) of them, so I'd prefer to not actually have to save them all.
I know how to save a .pdf from the internet using urllib and open it with PyPDF2. (example)
I want to skip the saving-to-file step.
import urllib, PyPDF2
urllib.urlopen('https://bitcoin.org/bitcoin.pdf')
wFile = urllib.urlopen('https://bitcoin.org/bitcoin.pdf')
lFile = PyPDF2.pdf.PdfFileReader(wFile.read())
I get an error that is fairly easy to understand:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    fil = PyPDF2.pdf.PdfFileReader(wFile.read())
  File "C:\Python27\lib\PyPDF2\pdf.py", line 797, in __init__
    self.read(stream)
  File "C:\Python27\lib\PyPDF2\pdf.py", line 1245, in read
    stream.seek(-1, 2)
AttributeError: 'str' object has no attribute 'seek'
Obviously PyPDF2 doesn't like that I'm giving it the urllib.urlopen().read() (which appears to return a string). I know that this string is not the "text" of the .pdf but a string representation of the file. How can I resolve this?
EDIT: NorthCat's solution resolved my error, but when I try to actually extract the text, I get this:
>>> print lFile.getPage(0).extractText()
ˇˆ˘˘˙˘˘˝˘˛˘ˇ˘ˇ˚ˇˇˇ˘ˆ˘˘˘˚ˇˆ˘ˆ˘ˇ˜ˇ˝˚˘˛˘ˇ ˘˘˘ˇ˛˘˚˚ˆˇˇ!
˝˘˚ˇ˘˘˚"˘˘ˇ˘˚ˇ˘˘˚ˇ˘˘˘˙˘˘˘#˘˘˘ˆ˘˛˘˚˛˙ ˘˘˚˚˘˛˙#˘ˇ˘ˇˆ˘˘˛˛˘˘!˘˘˛˘˝˘˘˘˚ ˛˘˘ˇ˘ˇ˛$%&˘ˇ'ˆ˛
$%&˘ˇˇ˘˚ˆ˚˘˘˘˘ ˘ˆ(ˇˇ˘˘˘˘ˇ˘˚˘˘#˘˘˘ˇ˛!ˇ)˘˘˚˘˘˛ ˚˚˘ˇ˘˝˘˚'˘˘ˇˇ ˘˘ˇ˘˛˙˛˛˘˘˚ˇ˘˘ˆ˘˘ˆ˙
$˘˘˘*˘˘˘ˇˆ˘˘ˇˆ˛ˇ˘˝˚˚˘˘ˇ˘ˆ˘"˘ˆ˘ˇˇ˘˛ ˛˛˘˛˘˘˘˘˘˘˛˘˘˚˚˘$ˇ˘ˇˆ˙˘˝˘ˇ˘˘˘ˇˇˆˇ˘ ˘˛ˇ˝˘˚˚#˘˛˘˚˘˘
˘ˇ˘˚˛˛˘ˆ˛ˇˇˇ ˚˘˘˚˘˘ˇ˛˘˙˘˝˘ˇ˘ˆ˘˛˙˘˝˘ˇ˘˘˝˘"˘˛˘˝˘ˇ ˘˘˘˚˛˘˚)˘˘ˆ˛˘˘
˘˛˘˛˘ˆˇ˚˘˘˘˘˚˘˘˘˘˛˛˚˘˚˝˚ˇ˘#˘˘˚ˆ˘˘˘˝˘˚˘ˆˆˇ˘ˆ
˘˘˘ˆ˘˝˘˘˚"˘˘˚˘˚˘ˇ˘ˆ˘ˆ˘˚ˆ˛˚˛ˆ˚˘˘˘˘˘˘˚˛˚˚ˆ#˘ˇˇˆˇ˘˝˘˘ˇ˚˘ˇˇ˘˛˛˚ ˚˘˘˘ˇ˚˘˘ˇ˘˘˚ˆ˘*˘
˘˘ˇ˘˚ˇ˘˙˘˚ˇ˘˘˘˙˙˘˘˚˚˘˘˝˘˘˘˛˛˘ˇˇ˚˘˛#˘ˆ˘˘ˇ˘˚˘ˇˇ˘˘ˇˆˇ˘$%&˘ˆ˘˛˘˚˘,
Try this:
import urllib, PyPDF2
import cStringIO
wFile = urllib.urlopen('https://bitcoin.org/bitcoin.pdf')
lFile = PyPDF2.pdf.PdfFileReader( cStringIO.StringIO(wFile.read()) )
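The reason wrapping helps is that the reader calls seek() on whatever it is given, and a plain string has no such method. A small Python 3 sketch of the idea, where io.BytesIO plays the role of cStringIO.StringIO and the payload is placeholder bytes rather than a real PDF:

```python
import io

payload = b"%PDF-1.4 placeholder bytes, not a real PDF"

# A bytes object has no file-like interface...
bytes_can_seek = hasattr(payload, "seek")

# ...while BytesIO wraps the same data in a seekable, readable stream.
stream = io.BytesIO(payload)
stream.seek(-1, io.SEEK_END)  # the same call that failed in the traceback
last_byte = stream.read(1)

print(bytes_can_seek, last_byte)
```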
Because PyPDF2 does not extract the text correctly here, there are a couple of alternative solutions; however, they require saving the file to disk.
Solution 1
You can use ps2ascii (if you are using Linux or Mac) or xpdf (Windows). Example of using xpdf:
import os
os.system('C:\\xpdfbin-win-3.03\\bin32\\pdftotext.exe C:\\xpdfbin-win-3.03\\bin32\\bitcoin.pdf bitcoin1.txt')
or
import subprocess
subprocess.call(['C:\\xpdfbin-win-3.03\\bin32\\pdftotext.exe', 'C:\\xpdfbin-win-3.03\\bin32\\bitcoin.pdf', 'bitcoin2.txt'])
Solution 2
You can use one of the online PDF-to-text converters. Example of using pdf.my-addr.com:
import MultipartPostHandler
import urllib2
def pdf2text( absolute_path ):
    url = 'http://pdf.my-addr.com/pdf-to-text-converter-tool.php'
    params = { 'file' : open( absolute_path, 'rb' ),
               'encoding': 'UTF-8',
             }
    opener = urllib2.build_opener( MultipartPostHandler.MultipartPostHandler )
    return opener.open( url, params ).read()
print pdf2text('bitcoin.pdf')
You can find the code of MultipartPostHandler here. I tried to use cStringIO instead of open(), but it did not work.
Maybe this will be helpful for you.
I know this question is old, but I had the same issue and here is how I solved it.
In the newer PyPDF2 docs there is a section about streaming data.
The example there looks like this:
from io import BytesIO
from PyPDF2 import PdfReader
# Prepare example
with open("example.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())
# Read from bytes_stream
reader = PdfReader(bytes_stream)
Therefore, what I did instead was this:
import urllib.request
from io import BytesIO
from PyPDF2 import PdfReader
NEW_PATH = 'https://example.com/path/to/pdf/online?id=123456789&date=2022060'
wFile = urllib.request.urlopen(NEW_PATH)
bytes_stream = BytesIO(wFile.read())
reader = PdfReader(bytes_stream)

With regards to urllib AttributeError: 'module' object has no attribute 'urlopen'

import re
import string
import shutil
import os
import os.path
import time
import datetime
import math
import urllib
from array import array
import random
filehandle = urllib.urlopen('http://www.google.com/') #open webpage
s = filehandle.read() #read
print s #display
#what i plan to do with it once i get the first part working
#results = re.findall('[<td style="font-weight:bold;" nowrap>$][0-9][0-9][0-9][.][0-9][0-9][</td></tr></tfoot></table>]',s)
#earnings = '$ '
#for money in results:
#earnings = earnings + money[1]+money[2]+money[3]+'.'+money[5]+money[6]
#print earnings
#raw_input()
This is the code I have so far. I have looked at the other forums, which suggest causes such as the name of the script (mine is parse_Money.py), and I have tried urllib.request.urlopen. I have also tried running it on Python 2.5, 2.6, and 2.7. Any suggestions would be very welcome. Thanks, everyone!
--Matt
---EDIT---
I also tried this code and it worked, so I'm thinking it's some kind of syntax error. If anybody with a sharp eye can point it out, I would be very appreciative.
import shutil
import os
import os.path
import time
import datetime
import math
import urllib
from array import array
import random
b = 3
#find URL
URL = raw_input('Type the URL you would like to read from[Example: http://www.google.com/] :')
while b == 3:
    #get file name
    file1 = raw_input('Enter a file name for the downloaded code:')
    filepath = file1 + '.txt'
    if os.path.isfile(filepath):
        print 'File already exists'
        b = 3
    else:
        print 'Filename accepted'
        b = 4
file_path = filepath
#open file
FileWrite = open(file_path, 'a')
#access URL
filehandle = urllib.urlopen(URL)
#display source code
for lines in filehandle.readlines():
    FileWrite.write(lines)
    print lines
print 'The above has been saved in both a text and html file'
#close files
filehandle.close()
FileWrite.close()
It appears that urlopen is available in the urllib.request module, not in the urllib module where you are expecting it.
Rule of thumb: if you're getting an AttributeError, that attribute is not present in the module or object you are accessing.
EDIT - Thanks to AndiDog for pointing out - this is a solution valid for Py 3.x, and not applicable to Py2.x!
The urlopen function is actually in the urllib2 module. Try import urllib2 and use urllib2.urlopen
I see that you are using Python 2, or at least intend to.
The urlopen helper function is available in both urllib and urllib2 in Python 2.
What you need to do is run the script with the correct version of Python:
C:\Python26\python.exe yourscript.py
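If the script has to run under whichever interpreter happens to be picked up, a common pattern is to try the Python 3 location first and fall back to the Python 2 one. This is only a sketch of that pattern, not something taken from the answers above:

```python
import sys

try:
    from urllib.request import urlopen  # Python 3 location
except ImportError:
    from urllib2 import urlopen  # Python 2 location

# Report which module actually supplied urlopen on this interpreter.
print("urlopen comes from %s on Python %d" % (urlopen.__module__, sys.version_info[0]))
```

After this, the rest of the script can call urlopen(...) directly without caring which version is running.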
