wget loop to all the lines (url) in textfile and download Windows

wget loop to all the lines (url) in textfile and download Windows - python

I have a simple task but cannot make my code work. I want to loop over the URLs listed in my textfile and download it using wget command in Python. Each URL are placed in separate line in the textfile.
Basically, this is the structure of the list in my textfile:
http://e4ftl01.cr.usgs.gov//MODIS_Composites/MOLT/MOD11C3.005/2000.03.01/MOD11C3.A2000061.005.2007177231646.hdf
http://e4ftl01.cr.usgs.gov//MODIS_Composites/MOLT/MOD11C3.005/2014.12.01/MOD11C3.A2014335.005.2015005235231.hdf
all the URLs are about 178 lines. Then save it in the current working directory.
Below is the initial code that I am working:
import os, fileinput, urllib2 as url, wget
os.chdir("E:/Test/dwnld")
for line in fileinput.FileInput("E:/Test/dwnld/data.txt"):
print line
openurl = wget.download(line)
The error message is:
Traceback (most recent call last): File "E:\Python_scripts\General_purpose\download_url_from_textfile.py", line 5, in <module>
openurl = wget.download(line) File "C:\Python278\lib\site-packages\wget.py", line 297, in download
(fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".") File "C:\Python278\lib\tempfile.py", line 308, in mkstemp
return _mkstemp_inner(dir, prefix, suffix, flags) File "C:\Python278\lib\tempfile.py", line 239, in _mkstemp_inner
fd = _os.open(file, flags, 0600) OSError: [Errno 22] Invalid argument: ".\\MOD11C3.A2000061.005.2007177231646.hdf'\n.frbfrp.tmp"

Try to use urllib.urlretrieve. Check the documentation here: https://docs.python.org/2/library/urllib.html#urllib.urlretrieve

Related

Download images with multiprocessing ThreadPool, FileNotFoundError?

I have a list (tuple), where is local path and img url. Works fine until now.
On Python 3.11 and Windows 11 still works well (no error at all), but on Ubuntu 20.04 with python 3.11 throwing FileNotFoundError.
I'm downloading more then 30 images, but i will cut off the list here.
I can't figure out what I'm missing?
/media/hdd/img/ directory exists with 755 permission.
from multiprocessing.pool import ThreadPool
import os
import requests
list2 = [('/media/hdd/img/tt24949716.jpg', 'https://www.cdnzone.org/uploads/2023/01/29/Devil-Beneath-2023-Cover.jpg'),
('/media/hdd/img/tt11231356.jpg', 'https://www.cdnzone.org/uploads/2023/01/29/Beneath-the-Green-2022-Cover.jpg'),
('/media/hdd/img/tt13103732.jpg', 'https://www.cdnzone.org/uploads/2023/01/29/The-Offering-2022-Cover.jpg'),
('/media/hdd/img/tt14826022.jpg', 'https://www.cdnzone.org/uploads/2023/01/29/You-People-2023-Cover.jpg'),
('/media/hdd/img/tt18252340.jpg', 'https://www.cdnzone.org/uploads/2023/01/29/Carnifex-2022-Cover.jpg')]
def download_images(list):
results = ThreadPool(8).imap_unordered(fetch_url, list)
for path in results:
print(path)
def fetch_url(entry):
path, uri = entry
if not os.path.exists(path):
r = requests.get(uri, stream=True)
if r.status_code == 200:
with open(path, 'wb') as f:
for chunk in r:
f.write(chunk)
return path
>>> download_images(list2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in download_images
File "/usr/lib/python3.11/multiprocessing/pool.py", line 930, in __init__
File "/usr/lib/python3.11/multiprocessing/pool.py", line 196, in __init__
File "/usr/lib/python3.11/multiprocessing/context.py", line 113, in SimpleQueue
File "/usr/lib/python3.11/multiprocessing/queues.py", line 341, in __init__
File "/usr/lib/python3.11/multiprocessing/context.py", line 68, in Lock
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 162, in __init__
File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 57, in __init__
FileNotFoundError: [Errno 2] No such file or directory

BadRarFile when extracting single file using RarFile in Python

I need to extract a single file (~10kB) from many very large RAR files (>1Gb). The code below shows a basic implementation of how I'm doing this.
from rarfile import RarFile
rar_file='D:\\File.rar'
file_of_interest='Folder 1/Subfolder 2/File.dat'
output_folder='D:/Output'
rardata = RarFile(rar_file)
rardata.extract(file_of_interest, output_folder)
rardata.close()
However, the extract instruction is returning the following error: rarfile.BadRarFile: Failed the read enough data: req=16384 got=52
When I open the file using WinRAR, I can extract the file successfully, so I'm sure the file isn't corrupted.
I've found some similar questions, but not a definite answer that worked for me.
Can someone help me to solve this error?
Additional info:
Windows 10 build 1909
Spyder 5.0.0
Python 3.8.1
Complete traceback of the error:
Traceback (most recent call last):
File "D:\Test\teste_rar_2.py", line 27, in <module>
rardata.extract(file_of_interest, output_folder)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 826, in extract
return self._extract_one(inf, path, pwd, True)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 912, in _extract_one
return self._make_file(info, dstfn, pwd, set_attrs)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 927, in _make_file
shutil.copyfileobj(src, dst)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
File "C:\Users\bernard.kusel\AppData\Local\Continuum\anaconda3\lib\site-packages\rarfile.py", line 2197, in read
raise BadRarFile("Failed the read enough data: req=%d got=%d" % (orig, len(data)))
BadRarFile: Failed the read enough data: req=16384 got=52

python is not reading a gzip file with gzip

I'm trying to make a tokanizer, I have a file that I'm trying to read with gzip. but it gives the following error:
Traceback (most recent call last):
File "extract_sends.py", line 14, in <module>
main()
File "extract_sends.py", line 12, in main
file_content = f.read()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 276, in read
return self._buffer.read(size)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 463, in read
if not self._read_gzip_header():
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/gzip.py", line 411, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'# ')
This is my code, I'm just starting but if python can't read the file I'm not comming far.
import gzip
import sys
import re
def main():
file = sys.argv[0]
with gzip.open(file, 'rt') as f:
file_content = f.read()
main()
The file is a .txt.gz file

You should try the simplest ever debugging technique: print the value you are trying to use.
Anyway if you did that you would see that sys.argv[0] is not the filename parameter you put on the commandline after the command to run your code - that is sys.argv[1]
So change:
file = sys.argv[0]
To:
file = sys.argv[1]
print( “Reading from file”,file )

Reading password protected Word Documents with zipfile

I am trying to read a password protected word document on Python using zipfile.
The following code works with a non-password protected document, but gives an error when used with a password protected file.
try:
from xml.etree.cElementTree import XML
except ImportError:
from xml.etree.ElementTree import XML
import zipfile
psw = "1234"
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
PARA = WORD_NAMESPACE + 'p'
TEXT = WORD_NAMESPACE + 't'
def get_docx_text(path):
document = zipfile.ZipFile(path, "r")
document.setpassword(psw)
document.extractall()
xml_content = document.read('word/document.xml')
document.close()
tree = XML(xml_content)
paragraphs = []
for paragraph in tree.getiterator(PARA):
texts = [node.text
for node in paragraph.getiterator(TEXT)
if node.text]
if texts:
paragraphs.append(''.join(texts))
return '\n\n'.join(paragraphs)
When running get_docx_text() with a password protected file, I received the following error:
Traceback (most recent call last):
File "<ipython-input-15-d2783899bfe5>", line 1, in <module>
runfile('/Users/username/Workspace/Python/docx2txt.py', wdir='/Users/username/Workspace/Python')
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 680, in runfile
execfile(filename, namespace)
File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
builtins.execfile(filename, *where)
File "/Users/username/Workspace/Python/docx2txt.py", line 41, in <module>
x = get_docx_text("/Users/username/Desktop/file.docx")
File "/Users/username/Workspace/Python/docx2txt.py", line 23, in get_docx_text
document = zipfile.ZipFile(path, "r")
File "zipfile.pyc", line 770, in __init__
File "zipfile.pyc", line 811, in _RealGetContents
BadZipfile: File is not a zip file
Does anyone have any advice to get this code to work?

I don't think this is an encryption problem, for two reasons:
Decryption is not attempted when the ZipFile object is created. Methods like ZipFile.extractall, extract, and open, and read take an optional pwd parameter containing the password, but the object constructor / initializer does not.
Your stack trace indicates that the BadZipFile is being raised when you create the ZipFile object, before you call setpassword:
document = zipfile.ZipFile(path, "r")
I'd look carefully for other differences between the two files you're testing: ownership, permissions, security context (if you have that on your OS), ... even filename differences can cause a framework to "not see" the file you're working on.
Also --- the obvious one --- try opening the encrypted zip file with your zip-compatible command of choice. See if it really is a zip file.
I tested this by opening an encrypted zip file in Python 3.1, while "forgetting" to provide a password. I could create the ZipFile object (the variable zfile below) without any error, but got a RuntimeError --- not a BadZipFile exception --- when I tried to read a file without providing a password:
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 13, in checksum_contents
inner_file = zfile.open(inner_filename, "r")
File "/usr/lib64/python3.1/zipfile.py", line 903, in open
"password required for extraction" % name)
RuntimeError: File apache.log is encrypted, password required for extraction
I was also able to raise a BadZipfile exception, once by trying to open an empty file and once by trying to open some random logfile text that I'd renamed to a ".zip" extension. The two test files produced identical stack traces, down to the line numbers.
Traceback (most recent call last):
File "./zf.py", line 35, in <module>
main()
File "./zf.py", line 29, in main
print_checksums(zipfile_name)
File "./zf.py", line 22, in print_checksums
for checksum in checksum_contents(zipfile_name):
File "./zf.py", line 10, in checksum_contents
zfile = zipfile.ZipFile(zipfile_name, "r")
File "/usr/lib64/python3.1/zipfile.py", line 706, in __init__
self._GetContents()
File "/usr/lib64/python3.1/zipfile.py", line 726, in _GetContents
self._RealGetContents()
File "/usr/lib64/python3.1/zipfile.py", line 738, in _RealGetContents
raise BadZipfile("File is not a zip file")
zipfile.BadZipfile: File is not a zip file
While this stack trace isn't exactly the same as yours --- mine has a call to _GetContents, and the pre-3.2 "small f" spelling of BadZipfile --- but they're close enough that I think this is the kind of problem you're dealing with.

How to open files, web browsers, and URLs in Python, not in IDLE.

I know you can open files, browsers, and URLs in the Python GUI. However, I don't know how to apply this to programs. For example, none of the below work. (The below are snippets from my growing chat bot program):
def browser():
print('OPENING FIREFOX...')
handle = webbroswer.get() # webbrowser is imported at the top of the file
handle.open('http://youtube.com')
handle.open_new_tab('http://google.com')
and
def file():
file = str(input('ENTER THE FILE\'S NAME AND EXTENSION:'))
action = open(file, 'r')
actionTwo = action.read()
print (actionTwo)
These errors occur, in respect to the above order, but in individual runs:
OPENING FIREFOX...
Traceback (most recent call last):
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 202, in <module>
askForQuestions()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 64, in askForQuestions
browser()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 38, in browser
handle = webbroswer.get()
NameError: global name 'webbroswer' is not defined
>>>
ENTER THE FILE'S NAME AND EXTENSION:file.txt
Traceback (most recent call last):
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 202, in <module>
askForQuestions()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 66, in askForQuestions
file()
File "C:/Users/RCOMP/Desktop/Programming/Python Files/AI/COMPUTRON_01.py", line 51, in file
action = open(file, 'r')
IOError: [Errno 2] No such file or directory: 'file.txt'
>>>
Am I handling this wrong, or can I just not use open() and webbrowser in a program?

You should read the errors and try to understand them - they are very helpful in this case - as they often are:
The first one says NameError: global name 'webbroswer' is not defined.
You can see here that webbrowser is spelled wrong in the code. It also tells you the line it finds the error (line 38)
The second one IOError: [Errno 2] No such file or directory: 'file.txt' tells you that you're trying to open a file that doesn't exist. This does not work because you specified
action = open(file, 'r')
which means that you're trying to read a file. Python does not allow reading from a file that does not exist.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

wget loop to all the lines (url) in textfile and download Windows - python

Try to use urllib.urlretrieve. Check the documentation here: https://docs.python.org/2/library/urllib.html#urllib.urlretrieve

Related

Download images with multiprocessing ThreadPool, FileNotFoundError?

BadRarFile when extracting single file using RarFile in Python

python is not reading a gzip file with gzip

Reading password protected Word Documents with zipfile

How to open files, web browsers, and URLs in Python, not in IDLE.

Categories

Resources