Download images from url that is stored in .txt file?

I'm using Python 3.6 on Windows 10, and I want to download images whose URLs are stored in a .txt file.
This is my code:
import requests
import shutil
file_image_url = open("test.txt","r")
while True:
    image_url = file_image_url.readline()
    filename = image_url.split("/")[-1]
    r = requests.get(image_url, stream = True)
    r.raw.decode_content = True
    with open(filename,'wb') as f:
        shutil.copyfileobj(r.raw, f)
When I run the code above, it gives me this error:
Traceback (most recent call last):
File "download_pictures.py", line 10, in <module>
with open(filename,'wb') as f:
OSError: [Errno 22] Invalid argument: '03.jpg\n'
test.txt contains:
https://mysite/images/03.jpg
https://mysite/images/26.jpg
https://mysite/images/34.jpg
When I put just a single URL in test.txt, it works and the picture is downloaded,
but I need to download several images.

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.
You are passing this filename (including the trailing \n) to the open function, hence the OSError. So you need to call strip() on the filename before passing it to open.

Your filename has the newline character (\n) in it; remove it when you parse the filename and that should fix your issue. It works when you only have one URL in the .txt file because there is only one line, with no newline after it.
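As a rough sketch of the fix described in the answers above: strip each line before using it, and iterate over the file so the loop ends at end of file. The helper names (clean_urls, filename_from_url, download_all) are made up for illustration, and the standard library's urllib.request is swapped in for requests only to keep the sketch dependency-free; the stripping logic is the same either way.

```python
import shutil
import urllib.request
from urllib.parse import urlsplit


def clean_urls(lines):
    """Yield each non-empty line with surrounding whitespace (including '\n') removed."""
    for line in lines:
        url = line.strip()
        if url:
            yield url


def filename_from_url(url):
    """Return the last path component of a URL, e.g. '03.jpg'."""
    return urlsplit(url.strip()).path.split("/")[-1]


def download_all(list_path):
    """Download every URL listed in list_path into the current directory."""
    with open(list_path, "r", encoding="utf-8") as url_file:
        # Iterating over the file stops at end of file, unlike `while True`.
        for url in clean_urls(url_file):
            # urlopen returns a file-like response that can be streamed to disk.
            with urllib.request.urlopen(url) as response, \
                    open(filename_from_url(url), "wb") as out_file:
                shutil.copyfileobj(response, out_file)
```

Calling download_all("test.txt") would then fetch each listed image in turn. With requests, the same two helpers can be reused and only the body of the inner with block changes.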

Related

Read a large text file and write to another file with Python

I am trying to convert a large text file (5 GB+), but I get an error.
Following another post, I managed to convert the encoding of a text file into a readable format with this:
import os
path ='path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
        t = open('{}/{}'.format(des_path, filename), 'w')
        string = f.read()
        t.write(string)
        t.close()
The problem is that when I try to convert a large text file (5 GB+), I get this error:
Traceback (most recent call last):
File "Desktop/convertfile.py", line 12, in <module>
string = f.read()
File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError
I know this happens because a file this large cannot be read into memory at once, and I found from several links that I can read it line by line instead.
So, how can I change my code to read line by line? My understanding is that I need to read a line from f and write it to t until the end of the file, right?

You can iterate over the lines of an open file.
for filename in os.listdir(path):
    inp, out = open_files(filename)
    for line in inp:
        out.write(line)
    inp.close()
    out.close()
Note that I've hidden the complexity of the different paths, encodings and modes in a function, open_files, that I suggest you actually write...
Regarding buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be much slower than a more complex solution.
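One possible shape for the open_files helper mentioned above, as a minimal sketch. The extra parameters (src_dir, dst_dir, the two encodings) and the convert_directory wrapper are assumptions added for illustration; in particular, writing the output as UTF-8 is a guess, since the question only states the input encoding. Like the original loop, it assumes the source directory contains only regular files.

```python
import os


def open_files(filename, src_dir, dst_dir,
               src_encoding="iso-8859-11", dst_encoding="utf-8"):
    """Open the input for reading and the output for writing; return both."""
    # newline="" leaves the original line endings untouched on both sides.
    inp = open(os.path.join(src_dir, filename), "r",
               encoding=src_encoding, newline="")
    out = open(os.path.join(dst_dir, filename), "w",
               encoding=dst_encoding, newline="")
    return inp, out


def convert_directory(src_dir, dst_dir):
    """Re-encode every file in src_dir into dst_dir, one line at a time."""
    for filename in os.listdir(src_dir):
        inp, out = open_files(filename, src_dir, dst_dir)
        # "with" closes both files even if decoding fails part-way through.
        with inp, out:
            for line in inp:      # lines are read one by one, not all at once
                out.write(line)
```

Because the loop never calls read() on the whole file, memory use stays roughly constant regardless of file size.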

How to handle a UTF-8 string for the printing and the storage into a file simultaneously?

I have a piece of code that reads files from a specific directory. Then it prints the filenames to the console and, simultaneously, writes them to a logfile. If there is a file with a Unicode character in its name in the directory, the script stops with an error. I figured out how to print the filename, but I didn't figure out how to write the filename to a logfile.
This is my code (on a Mac, Filesystem is UTF-8):
import sys
import os
rootdir = '/Volumes/USB/dir/'
logfile = open('temp.txt', 'a')
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        file = os.path.join(subdir, file)
        file2 = file.encode('utf-8')
        print(file2)
        logfile.write('Reading file: "'+file+'"\n')
In this case the error is
b'/Volumes/USB/dir/testa\xcc\x88test.mp4'
Traceback (most recent call last):
File "/temp/list-files-in-dir.py", line 15, in <module>
logfile.write('Reading file: "'+file+'"\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u0308' in position 46: ordinal not in range(128)
When I change the last line to
logfile.write('Reading file: "'+file2+'"\n')
then the error is
Traceback (most recent call last):
File "/temp/list-files-in-dir.py", line 15, in <module>
logfile.write('Reading file: "'+file2+'"\n')
TypeError: must be str, not bytes
I'm doing something wrong with the encoding / decoding. But what?
EDIT
Thanks to the comment from @lenz down below, I can now write to the logfile.
Then I added a new line to the code
size = os.path.getsize(file)
and now I get a new error:
Traceback (most recent call last):
File "/temp/list-files-in-dir.py", line 16, in <module>
size = os.path.getsize(file)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/genericpath.py", line 50, in getsize
return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/Volumes/USB/dir/testa\xcc\x88test.mp4'
It seems that this internal function also has some trouble with UTF-8. I am stuck again.
EDIT 2
No solution, but I found a workaround for the file size by adding a try block.
try:
    size = os.path.getsize(file)
except:
    size = 0

Python 3 strings are Unicode by default. Open the file with the encoding you want and don't encode manually. This will fix your later problem with os.path.getsize, since it wants a Unicode string as well.
import os
rootdir = '/Volumes/USB/dir/'
# "with" will close the file when its block is exited.
# Specify the encoding when opening the file.
with open('temp.txt','w',encoding='utf8') as logfile:
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            file = os.path.join(subdir, file)
            print(file)
            logfile.write('Reading file: "'+file+'"\n')

I found out that this problem only occurs when running the script from within the Visual Studio Code editor with the MagicPython extension. When I run this code from a normal shell, everything works as expected and the UTF-8 handling is done correctly.
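A plausible explanation for the difference between environments (an assumption, not something confirmed in the thread) is that text-mode open() without an encoding argument falls back to a locale-dependent default, which may be ASCII in some launch environments. The sketch below writes the question's file name to a temporary log with an explicit encoding, then forces ASCII to reproduce the error. The temporary log path is made up for the example.

```python
import locale
import os
import tempfile

# The file name from the question: 'a' followed by U+0308 (combining diaeresis).
name = "/Volumes/USB/dir/testa\u0308test.mp4"
message = 'Reading file: "' + name + '"\n'

# Text-mode open() without an encoding argument uses this locale-dependent default.
print("default text encoding:", locale.getpreferredencoding(False))

log_path = os.path.join(tempfile.mkdtemp(), "temp.txt")

# An explicit encoding makes the write independent of the environment.
with open(log_path, "a", encoding="utf-8") as logfile:
    logfile.write(message)

# Forcing ASCII reproduces the UnicodeEncodeError from the question.
try:
    with open(log_path, "a", encoding="ascii") as logfile:
        logfile.write(message)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Read the log back to confirm only the UTF-8 write landed in the file.
with open(log_path, encoding="utf-8") as logfile:
    logged = logfile.read()
```

If the printed default is something other than UTF-8 in the failing environment, passing encoding explicitly, as in the answer above, should sidestep the issue.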

FileNotFoundError in opening txt file with python

I am trying to open a txt file for reading with this code:-
type_comments = [] #Declare an empty list
with open ('society6comments.txt', 'rt') as in_file: #Open file for reading of text data.
    for line in in_file: #For each line of text store in a string variable named "line", and
        type_comments.append(line.rstrip('\n')) #add that line to our list of lines.
Error:-
Error - Traceback (most recent call last):
File "c:/Users/sultan/python/society6/society6_promotion.py", line 6, in <module>
with open ('society6comments.txt', 'rt') as in_file:
FileNotFoundError: [Errno 2] No such file or directory: 'society6comments.txt'
I already have a file named 'society6comments.txt' in the same directory as my script, so why is it showing an error?

The fact that the text file is in the same directory as your program does not make that directory the current working directory. Put the full path to the file in your open() call.

You can use os.path.dirname(__file__) to obtain the directory of the script, and then join it with the file name you want:
import os
script_dir = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(script_dir, 'society6comments.txt'), 'rt') as in_file:
    type_comments = [line.rstrip('\n') for line in in_file]
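The same idea can be written with pathlib, sketched here under a couple of assumptions: the helper names sibling_path and read_lines are invented for illustration, and UTF-8 is assumed as the file's encoding. The script path is passed in as an argument so the helpers can be tried outside a script as well.

```python
from pathlib import Path


def sibling_path(script_file, name):
    """Return the absolute path of `name` in the same directory as script_file."""
    return Path(script_file).resolve().parent / name


def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path, "rt", encoding="utf-8") as in_file:
        return [line.rstrip("\n") for line in in_file]


# In a script you would call, for example:
#     type_comments = read_lines(sibling_path(__file__, "society6comments.txt"))
```

This keeps the lookup independent of the current working directory, which is what caused the FileNotFoundError in the question.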

How to turn a comma-separated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
As you can see, it already has comma-separated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rb'))
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
return ds.open(path, mode)
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.

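You can use the csv module. See the csv module documentation for more information.
import csv
txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'
# newline='' lets the csv module handle line endings itself.
with open(txt_file, 'r', newline='') as in_file, open(csv_file, 'w', newline='') as out_file:
    in_txt = csv.reader(in_file, delimiter=',')
    out_csv = csv.writer(out_file)
    out_csv.writerows(in_txt)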

Per @dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error quite nicely.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).
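Since the question mentions hundreds of files, here is a minimal sketch that combines both answers: it takes the folder path explicitly (so the working directory does not matter) and uses the csv module to write a .csv next to each .txt. The function names txt_to_csv and convert_folder are hypothetical, and it assumes every .txt file in the folder is already comma-separated.

```python
import csv
from pathlib import Path


def txt_to_csv(txt_path, csv_path):
    """Copy comma-separated rows from txt_path to csv_path; return the row count."""
    with open(txt_path, "r", newline="") as in_file, \
            open(csv_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        count = 0
        for row in csv.reader(in_file, delimiter=","):
            writer.writerow(row)
            count += 1
    return count


def convert_folder(folder):
    """Write a .csv next to every .txt file in folder; return the new paths."""
    created = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        csv_path = txt_path.with_suffix(".csv")
        txt_to_csv(txt_path, csv_path)
        created.append(csv_path)
    return created
```

Passing an absolute folder path to convert_folder avoids the "not found" error from the question, because nothing depends on where the script is launched from.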

Read filenames from a textfile in python (double backslash issue)

I am trying to read a list of files from a text file. I am using the following code to do that:
import os
import numpy as np
filelist = input("Please Enter the filelist: ")
flist = open (os.path.normpath(filelist),"r")
fname = []
for curline in flist:
    # check if it's a comment - do comment parsing in this if block
    if curline.startswith('#'):
        continue
    fname.append(os.path.normpath(curline))
flist.close() #close the list file
# read the slave files 100MB at a time to generate stokes vectors
tmp = fname[0].rstrip()
t = np.fromfile(tmp,dtype='float',count=100*1000)
This works perfectly fine and I get the following array:
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_HH_slv3_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_VV_slv3_08Oct2012.bin'
The problem is that the '\' character is escaped and there is a trailing '\n' in the strings. I used str.rstrip() to get rid of the '\n'; this works, but leaves the problem of the two backslashes.
I have tried the following approaches to get rid of these:
Used codecs.unicode_escape_decode(), but I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 56-57: malformed \N character escape
Clearly this is not the right approach, because I just want to decode the backslashes, not the rest of the string.
This does not work either: tmp = fname[0].rstrip().replace(r'\\','\\')
Is there no way to make readline() read a raw string?
UPDATE:
Basically I have a text file with 4 file names I would like to open and read data from in python. The text file contains:
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_HH_slv3_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_VV_slv3_08Oct2012.bin
I would like to open each file one by one and read 100 MB of data from each.
When I use this command: np.fromfile(flist[0],dtype='float',count=100) I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
Update
Full Traceback:
Please Enter the filelist: H:/Shaunak/TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri/NEST_oregistration/Glacier_coreg_Cnv/filelist.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 581, in runfile
execfile(filename, namespace)
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 41, in execfile
exec(compile(open(filename).read(), filename, 'exec'), namespace)
File "H:/Shaunak/Programs/Arnab_glacier_vel/Stokes_generation_2.py", line 28, in <module>
t = np.fromfile(tmp,dtype='float',count=100*1000)
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
>>>

As @volcano stated, the double backslash is only how Python displays the string (its repr). If you print it, the extra backslashes are gone. The same holds if you write it to a file: there will be only one '\'.
>>> string_with_double_backslash = "Here is a double backslash: \\"
>>> print(string_with_double_backslash)
Here is a double backslash: \
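To make the point above concrete, here is a small sketch that cleans a list of path lines and then compares repr() with the actual string contents. The helper name clean_paths and the shortened path are hypothetical, chosen only for illustration.

```python
def clean_paths(lines):
    """Return the paths from lines, skipping '#' comments and blank lines."""
    paths = []
    for line in lines:
        entry = line.strip()              # removes the trailing '\n'
        if not entry or entry.startswith("#"):
            continue
        paths.append(entry)
    return paths


# A shortened, hypothetical entry as it would be read from the list file.
raw_line = "H:\\Shaunak\\i_HH_mst_08Oct2012.bin\n"

path = clean_paths(["# comment\n", raw_line])[0]

print(repr(path))        # repr() doubles each backslash for display
print(path)              # print() shows the characters actually stored
print(path.count("\\"))  # counts the backslashes really in the string
```

If a path read this way still cannot be opened, the doubled backslashes in the error message are not the cause; the message simply shows the repr of the path, so it is worth comparing the path character by character with the real location on disk.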

Try this:
import codecs
a_escaped = 'attachment; filename="Nuovo Cinema Paradiso 1988 Director\\\'s Cut"'
a_unescaped = codecs.getdecoder("unicode_escape")(a_escaped)[0]
yielding:
'attachment; filename="Nuovo Cinema Paradiso 1988 Director\'s Cut"'
