Read filenames from a textfile in python (double backslash issue)

Read filenames from a textfile in python (double backslash issue) - python

I am trying to read a list of files from a text file. I am using the following code to do that:
filelist = input("Please Enter the filelist: ")
flist = open (os.path.normpath(filelist),"r")
fname = []
for curline in flist:
# check if its a coment - do comment parsing in this if block
if curline.startswith('#'):
continue
fname.append(os.path.normpath(curline));
flist.close() #close the list file
# read the slave files 100MB at a time to generate stokes vectors
tmp = fname[0].rstrip()
t = np.fromfile(tmp,dtype='float',count=100*1000)
This works perfectly fine and I get the following array:
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_HH_slv3_08Oct2012.bin\n'
'H:\\Shaunak\\TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\q_VV_slv3_08Oct2012.bin'
The problem is that the '\' charecter is escaped and there is a trailing '\n' in the strings. I used the str.rstrip() to get rid of the '\n' - this works, but leaves the problem of the two back slashes.
I have used the following approaches to try getting rid of these:
Used the codecs.unicode_escape_decode() but I get this error:
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 56-57: malformed \N character escape. Clearly this is not the right approach because I just want to decode the backslashed, not the rest of the string.
This does not work either: tmp = fname[0].rstrip().replace(r'\\','\\');
Is there no way to make readline() read a raw string?
UPDATE:
Basically I have a text file with 4 file names I would like to open and read data from in python. The text file contains:
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\i_HH_mst_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_HH_slv3_08Oct2012.bin
H:\Shaunak\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\NEST_oregistration\Glacier_coreg_Cnv\q_VV_slv3_08Oct2012.bin
I would like to open each file one by one and read 100MBs of data from them.
When I use this command:np.fromfile(flist[0],dtype='float',count=100) I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
Update
Full Traceback:
Please Enter the filelist: H:/Shaunak/TerraSAR_X- Sep2012-Glacier_Velocity_Gangotri/NEST_oregistration/Glacier_coreg_Cnv/filelist.txt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 581, in runfile
execfile(filename, namespace)
File "G:\WinPython-32bit-3.3.2.3\python-3.3.2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 41, in execfile
exec(compile(open(filename).read(), filename, 'exec'), namespace)
File "H:/Shaunak/Programs/Arnab_glacier_vel/Stokes_generation_2.py", line 28, in <module>
t = np.fromfile(tmp,dtype='float',count=100*1000)
FileNotFoundError: [Errno 2] No such file or directory: 'H:\\Shaunak\\TerraSAR_X-Sep2012-Glacier_Velocity_Gangotri\\NEST_oregistration\\Glacier_coreg_Cnv\\i_HH_mst_08Oct2012.bin'
>>>

As #volcano stated, double slash is only an internal representation. If you print it, they're gone. The same if you write it to files, there will only be one '\'.
>>> string_with_double_backslash = "Here is a double backslash: \\"
>>> print(string_with_double_backslash)
Here is a double backslash: \

try this:
a_escaped = 'attachment; filename="Nuovo Cinema Paradiso 1988 Director\\\'s Cut"'
a_unescaped = codecs.getdecoder("unicode_escape")(a)[0]
yielding:
'attachment; filename="Nuovo Cinema Paradiso 1988 Director\'s Cut"'

Related

Download images from url that is stored in .txt file?

I'm using python 3.6 on Windows 10, I want to download images so that their urls are stored in 1.txt file.
This is my code:
import requests
import shutil
file_image_url = open("test.txt","r")
while True:
image_url = file_image_url.readline()
filename = image_url.split("/")[-1]
r = requests.get(image_url, stream = True)
r.raw.decode_content = True
with open(filename,'wb') as f:
shutil.copyfileobj(r.raw, f)
but when I run the code above it gives me this error:
Traceback (most recent call last):
File "download_pictures.py", line 10, in <module>
with open(filename,'wb') as f:
OSError: [Errno 22] Invalid argument: '03.jpg\n'
test.txt contains:
https://mysite/images/03.jpg
https://mysite/images/26.jpg
https://mysite/images/34.jpg
When I tried to put just one single URL on test.txt, it works and downloaded the picture,
but I need to download several images.

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline.
You are passing this filename(with \n) to open function(hence the OSError). So you need to call strip() on filename before passing into open.

Your filename has the new line character (\n) in it, remove that when you’re parsing for the filename and it should fix your issue. It’s working when you only have one file path in the txt file because there is only one line.

Read a large text file and write to another file with Python

I am trying to convert a large text file (size of 5 gig+) but got a
From this post, I managed to convert encoding format of a text file into a format that is readable with this:
path ='path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
t = open('{}/{}'.format(des_path, filename), 'w')
string = f.read()
t.write(string)
t.close()
The problem here is that when I tried to convert a text file with a large size(5 GB+). I will got this error
Traceback (most recent call last):
File "Desktop/convertfile.py", line 12, in <module>
string = f.read()
File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError
which I know that it cannot read a file with this large. And I found from several link that I can do it by reading line by line.
So, how can I apply to the code I have to make it read line by line? What I understand about reading line by line here is that I need to read a line from f and add it to t until end of the line, right?

You can iterate on the lines of an open file.
for filename in os.listdir(path):
inp, out = open_files(filename):
for line in inp:
out.write(line)
inp.close(), out.close()
Note that I've hidden the complexity of the different paths, encodings, modes in a function that I suggest you to actually write...
Re buffering, i.e. reading/writing larger chunks of the text, Python does its own buffering undercover so this shouldn't be too slow with respect to a more complex solution.

How to turn a comma seperated value TXT into a CSV for machine learning

How do I turn this format of TXT file into a CSV file?
Date,Open,high,low,close
1/1/2017,1,2,1,2
1/2/2017,2,3,2,3
1/3/2017,3,4,3,4
I am sure you can understand? It already has the comma -eparated values.
I tried using numpy.
>>> import numpy as np
>>> table = np.genfromtxt("171028 A.txt", comments="%")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py", line 1551, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rb'))
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 151, in open
return ds.open(path, mode)
File "C:\Users\Smith\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py", line 501, in open
raise IOError("%s not found." % path)
OSError: 171028 A.txt not found.
I have (S&P) 500 txt files to do this with.

You can use csv module. You can find more information here.
import csv
txt_file = 'mytext.txt'
csv_file = 'mycsv.csv'
in_txt = csv.reader(open(txt_file, "r"), delimiter=',')
out_csv = csv.writer(open(csv_file, 'w+'))
out_csv.writerows(in_txt)

Per #dclarke's comment, check the directory from which you run the code. As you coded the call, the file must be in that directory. When I have it there, the code runs without error (although the resulting table is a single line with four nan values). When I move the file elsewhere, I reproduce your error quite nicely.
Either move the file to be local, add a local link to the file, or change the file name in your program to use the proper path to the file (either relative or absolute).

Python script to unzip and print one line of a file

I am trying a simple example of retrieving data from a file and printing only one line of the output. I get semicolon error around encoded and 'r'.
import gzip
data = gzip.open('pagecounts-20130601-000000.gz', 'r')
encoded=data.read()
print encoded[2]
It gives this error:
Traceback (most recent call last):
File "filter_articles.scpt", line 4, in <module> encoded=data.read()
File "/usr/lib/python2.7/gzip.py", line 249, in read self._read(readsize)
File "/usr/lib/python2.7/gzip.py", line 308, in _read self._add_read_data( uncompress )
File "/usr/lib/python2.7/gzip.py", line 326, in _add_read_data self.extrabuf = self.extrabuf[offset:] + data MemoryError
I guess this is because the file is huge and was not able to read the content? What could be better way to print few lines of the file?

I am assuming that:
You meant to have quotes around the file name in your script.
You actually want the third line (as your post suggests) and not the third character (as your script suggests)
In this case the following should work:
import gzip
data = gzip.open('pagecounts-20130601-000000.gz', 'r')
data.readline()
data.readline()
print data.readline()

where is the call to encode the string or force the string to need to be encoded in this file?

I know this may seem rude or mean or unpolite, but I need some help to try to figure out why I cant call window.loadPvmFile("f:\games#DD.ATC3.Root\common\models\a300\amu\dummy.pvm") exactly like that as a string. Instead of doing that, it gives me a traceback error:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 415, in <module>
window.loadPvmFile("f:\games\#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")
File "F:\Python Apps\pvmViewer_v1_1.py", line 392, in loadPvmFile
file1 = open(path, "rb")
IOError: [Errno 22] invalid mode ('rb') or filename:
'f:\\games\\#DD.ATC3.Root\\common\\models\x07300\x07mu\\dummy.pvm'
Also notice, that in the traceback error, the file path is different. When I try a path that has no letters in it except for the drive letter and filename, it throws this error:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 416, in <module>
loadPvmFile('f:\0\0\dummy.pvm')
File "F:\Python Apps\pvmViewer_v1_1.py", line 393, in loadPvmFile
file1 = open(path, "r")
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
I have searched for the place that the encode function is called or where the argument is encoded and cant find it. Flat out, I am out of ideas, frustrated and I have nowhere else to go. The source code can be found here: PVM VIEWER
Also note that you will not be able to run this code and load a pvm file and that I am using portable python 2.7.3! Thanks for everyone's time and effort!

\a and \0 are escape sequences. Use r'' (or R'') around the string to mark it as a raw string.
window.loadPvmFile(r"f:\games#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read filenames from a textfile in python (double backslash issue) - python

As #volcano stated, double slash is only an internal representation. If you print it, they're gone. The same if you write it to files, there will only be one '\'. >>> string_with_double_backslash = "Here is a double backslash: \\" >>> print(string_with_double_backslash) Here is a double backslash: \

try this: a_escaped = 'attachment; filename="Nuovo Cinema Paradiso 1988 Director\\\'s Cut"' a_unescaped = codecs.getdecoder("unicode_escape")(a)[0] yielding: 'attachment; filename="Nuovo Cinema Paradiso 1988 Director\'s Cut"'

Related

Download images from url that is stored in .txt file?

Read a large text file and write to another file with Python

How to turn a comma seperated value TXT into a CSV for machine learning

Python script to unzip and print one line of a file

where is the call to encode the string or force the string to need to be encoded in this file?

Categories

Resources