Accessing a file path saved in a .txt file. (Python) - python

I have a text file that contains file paths of files that I wish to open.
The text file looks like this:
28.2 -1.0 46 14 10 .\temp_109.17621\voltage_28.200\power_-1.txt
28.2 -2.0 46 16 10 .\temp_109.17621\voltage_28.200\power_-2.txt
...
I would like to open the files at this filepath.
First step is to load each filepath from the text file.
I've tried this using:
path = np.loadtxt('NonLorentzianData.txt',usecols=[5],dtype='S16')
which generates a path[1] that looks like:
.\\temp_109.17621
...
rather than the entire file path.
Am I using the wrong dtype or is this not possible with loadtxt?

You used S16 as the dtype, so you get back .\\temp_109.17621 (\\ is an escaped \): a string truncated to 16 characters.
Try np.genfromtxt with dtype=None, or set the dtype wide enough for your data, e.g. dtype='S45' in your case.
Inspired by this post.
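For example, a minimal sketch of the genfromtxt approach (assuming the same NonLorentzianData.txt file):
import numpy as np

# dtype=None lets genfromtxt infer a string type wide enough for the column,
# so the paths are not truncated; on NumPy 1.14+ you can also pass encoding=None
# to get str instead of bytes.
path = np.genfromtxt('NonLorentzianData.txt', usecols=[5], dtype=None)
print(path[1])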

If you change the data type to np.str_ it will work:
path = np.loadtxt('NonLorentzianData.txt',usecols=[5],dtype=np.str_)
print(path[1])
.\temp_109.17621\voltage_28.200\power_-2.txt
Using dtype="S44" will also work; 44 is the length of the longer of your two paths.
You are specifying a 16 character string so you only get the first 16 characters.
In [17]: s = ".\\temp_109.17621"
In [18]: len(s)
Out[18]: 16
# 43 character string
In [26]: path = np.loadtxt('words.txt',usecols=[5],dtype=("S43"))
In [27]: path[1]
Out[27]: '.\\temp_109.17621\\voltage_28.200\\power_-2.tx'
In [28]: len(path[1])
Out[28]: 43
# 38 character string
In [29]: path = np.loadtxt('words.txt',usecols=[5],dtype=("S38"))
In [30]: path[1]
Out[30]: '.\\temp_109.17621\\voltage_28.200\\power_'
In [31]: len(path[1])
Out[31]: 38
In [32]: path = np.loadtxt('words.txt',usecols=[5],dtype=np.str_)
In [33]: path[1]
Out[33]: '.\\temp_109.17621\\voltage_28.200\\power_-2.txt'
If you look at the docs you will see what every dtype does and how to use them.
If you just want all the file paths you can also use csv.reader:
import csv
with open("NonLorentzianData.txt") as f:
reader = csv.reader(f,delimiter=" ")
for row in reader:
with open(row[-1]) as f:
.....

Related

How to get os.system() output as a string and not a set of characters? [duplicate]

This question already has an answer here:
How can I make a for-loop loop through lines instead of characters in a variable?
(1 answer)
Closed 6 years ago.
I'm trying to get output from os.system using the following code:
p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
ls = p.communicate()[0]
when I print the output I get:
print(ls)
file1.txt
file2.txt
The output displays as two separate lines. However, when I try to print the filenames using a for loop, I get a list of characters instead:
for i in range(len(ls)):
    print i, ls[i]
Output:
0 f
1 i
2 l
3 e
4 1
5 .
6 t
7 x
8 t
9 f
10 i
11 l
12 e
13 2
14 .
15 t
16 x
17 t
I need help ensuring the os.system() output returns as strings and
not a set of characters.
p.communicate returns a string. It may look like a list of filenames, but it is just a string. You can convert it to a list of filenames by splitting on the newline character:
s = p.communicate()[0]
for line in s.split("\n"):
print "line:", line
Are you aware that there are built-in functions to get a list of files in a directory?
for i in range(len(...)): is usually a code smell in Python. If you want to iterate over the numbered elements of a collection, the canonical method is for i, element in enumerate(...):.
The code you quote clearly isn't the code you ran, since when you print ls you see two lines separated by a newline, but when you iterate over the characters of the string the newline doesn't appear.
The bottom line is that you are getting a string back from communicate()[0], but you are then iterating over it, giving you the individual characters. I suspect what you would like to do is use the .split() or .splitlines() method on ls to get the individual file names, but you are trying to run before you can walk. First of all, get a clear handle on what the communicate method is returning to you.
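As a minimal sketch of that suggestion (reusing p from the question; the isinstance check is only there because on Python 3 communicate() hands back bytes rather than str):
ls = p.communicate()[0]
if isinstance(ls, bytes):       # Python 3: decode bytes to text first
    ls = ls.decode()
for name in ls.splitlines():    # one filename per entry, no trailing newlines
    print(name)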
Apparently, in Python 3.6, p.communicate returns a bytes object:
In [16]: type(ls)
Out[16]: bytes
The following seems to work better:
In [22]: p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
In [23]: ls = p.communicate()[0].split()
In [25]: for i in range(len(ls)):
...: print(i, ls[i])
...:
0 b'file1.txt'
1 b'file2.txt'
But I would rather use os.listdir() instead of subprocess:
import os

for line in os.listdir():
    print(line)

How can I ignore binary data headers (or just read the serialized data) when enumerating a file with Python?

I have a file that starts with a number of headers of binary data (I suppose that is what they are), and after that there are lines of text. I'm just starting to work with it, but I noticed that if I use Python's enumerate function it doesn't appear to read the lines I want it to read (I'm using Python 2.7.8). In my text editor I can see the data I want, but the result suggests the header may be serialized data. There is more of the same binary at the end of the file.
Sample from Data File (I'm hoping to skip the first 8 lines):
I want to start with the line that starts with "curve".
ÿÿÿÿ  ENetDeedPlotter, Version=5.6.1.0, Culture=neutral, PublicKeyToken=null QSystem.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a Net_Deed_Plotter.SerializeData! LinesOfDataNumberOfTractsSeditortextSedLineStract
SNoteArraySNorthArrow
Slandscape
SPaperSizeSPaperBounds
SPrinterScaleSPrinterScaleStrSAllTractsMouseOffsetNSAllTractsMouseOffsetESAllTractsNOffsetSAllTractsEOffsetSImageScroll_YSImageScroll_XSImage_YSImage_XSImageFilePath
SUpDateMapSttcSttStbSboSnb
STitleText SDateText SPOBLines
SLabelCornersSNAmountTract0HasBeenMovedSEAmountTract0HasBeenMoved Net_Deed_Plotter.LineData[] Net_Deed_Plotter.TractData[] System.Collections.ArrayList+Net_Deed_Plotter.PaperForm+NorthArrowStruct !System.Drawing.Printing.PaperSize System.Drawing.Rectangle  ' Ân40.4635w 191.02
curve right radius 953.50 arc 361.84 chord n60.5705e 359.07
s56.3005e 3.81
s19.4515w 170.63
s13.4145w 60.67
s51.0250w 155.35
n40.4635w 191.02
curve left radius 615.16 arc 202.85 chord s45.19w 201.94
Sample Script
# INPUTS TO BE UPDATED
inputNDP = r"N:\Parcels\Parcels2012\57-11-115.ndp"
outputTXT = r"N:\Parcels\Parcels2012\57-11-115.txt"
# END OF INPUTS TO BE UPDATED
fileNDP = open(inputNDP, 'r')
for line in enumerate(fileNDP, 9):
    print line
Result
(9, '\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x00\x00\x00\x00\x0c\x02\x00\x00\x00ENetDeedPlotter, Version=5.6.1.0, Culture=neutral, PublicKeyToken=null\x0c\x03\x00\x00\x00QSystem.Drawing, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a\x05\x01\x00\x00\x00\x1eNet_Deed_Plotter.SerializeData!\x00\x00\x00\x0bLinesOfData\x0eNumberOfTracts\x0bSeditortext\x07SedLine\x06Stract\n')
(10, 'SNoteArray\x0bSNorthArrow\n')
(11, 'Slandscape\n')
(12, 'SPaperSize\x0cSPaperBounds\rSPrinterScale\x10SPrinterScaleStr\x16SAllTractsMouseOffsetN\x16SAllTractsMouseOffsetE\x11SAllTractsNOffset\x11SAllTractsEOffset\x0eSImageScroll_Y\x0eSImageScroll_X\x08SImage_Y\x08SImage_X\x0eSImageFilePath\n')
(13, 'SUpDateMap\x04Sttc\x03Stt\x03Stb\x03Sbo\x03Snb\n')
(14, 'STitleText\tSDateText\tSPOBLines\rSLabelCorners')
>>>
Be aware that enumerate takes a start parameter that only sets the initial value of the number. It does not cause it to skip over any contents.
If you want to skip lines, you'll need to filter your enumeration:
>>> x = xrange(20)
>>> for num, text in (tpl for tpl in enumerate(x) if tpl[0] > 8):
...     print num, text
...
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
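Another option, not from the answer above but a common alternative, is itertools.islice, which drops a fixed number of lines for you (using the inputNDP path from the question):
from itertools import islice

with open(inputNDP, 'r') as fileNDP:
    # islice(fileNDP, 8, None) skips the first 8 lines and yields the rest lazily
    for num, line in enumerate(islice(fileNDP, 8, None), 9):
        print num, line.rstrip('\n')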
I figured out that since the file was in a binary format, I needed to read it in that way with open('myfile', 'rb') rather than with open('myfile', 'r') and I got a lot of help from this question.
The re-write looks like this ...
# TODO: write output file
# INPUTS TO BE UPDATED
inputNDP = r"N:\Parcels\Parcels2012\57-11-115.ndp"
# END OF INPUTS TO BE UPDATED
fileNDP = open(inputNDP, 'rb')

def strip_nonascii(b):
    return b.decode('ascii', errors='ignore')

n = 0
for line in fileNDP:
    if n > 5:
        if '|' in line:
            break
        print strip_nonascii(line).strip('\n')  # + str(n)
    n += 1

Converting a File with Strings into Floats then Adding Them

I have to create a program for my class that reads a file, converts the lists of numbers within to floats, then adds them all together and prints only the answer onto the screen.
The farthest I've gotten is:
import sys

fname = sys.argv[1]
handle = open(fname, "r")
total = 0
for line in handle:
    linearr = line.split()
    for item in linearr:
        item = float(item)
One of the files look like:
0.13 10.2 15.8193
0.09 99.6
100.1
100.2 17.8 56.33 12
19e-2 7.5
Trying to add the converted values to the total (total += item) has not worked. I'm really lost and would greatly appreciate any assistance.
You are almost there: total += item is the correct approach; add that line to your inner for loop right after the conversion to float.
Make sure to print your result at the end with print(total); you probably forgot that too.
For your test file this gives the result 419.9593.
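Put together, a minimal sketch of the fixed script (the same structure as the code in the question, with the accumulation and the final print added):
import sys

fname = sys.argv[1]
total = 0
with open(fname, "r") as handle:
    for line in handle:
        for item in line.split():
            total += float(item)   # convert and accumulate in one step
print(total)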
You can use a generator expression with sum, splitting the lines into lists and casting each element to float:
In [9]: cat test.txt
0.13 10.2 15.8193
0.09 99.6
100.1
100.2 17.8 56.33 12
19e-2 7.5
In [10]: with open("test.txt") as f:
   ....:     sm = sum(float(s) for row in map(str.split, f) for s in row)
   ....:
In [11]: sm
Out[11]: 419.9593
You can also combine with itertools.chain to flatten the rows:
In [1]: from itertools import chain
In [2]: with open("test.txt") as f:
   ...:     sm = sum(map(float, chain(*(map(str.split, f)))))
   ...:
In [3]: sm
Out[3]: 419.9593
On a side note, you should always use with to open your files; it will automatically close them for you.

Parse complex text file for data analysis in Python

I am a complete novice at Python, and at programming in general.
I have a text file to parse into a CSV. I am not able to provide an example of the text file at this time.
The text is several (thousand) lines with no carriage returns.
There are 4 types of records in the file (A, B, C, or I).
Each record type has a specific format based on the size of the data element.
There are no delimiters.
Immediately after the last data element in the record type, the next record type appears.
I have been trying to translate, from a different language, what this might look like in Python.
Here is an example of what I've written (not in correct format):
file=open('TestPython.txt'), 'r' # from current working directory
dataString=file.read()
data=()
i=0
while i < len(dataString):
    i = i+2
    curChar = dataString(i)
    # Need some help on the next line var curChar = dataString[i]
    if curChar = "A"
        NPI = dataString(i+1, 16) # Need to verify that is how it is done in python inside ()
        NPI.strip()
        PCN = datastring(i+17, 40)
        PCN.strip()
        seqNo = dataString(i+41, 42)
        seqNo.strip()
        MRN = dataString(i+43, 66)
        MRN.strip()
    if curChar = "B"
        NPI = dataString(i+1, 16) # Need to verify that is how it is done in python inside ()
        NPI.strip()
        PCN = datastring(i+17, 40)
        PCN.strip()
        seqNo = dataString(i+41, 42)
        seqNo.strip()
        RC1 = (i+43, 46)
        RC1.strip()
        RC2 = (i+47, 50)
        RC2.strip()
        RC3 = (i+51, 54)
        RC3.strip()
    if curChar = "C"
        NPI = dataString(i+1, 16) # Need to verify that is how it is done in python inside ()
        NPI.strip()
        PCN = datastring(i+17, 40)
        PCN.strip()
        seqNo = dataString(i+41, 42)
        seqNo.strip()
        DXVer = (i=43, 43)
        DXVer.strip()
        AdmitDX = (i+44, 50)
        AdmitDX.strip()
        RVisit1 = (i+51, 57)
        RVisit1.strip()
Here's a dummied-up version of a piece of the text file.
A 63489564696474677 9845687 777 67834717467764674 TUANU TINBUNIU 47 ERTYNU TDFGH UU748897764 66762589668777486U6764467467774767 7123609989 9 O
B 79466945684634677 676756787344786474634890 7746.66 7 96 4 7 7 9 7 774666 44969 494 7994 99666 77478 767766
B 098765477 64697666966667 9 99 87966 47798 797499
C 63489564696474677 6747494 7494 7497 4964 4976 N7469 4769 N9784 9677
I 79466944696474677 677769U6 8888 67764674
A 79466945684634677 6767994 777 696789989 6464467464764674 UIIUN UITTI 7747 NUU 9 ATU 4 UANU OSASDF NU67479 66567896667697487U6464467476777967 7699969978 7699969978 9 O
As you can see, there can be several of each type in the file. The way this example pastes, it looks like the record type is the first character on a line. That is not the case in the actual file (I made this sample in Word).
You might take a look at pyparsing.
You better process the file as you read it.
First, do a file.read(1) to determine which type of record is up next.
Then, depending on the type, read the fields, which if I understand you correctly are fixed width. So for type 'A' this would look like this:
def processA(file):
    NPI = file.read(16).strip()    # assuming the NPI is 16 bytes long
    PCN = file.read(23).strip()    # assuming the PCN is 23 bytes long
    seqNo = file.read(1).strip()   # assuming seqNo is 1 byte long
    MRN = file.read(23).strip()    # assuming MRN is 23 bytes long
    return {"NPI": NPI, "PCN": PCN, "seqNo": seqNo, "MRN": MRN}
If the file is not ASCII, there's a bit more work to get the encoding right and read characters instead of bytes.
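For completeness, a minimal sketch of the surrounding dispatch loop the answer describes, assuming you write processB, processC, and processI in the same style as processA (those handler names are hypothetical; 'TestPython.txt' is the file name from the question):
handlers = {"A": processA, "B": processB, "C": processC, "I": processI}

records = []
with open('TestPython.txt', 'r') as f:
    while True:
        record_type = f.read(1)              # one-character record type, '' at end of file
        if not record_type:
            break
        record = handlers[record_type](f)    # read that type's fixed-width fields
        record["type"] = record_type
        records.append(record)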

Remove punctuation from file name while keeping file extension intact

I would like to remove all punctuation from a filename but keep its file extension intact.
e.g. I want:
Flowers.Rose-Murree-[25.10.11].jpg
Time.Square.New-York-[20.7.09].png
to look like:
Flowers Rose Muree 25 10 11.jpg
Time Square New York 20 7 09.png
I'm trying python:
re.sub(r'[^A-Za-z0-9]', ' ', filename)
But that produces:
Flowers Rose Muree 25 10 11 jpg
Time Square New York 20 7 09 png
How do I remove the punctuation but keep the file extension?
There's only one right way to do this:
1. Use os.path.splitext to get the filename and the extension.
2. Do whatever processing you want to the filename.
3. Concatenate the new filename with the extension (see the sketch below).
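A minimal sketch of those three steps, applied to the examples from the question (the space-substituting regex is adapted from the question's own re.sub call, with runs of punctuation collapsed into a single space):
import os
import re

def clean_name(filename):
    # 1. Split off the extension so it is left untouched.
    name, ext = os.path.splitext(filename)
    # 2. Replace runs of non-alphanumeric characters in the name with a space.
    name = re.sub(r'[^A-Za-z0-9]+', ' ', name).strip()
    # 3. Put the extension back on.
    return name + ext

print(clean_name("Flowers.Rose-Murree-[25.10.11].jpg"))  # Flowers Rose Murree 25 10 11.jpg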
You could use a negative lookahead, that asserts that you are not dealing with a dot that is only followed by digits and letters:
re.sub(r'(?!\.[A-Za-z0-9]*$)[^A-Za-z0-9]', ' ', filename)
I suggest replacing each occurrence of [\W_](?=.*\.) with a space.
See if this works for you. You can actually do it without a regex:
>>> import os, string
>>> fname = "Flowers.Rose-Murree-[25.10.11].jpg"
>>> name,ext=os.path.splitext(fname)
>>> name = name.translate(None,string.punctuation)
>>> name += ext
>>> name
'FlowersRoseMurree251011.jpg'
>>>
@katrielalex beat me to this type of answer, but anyway, a regex-free solution:
In [23]: f = "/etc/path/fred.apple.png"
In [24]: path, filename = os.path.split(f)
In [25]: main, suffix = os.path.splitext(filename)
In [26]: newname = os.path.join(path,''.join(c if c.isalnum() else ' ' for c in main) + suffix)
In [27]: newname
Out[27]: '/etc/path/fred apple.png'
