Python structured array composition and transformation

I created a script that collects a large amount of data from a .txt file into an array in the format I want [3: 4: n], and the information is recorded as follows (I think). The .txt file is in this format:
1.000000e-01 1.000000e-01 1.000000e-01
1.000000e-01 2.000000e-01 3.000000e-01
3.000000e-01 2.000000e-01 1.000000e-01
1.000000e-01 2.000000e-01 4.000000e-01
and this repeats N times. I basically store every 4 lines as a block, because I'm working with ASCII files from STL parts.
With that in mind, I have this code:
f = open("camaSTLfinalmente.txt","r")
b_line = 0
Coord = []
Normal = []
Vertice_coord = []
Tri = []
block = []
for line in f:
    line = line.rstrip()
    if(line):
        split = line.split()
        for axis in range(0,3):
            if(b_line == 0): # normal
                Normal.append(split[axis])
            else: # triangulo (triangle vertex)
                Vertice_coord.append(split[axis])
        if(b_line > 0):
            Tri.append(Vertice_coord)
            Vertice_coord = []
        if(b_line == 3):
            block.append(Normal)
            block.append(Tri)
            Coord.append(block)
            block = []
            Normal = []
            Tri = []
            b_line = 0
        else:
            b_line += 1
print(Coord[0]) # prints the line shown below
The information is stored like this:
[['1.000000e-01', '1.000000e-01', '1.000000e-01'], [['1.000000e-01', '2.000000e-01', '3.000000e-01'], ['3.000000e-01', '2.000000e-01', '1.000000e-01'], ['1.000000e-01', '2.000000e-01', '-4.000000e-01']]]
Is there any way to simplify this?
I would also like to take this opportunity to ask: I want to convert this information into numbers, ideally reading the number after the exponential (e) and scaling the value accordingly, so that 1.000000e-01 becomes 0.1 (in order to do operations with a similar array where I store information from another .txt file with the same format).
Thanks for your attention,
Pedro

You can try changing the line split = line.split() to:
split = [float(x) for x in line.split()]
If you need the result as strings rather than floats:
split = [str(float(x)) for x in line.split()]
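For example, applied to one line of the sample data above, the conversion gives plain floats you can compute with (just an illustration, values taken from the question):
line = "1.000000e-01 2.000000e-01 3.000000e-01"
split = [float(x) for x in line.split()]
print(split)         # [0.1, 0.2, 0.3]
print(split[0] * 2)  # 0.2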

I'm not 100% sure I fully understand what you want, but the following code produces the same Coord:
coord = []
with open('camaSTLfinalmente.txt','r') as f:
    content = [line.strip().split() for line in f]
for i in range(len(content)//4):
    coord.append([content[4*i], content[(4*i+1):(4*i+4)]])
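If the end goal is numeric operations anyway, a different route is to let numpy parse the file directly. This is only a sketch, assuming every block is exactly 4 lines of 3 floats and that numpy is available:
import numpy as np

data = np.loadtxt('camaSTLfinalmente.txt')  # shape (4*N, 3); blank lines are skipped, values are floats
blocks = data.reshape(-1, 4, 3)             # one (4, 3) block per facet
normals = blocks[:, 0, :]                   # first row of each block
vertices = blocks[:, 1:, :]                 # remaining three rows of each block
print(normals[0], vertices[0])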
Regarding the second question, as remarked in another answer, the easiest way to handle strings containing a number is to convert them to a number and then format that number as a string.
s = '1.000000e-01'
n = float(s)
m = '{:.1f}'.format(n)
See the section about string formatting in the Python doc.
A couple of remarks:
Generally Stack Overflow doesn't like questions of the form "how do I improve this piece of code"; try to ask more specific questions.
The above assumes your file contains exactly 4·k lines; change the integer division ...//4 accordingly if there are leftover lines at the end that do not form a block of 4.
Don't use capital letters for your variables. While style guides are not mandatory, it is good practice to follow them (look up PEP 8, pylint, ...).

Related

Number not printing in Python when returning amount

I have some code which reads from a text file and is meant to print the max and min altitudes, but the min altitude is not printing and there are no errors.
altitude = open("Altitude.txt","r")
read = altitude.readlines()
count = 0
for line in read:
    count += 1
count = count - 1
print("Number of Different Altitudes: ",count)

def maxAlt(read):
    maxA = max(read)
    return maxA

def minAlt(read):
    minA = min(read)
    return minA

print()
print("Max Altitude:",maxAlt(read))
print("Min Altitude:",minAlt(read))
altitude.close()
I will include the Altitude text file if it is needed. Once again, the minimum altitude is not printing.
I'm assuming your file contains numbers and line breaks (\n).
You are reading it here:
read = altitude.readlines()
At this point read is a list of strings.
Now, when you do:
minA = (min(read))
It's trying to get "the smallest string in read".
The smallest string is usually the empty string "", which most probably exists at the end of your file.
So your minAlt is actually getting printed, but it happens to be the empty string.
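A small illustration of that, with made-up values and following the assumption that an empty string ends up in the list:
read = ['100\n', '250\n', '75\n', '']
print(min(read))  # prints a blank line: min() picked the empty string
print(max(read))  # prints 75: '75\n' wins a character-by-character comparison, not a numeric one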
You can fix it by parsing the lines you read into numbers.
read = [float(a) for a in altitude.readlines() if a]
Try the solution below:
altitudeFile = open("Altitude.txt","r")
Altitudes = [float(line) for line in altitudeFile if line] #get file data into list.
Max_Altitude = max(Altitudes)
Min_Altitude = min(Altitudes)
altitudeFile.close()
Change your code to this
with open('numbers.txt') as nums:
    lines = nums.read().splitlines()
results = list(map(int, lines))
print(results)
print(max(results))
The first two lines read the file and store its lines as a list. The third line converts the string list to integers, and the last one searches the list and returns the maximum; use min for the minimum.
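One caveat for all of these: a blank line in the file shows up as '\n' from readlines() (or '' from splitlines()), a bare truthiness test does not filter the former, and float() or int() would then raise ValueError. A slightly more defensive sketch, with the file name taken from the question:
altitude_file = open("Altitude.txt", "r")
altitudes = [float(line) for line in altitude_file if line.strip()]  # skip blank or whitespace-only lines
altitude_file.close()

print("Max Altitude:", max(altitudes))
print("Min Altitude:", min(altitudes))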

Removing white space and colon

I have a file with a bunch of numbers that have white spaces and colons, and I am trying to remove them. As I have seen on this forum, line.strip().split() works well to achieve this. Is there a way of removing the white space and the colon all in one go? Using the method posted by Lorenzo I have this:
train = []
with open('C:/Users/Morgan Weiss/Desktop/STA5635/DataSets/dexter/dexter_train.data') as train_data:
    train.append(train_data.read().replace(' ','').replace(':',''))
size_of_train = np.shape(train)
for i in range(size_of_train[0]):
    for j in range(size_of_train[1]):
        train[i][j] = int(train[i][j])
print(train)
Although I get this error:
File "C:/Users/Morgan Weiss/Desktop/STA5635/Homework/Homework_1/HW1_Dexter.py", line 11, in <module>
for j in range(size_of_train[1]):
IndexError: tuple index out of range
The above does not do what you want: train ends up holding a single big string, so np.shape(train) is just (1,) and size_of_train[1] raises the IndexError. As for your question, you can split the text into index:value tokens rather than deleting the separators.
When reading the file you can do something like:
train = []
with open('/Users/sushant.moon/Downloads/dexter_train.data') as f:
    tokens = f.read().split()
    for x in tokens:
        data = x.split(':')
        train.append([int(data[0]),int(data[1])])

# this part becomes redundant as I have already converted str to int before appending to train
size_of_train = np.shape(train)
for i in range(size_of_train[0]):
    for j in range(size_of_train[1]):
        train[i][j] = int(train[i][j])
Here the whitespace is handled by split() and the colon by split(':'), which separates the index and the value without needing replace.
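For a concrete picture of what that builds, here is the same parsing applied to a single made-up line in the usual index:value style of the dexter data:
line = "10015:1 10062:2 10135:1"
train = []
for x in line.split():
    data = x.split(':')
    train.append([int(data[0]), int(data[1])])
print(train)  # [[10015, 1], [10062, 2], [10135, 1]]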
You did not provide an example of what your input file looks like, so we can only speculate about the solution you need. I'm going to suppose that you need to extract integers from your input text file and print their values.
Here's how I would do it:
Instead of trying to eliminate whitespace characters and colons, I will be searching for digits using a regular expression
Consecutive digits would constitute a number
I would convert this number to an integer form.
And here's what it would look like:
import re

input_filename = "/home/evens/Temporaire/Stack Exchange/StackOverflow/Input_file-39359816.txt"

matcher = re.compile(r"\d+")

with open(input_filename) as input_file:
    for line in input_file:
        for digits_found in matcher.finditer(line):
            number_in_string_form = digits_found.group()
            number = int(number_in_string_form)
            print(number)
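For example, run against a made-up line such as "10015:1 10062:2", the regular expression finds each run of digits and the loop prints them one per line:
import re

matcher = re.compile(r"\d+")
for digits_found in matcher.finditer("10015:1 10062:2"):
    print(int(digits_found.group()))
# output: 10015, 1, 10062, 2 (each on its own line)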
But before you run away with this code, you should continue to learn Python because you don't seem to grasp its basic elements yet.

Extracting certain columns from multiple files simultaneously by Python

My purpose is to extract one particular column from multiple data files.
So I tried to use the glob module to read the files and to extract that column from each file with for statements, like below:
filin = diri + '*_7.txt'
FileList = sorted(glob.glob(filin))
for INPUT in FileList:
    a = []
    b = []
    c = []
    T = []
    f = open(INPUT,'r')
    f.seek(0,0)
    for columns in (raw.strip().split() for raw in f):
        b.append(columns[11])
    t = np.array(b, float)
    print t
    t = list(t)
    T = T + [t]
    f.close()
print T
I used 32 data files, so I expected the second 'for' statement to run only 32 times, generating only 32 arrays t. However, the result doesn't look like what I expected.
I assume it may be due to the influence of the first 'for' statement, but I am not sure.
Any idea or help would be really appreciated.
Thank you,
Isaac
You clear T = [] for every file. Move the T = [] line before the first loop.
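A minimal sketch of that fix, keeping the Python 2 style of the question and reusing its FileList, np and column index:
T = []                      # accumulate across all files
for INPUT in FileList:
    b = []
    f = open(INPUT, 'r')
    for columns in (raw.strip().split() for raw in f):
        b.append(columns[11])
    f.close()
    T.append(list(np.array(b, float)))
print T                     # 32 files -> a list of 32 arrays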

How do I format the output of a list of lists into a text file properly?

I am really new to Python and now I am struggling with some problems while working on a student project. Basically, I try to read data from a text file which is formatted in columns. I store the data in a list of lists, sort and manipulate the data, and write them into a file again. My problem is to align the written data in proper columns. I found approaches like
"%i, %f, %e" % (1000, 1000, 1000)
but I don't know how many columns there will be. So I wonder if there is a way to set all columns to a fixed width.
This is how the input data looks like:
2 232.248E-09 74.6825 2.5 5.00008 499.482
5 10. 74.6825 2.5 -16.4304 -12.3
This is how I store the data in a list of lists:
filename = getInput('MyPath', workdir)
lines = []
f = open(filename, 'r')
while 1:
    line = f.readline()
    if line == '':
        break
    splitted = line.split()
    lines.append(splitted)
f.close()
To write the data, I first join all the row elements of each inner list into one string, with a fixed amount of free space between the elements. What I actually need is a fixed total width including the element, but I also don't know the number of columns in the file.
for k in xrange(len(lines)):
    stringlist = ""
    for i in lines[k]:
        stringlist = stringlist + str(i) + ' '
    lines[k] = stringlist + '\n'

f = open(workdir2, 'w')
for i in range(len(lines)):
    f.write(lines[i])
f.close()
This code basically works, but sadly the output isn't aligned properly.
Thank you very much in advance for any help on this issue!
You are absolutely right about being able to format widths as you have above using string formatting. But as you correctly point out, the tricky bit is doing this for a variable-sized output list. Instead, you could use the join() function:
output = ['a', 'b', 'c', 'd', 'e']
# format each of the len(output) columns with a width of 10 spaces
width = [10] * len(output)
# write it out, using the join() function
with open('output_example', 'w') as f:
    f.write(''.join('%*s' % i for i in zip(width, output)))
will write out:
'         a         b         c         d         e'
As you can see, the length of the format list width is determined by the length of the output, len(output). This is flexible enough that you can generate it on the fly.
Hope this helps!
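If the widths should adapt to the data instead of a fixed 10, one option is to compute each column's width from its longest entry. A sketch using the two sample rows from the question and a made-up output file name:
rows = [['2', '232.248E-09', '74.6825', '2.5', '5.00008', '499.482'],
        ['5', '10.', '74.6825', '2.5', '-16.4304', '-12.3']]

# the widest entry in each column decides that column's width
widths = [max(len(entry) for entry in column) for column in zip(*rows)]

with open('aligned_output.txt', 'w') as f:
    for row in rows:
        f.write('  '.join(entry.rjust(width) for entry, width in zip(row, widths)) + '\n')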
String formatting might be the way to go:
>>> print("%10s%9s" % ("test1", "test2"))
     test1    test2
Though you might want to first create strings from those numbers and then format them as I showed above.
I cannot fully comprehend your writing code, but try working on it somewhat like this (enumerate is a builtin, so no import is needed):
with open(workdir2, 'w') as datei:
    for key, item in enumerate(zeilen):
        line = "%4i %6.6s" % (key, item)
        datei.write(line + '\n')

reusable function: substituting the values returned by another function

Below is the snippet: I'm parsing a job log and the output is the formatted result.
def job_history(f):
    def get_value(j,n):
        return j[n].split('=')[1]
    lines = read_file(f)
    for line in lines:
        if line.find('Exit_status=') != -1:
            nLine = line.split(';')
            jobID = '.'.join(nLine[2].split('.',2)[:-1])
            jData = nLine[3].split(' ')
            jUsr = get_value(jData,0)
            jHst = get_value(jData,9)
            jQue = get_value(jData,3)
            eDate = get_value(jData,14)
            global LJ,LU,LH,LQ,LE
            LJ = max(LJ, len(jobID))
            LU = max(LU, len(jUsr))
            LH = max(LH, len(jHst))
            LQ = max(LQ, len(jQue))
            LE = max(LE, len(eDate))
            print "%-14s%-12s%-14s%-12s%-10s" % (jobID,jUsr,eDate,jHst,jQue)
    return LJ,LU,LE,LH,LQ
In principle, I should have another function like this:
def fmt_print(a,b,c,d,e):
    print "%-14s%-12s%-14s%-12s%-10s\n" % (a,b,c,d,e)
to print the header and call the functions like this to print the complete result:
fmt_print('JOB ID','OWNER','E_DATE','R_HOST','QUEUE')
job_history(inFile)
My question is: how can I make fmt_print() print both the header and the result, using the values LJ, LU, LE, LH, LQ for the format spacing? job_history() will parse a number of log files from the log directory. The length of each field will differ from file to file, and I don't want to go static with the spacing (assuming a maximum length per field), as there are going to be a lot more columns to print than in the example. Thanks in advance for your help. Cheers!!
PS. For those who know my posts: I don't have to use Python v2.3 anymore. I can even use v2.6, but I want my code to be v2.4 compatible to go with the RHEL5 default.
Update: 1
I had a fundamental problem in my original script. As I mentioned above, job_history() reads multiple files in a directory in a loop, so the max lengths were being calculated per file and not for the entire result. After modifying unutbu's script a little bit and following xtofl's suggestion (if this is what it meant), I came up with this, which seems to be working.
def job_history(f):
    result=[]
    for line in lines:
        if line.find('Exit_status=') != -1:
            ....
            ....
            global LJ,LU,LH,LQ,LE
            LJ = max(LJ, len(jobID))
            LU = max(LU, len(jUsr))
            LH = max(LH, len(jHst))
            LQ = max(LQ, len(jQue))
            LE = max(LE, len(eDate))
            result.append((jobID,jUsr,eDate,jHst,jQue))
    return LJ,LU,LH,LQ,LE,result

# list of log files
inFiles = [ m for m in os.listdir(logDir) ]
saved_ary = []
for inFile in sorted(inFiles):
    LJ,LU,LE,LH,LQ,result = job_history(inFile)
    saved_ary += result

# format printing
fmt_print = "%%-%ds %%-%ds %%-%ds %%-%ds %%-%ds" % (LJ,LU,LE,LH,LQ)
print_head = fmt_print % ('Job Id','User','End Date','Exec Host','Queue')
print '%s\n%s' % (print_head, len(print_head)*'-')
for lines in saved_ary:
    print fmt_print % lines
I'm sure there are plenty of better ways of doing this, so suggestions are welcome. Cheers!!
Update: 2
Sorry for bringing up this "solved" post again. I later discovered that I was wrong even with my updated script, so I thought I'd post another update for future reference. Even though it appeared to be working, the length data were actually being overwritten for every file in the loop. This works correctly now.
def job_history(f):
    def get_value(j,n):
        return j[n].split('=')[1]
    result = []
    lines = read_file(f)
    for line in lines:
        if "Exit_status=" in line:
            nLine = line.split(';')
            jobID = '.'.join(nLine[2].split('.',2)[:-1])
            jData = nLine[3].split(' ')
            jUsr = get_value(jData,0)
            ....
            result.append((jobID,jUsr,...,....,...))
    return result

# list of log files
inFiles = [ m for m in os.listdir(logDir) ]
saved_ary = []
LJ = 0; LU = 0; LE = 0; LH = 0; LQ = 0
for inFile in sorted(inFiles):
    j_data = job_history(inFile)
    saved_ary += j_data
for ix in range(len(saved_ary)):
    LJ = max(LJ, len(saved_ary[ix][0]))
    LU = max(LU, len(saved_ary[ix][1]))
    ....

# format printing
fmt_print = "%%-%ds %%-%ds %%-%ds %%-%ds %%-%ds" % (LJ,LU,LE,LH,LQ)
print_head = fmt_print % ('Job Id','User','End Date','Exec Host','Queue')
print '%s\n%s' % (print_head, len(print_head)*'-')
for lines in saved_ary:
    print fmt_print % lines
The only problem is that it takes a bit of time to start printing the info on the screen, I think because it puts everything into the array first and only then prints. Is there any way it can be improved? Cheers!!
Since you don't know LJ, LU, LH, LQ, LE until the for-loop ends, you have to complete this for-loop before you print.
result=[]
for line in lines:
    if line.find('Exit_status=') != -1:
        ...
        LJ = max(LJ, len(jobID))
        LU = max(LU, len(jUsr))
        LH = max(LH, len(jHst))
        LQ = max(LQ, len(jQue))
        LE = max(LE, len(eDate))
        result.append((jobID,jUsr,eDate,jHst,jQue))

fmt="%%-%ss%%-%ss%%-%ss%%-%ss%%-%ss"%(LJ,LU,LE,LH,LQ)
for jobID,jUsr,eDate,jHst,jQue in result:
    print fmt % (jobID,jUsr,eDate,jHst,jQue)
The fmt line is a bit tricky. When you use string interpolation, each %s gets replaced by a number, and %% gets replaced by a single %. This prepares the correct format for the subsequent print statements.
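As a concrete illustration with invented widths: if LJ, LU, LE, LH and LQ were 10, 6, 14, 9 and 7, the interpolation produces an ordinary format string that is then applied to each result tuple.
LJ, LU, LE, LH, LQ = 10, 6, 14, 9, 7                       # example widths, not from a real log
fmt = "%%-%ss%%-%ss%%-%ss%%-%ss%%-%ss" % (LJ, LU, LE, LH, LQ)
print fmt                                                  # %-10s%-6s%-14s%-9s%-7s
print fmt % ('123.server', 'alice', '04/02/2012', 'node01', 'batch')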
Since the column header and the column content are so closely related, why not couple them into one structure and return an array of 'columns' from your job_history function? The task of that function would be to:
output the header for each column
create the output for each line, into the corresponding column
remember the maximum width for each column, and store it in the column struct
Then, the fmt_print function can 'just':
iterate over the column headers, and print them using the respective width
iterate over the 'rest' of the output, printing each cell with the respective width
This design will separate output definition from actual formatting.
This is the general idea. My python is not that good; but I may think up some example code later...
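A rough sketch of that idea (names and data invented, Python 2 to match the rest of the thread): each column carries its header, its cells and its running width, and the printing step only consumes that structure.
class Column(object):
    def __init__(self, header):
        self.header = header
        self.cells = []
        self.width = len(header)

    def add(self, value):
        self.cells.append(value)
        self.width = max(self.width, len(value))

def fmt_print(columns):
    fmt = ' '.join('%%-%ds' % c.width for c in columns)
    print fmt % tuple(c.header for c in columns)
    for row in zip(*[c.cells for c in columns]):
        print fmt % row

# usage with made-up data
cols = [Column('Job Id'), Column('User')]
cols[0].add('12345.server'); cols[1].add('alice')
cols[0].add('99.server');    cols[1].add('bob')
fmt_print(cols)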
Depending on how many lines there are, you could:
read everything first to figure out the maximum field lengths, then go through the lines again to actually print the results (if you have only a handful of lines)
read one page of results at a time and figure out the maximum lengths for the next 30 or so results (if you can handle the delay and have many lines)
not care about the format at all and output CSV or some database format instead, letting the final reader / actual report generator worry about importing it (a sketch follows below)
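For the last option, the csv module keeps the writing trivial and defers all formatting to whoever imports the file later. A sketch, reusing the saved_ary built in the update above and a made-up output file name:
import csv

out = open('job_history.csv', 'wb')   # 'wb' for the Python 2 csv module
writer = csv.writer(out)
writer.writerow(['Job Id', 'User', 'End Date', 'Exec Host', 'Queue'])
writer.writerows(saved_ary)           # one row per (jobID, jUsr, eDate, jHst, jQue) tuple
out.close()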
