I have several files that end in ".log". The three lines just before the last contain the data of interest.
Example file contents (the last four lines; the final line is blank):
Total: 150
Success: 120
Error: 30
I am reading these contents into an array and trying to find an elegant way to:
1) extract the numeric data for each category (Total, Success, Error), erroring out if the numeric part is missing
2) add them all up
I came up with the following code (getLastXLines function excluded for brevity) that returns the aggregate:
def getSummaryData(testLogFolder):
    (path, dirs, files) = os.walk(testLogFolder).next()
    # aggregate = [grandTotal, successTotal, errorTotal]
    aggregate = [0, 0, 0]
    for currentFile in files:
        fullNameFile = path + "\\" + currentFile
        if currentFile.endswith(".log"):
            with open(fullNameFile, "r") as fH:
                linesOfInterest = getLastXLines(fH, 4)
                # If the file doesn't contain the expected number of lines
                if len(linesOfInterest) != 4:
                    print fullNameFile + " doesn't contain the expected summary data"
                else:
                    for count, line in enumerate(linesOfInterest[0:-1]):
                        results = line.split(': ')
                        if len(results) == 2:
                            aggregate[count] += int(results[1])
                        else:
                            print "error with " + fullNameFile + " data. Not adding the total"
    return aggregate
Being relatively new to Python, and seeing its power, I suspect there may be a more concise and efficient way to do this. Maybe there is a short list comprehension for this kind of thing? Please help.
def getSummaryData(testLogFolder):
    summary = {'Total': 0, 'Success': 0, 'Error': 0}
    (path, dirs, files) = os.walk(testLogFolder).next()
    for currentFile in files:
        fullNameFile = path + "\\" + currentFile
        if currentFile.endswith(".log"):
            with open(fullNameFile, "r") as fH:
                for pair in [line.split(':') for line in fH.read().split('\n')[-5:-2]]:
                    try:
                        summary[pair[0].strip()] += int(pair[1].strip())
                    except ValueError:
                        print pair[1] + ' is not a number'
                    except KeyError:
                        print pair[0] + ' is not "Total", "Success", or "Error"'
    return summary
Piece by piece:
fH.read().split('\n')[-5:-2]
Here we take the three data lines: of the last lines in the file, the slice [-5:-2] drops the blank final line plus the empty string that split leaves after the trailing newline
line.split(':') for line in
From those lines, we break by the colon
try:
summary[pair[0].strip()] += int(pair[1].strip())
Now we try to parse a number from the second part and look up a key from the first, and add it to our total
except ValueError:
print pair[1] + ' is not a number'
except KeyError:
print pair[0] + ' is not "Total", "Success", or "Error"'
And if we find something that isn't a number, or a key that isn't one of the three we are looking for, we print an error
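The accumulation pattern at the heart of this answer can be tried standalone on the sample lines from the question (a minimal sketch in Python 3 syntax; the folder walking and file handling are left out):

```python
# Accumulate per-category totals from "Key: value" lines into a dict.
summary = {'Total': 0, 'Success': 0, 'Error': 0}
sample = ['Total: 150', 'Success: 120', 'Error: 30']

for pair in [line.split(':') for line in sample]:
    try:
        # key from the left of the colon, number from the right
        summary[pair[0].strip()] += int(pair[1].strip())
    except ValueError:
        print(pair[1] + ' is not a number')
    except KeyError:
        print(pair[0] + ' is not "Total", "Success", or "Error"')

print(summary)  # {'Total': 150, 'Success': 120, 'Error': 30}
```

Running it over several files simply keeps adding into the same dict, which is what makes this shape convenient for aggregation.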
Problem solved! It was newfilename[0, 3] instead of newfilename[0:3]
I know this question has been asked before, and I have looked around at all the answers and the types of problems people have been having related to this error message, but I was unable to find anyone with the same type of problem.
I am showing the whole method just in case. So here is my problem:
What I am trying to get is a substring of "newfilename" using newfilename[int, int], and the interpreter keeps thinking I don't have an integer there when I do, at least from my checking.
What I'm doing with this code: I am cutting off the end of a filename such as 'foo.txt' to get 'foo', which is saved as newfilename. Then I add the number (converted to a string) to the end of it to get 'foo 1', and after that add back the '.txt' to get the final result of 'foo 1.txt'. The problem occurs when I try to get the substring out and delete the last four characters of the filename to get just 'foo'. After that, I do another check to see if there is a file like that still in the folder, and if so I do another round of cutting and pasting to add 1 to the previous file. To be honest, I have not tested whether the while loop will work; I just thought it should work technically, but my code does not reach that far because of this error, lol.
My error:
File "C:/Users/Reaper/IdeaProjects/Curch Rec Managment/Setup.py", line 243, in moveFiles
print(newfilename[0, 3])
TypeError: string indices must be integers
NOTE: this error is from when I tried to hard-code the numbers in to see if it would work.
Here is the current error with the hard-coded line commented out:
newfilename = newfilename[0, int(newfilename.__len__() - 4)] + " 1.m4a"
TypeError: string indices must be integers
What I have tried: I have tried hard-coding the numbers in by literally typing newfilename[0, 7] and still got the same error. I have tried doing this in a separate Python file, and it seems to work fine there. Also, what is really confusing me is that it works in another part of my program just fine, as shown here:
nyear = str(input("Enter new Year: "))
if nyear[0:2] != "20" or nyear.__len__() > 4:
    print("Sorry incorrect year. Please try again")
So I have been at it for a while now trying to figure out what in the world is going on and can't get there. I decided I would sleep on it, but thought I would post the question just in case. If someone could point out what may be wrong, that would be awesome! Or tell me the interpreter is just being stupid; I guess that will do as well.
My function code
def moveFiles(pathList, source, filenameList):
    # moves files to new location
    # counter keeps track of file name position in list
    cnter = 0
    for x in pathList:
        filename = filenameList[cnter]
        # print(x + "\\" + filename)
        # new filename
        if filename.find("PR") == 0:
            newfilename = filename[3:filename.__len__()]
        else:
            newfilename = filename[2:filename.__len__()]
        # checking if file exists and adding numbers to the end if it does
        if os.path.isfile(x + "\\" + newfilename):
            print("File Name exists!!")
            # adding a 1 to the end
            print(newfilename)
            # PROBLEM ON NEXT TWO LINES, also prob. on any line with the following calls
            print(newfilename[0, 3])
            newfilename = newfilename[0, int(newfilename.__len__() - 4)] + " 1.m4a"
            print("Adding 1:", newfilename)
            # once again check if the file exists and add 1 to the last number
            while os.path.isfile(x + "\\" + newfilename):
                # testing if maybe I just can't have math operations within the substring call
                print("File exists again!!")
                num = newfilename.__len__() - 6
                num2 = newfilename.__len__() - 4
                num3 = int(newfilename[num, num2])
                num = newfilename.__len__() - 5
                newfilename = newfilename[0, num] + str(num3 + 1)
                print("Adding 1:", newfilename)
        # moving file and deleting prefix
        if not os.path.isdir(x):
            os.makedirs(x)
        os.rename(source + "\\" + filename, x + "\\" + newfilename)
        cnter += 1
I think you need this:
print(newfilename[0:3])
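To see why the colon matters (a standalone illustration, not part of the question's code): `newfilename[0, 3]` passes the tuple `(0, 3)` as the index, and strings only accept integers and slices, while `newfilename[0:3]` builds a slice:

```python
s = "foo 1.m4a"

# A comma builds a tuple index; strings reject it with
# "TypeError: string indices must be integers".
try:
    s[0, 3]
except TypeError as e:
    print("comma indexing fails:", e)

# A colon builds a slice object, which is what was intended.
print(s[0:3])           # first three characters: foo
print(s[0:len(s) - 4])  # drop the ".m4a" extension: foo 1
```

For stripping extensions, `os.path.splitext(s)[0]` avoids the manual length arithmetic entirely.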
I wrote a program to generate a string of numbers, consisting of 0, 1, 2, and 3, with length s, and write the output to a decode.txt file. Below is the code:
import numpy as np

n_one = int(input('Insert the amount of 1: '))
n_two = int(input('Insert the amount of 2: '))
n_three = int(input('Insert the amount of 3: '))
l = n_one + n_two + n_three
n_zero = l + 1
s = (2 * n_zero) - 1
data = [0] * n_zero + [1] * n_one + [2] * n_two + [3] * n_three
print("Data string length is %d" % len(data))
while data[0] == 0 and data[s - 1] != 0:
    np.random.shuffle(data)
datastring = ''.join(map(str, data))
datastring = str(int(datastring))
files = open('decode.txt', 'w')
files.write(datastring)
files.close()
print("Data string is : %s " % datastring)
The problem occurs when I try to read the file from another program: the program doesn't pick up the last value of the string.
For example, if the generated string is 30112030000, the other program will only read 3011203000, meaning the last 0 is not picked up.
But if I type 30112030000 directly into the .txt file, every value is read. I can't figure out what is wrong in my code.
Thank you
Some programs might not like the fact that the file doesn't end with a newline. Try adding files.write('\n') before you close it.
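A minimal sketch of that fix, using a `with` block so the file is always closed; the filename and sample string are taken from the question:

```python
datastring = "30112030000"

# Write the data followed by a newline, so downstream readers that
# expect newline-terminated input still see the final character.
with open('decode.txt', 'w') as files:
    files.write(datastring)
    files.write('\n')

# Reading it back shows the full string, last 0 included.
with open('decode.txt') as check:
    print(check.readline().strip())
```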
So, I have an extremely inefficient way to do this that works, which I'll show, as it will help illustrate the problem more clearly. I'm an absolute beginner in python and this is definitely not "the python way" nor "remotely sane."
I have a .txt file where each line contains information about a large number of .csv files, following format:
File; Title; Units; Frequency; Seasonal Adjustment; Last Updated
(first entry:)
0\00XALCATM086NEST.csv;Harmonized Index of Consumer Prices: Overall Index Excluding Alcohol and Tobacco for Austria©; Index 2005=100; M; NSA; 2015-08-24
and so on, repeats like this for a while. For anyone interested, this is the St.Louis Fed (FRED) data.
I want to rename each file (currently named by the alphanumeric code at the start, 00XA etc.) to the text title. So, just split by semicolon, right? Except that sometimes the text title has semicolons within it (and I want all of the text).
So I did:
data_file_data_directory = 'C:\*****\Downloads\FRED2_csv_3\FRED2_csv_2'
rename_data_file_name = 'README_SERIES_ID_SORT.txt'
rename_data_file = open(data_file_data_directory + '\\' + rename_data_file_name)

for line in rename_data_file.readlines():
    data = line.split(';')
    if len(data) > 2 and data[0].rstrip().lstrip() != 'File':
        original_file_name = data[0]
These last two lines deal with the fact that there is some introductory text we want to skip, and that we don't want to rename based on the legend at the top (!= 'File'). It saves the 00XAL__.csv as the old name. It may be possible to make this more elegant (I would appreciate the tips), but it's the next part (the new, text name) that gets really ugly.
if len(data) == 6:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1][:-2].replace(':',' -').replace('"','').replace('/',' or ')
else:
    if len(data) == 7:
        new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2][:-2].replace(':',' -').replace('"','').replace('/',' or ')
    else:
        if len(data) == 8:
            new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3][:-2].replace(':',' -').replace('"','').replace('/',' or ')
        else:
            if len(data) == 9:
                new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4][:-2].replace(':',' -').replace('"','').replace('/',' or ')
            else:
                if len(data) == 10:
                    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[5][:-2].replace(':',' -').replace('"','').replace('/',' or ')
                else:
                    (etc)
What I'm doing here: there is no way to know, for each line, how many items the semicolon split will produce. Ideally, the list would have length 6, following the key at the top of my data example. However, for every semicolon in the text title the length increases by 1, and we want everything before the last four items in the list (counting backwards from the right: date, seasonal adjustment, frequency, units/index) but after the .csv code. This is just another way of saying: I want the text "title", i.e. everything on each line after the .csv but before the units/index.
Really what I want is just a way to save the entirety of the text name as "new_name" for each line, even after I split each line by semicolon, when I have no idea how many semicolons are in each text name or in the line as a whole. The above code achieves this, but OMG, this can't be the right way to do it.
Please let me know if it's unclear or if I can provide more info.
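One way to keep the whole title, however many semicolons it contains, is to split the entire line and rejoin everything between the filename and the last four fixed fields. A sketch under the stated format assumption; the sample line here is shortened and partly invented so that the title itself contains semicolons:

```python
line = ('0\\00XALCATM086NEST.csv;Harmonized Index: Overall; Excluding Alcohol;'
        ' Index 2005=100; M; NSA; 2015-08-24')

fields = line.split(';')
original_file_name = fields[0]

# Everything between the filename and the last four fields
# (units, frequency, adjustment, date) is the title, however
# many semicolons it happens to contain.
title = ';'.join(fields[1:-4]).strip()

print(original_file_name)
print(title)  # Harmonized Index: Overall; Excluding Alcohol
```

This replaces the whole length-6/7/8/9/10 cascade with one join, because negative indexing counts the four fixed fields from the right regardless of how long the split turned out to be.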
I'm trying to read the last line from a text file. Each line starts with a number, so the next time something is inserted, the new number will be incremented by 1.
For example, this would be a typical file
1. Something here date
2. Something else here date
#next entry would be "3. something date"
If the file is blank I can enter an entry with no problem. However, when there are already entries I get the following error
LastItemNum = lineList[-1][0:1] + 1  # finds the last item's number
TypeError: cannot concatenate 'str' and 'int' objects
Here's my code for the function
def AddToDo(self):
    FILE = open(ToDo.filename, "a+")  # open file for appending and reading
    FileLines = FILE.readlines()  # read the lines in the file
    if os.path.getsize("EnteredInfo.dat") == 0:  # if there is nothing, set the number to 1
        LastItemNum = "1"
    else:
        LastItemNum = FileLines[-1][0:1] + 1  # finds the last item's number
    FILE.writelines(LastItemNum + ". " + self.Info + " " + str(datetime.datetime.now()) + '\n')
    FILE.close()
I tried to convert LastItemNum to a string but I get the same "cannot concatenate" error.
LastItemNum = int(lineList[-1][0:1]) +1
Then you have to convert LastItemNum back to a string before writing it to the file, using:
LastItemNum = str(LastItemNum)
or, instead of this, you can use string formatting.
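Putting that answer together as a sketch. Note that `[0:1]` grabs only the first character, so this version splits on the dot instead, which also survives item numbers of two or more digits; the sample line follows the question's example file, and the trailing text is a placeholder:

```python
last_line = "2. Something else here date"

# Parse the number before the first dot, then increment it.
# int() fails loudly if the line doesn't start with a number.
LastItemNum = int(last_line.split('.')[0]) + 1

# Convert back to a string before concatenating with other strings,
# which is exactly what the original TypeError was complaining about.
entry = str(LastItemNum) + ". something date"
print(entry)  # 3. something date
```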
I'm encountering a very strange issue in a python script I've written. This is the piece of code that is producing abnormal results:
EDIT: I've included the entire loop in the code segment now.
data = open(datafile, 'r')
outERROR = open(outERRORfile, 'w')
precision = [0]
scale = [0]
lines = data.readlines()
limit = 0
if filetype == 'd':
    for line in lines:
        limit += 1
        if limit > checklimit:
            break
        columns = line.split(fieldDelimiter)
        for i in range(len(columns) - len(precision)):
            precision.append(0)
        for i in range(len(columns) - len(scale)):
            scale.append(0)
        if len(datatype) != len(precision):
            sys.exit()  # Exits the script if the number of data types (fields found in the DDL file) doesn't match the number of columns found in the data file
        i = -1
        for eachcolumn in columns:
            i += 1
            if len(rstrip(columns[i])) > precision[i]:
                precision[i] = len(rstrip(columns[i]))
            if columns[i].find('.') != -1 and (len(rstrip(columns[i])) - rstrip(columns[i]).find('.')) > scale[i]:
                scale[i] = len(rstrip(columns[i])) - rstrip(columns[i]).find('.') - 1
            if datatype[i][0:7] == 'integer':
                if int(columns[i]) < -2147483648 or int(columns[i]) > 2147483647:
                    outERROR.write("Integer value too high or too low to fit inside Integer data type, column: " + str(i + 1) + ", value: " + columns[i] + "\n")
            if datatype[i][0:9] == 'smallint':
                if int(columns[i]) < -32768 or int(columns[i]) > 32767:
                    outERROR.write("Smallint value too high or too low to fit inside Smallint data type, column: " + str(i + 1) + ", value: " + columns[i] + "\n")
            if datatype[i][0:7] == 'byteint':
                if int(columns[i]) < -128 or int(columns[i]) > 127:
                    outERROR.write("Byteint value too high or too low to fit inside Byteint data type, column: " + str(i + 1) + ", value: " + columns[i] + "\n")
            if datatype[i][0:4] == 'date':
                if DateParse(columns[i], format1[i]) > -1:
                    pass
                elif DateParse(columns[i], format2[i]) > -1:
                    pass
                elif DateParse(columns[i], format3[i]) > -1:
                    pass
                else:
                    outERROR.write('Date format error, column: ' + str(i + 1) + ', value: ' + columns[i])
            if datatype[i][0:9] == 'timestamp':
                if DateParse(columns[i], timestamp1[i]) > -1:
                    pass
                elif DateParse(columns[i], timestamp2[i]) > -1:
                    pass
                elif DateParse(columns[i], timestamp3[i]) > -1:
                    pass
                else:
                    outERROR.write('Timestamp format error, column: ' + str(i + 1) + ', value: ' + columns[i] + '\n')
            if (datatype[i][0:7] == 'decimal'
                    or datatype[i][0:7] == 'integer'
                    or datatype[i][0:7] == 'byteint'
                    or datatype[i][0:5] == 'float'
                    or datatype[i][0:8] == 'smallint'):
                try:
                    y = float(columns[i])
                except ValueError:
                    outERROR.write('Character found in numeric data type, column: ' + str(i + 1) + ', value: ' + columns[i] + "\n")
                else:
                    pass
This is part of a loop that reads a data file. Basically, it checks the declared type of each field (to determine whether it's supposed to be numeric) and tries to turn the value into a float to see whether the data file actually contains numeric data there. If it's not numeric, it writes the error you see above to a text file (currently named outERROR). When I wrote this and tested it on a small data file (4 lines) it worked fine, but when I run it on a larger file (several thousand rows) my error file suddenly fills with a bunch of blank spaces, and only a few of the error messages are created.
Here is what the error file looks like when I run the script with 4 rows:
Character found in numeric data type, column: 6, value: 24710a35
Character found in numeric data type, column: 7, value: 0a04
Character found in numeric data type, column: 8, value: 0a02
Character found in numeric data type, column: 6, value: 56688a12
Character found in numeric data type, column: 7, value: 0a09
Character found in numeric data type, column: 8, value: 0a06
Character found in numeric data type, column: 6, value: 12301a04
Character found in numeric data type, column: 7, value: 0a10
Character found in numeric data type, column: 8, value: 0a02
Character found in numeric data type, column: 6, value: 25816a56
Character found in numeric data type, column: 7, value: 0a09
Character found in numeric data type, column: 8, value: 0a06
This is the expected output.
When I run it on larger files, I start to get blank spaces at the top of the error file, and only the last 40-50 or so error writes actually appear as text in the file. The larger the file, the more blank spaces it outputs. I'm completely lost on this; I've read some of the other questions regarding mysterious blank lines and spaces here on stackoverflow.com, but they don't seem to address my issue.
EDIT: outERROR is the name I've given to the error file that the output is writing to. It is a simple .txt file.
This is a sample of the data file:
257|1463|64|1|7|9551a22|0a05|0a02|N|O|1998-06-18|1998-05-15|1998-06-27|COLLECT COD|FOB|ackages sleep bold realmsa f|
258|1062|68|1|8|7704a48|0a00|0a07|R|F|1994-01-20|1994-03-21|1994-02-09|NONE|REG AIR|ully about the fluffily silent dependencies|
258|1962|95|2|40|74558a40|0a10|0a01|A|F|1994-03-13|1994-02-23|1994-04-05|DELIVER IN PERSON|FOB|silent frets nod daringly busy, bold|
258|1618|19|3|45|68382a45|0a07|0a07|R|F|1994-03-04|1994-02-13|1994-03-30|DELIVER IN PERSON|TRUCK|regular excuses-- fluffily ruthl|
Specifically the columns that are causing output to the error file are:
|9551a22|0a05|0a02|
|7704a48|0a00|0a07|
|74558a40|0a10|0a01|
|68382a45|0a07|0a07|
So each line should cause 3 writes to the error file, specifying these values. It works fine for a small number of lines, but when it reads a large number of lines I start getting these mysterious blank spaces. This problem occurs only when I have numeric fields that contain characters.
At a guess, perhaps you have control characters in the input stream that cause some unexpected behaviour. I'm not sure exactly what outERROR is in the above context, but you can imagine, for example, that a form feed character in the input could have this sort of effect.
Try cleaning the data of non-printable characters first and see if that helps.
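One way to do that cleaning (a sketch; `str.isprintable()` is used here as the definition of "printable", and the sample string with embedded control characters is invented to mimic the question's data):

```python
raw = "9551a22\x0c0a05\x000a02"  # sample field with a form feed and a NUL mixed in

# str.isprintable() is False for control characters such as \x0c and \x00,
# so this drops them before the values are written to the error file.
clean = ''.join(ch for ch in raw if ch.isprintable())
print(clean)  # 9551a220a050a02
```

Applying this to each column before the `outERROR.write(...)` calls would keep stray control characters out of the error file.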
Call open with 'rb' and 'wb' to ensure binary mode; otherwise the data can be altered by the system trying to mess with line endings.