Data formatting within a .txt file - Python

I have the following txt file that needs to be formatted with specific start and end positions for each field. For instance, column 1 is blank and will be read as an entry number; this field is numeric with a width of 9 and should occupy positions 1-9. Next is the employee ID at positions 10-15, and so on. The values do not need a delimiter.
,MB4858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,MD6535,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,PM7858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,RM0111,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,RY2585,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,TM0617 ,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,VE2495,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,VJ8913,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,FJ4815 ,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,OM0188,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225D,,,DF2016,CA4310,,0172CA,,,,,Y,
,H00858,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H08392,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H15624,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H27573,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H40249,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H44581,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H48473,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H51570,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H55768,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H64315,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H71507,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H72248,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H78527,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H90393,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,
,H95973,01,1,CA,07/18/20,0,0,4.8,,,,,,14.77,,Y,2225DH,,,DF2016,CA4311,,0172CA,,,,,Y,

You can try starting here:
import sys
inFile = sys.argv[1]
outFile = "newFile.txt"
with open(inFile, 'r') as inf, open(outFile, 'w') as outf:
    for line in inf:
        line = line.split(',')
        print(line)
Here sys.argv[1] is the name of your txt file, passed when you run the Python script from the command line.
You will see that it prints a list containing the individual strings between the comma delimiters in your txt data file. From there you can manipulate the list to format the data, and then write it to outf like so (example):
        # (inside the for loop) do whatever manipulations you need on the output line
        output_line = line[0] + " " + line[1]
        outf.write(output_line)
        outf.write('\n')
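To get from that list to the fixed positions the question asks for, one option is to pad each field to its column width. This is a minimal sketch, assuming space padding and left alignment. Only the first two widths (9 for the entry number, 6 for the employee ID) come from the question; the remaining widths and the names FIELD_WIDTHS and to_fixed_width are hypothetical and should be replaced with the real layout.

```python
# Hypothetical layout: only the first two widths (9 and 6) come from the question.
FIELD_WIDTHS = [9, 6, 2, 1, 2, 8]

def to_fixed_width(csv_line, widths=FIELD_WIDTHS):
    """Turn one comma-delimited line into a fixed-width record with no delimiter."""
    fields = [f.strip() for f in csv_line.rstrip("\n").split(",")]
    # Pad each field with spaces on the right and cut anything longer than its slot.
    # zip stops at the shorter sequence, so fields without a width are dropped.
    return "".join(f.ljust(w)[:w] for f, w in zip(fields, widths))

record = to_fixed_width(",MB4858,01,1,CA,07/18/20,0,0,4.8")
print(repr(record))
```

With this layout the employee ID lands in positions 10-15, which is record[9:15] in Python's zero-based slicing. If the numeric fields must be zero-padded or right-aligned instead, str.zfill or str.rjust could replace str.ljust for those columns.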

Related

How to parse this particular console output and make csv?

I need to process console output which looks like this and make a csv from it:
ID,FLAG,ADDRESS,MAC-ADDRESS,HOST-NAME,SERVER,STATUS,LAST-SEEN
0 10.0.0.11 00:1D:72:29:F2:4F lan waiting never
;;; test comment
1 10.0.0.19 00:13:21:15:D4:00 lan waiting never
2 10.0.0.10 00:60:6E:05:0C:E0 lan waiting never
3 D 10.0.1.199 24:E9:B3:20:FA:C7 home server1 bound 4h54m52s
4 D 100.64.1.197 E6:17:AE:21:EA:00 Suzana-s-A51 dhcp1 bound 2h16m45s
I have managed to split lines but regex is not working for tabs and spaces. Can someone point me in the right direction?
The code I am using is this:
import csv
import re
# Open the input file in read-only mode
with open('output.txt', 'r') as input_file:
    # Open the output file in write-only mode
    with open('output.csv', 'w') as output_file:
        # Create a CSV writer that will write to the output file
        writer = csv.writer(output_file)
        # Read the first line of the input file (the header)
        # and write it to the output file as a single value
        # (i.e. do not split it on commas)
        header = input_file.readline()
        writer.writerow([header.strip()])
        # Iterate over the remaining lines of the input file
        for line in input_file:
            # Ignore lines that start with ";;;" (these are comments)
            if line.startswith(';;;'):
                continue
            # Split the line on newlines
            values = line.split('\n')
            line = re.sub(r'[\t ]+', ',', line)
            # Iterate over the resulting values
            for i, value in enumerate(values):
                # If the value contains a comma, split it on commas
                # and assign the resulting values to the `values` list
                if ',' in value:
                    values[i:i+1] = value.split(',')
            # Write the values to the output file
            writer.writerow(values)
A regular expression can be handy here: build a mask, then pull each value out of every line you read.
You can look up the regex in a regex tester, which gives great visuals.
So for each line, apply a regex such as reg_format=r"(\d*?)(?:\s+)(.*?)(?:\s)(?:\s*?)(\w*\.\w*\.\w*\.\w*)(?:\s*)(\w*?:\w*?:\w*?:\w*?:\w*?:\w*)(?:\s*)(\w*)(?:\s*)(\w*)(?:\s*)(\w*)"
Please note that writer.writerow expects a list when writing to the CSV.
The following should work for you, and you can tweak it as needed.
I tweaked your code and added comments.
Update:
Added masking for records
import csv
import re
#reg_format=r"(\d*?)(?:\s+)(.*?)(?:\s)(?:\s*?)(\w*\.\w*\.\w*\.\w*)(?:\s*)(\w*?:\w*?:\w*?:\w*?:\w*?:\w*)(?:\s*)(\w*)(?:\s*)(\w*)(?:\s*)(\w*)"
all_fields=r"(\d*?)(?:\s+)(.*?)(?:\s)(?:\s*?)(\w*\.\w*\.\w*\.\w*)(?:\s*)(\w*?:\w*?:\w*?:\w*?:\w*?:\w*)(?:\s{1,2})([\w-]{1,14})(?:\s*?)(\w+)(?:\s*)(\w+)(?:\s*)(\w*)(?:\s*)(\w*)"
all_fields_minus_host=r"(\d*?)(?:\s+)(.*?)(?:\s)(?:\s*?)(\w*\.\w*\.\w*\.\w*)(?:\s*)(\w*?:\w*?:\w*?:\w*?:\w*?:\w*)(?:\s{1,})([\w-]{1,14})(?:\s*?)(\w+)(?:\s*)(\w+)(?:\s*)(\w*)(?:\s*)(\w*)"
# Open the input file in read-only mode
with open('testreg.txt', 'r') as input_file:
    # Open the output file in write-only mode
    with open('output.csv', 'w') as output_file:
        # Create a CSV writer that will write to the output file
        writer = csv.writer(output_file)
        # Read the first line of the input file (the header),
        # drop its trailing newline and split it on ","
        # because writerow needs a list
        header = input_file.readline()
        writer.writerow(header.strip().split(','))
        #writer.writerow([header.strip()])
        # Iterate over the remaining lines of the input file
        for line in input_file:
            # Ignore lines that start with ";;;" (these are comments)
            if line.startswith(';;;'):
                continue
            #print(line)
            gps=re.findall(all_fields,line)
            if gps:
                line_write=(['"'+gp+'"' for gp in list(gps[0])]) # if you don't need quotes, use [gp for gp in list(gps[0])]
                writer.writerow(line_write[:-1])
            else:
                gps=re.findall(all_fields_minus_host,line)
                line_write=(['"'+gp+'"' for gp in list(gps[0])]) # if you don't need quotes, use [gp for gp in list(gps[0])]
                line_write.insert(4,'""')
                writer.writerow(line_write[:-2])
            #writer.writerow(line_write)
            # commented out the lines below
            '''
            # Split the line on newlines
            values = line.split('\n')
            line = re.sub(r'[\t ]+', ',', line)
            # Iterate over the resulting values
            for i, value in enumerate(values):
                # If the value contains a comma, split it on commas
                # and assign the resulting values to the `values` list
                if ',' in value:
                    values[i:i+1] = value.split(',')
            # Write the values to the output file
            #writer.writerow(values)
            '''
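As an alternative to one large pattern, the optional FLAG and HOST-NAME columns can be handled by splitting on whitespace and counting tokens. This is only a sketch, not the approach used above. It assumes that an address is always a dotted IPv4 string and that SERVER, STATUS and LAST-SEEN are always present, as in the sample output. The names IPV4 and parse_lease_line are hypothetical.

```python
import csv
import io
import re

# Assumption: an address is a plain dotted IPv4 string, as in the sample output.
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def parse_lease_line(line):
    """Return the 8 CSV columns for one data line, or None for comment/blank lines."""
    if line.startswith(";;;"):
        return None
    tokens = line.split()  # splits on any run of spaces or tabs
    if not tokens:
        return None
    row_id = tokens.pop(0)
    # FLAG is optional: it is present only when the next token is not an address.
    flag = "" if IPV4.match(tokens[0]) else tokens.pop(0)
    address, mac, *rest = tokens
    # HOST-NAME is optional: four trailing tokens mean it is present.
    host = rest.pop(0) if len(rest) == 4 else ""
    server, status, last_seen = rest  # raises ValueError on an unexpected layout
    return [row_id, flag, address, mac, host, server, status, last_seen]

sample = """0 10.0.0.11 00:1D:72:29:F2:4F lan waiting never
;;; test comment
3 D 10.0.1.199 24:E9:B3:20:FA:C7 home server1 bound 4h54m52s
"""

out = io.StringIO()
writer = csv.writer(out)
writer.writerow("ID,FLAG,ADDRESS,MAC-ADDRESS,HOST-NAME,SERVER,STATUS,LAST-SEEN".split(","))
for raw in sample.splitlines():
    row = parse_lease_line(raw)
    if row is not None:
        writer.writerow(row)
print(out.getvalue())
```

Because csv.writer receives a list with an empty string for each missing column, every row ends up with the same eight columns as the header. A line that does not fit the assumed layout raises an error instead of silently producing a misaligned row.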

How do I split each line into two strings and print without the comma?

I'm trying to produce output without commas: each line should be split into two strings, which are then printed.
My code so far yields:
173,70
134,63
122,61
140,68
201,75
222,78
183,71
144,69
But I'd like it to print without the comma, with the values on each line separated as strings.
if __name__ == '__main__':
    # Complete main section of code
    file_name = "data.txt"
    # Open the file for reading here
    my_file = open('data.txt')
    lines = my_file.read()
    with open('data.txt') as f:
        for line in f:
            lines.split()
            lines.replace(',', ' ')
    print(lines)
In your sample code, lines contains the full content of the file as a str.
my_file = open('data.txt')
lines = my_file.read()
You then later re-open the file to iterate the lines:
with open('data.txt') as f:
    for line in f:
        lines.split()
        lines.replace(',', ' ')
Note, however, that str.split and str.replace do not modify the existing value, as str objects in Python are immutable. Also note that you are operating on lines there, rather than on the for-loop variable line.
Instead, you'll need to assign the result of those functions into new values, or give them as arguments (E.g., to print). So you'll want to open the file, iterate over the lines and print the value with the "," replaced with a " ":
with open("data.txt") as f:
    for line in f:
        # each line already ends with a newline, so tell print not to add another
        print(line.replace(",", " "), end="")
Or, since you are operating on the whole file anyway:
with open("data.txt") as f:
    print(f.read().replace(",", " "))
Or, as your file appears to be CSV content, you may wish to use the csv module from the standard library instead:
import csv
with open("data.txt", newline="") as csvfile:
    for row in csv.reader(csvfile):
        print(*row)
with open('data.txt', 'r') as f:
    for line in f:
        for value in line.split(','):
            print(value)
While Python offers several ways to open files, this is the preferred one. The file is read lazily (which matters especially for large files), and after leaving the with scope (the indented block) the file is closed automatically.
Here we open the file in read mode. File objects follow the iterator protocol, so we can iterate over them like lists. Each item is one line of the file, as a string.
After getting the line in the line variable, we split it (see str.split()) into two tokens, one before the comma and one after it. split returns a newly constructed list of strings. If you need to drop unwanted characters such as the trailing newline, use str.strip(); strip and split are usually combined.
Elegant and efficient file reading - method 1
with open("data.txt", 'r') as io:
    for line in io:
        sl = line.strip().split(',')  # now sl is a list of strings, without the trailing newline
        print("{} {}".format(sl[0], sl[1]))  # use format to print the two values separated by a space
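Less elegant, but efficient file reading - method 2
fp = open("data.txt", 'r')
# the assignment expression (:=) requires Python 3.8 or later
while (line := fp.readline()) != '':  # an empty string means EOF, the end of the file, has been reached
    sl = line.strip().split(',')
    print("{} {}".format(sl[0], sl[1]))
fp.close()  # without a with block, the file has to be closed by hand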

Removing quotes when writing list items to CSV

What I am trying to do is remove the quotes while writing the data to a new CSV file.
I have tried using s.splits, and .replaces with no luck. Can you guys point me in the right direction?
Current Code:
def createParam():
    with open('testcsv.csv', 'r') as f:
        reader = csv.reader(f)
        csvList = list(reader)
    for item in csvList:
        os.mkdir(r"C:\Users\jefhill\Desktop\Test Path\\" + item[0])
        with open(r"C:\Users\jefhill\Desktop\Test Path\\" + item[0] + r"\prm.263", "w+") as f:
            csv.writer(f).writerow(item[1:])
            f.close
Data within testcsv.csv:
0116,"139,data1"
0123,"139,data2"
0130,"35,data678"
Data output when the script is run (in each individual file):
"139,data1"
"139,data2"
"35,data678"
Data I would like:
139,data1
139,data2
35,data678
You can use str.replace to replace the " (double quotes) with '' (an empty string).
Then split the line and write everything except the first item in the list.
with open('outputfile.csv', 'w') as outfile: # open the result file to be written
    with open('testcsv.csv', 'r') as infile: # open the input file
        for line in infile: # iterate through each line in input file
            newline = line.replace('"', '') # remove the double quotes
            outfile.write(newline.split(',',maxsplit=1)[1]) # split once on the first comma and write the part after it
You don't need f.close() when you use with open...
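The quotes in the question's output come from csv.writer itself: with its default settings it quotes any field that contains the delimiter. A minimal sketch of the difference, using in-memory io.StringIO buffers in place of the real files (the buffer names quoted and plain are only for illustration):

```python
import csv
import io

# Parse two of the sample rows from the question: the quoted field becomes one string.
rows = list(csv.reader(io.StringIO('0116,"139,data1"\n0123,"139,data2"\n')))

# csv.writer re-quotes the field because it contains a comma.
quoted = io.StringIO()
csv.writer(quoted).writerow(rows[0][1:])

# Writing the string directly leaves it untouched.
plain = io.StringIO()
plain.write(",".join(rows[0][1:]) + "\n")

print(repr(quoted.getvalue()))
print(repr(plain.getvalue()))
```

In the original createParam function, this would correspond to calling f.write on the joined items instead of going through csv.writer, assuming the per-file output does not need to be read back as CSV.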

How to read quoted string from File and write it without quotes?

I am trying to write a python script to convert rows in a file to json output, where each line contains a json blob.
My code so far is:
with open( "/Users/me/tmp/events.txt" ) as f:
    content = f.readlines()
# strip to remove newlines
lines = [x.strip() for x in content]
i = 1
for line in lines:
    filename = "input" + str(i) + ".json"
    i += 1
    f = open(filename, "w")
    f.write(line)
    f.close()
However, I am running into an issue where if I have an entry in the file that is quoted, for example:
client:"mac"
This will be output as:
"client:""mac"""
Using a second strip on writing to file will give:
client:""mac
But I want to see:
client:"mac"
Is there any way to force Python to read text in the format ' "something" ' without appending extra quotes around it?
Instead of creating an auxiliary list to strip the newline from content, just open the input and output files at the same time. Write to the output file as you iterate through the lines of the input, stripping whatever you deem necessary. Try something like this:
# open both files in text mode so that each line is a str and strip('"') works in Python 3
with open('events.txt', 'r') as infile, open('input1.json', 'w') as outfile:
    for line in infile:
        line = line.strip('"')
        outfile.write(line)

python - extract a part of each line from text file

I have a test.txt file that contains:
yellow.blue.purple.green
red.blue.red.purple
I'd like output.txt to contain just the second and third parts of each line, like this:
blue.purple
blue.red
Here is the Python code:
with open ('test.txt', 'r') as file1, open('output.txt', 'w') as file2:
    for line in file1:
        file2.write(line.partition('.')[2]+ '\n')
But the result is:
blue.purple.green
blue.red.purple
How is it possible to take only the second and third parts of each line?
Thanks
You may want
with open('test.txt', 'r') as file1, open('output.txt', 'w') as file2:
    for line in file1:
        file2.write(".".join(line.split('.')[1:3])+ '\n')
When you apply split('.') to the line e.g. yellow.blue.purple.green, you get a list of values
["yellow", "blue", "purple", "green"]
By slicing [1:3], you get the second and third items.
First I created a .txt file containing the same data as your original .txt file, using 'w' mode to write it. I also created an empty list that we will use to store the data and later write to the output.txt file.
output_write = []
with open('test.txt', 'w') as file_object:
    file_object.write('yellow.' + 'blue.' + 'purple.' + 'green')
    file_object.write('\nred.' + 'blue.' + 'red.' + 'purple')
Next I opened the text file in 'r' mode to read it. To get the output you want, I read each line in a for loop and split it on the period ('.') to get a list of its items. Since you want the second and third items, I take the slice [1:3] (indices 1 and 2) and store it in new_read. I then join those two pieces with a period and store the result in output_data. Finally I append output_data to the empty list created earlier.
with open ('test.txt', 'r') as file_object:
    for line in file_object:
        read = line.split('.')
        new_read = read[1:3]
        output_data = (new_read[0] + '.' + new_read[1])
        output_write.append(output_data)
Then we can write this data to a file called 'output.txt', as you wanted.
with open('output.txt', 'w') as file_object:
    file_object.write(output_write[0])
    file_object.write('\n' + output_write[1])
print(output_write[0])
print(output_write[1])
Lastly I print the data just to check the output:
blue.purple
blue.red
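The step-by-step version above indexes output_write[0] and output_write[1], so it only covers a file with exactly two lines. A small sketch of the same split-and-slice idea for any number of lines follows; the helper name middle_parts and its parameters are hypothetical, and the input lines are held in a list instead of being read from test.txt.

```python
def middle_parts(line, start=1, stop=3, sep="."):
    """Return the parts of line from index start up to (not including) stop, re-joined."""
    return sep.join(line.strip().split(sep)[start:stop])

# Stand-in for the lines of test.txt, including their trailing newlines.
lines = ["yellow.blue.purple.green\n", "red.blue.red.purple\n"]

result = [middle_parts(line) for line in lines]
# One string that could be passed to a single file.write() call.
output_text = "\n".join(result) + "\n"
print(output_text, end="")
```

Joining the results once keeps the file-writing code independent of how many lines the input has.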
