I have a script which predicts product names from input files. The code is as follows:
import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            file = open(DIR+filenamez, "r", encoding ='utf-8')
            filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz])
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    print(filenamez, ent.label_, ent.text)
                break
        except Exception as e:
            print(e)  # assumption: the handler body is missing from the post
which gives me output in the form of a string as:
07-09-18 N021024s16PASBUNDLEACK - Acknowledgement P.txt PRODUCT ABC1
06-22-18 Letter from Supl.txt PRODUCT ABC2
06-22-18 Letter from Req to Change .txt PRODUCT ABC3
Now I want to export these details to a CSV with two columns, FILENAME and PRODUCT, with every filename and product name listed under its column. In the printed string, each product name follows the label PRODUCT. How can I do this?
The output CSV should look like:
Filename PRODUCT
07-09-18 Acknowledgement P.txt ABC1
06-22-18 Letter Req to Change.txt ABC2
You can make a csv.writer to write each row to the output file, using writerow instead of printing to the screen.
import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
# newline='' on the output file stops csv from writing blank rows on Windows
with open('eng_productnames.csv', newline='') as input_file, \
        open('output.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    writer.writerow(["Filename", "Product"]) # this is the header row
    for rowz in reader:
        try:
            filenamez = rowz[1]
            file = open(DIR+filenamez, "r", encoding ='utf-8')
            filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz])
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    writer.writerow([filenamez, ent.text])
                break
        except Exception as e:
            print(e)  # assumption: report the error and move on to the next file
I'm assuming here that filenamez and ent.text contain the information you want in each column. If that's not the case then you can manipulate them to get what you need before writing to the CSV.
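For what it's worth, here is a minimal, self-contained sketch of just the writing step, so it can be tried without the spaCy model. The sample rows are hypothetical stand-ins for the (filenamez, ent.text) pairs, and an in-memory buffer stands in for the real output file.

```python
import csv
import io


def write_predictions(rows, out_file):
    """Write (filename, product) pairs as a two-column CSV with a header row."""
    writer = csv.writer(out_file)
    writer.writerow(["Filename", "PRODUCT"])
    for filename, product in rows:
        writer.writerow([filename, product])


# Hypothetical predictions standing in for (filenamez, ent.text) pairs.
sample_rows = [
    ("06-22-18 Letter from Supl.txt", "ABC2"),
    ("06-22-18 Letter from Req to Change .txt", "ABC3"),
]

# io.StringIO stands in for open('output.csv', 'w', newline='').
buffer = io.StringIO()
write_predictions(sample_rows, buffer)
print(buffer.getvalue())
```

Because only ent.text is written, the PRODUCT label never reaches the file, so nothing has to be stripped afterwards.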
There are many ways you can achieve this. One that I prefer is by using Pandas, which is a powerful library to work with CSV files.
You can create a dictionary:
predicted_products = {'FILENAME': [], 'PRODUCT': []}
and iteratively append filenames and products to the corresponding lists.
After that is done, convert predicted_products to a DataFrame, and call to_csv function:
import pandas as pd
predicted_products_df = pd.DataFrame.from_dict(predicted_products)
predicted_products_df.to_csv('your_path/file_name.csv', index=False)  # index=False leaves out the row-number column
I prefer this approach because it is easier to edit the data before you save the file.
For your existing code, I assume that print(filenamez, ent.label_, ent.text) produces the output. If so:
import csv
import re

import pandas as pd
import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
predicted_products = {'FILENAME': [], 'PRODUCT': []}
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            file = open(DIR+filenamez, "r", encoding ='utf-8')
            filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz])
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    print(filenamez, ent.label_, ent.text)
                    predicted_products['FILENAME'].append(filenamez + ' ' + ent.label_)
                    predicted_products['PRODUCT'].append(ent.text)
                break
        except Exception as e:
            print(e)  # assumption: the handler body is missing from the post
# build the DataFrame once, after every file has been processed
predicted_products_df = pd.DataFrame.from_dict(predicted_products)
predicted_products_df.to_csv('your_path/file_name.csv', index=False)
I have multiple text files which I need to convert to json files. For each text file I want an individual json file.
Text file content
File-1.txt
['education~25,850,103,23', 'experience~28,94,107,27', 'skills~29,904,59,27']
File-2.txt
['introduction~211,143,87,13', 'education~169,302,131,17', 'skills~322,421,84,15', 'experience~325,142,112,14', 'reference~320,699,68,14']
and so on ...
The expected output is a json file which contains:
Keyword(Class name)
Values(Coordinates)
This is what I tried. With this code I was able to write the data into a txt file:
with open(PATH_TO_RESULTS + '/' + os.path.join(os.path.basename(os.path.dirname(image_path))) + '.txt', 'w') as f:
    image_name = os.path.splitext(os.path.basename(image_path))[0]
    # f.write((image_path + '|'))
    req_fields = []
    for key, value in field_item.items():
        #print("=====================")
        # print(key)
        # print((scores[0, index]))
        # print(value)
        # print("==================")
        merge = str(key.decode('utf-8')) + '~' + str(value)
        req_fields.append(merge)
    f.write(str(req_fields))
    print("#######################Required Fields###########################",req_fields)
And one more thing, the json file name should also be the same as txt file name.
I think this is what you need, or at least close to it.
You can improve and adapt it (for example, with better names).
import glob, os
import json

os.chdir(".")  # change this to the folder that holds the .txt files

def read_file(path):
    with open(path, 'r') as fin:
        return fin.read()

def write_json(path, data):
    with open(path, 'w') as fout:
        json.dump(data, fout, indent=4)

for file in glob.glob("*.txt"):
    content = read_file(file)
    # drop the surrounding brackets, then split into one quoted entry per field
    to_parse_in_rows = content.replace('[', '').replace(']', '').split(', ')
    rows = []
    for part in to_parse_in_rows:
        field12, field3, field4, field5 = part.replace("'", '').split(',')
        field1, field2 = field12.split('~')
        row = {
            'class': field1,
            'field2': int(field2),
            'field3': int(field3),
            'field4': int(field4),
            'field5': int(field5)
        }
        rows.append(row)
    # the .json file gets the same base name as the .txt file
    write_json(file.replace('.txt', '.json'), rows)
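As an alternative to the manual string replacement above, here is a hedged sketch that parses the list with ast.literal_eval, assuming each .txt file really contains a valid Python list literal like the ones shown in the question. The function name parse_fields is made up for illustration, and the sketch assumes each class name appears only once per file (a repeated name would overwrite the earlier entry).

```python
import ast
import json


def parse_fields(text):
    """Turn "['education~25,850,103,23', ...]" into {class name: [coordinates]}."""
    entries = ast.literal_eval(text)  # safely evaluates the list literal
    result = {}
    for entry in entries:
        name, coords = entry.split("~", 1)
        result[name] = [int(c) for c in coords.split(",")]
    return result


# Content of File-1.txt from the question.
sample = "['education~25,850,103,23', 'experience~28,94,107,27', 'skills~29,904,59,27']"
parsed = parse_fields(sample)
print(json.dumps(parsed, indent=4))
```

To write one JSON file per text file, the result could be passed to json.dump with the file name changed from .txt to .json, as in the answer above.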
I don't understand why Python doesn't save the results correctly when it prints them correctly. The code looks like this:
import csv
with open("dataset_1.csv", "r") as WBI:
    data = csv.reader(WBI, delimiter = ";")
    data = list(data)
header = data[0]
data = data[1:]
WaterBandIndex = []
for row in data:
    WaterBandIndex.append(float(row[54])/float(row[83]))
print (WaterBandIndex)
with open("WBI.csv", "w+") as WBI:
    csvwriter = csv.writer(WaterBandIndex, delimiter = "|", lineterminator = "\n")
    csvwriter.writerows(WaterBandIndex)
The printed results are correct, but nothing is saved to the CSV.
I'm new to programming.
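The code should work once csv.writer is given the open file object instead of the list WaterBandIndex. Also wrap each value in a list, because writerows expects every row to be a sequence:
import csv
WaterBandIndex = ['1','2','3']
with open("WBI.csv", "w") as f:
    csvwriter = csv.writer(f , delimiter = "|", lineterminator = "\n")
    csvwriter.writerows([value] for value in WaterBandIndex)  # one value per row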
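To see why the code in the question saves nothing, here is a small sketch that reproduces both mistakes and then the working call. The float values are hypothetical, and an in-memory buffer stands in for WBI.csv.

```python
import csv
import io

water_band_index = [0.97, 1.02, 1.1]  # hypothetical ratios

# Mistake 1: csv.writer needs an object with a write() method, not a list.
try:
    csv.writer(water_band_index)
    first_error = None
except (TypeError, csv.Error) as exc:
    first_error = exc

buffer = io.StringIO()  # stands in for open("WBI.csv", "w", newline="")
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")

# Mistake 2: every row must be a sequence, so a bare float is rejected.
try:
    writer.writerows(water_band_index)
    second_error = None
except (TypeError, csv.Error) as exc:
    second_error = exc

# Working version: wrap each value in a one-element row.
writer.writerows([value] for value in water_band_index)
print(buffer.getvalue())
```

If the script in the question printed the list and then stopped with an error, the empty WBI.csv is simply the file that open() created before the exception was raised.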
I'm a bit new to Python and I am trying to simplify my existing code.
Right now, I have the code repeated 5 times with different strings. I'd like to have the code one time and have it run through a list of strings.
Currently what I have:
def wiScanFormat():
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    MAC = data.replace("Address:", "\nAddress, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(MAC)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    SSID = data.replace("ESSID:", "\nESSID, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(SSID)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    FREQ = data.replace("Frequency:", "\nFrequency, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(FREQ)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    QUAL = data.replace("Quality", "\nQuality, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(QUAL)
    File.close()
    File = open("/home/pi/gpsMaster/WiScan.txt", "r")
    data = File.read()
    File.close()
    SIG = data.replace("Signal level", "\nSignal Level, ")
    File = open("/home/pi/gpsMaster/WiScan.txt", "w")
    File.write(SIG)
    File.close()
What I'd like to have:
ORG = ['Address:', 'ESSID:'...etc]
NEW = ['\nAddress, ' , '\nESSID, ' , ... etc]
and run that through:
File = open("/home/pi/gpsMaster/WiScan.txt", "r")
data = File.read()
File.close()
ID = data.replace("ORG", "NEW")
File = open("/home/pi/gpsMaster/WiScan.txt", "w")
File.write(ID)
File.close()
I've tried running exactly what I put up, but it does not seem to format it the way I need it to.
The output from above looks like:
Cell 46 - Address: xx:xx:xx:xx:xx:xx ESSID:"MySSID" Frequency:2.412 GHz (Channel 1) Quality=47/100 Signal level=48/100 Quality=47/100 Signal level=48/100
But it is supposed to look like this (And it does when I run that same block over the strings separately):
xx:xx:xx:xx:xx:xx MySSID 5.18 GHz (Channel 36) 0.81 0.99
How should I go about looping this block of code through my list of strings?
There are two strings that I need for the find and replace, old and new, so they have to work together. The lists will be the same size, obviously, and they need to be in the same order: Address with Address, ESSID with ESSID, etc.
Thanks in advance!
Try something like this:
ORG = ['Address:', 'ESSID:'...etc]
NEW = ['\nAddress, ' , '\nESSID, ' , ... etc]
File = open("/home/pi/gpsMaster/WiScan.txt", "r")
data = File.read()
File.close()
for org, new in zip(ORG, NEW):
    data = data.replace(org, new)
File = open("/home/pi/gpsMaster/WiScan.txt", "w")
File.write(data)
File.close()
(Note the way zip works: https://docs.python.org/2/library/functions.html#zip)
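Here is a runnable sketch of the same loop, with the five pairs taken from the code in the question and the file I/O left out so it can be tested on a plain string. The function name apply_replacements and the sample line are illustrative only.

```python
# (old, new) pairs copied from the five replace() calls in the question.
REPLACEMENTS = [
    ("Address:", "\nAddress, "),
    ("ESSID:", "\nESSID, "),
    ("Frequency:", "\nFrequency, "),
    ("Quality", "\nQuality, "),
    ("Signal level", "\nSignal Level, "),
]


def apply_replacements(text, pairs=REPLACEMENTS):
    """Apply every (old, new) replacement in order and return the result."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


sample = 'Cell 46 - Address: xx:xx:xx:xx:xx:xx ESSID:"MySSID" Quality=47/100'
result = apply_replacements(sample)
print(result)
```

Keeping each old string next to its new string in one list of tuples avoids the risk of two separate lists drifting out of order.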
If I am reading your question right, you are opening the same file, making a small alteration, saving it, and then closing it again, five times. You could just open it once, make all the alterations, and then save it. For instance, like this:
filename = "/home/pi/gpsMaster/WiScan.txt"
with open(filename, 'r') as fin:
    data = fin.read()
data = data.replace("Address:", "\nAddress, ")
data = data.replace("ESSID:", "\nESSID, ")
data = data.replace("Frequency:", "\nFrequency, ")
data = data.replace("Quality", "\nQuality, ")
data = data.replace("Signal level", "\nSignal Level, ")
with open(filename, 'w') as fout:
    fout.write(data)
If you want to use lists (ORG and NEW) for your replacements, you could do this:
with open(filename, 'r') as fin:
    data = fin.read()
for o,n in zip(ORG, NEW):
    data = data.replace(o,n)
with open(filename, 'w') as fout:
    fout.write(data)
Given your ORG and NEW, the simplest way to do this would be something like:
# Open once for both read and write; use with statement for guaranteed close at end of block
with open("/home/pi/gpsMaster/WiScan.txt", "r+") as f:
    data = f.read() # Slurp file
    f.seek(0) # Seek back to beginning of file
    # Perform all replacements
    for orig, repl in zip(ORG, NEW):
        data = data.replace(orig, repl)
    f.write(data) # Write new data over old
    f.truncate() # If replacement shrunk file, truncate extra
You could just do this:
def wiScanFormat(path = "/home/pi/gpsMaster/WiScan.txt"):
    # List of tuples with strings to find and strings to replace with
    replacestr = [
        ("Address:", "\nAddress, "),
        ("ESSID:", "\nESSID, "),
        ("Frequency:", "\nFrequency, "),
        ("Quality", "\nQuality, "),
        ("Signal level", "\nSignal Level, ")
    ]
    with open(path, "r") as file: # Open a file
        data = file.read()
    formated = data
    for i in replacestr: # Loop over each element (tuple) in the list
        formated = formated.replace(i[0], i[1]) # Replace the data
    with open(path, "w") as file:
        written = file.write(formated) # Write the data
    return written
I have an application that works, but in the interest of understanding functions and Python better, I am trying to split it into several functions.
I'm stuck on the file_IO function. I'm sure the reason it does not work is that the main part of the application does not know about reader or writer. To explain better, here is a full copy of the application.
I'm also curious about csv.DictReader and csv.DictWriter. Does either provide any advantages or disadvantages compared with the current code?
I suppose another way of doing this is with classes, which I would honestly like to learn as well.
#!/usr/bin/python
""" Description This script will take a csv file and parse it looking for specific criteria.
A new file is then created based on the original file name containing only the desired parsed criteria.
"""
import csv
import re
import sys
searched = ['aircheck', 'linkrunner at', 'onetouch at']
def find_group(row):
    """Return the group index of a row
        0 if the row contains searched[0]
        1 if the row contains searched[1]
        etc
        -1 if not found
    """
    for col in row:
        col = col.lower()
        for j, s in enumerate(searched):
            if s in col:
                return j
    return -1
#Prompt for File Name
def file_IO():
    print "Please Enter a File Name, (Without .csv extension): ",
    base_Name = raw_input()
    print "You entered: ",base_Name
    in_Name = base_Name + ".csv"
    out_Name = base_Name + ".parsed.csv"
    print "Input File: ", in_Name
    print "OutPut Files: ", out_Name
    #Opens Input file for read and output file to write.
    in_File = open(in_Name, "rU")
    reader = csv.reader(in_File)
    out_File = open(out_Name, "wb")
    writer = csv.writer(out_File, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    return (reader, writer)
file_IO()
# Read header
header = reader.next()
stored = []
writer.writerow([header[0], header[3]])
for i, row in enumerate(reader):
    g = find_group(row)
    if g >= 0:
        stored.append((g, i, row))
stored.sort()
for g, i, row in stored:
    writer.writerow([row[0], row[3]])
# Closing Input and Output files.
in_File.close()
out_File.close()
If I were you, I'd only separate find_group.
import csv

def find_group(row):
    GROUPS = ['aircheck', 'linkrunner at', 'onetouch at']
    cells = [col.lower() for col in row]
    for idx, group in enumerate(GROUPS):
        # substring match, as in the find_group from the question
        if any(group in cell for cell in cells):
            return idx
    return -1

def get_filenames():
    # this might be the only other thing you'd want to factor
    # into a function, and frankly I don't really like getting
    # user input this way anyway....
    basename = raw_input("Enter a base filename (no extension): ")
    infilename = basename + ".csv"
    outfilename = basename + ".parsed.csv"
    return infilename, outfilename

# notice that I don't open the files yet -- let main handle that
infilename, outfilename = get_filenames()
with open(infilename, 'rU') as inf, open(outfilename, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_ALL)
    header = next(reader)
    writer.writerow([header[0], header[3]])
    stored = sorted([(find_group(row), idx, row) for idx, row in
                     enumerate(reader) if find_group(row) >= 0])
    for _, _, row in stored:
        writer.writerow([row[0], row[3]])
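On the csv.DictReader / csv.DictWriter part of the question, here is a hedged Python 3 sketch of the same filter-and-sort logic. The main difference is that columns are picked by header name instead of by position (header[0], header[3]), which keeps working if columns are reordered. The column names Serial, Site and Model and the sample data are invented for illustration, and in-memory buffers stand in for the real files.

```python
import csv
import io

SEARCHED = ["aircheck", "linkrunner at", "onetouch at"]


def find_group(row):
    """Return the index of the first search term found in any cell, or -1."""
    cells = [str(value).lower() for value in row.values()]
    for idx, term in enumerate(SEARCHED):
        if any(term in cell for cell in cells):
            return idx
    return -1


def parse(in_file, out_file, keep=("Serial", "Model")):
    """Copy matching rows to out_file, grouped by search term, keeping two columns."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=keep,
                            extrasaction="ignore", quoting=csv.QUOTE_ALL)
    writer.writeheader()
    matches = []
    for line_no, row in enumerate(reader):
        group = find_group(row)
        if group >= 0:
            matches.append((group, line_no, row))
    # Dicts cannot be compared, so sort on (group, line_no) only.
    matches.sort(key=lambda match: match[:2])
    for _, _, row in matches:
        writer.writerow(row)


# Hypothetical input; the real script would pass open file objects instead.
sample = io.StringIO(
    "Serial,Site,Model\n"
    "1,HQ,OneTouch AT 10G\n"
    "2,HQ,Other\n"
    "3,Lab,AirCheck\n"
)
out = io.StringIO()
parse(sample, out)
print(out.getvalue())
```

The trade-off is a dictionary per row, which costs a little more than a list, and the code now depends on the header names being stable.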
Here is my code for reading individual cells of one CSV file, but I want to read multiple CSV files one by one, using a .txt file that lists the CSV file paths.
import csv
ifile = open ("C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv", "rb")
data = list(csv.reader(ifile, delimiter = ';'))
REQ = []
RES = []
n = len(data)
for i in range(n):
    x = data[i][1]
    y = data[i][2]
    REQ.append (x)
    RES.append (y)
    i += 1
for j in range(2,n):
    try:
        if REQ[j] != '' and RES[j]!= '': # ignore blank cell
            print REQ[j], ' ', RES[j]
    except:
        pass
    j += 1
And csv file paths are stored in a .txt file like
C:\Desktop\Test_Specification\RDBI.csv
C:\Desktop\Test_Specification\ECUreset.csv
C:\Desktop\Test_Specification\RDTC.csv
and so on..
You can read stuff stored in files into variables. And you can use variables with strings in them anywhere you can use a literal string. So...
with open('mytxtfile.txt', 'r') as txt_file:
    for line in txt_file:
        file_name = line.strip()  # strip() removes the trailing newline
        ifile = open(file_name, 'rb')
        # ... the rest of your code goes here
Maybe we can fix this up a little...
import csv
with open('mytxtfile.txt', 'r') as txt_file:
    for line in txt_file:
        file_name = line.strip()
        with open(file_name, 'rb') as csv_handle:
            # delimiter is an argument of csv.reader, not of open()
            csv_file = csv.reader(csv_handle, delimiter=';')
            next(csv_file)  # skip header row (a reader cannot be sliced)
            for record in csv_file:
                req = record[1]
                res = record[2]
                if len(req + res):
                    print req, ' ', res
You just need to add a loop that reads the file containing your list of paths before your first open statement, for example:
from __future__ import with_statement
with open("myfile_which_contains_file_path.txt") as f:
    for line in f:
        ifile = open(line.strip(), 'rb')  # strip() drops the trailing newline
        # here the rest of your code
You need to use a raw string because your path contains backslashes.
import csv
# note: file_list should be the path of the .txt file that lists the CSV paths
file_list = r"C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv"
with open(file_list) as f:
    for line in f:
        with open(line.strip(), 'rb') as the_file:
            reader = csv.reader(the_file, delimiter=';')
            for row in reader:
                req,res = row[1:3]
                if req and res:
                    print('{0} {1}'.format(req, res))
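The answers above are written for Python 2. As a rough Python 3 sketch of the same idea, the generator below reads the list file line by line and yields the non-blank pairs from columns 1 and 2 of each CSV. The function name read_pairs, the assumption of a single header row, and the temporary demo files are all illustrative and not from the question.

```python
import csv
import os
import tempfile


def read_pairs(list_path, delimiter=";"):
    """Yield (csv_path, req, res) for each data row with non-blank columns 1 and 2."""
    with open(list_path, encoding="utf-8") as list_file:
        for line in list_file:
            csv_path = line.strip()
            if not csv_path:
                continue  # ignore blank lines in the list file
            with open(csv_path, newline="", encoding="utf-8") as csv_file:
                reader = csv.reader(csv_file, delimiter=delimiter)
                next(reader, None)  # assumption: one header row per CSV
                for row in reader:
                    if len(row) > 2 and row[1] and row[2]:
                        yield csv_path, row[1], row[2]


# Demo with temporary files; the contents are made-up sample data.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "RDBI.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        f.write("id;request;response\n1;22 F1 90;62 F1 90\n2;;\n")
    list_path = os.path.join(tmp, "paths.txt")
    with open(list_path, "w", encoding="utf-8") as f:
        f.write(csv_path + "\n")
    pairs = [(req, res) for _, req, res in read_pairs(list_path)]

print(pairs)
```

In Python 3 the CSV files are opened in text mode with newline="" instead of 'rb', which is what the csv module documentation recommends.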