Python script to run FME workbench

I have more than 500 XML files, and each one has to be processed individually in an FME workbench (one run of the workbench per XML file).
For this purpose I run a Python file (loop.py) that iterates the FME workbench over each XML file.
The whole process used to work on another PC without any problem. Now, when I run the module, I get the following error:
Traceback (most recent call last):
File "E:\XML_Data\process\01_XML_Tile_1.py", line 28, in <module>
if "Translation was SUCCESSFUL" in open(path_log + "\\" + data + ".log").read():
IOError: [Errno 2] No such file or directory: 'E:\XML_Data\data_out\log_01\re_3385-5275.xml.log'
The Python code (loop.py) is attached below.
Any help is greatly appreciated.
import os
import time
# Mainpath and Working Folder:
#path_main = r"E:\XML_Data"
path_main = r"E:\XML_Data"
teil = str("01")
# variables
path_in = path_main + r"\data_in\03_Places\teil_" + teil # "Source folder of XML files"
path_in_tile10 = path_main + r"\data_in\01_Tiling\10x10.shp" # "Source folder of Grid shapefile"
path_in_commu = path_main + r"\data_in\02_Communities\Communities.shp" # "Source folder of Communities shapefile"
path_out = path_main + r"\data_out\teil_" + teil # "Output folder of shapefiles that resulted from XML files (tile_01 folder)"
path_log = path_main + r"\data_out\log_" + teil # "Output folder of log files for each run(log_01 folder)"
path_fme = r"%FME_EXE_2015%" # "C:\Program Files\FME2015\fme.exe"
path_fme_workbench = path_main + r"\process\PY_FME2015.fmw" # "path of FME workbench"
datalists = os.listdir(path_in)
count = 0
# loop each file individually in FME
for data in datalists:
    if data.find(".xml") != -1:
        count += 1
        print ("Run-No." + str(count) + ": with data " + data)
        os.system (path_fme + " " + path_fme_workbench + " " + "--SourceDataset_XML"+ " " + path_in + "\\" + data + " " + "--SourceDataset_SHAPE" + " " + path_in_tile10 + " " + "--SourceDataset_SHAPE_COMU" + " " + path_in_commu + " " + "--DestDataset_SHAPE" +" " +path_out + " " +"LOG_FILENAME" + " " + path_log + "\\" + data + ".log" )
        print ("Data processed: " + data)
        shape = str(data[19:28]) + "_POPINT_CENTR_UTM32N.shp"
        print ("ResultsFileName: " + shape)
        if "Translation was SUCCESSFUL" in open(path_log + "\\" + data + ".log").read():
            # Translation was successful and SHP file exists:
            if os.path.isfile(path_out + "\\" + shape):
                write_log = open(path_out + "\\" + "result_xml.log", "a")
                write_log.write(time.asctime(time.localtime()) + " " + shape + "\n")
                write_log.close()
                print("Everything ok")
            # Translation was successful, but SHP file does not exist:
            else:
                write_log = open(path_out + "\\" + "error_xml.log", "a")
                write_log.write(time.asctime(time.localtime()) + " Data: " + shape + " unavailable.\n")
                write_log.close()
        # Translation was not successful:
        else:
            write_log = open(path_out + "\\" + "error_xml.log", "a")
            write_log.write(time.asctime(time.localtime()) + " Translation " + data + " not successful.\n")
            write_log.close()
print ("Number of calculated files: " + str(count))

Most likely, the script failed at the os.system line, so the log file was not created from the command. Since you mentioned a different computer, it could be caused by many reasons, such as a different version of FME (so the environment variable %FME_EXE_2015% would not exist).
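One way to confirm this is to check whether the command started and whether the log file exists before reading it. The following is a minimal sketch, not the asker's code: `run_and_check` is a hypothetical helper, and the only thing taken from the question is the "Translation was SUCCESSFUL" marker. It assumes Python 3, where `subprocess.run` is available.

```python
import os
import subprocess


def run_and_check(command, log_path, marker="Translation was SUCCESSFUL"):
    """Run a command, then inspect its log file.

    Returns a (success, reason) tuple instead of raising IOError
    when the log file is missing.
    """
    try:
        completed = subprocess.run(command)
    except OSError as exc:
        # Raised when the executable cannot be started (e.g. wrong path).
        return False, "could not start command: %s" % exc
    if completed.returncode != 0:
        return False, "command exited with code %d" % completed.returncode
    if not os.path.isfile(log_path):
        return False, "log file was not created: " + log_path
    with open(log_path) as log_file:
        if marker in log_file.read():
            return True, "ok"
    return False, "marker not found in log"
```

Printing the returned reason for the first file in the loop should show whether the executable could not be started, exited with an error, or ran without writing the log. Printing `os.environ.get("FME_EXE_2015")` before the loop would also reveal whether that variable is set on the new machine.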

Use a WorkspaceRunner transformer to do this.

The FME version is outdated, so first check whether the version is causing the problem.

subprocess.call(["C:/Program Files/fme/FMEStarter/FMEStarter.exe", "C:/Program Files/fme/fme20238/fme.exe", "/fmefile.fmw", "LOG_FILENAME", "logfile"], stdin=None, stdout=None, stderr=None, shell=True, timeout=None)

Related

tar: cowardly refusing to create an empty archive and I don't know why it would give me that error (files to be archived are not empty)

I looked on Stack Overflow and other online resources for a solution, but what I found did not apply to my specific situation.
When I run my script, it works until the line that is supposed to create a tar archive of 2 files using the os.system command and store that archive in /home/dahmed26/backups/xmlfiles.
It gives the following error:
tar: Cowardly refusing to create an empty archive
Try 'tar --help' or 'tar --usage' for more information.
sh: line 1: .tar.gz: command not found
This is my code:
import os
currentuser = os.popen('whoami').read()
username = currentuser.strip()
if username != 'root':
    print("Must be root")
    exit()
else:
    vmChoice = input("Choose a VM")
    now = os.popen('date +%Y%m%d').read()
    name = (vmChoice + '-' + now)
    os.system('virsh dumpxml ' + vmChoice + ' > /home/dahmed26/backups/xmlfiles/' + vmChoice + '.xml')
    location = os.popen('cat /home/dahmed26/backups/xmlfiles/' + vmChoice + '.xml | grep "source file" | cut -d "\'" -f2').read()
    os.system('tar czf' + ' ' + '/home/dahmed26/backups/xmlfiles/' + name + '.tar.gz' + ' ' + '/home/dahmed26/backups/xmlfiles/' + vmChoice + '.xml' + location)
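A likely cause, judging from the error output in the question: `os.popen(...).read()` returns the command's output including its trailing newline, so `name` ends in a newline and the shell treats `.tar.gz ...` as a separate command, leaving `tar` with no files to archive. There is also no space between the `.xml` path and `location`. The sketch below assumes that diagnosis is right; `build_tar_command` is a hypothetical helper, and it builds an argument list so that no shell is involved in splitting the command.

```python
def build_tar_command(backup_dir, vm_name, date_text, disk_path):
    """Return an argument list for tar with stray whitespace removed.

    date_text and disk_path are expected to come from command output,
    which usually ends in a newline.
    """
    vm_name = vm_name.strip()
    name = vm_name + "-" + date_text.strip()
    archive = backup_dir + "/" + name + ".tar.gz"
    xml_file = backup_dir + "/" + vm_name + ".xml"
    # Each list element is passed to tar as exactly one argument.
    return ["tar", "czf", archive, xml_file, disk_path.strip()]
```

The resulting list can be passed to `subprocess.call(...)` without `shell=True`.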

Mongodump specify the output and filename

Is there a way to specify the whole path when running mongodump? I tried using --out but what it does at the moment is saving into a file at my_given_path/database/collection_name.json.gz
I have the following:
path = file_path + '/' + database + '/' + collection + '/'
query_input = "{\\\"metadata_id\\\": {\\\"\$oid\\\": \\\"" + metadata_id + "\\\"}}"
command = "mongodump --uri " + connection_string + database + " " \
"--collection=" + collection + " --query=\"" + query_input + "\" --gzip --out=" + path + " --quiet"
for which the file is saved at:
file_path/database/collection/database/collection.json.gz
Ideally I would like to save it into
file_path/database/collection/metadata_id.json.gz
Would this be possible?

You can use the --archive=<file> flag instead of --out
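As a rough illustration of that suggestion, the command could be assembled as an argument list, which also avoids the nested quote escaping in the question. This is a sketch under assumptions: `build_mongodump_args` is a hypothetical helper, and the variable names mirror the question. Note that `--archive` writes a single file in mongodump's archive format, so the result is not a plain `.json` file even if the path is named that way.

```python
import json


def build_mongodump_args(uri, collection, metadata_id, archive_path):
    """Build a mongodump argument list that writes to one archive file."""
    # json.dumps produces the query document without manual escaping.
    query = json.dumps({"metadata_id": {"$oid": metadata_id}})
    return [
        "mongodump",
        "--uri=" + uri,
        "--collection=" + collection,
        "--query=" + query,
        "--gzip",
        "--archive=" + archive_path,
        "--quiet",
    ]
```

The list can then be handed to `subprocess.run(...)`; the target directory has to exist beforehand.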

Python Loop Count Stops at 67381

I'm assuming this has to be a memory issue, but I'm not sure. The program loops through PDFs to look for corrupted files. When a file is corrupted, it writes that location to a txt file for me to review later. When running it the first time, I logged both pass and fail scenarios to the log. After 67381 log entries, it stopped. Then I changed the logic so it only logs errors; however, I display a count of the loop in the console so I can tell how far along the process is. There are about 190k files to loop through, and the count stops at exactly 67381 every time. It looks like the Python program is still running in the background, as the memory and CPU usage keep fluctuating, but it's hard to be sure. I also don't know whether it will still write errors to the log.
Here is the code,
import PyPDF2, os
from time import gmtime,strftime
path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
    for file in fnames:
        print count
        count = count + 1
        if file.endswith(".pdf"):
            file = os.path.join(dirpath, file)
            try:
                PyPDF2.PdfFileReader(open(file, "rb"))
            except PyPDF2.utils.PdfReadError:
                curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                t.write (str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
            else:
                pass
                #curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                #t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")
t.close()
Edit 1: (New Code)
New code and the same issue:
import PyPDF2, os
from time import gmtime,strftime
path = raw_input("Enter folder path of PDF files:")
t = open(r'c:\pdf_check\log.txt','w')
count = 1
for dirpath,dnames,fnames in os.walk(path):
    for file in fnames:
        print count
        count = count + 1
        if file.endswith(".pdf"):
            file = os.path.join(dirpath, file)
            try:
                with open(file,'rb') as f:
                    PyPDF2.PdfFileReader(f)
            except PyPDF2.utils.PdfReadError:
                curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                t.write (str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "fail" + "\n")
                f.close()
            else:
                pass
                f.close()
                #curdate = strftime("%Y-%m-%d %H:%M:%S", gmtime())
                #t.write(str(curdate) + " " + "-" + " " + file + " " + "-" + " " + "pass" + "\n")
t.close()
Edit 2: I am now trying to run this on a different machine with beefier hardware and a different version of Windows (10 Pro instead of Server 2008 R2), but I don't think this is the issue.

Try editing one of the .pdf files to make it larger. If the loop number at which your program "stops" then becomes smaller, you can identify the problem as a memory issue.
Otherwise, it might be an unusually large .pdf file that takes your program a long time to verify.
To debug this, you could print the path of each .pdf file as you open it, find the one the program stalls on, and open it manually to investigate further.
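The debugging step described above can be sketched as follows. This is an illustration in Python 3, not the asker's Python 2 script: `check_files` and its `slow_seconds` threshold are hypothetical, and `check` stands in for whatever call validates a single PDF.

```python
import time


def check_files(paths, check, slow_seconds=5.0):
    """Call check(path) for each path, printing the path before opening it.

    Returns a list of (path, elapsed_seconds) for every file whose check
    took at least slow_seconds.
    """
    slow = []
    for path in paths:
        # Flush so the path is visible even if the next call hangs.
        print("checking", path, flush=True)
        started = time.perf_counter()
        try:
            check(path)
        except Exception as exc:
            print("  failed:", exc, flush=True)
        elapsed = time.perf_counter() - started
        if elapsed >= slow_seconds:
            slow.append((path, elapsed))
    return slow
```

Because the path is printed before the check runs, the last path shown in the console is the file the program is stuck on.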

Figured it out. The issue is actually due to a random and very large corrupted PDF. So this is not a loop issue, it's a corrupted file issue.

How to get Python to look at Sub-Folders?

I am trying to create a Python script that will look in a series of sub-folders and delete empty shapefiles. I have successfully created the part of the script that deletes the empty files in one folder, but there are a total of 70 folders within the "Project" folder. While I could just copy and paste the code 69 times, I'm sure there must be a way to get it to look at each sub-folder and run the code for each of them. Below is what I have so far. Any ideas? I'm very new to this and have simply edited existing code to get this far. Thanks!
import os
# Set the working directory
os.chdir ("C:/Naview/Platypus/Project")
# Get the list of only files in the current directory
file = filter(os.path.isfile, os.listdir('C:/Naview/Platypus/Project'))
# For each file in directory
for shp in file:
    # Get only the files that end in ".shp"
    if shp.endswith(".shp"):
        # Get the size of the ".shp" file.
        # NOTE: The ".dbf" file can vary in size whereas
        # the shp & shx are always the same when "empty".
        size = os.path.getsize(shp)
        print "\nChecking " + shp + "'s file size..."
        # If the file size is greater than 100 bytes, leave it alone.
        if size > 100:
            print "File is " + str(size) + " bytes"
            print shp + " will NOT be deleted \n"
        # If the file size is equal to 100 bytes, delete it.
        if size == 100:
            # Convert the int output from (size) to a string.
            print "File is " + str(size) + " bytes"
            # Get the filename without the extension
            base = shp[:-4]
            # Remove entire shapefile
            print "Removing " + base + ".* \n"
            if os.path.exists(base + ".shp"):
                os.remove(base + ".shp")
            if os.path.exists(base + ".shx"):
                os.remove(base + ".shx")
            if os.path.exists(base + ".dbf"):
                os.remove(base + ".dbf")
            if os.path.exists(base + ".prj"):
                os.remove(base + ".prj")
            if os.path.exists(base + ".sbn"):
                os.remove(base + ".sbn")
            if os.path.exists(base + ".sbx"):
                os.remove(base + ".sbx")
            if os.path.exists(base + ".shp.xml"):
                os.remove(base + ".shp.xml")

There are several ways to do this. I'm a fan of glob:
import glob
import os
for shp in glob.glob('C:/Naview/Platypus/Project/**/*.shp'):
    size = os.path.getsize(shp)
    print "\nChecking " + shp + "'s file size..."
    # If the file size is greater than 100 bytes, leave it alone.
    if size > 100:
        print "File is " + str(size) + " bytes"
        print shp + " will NOT be deleted \n"
        continue
    print "Removing", shp, "files"
    for file in glob.glob(shp[:-3] + '*'):
        print " removing", file
        os.remove(file)

Time to learn about procedural programming: Defining Functions.
Put your code into a function with a path parameter and call it for each of your 70 paths:
def delete_empty_shapefiles(path):
    # Get the list of only files in the current directory
    file = filter(os.path.isfile, os.listdir(path))
    ...
paths = ['C:/Naview/Platypus/Project', ...]
for path in paths:
    delete_empty_shapefiles(path)
Bonus points for creating a function that performs the os.path.exists() and os.remove() calls.
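A possible shape for those two functions is sketched below in Python 3. It is one interpretation rather than the answerer's code: it uses `os.walk` to discover the sub-folders instead of a hand-written list of 70 paths, and the names `remove_shapefile`, `SHAPEFILE_EXTENSIONS` and `EMPTY_SHP_SIZE` are made up for illustration. The 100-byte threshold and the extension list come from the question.

```python
import os

SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".shp.xml")
EMPTY_SHP_SIZE = 100  # bytes; size of an "empty" .shp according to the question


def remove_shapefile(base):
    """Delete every component file of the shapefile at base (path without extension)."""
    removed = []
    for extension in SHAPEFILE_EXTENSIONS:
        candidate = base + extension
        if os.path.exists(candidate):
            os.remove(candidate)
            removed.append(candidate)
    return removed


def delete_empty_shapefiles(root):
    """Walk root and all its sub-folders, removing shapefiles whose .shp is 100 bytes."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".shp"):
                continue
            full_path = os.path.join(dirpath, filename)
            if os.path.getsize(full_path) == EMPTY_SHP_SIZE:
                # Strip the ".shp" suffix to get the shared base name.
                removed.extend(remove_shapefile(full_path[:-4]))
    return removed
```

Calling `delete_empty_shapefiles("C:/Naview/Platypus/Project")` once would then cover every sub-folder and return the list of deleted files.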

Equivalent to grep, but not using open()

I have the following script (see below), which takes its input and writes it into some simple files.
# Import Modules for script
import os, sys, fileinput, platform, subprocess
# Global variables
hostsFile = "hosts.txt"
hostsLookFile = "hosts.csv"
# Determine platform
plat = platform.system()
if plat == "Windows":
    # Define Variables based on Windows and process
    currentDir = os.getcwd()
    hostsFileLoc = currentDir + "\\" + hostsFile
    hostsLookFileLoc = currentDir + "\\" + hostsLookFile
    ipAddress = sys.argv[1]
    hostName = sys.argv[2]
    hostPlatform = sys.argv[3]
    hostModel = sys.argv[4]
    # Add ipAddress to the hosts file for python to process
    with open(hostsFileLoc,"a") as hostsFilePython:
        for line in open(hostsFilePython, "r"):
            if ipAddress in line:
                print "ipAddress \(" + ipAddress + "\) already present in hosts file"
            else:
                print "Adding ipAddress: " + ipAddress + " to file"
                hostsFilePython.write(ipAddress + "\n")
    # Add all details to the lookup file for displaying on-screen and added value
    with open(hostsLookFileLoc,"a") as hostsLookFileCSV:
        for line in open(hostsLookFileCSV, "r"):
            if ipAddress in line:
                print "ipAddress \(" + ipAddress + "\) already present in lookup file"
            else:
                print "Adding details: " + ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + " to file"
                hostsLookFileCSV.write(ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + "\n")
if plat == "Linux":
    # Define Variables based on Linux and process
    currentDir = os.getcwd()
    hostsFileLoc = currentDir + "/" + hostsFile
    hostsLookFileLoc = currentDir + "/" + hostsLookFile
    ipAddress = sys.argv[1]
    hostName = sys.argv[2]
    hostPlatform = sys.argv[3]
    hostModel = sys.argv[4]
    # Add ipAddress to the hosts file for python to process
    with open(hostsFileLoc,"a") as hostsFilePython:
        print "Adding ipAddress: " + ipAddress + " to file"
        hostsFilePython.write(ipAddress + "\n")
    # Add all details to the lookup file for displaying on-screen and added value
    with open(hostsLookFileLoc,"a") as hostsLookFileCSV:
        print "Adding details: " + ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + " to file"
        hostsLookFileCSV.write(ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + "\n")
This code obviously does not work, because the for line in open(hostsFilePython, "r"): syntax is wrong: I cannot pass an existing file object to open(). However, this is what I want to achieve. How can I do this?

You want to open your file using the a+ mode so that you can both read and write, then simply use the existing file object (hostsFilePython).
However, this still won't work as you can only iterate over a file once before it is exhausted.
It's worth noting that this isn't very efficient. The better plan is to read the data into a set, update the set with your new values, then write the set to the file. (As pointed out in the comments, sets don't preserve duplicates (good for your purposes), and order, which may or may not work for you. If not, then you might need to use a list, which will be less efficient).
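A minimal sketch of the set-based idea, written for Python 3. It is a variation on what is described above rather than a literal implementation: the set is used only for the membership test, and a missing address is appended, so the existing line order in the file is kept. `add_host` is a hypothetical helper, and it compares whole lines instead of using the substring test from the question.

```python
def add_host(hosts_path, ip_address):
    """Append ip_address to hosts_path unless an identical line already exists.

    Returns True when the address was added, False when it was already present.
    """
    try:
        with open(hosts_path) as hosts_file:
            known = {line.strip() for line in hosts_file}
    except FileNotFoundError:
        known = set()  # first run: the file does not exist yet
    if ip_address in known:
        return False
    with open(hosts_path, "a") as hosts_file:
        hosts_file.write(ip_address + "\n")
    return True
```

The file is read once and closed before it is reopened for appending, which sidesteps the problem of iterating over a file object that is open for writing.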

with open(hostsFileLoc) as hostsFilePython:
    lines = hostsFilePython.readlines()
for filename in lines:
    with open(hostsFileLoc, 'a') as hostsFilePython:
        with open(filename) as hostsFile:
            for line in hostsFile.readlines():
                if ipAddress in line:
                    print "ipAddress \(" + ipAddress + "\) already present in hosts file"
                else:
                    print "Adding ipAddress: " + ipAddress + " to file"
                    hostsFilePython.write(ipAddress + "\n")
The default mode is read, so you don't need to pass in r explicitly.
