Feeding list position as parameter to arcpy.SymDiff_analysis function - python

New to python.
I have two feature datasets, Input features (NewDemoFC) and Update Features (ExistingFC) with 5 feature classes each. One set contains demolished features, the other set contains all active features. The objective is to compare the two and wherever the demolished features (from NewDemoFC) overlap with an active feature (from ExistingFC), delete the overlapping active features (from ExistingFC) and output a new feature class.
I want to use a while loop and feed a particular position in the list to both the input features and update features parameters. I would also like to maintain the same names and order for the output feature class names.
I am trying to achieve the results of the model below for a dataset with multiple files, since the SymDiff_analysis tool doesn't work on more than one feature class as input unless you add each one as a separate line item, specifying input, output and any intermediate temporary files. That is not practical for a dataset with 100-odd feature classes.
[screenshot of the ModelBuilder model]
CODE IS AS UNDER
# Import arcpy module
import arcpy

# Set environment to generate new input feature class list and count
arcpy.env.workspace = r"T:\eALP_Update.gdb\Point_DemoNew"
NewDemoFC = arcpy.ListFeatureClasses()
NewDemoFCCount = len(NewDemoFC)

# Set environment to generate existing feature class list
arcpy.env.workspace = r"T:\eALP_Update.gdb\Point_InputExisting"
ExistingFC = arcpy.ListFeatureClasses()
E_PointFeatures_ActiveOnly = []
i = 0
#arcpy.env.workspace = r"T:\eALP_Update.gdb\Point_ActiveExisting"
while i < NewDemoFCCount:
    # Process: Symmetrical Difference (2)
    arcpy.SymDiff_analysis(NewDemoFC[i], ExistingFC[i], E_PointFeatures_ActiveOnly[i], "ALL", "0.01 Feet")
    i = i + 1
ERROR I GET IS AS UNDER
Traceback (most recent call last):
File "C:\Python27\ArcGIS10.5\Lib\site-packages\pythonwin\pywin\framework\intpyapp.py", line 345, in OnFileRun
scriptutils.RunScript(None, None, showDlg)
File "C:\Python27\ArcGIS10.5\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 353, in RunScript
del main.file
AttributeError: file
5
[u'Demo_New_UTILITYPOINT', u'Demo_New_ROADPOINT', u'Demo_New_AIRPORTSIGN', u'Demo_New_AIRPORTCONTROLPOINT', u'Demo_New_AIRFIELDLIGHT']
5
[u'UtilityPoint', u'RoadPoint', u'AirportSign', u'AirportControlPoint', u'AirfieldLight']
Traceback (most recent call last):
File "C:\Python27\ArcGIS10.5\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 326, in RunScript
exec codeObject in main.dict
File "T:\Data\GOAA\eALPUpdates\Point_2-SymmetricalDifferenceOnly.py", line 41, in
arcpy.SymDiff_analysis(NewDemoFC[i], ExistingFC[i], E_PointFeatures_ActiveOnly[i], "ALL", "0.01 Feet")
IndexError: list index out of range
[Dbg]>>>

What you want is a for loop that iterates over each feature class, which avoids the odd indexing going on inside your call to arcpy.SymDiff_analysis. For example, using i to index E_PointFeatures_ActiveOnly (an empty list) as an output path won't work; to do this the way you want, you'll need to generate the output names dynamically. Make sure the output folder is empty when you do this to avoid naming conflicts. Your code also duplicates everything for each folder, so we can define functions to eliminate that, and you can then re-use them easily. Lastly, you really want to avoid altering global variables like arcpy.env.workspace multiple times; the function below is verbose, but since it's a function, you only have to write it once! I'll assume you have access to ArcGIS version >= 10.1. The following code is long and untested, but I imagine it should do the trick:
import arcpy
arcpy.env.workspace = r"T:\eALP_Update.gdb\Point_ActiveExisting"

def getFCs(folderOne, folderTwo):
    """Get feature classes from two folders."""
    from os.path import join
    x = []
    y = []
    folders = [folderOne, folderTwo]
    for folder in folders:
        # arcpy.da.Walk yields (dirpath, dirnames, filenames) tuples
        for dirpath, dirnames, filenames in arcpy.da.Walk(
                folder,
                topdown=True,
                followlinks=False,
                datatype='FeatureClass',
                type='ALL'):
            for filename in filenames:
                if folder == folders[0]:  # note: folders[0], not folder[0]
                    x.append(join(dirpath, filename))
                else:
                    y.append(join(dirpath, filename))
    return x, y

def batchSymDiff(inFolder, updateFolder, joinAttr, clusterTolerance):
    """Perform SymDiff analysis for every feature class in a pair of folders."""
    inFeatures, updateFeatures = getFCs(inFolder, updateFolder)
    for fc1, fc2 in zip(inFeatures, updateFeatures):
        output = fc2.replace(".shp", "_sym.shp")  # this naming pattern assumes ".shp" ending
        arcpy.SymDiff_analysis(fc1, fc2, output, joinAttr, clusterTolerance)

# set variables for batch process
inFolder = r"T:\eALP_Update.gdb\Point_DemoNew"
updateFolder = r"T:\eALP_Update.gdb\Point_InputExisting"
joinAttr = "ALL"
clusterTolerance = "0.01"

# execute batchSymDiff
batchSymDiff(inFolder, updateFolder, joinAttr, clusterTolerance)
This code is probably more verbose than it has to be, but doing it this way means you can avoid changing global environment variables over and over (a risky business, since the errors that causes can be really difficult to diagnose) and it makes your code reusable. Also note that it eliminates the need for a "manual" counter (i). Hope it helps! I suggest testing the code on sample data first.


Execute python code and evaluate/test results

Admittedly I am not sure how to ask this, as I know how to handle this in R (code execution in a new environment), but equivalent searches for the python solution are not yielding what I was hoping.
In short, I will receive a spreadsheet (or csv) where each cell of a column will contain, hopefully, valid python code. This could be the equivalent of a script, but just contained in the csv/workbook. For a use case, think teaching programming where the output is an LMS.
What I am hoping to do is loop over the file, and for each cell, run the code, and with the results in memory, test to see if certain things exist.
For example: https://docs.google.com/spreadsheets/d/1D-zC10rUTuozfTR5yHfauIGbSNe-PmfrZCkC7UTPH1c/edit?usp=sharing
When evaluating the first response in the spreadsheet above, I would want to test that x, y, and z are all properly defined and have the expected values.
Because there will be multiple rows in the file, one per student, how can I run each row separately, evaluate the results, and ensure that the results are isolated to that cell alone? Put simply: when moving on, I should not retain any of the past evaluations.
(I am unaware of tools to do code checking, so I am dealing with it in a very manual way.)
It is possible to use Python's exec() function to execute strings such as the content in the cells.
Ex:
variables = {}
exec("""import os
# a comment
x = 2
y = 6
z = x * y""", variables)
assert variables["z"] == 12
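Since each exec() call receives its own namespace dictionary, results do not leak between runs, which is exactly the isolation you asked for. A quick sketch to confirm it:
first = {}
second = {}
exec("x = 1", first)
try:
    exec("y = x + 1", second)  # 'x' only exists in the first namespace
except NameError:
    print("x did not leak into the second namespace")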
Dealing with the csv file:
import csv

csv_file = open("path_to_csv_file", "rt")
csv_reader = csv.reader(csv_file)
iterator = iter(csv_reader)
next(iterator)  # To skip the titles of the columns
for row in iterator:
    user = row[0]
    answer = row[1]
    ### Any other code involving the csv file must be put here to work properly,
    ### that is, before closing csv_file.
csv_file.close()  # Remember to close the file.
One limitation: this won't be able to detect whether some module was imported, because once a module is imported inside an exec() call it remains cached for the next exec's. One way to test for an import is to 'unimport' the module first and then check the exec for exceptions.
Ex:
# This piece of code would go before closing the file,
# INSIDE THE FOR LOOP AND INDENTED WITH IT (because you want
# it to run for each student).
try:
    # 'Unimport' os. (This doesn't so much 'unimport' as delete a
    # reference to the module, which could be problematic if a
    # 'from module import object' statement was used.)
    del os
except NameError:
    # So that trying to delete a module that wasn't imported
    # does not raise an exception.
    pass
namespace = dict()
try:
    exec(answer, namespace)
except:
    # The answer code could not be run without raising exceptions,
    # i.e., the code is poorly written.
    pass  # Code you want to run when the answer is wrong.
else:
    # The code hasn't raised exceptions; time to test the variables.
    x, y, z = namespace['x'], namespace['y'], namespace['z']
    if (x == 2) and (y == 6) and (z == x * y):
        pass  # Code you want to run when the answer is right.
    else:
        pass  # Code you want to run when the answer is wrong.
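Putting the pieces together, a minimal end-to-end sketch (the file name answers.csv and the expected values are hypothetical; adapt the checks to your assignment):
import csv

def check_answer(code):
    namespace = {}
    try:
        exec(code, namespace)  # fresh namespace per student, so runs stay isolated
    except Exception:
        return False  # the code raised, treat as a wrong answer
    return (namespace.get("x") == 2 and
            namespace.get("y") == 6 and
            namespace.get("z") == 12)

with open("answers.csv", "rt") as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the column titles
    for user, answer in reader:
        print(user, "passed" if check_answer(answer) else "failed")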
I sense that this is not the best way to do this, but it is certainly an attempt.
I hope this helped.
EDIT: Removed some bad code and added part of Tadhg McDonald-Jensen's comment.

Python scripting in ABAQUS

I have a python script to create ABAQUS model and run a job.
I want to create a loop over a variable index = 1:1:4, create four different models, and run the four jobs, one for each model.
A model is named 'Model-1' for instance in the following line:
##-----------------------------------------------------------------------
mdb.models['Model-1'].ConstrainedSketch(name='__profile__', sheetSize=sqrlen)
##-----------------------------------------------------------------------
In creating a loop, I create a string as follows:
##-----------------------------------------------------------------------
index='1'
modelname='\''+'Model' + index+ '\''
# Square Part is created
mdb.models[modelname].ConstrainedSketch(name='__profile__', sheetSize=sqrlen)
##-------------------------------------------------------------------------
When I run the script in ABAQUS, it gives an error about 'Model1', as follows:
##-------------------------------------------------------------------------
File "d:/abaqus_working_directory/scripting_example/simulation/scripting_loop.py", line 22, in <module>
mdb.models[modelname].ConstrainedSketch(name='__profile__', sheetSize=sqrlen) #### sqrlen
KeyError: 'Model1'
Exit from main file [Kernel]: d:/abaqus_working_directory/scripting_example/simulation/scripting_loop.py
##-------------------------------------------------------------------------
I want to use the string modelname (with value 'Model-1') instead of writing 'Model-1' in the python script; that is, call mdb.models[modelname].ConstrainedSketch(name=...) in place of mdb.models['Model-1'].ConstrainedSketch(name=...).
Any help is deeply appreciated.
Sincerely,
Me.
You are mixing two different names, Model-1 and Model1. In your loop creation, include the - in modelname. You can do something like this:
##-----------------------------------------------------------------------
index='1'
modelname='\''+'Model-' + index+ '\''
# Square Part is created
mdb.models[modelname].ConstrainedSketch(name='__profile__', sheetSize=sqrlen)
##-------------------------------------------------------------------------
Also, you should use
modelname='Model-' + index
since that will give you a string without the extra quotes.
Better still, don't work with the string names at all. Early in the script, define:
model=mdb.models['Model-1']
then do for example:
model.ConstrainedSketch(...)
If you are working with multiple models, similarly create a list of model objects.
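For example, a minimal sketch (untested; it assumes the standard Abaqus scripting API and that sqrlen is already defined):
models = []
for index in range(1, 5):  # index = 1:1:4
    modelname = 'Model-' + str(index)
    if modelname not in mdb.models.keys():
        mdb.Model(name=modelname)  # create the model if it does not exist yet
    model = mdb.models[modelname]
    model.ConstrainedSketch(name='__profile__', sheetSize=sqrlen)
    models.append(model)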

Multiprocessing to speed up for loop

Just trying to learn, and I'm wondering if multiprocessing would speed up this for loop. I'm trying to compare alexa_white_list (1,000,000 lines) and dnsMISP (can get up to 160,000 lines). The code checks each line in dnsMISP and looks for it in alexa_white_list; if it doesn't see it, it adds it to a blacklist. Without the mp_handler function the code works fine, but it takes around 40-45 minutes. For brevity, I've omitted all the other imports and the function that pulls down and unzips the alexa white list.
The code below gives me the following error:
File "./vetdns.py", line 128, in mp_handler
p.map(dns_check,dnsMISP,alexa_white_list)
NameError: global name 'dnsMISP' is not defined
from multiprocessing import Pool

def dns_check():
    awl = []
    blacklist = []
    ctr = 0
    dnsMISP = open(INPUT_FILE, "r")
    dns_misp_lines = dnsMISP.readlines()
    dnsMISP.close()
    alexa_white_list = open(outname, 'r')
    alexa_white_list_lines = alexa_white_list.readlines()
    alexa_white_list.close()
    print "converting awl to proper format"
    for line in alexa_white_list_lines:
        awl.append(".".join(line.split(".")[-2:]).strip())
    print "done"
    for host in dns_misp_lines:
        host = host.strip()
        host = ".".join(host.split(".")[-2:])
        if not host in awl:
            blacklist.append(host)
    file_out = open(FULL_FILENAME, "w")
    file_out.write("\n".join(blacklist))
    file_out.close()

def mp_handler():
    p = Pool(2)
    p.map(dns_check, dnsMISP, alexa_white_list)

if __name__ == '__main__':
    mp_handler()
If I label it as global etc. I still get the error. I'd appreciate any suggestions!
There's no need for multiprocessing here. In fact this code can be greatly simplified:
def get_host_from_line(line):
    return line.strip().split(".", 1)[-1]

def dns_check():
    with open('alexa.txt') as alexa:
        awl = {get_host_from_line(line) for line in alexa}
    blacklist = []
    with open(INPUT_FILE, "r") as dns_misp_lines:
        for line in dns_misp_lines:
            host = get_host_from_line(line)
            if host not in awl:
                blacklist.append(host)
    with open(FULL_FILENAME, "w") as file_out:
        file_out.write("\n".join(blacklist))
Using a set comprehension to create your Alexa collection has the advantage of O(1) lookup time. Sets are similar to dictionaries; they are pretty much dictionaries that only have keys with no values. There is some additional overhead in memory, and the initial creation time will likely be slower since the values you put into a set need to be hashed and hash collisions dealt with, but the increase in performance you gain from the faster lookups should make up for it.
You can also clean up your line parsing. split() takes an additional parameter that limits the number of times the input is split. I'm assuming your lines look something like http://www.something.com and you want something.com (if this isn't the case, let me know).
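For instance, a quick illustration of the maxsplit argument (the sample hostname is made up):
>>> "www.something.com".split(".", 1)
['www', 'something.com']
>>> "www.something.com".split(".", 1)[-1]
'something.com'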
It's important to remember that the in operator isn't magic. When you use it to check membership (is an element in the list) what it's essentially doing under the hood is this:
for element in list:
    if element == input:
        return True
return False
So every time in your code you did if element in list your program had to iterate across each element until it either found what you were looking for or got to the end. This was probably the biggest bottleneck of your code.
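If you want to see the difference yourself, here is a small self-contained timing sketch (illustrative only; the exact numbers will vary by machine):
import timeit

data_list = [str(i) for i in range(1000000)]
data_set = set(data_list)

# Worst case for the list: the element we look for is at the very end.
print(timeit.timeit(lambda: "999999" in data_list, number=10))  # linear scan
print(timeit.timeit(lambda: "999999" in data_set, number=10))   # hash lookup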
You tried to read a variable named dnsMISP to pass as an argument to Pool.map. It doesn't exist in local or global scope (where do you think it's coming from?), so you got a NameError. This has nothing to do with multiprocessing; you could just type a line with nothing but:
dnsMISP
and have the same error.

ArcPy Traceback (most recent call last): error

I used ModelBuilder to convert feature classes within a geodatabase into shapefiles in a preexisting folder. It ran successfully. However, when I exported the model to a Python script and ran it in Python, I get an error saying:
Traceback (most recent call last):
File "C:\Users\Mark.Novicio\Desktop\New folder\FSA_Counties_delivered_by_GISO\Updated_Iterators.py", line 13, in
arcpy.ImportToolbox("Model Functions")
The python script is attached in the image.
ArcPy code exported from ModelBuilder often needs a lot of tweaking, although it can be a moderately useful starting point.
IterateFeatureClasses_mb is the exported Python code of a ModelBuilder-only tool; as the documentation says, "This tool is intended for use in ModelBuilder and not in Python scripting."
Since you want to use Python instead, you need to use a normal iterator (generally, a for loop running through a list of feature classes). You can automatically build the list with arcpy.ListFeatureClasses, and then just loop:
import arcpy

# set the workspace (Test_gdb is the path to your geodatabase)
arcpy.env.workspace = Test_gdb

# get a list of feature classes in arcpy.env.workspace
listFC = arcpy.ListFeatureClasses()

# iterate
for fc in listFC:
    pass  # code to do to fc goes here
If you're only planning to use that list of feature classes once, call ListFeatureClasses in the for loop:
for fc in arcpy.ListFeatureClasses():
In either case, you'll need to look at FeatureClassToFeatureClass for outputting a shapefile once you get your loop working :)
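For example, a minimal sketch (untested; the geodatabase and output-folder paths are placeholders):
import arcpy

arcpy.env.workspace = r"C:\data\Test.gdb"  # placeholder geodatabase
out_folder = r"C:\data\shapefiles"         # preexisting output folder

for fc in arcpy.ListFeatureClasses():
    # writes <fc>.shp into out_folder, keeping the original name
    arcpy.FeatureClassToFeatureClass_conversion(fc, out_folder, fc)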

python netcdf: making a copy of all variables and attributes but one

I need to process a single variable in a netcdf file that actually contains many attributes and variable.
I think it is not possible to update a netcdf file (see question How to delete a variable in a Scientific.IO.NetCDF.NetCDFFile?)
My approach is the following:

1. get the variable to process from the original file
2. process the variable
3. copy all data from the original netcdf BUT the processed variable to the final file
4. copy the processed variable to the final file
My problem is to code step 3. I started with the following:
def processing(infile, variable, outfile):
    fileH = NetCDFFile(infile, mode="r")
    data = fileH.variables[variable][:]
    # do processing on data...
    # and now save the result
    outfile = NetCDFFile(outfile, mode='w')
    # build a list of variables without the processed variable
    listOfVariables = list(itertools.ifilter(lambda x: x != variable, fileH.variables.keys()))
    for ivar in listOfVariables:
        # here I need to write each variable and each attribute
How can I copy all the data and attributes in a handful of lines of code without having to rebuild the whole data structure?
Here's what I just used and it worked: @arne's answer, updated for Python 3 and also to include copying variable attributes:
import netCDF4 as nc

toexclude = ['ExcludeVar1', 'ExcludeVar2']

with nc.Dataset("in.nc") as src, nc.Dataset("out.nc", "w") as dst:
    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)
    # copy dimensions
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst[name][:] = src[name][:]
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
If you just want to copy the file picking out variables, nccopy is a great tool, as submitted by @rewfuss.
Here's a Pythonic (and more flexible) solution with python-netcdf4. This allows you to open it for processing and other calculations before writing to file.
import netCDF4

with netCDF4.Dataset(file1) as src, netCDF4.Dataset(file2) as dst:
    for name, dimension in src.dimensions.iteritems():
        dst.createDimension(name, len(dimension) if not dimension.isunlimited() else None)
    for name, variable in src.variables.iteritems():
        # take out the variable you don't want
        if name == 'some_variable':
            continue
        x = dst.createVariable(name, variable.datatype, variable.dimensions)
        dst.variables[name][:] = src.variables[name][:]  # index by name, not by the Variable object
This does not take variable attributes such as fill_values into account; you can add that easily by following the documentation. Do be careful: writes to netCDF4 files created this way cannot be undone. The moment you modify the variable, it is written to file at the end of the with statement, or when you call .close() on the Dataset. Of course, if you wish to process the variables before writing them, you have to be careful about which dimensions to create. In a new file, never write to variables without creating them, and never create variables without having defined dimensions, as noted in python-netcdf4's documentation.
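To make the required order concrete, here is a minimal sketch (hypothetical file and variable names):
import netCDF4

with netCDF4.Dataset("new.nc", "w") as dst:
    dst.createDimension("time", None)                # define the dimension first
    t = dst.createVariable("time", "f8", ("time",))  # then create the variable
    t[:] = [0.0, 1.0, 2.0]                           # only then write data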
This answer builds on the one from Xavier Ho (https://stackoverflow.com/a/32002401/7666), but with the fixes I needed to complete it:
import netCDF4 as nc
import numpy as np

toexclude = ["TO_REMOVE"]

with nc.Dataset("orig.nc") as src, nc.Dataset("filtered.nc", "w") as dst:
    # copy attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions
    for name, dimension in src.dimensions.iteritems():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.iteritems():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]
The nccopy utility in C netCDF versions 4.3.0 and later includes an option to list which variables are to be copied (along with their attributes). Unfortunately, it doesn't include an option for which variables to exclude, which is what you need.
However, if the list of (comma-delimited) variables to be included doesn't cause the nccopy command-line to exceed system limits, this would work. There are two variants for this option:
nccopy -v var1,var2,...,varn input.nc output.nc
nccopy -V var1,var2,...,varn input.nc output.nc
The first (-v) includes all the variable definitions, but only data for the named variables.
The second (-V) doesn't include definitions or data for unnamed variables.
I know this is an old question, but as an alternative, you can use the library netcdf and shutil:
import shutil
from netcdf import netcdf as nc

def processing(infile, variable, outfile):
    shutil.copyfile(infile, outfile)
    with nc.loader(infile) as in_root, nc.loader(outfile) as out_root:
        data = nc.getvar(in_root, variable)
        # do your processing with data and save the result as in-memory "values"...
        values = data[:] * 3
        new_var = nc.getvar(out_root, variable, source=data)
        new_var[:] = values
All of the recipes so far (except for the one from @rewfuss, which works fine but is not exactly Pythonic) produce a plain NetCDF3 file, which can be a killer for highly compressed NetCDF4 datasets. Here is an attempt to cope with the issue:
import netCDF4

infname = "Inf.nc"
outfname = "outf.nc"
skiplist = "var1 var2".split()

with netCDF4.Dataset(infname) as src:
    with netCDF4.Dataset(outfname, "w", format=src.file_format) as dst:
        # copy global attributes all at once via dictionary
        dst.setncatts(src.__dict__)
        # copy dimensions
        for name, dimension in src.dimensions.items():
            dst.createDimension(
                name, (len(dimension) if not dimension.isunlimited() else None))
        # copy all file data except for the excluded
        for name, variable in src.variables.items():
            if name in skiplist:
                continue
            createattrs = variable.filters()
            if createattrs is None:
                createattrs = {}
            else:
                chunksizes = variable.chunking()
                print(createattrs)
                if chunksizes == "contiguous":
                    createattrs["contiguous"] = True
                else:
                    createattrs["chunksizes"] = chunksizes
            x = dst.createVariable(name, variable.datatype, variable.dimensions, **createattrs)
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
            dst[name][:] = src[name][:]
This seems to work fine and stores the variables the way they are in the original file, except that it does not copy some variable attributes that start with an underscore and are not known to the NetCDF library.
