Arcpy, select features based on part of a string - python

So for my example, I have a large shapefile of state parks where some of them are actual parks and others are just trails. However there is no column defining which are trails vs actual parks, and I would like to select those that are trails and remove them. I DO have a column for the name of each feature, that usually contains the word "trail" somewhere in the string. It's not always at the beginning or end however.
I'm only familiar with Python at a basic level and while I could go through manually selecting the ones I want, I was curious to see if it could be automated. I've been using arcpy.Select_analysis and tried using "LIKE" in my where_clause and have seen examples using slicing, but have not been able to get a working solution. I've also tried using the 'is in' function but I'm not sure I'm using it right with the where_clause. I might just not have a good enough grasp of the proper terms to use when asking and searching. Any help is appreciated. I've been using the Python Window in ArcMap 10.3.
Currently I'm at:
arcpy.Select_analysis ("stateparks", "notrails", ''trail' is in \"SITE_NAME\"')

Although using the Select tool is a good choice, the syntax for the SQL expression can be a challenge. Consider using an Update Cursor to tackle this problem.
import arcpy

stateparks = r"C:\path\to\your\shapefile.shp"
notrails = r"C:\path\to\your\shapefile_without_trails.shp"

# Make a copy of your shapefile so the original is left untouched
arcpy.CopyFeatures_management(stateparks, notrails)

# Check if "trail" exists in the string--delete row if so
with arcpy.da.UpdateCursor(notrails, "SITE_NAME") as cursor:
    for row in cursor:
        # row[0] is the "SITE_NAME" value of the current row; lower() makes
        # the test match "Trail", "TRAIL", "trails", and so on
        if "trail" in row[0].lower():
            cursor.deleteRow()  # Delete the row if condition is true
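If you would rather stay with the Select tool, the where_clause needs SQL's LIKE with % wildcards rather than Python's "in". Here is a minimal sketch, assuming a shapefile (double-quoted field names) and assuming UPPER() is available in your data source's SQL dialect. The helper names like_clause and copy_without_trails are made up for illustration.

```python
def like_clause(field, term, negate=False):
    """Build a where clause that matches `term` anywhere in `field`.

    Assumes shapefile-style SQL: field names in double quotes, % as the
    wildcard, and UPPER() to make the comparison case-insensitive.
    """
    escaped = term.upper().replace("'", "''")  # double any single quotes
    op = "NOT LIKE" if negate else "LIKE"
    return "UPPER(\"{0}\") {1} '%{2}%'".format(field, op, escaped)


def copy_without_trails(in_features, out_features):
    """Hypothetical wrapper: write every non-trail feature to out_features."""
    import arcpy  # imported here so like_clause can be tried without ArcGIS
    arcpy.Select_analysis(in_features, out_features,
                          like_clause("SITE_NAME", "trail", negate=True))


where = like_clause("SITE_NAME", "trail", negate=True)
print(where)
```

In the Python Window you would then call copy_without_trails("stateparks", "notrails"). Unlike the cursor approach, this writes a new feature class and never deletes rows.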

Related

Using "contains (text)" to find parent and following sibling in selenium with Python?

So I'm trying to build a tool to transfer tickets that I sell. A sale comes into my POS, I do an API call for the section, row, and seat numbers ordered (as well as other information obviously). Using the section, row, and seat number, I want to plug those values into a contains (text) statement to in order to find and select the right tickets on the host site.
Here is a sample of how the tickets are laid out:
And here is a screenshot (sorry if this is inconvenient) of the DOM related to one of the rows above:
Given this, how should I structure my contains(text) statement so that it is able to find and select the correct seats? I am very new/inexperienced with automation. I messed around with it a few months ago with some success and have managed to get a tool that gets me right up to selecting the seats but the "div" path confuses me when it comes to searching for text that is tied to other text.
I tried the following structure:
for i in range(int(lowseat), int(highseat)):
    web.find_element_by_xpath('//*[contains (text(), "'+section+'")]/following-sibling::[contains text(), "'+row+'")]/following-sibling::[contains text(), "'+str(i)+'")]').click()
to no avail. Can someone help me explain how to structure these statements correctly so that it searches for section, row, and seat number correctly?
Thanks!
Also, if needed, here is a screenshot with more context of the button (in cases its needed). Button is highlighted in sky blue:
You can't use text() for that because the text is spread across nested elements. You probably want to map all these into dicts and select with filter.
Update
Here's an idea for a lazy way to do this (untested):
button = driver.execute_script('''
    return [...document.querySelectorAll('button')].find(b => {
        return b.innerText.match(/Section 107\b.*Row P.*Seat 10\b/)
    })
''')
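The "map into dicts and filter" idea can also be sketched in plain Python. This assumes each button's visible text contains something like "Section 107 ... Row P ... Seat 10" (the shape used in the regex above), which may not match the real page. The names to_seat and matching_indexes are made up, and the seat range is treated as inclusive.

```python
import re

# Assumed text shape; re.S lets ".*?" run across line breaks in the text.
SEAT_RE = re.compile(r"Section\s+(\w+).*?Row\s+(\w+).*?Seat\s+(\d+)", re.S)


def to_seat(text):
    """Turn one button's text into a dict, or None if it is not a seat."""
    match = SEAT_RE.search(text)
    if match is None:
        return None
    return {"section": match.group(1), "row": match.group(2),
            "seat": int(match.group(3))}


def matching_indexes(texts, section, row, lowseat, highseat):
    """Return the positions of the texts that fall inside the wanted seats."""
    wanted = range(int(lowseat), int(highseat) + 1)  # inclusive upper bound
    hits = []
    for index, text in enumerate(texts):
        seat = to_seat(text)
        if (seat and seat["section"] == section and seat["row"] == row
                and seat["seat"] in wanted):
            hits.append(index)
    return hits


# Stand-in for [b.text for b in the buttons found on the page]
texts = ["Section 107\nRow P\nSeat 9", "Section 107\nRow P\nSeat 10",
         "Section 108\nRow P\nSeat 10", "Transfer"]
hits = matching_indexes(texts, "107", "P", 10, 11)
print(hits)
```

With Selenium you would collect the button elements once, build texts from their .text values, and click the elements at the returned positions.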

How can I sort posts alphabetically in flask?

I have been following the tutorial provided by Flask. I'm trying to change things around a bit and make it fit the criterion for a glossary.
I suspect that my issue lies in this line of code in my flaskr.py file:
cur = db.execute('select title, text from entries order by id desc')
The reason why I suspect this is because when I mess with it it breaks everything. As well, when I tried to "sort" everything it did nothing, oh and it says to order by id descending... that's mainly why.
What I tried was:
@app.route('/order', methods=['POST'])
def order_entry():
    entries.sort()
    return entries
Which is probably crude and sort of silly, but I'm particularly new to programming. I can't find any other places in my code where entries are being ordered.
I have looked for different ways to organize a dictionary alphabetically but haven't had too much luck making it work. As you can tell.
Assuming this is the Flask tutorial you're following, I think your function is missing some things. Is entries some sort of global variable, or did you just remove the part where it was created? I've tried to combine your code with one of the examples from the tutorial, and added some comments.
@app.route('/order', methods=['POST'])
def order_entry():
    # the following line creates a 'cursor' which you need to retrieve data
    # from the database
    cur = g.db.execute('select title, text from entries order by id desc')
    # the following line uses that cursor ("cur"), fetches the data,
    # turns it into a (unsorted) list of dictionaries
    entries = [dict(title=row[0], text=row[1]) for row in cur.fetchall()]
    # let's sort the list by the 'title' attribute now
    entries = sorted(entries, key=lambda d: d['title'])
    # or if you prefer, you could say: "entries.sort(key=lambda d: d['title'])"
    # return the template with the sorted entries in
    return render_template('show_entries.html', entries=entries)
Now, I don't know Flask at all, but I think this is the gist of what you want to do.
You may want to go through some Python tutorials (before tackling Flask), since there are a few basic concepts that, once you grasp, I think will make everything else much easier.
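Another option is to let the database do the sorting by changing the "order by" part of the query. Here is a small sketch using an in-memory SQLite database with a table shaped like the tutorial's entries table. The sample rows are invented, and "collate nocase" is added on the assumption that a glossary should ignore upper and lower case when sorting.

```python
import sqlite3

# Throwaway in-memory database standing in for the tutorial's flaskr.db
db = sqlite3.connect(":memory:")
db.execute("create table entries ("
           "id integer primary key autoincrement, "
           "title text not null, text text not null)")
db.executemany("insert into entries (title, text) values (?, ?)",
               [("zebra", "striped"), ("Apple", "fruit"), ("mango", "fruit")])

# Same query as in the question, but ordered by title instead of id
cur = db.execute("select title, text from entries "
                 "order by title collate nocase asc")
entries = [dict(title=row[0], text=row[1]) for row in cur.fetchall()]
titles = [entry["title"] for entry in entries]
print(titles)
```

In the tutorial code this would mean editing only the SQL string in the view that renders show_entries.html, with no extra route needed.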

querying all fields in neo4j index using python-embedded bindings

I'm trying to query a node index across all fields. This is what I thought would work:
idx = db.node.indexes.get('myindex')
idx.query('*:search_query')
But this returns no results. However, this works
idx = db.node.indexes.get('myindex')
idx.query('*:*')
And it returns all the nodes in the index as expected. Am I wrong in assuming that the first version should work at all?
I don't expect the first version to work, and am surprised the second does. Neo4j parses those queries using this Lucene syntax- I don't see anything about wildcard fields. Instead, remove the field to search against an implied "all fields".
Plug - for an easier way to build Lucene queries (compatible with Neo4j), check out lucene-querybuilder. It's used by neo4j-rest-client and neo4django.
EDIT:
I can't seem to find support for the "all fields" implicit search I thought existed- sorry! I guess you'll just have to manually include all fields in the query (eg, "name:falmarri OR userType:falmarri").
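Building that OR query by hand gets tedious with many fields, so here is a small sketch that assembles it. The field names are just the ones from the example above, and all_fields_query is a made-up helper, not part of the Neo4j bindings. The term is wrapped in double quotes so Lucene treats it as a phrase.

```python
def all_fields_query(fields, term):
    """Return a Lucene query string matching `term` in any of `fields`."""
    # Escape backslashes and double quotes so the phrase stays well formed
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return " OR ".join('{0}:"{1}"'.format(field, escaped) for field in fields)


query = all_fields_query(["name", "userType"], "falmarri")
print(query)
# The result would then be passed to the index, e.g. idx.query(query)
```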

Parsing SQL with Python

I want to create a SQL interface on top of a non-relational data store. The data store is non-relational, but it makes sense to access the data in a relational manner.
I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression. Then return data by evaluating/walking the tree.
I have never implemented a parser before, and I would therefore like some advice on how to best implement a SQL parser and evaluator.
Does the approach described above sound about right?
Are there other tools/libraries I should look into? Like PLY or Pyparsing.
Pointers to articles, books or source code that will help me are appreciated.
Update:
I implemented a simple SQL parser using pyparsing. Combined with Python code that implements the relational operations against my data store, this was fairly simple.
As I said in one of the comments, the point of the exercise was to make the data available to reporting engines. To do this, I probably will need to implement an ODBC driver. This is probably a lot of work.
I have looked into this issue quite extensively. Python-sqlparse is a non-validating parser, which is not really what you need. The examples in ANTLR need a lot of work to convert to a nice AST in Python. The SQL standard grammars are here, but it would be a full-time job to convert them yourself, and it is likely that you would only need a subset of them, i.e. no joins. You could try looking at gadfly (a Python SQL database) as well, but I avoided it as they used their own parsing tool.
For my case, I essentially only needed a where clause. I tried booleneo (a boolean expression parser) written with pyparsing but ended up using pyparsing from scratch. The first link in the reddit post of Mark Rushakoff gives a SQL example using it. Whoosh, a full-text search engine, also uses it, but I have not looked at the source to see how.
Pyparsing is very easy to use and you can very easily customize it to not be exactly the same as SQL (most of the syntax you will not need). I did not like ply as it uses some magic using naming conventions.
In short give pyparsing a try, it will most likely be powerful enough to do what you need and the simple integration with python (with easy callbacks and error handling) will make the experience pretty painless.
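To give a feel for the parse-then-evaluate shape of a where clause, here is a minimal hand-written sketch in plain Python. It is not pyparsing and not a real SQL grammar: it only handles "column op literal" comparisons joined by AND, OR and parentheses, and every name in it (tokenize, parse, evaluate, the sample rows) is invented for illustration.

```python
import operator
import re

TOKEN = re.compile(
    r"\s*(?:(\d+)|'([^']*)'|(<=|>=|!=|=|<|>)|([A-Za-z_]\w*)|([()]))")
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def tokenize(text):
    """Split the clause into (kind, value) pairs."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected input at position %d" % pos)
        num, string, op, word, paren = match.groups()
        if num is not None:
            tokens.append(("lit", int(num)))
        elif string is not None:
            tokens.append(("lit", string))
        elif op:
            tokens.append(("op", op))
        elif word:
            is_kw = word.upper() in ("AND", "OR")
            tokens.append(("kw", word.upper()) if is_kw else ("id", word))
        else:
            tokens.append(("paren", paren))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a nested-tuple AST; AND binds tighter than OR."""
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] == ("kw", "OR"):
            right, i = term(i + 1)
            node = ("or", node, right)
        return node, i

    def term(i):
        node, i = factor(i)
        while i < len(tokens) and tokens[i] == ("kw", "AND"):
            right, i = factor(i + 1)
            node = ("and", node, right)
        return node, i

    def factor(i):
        if i < len(tokens) and tokens[i] == ("paren", "("):
            node, i = expr(i + 1)
            if i >= len(tokens) or tokens[i] != ("paren", ")"):
                raise ValueError("missing closing parenthesis")
            return node, i + 1
        (k1, name), (k2, op), (k3, value) = tokens[i:i + 3]
        if (k1, k2, k3) != ("id", "op", "lit"):
            raise ValueError("expected: column operator literal")
        return ("cmp", op, name, value), i + 3

    node, i = expr(0)
    if i != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return node


def evaluate(node, row):
    """Walk the AST and test it against one row (a dict)."""
    if node[0] == "cmp":
        _, op, name, value = node
        return OPS[op](row[name], value)
    left, right = evaluate(node[1], row), evaluate(node[2], row)
    return (left and right) if node[0] == "and" else (left or right)


rows = [{"c": 10, "name": "a"}, {"c": 30, "name": "b"}, {"c": 50, "name": "a"}]
ast = parse(tokenize("c > 20 AND (name = 'a' OR c = 30)"))
selected = [r for r in rows if evaluate(ast, r)]
print([r["c"] for r in selected])
```

With pyparsing, the tokenizer and the three nested parsing functions would be replaced by grammar objects, while the evaluation step would stay much the same.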
This reddit post suggests python-sqlparse as an existing implementation, among a couple other links.
TwoLaid's Python SQL Parser works very well for my purposes. It's written in C and needs to be compiled. It is robust. It parses out individual elements of each clause.
https://github.com/TwoLaid/python-sqlparser
I'm using it to parse out queries column names to use in report headers. Here is an example.
import sqlparser

def get_query_columns(sql):
    '''Return a list of column headers from given sqls select clause'''
    columns = []
    parser = sqlparser.Parser()
    # Parser does not like new lines
    sql2 = sql.replace('\n', ' ')
    # Check for syntax errors
    if parser.check_syntax(sql2) != 0:
        raise Exception('get_query_columns: SQL invalid.')
    stmt = parser.get_statement(0)
    root = stmt.get_root()
    qcolumns = root.__dict__['resultColumnList']
    for qcolumn in qcolumns.list:
        if qcolumn.aliasClause:
            alias = qcolumn.aliasClause.get_text()
            columns.append(alias)
        else:
            name = qcolumn.get_text()
            name = name.split('.')[-1]  # remove table alias
            columns.append(name)
    return columns

sql = '''
SELECT
    a.a,
    replace(coalesce(a.b, 'x'), 'x', 'y') as jim,
    a.bla as sally -- some comment
FROM
    table_a as a
WHERE
    c > 20
'''
print(get_query_columns(sql))
# output: ['a', 'jim', 'sally']
Of course, it may be best to leverage python-sqlparse on Google Code
UPDATE: Now I see that this has been suggested - I concur that this is worthwhile:
I am using python-sqlparse with great success.
In my case I am working with queries that are already validated, my AST-walking code can make some sane assumptions about the structure.
https://pypi.org/project/sqlparse/
https://sqlparse.readthedocs.io/en/latest/

Extracting data from MS Word

I am looking for a way to extract / scrape data from Word files into a database. Our corporate procedures have Minutes of Meetings with clients documented in MS Word files, mostly due to history and inertia.
I want to be able to pull the action items from these meeting minutes into a database so that we can access them from a web-interface, turn them into tasks and update them as they are completed.
Which is the best way to do this:
VBA macro from inside Word to create CSV and then upload to the DB?
VBA macro in Word with connection to DB (how does one connect to MySQL from VBA?)
Python script via win32com then upload to DB?
The last one is attractive to me as the web-interface is being built with Django, but I've never used win32com or tried scripting Word from python.
EDIT: I've started extracting the text with VBA because it makes it a little easier to deal with the Word Object Model. I am having a problem though - all the text is in Tables, and when I pull the strings out of the CELLS I want, I get a strange little box character at the end of each string. My code looks like:
sFile = "D:\temp\output.txt"
fnum = FreeFile
Open sFile For Output As #fnum
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
    Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
    Assign = Application.ActiveDocument.Tables(2).Cell(n, 3).Range.Text
    Target = Application.ActiveDocument.Tables(2).Cell(n, 4).Range.Text
    If Target = "" Then
        ExportText = ""
    Else
        ExportText = Descr & Chr(44) & Assign & Chr(44) & _
            Target & Chr(13) & Chr(10)
        Print #fnum, ExportText
    End If
Next n
Close #fnum
What's up with the little control character box? Is some kind of character code coming across from Word?
Word has a little marker thingy that it puts at the end of every cell of text in a table.
It is used just like an end-of-paragraph marker in paragraphs: to store the formatting for the entire paragraph.
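Just use the Left() function to strip it out, i.e.
Left(Target, Len(Target)-1)
If a stray character is still left over, strip two instead (Len(Target)-2); as far as I know the marker is Chr(13) followed by Chr(7).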
By the way, instead of
num_rows = Application.ActiveDocument.Tables(2).Rows.Count
For n = 1 To num_rows
Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
Try this:
For Each row in Application.ActiveDocument.Tables(2).Rows
    Descr = row.Cells(2).Range.Text
Well, I've never scripted Word, but it's pretty easy to do simple stuff with win32com. Something like:
from win32com.client import Dispatch
word = Dispatch('Word.Application')
doc = word.Documents.Open('d:\\stuff\\myfile.doc')  # Open lives on the Documents collection
doc.SaveAs(FileName='d:\\stuff\\text\\myfile.txt', FileFormat=2)  # 2 should be wdFormatText (plain text)
doc.Close()
word.Quit()
This is untested, but I think something like that will just open the file and save it as plain text (provided you can find the right fileformat) – you could then read the text into python and manipulate it from there. There is probably a way to grab the contents of the file directly, too, but I don't know it off hand; documentation can be hard to find, but if you've got VBA docs or experience, you should be able to carry them across.
Have a look at this post from a while ago: http://mail.python.org/pipermail/python-list/2002-October/168785.html Scroll down to COMTools.py; there's some good examples there.
You can also run makepy.py (part of the pythonwin distribution) to generate python "signatures" for the COM functions available, and then look through it as a kind of documentation.
You could use OpenOffice. It can open word files, and also can run python macros.
I'd say look at the related questions on the right -->
The top one seems to have some good ideas for going the python route.
How about saving the file as XML, then using Python or something else to pull the data out of Word and into the database?
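A rough sketch of the XML route, assuming the minutes are saved as .docx (Word 2007 or later) rather than the legacy .doc or the older Word 2003 XML format. A .docx file is a zip archive whose main part is word/document.xml, so the standard library is enough to read table cells. The function names and the sample XML are invented for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_document_xml(path):
    """Return the raw XML of the main document part of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml")


def table_rows(document_xml, table_index=0):
    """Return one table as a list of rows, each a list of cell strings.

    Note: iter() also descends into nested tables, which this sketch
    does not try to separate out.
    """
    root = ET.fromstring(document_xml)
    tables = list(root.iter(W + "tbl"))
    rows = []
    for tr in tables[table_index].iter(W + "tr"):
        cells = []
        for tc in tr.iter(W + "tc"):
            # A cell's text is the concatenation of all its w:t runs
            cells.append("".join(t.text or "" for t in tc.iter(W + "t")))
        rows.append(cells)
    return rows


# Tiny hand-written stand-in for a real word/document.xml
sample = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body><w:tbl>'
    '<w:tr><w:tc><w:p><w:r><w:t>Send report</w:t></w:r></w:p></w:tc>'
    '<w:tc><w:p><w:r><w:t>Alice</w:t></w:r></w:p></w:tc></w:tr>'
    '</w:tbl></w:body></w:document>'
)
rows = table_rows(sample)
print(rows)
```

For a real file you would call table_rows(read_document_xml(path), table_index=1) to get the second table, matching Tables(2) in the VBA above. Text read this way does not carry the end-of-cell marker that Range.Text returns.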
It is possible to programmatically save a Word document as HTML and to import the table(s) contained into Access. This requires very little effort.
