I am trying to write a Python command that includes a logical query. I have a series of tables and layers and would like to run a command on them when two shared variables have the same values. I created a list of those tables and layers:
pattern = '*Assess*.dbf'
files = []
tables = []
names = []
layernames = []
## create the loop element
for dirpath, dirnames, filenames in arcpy.da.Walk(inputFolder):
    # append shapefile path to list
    for filename in fnmatch.filter(filenames, pattern):
        files.append(os.path.join(dirpath, filename))
tables = arcpy.ListTables()
for file in files:
    # strip the extension, then take characters 1:4 as the town number
    names.append("\"TOWN_ID\" = " + str(int(os.path.splitext(os.path.basename(file))[0][1:4])))
    layernames.append(str(int(os.path.splitext(os.path.basename(file))[0][1:4])))
The code is as follows:
for layername, table in zip(layernames, tables):
    #print layername
    #print table
    #CF_QueryTool = "C:\\ArcGIS_Temp\\parcel_mosaic\\land_parcels_mosaic.gdb\\cf_QueryTool"
    tablelist = [table, "TaxParcels_Layer.{}".format(layername)]
    whereclause = "TaxParcels_Layer.{}.LOC_ID = table.LOC_ID".format(layername)
    print tablelist
    print whereclause
    arcpy.MakeQueryTable_management(tablelist, "QueryTable.{}".format(layername), "ADD_VIRTUAL_KEY_FIELD", "", "", whereclause)
    print "QueryTable.{} created".format(layername)
The problem is that apparently something is wrong with my whereclause, which made me wonder whether I can actually use something like table.LOC_ID with a list of tables. Let me know if I'm making a mistake here. Thanks a lot!
Since "table" is both a SQL reserved keyword and invalid name within a file geodatabase (*.gdb), I assume you want the value of the variable table in your SQL WHERE clause and not the word "table." You need to add another argument specifier to your whereclause formatting:
>>> table = "myTable"
>>> layername = "myLayer"
>>>
>>> # original code
>>> whereclause = "TaxParcels_Layer.{}.LOC_ID = table.LOC_ID".format(layername)
>>> print(whereclause)
TaxParcels_Layer.myLayer.LOC_ID = table.LOC_ID
>>>
>>> # modified code
>>> whereclause = "TaxParcels_Layer.{}.LOC_ID = {}.LOC_ID".format(layername, table)
>>> print(whereclause)
TaxParcels_Layer.myLayer.LOC_ID = myTable.LOC_ID
>>>
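Applied back to the loop from the question, the fix would look something like this (a sketch; it assumes each table returned by arcpy.ListTables() really does pair up with the layername derived from the matching file):
for layername, table in zip(layernames, tables):
    tablelist = [table, "TaxParcels_Layer.{}".format(layername)]
    # interpolate the table name itself rather than the literal word "table"
    whereclause = "TaxParcels_Layer.{}.LOC_ID = {}.LOC_ID".format(layername, table)
    arcpy.MakeQueryTable_management(tablelist, "QueryTable.{}".format(layername), "ADD_VIRTUAL_KEY_FIELD", "", "", whereclause)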
So I decided to start using List Comprehensions and after a little bit of googling I managed to do this:
# old way
loaded_sounds = []
for i in assetList_text:
    if i.startswith("loaded_sound"):
        loaded_sounds.append(i.split(',')[1])

# new way
loaded_sounds = [i.split(',')[1] for i in assetList_text if i.startswith("loaded_sound")]
Which works perfectly.
So I thought I'd continue on to the harder for loops, and this is where the list comprehension result stops matching the for loop result.
This conversion is a little harder, as it not only has two if statements, it also appends a transformed value rather than the loop variable itself:
gsc_files = []
for i in assetList_text:
    if ".gsc" in i:
        d = i.split(',')[-1].replace("\n", "")
        if d not in gsc_files:
            gsc_files.append(d)
So this prints out: 6
But with this:
gsc_files = [i.split(',')[-1].replace("\n", "") for i in assetList_text if ".gsc" in i if i.split(',')[-1].replace("\n", "") not in gsc_files]
It prints out: 0
So I don't know where it's going wrong.
Also, whilst on the topic of list comprehensions, I'd like to know how far they can be pushed.
Could the following 2 for loops be converted to list comprehensions?
[1]
weapon_files = []
x = join(f"{WAW_ROOT_DIR}/raw/weapons/sp")
for path, subdirs, files in walk(x):
    for fileName in files:
        content = join(x, fileName)
        if content not in weapon_files:
            weapon_files.append(f"{WAW_ROOT_DIR}/raw/weapons/sp/{fileName}")
[2]
gsc_files_dir = []
for path in gsc_files:
    if f"{CURRENT_SELECTED_MOD.lower()}" in path:
        dir = join(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}")
        gsc_files_dir.append(dir)
    elif os.path.exists(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}"):
        dir = join(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}")
        gsc_files_dir.append(dir)
    else:
        dir = join(f"{WAW_ROOT_DIR}/raw/{path}")
        gsc_files_dir.append(dir)
Regards,
Phil
EDIT: in response to DialFrost's question:
f1 = join(f"{WAW_ROOT_DIR}/zone_source/english/assetlist/{CURRENT_SELECTED_MOD}.csv")
f2 = join(f"{WAW_ROOT_DIR}/zone_source/english/assetlist/{CURRENT_SELECTED_MOD}_patch.csv")
f3 = join(f"{WAW_ROOT_DIR}/zone_source/english/assetinfo/{CURRENT_SELECTED_MOD}.csv")
f4 = join(f"{WAW_ROOT_DIR}/zone_source/english/assetinfo/{CURRENT_SELECTED_MOD}_patch.csv")
with open(f1, 'r') as assetList, open(f2, 'r') as assetListPatch, open(f3, 'r') as assetInfo, open(f4, 'r') as assetInfoPatch:
    assetList_text = assetList.readlines()
    assetListPatch_text = assetListPatch.readlines()
    assetInfo_text = assetInfo.readlines()
    assetInfoPatch_text = assetInfoPatch.readlines()
assetList_text is a large (3k+ lines) file.
So here's some info from assetList_text, including the ".gsc" lines:
fx,weapon/shellejects/fx_smk_weapon_shell_eject
fx,weapon/shellejects/fx_smk_weapon_shell_emit
fx,weapon/shellejects/shotgun
fx,weapon/shellejects/shotgun_resting
fx,weapon/shellejects/shotgun_view
fx,weapon/shellejects/shotgun_view_blurred01
mptype,nazi_zombie_heroes
character,char_zomb_player_0
character,char_zomb_player_1
character,char_zomb_player_2
character,char_zomb_player_3
rawfile,animtrees/zombie_factory.atr
rawfile,clientscripts/_zombie_mode.csc
rawfile,clientscripts/createfx/dlc3_fx.csc
rawfile,clientscripts/createfx/free_city_fx.csc
rawfile,clientscripts/dlc3_code.csc
rawfile,clientscripts/dlc3_teleporter.csc
rawfile,clientscripts/free_city.csc
rawfile,clientscripts/free_city_amb.csc
rawfile,maps/createart/free_city_art.gsc
rawfile,maps/createfx/dlc3_fx.gsc
rawfile,maps/createfx/free_city_fx.gsc
rawfile,maps/dlc3_code.gsc
rawfile,maps/dlc3_teleporter.gsc
rawfile,maps/free_city.gsc
rawfile,rumble/flamethrower
rawfile,rumble/flamethrower_h.rmb
rawfile,rumble/flamethrower_l.rmb
rawfile,vision/zombie_factory.vision
Try replacing the two if clauses with a single one, joining the conditions with an and operator, since I'm not sure that two ifs work the way you expect in a list comprehension.
So, change this:
gsc_files = [i.split(',')[-1].replace("\n", "") for i in assetList_text if ".gsc" in i if i.split(',')[-1].replace("\n", "") not in gsc_files]
To this:
gsc_files = [i.split(',')[-1].replace("\n", "") for i in assetList_text if (".gsc" in i) and (i.split(',')[-1].replace("\n", "") not in gsc_files)]
The and operator combines the two conditions correctly.
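For what it's worth, a quick sketch suggests the two spellings filter identically; chained ifs in a comprehension behave like and:
nums = [1, 2, 3, 4, 5, 6]
print([n for n in nums if n > 1 if n % 2 == 0])   # [2, 4, 6]
print([n for n in nums if n > 1 and n % 2 == 0])  # [2, 4, 6]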
Also, I'm not sure that converting all your for loops, or just large parts of the code, into list comprehensions is a good idea, because it can make your code hard to read.
[1] The first for loop you mentioned:
weapon_files = []
x = join(f"{WAW_ROOT_DIR}/raw/weapons/sp")
for path, subdirs, files in walk(x):
    for fileName in files:
        content = join(x, fileName)
        if content not in weapon_files:
            weapon_files.append(f"{WAW_ROOT_DIR}/raw/weapons/sp/{fileName}")
Answer: hope this helps!
The issue is that you are trying to reference a list from inside the comprehension that builds it; while the comprehension runs, weapon_files is still bound to the earlier empty list, so the membership test never sees the items being added. A workaround is to track what has already been seen in a separate set, updating it inline with the := assignment expression (Python 3.8+). Refer to https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions.
seen = set()
weapon_files = [f"{WAW_ROOT_DIR}/raw/weapons/sp/{fileName}"
                for path, subdirs, files in walk(x)
                for fileName in files
                if (content := join(x, fileName)) not in seen and not seen.add(content)]
[2] The second for loop you mentioned:
gsc_files_dir = []
for path in gsc_files:
    if f"{CURRENT_SELECTED_MOD.lower()}" in path:
        # condition 1, ref variable to append ==> var1
        dir = join(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}")
        gsc_files_dir.append(dir)
    elif os.path.exists(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}"):
        # condition 2, ref variable to append ==> var2
        dir = join(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}")
        gsc_files_dir.append(dir)
    else:
        # condition 3, ref variable to append ==> var3
        dir = join(f"{WAW_ROOT_DIR}/raw/{path}")
        gsc_files_dir.append(dir)
Answer: a list comprehension doesn't bring better readability for this loop, but we can still apply one. Here is a template answer using the ref variable names from the comments above (var1, var2, var3):
gsc_files_dir = [var1 if f"{CURRENT_SELECTED_MOD.lower()}" in path else var2 if os.path.exists(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}") else var3 for path in gsc_files]
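Filled in with the actual expressions from the loop (just a sketch; note that conditions 1 and 2 append the same path, so they can share one branch):
gsc_files_dir = [
    join(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}")
    if f"{CURRENT_SELECTED_MOD.lower()}" in path
    or os.path.exists(f"{WAW_ROOT_DIR}/mods/{CURRENT_SELECTED_MOD}/{path}")
    else join(f"{WAW_ROOT_DIR}/raw/{path}")
    for path in gsc_files
]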
I have a lot of .csv files and I'd like to parse the file names.
The file names are in this format:
name.surname.csv
How can I write a function that populates two variables with the components of the file name?
A = name
B = surname
Use str.split and unpack the result into A, B and another "anonymous" variable to store (and ignore) the extension.
filename = 'name.surname.csv'
A, B, _ = filename.split('.')
Try this; the name is split on '.' and stored in A and B:
a="name.surname.csv"
A,B,C=a.split('.')
Of course, this assumes that your file name is in the form first.second.csv.
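If the name part could itself contain extra dots, a slightly safer sketch is to strip the extension with os.path.splitext first and split the remainder just once:
import os

stem, ext = os.path.splitext("name.surname.csv")  # ('name.surname', '.csv')
A, B = stem.split('.', 1)                         # ('name', 'surname')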
If the file names always have the exact same form, with exactly two periods, then you can do:
>>> name, surname, ext = "john.doe.csv".split(".")
>>> name
'john'
>>> surname
'doe'
>>> ext
'csv'
>>>
Simply use the str.split() method inside a small helper function:
def split_names(filename: str):
    splitted = filename.split(".")
    return splitted[0], splitted[1]
A, B = split_names("name.surname.csv")
First find all the files in your directory with the extension '.csv', then split each name on '.':
import os

for file in os.listdir("/mydir"):
    if file.endswith(".csv"):
        # print the file name
        print(os.path.join("/mydir", file))
        # split the file name by '.'
        name, surname, ext = file.split(".")
        # print or append or whatever you will do with the result here
If the file is saved at a specific location in the system, then you first have to extract just the file name:
# if filename is name.surname.csv, discard the leading path first
filename = "C://CSVFolder//name.surname.csv"
absfilename = filename.split('//')[-1]
# by the concept of packing/unpacking
A, B, ext = absfilename.split('.')
Otherwise you can just write:
A, B, ext = "name.surname.csv".split('.')
print(A, B, ext)
Happy coding :)
So let's assume we have a simple query like this:
Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;
The result should look like this:
tb1 col1
tb1 col7
tb2 col2
tb2 col8
I've tried to solve this problem using some Python libraries:
1) Even extracting only the tables using sqlparse can be a huge problem. For example, this official book doesn't work properly at all.
2) Using regular expressions seems really hard to get right.
3) But then I found this, which might help. However, the problem is that I can't connect to any database and execute that query.
Any ideas?
sql-metadata is a Python library that uses a tokenized query returned by python-sqlparse and generates query metadata.
This metadata can return column and table names from your supplied SQL query. Here are a couple of examples from the sql-metadata GitHub readme:
>>> sql_metadata.get_query_columns("SELECT test, id FROM foo, bar")
[u'test', u'id']
>>> sql_metadata.get_query_tables("SELECT test, id FROM foo, bar")
[u'foo', u'bar']
>>> sql_metadata.get_query_limit_and_offset('SELECT foo_limit FROM bar_offset LIMIT 50 OFFSET 1000')
(50, 1000)
A hosted version of the library exists at sql-app.infocruncher.com to see if it works for you.
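Note: newer 2.x releases of sql-metadata (pip install sql-metadata) expose the same information through a Parser class; a minimal sketch, assuming the current API:
from sql_metadata import Parser

parser = Parser("SELECT test, id FROM foo, bar")
print(parser.columns)  # ['test', 'id']
print(parser.tables)   # ['foo', 'bar']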
Really, this is no easy task. You could use a lexer (ply in this example) and define several rules to get several tokens out of a string. The following code defines these rules for the different parts of your SQL string and puts them back together, as there could be aliases in the input string. As a result, you get a dictionary (result) with the different table names as keys.
import ply.lex as lex, re

tokens = (
    "TABLE",
    "JOIN",
    "COLUMN",
    "TRASH"
)

tables = {"tables": {}, "alias": {}}
columns = []

t_TRASH = r"Select|on|=|;|\s+|,|\t|\r"

def t_TABLE(t):
    r"from\s(\w+)\sas\s(\w+)"
    regex = re.compile(t_TABLE.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_JOIN(t):
    r"inner\s+join\s+(\w+)\s+as\s+(\w+)"
    regex = re.compile(t_JOIN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        tbl = m.group(1)
        alias = m.group(2)
        tables["tables"][tbl] = ""
        tables["alias"][alias] = tbl
    return t

def t_COLUMN(t):
    r"(\w+\.\w+)"
    regex = re.compile(t_COLUMN.__doc__)
    m = regex.search(t.value)
    if m is not None:
        t.value = m.group(1)
        columns.append(t.value)
    return t

def t_error(t):
    raise TypeError("Unknown text '%s'" % (t.value,))

# here is where the magic starts
def mylex(inp):
    lexer = lex.lex()
    lexer.input(inp)
    for token in lexer:
        pass
    result = {}
    for col in columns:
        tbl, c = col.split('.')
        if tbl in tables["alias"].keys():
            key = tables["alias"][tbl]
        else:
            key = tbl
        if key in result:
            result[key].append(c)
        else:
            result[key] = list()
            result[key].append(c)
    print(result)
    # {'tb1': ['col1', 'col7'], 'tb2': ['col2', 'col8']}

string = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
mylex(string)
moz-sql-parser is a Python library that converts some subset of SQL-92 queries to JSON-izable parse trees. Maybe it's what you want.
Here is an example.
>>> parse("SELECT id,name FROM dual WHERE id>3 and id<10 ORDER BY name")
{'select': [{'value': 'id'}, {'value': 'name'}], 'from': 'dual', 'where': {'and': [{'gt': ['id', 3]}, {'lt': ['id', 10]}]}, 'orderby': {'value': 'name'}}
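Since the result is a plain dict, pulling names back out is ordinary dictionary work; a small sketch against the example above (assumes pip install moz-sql-parser):
from moz_sql_parser import parse

tree = parse("SELECT id, name FROM dual WHERE id > 3 AND id < 10 ORDER BY name")
print(tree['from'])                          # 'dual'
print([c['value'] for c in tree['select']])  # ['id', 'name']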
I am tackling a similar problem and found a simpler solution that seems to work well.
import re

def tables_in_query(sql_str):
    # remove the /* */ comments
    q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
    # remove whole-line -- and # comments
    lines = [line for line in q.splitlines() if not re.match(r"^\s*(--|#)", line)]
    # remove trailing -- and # comments
    q = " ".join([re.split("--|#", line)[0] for line in lines])
    # split on blanks, parens and semicolons
    tokens = re.split(r"[\s)(;]+", q)
    # scan the tokens. if we see a FROM or JOIN, we set the get_next
    # flag, and grab the next one (unless it's SELECT).
    tables = set()
    get_next = False
    for tok in tokens:
        if get_next:
            if tok.lower() not in ["", "select"]:
                tables.add(tok)
            get_next = False
        get_next = tok.lower() in ["from", "join"]
    dictTables = dict()
    for table in tables:
        fields = []
        for token in tokens:
            if token.startswith(table):
                if token != table:
                    fields.append(token)
        if len(list(set(fields))) >= 1:
            dictTables[table] = list(set(fields))
    return dictTables
code adapted from https://grisha.org/blog/2016/11/14/table-names-from-sql/
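For example, run against the query from the question, it should return the qualified column tokens grouped by table (a quick sketch; the unqualified aliases a and b are not resolved):
sql = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
print(tables_in_query(sql))
# {'tb1': ['tb1.col7'], 'tb2': ['tb2.col8']}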
Create a list of all the tables that are present in the DB. You can then search for each table name in the query.
This obviously isn't foolproof, and the code will break if any column or alias name matches a table name.
But it can serve as a workaround.
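A minimal sketch of that workaround; db_tables is a hypothetical list you would fetch from your database's catalog (e.g. information_schema.tables):
import re

db_tables = ["tb1", "tb2", "tb3"]  # hypothetical; load from the DB catalog
query = "Select a.col1, b.col2 from tb1 as a inner join tb2 as b on tb1.col7 = tb2.col8;"
found = [t for t in db_tables if re.search(r"\b%s\b" % re.escape(t), query, re.IGNORECASE)]
print(found)  # ['tb1', 'tb2']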
import pandas as pd

#%config PPMagics.autolimit=0
#txt = """<your SQL text here>"""
txt_1 = txt
replace_list = ['\n', '(', ')', '*', '=', '-', ';', '/', '.']
count = 0
for i in replace_list:
    txt_1 = txt_1.replace(i, ' ')
txt_1 = txt_1.split()
res = []
for i in range(1, len(txt_1)):
    if txt_1[i-1].lower() in ['from', 'join', 'table'] and txt_1[i].lower() != 'select':
        count += 1
        str_count = str(count)
        res.append(txt_1[i] + "." + txt_1[i+1])
#df.head()
res_l = res
f_res_l = []
for i in range(0, len(res_l)):
    if len(res_l[i]) > 15:  # change it to 0 if you want all the caught strings
        f_res_l.append(res_l[i])
    else:
        pass
All_Table_List = f_res_l
print("All the unique tables from the SQL text, in the order of their appearance in the code : \n", 100*'*')
df = pd.DataFrame(All_Table_List, columns=['Tables_Names'])
df.reset_index(level=0, inplace=True)
list_ = list(df["Tables_Names"].unique())
df_1_Final = pd.DataFrame(list_, columns=['Tables_Names'])
df_1_Final.reset_index(level=0, inplace=True)
df_1_Final
Unfortunately, in order to do this successfully for "complex SQL" queries, you will more or less have to implement a complete parser for the particular database engine you are using.
As an example, consider this very basic complex query:
WITH a AS (
    SELECT col1 AS c FROM b
)
SELECT c FROM a
In this case, a is not a table but a common table expression (CTE) and should be excluded from your output. There's no simple way of using regexps to realize that b is a table access but a is not; your code will really have to understand the SQL at a deeper level.
Also consider
SELECT * FROM tbl
You'd have to know the column names actually present in a particular instance of a database (and accessible to a particular user, too) to answer that correctly.
If by "works with complex SQL" you mean that it must work with any valid SQL statement, you also need to specify for which SQL dialect - or implement dialect-specific solutions. A solution which works with any SQL handled by a database that does not implement CTE:s would not work in one that does.
I am sorry to say so, but I do not think you will find a complete solution which works for arbitrarily complex SQL queries. You'll have to settle for a solution which works with a subset of a particular SQL-dialect.
For my simple use case (one table in query, no joins), I used the following tweak
lst = "select * from table".split(" ")
lst = [item for item in lst if len(item)>0]
table_name = lst[lst.index("from")+1]
I am trying to read in multiple files to populate multiple tables in sqlite using python. I am trying to do something like this.
def connect_db():
    return sqlite3.connect(app.config['DATABASE'])  # Flask stuff

# List of tables and corresponding filenames
file_list = ['A', 'B', 'C', 'D']

for file in file_list:
    with closing(connect_db()) as db:
        cursor = db.execute('select * from ' + file)
        # Get column names for the current table
        names = tuple(map(lambda x: x[0], cursor.description))
        filename = file + '.txt'
        with open(filename, 'r') as f:
            data = [row.split('\t') for row in f.readlines()]
        cursor.executemany('INSERT INTO ' + file + ' ' + str(names) + ' VALUES ' + ???)
Now I realized that in the executemany() statement above, I need to supply the placeholder syntax (?,?,?), with one ? per column. I can generate a tuple of ?s, but then they will have quotes around them: ('?', '?', '?'). Is there a way around this? Thanks.
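For the record, one common trick is to build the placeholder string by joining bare "?" characters, so no quotes ever appear; a sketch with a hypothetical three-column table:
names = ('col_a', 'col_b', 'col_c')  # hypothetical column names
sql = "INSERT INTO my_table ({}) VALUES ({})".format(", ".join(names), ",".join("?" * len(names)))
print(sql)  # INSERT INTO my_table (col_a, col_b, col_c) VALUES (?,?,?)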
You can implement a query "maker" that loops over a dict or an object's properties and builds the VALUES string from them.
I did something like that on a project:
import copy

d = {
    "id": 0,
    "name": "Foo",
    "bar": "foobar"
}

sql = ""
for key, foo in d.items():
    # First, copy the current value
    bar = copy.copy(foo)
    # Test if it is a number
    try:
        # try to increase the value by 1
        bar += 1
    except TypeError:
        # if we can't, add double quotes
        foo = '"%s"' % foo
    # concat with the previous sequence
    sql = "%s%s=%s," % (sql, key, foo)

# remove the trailing comma ...
sql = sql[:-1]
So the value of "sql" at the end of the loop should be something like this
bar="foobar",id=0,name="Foo"
I have two lists:
link_ids = ['111','222','333']
filenames = ['111-foo.txt','111-bar.txt','222.txt']
I want to do two things. First, find the filenames that match the Link IDs. Second, create a list of the Link IDs that don't have matching files.
It's very simple, but it is doing my head in! This clearly doesn't do what it's supposed to, but it's the best I can come up with:
missing = []
for i in link_ids:
    for f in filenames:
        if i in f:
            print 'match found'
        else:
            missing.append(i)
Please help if you can!
namedtuple is a good fit for this problem.
It gives you named attributes without the additional overhead of a (non-optimised) class.
import collections, os
link_ids = ['111','222','333']
filenames = ['111-foo.txt','111-bar.txt','222.txt']
File = collections.namedtuple("File", "fname fext") # named-tuple set-up
files = {File(*os.path.splitext(f)) for f in filenames}
# -> set([File(fname='222', fext='.txt'),
# File(fname='111-bar', fext='.txt'),
# File(fname='111-foo', fext='.txt')])
"First, find the filenames that match the Link IDs.":
matched = [f for f in files if f.fname in link_ids]
# -> [File(fname='222', fext='.txt')]
"Second, create a list of the Link IDs that don't have matching files.":
unmatched = [l for l in link_ids if l not in {getattr(f,'fname') for f in files}]
# -> ['111', '333']
In a comment you mention wanting the full filename after matching.
For that you can do:
matched_filenames = [f.fname + f.fext for f in matched]
# -> ['222.txt']
I'm just starting to learn Python, but I'll give it a shot...
Maybe you could use the set facilities?
>>> file_set = {i[:-4] for i in filenames}
>>> matched_links = set(link_ids) & file_set
>>> unmatched_links = set(link_ids) - file_set
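For example, with plain <id>.txt names this gives (a sketch; i[:-4] just chops the four-character '.txt' suffix, and sorted() is used here only to make the set output deterministic):
>>> link_ids = ['111', '222', '333']
>>> filenames = ['111.txt', '222.txt']
>>> file_set = {i[:-4] for i in filenames}
>>> sorted(set(link_ids) & file_set)
['111', '222']
>>> sorted(set(link_ids) - file_set)
['333']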
First make a list of all the filename IDs but without the '.txt' extension:
>>> link_ids = ['111','222','333']
>>> filenames = ['111.txt','222.txt']
>>> filename_ids = [i[:-4] for i in filenames]
>>> filename_ids
['111', '222']
Then you can create two lists: IDs that match and IDs that don't match:
>>> match_ids = [i for i in link_ids if i in filename_ids]
>>> match_ids
['111', '222']
>>> not_match_ids = [i for i in link_ids if i not in filename_ids]
>>> not_match_ids
['333']
link_ids = ['111','222','333']
filenames = ['111-foo.txt','111-bar.txt','222.txt']
missing = []
found = []
for i in link_ids:
    for f in filenames:
        if i in f:
            print 'match found'
            found.append(i)

missing = list(set(link_ids) - set(found))
print 'Missing link ids: ', missing
Output:
match found
match found
match found
Missing link ids: ['333']