How to escape table names in SQLAlchemy - Python

I'm working on a SQLAlchemy dialect for Apache Drill and I've run into an issue that I can't quite seem to figure out.
The basic problem is that SQLAlchemy is generating a query like the one below:
SELECT `field1`, `field2`
FROM dfs.test.data.csv LIMIT 100
which fails because data.csv needs backticks around it as shown below:
SELECT `field1`, `field2`
FROM dfs.test.`data.csv` LIMIT 100
I've defined the various visit_*() functions in the dialect's compiler but these seem to have no effect.

This took some time to figure out, and I thought I'd post the result so that if anyone else runs into this issue, they'll have a point of reference as to how to solve it.
Here is the final working code:
https://github.com/JohnOmernik/sqlalchemy-drill/blob/master/sqlalchemy_drill/base.py
Here is what ultimately solved the issue:
def __init__(self, dialect):
    super(DrillIdentifierPreparer, self).__init__(dialect, initial_quote='`', final_quote='`')

def format_drill_table(self, schema, isFile=True):
    formatted_schema = ""
    num_dots = schema.count(".")
    schema = schema.replace('`', '')
    # For a file, the last section will be the file extension
    schema_parts = schema.split('.')
    if isFile and num_dots == 3:
        # Case for File + Workspace
        plugin = schema_parts[0]
        workspace = schema_parts[1]
        table = schema_parts[2] + "." + schema_parts[3]
        formatted_schema = plugin + ".`" + workspace + "`.`" + table + "`"
    elif isFile and num_dots == 2:
        # Case for file and no workspace
        plugin = schema_parts[0]
        formatted_schema = plugin + "." + schema_parts[1] + ".`" + schema_parts[2] + "`"
    else:
        # Case for non-file plugins or incomplete schema parts
        for part in schema_parts:
            quoted_part = "`" + part + "`"
            if len(formatted_schema) > 0:
                formatted_schema += "." + quoted_part
            else:
                formatted_schema = quoted_part
    return formatted_schema
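As a quick illustration (not part of the linked file; the preparer variable is just an assumed DrillIdentifierPreparer instance), running the schema from the question through format_drill_table yields a quoted table reference:

# Illustrative only: "preparer" is assumed to be a DrillIdentifierPreparer
# built for the Drill dialect.
print(preparer.format_drill_table("dfs.test.data.csv", isFile=True))
# dfs.`test`.`data.csv`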

Related

Skip existing folders and only bring in folders that don't exist

I have the code below, which downloads attachments into parent_directory using an API connection.
Problem: The code works, but it gets stuck when it hits folders that already exist.
Question: How can I make this code skip existing folders? If a folder exists, it should do nothing and just move on to the next iteration.
import pandas as pd
import os
import zipfile

parent_directory = "folderpath"
csv_file_dir = "myfilepath.csv"
user = "API_username"
key = "API_password"

os.chdir(parent_directory)
bdr_data = pd.read_csv(csv_file_dir)
api_first = "… " + user + ":" + key + "…"

for index, row in bdr_data.iterrows():
    #print(row['url_attachment'])
    name = row['Ref_Num']
    os.makedirs(parent_directory + name)
    os.chdir(parent_directory + name)
    url = api_first + row['url_attachment'] + " -o attachments.zip"
    os.system(url)
    os.chdir(parent_directory)
You can do it like this.
for index, row in bdr_data.iterrows():
    name = row['Ref_Num']
    child_dir = parent_directory + name
    if os.path.exists(child_dir):  # check if the folder already exists
        print(f'{child_dir} already exists')  # you may want to know what is skipped
        continue  # skip this iteration
    os.makedirs(child_dir)  # if the folder is not found, do what you need
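For completeness, here is a sketch of the question's download loop with that check folded in (same variable names as the question; purely illustrative):

# Sketch: the original loop with the existence check added, so existing
# folders are skipped and everything else behaves as before.
for index, row in bdr_data.iterrows():
    name = row['Ref_Num']
    child_dir = parent_directory + name
    if os.path.exists(child_dir):
        print(f'{child_dir} already exists')
        continue
    os.makedirs(child_dir)
    os.chdir(child_dir)
    url = api_first + row['url_attachment'] + " -o attachments.zip"
    os.system(url)
    os.chdir(parent_directory)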

Where can I temporarily store file byte data in a Python Flask server?

Right now I am using Fine Uploader as my front end and Flask as my back end. I need to send large files from the front end to my server. I can do this easily without concurrent chunking, but once I turn it on, the file gets corrupted. I suspect this is because, with concurrent chunking, part 7 can arrive before part 5; non-concurrent chunking is fine since it is sequential.
I need to know if there is some sort of temporary, global place where I can store the chunk parts. I tried Redis, but when I read the data back I get a decoding error, probably because Redis turns it into a string when I just need it as bytes.
If all else fails, I'll go to my last resort, which is to put the parts into their own little files and then open them later to combine them one by one.
Here is my code for your reference; it still has the Redis calls in it.
def upload_temp_file_part(redis, request):
    try:
        # Remember the paramName was set to 'file', we can use that here to grab it
        file = request.files['qqfile']
        uuid = request.form['qquuid']
        part = request.form['qqpartindex']
        offset = request.form['qqpartbyteoffset']

        key_content = 'file_content_' + uuid + part
        key_offset = 'file_offset_' + uuid + part
        value_content = file.stream.read()
        value_offset = offset

        logging.info("Setting part " + part + " of " + uuid)
        redis.set(key_content, value_content)
        redis.set(key_offset, value_offset)
    except Exception as e:
        logging.error(e)

def combine_temp_file_part(redis, request):
    try:
        uuid = request.form['qquuid']
        total_parts = request.form['qqtotalparts']
        save_path = os.path.join(os.getenv('UPLOAD_FOLDER_TEMP'), uuid)

        with open(save_path, 'ab') as f:
            for index in range(0, int(total_parts)):
                key_content = 'file_content_' + uuid + str(index)
                key_offset = 'file_offset_' + uuid + str(index)

                logging.info("Get part " + str(index) + " of " + uuid)
                value_content = redis.get(key_content)
                value_offset = redis.get(key_offset)
                if value_content is None or value_offset is None:
                    pass
                    # Throw Error

                logging.info("Placing part " + str(index) + " of " + uuid)
                f.seek(value_offset)
                f.write(value_content)

                redis.delete(value_offset)
                redis.delete(value_content)
    except Exception as e:
        logging.error(e)
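Since the question already names the fallback of writing each part to its own small file and combining them later, here is a hedged sketch of that approach. The helper names, directory layout, and the UPLOAD_FOLDER_TEMP default are assumptions for illustration, not part of the original code:

# Hedged sketch of the "own little files" fallback described above.
import os

TEMP_DIR = os.getenv('UPLOAD_FOLDER_TEMP', '/tmp/uploads')

def save_chunk(request):
    uuid = request.form['qquuid']
    part = int(request.form['qqpartindex'])
    chunk_dir = os.path.join(TEMP_DIR, uuid)
    if not os.path.isdir(chunk_dir):
        os.makedirs(chunk_dir)
    # One file per part, so concurrently uploaded chunks can never overwrite each other.
    with open(os.path.join(chunk_dir, '%06d.part' % part), 'wb') as f:
        f.write(request.files['qqfile'].stream.read())

def combine_chunks(uuid, total_parts, save_path):
    chunk_dir = os.path.join(TEMP_DIR, uuid)
    with open(save_path, 'wb') as out:
        # Reassemble strictly in index order, regardless of arrival order.
        for index in range(int(total_parts)):
            with open(os.path.join(chunk_dir, '%06d.part' % index), 'rb') as f:
                out.write(f.read())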

Data gets mixed up while trying to transfer it to ArangoDB

I'm trying to transfer roughly 10 GB of JSON data (tweets in my case) to a collection in ArangoDB, and I'm using joblib to parallelize it:
from ArangoConn import ArangoConn
import Userdata as U
import encodings
from joblib import Parallel, delayed
import json
from glob import glob
import time

def progress(total, prog, start, stri=""):
    if(prog == 0):
        print("")
        prog = 1
    perc = prog / total
    diff = time.time() - start
    rem = (diff / prog) * (total - prog)
    bar = ""
    for i in range(0, int(perc * 20)):
        bar = bar + "|"
    for i in range(int(perc * 20), 20):
        bar = bar + " "
    print("\r" + "progress: " + "[" + bar + "] " + str(prog) + " of " +
          str(total) + ": {0:.1f}% ".format(perc * 100) + "- " +
          time.strftime("%H:%M:%S", time.gmtime(rem)) + " " + stri, end="")

def processfile(filepath):
    file = open(filepath, encoding='utf-8')
    s = file.read()
    file.close()
    data = json.loads(s)
    Parallel(n_jobs=12, verbose=0, backend="threading")(
        map(delayed(ArangoConn.createDocFromObject), data))

files = glob(U.path + '/*.json')
i = 1
j = len(files)
starttime = time.time()
for f in files:
    progress(j, i, starttime, f)
    i = i + 1
    processfile(f)
and
from pyArango.connection import Connection
import Userdata as U
import time

class ArangoConn:
    def __init__(self, server, user, pw, db, collectionname):
        self.server = server
        self.user = user
        self.pw = pw
        self.db = db
        self.collectionname = collectionname
        self.connection = None
        self.dbHandle = self.connect()
        if not self.dbHandle.hasCollection(name=self.collectionname):
            coll = self.dbHandle.createCollection(name=collectionname)
        else:
            coll = self.dbHandle.collections[collectionname]
        self.collection = coll

    def db_createDocFromObject(self, obj):
        data = obj.__dict__()
        doc = self.collection.createDocument()
        for key, value in data.items():
            doc[key] = value
        doc._key = str(int(round(time.time() * 1000)))
        doc.save()

    def connect(self):
        self.connection = Connection(arangoURL=self.server + ":8529",
                                     username=self.user, password=self.pw)
        if not self.connection.hasDatabase(self.db):
            db = self.connection.createDatabase(name=self.db)
        else:
            db = self.connection.databases.get(self.db)
        return db

    def disconnect(self):
        self.connection.disconnectSession()

    def getAllData(self):
        docs = []
        for doc in self.collection.fetchAll():
            docs.append(self.doc_to_result(doc))
        return docs

    def addData(self, obj):
        self.db_createDocFromObject(obj)

    def search(self, collection, search, prop):
        docs = []
        aql = """FOR q IN """ + collection + """ FILTER q.""" + prop + """ LIKE
            "%""" + search + """%" RETURN q"""
        results = self.dbHandle.AQLQuery(aql, rawResults=False, batchSize=1)
        for doc in results:
            docs.append(self.doc_to_result(doc))
        return docs

    def doc_to_result(self, arangodoc):
        modstore = arangodoc.getStore()
        modstore["_key"] = arangodoc._key
        return modstore

    def db_createDocFromJson(self, json):
        for d in json:
            doc = self.collection.createDocument()
            for key, value in d.items():
                doc[key] = value
            doc._key = str(int(round(time.time() * 1000)))
            doc.save()

    @staticmethod
    def createDocFromObject(obj):
        c = ArangoConn(U.url, U.user, U.pw, U.db, U.collection)
        data = obj
        doc = c.collection.createDocument()
        for key, value in data.items():
            doc[key] = value
        doc._key = doc["id"]
        doc.save()
        c.connection.disconnectSession()
It more or less works like that. My problem is that the data that lands in the database is somehow mixed up.
As you can see in the screenshot, "id" and "id_str" are not the same, although they should be.
What I have investigated so far:
I thought that at some point the default keys in the database might "collide" because of the threading, so I set the key to the tweet id.
I tried to do it without multiple threads; the threading doesn't seem to be the problem.
I looked at the data I send to the database; everything seems to be fine.
But as soon as I communicate with the database, the data gets mixed up.
My professor thought that maybe something in pyArango isn't thread-safe and messes up the data, but I don't think so, since threading doesn't seem to be the problem.
I have no idea where else this behavior could come from.
Any ideas?
The screenshot shows the following values:
id : 892886691937214500
id_str : 892886691937214465
It looks like somewhere along the way the value is converted to an IEEE754 double, which cannot safely represent the latter value. So there is potentially some precision loss due to conversion.
A quick example in node.js (JavaScript uses IEEE 754 doubles for any number values greater than 0xffffffff) shows that this is likely the cause of the problem:
$ node
> 892886691937214500
892886691937214500
> 892886691937214465
892886691937214500
So the question is where the conversion happens. Can you check whether the Python client program is correctly sending the expected values to ArangoDB, or whether it already sends the converted/truncated values?
In general, any integer number that exceeds 0x7fffffffffffffff will be truncated when stored in ArangoDB, or converted to an IEEE754 double. This can be avoided by storing the number values inside a string, but of course comparing two number strings will produce different results than comparing two numbers (e.g. "10" < "9" vs. 10 > 9).
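The same loss can be demonstrated directly in Python with the two values from the screenshot (illustrative snippet, not from the question's code):

# 64-bit doubles carry only 53 bits of mantissa, so integers of this size
# change value once forced through a float.
print(int(float(892886691937214465)))                          # 892886691937214464
print(float(892886691937214465) == float(892886691937214500))  # True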

Why do I keep getting the error "Wrong number of args calling Redis command From Lua script" even though I get the desired output?

I'm trying to execute this Lua script and I do get the proper output, but I keep getting "Wrong number of args calling Redis command From Lua script".
def new_get_following(self, start, count, user_id=0):
    script = """
    local envs = redis.call('zrevrange',KEYS[1],ARGV[3],ARGV[4]);
    redis.call('sadd',ARGV[1],unpack(envs));
    local favs = redis.call('sinter',ARGV[2],ARGV[1]);
    local acts = redis.call('mget',unpack(envs));
    redis.call('del',ARGV[1]);
    return {favs,envs,acts}
    """
    count = int(start) + int(count) - 1
    print count
    fav_key = self.fav_key + ":" + str(user_id)
    following_stream_key = self.following_stream_key + ":" + str(user_id)
    tmp_key = int(time.time())
    return self.exectute(script, args=[tmp_key, fav_key, start, count], keys=[following_stream_key])
Maybe it is just a typo and has already been corrected, but shouldn't self.exectute be self.execute?
In the script, these lines are what cause the error:
local envs = redis.call('zrevrange',KEYS[1],ARGV[3],ARGV[4]);
local acts = redis.call('mget',unpack(envs));
If envs comes back as an empty table, the second line:
local acts = redis.call('mget',unpack(envs));
becomes this:
local acts = redis.call('mget',unpack());
so Lua keeps throwing the error. To avoid it we can use redis.pcall, which returns a response error object that can be checked in the output and handled. So the final code should be:
def new_get_following(self, start, count, user_id=0):
    script = """
    local envs = redis.call('zrevrange',KEYS[1],ARGV[3],ARGV[4]);
    redis.call('sadd',ARGV[1],unpack(envs));
    local favs = redis.call('sinter',ARGV[2],ARGV[1]);
    local acts = redis.pcall('mget',unpack(envs));
    redis.call('del',ARGV[1]);
    return {favs,envs,acts}
    """
    count = int(start) + int(count) - 1
    print count
    fav_key = self.fav_key + ":" + str(user_id)
    following_stream_key = self.following_stream_key + ":" + str(user_id)
    tmp_key = int(time.time())
    return self.exectute(script, args=[tmp_key, fav_key, start, count], keys=[following_stream_key])
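An alternative, offered purely as a sketch and not taken from the answers above, is to skip the unpack-based calls entirely when zrevrange returns nothing, so neither sadd nor mget is ever called with zero arguments (same KEYS/ARGV layout as the script above):

# Sketch only: guard the empty case inside the script instead of using pcall.
script = """
local envs = redis.call('zrevrange', KEYS[1], ARGV[3], ARGV[4]);
local favs = {};
local acts = {};
if #envs > 0 then
    redis.call('sadd', ARGV[1], unpack(envs));
    favs = redis.call('sinter', ARGV[2], ARGV[1]);
    acts = redis.call('mget', unpack(envs));
    redis.call('del', ARGV[1]);
end
return {favs, envs, acts}
"""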

Handling Unicode strings pulled from SOQL in Python

The purpose of the code is to use SOQL to query the Salesforce API, then to format the data and do some work on it before putting it into an Oracle database. My code successfully handles the first and third parts, but the second part keeps breaking.
The code uses Python 2.7 with the standard CPython interpreter on Windows 7.
The SOQL is
SELECT ID, Name, Type, Description, StartDate, EndDate, Status
FROM CAMPAIGN
ORDER BY ID
This query pulls back a few hundred results as a JSON dict.
I have to pull each record (a record contains ID, Name, Type, Description, StartDate, EndDate, and Status) one at a time and pass it to a function that generates the proper SQL to put the data into the proper Oracle database. All of the results of the query come back as Unicode strings.
The trouble shows up after I query the data and pass it to the function that generates the SQL to insert it into the Oracle database.
Here is the section of code where the error occurs.
keys = ['attributes', 'Id', 'Name', 'Type', 'Description', 'StartDate', 'EndDate', 'Status']
for record in SrcData['records']:  # Data cleaning in this loop.
    processedRecs = []
    if record['Description'] is not None:
        record['Description'] = encodeStr(record['Description'])
        record['Description'] = record['Description'][0:253]
    for key in keys:
        if key == 'attributes':
            continue
        elif key == 'StartDate' and record[key] is not None:
            record[key] = datetime.datetime.strptime(record[key], "%Y-%m-%d")
        elif key == 'EndDate' and record[key] is not None:
            record[key] = datetime.datetime.strptime(record[key], "%Y-%m-%d")
        else:
            pass
        processedRecs.append(record[key])
    sqlFile.seek(0)
    Query = RetrieveSQL(sqlFile, processedRecs)
The key list is there because there were issues with looping over SrcData.keys().
The encodeStr function is:
def encodeStr(strToEncode):
    if strToEncode == None:
        return ""
    else:
        try:
            tmpstr = strToEncode.encode('ascii', 'ignore')
            tmpstr = ' '.join(tmpstr.split())
            return tmpstr
        except:
            return str(strToEncode)
The error message I get is:
Traceback (most recent call last):
  File "XXX", line 106, in
    Query = ASPythonLib.RetrieveSQL(sqlFile, processedRecs)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 31: ordinal not in range(128)
The XXX is just a file path to where this code lives in our file system; my boss said I must remove the path.
I have also tried multiple variations of:
record['Description'] = record['Description'].encode('ascii', 'ignore').decode(encoding='ascii',errors='strict')
I have tried swapping the order of the encode and decode functions. I have tried different codecs and different error handling schemes.
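For reference, the error class in the traceback can be reproduced in isolation in Python 2: calling str() on a unicode object that contains a non-ASCII character such as u'\u2026' implicitly encodes with the ascii codec and fails the same way. The value below is invented for illustration, not real campaign data.

# Illustrative Python 2 sketch only; the value is made up.
value = u'Autumn Campaign \u2026 2017'
try:
    str(value)  # implicit ascii encode of a unicode string
except UnicodeEncodeError as e:
    print e  # 'ascii' codec can't encode character u'\u2026' ...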
Edit: This code works correctly in about 20 other cycles, so it's safe to assume the error is not in RetrieveSQL().
Here is the code for RetrieveSQL:
def RetrieveSQL(SQLFile, VarList, Log=None):
    SQLQuery = SQLFile.readline()
    FileArgs = [""]
    NumArgValues = len(VarList)
    if ("{}" in SQLQuery):
        # NumFileArgs == 0
        if (NumArgValues != 0):
            print "Number of File Arguments is zero for File " + str(SQLFile) + " is NOT equal to the number of values provided per argument (" + str(NumArgValues) + ")."
        return SQLFile.read()
    elif (SQLQuery[0] != "{"):
        print "File " + str(SQLFile) + " is not an SQL source file."
        return -1
    elif (SQLQuery.startswith("{")):
        FileArgs = SQLQuery.replace("{", "").replace("}", "").split(", ")
    for Arg in xrange(0, len(FileArgs)):
        FileArgs[Arg] = "&" + FileArgs[Arg].replace("\n", "").replace("\t", "") + "&"  # Add &'s for replacing
    NumFileArgs = len(FileArgs)
    if (NumFileArgs != NumArgValues):
        if (NumArgValues == 0):
            print "No values were supplied to RetrieveSQL() for File " + str(SQLFile) + " when there were supposed to be " + str(NumFileArgs) + " values."
            return -1
        elif (NumArgValues > 0):
            print "Number of File Arguments (" + str(NumFileArgs) + ") for File " + str(SQLFile) + " is NOT equal to the number of values provided per argument (" + str(NumArgValues) + ")."
            return -1
    SQLQuery = SQLFile.read()
    VarList = list(VarList)
    for Arg in xrange(0, len(FileArgs)):
        if (VarList[Arg] == None):
            SQLQuery = SQLQuery.replace(FileArgs[Arg], "NULL")
        elif ("'" in str(VarList[Arg])):
            SQLQuery = SQLQuery.replace(FileArgs[Arg], "'" + VarList[Arg].replace("'", "''") + "'")
        elif ("&" in str(VarList[Arg])):
            SQLQuery = SQLQuery.replace(FileArgs[Arg], "'" + VarList[Arg].replace("&", "&'||'") + "'")
        elif (isinstance(VarList[Arg], basestring) == True):
            VarList[Arg] = VarList[Arg].replace("'", "''")
            SQLQuery = SQLQuery.replace(FileArgs[Arg], "'" + VarList[Arg] + "'")
        else:
            SQLQuery = SQLQuery.replace(FileArgs[Arg], str(VarList[Arg]))
    SQLFile.seek(0)
    return SQLQuery
Edit #2: I tried finding a complete traceback in the logging files, but the logging system for this script is terrible and never logs more than "Cycle success" or "Cycle Fail". Ah, the fun of rewriting code written by people who don't know how to code.
