Dealing with special characters in file paths - Python script and Athena Query


I have a script that I use to parse out S3 info using Athena and S3 inventory. Currently when it finds a specific file type, the script will use that path to generate a SQL query that is then passed to Athena. Here's an example:
def getS3TagReportData(session, tag, table):
    whereStatement = ""
    for v in tag.values:
        first = True
        for p in v.paths:
            # print(p) #iterate through paths
            if first:
                whereStatement = f"WHERE dt = (SELECT MAX(dt) FROM \"{awsAthenaDatabase}\".\"{table}\") AND key LIKE '{p}/%'"
                first = False
            else:
                whereStatement += f" OR key LIKE '{p}%'"
This works fine until I hit a path where someone has put in an apostrophe (').
For example: bucketname\data\int'l hotel files\data.jpg
Then the query that gets dynamically generated is:
WHERE dt = (SELECT MAX(dt) FROM \"{awsAthenaDatabase}\".\"{table}\") AND key LIKE 'bucketname\data\int'l hotel files\data.jpg'
As you can see, the WHERE clause fails because the extra ' ends the string literal early. How can I programmatically work around these errors and report them?
I appreciate the help!
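One common approach is to escape the path before putting it into the SQL text. Athena's SQL engine (based on Presto/Trino) escapes a single quote inside a string literal by doubling it, so int'l becomes int''l. The minimal sketch below does that and logs every path that needed escaping, which covers the "report them" part. It also wraps the OR-ed LIKE conditions in parentheses: in the original clause AND binds tighter than OR, so every path after the first bypasses the dt = MAX(dt) filter. build_where and sql_string_literal are hypothetical names, the paths are assumed to arrive as a flat list of prefixes, and every prefix gets a '/%' suffix (the original mixes '/%' and '%'). Note that % and _ inside a path are still treated as LIKE wildcards.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger(__name__)


def sql_string_literal(value: str) -> str:
    """Quote value as a SQL string literal; an embedded ' is escaped by doubling it."""
    return "'" + value.replace("'", "''") + "'"


def build_where(database: str, table: str, paths: Iterable[str]) -> str:
    """Build the WHERE clause for a list of S3 key prefixes (hypothetical helper)."""
    clauses: List[str] = []
    for p in paths:
        if "'" in p:
            # Report the problem path, then continue with the escaped version.
            logger.warning("Escaping single quote in path: %s", p)
        clauses.append("key LIKE " + sql_string_literal(p + "/%"))
    if not clauses:
        raise ValueError("no paths given")
    # Parentheses keep the dt filter applied to every LIKE condition.
    return (f'WHERE dt = (SELECT MAX(dt) FROM "{database}"."{table}") '
            f"AND ({' OR '.join(clauses)})")
```

If you would rather skip such paths than escape them, collect them in a list inside the loop instead of appending a clause, and write that list to your report.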

Related

How to replace search method with sql query in odoo

When I have too many products and use the following filter, the loading time is long, so I want to replace it with a SQL query. How can I change my search into a query? I know basic SQL but can't convert my search into a query because I don't know how to express with_context in SQL.
class valuation_report(models.Model):
    _inherit = "stock.valuation.layer"
    def _get_current_user(self):
        for r in self:
            r.user_id = self.env.user
    def _search_branch(self, operator, value):
        warehouse_id = self.env['stock.warehouse'].search([('branch_id', '=', self.env.user.branch_id.id)])
        product_ids = self.env['product.product'].with_context(warehouse=warehouse_id.ids).search([]).filtered(lambda p: p.qty_available > 0)
        return [('product_id', 'in', product_ids.ids)]
    user = fields.Many2one('res.users', compute=_get_current_user, search=_search_branch)

You can do two things:
Put log_level = debug_sql in your Odoo configuration file. This way you will see the generated SQL in the logs, which will help you understand what with_context is doing.
If you really need to transform everything into SQL, use self.env.cr.execute("SELECT ... FROM ...") and then rows = self.env.cr.dictfetchall().
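A rough sketch of the second option for the filter in the question: computing the product ids with one SQL query instead of loading every product and evaluating qty_available in Python. This approximates what qty_available does with a warehouse context rather than copying Odoo's logic: it sums quant quantities in internal locations below each of the branch's warehouse view locations. It assumes Odoo 12 or later (for stock_location.parent_path) and that the custom branch_id field on stock.warehouse is stored in the database; in_stock_product_ids is a hypothetical helper. The doubled %% inside the LIKE pattern is needed because the query is executed with parameters (psycopg2 style).

```python
# Hypothetical query; assumes a stored branch_id column on stock_warehouse
# and Odoo 12+ (stock_location.parent_path).
IN_STOCK_FOR_BRANCH_SQL = """
    SELECT q.product_id
    FROM stock_quant q
    JOIN stock_location l ON l.id = q.location_id
    JOIN stock_location vl ON l.parent_path LIKE vl.parent_path || '%%'
    JOIN stock_warehouse w ON w.view_location_id = vl.id
    WHERE w.branch_id = %s
      AND l.usage = 'internal'
    GROUP BY q.product_id
    HAVING SUM(q.quantity) > 0
"""


def in_stock_product_ids(cr, branch_id):
    """Return ids of products with positive on-hand quantity in the branch's warehouses."""
    # Parameters are passed separately so the driver handles quoting.
    cr.execute(IN_STOCK_FOR_BRANCH_SQL, (branch_id,))
    return [row["product_id"] for row in cr.dictfetchall()]
```

Inside _search_branch this would become return [('product_id', 'in', in_stock_product_ids(self.env.cr, self.env.user.branch_id.id))]. In Odoo 13 and later the ORM can hold pending writes in memory, so flush them before running raw SQL (the flush method name differs between versions).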

AWS QLDB ion_document update

I have code that inserts data into AWS QLDB using a partial SQL statement and Ion documents. Now I want to update a document inside QLDB, and I can't find any example of it. Please help!
statement = 'INSERT INTO MY_TABLE'
ion_documents = loads(dumps(MY_JSON_DATA))
def query_lambda(tx_executor, query=statement, docs=ion_documents):
    return tx_executor.execute_statement(query, [docs])
def retry_lambda():
    print('Retrying')
cursor = session.execute_lambda(query_lambda, retry_lambda)

As you note, you need to use PartiQL statements to update documents. The code snippet you have to insert a document is most of what you need to update it: the only change you need to make is the statement that you're executing.
The documentation has a Python tutorial which includes examples of updating documents: https://docs.aws.amazon.com/qldb/latest/developerguide/getting-started.python.step-5.html.
For example (from the above link), the following would update the owner of a vehicle in the sample application:
def update_vehicle_registration(transaction_executor, vin, document_id):
    statement = "UPDATE VehicleRegistration AS r SET r.Owners.PrimaryOwner.PersonId = ? WHERE r.VIN = ?"
    parameters = [document_id, convert_object_to_ion(vin)]
    cursor = transaction_executor.execute_statement(statement, parameters)
    try:
        print_result(cursor)
        logger.info('Successfully transferred vehicle with VIN: {} to new owner.'.format(vin))
    except StopIteration:
        raise RuntimeError('Unable to transfer vehicle, could not find registration.')
Note the use of ? as bind parameters. These will be bound to the values passed into the second argument of execute_statement (in corresponding order).
Here is some information on PartiQL update statements: https://docs.aws.amazon.com/qldb/latest/developerguide/ql-reference.update.html. The syntax is:
UPDATE table [ AS table_alias ] [ BY id_alias ]
SET element = data [, element = data, ... ]
[ WHERE condition ]
The results of running an update statement will be the document id(s) that were affected by the update.
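Putting that together with the question's insert code, a minimal sketch of an update by document id follows. It uses the BY clause from the syntax above to bind the document id (the documentId returned by the INSERT) to an alias. The Status field and the update_status name are hypothetical. It assumes the pyqldb 3.x driver, where execute_statement takes each parameter as a separate positional argument; the question and tutorial code pass a list, which suggests an older driver version, so check the signature of the version you have installed.

```python
# MY_TABLE is the table from the question; Status is a hypothetical field.
UPDATE_BY_ID = (
    "UPDATE MY_TABLE AS t BY doc_id "
    "SET t.Status = ? "
    "WHERE doc_id = ?"
)


def update_status(transaction_executor, document_id, new_status):
    """Set Status on one document and return the result rows (the affected document ids)."""
    # Parameters bind to the ? placeholders in order: new_status, then document_id.
    cursor = transaction_executor.execute_statement(UPDATE_BY_ID, new_status, document_id)
    return list(cursor)
```

With a pyqldb 3.x QldbDriver this would be called as driver.execute_lambda(lambda executor: update_status(executor, doc_id, "SHIPPED")). If your driver version does not accept native Python values as parameters, convert them to Ion first, as the tutorial does with convert_object_to_ion.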

parsing .log files then sort in access

I'm writing a parsing program that searches through 100+ .log files for some keywords, puts the words into different arrays and separates them into columns in Excel. Now I want to sort them in Access automatically so that I can process the different .log file combinations. I can copy and paste from my Excel file to Access, but that is inefficient and gives some errors; I would like it to be automatic. I'm new to Access and don't know how to connect Python to Access. I tried doing it the same way I did for Excel, but that didn't work, and I started looking into ODBC but had some problems there too...
import glob  # includes
import xlwt  # includes
from os import listdir  # includes
from os.path import isfile, join  # includes
def logfile(filename, tester, createdate, completeresponse):
    # Arrays for strings
    response = []
    message = []
    test = []
    date = []
    with open(filename) as filesearch:  # open search file
        filesearch = filesearch.readlines()  # read file
        for line in filesearch:
            file = filename[39:]  # extract filename [file]
            for lines in filesearch:
                if createdate in lines:  # extract "Create Time" {date}
                    date.append(lines[15:34])
                if completeresponse in lines:
                    response.append(lines[19:])
            print('pending...')
            i = 1  # set a number on log {i}
            d = {}
            for name in filename:
                if not d.get(name, False):
                    d[name] = i
                    i += 1
            if tester in line:
                start = '-> '
                end = ':\ '  # |<----------->|
                number = line[line.find(start)+3: line.find(end)]  # Tester -> 1631 22 F1 2E :\ BCM_APP_31381140 AJ \ Read Data By Identifier \
                test.append(number)  # extract tester {test}
                # |<--------------------------------------------
                text = line[line.find(end)+3:]  # Tester -> 1631 22 F1 2E :\ BCM_APP_31381140 AJ \ Read Data By Identifier \
                message.append(text)
    with open('Excel.txt', 'a') as handler:  # create .txt file
        for i in range(len(message)):
            # A B C D E
            handler.write(f"{file}|{date[i]}|{i}|{test[i]}|{response[i]}")
            # A = filename B = create time C = number in file D = tester E = Complete response
# open with 'w' to "reset" the file.
with open('Excel.txt', 'w') as file_handler:
    pass
# ---------------------------------------------------------------------------------
for filename in glob.glob(r'C:\Users\Desktop\Access\*.log'):
    logfile(filename, 'Sending Request: Tester ->', 'Create Time:', 'Complete Response:', 'Channel')
def if_number(s):  # look if number or float
    try:
        float(s)
        return True
    except ValueError:
        return False
# ----------------------------------------------
my_path = r"C:\Users\Desktop\Access"  # directory
# search directory for .txt files
text_files = [join(my_path, f) for f in listdir(my_path) if isfile(join(my_path, f)) and '.txt' in f]
for text_file in text_files:  # loop and open .txt document
    with open(text_file, 'r+') as wordlist:
        string = []  # array of the saved strings
        for word in wordlist:
            string.append(word.split('|'))  # put word to string array
        column_list = zip(*string)  # make list of all string
        workbook = xlwt.Workbook()
        worksheet = workbook.add_sheet('Tab')
        worksheet.col(0)  # construct cell
        first_col = worksheet.col(0)
        first_col.width = 256*50
        second_col = worksheet.col(1)
        second_col.width = 256*25
        third_col = worksheet.col(2)
        third_col.width = 256*10
        fourth_col = worksheet.col(3)
        fourth_col.width = 256*50
        fifth_col = worksheet.col(4)
        fifth_col.width = 256*100
        i = 0  # choose column 0 = A, 3 = C etc
        for column in column_list:
            for item in range(len(column)):
                value = column[item].strip()
                if if_number(value):
                    worksheet.write(item, i, float(value))  # text / float
                else:
                    worksheet.write(item, i, value)  # number / int
            i += 1
        print('File:', text_files, 'Done')
        workbook.save(text_file.replace('.txt', '.xls'))
Is there a way to automate the copy-paste step? If so, what would that look like and how would it work? And if it can't be done, some advice would help a lot!
EDIT
Thanks, I have done some googling, and thanks for your help! But now I get an error: I still can't send the information to the Access file. I know the table exists, because I want to update the existing file. Is there a command to "update an existing Access file"?
error
pyodbc.ProgrammingError: ('42S01', "[42S01] [Microsoft][ODBC Microsoft Access Driver] Table 'tblLogfile' already exists. (-1303) (SQLExecDirectW)")
code
import pyodbc
UDC = r'C:\Users\Documents\Access\UDC.accdb'
# DSN Connection
constr = " DSN=MS Access Database; DBQ={0};".format(UDC)
# DRIVER connection
constr = "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};UID=admin;UserCommitSync=Yes;Threads=3;SafeTransactions=0;PageTimeout=5;MaxScanRows=8;MaxBufferSize=2048;FIL={MS Access};DriverId=25;DefaultDir=C:/USERS/DOCUMENTS/ACCESS;DBQ=C:/USERS/DOCUMENTS/ACCESS/UDC.accdb"
# Connect to database UDC and open cursor
db = pyodbc.connect(constr)
cursor = db.cursor()
sql = "SELECT * INTO [tblLogfile]" +\
"FROM [Excel 8.0;HDR=YES;Database=C:/Users/Documents/Access/Excel.xls.[Tab];"
cursor.execute(sql)
db.commit()
cursor.close()
db.close()

First, please note, MS Access, a database management system, is not MS Excel, a spreadsheet application. Access sits on top of a relational engine and maintains strict rules in data and relational integrity whereas in Excel anything can be written across cells or ranges of cells with no rules. Additionally, the Access object library (tabledefs, querydefs, forms, reports, macros, modules) is much different than the Excel object library (workbooks, worksheets, range, cell, etc.), so there is no one-to-one translation in code.
Specifically, for your Python project, consider pyodbc with a make-table query that connects directly to the Excel workbook. This works because the database behind MS Access is the ACE/JET engine (Windows .dll files, available on Windows machines regardless of an Access install), and one feature of that engine is the ability to connect to workbooks and even text files! So really, MSAccess.exe is just a GUI console for viewing .mdb/.accdb files.
Below creates a new database table that replicates the specific workbook sheet data, assuming the sheet maintains:
tabular format beginning in A1 cell (no merged cells/repeating labels)
headers in the top row (no leading or trailing spaces, and no special characters such as !#$%^~<>)
columns of consistent data type format (i.e., data integrity rules)
Python code
import pyodbc
databasename = 'C:\\Path\\To\\Database\\File.accdb'
# DSN CONNECTION (alternative 1)
# constr = "DSN=MS Access Database;DBQ={0};".format(databasename)
# DRIVER CONNECTION (alternative 2)
constr = "DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={0};".format(databasename)
# CONNECT TO DATABASE AND OPEN CURSOR
db = pyodbc.connect(constr)
cur = db.cursor()
# RUN MAKE-TABLE QUERY FROM EXCEL WORKBOOK SOURCE (use the line matching your file type)
# OLDER EXCEL FORMAT (.xls)
# sql = "SELECT * INTO [myNewTable]" + \
#       r" FROM [Excel 8.0;HDR=Yes;Database=C:\Path\To\Workbook.xls].[SheetName$];"
# CURRENT EXCEL FORMAT (.xlsx)
sql = "SELECT * INTO [myNewTable]" + \
      r" FROM [Excel 12.0 Xml;HDR=Yes;Database=C:\Path\To\Workbook.xlsx].[SheetName$];"
cur.execute(sql)
db.commit()
cur.close()
db.close()
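The error in the question's edit (Table 'tblLogfile' already exists) is what a make-table query (SELECT ... INTO) reports when its target table is already there. To add rows to an existing table, an append query (INSERT INTO ... SELECT) is the usual Access SQL form. A minimal sketch, assuming the worksheet's columns match the existing table's columns in number and order; append_sheet_sql and append_sheet are hypothetical helper names, and conn is a pyodbc connection like db above. Note also that the SQL in the question has no space between [tblLogfile] and FROM, is missing the closing ] after the workbook path, and that worksheets are normally referenced with a trailing $ ([Tab$]).

```python
def append_sheet_sql(table, workbook_path, sheet):
    """Build an Access append query that copies a worksheet into an existing table."""
    # Excel 8.0 is for .xls files; use 'Excel 12.0 Xml' for .xlsx.
    return (f"INSERT INTO [{table}] "
            f"SELECT * FROM [Excel 8.0;HDR=Yes;Database={workbook_path}].[{sheet}$];")


def append_sheet(conn, table, workbook_path, sheet):
    """Run the append query on an open pyodbc connection and commit it."""
    cur = conn.cursor()
    try:
        cur.execute(append_sheet_sql(table, workbook_path, sheet))
        conn.commit()
    finally:
        cur.close()
```

For the question this would be append_sheet(db, 'tblLogfile', r'C:\Users\Documents\Access\Excel.xls', 'Tab'). Running it twice appends the rows twice, so clear the table first (DELETE FROM [tblLogfile]) if you want a refresh rather than an accumulation.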

Almost certainly the answer from Parfait above is a better way to go, but for fun I'll leave my answer below.
If you are willing to put in the time I think you need 3 things to complete the automation of what you want to do:
1) Send a string representation of your data to the Windows clipboard. There is Windows-specific code for this, or you can save yourself some time and use pyperclip.
2) Learn VBA and use it to grab the string from the clipboard and process it. Here is some example VBA code that I used in Excel in the past to grab text from the clipboard:
Function GetTextFromClipBoard() As String
    Dim MSForms_DataObject As New MSForms.DataObject
    MSForms_DataObject.GetFromClipboard
    GetTextFromClipBoard = MSForms_DataObject.GetText()
End Function
3) Use pywin32 (I believe it is easily available with Anaconda) to automate the VBA/Access calls from Python. This is probably going to be the hardest part, as the specific call trees are (in my opinion) not well documented and take a lot of poking and digging to figure out. It's painful, to say the least, but use IPython to get visual cues about which methods your pywin32 objects have available.
Looking at the instructions above, I realize it may also be possible to skip the clipboard and send the information directly from Python to Access via pywin32. If you go the clipboard route, however, you can break the steps up:
send one dataset to the clipboard
grab and process the data using the VBA editor in Access
after you figure out 1 and 2, use pywin32 to bridge the gap
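The first step (sending one dataset to the clipboard) might look roughly like this minimal sketch. It assumes the rows are already lists of values, such as the filename/date/tester/response fields the question writes to Excel.txt, and uses pyperclip's copy function; rows_to_clipboard_text and copy_rows are hypothetical helper names. Tab-separated text is used because Excel splits it into cells on paste; values that themselves contain tabs or line breaks would need cleaning first.

```python
def rows_to_clipboard_text(rows):
    """Join rows into tab-separated lines with Windows line endings."""
    return "\r\n".join("\t".join(str(value) for value in row) for row in rows)


def copy_rows(rows):
    """Put the rows on the clipboard (requires the third-party pyperclip package)."""
    import pyperclip  # pip install pyperclip
    pyperclip.copy(rows_to_clipboard_text(rows))
```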
Good luck, and maybe write a blog post about it if you figure it out to share the details.

Looping SQL query in python

I am writing a python script which queries the database for a URL string. Below is my snippet.
db.execute('select sitevideobaseurl,videositestring '
           'from site, video '
           'where siteID =1 and site.SiteID=video.VideoSiteID limit 1')
result = db.fetchall()
filename = '/home/Site_info'
output = open(filename, "w")
for row in result:
    videosite = row[0:2]
    link = videosite[0].format(videosite[1])
    full_link = link.replace("http://", "https://")
    print full_link
    output.write("%s\n" % str(full_link))
output.close()
The query basically gives a URL link. It gets the base URL from one table and the video site string from another table.
output: https://www.youtube.com/watch?v=uqcSJR_7fOc
SiteID is the primary key which is int and not in sequence.
I want to loop this SQL query so that it picks a new SiteID on every execution, giving me a unique site URL every time, and write all the results to a file.
desired output: https://www.youtube.com/watch?v=uqcSJR_7fOc
https://www.dailymotion.com/video/hdfchsldf0f
There are about 1178 records.
Thanks for your time and help in advance.

I'm not sure if I completely understand what you're trying to do. I think your goal is to get a list of all links to videos. You get a link to a video by joining the sitevideobaseurl from site and videositestring from video.
In my experience it's much easier to let the database do the heavy lifting; it's built for that. It should be more efficient to join the tables, return all the results and then loop through them, instead of making a separate query to the database for each row.
The code should look something like this: (Be careful, I didn't test this)
query = """
select s.sitevideobaseurl,
v.videositestring
from video as v
join site as s
on s.siteID = v.VideoSiteID
"""
db.execute(query)
result = db.fetchall()
filename = '/home/Site_info'
output = open(filename, "w")
for row in result:
link = "%s%s" % (row[0],row[1])
full_link = link.replace("http://","https://")
print full_link
output.write("%s\n" % str(full_link))
output.close()
If you have other reasons for wanting to fetch these one by one, you could first fetch all SiteIDs into a list, then loop over that list and pass each ID into a parameterized query.
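A minimal sketch of that one-ID-at-a-time approach, written against the generic DB-API cursor interface. Table and column names come from the question, links_one_site_at_a_time is a hypothetical name, and, as in the question, sitevideobaseurl is assumed to contain a {} placeholder filled by .format. The ? placeholder is the sqlite3 style used for testing; MySQL drivers such as MySQLdb use %s instead.

```python
# ? is the sqlite3 placeholder; use %s with MySQLdb/PyMySQL.
PER_SITE_QUERY = (
    "SELECT s.sitevideobaseurl, v.videositestring "
    "FROM video AS v JOIN site AS s ON s.SiteID = v.VideoSiteID "
    "WHERE s.SiteID = ?"
)


def links_one_site_at_a_time(conn):
    """Fetch every SiteID, then run one parameterized query per ID."""
    cur = conn.cursor()
    cur.execute("SELECT SiteID FROM site ORDER BY SiteID")
    site_ids = [row[0] for row in cur.fetchall()]  # read all IDs before reusing the cursor
    links = []
    for site_id in site_ids:
        cur.execute(PER_SITE_QUERY, (site_id,))
        for base_url, video_string in cur.fetchall():
            links.append(base_url.format(video_string).replace("http://", "https://"))
    cur.close()
    return links
```

Writing the result to the file is then a single step: with open('/home/Site_info', 'w') as output: output.write('\n'.join(links) + '\n').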

How can I store URLs from Amazon S3 bucket in Django sqlite db to display & comment on?

I've been using Django for a couple of days and set up a basic blog from a tutorial with django comments.
I've got a totally separate python script that generates screenshots and uploads them to Amazon S3, now I'd like my django app to display all the images in the bucket and use a comment system on the images. Preferably I'd do this by just storing the URLs in my sqlite db, which I've got hard-coded currently to display all images in the db and has comments enabled on these.
My model:
(Does this need a foreign key to the django comments or is that just part of the Django Magic?!)
class Image(models.Model):
    imgUrl = models.CharField(max_length=200)
    meta = models.CharField(max_length=300)
    def __unicode__(self):
        return self.imgUrl
My bucket structure:
https://s3-eu-west-1.amazonaws.com/bucket/revision/process/images.png
Almost all the tutorials and packages I'm finding are based on upload/download rather than a simple for keys in bucket type approach that I want.
One of my problems is understanding how I can integrate my Boto functions with Django if I'm using Base.html. In an earlier tutorial I had an index page which had a view and could call functions from there. But base doesn't need that so I'm starting to get a little lost.

I haven't checked whether the boto API has changed, but this is how it worked the last time I looked:
from boto.s3.connection import S3Connection
from boto.s3.key import Key
import s3config
conn = S3Connection(s3config.passwd, s3config.secret)
bucket = conn.get_bucket(s3config.bucket)
s3_path = '/some/path/in/your/bucket'
keys = bucket.list(s3_path)
# or if you want all keys:
# keys = bucket.get_all_keys()
for key in keys:
    print(key)
    # here you can download or do other stuff
    # with the keys like get some metadata
    print(key.name)
    print(key.etag)
    print(key.size)
    print(key.last_modified)
# s3config.py (separate file)
passwd = 'BLABALBALABALA'
secret = 'xvdwv3efefefefefef'
bucket = 'name-of-your-bucket'
Update:
Amazon S3 is a key-value store where the key is a string, so nothing prevents you from putting in keys like:
/this/string/key/looks/like/a/unix/path
/folder/images/fileA.jpg
/folder/images/fileB.jpg
/folder/images/folderX/fileX1.jpg
Now bucket.list(prefix="/folder/images/") would yield the latter three.
Look here for further details:
http://readthedocs.org/docs/boto/en/latest/ref/s3.html#boto-s3-bucket
http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTBucketGET.html?r=8270
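The boto 2 library used above has long been superseded by boto3. A rough boto3 equivalent for the question's goal (listing image keys under a prefix and turning them into URLs to store in the Image model) might look like this sketch. It assumes the objects are publicly readable through virtual-hosted-style URLs (https://<bucket>.s3.<region>.amazonaws.com/<key>); image_urls is a hypothetical helper. It uses the list_objects_v2 paginator because a single call returns at most 1,000 keys, and URL-encodes keys so names with spaces still work.

```python
from urllib.parse import quote


def image_urls(s3_client, bucket, region, prefix="", suffixes=(".png", ".jpg")):
    """Return public URLs for image keys under prefix (assumes virtual-hosted-style URLs)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    urls = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(suffixes):
                urls.append(f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}")
    return urls
```

In a Django view or management command you could then run for url in image_urls(boto3.client("s3"), "bucket", "eu-west-1", "revision/"): Image.objects.get_or_create(imgUrl=url), so that each image gets one row your comments can attach to.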

This is my code for storing results from S3 in MySQL with boto and Django:
from demo.models import Movies
import boto
from boto.s3.key import Key
from django.db import connection, transaction
def movietitle(b):
    key = b.get_key('netflix/movie_titles.txt')
    content = key.get_contents_as_string()
    line = content.split('\n')
    args = []
    for imovie in line:
        if len(imovie) > 0:
            # maxsplit=2 keeps commas inside movie titles intact
            imovie = imovie.split(',', 2)
            movieid = imovie[0]
            year = imovie[1]
            title = imovie[2]
            iargs = [int(movieid), title, year]
            args.append(iargs)
    cursor = connection.cursor()
    sql = "insert into demo_movies(MovieID,MovieName,ReleaseYear) values(%s,%s,%s)"
    cursor.executemany(sql, args)
    transaction.commit_unless_managed()
    cursor.close()
