How to pass the date parameter in pyspark query using Jupyter notebook? - python

I want to pass a date parameter into the query below in a Jupyter notebook, but it does not work the way it is written here. I don't know where the problem lies.
filedate = '2022-11-15'
query = """(select * from db.xyz
where name = 'Tom'
and login = '{filedate}') as salary"""
df = spark.read.format("jdbc")\
.option("url", jdbc_url)\
.option("driver",jdbc_driver)\
.option("dbtable" ,query).load()

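The query string is missing the f prefix, so {filedate} is sent to the database literally instead of being replaced by the variable's value. Make it an f-string: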
filedate = '2022-11-15'
query = f"""(select * from db.xyz
where name = 'Tom'
and login = '{filedate}') as salary"""
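A quick way to see the difference is to build the string both ways and inspect the last line of each. This is only a sketch that reuses the names from the question; the variable `plain` is introduced here for illustration, and nothing is sent to Spark.

```python
filedate = '2022-11-15'

# Without the f prefix the braces are kept as literal text.
plain = """(select * from db.xyz
where name = 'Tom'
and login = '{filedate}') as salary"""

# With the f prefix Python substitutes the value of filedate.
query = f"""(select * from db.xyz
where name = 'Tom'
and login = '{filedate}') as salary"""

print(plain.splitlines()[-1])   # and login = '{filedate}') as salary
print(query.splitlines()[-1])   # and login = '2022-11-15') as salary
```

Printing the query before passing it to .option("dbtable", query) is usually the fastest way to confirm that the substitution happened.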

Related

removing data from SQL table using python

I'm trying to achieve something with a function I have.
As you can see here:
This is an SQL table with data. When I upload 2 docs, they are stored in doc0 and doc1 and all the others are null.
What I want is that if I upload only 2 docs, the rest are removed completely from the SQL table.
This is my code:
def submit_quality_dept_application(request, application_id):
    n = int(request.data['length'])
    application = Application.objects.get(id=application_id)
    application_state = application.application_state
    teaching_feedback = request.FILES['teaching-feedback']
    application_state['teaching_feedback'] = teaching_feedback.name
    now = datetime.now()
    dt_string = now.strftime("%Y-%m-%d %H:%M:%S")
    application_state['edited_time'] = dt_string
    for i in range(5):
        application_state[f'doc{i}'] = None
    for i in range(n):
        doc = request.FILES[f'doc{i}']
        application_state[f'doc{i}'] = doc.name
        copy_to_application_directory(doc, application.id)
    copy_to_application_directory(teaching_feedback, application.id)
    ApplicationStep.objects.update_or_create(
        application=application, step_name=Step.STEP_7
    )
    Application.objects.filter(id=application_id).update(application_state=application_state)
    return Response(n, status=status.HTTP_200_OK)
What should I do to achieve this?
Thank you so much for your help!
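One possible direction, assuming application_state behaves like a plain Python dict (for example a JSON field), is to remove the unused doc keys instead of setting them to None. The helper name prune_unused_docs and the sample data below are hypothetical and only illustrate the idea; they are not part of the original code.

```python
def prune_unused_docs(application_state, uploaded_names, max_docs=5):
    """Return a copy of the state holding only the doc keys that were uploaded."""
    state = dict(application_state)
    # Remove every doc slot entirely rather than storing None in it.
    for i in range(max_docs):
        state.pop(f'doc{i}', None)
    # Re-add one key per uploaded file.
    for i, name in enumerate(uploaded_names):
        state[f'doc{i}'] = name
    return state


# Hypothetical state as it might look after an earlier upload.
old_state = {'doc0': 'old.pdf', 'doc1': None, 'doc2': None,
             'doc3': None, 'doc4': None, 'edited_time': '2022-01-01 10:00:00'}
new_state = prune_unused_docs(old_state, ['a.pdf', 'b.pdf'])
print(sorted(new_state))
```

In the view, the two loops over range(5) and range(n) would be replaced by the same pop-then-assign pattern before the update() call. If doc0 to doc4 are real table columns rather than keys in a JSON value, they cannot be removed per row and null is the expected way to mark them empty.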

Why does my pandas dataframe have two headers?

def get_all_rows(conn):
    df6 = pd.read_sql_query("SELECT * FROM Outofcountry", conn)
    print(df6)
    return
ComputerName ConnectTime lastExtIP latestCountry latestRegion latestCity Name CurrentLogonUser LastLogonUser PrimaryUser UserName
0 ComputerName ConnectTime_decimal lastExtIP latestCountry latestRegion latestCity Name CurrentLogonUser LastLogonUser PrimaryUser UserName\n
I'd like to be able to get rid of the 2nd row...
This is the code block:
def get_all_rows(conn):
    df6 = pd.read_sql_query("SELECT * FROM Outofcountry", conn)
    print(df6)
    return
d1 = pd.read_csv("CS_Out_Of_Country.csv", mangle_dupe_cols='True', encoding='windows-1252')
I tried adding this, both with True and with False, but it doesn't do anything.
I would like the output to have just one header:
ComputerName ConnectTime lastExtIP latestCountry latestRegion latestCity Name CurrentLogonUser LastLogonUser PrimaryUser UserName
It seems the header row is also stored as a data row in your SQL table. You could just do:
df6.drop([0], inplace=True)
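If the stray header row is not guaranteed to sit at index 0, filtering on its content may be more robust. The sketch below uses a small made-up DataFrame with three of the columns from the question; the data values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: the first row repeats the header, as in the question.
df6 = pd.DataFrame(
    [
        ["ComputerName", "ConnectTime_decimal", "UserName\n"],
        ["PC-01", "44000.5", "alice"],
    ],
    columns=["ComputerName", "ConnectTime", "UserName"],
)

# Keep only rows whose ComputerName cell is not the column name itself.
cleaned = df6[df6["ComputerName"] != "ComputerName"].reset_index(drop=True)
print(cleaned)
```

The longer-term fix is to delete that row from the table, or to skip the header line when the CSV is loaded into it.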

How to add current date in filename in python script

I'm trying to unload data from Snowflake to GCS using the Snowflake Python connector and a Python script. In the script below the file name is 'LH_TBL_FIRST20200908'. If the script runs today the name should stay the same; if it runs tomorrow the file name should be 'LH_TBL_FIRST20200909', and the day after that 'LH_TBL_FIRST20200910'.
Please also tell me if the code has any mistakes in it. The code is below:
import snowflake.connector
# Gets the version
ctx = snowflake.connector.connect(
user='*****',
password='*******',
account='********',
warehouse='*******',
database='********',
schema='********'
)
cs = ctx.cursor()
sql = "copy into @unload_gcs/LH_TBL_FIRST20200908.csv.gz
from ( select * from TEST_BASE.LH_TBL_FIRST )
file_format =
( type=csv compression='gzip'
FIELD_DELIMITER = ','
field_optionally_enclosed_by='"'
NULL_IF=()
EMPTY_FIELD_AS_NULL = FALSE
)
single = false
max_file_size=5300000000
header = false;"
cur.execute(sql)
cur.close()
conn.close()
You can use f-strings to fill in (part of) your filename. Python has the datetime module to handle dates and times.
from datetime import datetime
date = datetime.now().strftime('%Y%m%d')
myFileName = f'LH_TBL_FIRST{date}.csv.gz'
print(myFileName)
>>> LH_TBL_FIRST20200908.csv.gz
As for errors in your code:
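You assign the cursor to cs (cs = ctx.cursor()) but then call cur.execute(...) and cur.close(), and you call conn.close() although the connection is named ctx. Neither cur nor conn is defined, so these calls will raise a NameError. Run your code to find such errors and fix them.
Edit suggested by @Lysergic:
If your Python version is older than 3.6 and does not support f-strings, you could use str.format().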
myFileName = 'LH_TBL_FIRST{0}.csv.gz'.format(date)
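Putting the pieces together, the whole COPY statement can be built with the date inserted into the file name. The following is a sketch only: the function name build_unload_sql is hypothetical, the stage is assumed to be @unload_gcs as in the question, and the statement is just built as a string, not executed against Snowflake.

```python
from datetime import datetime


def build_unload_sql(run_date=None):
    """Build the COPY INTO statement with the run date in the file name."""
    run_date = run_date or datetime.now()
    stamp = run_date.strftime('%Y%m%d')
    # A triple-quoted f-string keeps the multi-line SQL valid Python
    # and allows the double quote inside the statement.
    return f"""copy into @unload_gcs/LH_TBL_FIRST{stamp}.csv.gz
from ( select * from TEST_BASE.LH_TBL_FIRST )
file_format =
( type=csv compression='gzip'
FIELD_DELIMITER = ','
field_optionally_enclosed_by='"'
NULL_IF=()
EMPTY_FIELD_AS_NULL = FALSE
)
single = false
max_file_size=5300000000
header = false;"""


sql = build_unload_sql(datetime(2020, 9, 9))
print(sql.splitlines()[0])  # copy into @unload_gcs/LH_TBL_FIRST20200909.csv.gz
```

With the variable names from the question, the result would then be passed to cs.execute(sql), followed by cs.close() and ctx.close().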
from datetime import datetime
class FileNameWithDateTime(object):
    def __init__(self, fileNameAppender, fileExtension="txt"):
        self.fileNameAppender = fileNameAppender
        self.fileExtension = fileExtension
    def appendCurrentDateTimeInFileName(self,filePath):
        currentTime = self.fileNameAppender
        print(currentTime.strftime("%Y%m%d"))
        filePath+=currentTime.strftime("%Y%m%d")
        filePath+="."+self.fileExtension
        try:
            with open(filePath, "a") as fwrite1:
                fwrite1.write(filePath)
        except OSError as oserr:
            print("Error while writing ",oserr)
I take the following approach:
from datetime import datetime
# Define the date value the filename will contain
date_id = datetime.today().strftime('%Y%m%d')
Write the output file:
# Create the filename and open the file for writing
with open(date_id + "_" + "LH_TBL.csv.gz", 'w') as out_file:
    pass  # write your data here
Output: YYYYMMDD_filename, for example
20200908_LH_TBL.csv.gz

Unable to insert calculated field in PIVOT TABLE created using win32COM python library

I am trying to insert a calculated field in a pivot table created using the win32com Python library. But when I execute my code, Excel gives me the error "References, names and arrays are not supported in Pivot Table formulas".
import win32com.client
Excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
win32c = win32com.client.constants
Wb = Excel.Workbooks.Open('MyWorkbook')
Excel.Visible = True
Ws = Wb.Sheets('PR Jan20')
Wb.Sheets.Add()
Wb.ActiveSheet.Name = 'PivotSheet'
WsP = Wb.Sheets('PivotSheet')
MaxR = Ws.UsedRange.Rows.Count
MaxC = Ws.UsedRange.Columns.Count
C1 = Ws.Cells(1,1)
C2 = Ws.Cells(MaxR, MaxC)
PivotSourceRange = Ws.Range(C1,C2)
PCache = Wb.PivotCaches().Create(SourceType=win32c.xlDatabase, SourceData=PivotSourceRange,Version=win32c.xlPivotTableVersion14)
PTable = PCache.CreatePivotTable(TableDestination=WsP.Range('B2'), TableName='RegisterPivot', DefaultVersion=win32c.xlPivotTableVersion14)
PTable.PivotFields('Party').Orientation = win32c.xlRowField
PTable.PivotFields('Party').Position = 1
PTable.AddDataField(PTable.PivotFields('Gross Kgs'))
PTable.AddDataField(PTable.PivotFields('Amount (RS.)'))
#till above this line code is working fine
#this below line is causing issue
PTable.CalculatedFields().Add('Average Purchase Rate', '= Amount (RS.) / Gross Kgs')
'Excel Error'
I have managed to resolve the above issue. The problem was the column name "Amount (RS.)": I renamed the column to "Amount" and everything worked fine. I think the pivot table formula does not cope with parentheses () in a pivot field name.

Save a file name as "date - backup"

I am currently exporting a table from BigQuery to GCS as another form of backup. This is the code I have so far; it saves the file as "Firebase_ConnectionInfo.csv".
# Export table to GCS as a CSV
data = 'dataworks-356fa'
destination = 'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv'
def export_data_to_gcs(data, Firebase_ConnectionInfo, destination):
    bigquery_client = bigquery.Client(data)
    dataset = bigquery_client.dataset('FirebaseArchive')
    table = dataset.table('Firebase_ConnectionInfo')
    job_name = str(uuid.uuid4())
    job = bigquery_client.extract_table_to_storage(
        job_name, table, 'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv')
    job.source_format = 'CSV'
    job.begin()
    wait_for_job(job)
def wait_for_job(job):
    while True:
        job.reload()
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)
export_data_to_gcs(data, 'Firebase_ConnectionInfo', destination)
I want this file to be named as "thedate_firebase_connectioninfo_backup". How do I add this command in a Python script?
So this is your string:
'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv'
What I would suggest is putting it into its own variable:
filename = 'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv'
Additionally, we should put in a spot for the date. We can handle formatting the string a couple different ways, but this is my preferred method:
filename = 'gs://firebase_results/firebase_backups1/{date}-Firebase_ConnectionInfo.csv'
We can then call format() on the filename with the date like this (format() returns a new string, so assign the result):
from datetime import datetime
date = datetime.now().strftime("%m-%d-%Y")
filename = filename.format(date=date)
Another way to format the string is the old %-style formatting. I hate this method, but some people like it. I think it may be faster.
date = datetime.now().strftime("%m-%d-%Y")
filename = 'gs://firebase_results/firebase_backups1/%s-Firebase_ConnectionInfo.csv' % date
Or, you could use the other guy's answer and just add the strings like
"This " + "is " + "a " + "string."
outputs: "This is a string."
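The strftime directives are case-sensitive, which is easy to get wrong: lowercase %m and %d are month and day of the month, while uppercase %M is the minute. A short sketch with a fixed timestamp, chosen here only so that the result is reproducible, shows the difference; the variable names are illustrative.

```python
from datetime import datetime

template = 'gs://firebase_results/firebase_backups1/{date}-Firebase_ConnectionInfo.csv'

# Fixed timestamp for illustration; a real script would use datetime.now().
moment = datetime(2020, 9, 8, 14, 30)

date = moment.strftime('%m-%d-%Y')     # month-day-year
minutes = moment.strftime('%M')        # minute, not month

destination = template.format(date=date)
print(destination)
# gs://firebase_results/firebase_backups1/09-08-2020-Firebase_ConnectionInfo.csv
print(minutes)
# 30
```

A year-first pattern such as '%Y-%m-%d' would additionally make the backup files sort chronologically by name.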
Try something like this:
import datetime
datestr = datetime.date.today().strftime("%B-%d-%Y")
destination = 'gs://firebase_results/firebase_backups1/' + datestr + '_Firebase_ConnectionInfo.csv'
