I am currently exporting a table from BigQuery to GCS as another form of backup. This is the code I have so far; it saves the file as "firebase_connectioninfo.csv".
# Export table to GCS as a CSV
import time
import uuid
from google.cloud import bigquery

data = 'dataworks-356fa'
destination = 'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv'

def export_data_to_gcs(data, Firebase_ConnectionInfo, destination):
    bigquery_client = bigquery.Client(data)
    dataset = bigquery_client.dataset('FirebaseArchive')
    table = dataset.table('Firebase_ConnectionInfo')
    job_name = str(uuid.uuid4())
    job = bigquery_client.extract_table_to_storage(
        job_name, table, 'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv')
    job.source_format = 'CSV'
    job.begin()
    wait_for_job(job)

def wait_for_job(job):
    # Poll once per second until the job finishes, raising if it failed
    while True:
        job.reload()
        if job.state == 'DONE':
            if job.error_result:
                raise RuntimeError(job.errors)
            return
        time.sleep(1)

export_data_to_gcs(data, 'Firebase_ConnectionInfo', destination)
I want this file to be named as "thedate_firebase_connectioninfo_backup". How do I add this command in a Python script?
So this is your string:
'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv'
What I would suggest is putting it into its own variable:
filename = 'gs://firebase_results/firebase_backups1/Firebase_ConnectionInfo.csv'
Additionally, we should put in a spot for the date. We can handle formatting the string a couple different ways, but this is my preferred method:
filename = 'gs://firebase_results/firebase_backups1/{date}-Firebase_ConnectionInfo.csv'
We can then call format() on the filename with the date like this. Note that format() returns a new string, so assign the result:
from datetime import datetime
date = datetime.now().strftime("%m-%d-%Y")
filename = filename.format(date=date)
Another way we could format the string would be the old string formatting style with %. I hate this method, but some people like it. I think it may be faster.
date = datetime.now().strftime("%m-%d-%Y")
filename = 'gs://firebase_results/firebase_backups1/%s-Firebase_ConnectionInfo.csv' % date
Or, you could use the other guy's answer and just add the strings like
"This " + "is " + "a " + "string."
outputs: "This is a string."
Try something like this:
import datetime
datestr = datetime.date.today().strftime("%B-%d-%Y")
destination = 'gs://firebase_results/firebase_backups1/' + datestr + '_Firebase_ConnectionInfo.csv'
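The suggestions above can be pulled together into one small helper. This is only a sketch: the function name dated_destination, the "_backup" suffix and the YYYY-MM-DD date order are assumptions made for illustration (that order sorts chronologically in a bucket listing), not something the question prescribes.

```python
from datetime import date

def dated_destination(prefix, table_name, when=None):
    """Return a gs:// path whose file name starts with the date (YYYY-MM-DD)."""
    when = when or date.today()          # default to today's date
    datestr = when.strftime("%Y-%m-%d")  # %m = month, %d = day, %Y = 4-digit year
    return f"{prefix}/{datestr}_{table_name}_backup.csv"

destination = dated_destination(
    "gs://firebase_results/firebase_backups1", "Firebase_ConnectionInfo"
)
print(destination)
```

The resulting string can then be passed to the export call in place of the hard-coded path.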
I am running into an issue with validating that an input is a valid date and, if it is not, going back to the question for a retry.
I want to loop through this header and ask for input for each item.
header = ['employee_id', 'name', 'address', 'ssn', 'date_of_birth', 'job_title', 'start_date', 'end_date']
The CSV is empty aside from the header, as I am appending rows to it via this program. I want date_validater() to work for date_of_birth, start_date and end_date. So far I can get it to detect that the input is wrong, but it doesn't go back and ask for the input again.
Any help would be appreciated! Thanks!
import csv
import datetime

def add_employee():
    global date_answer
    list = []
    for i in range(len(header)):
        var = header[i]
        answer1 = input('Input Employees {}:'.format(var))
        if "date" in header[i]:
            date_answer = answer1
            date_validater()
        list.append(answer1)
    with open('employees.csv','a',newline="") as f_object:
        writer = csv.writer(f_object)
        writer.writerow(list)
        f_object.close()
    print()

def date_validater():
    # input date
    date_string = date_answer
    date_format = '%m/%d/%Y'
    try:
        dateObject = datetime.datetime.strptime(date_string, date_format)
        print(dateObject)
    except ValueError:
        print("Incorrect data format, should be MM/DD/YYYY")
A couple of hints:
Functions take parameters...
Globals are a terrible idea most of the time. Avoid them like the plague.
Pass the variable to the function to use it inside the function.
import csv
import datetime

def add_employee(header):
    list = []
    for i in range(len(header)):
        var = header[i]
        answer1 = input('Input Employees {}:'.format(var))
        if "date" in header[i]:
            date_validater(answer1)
        list.append(answer1)
    with open('employees.csv','a',newline="") as f_object:
        writer = csv.writer(f_object)
        writer.writerow(list)
        f_object.close()
    print()

def date_validater(date_string):
    # input date
    date_format = '%m/%d/%Y'
    try:
        dateObject = datetime.datetime.strptime(date_string, date_format)
        print(dateObject)
    except ValueError:
        print("Incorrect data format, should be MM/DD/YYYY")
or something like that...
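Passing the value as a parameter removes the global, but it still does not ask again after a bad date. One way to get the retry is to have the validator return True or False and wrap the prompt in a loop. A minimal sketch follows; the names is_valid_date and ask_until_valid, and the read parameter (there only so the loop can be exercised without a keyboard), are hypothetical and not part of the original code.

```python
import datetime

DATE_FORMAT = "%m/%d/%Y"

def is_valid_date(text, date_format=DATE_FORMAT):
    """Return True if text parses with date_format, otherwise False."""
    try:
        datetime.datetime.strptime(text, date_format)
        return True
    except ValueError:
        return False

def ask_until_valid(prompt, read=input):
    """Keep asking until the answer is a valid MM/DD/YYYY date, then return it."""
    while True:
        answer = read(prompt)
        if is_valid_date(answer):
            return answer
        print("Incorrect date format, should be MM/DD/YYYY")
```

Inside add_employee, the date branch would then become something like answer1 = ask_until_valid('Input Employees {}:'.format(var)), while non-date fields keep using plain input().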
Below is the format of the DAG I am using.
I loop over the input files in the DAG and rename the output files using the current datetime:
CurrentDateTime = datetime.datetime.today().strftime("%Y%m%d%H%M%S")
# if CurrentDateTime = 20210406010203
outputfile = ''
outputfile1 = ''
outputfile2 = ''
inputfiles = ['input1', 'input2']
API_CALL_TASK1 = {
    source : inputfile1
    filename : outputfile1 #20210406010513
}
API_CALL_TASK2 = {
    source : inputfile2
    filename : outputfile2
}
for file in inputfiles:
    if file == 'input1':
        outputfile1 = f'inputFileName_{CurrentDateTime}' #20210406010303
        outputfile = f'inputFileName_{CurrentDateTime}' #20210406010303
    if file == 'input2':
        outputfile2 = f'inputFileName_{CurrentDateTime}'
        outputfile = f'inputFileName_{CurrentDateTime}'
MOVE_OUTPUT_TO_BUCKET_TASK = (
    filename = f'{outputfile}' #20210406010423
)
MOVE_OUTPUT_TO_BUCKET_TASK >> API_CALL_TASK1 >> API_CALL_TASK1
In the tasks API_CALL_TASK1, API_CALL_TASK2 and MOVE_OUTPUT_TO_BUCKET_TASK, the datetime in the filename differs because each task is triggered at a different time.
How can I get a uniform datetime for each file?
I want to pass the same filename from the loop to MOVE_OUTPUT_TO_BUCKET_TASK and to API_CALL_TASK1 or API_CALL_TASK2.
As I understand it, you want to pass the same datetime, in the format YYYYMMDD, to both outputs in the loop.
Airflow provides a set of Default Variables that can be used across all templates, so you can use one of them to pass a constant value throughout your template. In your case, I see that you want the execution date to be the suffix of your output. Thus, you should use ds_nodash which, as per the documentation, gives the execution date as YYYYMMDD.
If you are using a Python operator with **kwargs, you can access it as kwargs['ds_nodash'].
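To illustrate why this gives every task the same suffix, here is a minimal sketch that runs without Airflow. The dictionary fake_context only stands in for the context Airflow would pass to a Python callable, and build_output_name is a hypothetical helper, not an Airflow function.

```python
def build_output_name(prefix, **kwargs):
    """Build a file name from the run's ds_nodash value (YYYYMMDD)."""
    return f"{prefix}_{kwargs['ds_nodash']}"

# Stand-in for the context Airflow supplies; in a real DAG this value
# comes from the scheduler and is identical for every task in the same run.
fake_context = {"ds_nodash": "20210406"}

name_for_api_task = build_output_name("inputFileName", **fake_context)
name_for_move_task = build_output_name("inputFileName", **fake_context)
print(name_for_api_task)
print(name_for_move_task)
```

Because the name is derived from the run's execution date rather than from the wall clock at the moment each task starts, both calls produce the same string.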
I'm trying to unload data from Snowflake to GCS using the Snowflake Python connector and a Python script. In the script below, the file name is 'LH_TBL_FIRST20200908'. If the script runs today the name should stay the same; if it runs tomorrow the file name should be 'LH_TBL_FIRST20200909', and the day after that 'LH_TBL_FIRST20200910'.
Also, please tell me if the code has any mistakes in it. The code is below:
import snowflake.connector
# Gets the version
ctx = snowflake.connector.connect(
    user='*****',
    password='*******',
    account='********',
    warehouse='*******',
    database='********',
    schema='********'
)
cs = ctx.cursor()
sql = "copy into @unload_gcs/LH_TBL_FIRST20200908.csv.gz
from ( select * from TEST_BASE.LH_TBL_FIRST )
file_format =
( type=csv compression='gzip'
FIELD_DELIMITER = ','
field_optionally_enclosed_by='"'
NULL_IF=()
EMPTY_FIELD_AS_NULL = FALSE
)
single = false
max_file_size=5300000000
header = false;"
cur.execute(sql)
cur.close()
conn.close()
You can use f-strings to fill in (part of) your filename. Python has the datetime module to handle dates and times.
from datetime import datetime
date = datetime.now().strftime('%Y%m%d')
myFileName = f'LH_TBL_FIRST{date}.csv.gz'
print(myFileName)
>>> LH_TBL_FIRST20200908.csv.gz
As for errors in your code:
You store the connection in ctx and the cursor in cs, but further along you call cur.execute(...), cur.close() and conn.close(). Those names are never defined, so these calls won't work. Run your code to find the errors and fix them.
Edit suggested by @Lysergic:
If your Python version is older than 3.6 (the release that introduced f-strings), you could use str.format().
myFileName = 'LH_TBL_FIRST{0}.csv.gz'.format(date)
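Putting the dated file name into the COPY statement could look roughly like the sketch below. It only builds the SQL text and does not connect to Snowflake. The helper name build_unload_sql is hypothetical, and writing the stage as @unload_gcs is an assumption based on the question's statement.

```python
from datetime import date

def build_unload_sql(run_date=None):
    """Return the COPY INTO statement with the run date in the file name."""
    run_date = run_date or date.today()
    file_name = f"LH_TBL_FIRST{run_date.strftime('%Y%m%d')}.csv.gz"
    # Adjacent string literals are joined by Python, which avoids the
    # unterminated multi-line string in the original script.
    return (
        f"copy into @unload_gcs/{file_name} "
        "from (select * from TEST_BASE.LH_TBL_FIRST) "
        "file_format = (type=csv compression='gzip' "
        "field_delimiter=',' field_optionally_enclosed_by='\"' "
        "null_if=() empty_field_as_null=false) "
        "single=false max_file_size=5300000000 header=false"
    )

print(build_unload_sql())
```

The returned string would then be passed to the cursor's execute() call, using the same cursor variable that was created from the connection.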
from datetime import datetime

class FileNameWithDateTime(object):
    def __init__(self, fileNameAppender, fileExtension="txt"):
        self.fileNameAppender = fileNameAppender
        self.fileExtension = fileExtension

    def appendCurrentDateTimeInFileName(self, filePath):
        # fileNameAppender is expected to be a datetime object
        currentTime = self.fileNameAppender
        print(currentTime.strftime("%Y%m%d"))
        filePath += currentTime.strftime("%Y%m%d")
        filePath += "." + self.fileExtension
        try:
            with open(filePath, "a") as fwrite1:
                fwrite1.write(filePath)
        except OSError as oserr:
            print("Error while writing ", oserr)
I take the following approach:
# Define which date-related value the variable will contain
date_id = datetime.today().strftime('%Y%m%d')
Write the output file.
# Create the filename
with open(date_id + "_" + "LH_TBL.csv.gz", 'w') as gzip:
    pass  # write the file contents here
Output format: YYYYMMDD_filename, for example
20200908_filename
Hi everyone, this is my first time here, and I am a beginner in Python. I am in the middle of writing a program that produces a txt document containing information about stocks (Watchlist Info.txt), based on another txt document containing the company names (Watchlist).
To achieve this, I have written 3 functions, of which 2, reuters_ticker() and stock_price(), are complete, as shown below:
def reuters_ticker(desired_search):
    # from the company name, execute a Google search and return the Reuters stock ticker
    try:
        from googlesearch import search
    except ImportError:
        print('No module named google found')
    query = desired_search + ' reuters'
    for j in search(query, tld="com.sg", num=1, stop=1, pause=2):
        result = j
    ticker = re.search(r'\w+\.\w+$', result)
    return ticker.group()
Stock Price:
def stock_price(company, doc=None):
    ticker = reuters_ticker(company)
    request = 'https://www.reuters.com/companies/' + ticker
    raw_main = pd.read_html(request)
    data1 = raw_main[0]
    data1.set_index(0, inplace=True)
    data1 = data1.transpose()
    data2 = raw_main[1]
    data2.set_index(0, inplace=True)
    data2 = data2.transpose()
    stock_info = pd.concat([data1,data2], axis=1)
    if doc == None:
        print(company + '\n')
        print('Previous Close: ' + str(stock_info['Previous Close'][1]))
        print('Forward PE: ' + str(stock_info['Forward P/E'][1]))
        print('Div Yield(%): ' + str(stock_info['Dividend (Yield %)'][1]))
    else:
        from datetime import date
        with open(doc, 'a') as output:
            output.write(date.today().strftime('%d/%m/%y') + '\t' + str(stock_info['Previous Close'][1]) + '\t' + str(stock_info['Forward P/E'][1]) + '\t' + '\t' + str(stock_info['Dividend (Yield %)'][1]) + '\n')
            output.close()
The 3rd function, watchlist_report(), is where I am getting problems with writing the information in the format as desired.
def watchlist_report(watchlist):
    with open(watchlist, 'r') as companies, open('Watchlist Info.txt', 'a') as output:
        searches = companies.read()
        x = searches.split('\n')
        for i in x:
            output.write(i + ':\n')
            stock_price(i, doc='Watchlist Info.txt')
            output.write('\n')
When I run watchlist_report('Watchlist.txt'), where Watchlist.txt contains 'Apple' and 'Facebook' each on new lines, my output is this:
26/04/20 275.03 22.26 1.12
26/04/20 185.13 24.72 --
Apple:
Facebook:
Instead of what I want and would expect based on the code I have written in watchlist_report():
Apple:
26/04/20 275.03 22.26 1.12
Facebook:
26/04/20 185.13 24.72 --
Therefore, my questions are:
1) Why is my output formatted this way?
2) Which part of my code do I have to change to make the written output in my desired format?
Any other suggestions about how I can clean my code and any libraries I can use to make my code nicer are also appreciated!
You are handling two different file handles. The handle opened inside stock_price() gets closed (and therefore flushed) first, so its data is written first, before the handle of the outer function watchlist_report() is closed, flushed and written.
Instead of creating a new open(..) in your function, pass the current file handle:
def watchlist_report(watchlist):
    with open(watchlist, 'r') as companies, open('Watchlist Info.txt', 'a') as output:
        searches = companies.read()
        x = searches.split('\n')
        for i in x:
            output.write(i + ':\n')
            stock_price(i, output=output) # pass the file handle
            output.write('\n')
Inside def stock_price(company, doc=None): use the provided filehandle:
def stock_price(company, output = None): # changed name here
    # [snip] - removed unrelated code for this answer for brevity's sake
    if output is None: # check for None using IS
        print( ... ) # print whatever you like here
    else:
        from datetime import date
        output.write( .... ) # write whatever you want it to write
        # output.close() # do not close, the outer function does this
Do not close the file handle in the inner function; the context handling with(..) of the outer function does that for you.
The main takeaway for file handling is that things you write(..) to your file are not necessarily placed there immediately. The file object decides when to actually persist data to your disk: at the latest when it is closed (for example on leaving the with block), or when its internal buffer reaches some threshold so it "thinks" it is now prudent to write the data to your disk. See How often does python flush to a file? for more info.
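The buffering effect described above can be reproduced in a few lines. This is a small sketch with made-up function names (two_handles, one_handle) and a temporary directory; it relies on the writes being small enough to stay in each handle's buffer until that handle is closed.

```python
import os
import tempfile

def two_handles(path):
    """Write through two independent handles, as in the question."""
    with open(path, "a") as outer:
        outer.write("Apple:\n")          # stays in outer's buffer for now
        with open(path, "a") as inner:
            inner.write("26/04/20\n")    # stays in inner's buffer for now
        # inner is closed here, so its text reaches the file first
    # outer is closed here, so its text is appended afterwards
    with open(path) as f:
        return f.read()

def one_handle(path):
    """Write everything through a single shared handle."""
    with open(path, "a") as outer:
        outer.write("Apple:\n")
        outer.write("26/04/20\n")
    with open(path) as f:
        return f.read()

workdir = tempfile.mkdtemp()
print(repr(two_handles(os.path.join(workdir, "two.txt"))))
print(repr(one_handle(os.path.join(workdir, "one.txt"))))
```

With two handles the date line lands before the company line; with one shared handle the lines appear in the order they were written.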
I am working on a script to collect users' browser history with timestamps (educational setting).
Firefox 3 history is kept in a sqlite file, and the timestamps are in UNIX epoch time. Getting them and converting them to a readable format via a SQL command in Python is pretty straightforward:
sql_select = """ SELECT datetime(moz_historyvisits.visit_date/1000000,'unixepoch','localtime'),
moz_places.url
FROM moz_places, moz_historyvisits
WHERE moz_places.id = moz_historyvisits.place_id
"""
get_hist = list(cursor.execute (sql_select))
Chrome also stores history in a sqlite file, but its history timestamp is apparently formatted as the number of microseconds since midnight UTC of 1 January 1601.
How can this timestamp be converted to a readable format as in the Firefox example (like 2010-01-23 11:22:09)? I am writing the script with Python 2.5.x (the version on OS X 10.5) and importing the sqlite3 module.
Try this:
sql_select = """ SELECT datetime(last_visit_time/1000000-11644473600,'unixepoch','localtime'),
url
FROM urls
ORDER BY last_visit_time DESC
"""
get_hist = list(cursor.execute (sql_select))
Something along those lines seems to be working for me.
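The constant 11644473600 in that query is the number of seconds between 1 January 1601 and 1 January 1970, so subtracting it turns a Chrome timestamp (once divided down to seconds) into a UNIX timestamp. The sketch below checks that arithmetic; it is written for Python 3 rather than the 2.5 mentioned in the question, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

WEBKIT_EPOCH = datetime(1601, 1, 1)
UNIX_EPOCH = datetime(1970, 1, 1)

# Seconds between the two epochs, derived instead of hard-coded
EPOCH_OFFSET_SECONDS = int((UNIX_EPOCH - WEBKIT_EPOCH).total_seconds())

def webkit_to_unix_seconds(webkit_us):
    """Convert microseconds since 1601-01-01 to whole seconds since 1970-01-01."""
    return webkit_us // 1000000 - EPOCH_OFFSET_SECONDS

def webkit_to_datetime(webkit_us):
    """Convert microseconds since 1601-01-01 to a naive UTC datetime."""
    return WEBKIT_EPOCH + timedelta(microseconds=webkit_us)

print(EPOCH_OFFSET_SECONDS)
print(webkit_to_datetime(EPOCH_OFFSET_SECONDS * 1000000))
```

The first function mirrors what the SQL expression does before handing the value to datetime(..., 'unixepoch'); the second does the whole conversion on the Python side.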
This is a more pythonic and memory-friendly way to do what you described (by the way, thanks for the initial code!):
#!/usr/bin/env python
import os
import datetime
import sqlite3
import opster
from itertools import izip

SQL_TIME = 'SELECT time FROM info'
SQL_URL = 'SELECT c0url FROM pages_content'

def date_from_webkit(webkit_timestamp):
    epoch_start = datetime.datetime(1601,1,1)
    delta = datetime.timedelta(microseconds=int(webkit_timestamp))
    return epoch_start + delta

@opster.command()
def import_history(*paths):
    for path in paths:
        assert os.path.exists(path)
        c = sqlite3.connect(path)
        times = (row[0] for row in c.execute(SQL_TIME))
        urls = (row[0] for row in c.execute(SQL_URL))
        for timestamp, url in izip(times, urls):
            date_time = date_from_webkit(timestamp)
            print date_time, url
        c.close()

if __name__=='__main__':
    opster.dispatch()
The script can be used this way:
$ ./chrome-tools.py import-history ~/.config/chromium/Default/History* > history.txt
Of course Opster can be thrown out but seems handy to me :-)
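The sqlite3 module can return datetime objects for columns declared as timestamp (when the connection is opened with detect_types=sqlite3.PARSE_DECLTYPES), and those objects have a formatting method called strftime for printing readable strings. Note that the value produced by the SQL datetime() function is already a formatted string, so this only applies when you get a real datetime object back.
In that case you can do something like this once you have the recordset:
for record in get_hist:
    date_string = record[0].strftime("%Y-%m-%d %H:%M:%S")
    url = record[1]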
This may not be the most Pythonic code in the world, but here's a solution. I cheated by adjusting for the time zone (EST here) like this:
utctime = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds = ms, hours =-5)
Here's the function. It assumes that the Chrome history file has been copied from another account into /Users/someuser/Documents/tmp/Chrome/History
import datetime
import os
import shutil
import sqlite3

def getcr():
    connection = sqlite3.connect('/Users/someuser/Documents/tmp/Chrome/History')
    cursor = connection.cursor()
    get_time = list(cursor.execute("""SELECT last_visit_time FROM urls"""))
    get_url = list(cursor.execute("""SELECT url from urls"""))
    stripped_time = []
    crf = open ('/Users/someuser/Documents/tmp/cr/cr_hist.txt','w' )
    itr = iter(get_time)
    itr2 = iter(get_url)
    while True:
        try:
            newdate = str(itr.next())
            stripped1 = newdate.strip(' (),L')
            ms = int(stripped1)
            utctime = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds = ms, hours =-5)
            stripped_time.append(str(utctime))
            newurl = str(itr2.next())
            stripped_url = newurl.strip(' ()')
            stripped_time.append(str(stripped_url))
            crf.write('\n')
            crf.write(str(utctime))
            crf.write('\n')
            crf.write(str(newurl))
            crf.write('\n')
            crf.write('\n')
            crf.write('********* Next Entry *********')
            crf.write('\n')
        except StopIteration:
            break
    crf.close()
    shutil.copy('/Users/someuser/Documents/tmp/cr/cr_hist.txt' , '/Users/parent/Documents/Chrome_History_Logs')
    # formatdate is not defined in this snippet; it is assumed to be set elsewhere
    os.rename('/Users/someuser/Documents/Chrome_History_Logs/cr_hist.txt','/Users/someuser/Documents/Chrome_History_Logs/%s.txt' % formatdate)