That's the clearest I could make my title.
I have some code that reads in two CSV files. One CSV file has the data, and the other has information about this data... let's call it config.
data_jan2018.csv
data_feb2018.csv
config.csv
Now, config has columns that specify which dates I want to read in. I'm reading these in as follows:
data_config = pd.read_csv(loc + data_config_name)
# Read in dates from config file
dates = data_config.drop_duplicates('Start_date')
dates = dates[['Start_date','End_date']]
print(dates)
Start_date = dates['Start_date'].tolist()
End_date = dates['End_date'].tolist()
StartDate = ''.join(Start_date)
EndDate = ''.join(End_date)
print(StartDate)
print(EndDate)
date1 = datetime.strptime(StartDate, '%d%b%Y')
date2 = datetime.strptime(EndDate, '%d%b%Y')
# Loop across months
for dt in rrule.rrule(rrule.MONTHLY, dtstart=date1, until=date2):
    print(dt)
    reporting_date = dt.strftime('%d%b%Y')
    reporting_date_fmt = dt.strftime(date_format)
    print('Formatted reporting_date is ' + reporting_date_fmt)
    source_data = pd.read_csv(loc + source_data_name)
    source_data.columns = source_data.columns.str.lower()
As you can see, I want to read in a CSV file called source_data_name. However, this file name contains my formatted reporting_date_fmt. I want the programmer to edit the file name at the beginning of the code, so I have these lines right at the top:
date_format = '%b%Y'
source_data_name = 'g4_RWA_sample_' + reporting_date_fmt + '.csv'
But of course this flags a warning, telling me reporting_date_fmt hasn't been created yet. Is there a workaround to this?
Define the data name separately at the top of the file, then append the formatted date and extension once reporting_date_fmt has been created.
data_name = 'g4_RWA_sample_'
...
source_data_name = data_name + reporting_date_fmt + '.csv'
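For instance, a minimal sketch of that arrangement (abbreviated; the names loc, dt and date_format come from the question, and reporting_date_fmt is only used once it exists):
# at the top of the file: the programmer only ever edits these lines
date_format = '%b%Y'
data_name = 'g4_RWA_sample_'
...
# inside the monthly loop, after reporting_date_fmt has been computed
reporting_date_fmt = dt.strftime(date_format)
source_data_name = data_name + reporting_date_fmt + '.csv'
source_data = pd.read_csv(loc + source_data_name)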
I have a function in Python pandas like the one below (it is a sample):
import pandas as pd
import time

def xyz():
    df = pd.DataFrame({"a": [1, 1, 1]})
    TodaysDate = time.strftime("%Y-%m-%d")
    excelfilename = "raport_" + TodaysDate + ".xlsx"
    df.to_excel(excelfilename, sheet_name="raport", index=True)
    return df
Every time I run the above function, the existing Excel file gets overwritten, but I need a new Excel file to be created on every run. How can I modify my function to do that in Python?
Maybe something like this; it depends on how many Excel files you want to generate.
import random
excelfilename = "raport_" + str(random.randrange(9999)) + TodaysDate + ".xlsx"
You can change TodaysDate = time.strftime("%Y-%m-%d") to TodaysDate = time.strftime("%Y-%m-%d %X").replace(":", "") or TodaysDate = time.strftime("%Y-%m-%d %H%M%S")
This will give you an additional hour/minute/second component in the file name, so unless you are running this function multiple times per second, it should cover your needs.
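As a minimal sketch of that file name construction (the %H%M%S form avoids ':', which is not allowed in Windows file names):
import time

# date plus time-of-day keeps each run's file name distinct
TodaysDate = time.strftime("%Y-%m-%d %H%M%S")
excelfilename = "raport_" + TodaysDate + ".xlsx"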
So you can do it this way:
def xyz(itr):
    df = pd.DataFrame({"a": [1, 1, 1]})
    TodaysDate = time.strftime("%Y-%m-%d")
    excelfilename = "raport_" + TodaysDate + str(itr) + ".xlsx"
    df.to_excel(excelfilename, sheet_name="raport", index=True)
    return df

for i in range(9):  # or some other iteration
    xyz(i)
You should really use ArchAngelPwn's solution in my opinion. Applying random numbers might work, but there is still a chance that two runs draw the same number and one file gets overwritten. Also, you might not know which file belongs to which loop/run.
You could also save your file names in a list and check whether the name already exists:
fn = []

def xyz():
    df = pd.DataFrame({"a": [1, 1, 1]})
    TodaysDate = time.strftime("%Y-%m-%d")
    excelfilename = "raport_" + TodaysDate + ".xlsx"
    # count how many times this base name has already been used in this session
    temp = fn.count(excelfilename)
    fn.append(excelfilename)
    df.to_excel("raport_" + TodaysDate + "_" + str(temp) + ".xlsx", sheet_name="raport", index=True)
    return df
That is because of the ':' characters in your excelfilename: %X produces a time with colons, and colons are not allowed in Windows file names.
Just change:
TodaysDate = time.strftime("%Y-%m-%d %X")
To:
TodaysDate = time.strftime("%Y-%m-%d %H_%M_%S")
I have double-checked and it works on my side :)
The full code:
import pandas as pd
import time

def xyz():
    df = pd.DataFrame({"a": [1, 1, 1]})
    TodaysDate = time.strftime("%Y-%m-%d %H_%M_%S")
    excelfilename = "raport_" + TodaysDate + ".xlsx"
    df.to_excel(excelfilename, sheet_name="raport", index=True)
    return df

xyz()
I am making code which generates a new text file with today's date each time it is run. For example, today's file name would be 2020-10-05. I would like to increment it so that if the program is run one or more times on the same day it becomes 2020-10-05_1, _2, etc.
I have this code that I found in another question and I've tried tinkering with it, but I'm still stuck. The problem is that it converts the file name to an int (1, 2, 3), and while that works, it isn't the result I want.
def incrementfile():
    todayday = datetime.datetime.today().date()
    output_folder = "//10.2.30.61/c$/Qlikview_Tropal/Raport/"
    highest_num = 0
    for f in os.listdir(output_folder):
        if os.path.isfile(os.path.join(output_folder, f)):
            file_name = os.path.splitext(f)[0]
            try:
                file_num = int(file_name)
                if file_num > highest_num:
                    highest_num = file_num
            except ValueError:
                print("The file name %s is not an integer. Skipping" % file_name)
    output_file = os.path.join(output_folder, str(highest_num + 1) + f"{todayday}" + ".txt")
    return output_file
How can I modify this code so that the output I get in the end is something like 2020-10-05_0, _1, _2, etc.?
Thanks!
I strongly recommend using pathlib instead of os.path.join. It is more convenient.
import datetime
import pathlib

def incrementfile():
    td = datetime.datetime.today().date()
    path = pathlib.Path("/tmp")  # set your output folder instead of /tmp
    inc = len(list(path.glob(f"{td}*"))) + 1
    outfile = path / f"{td}_{inc}.txt"
    return outfile
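For illustration, a possible usage (writing the file is what lets the next call see it and bump the suffix; the content here is just a placeholder):
outfile = incrementfile()
outfile.write_text("log entry")  # e.g. /tmp/2020-10-05_1.txt, then _2.txt on the next call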
Not a direct answer to your question, but instead of using _1, _2, etc., you could use a full timestamp with the date and current time, which would avoid duplication, e.g.:
from datetime import datetime
t = str(datetime.now()).replace(":", "-").replace(" ", "_")
print(t)
Example output:
2020-10-05_13-06-53.825870
I think this will work:
import os
import datetime

# assuming files will be in .txt format
def incrementfile():
    output_folder = "//10.2.30.61/c$/Qlikview_Tropal/Raport/"
    files = os.listdir(output_folder)
    base_name = datetime.date.today().strftime('%Y-%m-%d')
    current_num = 0

    def name_checker(name, files):
        return name + '.txt' in files

    # bump the suffix until the name is not taken yet
    current_name = base_name + '_' + str(current_num)
    while name_checker(current_name, files):
        current_num += 1
        current_name = base_name + '_' + str(current_num)
    return current_name + '.txt'
I have written a script which works but is not very elegant. It merges CSV files, outputs a new file, filters that file to the required conditions, then outputs the filtered file, which is the file I want. I then repeat the process for every month.
Rather than altering this code to process every month (I have five more years' worth of data to go), I would like to automate the directory paths and exported CSV file names that change from one month (and year) to the next.
See the snippet for Jan and Feb below:
import os
import glob
import pandas as pd
import shutil
path = r"C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\20xx01"
os.chdir(path)
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv("201401.csv", index=False, encoding='utf-8-sig')
grab1 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\20xx01\201401.csv'
move1 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\2014\2014-01.csv'
shutil.move(grab1,move1)
fd = pd.read_csv(r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\2014\2014-01.csv')
df = pd.DataFrame(fd)
irishsea = df[(df.lat_bin >= 5300) & (df.lat_bin <= 5500) & (df.lon_bin >= -650) & (df.lon_bin <= -250)]
irishsea.to_csv("2014-01_irishsea.csv", index=False, encoding='utf-8-sig')
grab2 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\20xx01\2014-01_irishsea.csv'
move2 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\2014\2014-01-IrishSea.csv'
shutil.move(grab2,move2)
I then repeat it for Feb data but have to update the path locations.
#process feb data
path = r"C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\20xx02"
os.chdir(path)
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv("201402.csv", index=False, encoding='utf-8-sig')
grab1 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\20xx02\201402.csv'
move1 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\2014\2014-02.csv'
shutil.move(grab1,move1)
fd = pd.read_csv(r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\2014\2014-02.csv')
df = pd.DataFrame(fd)
irishsea = df[(df.lat_bin >= 5300) & (df.lat_bin <= 5500) & (df.lon_bin >= -650) & (df.lon_bin <= -250)]
irishsea.to_csv("2014-02_irishsea.csv", index=False, encoding='utf-8-sig')
grab2 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\20xx02\2014-02_irishsea.csv'
move2 = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs\2014\2014-02-IrishSea.csv'
shutil.move(grab2,move2)
You can do something like the following. Keep in mind that the second number of range (the stop value) needs to be one value higher than you intend.
for year in range(2014, 2020):
    for month in range(1, 13):
        if month < 10:
            month_as_string = "0" + str(month)
        else:
            month_as_string = str(month)
        date = "%s\\%s-%s" % (year, year, month_as_string)
        pathname = r'YOUR\FILEPATH\HERE' + "\\" + date + '_irishsea.csv'
You can learn more about string formatting here https://www.learnpython.org/en/String_Formatting
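As a rough sketch of how the whole monthly step could sit inside such a loop (the base path, output names and lat/lon bounds are taken from the question; the source folders are written above as 20xx01, 20xx02, so here I assume they are literally named by year and month, e.g. 201401, and that the year output folder already exists; adjust if yours differ):
import glob
import os
import pandas as pd

base = r'C:\Users\jonathan.capanda\Documents\Fishing_DataBase\gfw_data\100_deg_data\daily_csvs'

for year in range(2014, 2020):
    for month in range(1, 13):
        ym = f"{year}{month:02d}"        # e.g. 201401
        ym_dash = f"{year}-{month:02d}"  # e.g. 2014-01
        src_dir = os.path.join(base, ym)  # assumed monthly source folder
        if not os.path.isdir(src_dir):
            continue
        # merge all daily CSVs for this month
        daily_files = glob.glob(os.path.join(src_dir, '*.csv'))
        combined = pd.concat([pd.read_csv(f) for f in daily_files])
        combined.to_csv(os.path.join(base, str(year), f'{ym_dash}.csv'),
                        index=False, encoding='utf-8-sig')
        # filter to the Irish Sea box and write the filtered file
        irishsea = combined[(combined.lat_bin >= 5300) & (combined.lat_bin <= 5500) &
                            (combined.lon_bin >= -650) & (combined.lon_bin <= -250)]
        irishsea.to_csv(os.path.join(base, str(year), f'{ym_dash}-IrishSea.csv'),
                        index=False, encoding='utf-8-sig')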
I don't know if that is the right term for my problem; I hope you can understand it. The purpose of this is to collect files from FTP whose names combine an account name and a date.
I have this program in Python using the Flask framework with a database table account_names.
The template has two dates, a from date and a to date, and I want all the dates in between. I created a for loop to get the region from my database and another for loop to get the dates between the two dates. In this loop, I created a variable idDate that combines the name from the database with each generated date, plus a for loop over the files from FTP.
With this, I want to collect the files whose names contain the idDate value inside the loop over the files. This works in another program of mine that only handles a single date, but when I run it with multiple dates, my print produces no output.
c, conn = connection()
data_name = c.execute("SELECT acct_value FROM account_names ORDER BY region ASC")
data_name = c.fetchall()
current_path = os.getcwd()
save_path = str(current_path) + '/logs/'
ftp_directory = '/R03/sqa/logs/'
chk_log_dir = os.path.isdir('logs')
strdate = datetime.datetime.strptime(from_collect_tab, "%Y-%m-%d")
endate = datetime.datetime.strptime(to_collect_tab, "%Y-%m-%d")
frm_month = strdate.month
frm_day = strdate.day
frm_year = strdate.year
to_month = endate.month
to_day = endate.day
to_year = endate.year
for names in data_name:
    region = c.execute("SELECT region FROM account_names WHERE acct_value = (%s)", (names[0]))
    region = c.fetchone()

    def daterange(date1, date2):
        for n in range(int((date2 - date1).days) + 1):
            yield date1 + timedelta(n)

    start_dt = date(frm_year, frm_month, frm_day)
    end_dt = date(to_year, to_month, to_day)

    for dt in daterange(start_dt, end_dt):
        out_dates = dt.strftime("%Y-%m-%d")
        idDate = names[0] + "-" + out_dates
        # print(idDate)
        if chk_log_dir == True:
            ftp = ftpconnection()
            ftp.cwd(ftp_directory + region[0])
            files = ftp.nlst()
            for file in files:
                # print(file)
                if idDate in file:
                    # ftp.retrbinary('RETR ' + file, open(save_path + file, 'wb').write)
                    print(idDate)
                    # print(file)
            ftp.close()
I am trying to combine multiple .csv files into one .csv file using a pandas DataFrame. The tricky part is that I need to grab multiple files from multiple days. Please let me know if this does not make sense. As it currently stands, I cannot figure out how to loop through the directories. Could you offer some assistance?
import csv
import pandas as pd
import datetime as dt
import glob, os
startDate = 20160613
endDate = 20160614
dateRange = endDate - startDate
dateRange = dateRange + 1
todaysDateFilePath = startDate
for x in xrange(dateRange):
    print startDate
    startDate = startDate + 1
    filePath = os.path.join(r"\\export\path", startDate, "preprocessed")
    os.chdir(filePath)
    interesting_files = glob.glob("trade" + "*.csv")
    print interesting_files
    df_list = []
    for filename in sorted(interesting_files):
        df_list.append(pd.read_csv(filename))
        full_df = pd.concat(df_list)
saveFilepath = r"U:\Chris\Test_Daily_Fails"
fileList = []
full_df.to_csv(saveFilepath + '\\Files_For_IN' + "_0613_" + ".csv", index = False)
IIUC, you can create a list all_files and, inside the loop, append the output from glob to all_files:
all_files = []
for x in xrange(dateRange):
    print startDate
    startDate = startDate + 1
    # os.path.join needs strings, so convert the integer date
    filePath = os.path.join(r"\\export\path", str(startDate), "preprocessed")
    os.chdir(filePath)
    all_files = all_files + glob.glob("trade" + "*.csv")
print all_files
Also, you need to first append all the values to df_list and only then call concat once (I adjusted the indentation so concat runs after the loop):
df_list = []
for filename in sorted(all_files):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list)
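Putting both changes together, a minimal sketch of the whole flow might look like this (Python 2 style kept to match the question; the \\export\path layout and output name come from the question, and converting the integer date with str() is an assumption so os.path.join accepts it):
import glob, os
import pandas as pd

startDate = 20160613
endDate = 20160614
dateRange = endDate - startDate + 1

all_files = []
current = startDate
for x in xrange(dateRange):
    dayPath = os.path.join(r"\\export\path", str(current), "preprocessed")
    # glob with a full pattern returns full paths, so the files can still be read after the loop
    all_files += glob.glob(os.path.join(dayPath, "trade*.csv"))
    current += 1  # note: adding 1 to a yyyymmdd integer only works within a single month

df_list = [pd.read_csv(f) for f in sorted(all_files)]
full_df = pd.concat(df_list)
full_df.to_csv(r"U:\Chris\Test_Daily_Fails" + "\\Files_For_IN_0613_.csv", index=False)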