I have an Excel file like this, and I want the numbers in the date field to be converted to a date like (2021.7.22) and written back into the date field using Python.
You can try something like this:
import pandas as pd

dfs = pd.read_excel('Test.xlsx', sheet_name=None)
output = {}
for ws, df in dfs.items():
    if 'date' in df.columns:
        df['date'] = pd.to_datetime(
            df['date'].apply(lambda x: f'{str(x)[:4]}.{str(x)[4:6 if len(str(x)) > 7 else 5]}.{str(x)[-2:]}')
        ).dt.date
    output[ws] = df

# the context manager saves and closes the workbook on exit
with pd.ExcelWriter('TestOutput.xlsx') as writer:
    for ws, df in output.items():
        df.to_excel(writer, index=None, sheet_name=ws)
For each worksheet in the input xlsx file that contains a date column, this converts the integers it finds to dates, assuming that the month portion may be 1 or 2 digits and that the day portion is always a full 2 digits. If the actual month/day convention in your data is different, adjust the slicing logic accordingly.
The code creates a new output xlsx reflecting the above changes.
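To see what the slicing inside that lambda assumes, here is the same conversion written out as a plain function (the sample integers are invented for illustration):

```python
def to_date_str(x):
    s = str(x)
    # 8-digit integers carry a 2-digit month; 7-digit ones a 1-digit month
    month = s[4:6] if len(s) > 7 else s[4:5]
    # year is always the first 4 digits, day always the last 2
    return f'{s[:4]}.{month}.{s[-2:]}'

print(to_date_str(20210722))  # 2021.07.22
print(to_date_str(2021722))   # 2021.7.22
```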
I need guidance on how to format a value as a date in Pandas before it is written out to an Excel sheet.
I am new to Pandas and had to edit existing code that outputs values to an Excel sheet.
After some conditional/functional calculations are done, the value is output to Excel.
My current value is a string, which is not an Excel-friendly date format.
Output of the value looks like this :
I need to format the output as a date.
I did try strptime, but as I understand it, those values are also returned as strings. The strange part is that I am not able to format the column as a date using Excel's own formatting options either.
Thank you for your time and help.
My code is something like this:
def calculate(snumber, owner, reason):
    # some if conditions and then
    date11 = Date + relativedelta(months=1)
    return date11.strftime('%d %b %Y')

df['date1'] = df.apply(lambda x: calculate(x['snumber'], x['owner'], x['reason']), axis=1)
To make sure the column is in date format, use the following (note that .dt only works on a datetime column, so if date1 holds strings, convert it with pd.to_datetime first, as shown further down):
df['date1'] = df['date1'].dt.strftime('%Y/%m/%d')
Once that is done, you can use Pandas ExcelWriter's xlsxwriter engine.
Please see more details about that in this article: https://xlsxwriter.readthedocs.io/example_pandas_column_formats.html
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_column_formats.xlsx", engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets['Sheet1']
# Add a date display format (named fmt to avoid shadowing the built-in "format")
fmt = workbook.add_format({'num_format': 'dd/mm/yy'})
# Set the column width and format.
# Provide the proper column where you have the date info.
worksheet.set_column('A:A', 18, fmt)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Convert the date format in the pandas dataframe itself using:
df['date1'] = pd.to_datetime(df['date1'])
Example:
I have a dataframe
  PRODUCT  PRICE  PURCHASE        date
0     ABC   5000      True  2020/06/01
1     ABB   2500     False  2020/06/01
Apply the conversion given above to the date column in the dataframe:
df['date'] = pd.to_datetime(df['date'])
Output will look like:
  PRODUCT  PRICE  PURCHASE        date
0     ABC   5000      True  2020-06-01
1     ABB   2500     False  2020-06-01
I want to format a column in a dataframe to have ',' separators in large numbers once I send the df to_excel. I have code that works, but it selects the column based on its position. I want code that selects the column based on its name, not its position. Can someone help me, please?
df.to_excel(writer, sheet_name = 'Final Trade List')
wb = writer.book
ws = writer.sheets['Final Trade List']
format = wb.add_format({'num_format': '#,##'})
ws.set_column('O:O', 12, format) # this code works but its based on position and not name
ws.set_column(df['$ to buy'], 12, format) # this gives me an error
writer.save()
TypeError: cannot convert the series to <class 'int'>
This should do the trick:
import pandas as pd

# note: this replaces the numbers with comma-formatted strings
df['columnname'] = pd.Series([format(val, ',') for val in df['columnname']], index=df.index)
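Alternatively, if you want to keep the values numeric and only change how Excel displays them, xlsxwriter's set_column also accepts numeric column indices, so you can look the position up by name with get_loc. A minimal sketch (the dataframe and file name here are made up for illustration; if you also write the dataframe index to the sheet, add 1 to the looked-up position):

```python
import pandas as pd

# hypothetical data using the column name from the question
df = pd.DataFrame({'ticker': ['AAA', 'BBB'], '$ to buy': [1234567, 89012]})

with pd.ExcelWriter('trades.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Final Trade List', index=False)
    wb = writer.book
    ws = writer.sheets['Final Trade List']
    fmt = wb.add_format({'num_format': '#,##0'})
    # look the column position up by name instead of hard-coding 'O:O'
    col = df.columns.get_loc('$ to buy')
    ws.set_column(col, col, 12, fmt)
```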
How do I loop through my Excel sheets and add each 'Adjusted Close' to a dataframe? I want to sum all the adjusted closes and build a stock index.
When I try the code below, the dataframe Percent_Change is empty.
xls = pd.ExcelFile('databas.xlsx')
countSheets = len(xls.sheet_names)
Percent_Change = pd.DataFrame()
x = 0
for x in range(countSheets):
    data = pd.read_excel('databas.xlsx', sheet_name=x, index_col='Date')
    # Calculate the percent change from day to day
    Percent_Change[x] = pd.Series(data['Adj Close'].pct_change()*100, index=Percent_Change.index)
    stock_index = data['Percent_Change'].cumsum()
Unfortunately, I do not have the data to replicate your complete example. However, there appears to be a bug in your code.
You are looping over "x", a range of integers, and re-indexing each new column against Percent_Change.index, which starts out empty, so every assigned column ends up empty. You probably want to loop over the sheet names and assign each series directly. Your code should be:
import pandas as pd

xls = pd.ExcelFile('databas.xlsx')
# pep8: it is conventional to use "_" instead of camelCase and to avoid longer names if at all possible
sheets = xls.sheet_names
Percent_Change = pd.DataFrame()
# using "sheet" instead of "x" is more pythonic
for sheet in sheets:
    data = pd.read_excel('databas.xlsx', sheet_name=sheet, index_col='Date')
    # Calculate the percent change from day to day
    # (no re-indexing against Percent_Change.index, which starts out empty)
    Percent_Change[sheet] = data['Adj Close'].pct_change() * 100

# sum across tickers, then cumulate to build the index
stock_index = Percent_Change.sum(axis=1).cumsum()
import csv
f = csv.reader(open('lmt.csv','r')) # open input file for reading
Date, Open, Hihh, mLow, Close, Volume = zip(*f) # split it into separate columns
ofile = open("MYFILEnew1.csv", "wb") # output csv file
c = csv.writer(ofile)
item = Date
item2 = Volume
rows = zip(item, item)
i = 0
for row in item2:
    print row
    writer = csv.writer(ofile, delimiter='\t')
    writer.writerow([row])
ofile.close()
Above is what I have produced so far.
As you can see on the third line, I have extracted six columns from the spreadsheet.
I want to create a .csv file named MYFILEnew1.csv that has only two columns, Date and Volume.
What I have above creates a .csv that writes only the Volume column into the first column of the new file.
How would you go about placing Date into the second column?
For example
Date Open High Low Close Volume
17-Feb-16 210 212.97 209.1 212.74 1237731
is what i have. and Id like to produce a new csv file such that it has
Date Volume
17-Feb-16 1237731
If I understand your question correctly, you can achieve that very easily using pandas' read_csv and to_csv; the final solution to your problem can be found below under EDIT2:
import pandas as pd
# this assumes that your file is comma separated
# if it is e.g. tab separated you should use pd.read_csv('data.csv', sep = '\t')
df = pd.read_csv('data.csv')
# select desired columns
df = df[['Date', 'Volume']]
#write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
So, if your data.csv file looks like this:
Date,Open,Hihh,mLow,Close,Volume
1,5,9,13,17,21
2,6,10,14,18,22
3,7,11,15,19,23
4,8,12,16,20,24
Then MYFILEnew1.csv would look like this after running the script above:
Date Volume
1 21
2 22
3 23
4 24
EDIT
Using your data (tab separated, stored in the file data3.csv):
Date Open Hihh mLow Close Volume
17-Feb-16 210 212.97 209.1 212.74 1237731
Then
import pandas as pd
df = pd.read_csv('data3.csv', sep='\t')
# select desired columns
df = df[['Date', 'Volume']]
# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
gives the desired output
Date Volume
17-Feb-16 1237731
EDIT2
Since your header in your input csv file seems to be messed up (as discussed in the comments), you have to rename the first column. The following now works fine for me using your entire dataset:
import pandas as pd
df = pd.read_csv('lmt.csv', sep=',')
# get rid of the wrongly formatted column name
df.rename(columns={df.columns[0]: 'Date' }, inplace=True)
# select desired columns
df = df[['Date', 'Volume']]
# write to the file (tab separated)
df.to_csv('MYFILEnew1.csv', sep='\t', index=False)
Here I would suggest using the csv module's csv.DictReader object to read and write from the files. To read the file, you would do something like
import csv

fieldnames = ('Date', 'Open', 'High', 'mLow', 'Close', 'Volume')
with open('myfilename.csv') as f:
    reader = csv.DictReader(f, fieldnames=fieldnames)
Beyond this, you will just need to filter out the keys you don't want from each row and similarly use the csv.DictWriter class to write to your export file.
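A minimal sketch of that read-filter-write round trip, written for Python 3 (the file names follow the question, but the sample row is made up and the script writes its own tiny input so it can run standalone):

```python
import csv

# tiny stand-in for the question's lmt.csv
with open('lmt.csv', 'w', newline='') as f:
    f.write('17-Feb-16,210,212.97,209.1,212.74,1237731\n')

fieldnames = ('Date', 'Open', 'High', 'mLow', 'Close', 'Volume')
keep = ('Date', 'Volume')

with open('lmt.csv', newline='') as src, \
     open('MYFILEnew1.csv', 'w', newline='') as dst:
    reader = csv.DictReader(src, fieldnames=fieldnames)
    writer = csv.DictWriter(dst, fieldnames=keep)
    writer.writeheader()
    for row in reader:
        # keep only the Date and Volume keys from each row
        writer.writerow({k: row[k] for k in keep})
```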
You were so close:
import csv
f = csv.reader(open('lmt.csv','rb')) # csv is binary
Date, Open, Hihh, mLow, Close, Volume = zip(*f)
rows = zip(Date, Volume)
ofile = open("MYFILEnew1.csv", "wb")
writer = csv.writer(ofile)
for row in rows:
    writer.writerow(row) # row is already a tuple, so no need to make it a list
ofile.close()
I'm stuck reading all the rows of a csv file and saving them into a csv file (I'm using pandas 0.17.1).
I have a list of tickers in a csv file, one per column, like this:
Column A: AAPL / Column B: TSLA / Column C: EXPD... and so on.
Now I have to add 3000 new tickers to this list, so I changed the orientation of the csv, putting each ticker into its own row in the first column, like this:
Column A
AAPL
TSLA
EXPD
...and so on.
The issue is: when I save the document into a csv file, only the first row is read, and nothing else.
In my example, if the first row is "AAPL", I obtain a csv file that has only the data for AAPL.
This is my code:
symbols_list = pd.read_csv('/home/andrea/htrade/python/titoli_rows.csv')
symbols = []
for ticker in symbols_list:
    r = DataReader(ticker, "yahoo",
                   start=datetime.datetime.now() - BDay(20),
                   end=datetime.datetime.now())
    # add a symbol column
    r['Symbol'] = ticker
    symbols.append(r)

# concatenate all the dfs
df = pd.concat(symbols)
# define cell with the columns that I need
cell = df[['Symbol', 'Open', 'High', 'Low', 'Adj Close', 'Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1, 0]).set_index('Symbol').to_csv('/home/andrea/Dropbox/HT/stock20.csv', date_format='%d/%m/%Y')
Why, if I paste a ticker in each column, does the csv contain the data for every ticker, but if I paste a ticker in each row, it reads just the first one?
I already checked whether read_csv was reading the csv correctly, and it is, so I don't understand why it isn't processing them all.
I just ran the below, and with a short list of symbols imported via read_csv it seemed to work fine. Note that iterating over a DataFrame yields its column labels: with one ticker per column your loop saw every ticker, but with tickers in rows it saw only the single column header. Selecting the symbols column and iterating over its values avoids that:
from datetime import datetime
import pandas as pd
import pandas.io.data as web
from pandas.tseries.offsets import BDay

df = pd.read_csv(path_to_file).loc[:, ['symbols']].dropna().squeeze()
symbols = []
for ticker in df.tolist():
    r = web.DataReader(ticker, "yahoo",
                       start=datetime.now() - BDay(20),
                       end=datetime.now())
    r['Symbol'] = ticker
    symbols.append(r)

df = pd.concat(symbols).drop('Close', axis=1)
cell = df[['Symbol', 'Open', 'High', 'Low', 'Adj Close', 'Volume']]
cell.reset_index().sort_values(['Symbol', 'Date'], ascending=[1, 0]).set_index('Symbol').to_csv(path_to_file, date_format='%d/%m/%Y')