I have a column of dates in my Excel file that I import as a data frame in Python with Pandas, using read_csv. The dates start out as strings in the format “yyyy-mm-dd”, and I need to change the entire column to a date type with the format “mm/dd/yyyy”.
Also, it would be great if a single-digit month and/or day came out without a leading zero, like “1/4/2021”, while double-digit values came out as “1/12/2021”, “10/8/2021”, or “11/16/2020”.
I currently have this code:
df = df.df.strptime(“Date”, “%Y-%m-%d”).strftime(“%m/%d/%Y”)
But the IDE is saying there’s a syntax error. And I’m not sure if this is close to correct in terms of making sure the entire column is being changed.
This line changes the format from "yyyy-mm-dd" to "mm/dd/yyyy" by slicing the strings in the Date column:
df['Date'] = df['Date'].str[5:7] + '/' + df['Date'].str[8:10] + '/' + df['Date'].str[0:4]
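A date-aware alternative is to parse the column with pd.to_datetime and format it afterwards. The sketch below is only an illustration: the sample values and the Padded and Unpadded column names are assumptions, not part of the question. Note that strftime returns strings, so the formatted columns are text again rather than a date type. The unpadded variant is assembled from the month, day, and year parts because the strftime codes that drop leading zeros differ between platforms.

```python
import pandas as pd

# Hypothetical sample data standing in for the "Date" column of the question.
df = pd.DataFrame({"Date": ["2021-01-04", "2021-01-12", "2021-10-08", "2020-11-16"]})

# Parse the strings into real datetimes first.
parsed = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# Zero-padded mm/dd/yyyy strings.
df["Padded"] = parsed.dt.strftime("%m/%d/%Y")

# m/d/yyyy without leading zeros, built from the individual date parts.
df["Unpadded"] = (
    parsed.dt.month.astype(str)
    + "/"
    + parsed.dt.day.astype(str)
    + "/"
    + parsed.dt.year.astype(str)
)

print(df)
```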
I use pandas to read a .csv file, then save it as an .xls file. The code is as follows:
import pandas as pd
df = pd.read_csv('filename.csv', encoding='GB18030')
print(df)
df.to_excel('filename.xls')
One column contains dates like '2020/7/12'. It looks like pandas recognizes them as dates and automatically outputs them as '2020-07-12'. I don't want this column, or any other column like it, to be reformatted; I'd like all data to stay the same, as plain text.
This conversion happens in read_csv(), because print(df) already outputs YYYY-MM-DD, before to_excel() is called.
I tried using df.info() to check the data type of that column, and it is object. Then I added the argument dtype=pd.StringDtype() to read_csv(), and it didn't help.
The file contains Chinese characters, so I set the encoding to GB18030. I don't know if this matters.
My experience concerning pd.read_csv indicates that:
Only columns convertible to int or float are by default
converted to respective types.
"Date-like" strings are still read as strings (the column type in
the resulting DataFrame is actually object).
If you want read_csv to convert such column to datetime type, you
should pass parse_dates parameter, specifying a list of columns to be
parsed as dates. Since you didn't do it, no source column should be
converted to datetime type.
To check this detail, after you read the file, run file.info() and check
the type of the column in question.
So if the respective Excel file column is of Date type, then this
conversion is probably caused by to_excel.
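The default behaviour described above can be checked with a small in-memory CSV. This is a minimal sketch: the sample rows and the DateBought, Id, and Value column names are made up for illustration. It reads the same text three ways: with defaults, with parse_dates, and with dtype=str, which keeps every column as text.

```python
import io

import pandas as pd

# Hypothetical sample CSV content; column names are assumptions for illustration.
csv_text = "DateBought,Id,Value\n2020/7/12,1031,500.15\n2020/8/18,1032,700.40\n"

# Default read: date-like strings stay as text, numeric columns are converted.
plain = pd.read_csv(io.StringIO(csv_text))

# Explicit request to parse the column as dates.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateBought"])

# Keep everything as text, including the numeric-looking columns.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(plain.dtypes)
print(parsed.dtypes)
print(as_text.dtypes)
```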
And one more remark concerning variable names:
What you have read using read_csv is a DataFrame, not a file.
Actual file is the source object, from which you read the content,
but here you passed only file name.
So don't use names like file to name the resulting DataFrame, as this
is misleading. It is much better to use e.g. df.
Edit following a comment as of 05:58Z
To check in full extent what you wrote in your comment, I created
the following CSV file:
DateBougth,Id,Value
2020/7/12,1031,500.15
2020/8/18,1032,700.40
2020/10/16,1033,452.17
I ran: df = pd.read_csv('Input.csv') and then print(df), getting:
DateBougth Id Value
0 2020/7/12 1031 500.15
1 2020/8/18 1032 700.40
2 2020/10/16 1033 452.17
So, at the Pandas level, no format conversion occurred in the DateBougth
column. Both remaining columns contain numeric content, so they were
silently converted to int64 and float64, but DateBougth remained as object.
Then I saved this df to an Excel file, running: df.to_excel('Output.xls')
and opened it with Excel. The content is: (screenshot not preserved)
So no data type conversion took place at the Excel level either.
To see the actual data type of B2 cell (the first DateBougth),
I clicked on this cell and pressed Ctrl-1, to display cell formatting.
The format is General (not Date), just as I expected.
Maybe you have some outdated version of software?
I use Python v. 3.8.2 and Pandas v. 1.0.3.
Another detail to check: Look at your code after pd.read_csv.
Maybe somewhere you put instruction like df.DateBought = pd.to_datetime(df.DateBought) (explicit type conversion)?
Or at least format conversion. Note that in my environment
there was absolutely no change in the format of DateBought column.
Problem solved. I double-checked my .csv file by opening it with Notepad: the data is 2020-07-12, which Office displays as 2020/7/12. It turns out that Office reformats the date to yyyy/m/d (based on your region). I'm developing a tool to process and import data into a DB for my company; we used to do this work manually by copy and paste, so no one noticed this issue. Thanks to @Valdi_Bo for his investigation and patience.
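The same check can be done from Python instead of Notepad by reading the file as raw text, which bypasses any display formatting applied by Office. A minimal sketch, assuming a small sample file written to a temporary directory (the file name and contents are invented for the example):

```python
import os
import tempfile

# Create a hypothetical sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="GB18030", newline="") as f:
    f.write("Date,Id\n2020-07-12,1031\n")

# Read the raw text exactly as stored on disk, without any CSV or date parsing.
with open(path, encoding="GB18030") as f:
    raw_lines = [line.rstrip("\r\n") for line in f]

print(raw_lines)
```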
I have been using pandas but am open to all suggestions. I'm not an expert at scripting and am at a complete loss. My goal is the following:
Merge multiple CSV files. Was able to do this in Pandas and have a dataframe with the merged dataset.
Screenshot of how merged dataset looks like
Delete the duplicated "GEO" columns after the first set. For this part I can't use df = df.loc[:,~df.columns.duplicated()] because the columns are not technically duplicated. The repeated column names end with .1, .2, etc., which I am guessing the concat adds. Another problem is that some columns have a duplicated column name but hold different datasets. I have been using the first row as the index since it always has the same coded values, but this row is unnecessary and will be deleted afterwards in the script. This is my biggest problem right now.
Delete certain columns such as the ones with the "Margins". I use ~df2.columns.str.startswith for this and have no trouble with this.
Replace spaces, ":" and ";" with underscores in the first row. I have no clue how to do this.
Insert a new column, write '=TEXT(B1,0)' formula, do this for the whole column (formula would change to B2,B3, etc.), copy the column and paste as values. I was able to do this in openpyxl although was having trouble and was not able to try the final output thanks to excel trouble.
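import win32com.client
from win32com.client import constants
excel = win32com.client.gencache.EnsureDispatch("Excel.Application")
source = excel.Workbooks.Open(filename)  # filename: full path to the workbook
excel.Range("C1:C1337").Select()
excel.Selection.Copy()
excel.Selection.PasteSpecial(Paste=constants.xlPasteValues)  # paste values over the formulas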
Not sure if it works and was wondering if it was possible in pandas, win32com or I should stay with openpyxl. Thanks all!
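Several of the steps listed above can be done in pandas alone. The following is a rough sketch under stated assumptions: the tiny sample frame, its column names, and its first-row labels are invented, and it assumes the repeated columns are named exactly "GEO" plus a numeric suffix such as .1 or .2. It drops the repeated GEO columns by name pattern, drops columns starting with "Margin", and replaces spaces, ":" and ";" with underscores in the first row.

```python
import pandas as pd

# Hypothetical merged dataset; names and values are placeholders.
df = pd.DataFrame(
    [
        ["GEO id", "Total:pop;all", "GEO id", "Margin value"],
        ["01", "10", "01", "2"],
    ],
    columns=["GEO", "Estimate", "GEO.1", "Margin of Error"],
)

# Drop repeated GEO columns: "GEO" followed by a numeric suffix (.1, .2, ...).
repeat_geo = df.columns.str.match(r"^GEO\.\d+$")
df = df.loc[:, ~repeat_geo].copy()

# Drop the margin columns, as the question already does.
df = df.loc[:, ~df.columns.str.startswith("Margin")].copy()

# Replace spaces, ":" and ";" with underscores in the first row only.
df.iloc[0] = df.iloc[0].str.replace(r"[ :;]", "_", regex=True)

print(df)
```

Matching on the name pattern rather than on columns.duplicated() leaves other columns untouched, including ones that share a base name but hold different data.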
I'm pretty new to Pandas in Python and I need help to see if I'm doing it right. Basically, I have an Excel file containing data and I'm using pandas to work with it. The question says that I can select four ports before aggregating by Year and Port. So I tried something like this:
filterport = portTraffic.Port.isin(['Adelaide','Brisbane','Sydney','Melbourne'])
new = portTraffic[filterport]
year_port = new.groupby(['Year','Port'])
I get an output (if I print the head) showing only the data of the four ports I filtered, but I wonder if I'm doing it correctly?
Note: portTraffic is the data frame read from the Excel file.
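For reference, filtering with isin and then grouping is a common pattern, but groupby on its own only returns a grouped object; an aggregation such as sum() is still needed to get one value per Year and Port. A minimal sketch, assuming a made-up Tonnes column and sample rows (neither is from the question):

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet.
portTraffic = pd.DataFrame(
    {
        "Year": [2019, 2019, 2019, 2020, 2020, 2020],
        "Port": ["Sydney", "Sydney", "Darwin", "Sydney", "Brisbane", "Darwin"],
        "Tonnes": [10, 5, 7, 8, 3, 9],
    }
)

# Keep only the four ports of interest.
ports = ["Adelaide", "Brisbane", "Sydney", "Melbourne"]
selected = portTraffic[portTraffic["Port"].isin(ports)]

# Aggregate: one total per (Year, Port) pair.
totals = selected.groupby(["Year", "Port"])["Tonnes"].sum()

print(totals)
```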
I am using DataNitro in my spreadsheet. When I write a value to a cell, it automatically guesses whether the format looks like a date. This is obviously not always helpful!
dt_str = "08/20/13"
Cell("A1").value = dt_str
# puts date type in that cell
I am not sure whether this behaviour comes from Excel 2010 or from DataNitro. As I am writing this I am getting more convinced that this is an Excel issue. Does anybody have experience with this?
I have done some more research and am almost convinced it is an Excel issue. The solution when entering data directly is to start the cell with a ' character. This is obviously(?) not possible if I come in from Python.
This is an Excel issue, and putting a single quote at the beginning is correct. You can do that as long as you use double quotes to delimit the string:
Cell("A1").value = "'10/1/2013"