Keep formatting of Excel file when loading into DataFrame in Python


I'm trying to load an Excel file into Python and keep the formatting of the columns/data. I have numbers which are stored as text, but Python changes the formatting to numbers, which causes an issue: it only shows 15 significant digits of a number and changes the digits after the 15th place to 0 (as Excel would do). I would like to keep the numbers as text so that all digits are retained.
I'm using:
myContractData = pd.read_excel(Path)
Thanks a lot for your help.
Alex

If your case involves loading data from an Excel file, you first need to make sure the numbers in the Excel file actually still have more than 15 digits.
As far as I know, Excel (at least in older Office versions) only handles 15 significant digits of precision: https://support.microsoft.com/en-us/kb/78113
If Excel itself loads the number, the number is rounded to 15 digits automatically. That's why numeric values in Excel files only ever show 15 digits.
If possible, I would suggest connecting to the data source directly in Python, or converting the value to a string in the data source.
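A minimal sketch of why the digits are lost once the value becomes a number, plus one way to ask pandas for text instead. The sample value and the helper name `load_as_text` are assumptions for illustration. `dtype` is a documented parameter of `pd.read_excel`, but it can only preserve digits that are really stored as text in the workbook.

```python
# A 19-digit identifier, as it might be stored as text in a worksheet (made-up value).
text_value = "1234567890123456789"

# Converting to a float keeps only about 15 to 17 significant digits.
as_float = float(text_value)
round_trip = format(as_float, ".0f")

print(text_value)   # the original digits
print(round_trip)   # the trailing digits differ after the float conversion


def load_as_text(path, column):
    """Hypothetical helper: read one column as str so pandas does not turn it into a number."""
    import pandas as pd  # imported lazily; needs pandas plus an Excel engine such as openpyxl

    return pd.read_excel(path, dtype={column: str})
```

If the cells already hold numbers rather than text, the extra digits were discarded before Python ever saw the file, which is the point made above.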

Related

Working with Hours & Minutes in Python & openpyxl - Excel Formatting vs Strings & Ints

Hoping someone with some openpyxl or general Excel experience might be able to help.
I'm working on a project to record flying hours, and produce an Excel spreadsheet of flights completed in a month.
So far, I've used PySimpleGUI to create a nice front end, and got it working so it stores each flight's details as a dictionary, where the keys are terms like the names of the crew, the aircraft registration and so on. Each flight is separately stored in a dictionary for the current month.
To make sure the hours flown make sense, I've used number spinners so they can't get nonsense inputs. Each type of flying hour is recorded as 2 keys, one for hours and one for minutes. So the dictionary has a section with parts like:
'-firstPilotHours-': 1,
'-firstPilotMins-': 30,
'-captainHours-': 1,
'-captainMins-': 30,
.. and so on.
I've managed to get these put into Excel by converting them to strings and then concatenating them with a colon in the middle:
ws1.cell(row=sortieIdent, column=9).value = str(currentMonth[sortie]["-captainHours-"]) + ":" + str(currentMonth[sortie]["-captainMins-"])
... so it appears as "1:30" in Excel, which is the way I used to input the data when I ran a manual Excel file for this purpose.
The cell's number format is set as "[h]:mm" to allow me to perform calculations on the values as hours and minutes, so there can be a monthly total shown and so on.
However, this is the point where I'm stuck. I think because I'm converting them to strings, even though they look like "1:30" in Excel, they're being handled in Excel like a string and not an integer. It's not possible to perform any calculations with them. If I overtype them in Excel with "1:30," then they move to the right hand side of the column and start behaving like numbers.
I can't think of any way to get these into Excel in a manner where I can carry out calculations on them. Can anyone help?
I've thought about having separate columns for the hours and minutes, but I can't figure out how to work calculations in that manner either. I also thought about just displaying them as strings as it works now, but doing the calculations in Python; but I can't figure out how to do proper "hours & minutes" calculations within Python.

Hopefully this gives you some insight into Excel's date/time processing.
For a given decimal number that represents date/time...
The left side of the decimal represents the days since 1/0/1900.
The right side of a decimal number represents time.
Practical Example:
The date and time at which I am writing this post is: 24 May 2020 @ 8:02pm
Excel's underlying value is: 43975.8347222222
43975 days since 1/0/1900 = 24 May 2020
.8347222222 = Decimal portion of 24 hours
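The decomposition above can be checked in Python. This is a sketch under two assumptions that are not in the original post: the workbook uses the 1900 date system, and 1899-12-30 is used as day zero, a common convention that compensates for Excel treating 1900 as a leap year and holds for dates from 1 March 1900 onwards. The function name is made up.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system, with 1899-12-30 as day zero (see the note above).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial number (whole days plus a day fraction) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


moment = excel_serial_to_datetime(43975.8347222222)
print(moment)  # a timestamp on 24 May 2020, within a second of 20:02
```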
With that foundation in place (let me know if it isn't clear), we can tackle your example.
To express 1:30 in Excel's world, we need to turn 1 hour and 30 minutes into a decimal day:
Your 1:30 = 1.5 hours / 24 hours = .0625 in Excel
To show .0625, you can Format the cell as "hh:mm" or whatever you want using the custom format.
Custom Format = Display value to the user
.0625 = Underlying value Excel uses to calculate
Hope this nudges you further down the road.
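As a rough sketch of that conversion in Python: the function name and the sample flights are hypothetical, and only the dictionary keys come from the question. The second half shows one way to total hours and minutes in Python with `timedelta`, which the question also asked about.

```python
from datetime import timedelta


def to_excel_days(hours, minutes):
    """Return a duration as a fraction of a 24-hour day, the unit Excel calculates with."""
    return (hours * 60 + minutes) / (24 * 60)


# Hypothetical flights using the question's dictionary layout.
flights = [
    {"-captainHours-": 1, "-captainMins-": 30},
    {"-captainHours-": 2, "-captainMins-": 45},
]

first = to_excel_days(flights[0]["-captainHours-"], flights[0]["-captainMins-"])
print(first)  # 0.0625

# Totals can also be computed in Python by summing timedelta objects.
total = sum(
    (timedelta(hours=f["-captainHours-"], minutes=f["-captainMins-"]) for f in flights),
    timedelta(),
)
total_minutes = int(total.total_seconds()) // 60
total_text = f"{total_minutes // 60}:{total_minutes % 60:02d}"
print(total_text)  # 4:15
```

Writing the float (rather than the string "1:30") to the cell and keeping the "[h]:mm" number format should let Excel treat the value as a duration it can sum.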

how to read a csv and write it back exactly the same with pandas overcoming float imprecision

I would like to read in a csv and write it back exactly the same as it was using pandas or similar
example csv
019-12-12 23:45:00,95480,12.41,-10.19,11.31851,2.1882
and when I go to write it back, due to floating-point properties I might get something like
019-12-12 23:45:00,95480,12.410000009,-10.19,11.31851.000000002,2.1822
I've seen suggestions to use float_format but the format is different for each column and different across files I'm looping through.

I'm not sure what you're doing, but if you need pandas and want to re-save the file, you likely want to change data somewhere. If so, I'd recommend using
pd.set_option('display.precision', num_decimals)
as long as your decimals are reasonably close in precision. In the example given, 5 (the most decimal places in any column of the sample row) would allow for enough precision and remove any straggling floating-point inaccuracy. Note that this option only controls how floats are displayed; for the written file, the equivalent is rounding with df.round(num_decimals) before calling to_csv. Otherwise, you'll have to look for several zeros in a row and delete all decimal places after that.
If you don't need to change any data, I would go with an alternative solution: shutil
import shutil
shutil.copyfile(path_to_file, path_to_target_file)
This way, there's no mutation that can occur as it's just copying the raw contents.
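If the rows do need to pass through Python, another option is to never convert the fields to floats at all. The sketch below uses the standard-library `csv` module, which yields every field as a string; it is an alternative technique, not the approach from the answer above. With pandas, passing `dtype=str` to `read_csv` should similarly keep the fields as text.

```python
import csv
import io

# The sample row from the question, kept exactly as posted.
original = "019-12-12 23:45:00,95480,12.41,-10.19,11.31851,2.1882\n"

# csv.reader yields every field as a str, so no float conversion takes place.
rows = list(csv.reader(io.StringIO(original)))

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
written = buffer.getvalue()

print(written == original)  # True
```

This keeps the digits intact, but it does not guarantee that quoting or line endings in an arbitrary file are reproduced byte for byte.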

Decimal point / comma handling in pandas.read_clipboard

My script reads in data from the clipboard with pd.read_clipboard(). Because it runs on different computers and gets data from different inputs, with either European decimal format (1,23) or international decimal format (1.23), I'd like to have some flexibility in the parsing process.
I could first read the content of the clipboard and decide according to some heuristics if I set decimal='.' or decimal=',' as parameter to pd.read_clipboard(), but is there a more elegant way to achieve this? So far I haven't found an option to let pandas decide on the correct decimal separator.
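The heuristic mentioned in the question could be sketched roughly as follows. The function name, the default, and the sample strings are all hypothetical, and how the clipboard text is obtained is left out. The result would then be passed as the `decimal` argument the question refers to.

```python
import re


def guess_decimal_separator(text, default="."):
    """Guess '.' or ',' by counting whitespace-separated tokens that look like 1.23 or 1,23."""
    tokens = text.split()
    dots = sum(1 for t in tokens if re.fullmatch(r"-?\d+\.\d+", t))
    commas = sum(1 for t in tokens if re.fullmatch(r"-?\d+,\d+", t))
    if commas > dots:
        return ","
    if dots > commas:
        return "."
    return default  # no evidence either way


european = "a\tb\n1,23\t4,5\n"
international = "a\tb\n1.23\t4.5\n"
print(guess_decimal_separator(european))       # ,
print(guess_decimal_separator(international))  # .
```

A simple count like this cannot tell a decimal comma from a thousands separator (for example 1,234), so it is only a starting point.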

keep track of a string in a binary file at python

I have multiple strings and I want to keep track of them when I save them to a binary file. Specifically, I'd like to know how many bytes of the binary file each string occupies. I don't know how to do this in Python. Please help me.
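One possible sketch, assuming the strings are written as UTF-8 with nothing in between: encode each string, note the current write position, and record the encoded length. The sample strings are made up, and `io.BytesIO` stands in for a real file opened in "wb" mode.

```python
import io

strings = ["alpha", "héllo", "日本語"]

buffer = io.BytesIO()  # stands in for a file opened with open(path, "wb")
index = []             # one (string, offset, length in bytes) entry per string

for s in strings:
    data = s.encode("utf-8")   # the byte length can differ from len(s)
    offset = buffer.tell()     # position where this string starts
    buffer.write(data)
    index.append((s, offset, len(data)))

for entry in index:
    print(entry)
```

With the offset and length recorded, a string can be read back later by seeking to its offset and reading exactly that many bytes.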

How to read office 2010 excelfile using openpyxl without changing style

I am new to Python and I want to read an Office 2010 Excel file without changing its style. Currently it works fine, except that it changes the date format. I want the dates to appear as they are in the Excel file.

"i want it as they are in excel file."
A date is recorded in an Excel file (both 2007+ XLSX files and earlier XLS files) as a floating point number of days (and fraction thereof) since some date in 1899/1900 or 1904. Only the "number format" that is recorded against the cell can be used to distinguish whether a date or a number was intended.
You will need to be able to retrieve the actual float value and the "number format" and apply the format to the float value. If the "number format" being used is one of the standard ones, this should be easy enough to do. Customised number formats are another matter. Locale-dependent formats likewise.
To get detailed help, you will need to give examples of what raw data you have got and what you want to "see" and how it is now being presented ("changing date format").
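To illustrate what "applying the format to the float value" might look like, here is a deliberately tiny sketch. It assumes the raw float and the number-format string have already been read from the file, that the workbook uses the 1900 date system, and that only the few formats in the hypothetical `FORMAT_MAP` need handling. Real number formats are far more varied.

```python
from datetime import datetime, timedelta

EXCEL_EPOCH = datetime(1899, 12, 30)  # assumption: 1900 date system, dates after Feb 1900

# Hypothetical, deliberately small mapping from Excel number formats to strftime patterns.
FORMAT_MAP = {
    "yyyy-mm-dd": "%Y-%m-%d",
    "dd/mm/yyyy": "%d/%m/%Y",
    "mm-dd-yy": "%m-%d-%y",
}


def render_cell(value, number_format):
    """Render a raw float the way the cell's number format suggests."""
    pattern = FORMAT_MAP.get(number_format.lower())
    if pattern is None:
        return str(value)  # not a known date format: show the plain number
    return (EXCEL_EPOCH + timedelta(days=value)).strftime(pattern)


print(render_cell(43975, "yyyy-mm-dd"))  # 2020-05-24
print(render_cell(43975, "General"))     # 43975
```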
