How to read an Office 2010 Excel file using openpyxl without changing style - python

I am new to Python and I want to read an Office 2010 Excel file without changing its style. Currently it works fine, but it changes the date format. I want the dates as they are in the Excel file.

i want it as they are in excel file.
A date is recorded in an Excel file (both 2007+ XLSX files and earlier XLS files) as a floating point number of days (and fraction thereof) since some date in 1899/1900 or 1904. Only the "number format" that is recorded against the cell can be used to distinguish whether a date or a number was intended.
You will need to be able to retrieve the actual float value and the "number format" and apply the format to the float value. If the "number format" being used is one of the standard ones, this should be easy enough to do. Customised number formats are another matter, as are locale-dependent formats.
To get detailed help, you will need to give examples of what raw data you have got and what you want to "see" and how it is now being presented ("changing date format").
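As a rough illustration of "retrieve the float and the number format, then apply the format", here is a minimal sketch using only the standard library. It assumes the 1900 date system and a serial of 61 or more (so the 1899-12-30 epoch is valid). The FORMAT_TO_STRFTIME table and the render_cell helper are hypothetical names covering only a few standard formats; they are not part of openpyxl. With openpyxl, the format string is available as cell.number_format, and the library may already have converted a date-formatted value to a datetime for you.

```python
from datetime import datetime, timedelta

# Epoch that matches Excel's 1900 date system for serials >= 61
# (it absorbs Excel's fictitious 29 Feb 1900).
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)

# Hypothetical mapping from a few Excel number formats to strftime patterns.
FORMAT_TO_STRFTIME = {
    "yyyy-mm-dd": "%Y-%m-%d",
    "dd/mm/yyyy": "%d/%m/%Y",
    "mm-dd-yy": "%m-%d-%y",
}


def render_cell(value: float, number_format: str) -> str:
    """Render a raw cell float the way its number format suggests."""
    pattern = FORMAT_TO_STRFTIME.get(number_format.lower())
    if pattern is None:
        # Not a known date format: treat the value as a plain number.
        return str(value)
    moment = EXCEL_EPOCH_1900 + timedelta(days=value)
    return moment.strftime(pattern)


print(render_cell(43975.0, "yyyy-mm-dd"))  # same float, shown as a date
print(render_cell(43975.0, "General"))     # same float, shown as a number
```

The same float produces two different displays, which is why only the number format can tell you whether a date was intended.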

Related

Working with Hours & Minutes in Python & openpyxl - Excel Formatting vs Strings & Ints

Hoping someone with some openpyxl or general Excel experience might be able to help.
I'm working on a project to record flying hours, and produce an Excel spreadsheet of flights completed in a month.
So far, I've used PySimpleGUI to create a nice front end, and got it working so it stores each flight's details as a dictionary, where the keys are terms like the names of the crew, the aircraft registration and so on. Each flight is separately stored in a dictionary for the current month.
To make sure the hours flown make sense, I've used number spinners so they can't get nonsense inputs. Each type of flying hour is recorded as 2 keys, one for hours and one for minutes. So the dictionary has a section with parts like:
'-firstPilotHours-': 1,
'-firstPilotMins-': 30,
'-captainHours-': 1,
'-captainMins-': 30,
.. and so on.
I've managed to get these put into Excel by converting them to strings and then concatenating them with a colon in the middle:
ws1.cell(row=sortieIdent, column=9).value = str(currentMonth[sortie]["-captainHours-"]) + ":" + str(currentMonth[sortie]["-captainMins-"])
... so it appears as "1:30" in Excel, which is the way I used to input the data when I ran a manual Excel file for this purpose.
The cell's number format is set as "[h]:mm" to allow me to perform calculations on the values as hours and minutes, so there can be a monthly total shown and so on.
However, this is the point where I'm stuck. I think because I'm converting them to strings, even though they look like "1:30" in Excel, they're being handled in Excel like a string and not an integer. It's not possible to perform any calculations with them. If I overtype them in Excel with "1:30," then they move to the right hand side of the column and start behaving like numbers.
I can't think of any way to get these into Excel in a manner where I can carry out calculations on them. Can anyone help?
I've thought about having separate columns for the hours and minutes, but I can't figure out how to work calculations in that manner either. I also thought about just displaying them as strings as it works now, but doing the calculations in Python; but I can't figure out how to do proper "hours & minutes" calculations within Python.
Hopefully this gives you some insight into Excel's date/time processing.
For a given decimal number that represents date/time...
The left side of the decimal represents the days since 1/0/1900.
The right side of a decimal number represents time.
Practical Example:
The date and time I am writing this post is: 24 May 2020 @ 8:02pm
Excel's underlying value is: 43975.8347222222
43975 days since 1/0/1900 = 24 May 2020
.8347222222 = Decimal portion of 24 hours
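The practical example above can be checked with a few lines of standard-library Python. This is only a sketch: it assumes the 1900 date system, uses 1899-12-30 as the epoch (valid for serials of 61 or more), and rounds to whole seconds because the serial shown is truncated to ten decimals. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Epoch matching Excel's 1900 date system for serials >= 61.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial (days plus day fraction) to a datetime."""
    # Round to whole seconds so a truncated fraction does not
    # show up as stray microseconds.
    seconds = round(serial * 86400)
    return EXCEL_EPOCH_1900 + timedelta(seconds=seconds)


stamp = excel_serial_to_datetime(43975.8347222222)
print(stamp)  # 2020-05-24 20:02:00
```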
With that foundation in place (let me know if it isn't clear), we can tackle your example.
To express 1:30 in Excel's terms, we need to turn 1 hour and 30 minutes into a decimal fraction of a day: 1.5 hours / 24 hours = 0.0625.
Your 1:30 = .0625 in Excel
To show .0625, you can Format the cell as "hh:mm" or whatever you want using the custom format.
Custom Format = Display value to the user
.0625 = Underlying value Excel uses to calculate
Hope this nudges you further down the road.
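A minimal sketch of that conversion, assuming the hours and minutes are stored as integers under the key names shown in the question. The helper names and the sample flights are made up for illustration.

```python
def to_excel_time(hours: int, minutes: int) -> float:
    """Convert hours and minutes to Excel's fraction-of-a-day value."""
    return (hours * 60 + minutes) / (24 * 60)


def as_h_mm(day_fraction: float) -> str:
    """Mimic the "[h]:mm" display for a day fraction (hours may exceed 23)."""
    total_minutes = round(day_fraction * 24 * 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


# Hypothetical month of flights, using the key names from the question.
flights = [
    {"-captainHours-": 1, "-captainMins-": 30},
    {"-captainHours-": 23, "-captainMins-": 45},
]

values = [to_excel_time(f["-captainHours-"], f["-captainMins-"]) for f in flights]
total = sum(values)

print(values[0])       # 0.0625, the value Excel stores for 1:30
print(as_h_mm(total))  # 25:15, the monthly total as "[h]:mm" would show it
```

Writing the float into the cell instead of the "1:30" string, and leaving the cell's number format as "[h]:mm" (in openpyxl, via the cell's number_format attribute), should let Excel treat the column as numbers and total it.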

Is it possible to format individual cells in a .tsv file before opening with excel? (via python?)

this is my first question. I'm trying my best to make it as understandable as possible.
My Problem:
I'm writing a python program which reads an excel(.xlsm) file (~500 rows, 40 columns) and converts it via pandas into a pandas dataframe. My program then proceeds to edit the data, generate an output dataframe and write it to a .tsv file. This .tsv now consists of those 500 datapoints as rows and 7 columns with generated data by the program.
In the next step, the .tsv file will be opened in Excel, because we need to fill in the last 1 or 2 columns manually, which cannot be done by my program.
To achieve this, someone needs to process the content in the 2nd column, and deduce the content which needs to be written to those last 1 or 2 columns. The cells in the 2nd column that need to be read manually look something like this:
Unformatted Cell
To make it easier for a human to read, I would like to somehow format the cells in the 2nd column BEFORE the .tsv is opened in Excel to fill in the gaps, so that it looks something like this: Formatted Cell
I hope you understand my problem. Is there any way to format the whole column of 500 Cells (state of image 1 -> state of image 2) somehow in the .tsv before opening it in excel?
TSV files cannot be formatted like that: a TSV is plain text and has nowhere to store cell formatting. You need to convert your TSV to an Excel workbook (XLS or XLSX, for example), format it and then open it.

can you subset while reading in a csv in python

I have daily weather data in a csv going back to 1980, >10GB in size. The column I am interested in is the date, and I want a user to be able to select a date so that only the results from that date are returned.
I wonder if it is possible to read in and subset at the same time to save memory and computation.
I am relatively new to python and tried:
d=pd.read_csv('weather.csv',sep='\t')['Date' == 'yyyymmdd']
to no avail.
Is it possible to read in only the data for a single day (e.g. 20011004)?
Short answer: from a csv you'll not be able to do so.
Long answer: csv is very handy for humans to read, but it is one of the worst formats for machines to operate on. You'll need to parse the file line by line, keeping only the lines where the date matches the requested one.
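The line-by-line parse described in the long answer might look like the sketch below. It uses the standard csv module instead of pandas so that only the matching rows are ever held in memory. The tab delimiter and the "Date" column come from the question; the "TempC" column, the function name and the sample data are assumptions for illustration.

```python
import csv
import io
from typing import Iterator, TextIO


def rows_for_date(handle: TextIO, wanted: str, column: str = "Date") -> Iterator[dict]:
    """Yield only the rows whose date column equals the requested date."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Each row is read, compared and discarded unless it matches.
        if row[column] == wanted:
            yield row


# Stand-in for open("weather.csv", newline="") on the real file.
sample = io.StringIO(
    "Date\tTempC\n"
    "20011003\t11.5\n"
    "20011004\t12.0\n"
    "20011004\t13.5\n"
    "20011005\t9.0\n"
)

matches = list(rows_for_date(sample, "20011004"))
print(matches)
```

This still scans the whole file for every query, which is the cost the answer is pointing at. It saves memory but not read time.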
A possible solution: You should convert the csv into a format more amenable to such operations. My suggestion would be to go with something like hdf5. You can read the whole csv with pandas and then save it as an hdf5 file with d.to_hdf('weather.h5', key='weather', format='table') (the key name is arbitrary). You can check the pandas hdf documentation here. This should allow you to handle the data in a more memory- and cpu-efficient way.
Binary files can implement indexes and sorting in such a way that you don't have to go through all the data to check for those pieces you need. The same ideas apply to databases.
Addendum: There are other options for binary formats, like parquet (which may be even better; you should test it) or feather (if you want some level of "native" interoperability with R). You might want to check the following post for some insights regarding loading/saving times in different formats and their size.

Creating an Excel data sheet from big csv file using python (or other)

At my job we are working with a huge set of data of real estate properties compacted in a csv file of around 200000 lines (constantly growing).
This csv sheet includes columns with info such as: pricing, surface area, year built, street, street nr, post code, etc.
Part of the work we are doing includes creating an Excel sheet of properties that are comparable to a given object within a set of certain limits (e.g. surface area +/- 20%).
I want to automate generating such an Excel list and I was thinking about using Python for this. Here is what I want the program to do:
1) Read in the csv file
2) Take in all necessary parameters to be compared for the Excel sheet
3) Create an excel sheet from the csv data with properties that fit these
parameters
4) Rewrite abstract parameter descriptions (e.g. if the value of column 'dishwasher' is '0', write 'No dishwasher available') and append the value in the house_number column to the street_name column value
Is python a good way for handling this or would you have other suggestions?
Python is a good language to do data parsing like this. Using the pandas library might be helpful. It has functions for importing CSVs and functions to operate on the resulting data. Pandas can also directly export into the excel format.
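To make the four steps from the question concrete, here is a minimal sketch. It deliberately uses the standard csv module rather than pandas so it runs without third-party packages. The column names dishwasher, house_number and street_name come from the question; surface_area, the "Dishwasher available" wording, the function name and the sample rows are assumptions.

```python
import csv
import io


def comparable_rows(handle, target_area: float, tolerance: float = 0.20):
    """Yield rewritten rows whose surface area is within +/- tolerance."""
    low = target_area * (1 - tolerance)
    high = target_area * (1 + tolerance)
    for row in csv.DictReader(handle):
        if low <= float(row["surface_area"]) <= high:
            yield {
                # Append the house number to the street name.
                "address": f'{row["street_name"]} {row["house_number"]}',
                "surface_area": row["surface_area"],
                # Rewrite the abstract 0/1 flag as a description.
                "dishwasher": (
                    "No dishwasher available"
                    if row["dishwasher"] == "0"
                    else "Dishwasher available"
                ),
            }


# Stand-in for the real csv file.
sample = io.StringIO(
    "street_name,house_number,surface_area,dishwasher\n"
    "Main St,12,100,0\n"
    "Oak Ave,3,130,1\n"
    "Elm Rd,7,85,1\n"
)

result = list(comparable_rows(sample, target_area=100))
for row in result:
    print(row)
```

With pandas, the same filtering would be done on a DataFrame, and the final export to Excel would go through DataFrame.to_excel.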

Keep formatting of Excel file when loading into DataFrame in Python

I'm trying to load an Excel file into Python and keep the formatting of the columns/data. I have numbers which are stored as text, but Python changes the formatting to numbers, which causes an issue as it only shows 15 significant digits in a number and changes digits after the 15th place to 0 (as it would do in Excel). I would like to keep the numbers as text to have all digits.
I'm using:
myContractData = pd.read_excel(Path)
Thanks a lot for your help.
Alex
If your case is based on loading data from an Excel file, you first need to check whether the numbers in the Excel file actually have more than 15 digits.
As far as I know, in old Office versions Excel only handles 15 significant digits of precision: https://support.microsoft.com/en-us/kb/78113
If you use Excel to load the number, the number is rounded to 15 digits automatically. That's why, for numbers in Excel files, we can only see 15 digits of the value.
If possible, I would suggest connecting to the data source directly in Python, or converting the value to a string in the data source.
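The digit loss is easy to reproduce in plain Python, since a numeric cell ends up as a double-precision float. The sketch below uses a made-up 19-digit identifier. On the pandas side, read_excel accepts a dtype argument (for example dtype=str), which may keep such columns as text, but that can only help if the workbook really stores the full digits as text.

```python
raw = "1234567890123456789"  # hypothetical 19-digit contract number

as_number = float(raw)       # what happens when the value is treated as numeric
round_trip = int(as_number)  # back to an integer to expose the damage

print(raw)                     # the text form keeps every digit
print(round_trip)              # the trailing digits no longer match
print(round_trip == int(raw))  # False
```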
