I tried to open an Excel file in Python, but it has a filter in the first row (Image 1), and Python raises an error saying it cannot read the file. I tried using skiprows and converting the .xlsx file to .csv, but the filter in the first row persists. Is there any way to read the file without manually deleting that row?
The workbook has many sheets, all with filters in the first row. Below is an example of these filters.
You could create a duplicate of that Excel file, remove the filter and then try again.
You can check out this documentation on how to read Excel files.
Something like this:
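import pandas as pd
# Pass the path directly; open() without 'rb' would open the file in text mode.
df = pd.read_excel('tmp.xlsx', sheet_name='Sheet1')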
Related
I'm currently working with a pandas data frame and need to save data as CSV for different categories, so I thought I would maintain one CSV file and add a separate sheet for each category. From my research, a CSV file can't hold multiple sheets. Is there any workaround for this? I need to keep the format as CSV (I can't use Excel).
No.
A CSV file is just a text file, it doesn't have a standard facility for "multiple sheets" like spreadsheet files do.
You could save each "sheet" as a separate file, but that's about it.
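A minimal sketch of the one-file-per-category approach, assuming the data frame has a column that identifies the category. The column name `category`, the sample data and the output directory are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: 'category' decides which file each row goes to.
df = pd.DataFrame({
    "category": ["fruit", "veg", "fruit"],
    "item": ["apple", "carrot", "pear"],
})

# A temporary directory stands in for the real output location.
out_dir = tempfile.mkdtemp()

# Write one CSV per category and remember where each one went.
written = {}
for category, group in df.groupby("category"):
    path = os.path.join(out_dir, f"{category}.csv")
    group.to_csv(path, index=False)
    written[category] = path

print(written)
```

Each file can then be opened or shared on its own, which is the closest CSV equivalent to separate sheets.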
This is probably a really dumb question.
I have a dataframe with a column containing the scores of soccer games (e.g. 1-2). When I save the dataframe using df.to_csv and open the .csv file in Excel afterwards, the scores are shown as dates (e.g. 1-2 becomes 1st Feb).
I realize this is probably an Excel issue, since the scores look as they should when I open the file in Notepad.
So my question is, how best to handle it? Is there an option in Python where I can save the .csv in such a format that the score isn't converted to a date? Or is it something to be tackled in Excel?
Thanks!
If you save your file as text (.txt) instead of .csv, Excel shouldn't re-format it.
This might not suit your needs if .csv is required. If it is not, you can get the same result (in terms of delimiters and headers) by opening the text file from Excel's File menu and selecting 'Delimited'.
Then, if you are saving the .txt file from Python with a comma delimiter, de-select the 'Tab' option and select 'Comma'.
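The Python side of this could look like the sketch below. The data frame contents and the file name `scores.txt` are hypothetical; the only change from a normal df.to_csv call is the file extension.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: the 'score' column holds strings such as '1-2'.
df = pd.DataFrame({
    "home": ["Ajax", "PSV"],
    "away": ["Feyenoord", "AZ"],
    "score": ["1-2", "3-0"],
})

# Same comma-delimited content as a .csv; only the extension differs.
out_path = os.path.join(tempfile.mkdtemp(), "scores.txt")
df.to_csv(out_path, index=False)

# Read the file back to confirm the scores were written unchanged.
with open(out_path, encoding="utf-8") as fh:
    text = fh.read()
print(text)
```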
I am programmatically creating CSV files using Python. Many end users open and interact with those files using Excel. The problem is that Excel by default mutates many of the string values within the file. For example, Excel converts 0123 to 123.
The values being written to the csv are correct and display correctly if I open them with some other program, such as Notepad. If I open a file with Excel, save it, then open it with Notepad, the file now contains incorrect values.
I know that there are ways for an end user to change their Excel settings to disable this behavior, but asking every single user to do so is not possible for my situation.
Is there a way to generate a csv file using Python that a default copy of Excel will NOT mutate the values of?
Edit: Although these files are often opened in Excel, they are not only opened in Excel and must be output as .csv, not .xlsx.
The short answer is no, it is not possible to generate a single CSV that will display (arbitrary) data the same way in Excel and in non-Excel programs.
There are convoluted ways to force strings to appear how you want when you open a CSV in Excel, but then non-Excel programs will almost certainly not display them the way you want.
Though you say you must stick to CSV due to non-Excel programs, you don't say which programs those are. If it is possible that they can open .xlsx files after all, then .xlsx would be the best choice.
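As an illustration of the trade-off, here is a sketch of one commonly cited trick: writing a value as the formula-style text ="0123", which Excel is generally reported to display as the string 0123. The helper name `excel_text` is hypothetical, and the Excel behavior is an assumption rather than something the csv module guarantees. Reading the file back with a non-Excel parser shows the cost: the wrapper becomes part of the value.

```python
import csv
import io


def excel_text(value):
    """Wrap a value as ="value" so Excel is expected to treat it as text."""
    return '="' + str(value).replace('"', '""') + '"'


# Write a small CSV into memory, wrapping only the field that must keep its zeros.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name"])
writer.writerow([excel_text("0123"), "widget"])
csv_text = buffer.getvalue()

# A non-Excel CSV parser sees the wrapper as literal data.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[1][0])
```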
The solution is to declare the data type while writing the file. It seems like Excel is trying to be smart and converts the whole column to a numeric type. The output should be written directly into .xlsx format like so:
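import pandas as pd
# Leading zeros survive because the values are stored as strings.
data = {'x': ['011', '012', '013'], 'y': ['022', '033', '041']}
df = pd.DataFrame(data=data)
# The context manager saves the file on exit; writer.save() was removed in pandas 2.0.
with pd.ExcelWriter('path/to/save.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')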
Source: https://stackoverflow.com/a/31136119/8819895
Have you tried expressly formatting the relevant column(s) to 'str' before exporting?
df['column_ex'] = df['column_ex'].astype('str')
df.to_csv('df_ex.csv')
Another workaround may be to open the Excel program (not the file), go to the Data menu, then choose Import from Text. Excel's import utility lets you define each column's data type. I believe LibreOffice keeps the leading 0s by default, but Excel doesn't.
I am looking to write certain columns of data from an excel sheet to a HTML table. Not looking to write specific/fixed cells into the table always, need to do this based on conditions. For example, if I have a table with columns Name/Age/Occupation, I would like to make an HTML table using just columns Name and Occupation. Also, within Name, I would only like to write the names starting with 'N' onto the table and corresponding Occupation. The Excel sheet dynamically changes with new data everytime. Essentially, I would not want to write specific cells or range of cells into the table but only the data based on conditions I set. Any suggestions using python/html/jquery or other methods are welcome.
First, export the Excel file as a .csv file, then work on that file in a programming language of your preference. Working directly on the .xls or .xlsx files would be much more complicated. I recommend using Python with its pandas library, which reads CSV files.
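A minimal sketch of the CSV route, using only the standard library and the example from the question (keep the Name and Occupation columns, and only names starting with 'N'). The function name `rows_to_html` and the sample data are hypothetical. In practice the CSV text would come from the exported file rather than an in-memory string.

```python
import csv
import html
import io


def rows_to_html(csv_text, columns, predicate):
    """Build an HTML table from the chosen columns of rows that pass predicate."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    lines = ["<table>", f"  <tr>{header}</tr>"]
    for row in reader:
        if predicate(row):
            cells = "".join(f"<td>{html.escape(row[c])}</td>" for c in columns)
            lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical export of the sheet described in the question.
sample = "Name,Age,Occupation\nNina,34,Engineer\nOmar,41,Teacher\nNoah,29,Chef\n"

table_html = rows_to_html(
    sample,
    ["Name", "Occupation"],
    lambda row: row["Name"].startswith("N"),
)
print(table_html)
```

Because the condition is passed in as a function, the same code keeps working when the sheet gains new rows; only the exported CSV needs to be refreshed.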
For parsing excel files, I've had good success using openpyxl
A Python library to read/write Excel 2010 xlsx/xlsm files
I've written a python/webdriver script that scrapes a table online, dumps it into a list and then exports it to a CSV. It does this daily.
When I open the CSV in Excel, it is unformatted, and there are fifteen (comma-delimited) columns of data in each row of column A.
Of course, I then run 'Text to Columns' and get everything in order. It looks and works great.
But tomorrow, when I run the script and open the CSV, I've got to reformat it.
Here is my question: "How can I open this CSV file with the data already spread across the columns in Excel?"
Try importing it as a CSV file (Data menu, Import from Text) instead of opening it directly in Excel.
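It may also be worth checking how the script writes the file. One possible cause, which the question does not confirm, is that each scraped row is joined into a single string before being written, so the whole row becomes one quoted field. The sketch below uses hypothetical data to contrast that with passing one list element per column to csv.writer.

```python
import csv
import io

# Hypothetical scraped data: one inner list per table row.
scraped_rows = [
    ["2024-01-01", "ACME", "12.5"],
    ["2024-01-02", "Globex", "7.0"],
]

# Each list element becomes its own comma-separated field.
good = io.StringIO()
writer = csv.writer(good)
writer.writerow(["date", "company", "value"])
writer.writerows(scraped_rows)

# For comparison: joining first produces a single quoted field per row.
bad = io.StringIO()
csv.writer(bad).writerow([",".join(scraped_rows[0])])

good_rows = list(csv.reader(io.StringIO(good.getvalue())))
bad_rows = list(csv.reader(io.StringIO(bad.getvalue())))
print(len(good_rows[1]), len(bad_rows[0]))
```

If the file already has one field per column, the cause is more likely on the Excel side, for example a regional list separator other than a comma, and importing the file as suggested above remains the fix.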