Saving pandas dataframe as .csv, converts score to date - python

This is probably a really dumb question.
I have a dataframe with a column containing the scores of soccer games (e.g. 1-2). When I save the dataframe using df.to_csv and then open the .csv file in Excel, the scores are shown as dates (e.g. 1-2 becomes 1st Feb).
I realize this is an issue within Excel probably, since when I open the file in Notepad, the scores are as they should be.
So my question is, how best to handle it? Is there an option in Python where I can save the .csv in such a format that the score isn't converted to a date? Or is it something to be tackled in Excel?
Thanks!

If you save your file as text (.txt) instead of .csv, Excel shouldn't re-format it.
This might go against your specific needs, if .csv is necessary. But if not, you can achieve the same result (in the sense of delimitation and headers) by opening the text file from Excel's File Menu, selecting 'Delimited'.
Then, if you are saving your .txt file from Python with a comma delimiter, de-select the 'Tab' option and select 'Comma'.
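A minimal sketch of the suggestion above, assuming a hypothetical dataframe with a score column and a hypothetical file name scores.txt. The content is written with to_csv as usual; only the extension changes, so Excel should go through its text import step (where the score column can be set to Text) instead of auto-converting.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: scores that Excel would otherwise read as dates.
df = pd.DataFrame({"match": ["A v B", "C v D"], "score": ["1-2", "3-0"]})

# Same comma-delimited content as a .csv, but saved with a .txt extension
# (hypothetical file name, written to the temp directory for this example).
out_path = os.path.join(tempfile.gettempdir(), "scores.txt")
df.to_csv(out_path, index=False)

# Reading the file back as plain strings confirms the scores are unchanged on disk.
check = pd.read_csv(out_path, dtype=str)
print(check["score"].tolist())
```

The file itself never contained dates; the conversion only happens when Excel interprets the values on opening.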

Related

Tableau data and python

Hoping someone can help with this. Please go easy, I'm new to Python. What I'm trying to do is download data from Tableau as a CSV (which I have done) and then read that CSV file using Python. The problem I'm having is that no matter what method I use to read the file (csv, pandas, etc.), all of the data from the CSV ends up in the first 'cell' (col1 row1) and has question marks in diamonds between every character (these are also present when the CSV file is imported into Google Sheets or Excel). What am I doing wrong, or what do I need to do to fix this? Thank you in advance.
Also when the csv file is opened in something like notepad it looks like a normal csv file
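The symptoms described above (everything in one cell, a replacement character between every character, yet a normal look in Notepad) are typical of a UTF-16 file being decoded as UTF-8 or ASCII. Here is a sketch, assuming the Tableau export is UTF-16 encoded and tab-delimited. Both are assumptions to check against the actual file, and the sample file below is generated only to make the example runnable.

```python
import os
import tempfile

import pandas as pd

# Build a small stand-in for the export (hypothetical file name and columns):
# UTF-16 text with tab-separated fields.
sample_path = os.path.join(tempfile.gettempdir(), "tableau_export.csv")
with open(sample_path, "w", encoding="utf-16", newline="") as f:
    f.write("Region\tSales\nEast\t100\nWest\t250\n")

# Tell pandas both the encoding and the delimiter explicitly.
df = pd.read_csv(sample_path, encoding="utf-16", sep="\t")
print(df)
```

If the real file turns out to be comma-delimited, keeping encoding="utf-16" and dropping the sep argument would be the variation to try.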

Can Python read this Excel file?

I tried to open an Excel file in Python, but it has a filter in the first row (Image 1), which causes an error saying the file cannot be read. I tried using skiprows and converting the .xlsx file to .csv, but the filter from the first row sticks. Is there any way I can read the file without manually deleting that row?
In Excel I have many sheets and they are all with filters in the first row, below is the example of these filters
You could create a duplicate of that excel file, remove the filter and then try again.
You can check out this documentation on how to read excel files.
Documentation
Something like this:
import pandas as pd
pd.read_excel('tmp.xlsx', sheet_name='Sheet1')  # pass the path directly; open() in text mode cannot read .xlsx

Create a csv file that Excel will not mutate the data of when opening

I am programmatically creating csv files using Python. Many end users open and interact with those files using excel. The problem is that Excel by default mutates many of the string values within the file. For example, Excel converts 0123 > 123.
The values being written to the csv are correct and display correctly if I open them with some other program, such as Notepad. If I open a file with Excel, save it, then open it with Notepad, the file now contains incorrect values.
I know that there are ways for an end user to change their Excel settings to disable this behavior, but asking every single user to do so is not possible for my situation.
Is there a way to generate a csv file using Python that a default copy of Excel will NOT mutate the values of?
Edit: Although these files are often opened in Excel, they are not only opened in Excel and must be output as .csv, not .xlsx.
The short answer is no, it is not possible to generate a single CSV that will display (arbitrary) data the same way in Excel and in non-Excel programs.
There are convoluted ways to force strings to appear how you want when you open a CSV in Excel, but then non-Excel programs will almost certainly not display them the way you want.
Though you say you must stick to CSV due to non-Excel programs, you don't say which programs those are. If it is possible that they can open .xlsx files after all, then .xlsx would be the best choice.
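To illustrate the trade-off described above, here is a sketch of one commonly used workaround, which the answer does not name: wrapping each value as a string formula such as ="0123". Excel then generally shows 0123 as text, while any other CSV reader sees the wrapper literally. The column name and values are hypothetical.

```python
import csv
import io

# Hypothetical values that Excel would otherwise reinterpret.
values = ["0123", "1-2"]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["code"])
for value in values:
    # Wrap the value as an Excel string formula: ="0123"
    writer.writerow([f'="{value}"'])
csv_text = buffer.getvalue()

# What a non-Excel CSV reader gets back: the wrapper is part of the data.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
```

This is why a single file cannot satisfy both audiences: the extra characters that steer Excel are ordinary data to everything else.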
The solution is to declare the data type while writing the file. It seems like Excel is trying to be smart and converts the whole column to a numeric type. The output should be written directly into .xlsx format like so:
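import pandas as pd
data = {'x': ['011', '012', '013'], 'y': ['022', '033', '041']}
df = pd.DataFrame(data=data)
# The context manager saves and closes the workbook on exit
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('path/to/save.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')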
Source: https://stackoverflow.com/a/31136119/8819895
Have you tried expressly formatting the relevant column(s) to 'str' before exporting?
df['column_ex'] = df['column_ex'].astype('str')
df.to_csv('df_ex.csv')
Another workaround may be to open the Excel program (not the file), go to the Data menu, then Import from Text. Excel's import utility will give you options to define each column's data type. I believe LibreOffice keeps the leading 0s by default, but Excel doesn't.

Writing from Excel sheet to a table

I am looking to write certain columns of data from an excel sheet to a HTML table. Not looking to write specific/fixed cells into the table always, need to do this based on conditions. For example, if I have a table with columns Name/Age/Occupation, I would like to make an HTML table using just columns Name and Occupation. Also, within Name, I would only like to write the names starting with 'N' onto the table and corresponding Occupation. The Excel sheet dynamically changes with new data everytime. Essentially, I would not want to write specific cells or range of cells into the table but only the data based on conditions I set. Any suggestions using python/html/jquery or other methods are welcome.
First you should edit the Excel file, export it as a .csv file and then work on the file using a programming language of your preference. It would be much more complicated to work on the .xls or .xlsx files directly. I recommend using Python with its pandas library, which works on csv files.
For parsing excel files, I've had good success using openpyxl
A Python library to read/write Excel 2010 xlsx/xlsm files
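A minimal sketch of the conditional export described in the question, using pandas. The dataframe is built inline with hypothetical rows so the example runs on its own. In practice it would come from something like pd.read_excel on the real workbook, whose path and sheet name are not given here.

```python
import pandas as pd

# Hypothetical stand-in for the sheet contents (Name/Age/Occupation).
df = pd.DataFrame({
    "Name": ["Nina", "Omar", "Noah"],
    "Age": [31, 45, 27],
    "Occupation": ["Engineer", "Teacher", "Nurse"],
})

# Condition: keep only rows whose Name starts with 'N'.
mask = df["Name"].str.startswith("N")

# Keep only the Name and Occupation columns and render them as an HTML table.
html_table = df.loc[mask, ["Name", "Occupation"]].to_html(index=False)
print(html_table)
```

Because the filter is expressed as a condition and not as fixed cell ranges, rerunning it on an updated sheet picks up new rows automatically.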

python script to convert txt file to xlsx

I'm trying to convert my .txt file into an .xlsx file. I tried a few options using the openpyxl and xlsxwriter modules, but I was not able to get the intended results. My text file is as below:
Text file
I need to convert it in Excel which look like this:
Excel file
The problem is that with the solutions I have found so far, I could only convert all the text data into Excel as-is, but I need a few lines in between to be chopped out and transposed into the first row. Please help, as I have to handle more text-to-Excel conversions manually.
Thanks in advance! :)
