Tableau data and python - python

Hoping someone can help with this. Please go easy, I'm new to Python. I'm trying to download data from Tableau as a CSV (which I have done) and then read that CSV file using Python. The problem is that no matter which method I use to read the file (csv, pandas, etc.), all of the data ends up in the first 'cell' (col1, row1), with a question mark in a diamond between every character. These also appear when the CSV file is imported into Google Sheets or Excel. What am I doing wrong, or what do I need to do to fix this? Thank you in advance.
Also, when the CSV file is opened in something like Notepad, it looks like a normal CSV file.
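The symptoms point to an encoding mismatch rather than a broken file: Tableau exports are often written as UTF-16 with tab delimiters, and a reader expecting UTF-8 and commas would show a stray character between every letter and put everything in one cell. That is an assumption about this particular file, so treat the following as a minimal sketch. The function name `read_tableau_export` and the sample file are hypothetical.

```python
import csv
import os
import tempfile


def read_tableau_export(path, encoding="utf-16", delimiter="\t"):
    """Return the rows of a delimited text file as a list of lists."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))


if __name__ == "__main__":
    # Build a small UTF-16, tab-delimited sample to stand in for the real export.
    sample = os.path.join(tempfile.mkdtemp(), "sample_export.csv")
    with open(sample, "w", encoding="utf-16", newline="") as f:
        f.write("Region\tSales\r\nEast\t100\r\n")

    for row in read_tableau_export(sample):
        print(row)
```

With pandas, the equivalent call would be pd.read_csv(path, encoding="utf-16", sep="\t"). If the file turns out to use a different encoding or delimiter, adjust those two arguments.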

Related

How to upload multiple excel documents into one dataset using python?

I am new to coding and would like to know whether it is possible to upload multiple Excel documents into one dataset using Python. If so, what is the code for this? All of the code I have seen uploads a single Excel document. Also, do I have to convert the data to CSV first, or can I use code to convert it to CSV after uploading it?
I am using Jupyter Notebook in Anaconda to run my Python code.
Your assistance is greatly appreciated.
By uploading, do you mean reading a file? If so, just create a list or dictionary, open the files, and write them one by one into your list or dictionary. It would also be really helpful to create CSV files first. If you want to do that manually, you can easily save each file as CSV from Excel.
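As a rough sketch of that suggestion, assuming each workbook has already been saved as a CSV with a header row: the function name `load_csv_files`, the extra `source_file` key, and the `data` folder are all made up for illustration.

```python
import csv
from pathlib import Path


def load_csv_files(paths):
    """Read several CSV files and return all rows in one list of dicts."""
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # Remember which file each row came from.
                row["source_file"] = Path(path).name
                rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical folder holding the CSV exports.
    files = sorted(Path("data").glob("*.csv"))
    dataset = load_csv_files(files)
    print(f"Loaded {len(dataset)} rows from {len(files)} files")
```

If converting to CSV first is not convenient, pandas can read the workbooks directly with pd.read_excel, and pd.concat can combine the resulting dataframes into one.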

Saving pandas dataframe as .csv, converts score to date

This is probably a really dumb question.
I have a dataframe with a column containing the scores of soccer games (e.g. 1-2). When I save the dataframe using df.to_csv and open the .csv file in Excel afterwards, the scores are shown as dates (e.g. 1-2 is now 1st Feb).
I realize this is an issue within Excel probably, since when I open the file in Notepad, the scores are as they should be.
So my question is, how best to handle it? Is there an option in Python where I can save the .csv in such a format that the score isn't converted to a date? Or is it something to be tackled in Excel?
Thanks!
If you save your file as text (.txt) instead of .csv, Excel shouldn't re-format it automatically.
This might go against your specific needs if .csv is necessary. If not, you can achieve the same result (in terms of delimiters and headers) by opening the text file from Excel's File menu and selecting 'Delimited'.
Then, if you are saving your .txt file from Python with a comma delimiter, de-select the 'Tab' option and select 'Comma'. In the wizard's last step, setting the score column's data format to 'Text' should keep values like 1-2 from being read as dates.
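On the Python side, writing the comma-delimited .txt file could look roughly like this. It is only a sketch: `write_scores_txt`, the file name, and the sample rows are made up for illustration.

```python
import csv


def write_scores_txt(path, header, rows):
    """Write rows as comma-delimited text to a .txt file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # comma is the default delimiter
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    write_scores_txt(
        "scores.txt",
        ["home", "away", "score"],
        [["Ajax", "PSV", "1-2"], ["Feyenoord", "AZ", "3-0"]],
    )
```

Starting from a pandas dataframe, df.to_csv("scores.txt", index=False) writes the same kind of file, since to_csv does not care about the file extension.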

import csv file to google sheets via python

I am trying to set up a Python script that will automatically take a rather large CSV file and upload it to a Google Sheet, overwriting all the data on the current sheet.
For example, let's say a new data file comes out every day; it is currently around 13,000 rows and goes to column AH. I want my Pi to read that CSV file, which is already on the Pi, and overwrite the cells left on the Google Sheet from the previous day.
Any advice will be greatly appreciated.

Embed CSV in Excel and import the data

I wrote a tool that extracts data from a large DB and outputs it to an Excel file along with (conditional) formatting to improve readability. For this I use Python with openpyxl on a Linux machine. It works great, but this package is rather slow for writing Excel.
It seems to be a lot quicker to dump the table as (compressed) CSV, import that into Excel and apply formatting there using a macro/VBA.
To automate the process I'd like to create an empty Excel file pre-loaded with the required VBA to do the formatting: a template. For every data dump, the data is embedded (compressed using deflate) into the Excel file and loaded into the workbook upon opening the document (or using a "LOAD" button to circumvent macro-related security restrictions).
However, just adding some file into the Excel file raises an error when opened:
We found a problem with some content in 'Werkmap1_test_embed.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Clicking Yes opens the file and shows some tracing information as XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>Repair Result to Werkmap1_OLE_Word0.xml</logFileName>
<summary>Errors were detected in file '/Users/joostk/mnt/cluster/Werkmap1_OLE_Word.xlsx'</summary>
<additionalInfo>
<info>Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.</info>
</additionalInfo>
</recoveryLog>
Is it possible to avoid this? How would I embed a file into the Excel ZIP? Do I need to update some file table (which I could not find easily)?
When that's done, I'd like to import the data. Can I access files in the Excel ZIP from VBA? I guess not, and I need to extract the data to some temporary path and load it from there.
I have found these helpful answers elsewhere to load ZIP and plain text:
https://stackoverflow.com/a/35781621/4998990
https://stackoverflow.com/a/11267603/4998990
Many thanks for sharing your thoughts!
My "answer" here is that this is caused by using named ranges, an underlying table, or an embedded query/connection. When you start manipulating a file that contains these, you will get the error you are talking about.
There is no harm to the file if you click "yes" and open. Excel will open this in Repaired Mode which will require you to re-save the file.
The way I've worked around this is to re-read the "repaired" file, in python, and save it as another file or replace it. Essentially just do an extra step of re-reading the data into memory, and write it to a new file. The error will go away. As always, test this method before deploying to production to ensure no records are lost. The way I solve it is with two lines of pandas.
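import pandas as pd
# Re-read the repaired workbook into memory (first sheet only; formatting and macros are not kept)
repair = pd.read_excel('PATH_TO_REPAIR_FILE')
# to_excel writes the file and returns None, so there is nothing to assign
repair.to_excel('PATH_TO_WHERE_NEW_FILE_GOES', index=False)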

python script to convert txt file to xlsx

I'm trying to convert my .txt file into an .xlsx file. I tried a few options using the openpyxl and xlsxwriter modules, but I was not able to get the intended results. My text file is as below:
Text file
I need to convert it in Excel which look like this:
Excel file
The problem is that with the solutions I have found so far, I am only able to convert all of the text data into Excel as it is, but I need a few lines in between to be chopped out and transposed into the first row. Please help, as I otherwise have to do more of these text-to-Excel conversions manually.
Thanks in advance! :)
