Moving data in Excel with Python


I would like to automatically move a value from the table into a new column and repeat it on every following row, up to the next row that contains only a single value, but I don't know which tool to use.

This is probably not a Python, pandas or DataFrame question but rather one about running a macro in Excel.
You can run Excel macros from Python using xlwings: https://www.xlwings.org/
It is free and open source, comes preinstalled with Anaconda and WinPython, and works on Windows and macOS.
That said, you might simply prefer Excel's built-in VBA editor and its "record a macro" feature.
Hope this is helpful.

Using ffill directly answers the question:
df['col'] = df['col'].ffill()
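To make the one-liner above concrete, here is a minimal sketch. The column names and values are made up for illustration; the assumption is that the value to repeat appears once and the cells below it are empty until the next value.

```python
import pandas as pd

# Hypothetical data: a label appears once, followed by rows that belong to it.
df = pd.DataFrame({
    "col": ["A", None, None, "B", None],
    "value": [1, 2, 3, 4, 5],
})

# Forward-fill: every missing cell takes the last non-missing value above it.
df["col"] = df["col"].ffill()
print(df)
```

With a real workbook, the DataFrame would typically come from pd.read_excel and be written back with DataFrame.to_excel.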

Related

Is there a way to export the data viewer in VS Code?

I am viewing the data viewer for the counter dictionary. The data is nicely laid out in 2 columns, but I can't seem to find an option to export it as a CSV or to Excel. Selecting all and copying doesn't work for some reason: only the rows currently on screen are copied, even though all the rows are selected. I am running VS Code on a Mac.

Sorry, but this isn't possible for now. The feature request is still open on GitHub, where you can join the discussion.
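As a possible workaround, the same data can be written to a CSV file from code instead of from the viewer. This is only a sketch using the standard library; the function name counter_to_csv, the header names and the sample counter are assumptions made for illustration.

```python
import csv
import os
import tempfile
from collections import Counter


def counter_to_csv(counts, path):
    """Write each (key, count) pair of a Counter as one CSV row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "count"])
        # most_common() yields pairs sorted by descending count.
        writer.writerows(counts.most_common())


# Stand-in for the counter dictionary shown in the data viewer.
counter = Counter("abracadabra")
csv_path = os.path.join(tempfile.mkdtemp(), "counter.csv")
counter_to_csv(counter, csv_path)

# Read the file back to confirm what was written.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows[:2])
```

The resulting file opens directly in Excel.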

Python: Read existing Excel file and select a different dropdown value

I want to be able to open an xlsx file in Python and select a different dropdown value in a cell which should trigger an update for the entire spreadsheet based on the new value (just how it currently does so if I manually select a different value). How can I do this in Python and which library can help me?

TL;DR: You can't.
In order to get cascading execution, you need to access the Excel execution engine. Python libraries do not have a copy of this.
If you wish to change additional values in the spreadsheet, you will need to write your Python code to make the changes.
Caveat: There is technically a way to do it using pywin32 if you have Excel installed. In that case Python is simply feeding instructions to Excel, no differently than VBA would. It is significantly more complicated than changing a value with a library such as openpyxl.
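To illustrate the difference, here is a rough sketch of changing a value with openpyxl. The helper set_cell_value, the sheet name and the cell addresses are hypothetical; the demo builds its own small workbook so it does not touch a real file. Note that the dependent cell still holds only its formula text afterwards, because openpyxl does not calculate anything.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def set_cell_value(path, sheet_name, cell, value):
    """Overwrite one cell in an existing workbook and save it in place."""
    wb = load_workbook(path)
    wb[sheet_name][cell] = value
    wb.save(path)


# Stand-in workbook: A1 plays the role of the dropdown cell, B1 depends on it.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "Option 1"
ws["B1"] = "=A1"
wb.save(path)

set_cell_value(path, "Sheet1", "A1", "Option 2")

# The new value is stored, but B1 is still just the formula string:
# nothing is recalculated until Excel itself opens the file.
check = load_workbook(path)["Sheet1"]
print(check["A1"].value)
print(check["B1"].value)
```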

How do I use python pandas to read an already opened excel sheet

Assuming I have an Excel sheet already open, make some changes in the file, and use pd.read_excel to create a dataframe based on that sheet, I understand that the dataframe will only reflect the data in the last saved version of the Excel file. I would have to save the sheet first for the pandas dataframe to take the changes into account.
Is there any way for pandas or other Python packages to read an open Excel file and refresh its data in real time (without saving or closing the file)?

Have you tried the mitosheet package? It doesn't answer your question directly, but it lets you work with pandas dataframes as you would with Excel sheets. That way, you can edit the data on the fly as in Excel and still get a pandas dataframe as a result (while it generates the code to perform the same operations in Python). Does this help?

There is no way to do this. The unsaved table has not been written to disk, so pandas cannot read it from disk.

Be careful not to over-engineer. That being said:
Depending on your use case, if this is really needed, I could theoretically imagine a Robotic Process Automation (RPA) tool such as BluePrism, UiPath or PowerAutomate continuously loading live data from Excel into a Python environment with a pandas DataFrame and then changing it.
This would have to be a really important process, though; otherwise licensing RPA is not worth it here.

import pandas as pd
df = pd.read_excel("path")
If you run the program in the Spyder IDE, you can see the data in the Variable Explorer.

Semi-Interactive Pandas Dataframe in a GUI

There are a number of excellent answers to the question "GUIs for displaying dataframes", but what I'm looking to do is a bit more advanced.
I'd like to display a dataframe, but have a couple of the columns be interactive so that the user can manually overwrite values (with the rest static). It would be useful to have "total" rows that change with the overwritten values and, eventually, some interactive buttons around the dataframe for loading and clearing data.
QTPandas looks promising, but appears to be dead, as it is built on a really old version of pandas (0.17.1). Can this be done in Qt? Is something else better?

I love RStudio as my IDE because I can not only view all the objects I create but also edit data in the IDE itself. There are many other great features too.
You can also use RStudio for Python coding (using the reticulate package).
Spyder likewise lets you view and edit a data frame.
However, if you're looking for a dedicated GUI with drag & drop features, you can use pandasgui.
Features of pandasgui are:
View DataFrames and Series (with MultiIndex support)
Interactive plotting
Filtering
Statistical summary
Data editing and copy / paste
Import CSV files with drag & drop
Search toolbar
Its first version was released in Mar 2019 and it is still under development. As of this writing, you can't use it in Colab.

While not a GUI in itself, xlwings leverages Excel as a GUI and makes pandas dataframes interactive for users. It was our library of choice.

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In our environment, we have an Excel file that contains raw data in one sheet and a pivot table and charts in another sheet.
I need to append rows to the raw data automatically every day using a Python job.
I am not sure, but there may be some VB script running on the front end that refreshes the pivot tables.
I used openpyxl and, by following its online documentation, was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside and so keep pivoting working. But after saving the workbook, the xlsx file can no longer be opened in MS Office, which says the format or the extension is not valid. I can see the data using Python, but in Office it doesn't work anymore. If I don't use keep_vba=True, then pivoting doesn't work and only the previous values are present (of course, as I understand it, the VBA script is needed for pivoting).
Could you explain to me what's happening? I am new to Python and don't know its concepts well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understand it, xlsx may need special modules to save the VB script the same way MS Office would save it. If so, then what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very little time to complete this task, I am looking for a quick answer here instead of going through all the concepts.
Thank you!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.
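For the first suggestion (saving as ".xlsm"), a rough openpyxl sketch could look like the following. The helper name append_rows_keep_vba, the sheet name and the sample rows are hypothetical, and the demo uses a small stand-in workbook without any real VBA, so treat it as an outline of the idea rather than a verified fix for the workbook in the question.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def append_rows_keep_vba(src_path, dst_path, sheet_name, rows):
    """Append rows to a sheet, keeping VBA parts, and save as .xlsm."""
    if not dst_path.lower().endswith(".xlsm"):
        raise ValueError("a workbook loaded with keep_vba=True should be saved as .xlsm")
    wb = load_workbook(src_path, keep_vba=True)
    ws = wb[sheet_name]
    for row in rows:
        ws.append(row)  # each row goes below the last used row
    wb.save(dst_path)


# Stand-in for the real workbook: one sheet with a header row.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "report.xlsx")
dst = os.path.join(tmp, "report.xlsm")
wb = Workbook()
wb.active.title = "rawdata"
wb.active.append(["day", "value"])
wb.save(src)

append_rows_keep_vba(src, dst, "rawdata", [["day 1", 10], ["day 2", 12]])

# Reload the saved file to confirm the rows were appended.
result = load_workbook(dst, keep_vba=True)["rawdata"]
print(result.max_row)
```

The extension check is there because the extension and the file's contents need to agree, which is the mismatch Excel complains about in the question.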
