Obtain textbox value inside shape from Excel in Python

I'm developing a function that allows users to upload .xls or .xlsx files to the server and save data from those files into a database.
I'm using the openpyxl and xlrd libraries to read data from Excel, but some Excel files contain text in text boxes inside shapes, and I'm currently unable to read those values.
I know my question may be a duplicate of Obtain textbox value from Excel in Python, but the solution the asker found there is not a general one.
Does anyone know how to achieve this?
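One possible approach, sketched under assumptions: an .xlsx file is a zip archive, and as far as I know the text of shapes and text boxes is stored in DrawingML parts named like xl/drawings/drawing1.xml, as a:t runs inside each xdr:sp element. openpyxl does not appear to expose that text, so this sketch reads the XML directly with the standard library. It only handles .xlsx; legacy .xls files use a binary format and would need converting to .xlsx first (for example by re-saving them in Excel or LibreOffice). The function name read_shape_texts is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespaces used in xl/drawings/drawingN.xml parts
NS = {
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

def read_shape_texts(xlsx_file):
    """Hypothetical helper: return {drawing part name: [text of each shape]}.

    xlsx_file may be a path or a binary file-like object.
    """
    results = {}
    with zipfile.ZipFile(xlsx_file) as zf:
        for name in zf.namelist():
            # Only drawing parts; skips xl/drawings/_rels/* and vmlDrawing*.vml
            if not (name.startswith("xl/drawings/drawing") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            texts = []
            # iter() also finds shapes nested inside group shapes (xdr:grpSp)
            for sp in root.iter(f"{{{NS['xdr']}}}sp"):
                paragraphs = []
                for p in sp.iter(f"{{{NS['a']}}}p"):
                    # A paragraph's text is split across one or more a:t runs
                    paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{NS['a']}}}t")))
                text = "\n".join(paragraphs).strip()
                if text:
                    texts.append(text)
            results[name] = texts
    return results
```

Working out which sheet a given drawing belongs to would mean also following the relationship files under xl/worksheets/_rels/, which this sketch does not attempt.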

Related

How to use Python to automate the movement of data between two Excel workbooks with specific parameters

Thanks for taking the time to read my question.
I am working on a personal project to learn Python scripting for Excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger with name, position, address, and more (it is organized by row, so each employee takes up one row). The project is to transfer a selected set of people to a new ledger (another Excel file). I have a list of emails in a .txt file (it could also be another Excel file, but I thought .txt would be easier), and I want the script to read the .txt file, get the emails, and look for any rows with a matching email address (all emails are in column B). If any are found, it should copy that entire row to the new Excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to Python, so I am not even sure if this is possible. I would really appreciate some help!
You have essentially two packages that allow manipulation of Excel files. For reading in data and performing analysis, the standard package is pandas. You can save the results as .xlsx, but you are only really working with the tabular data and not the file itself (i.e., you are extracting data FROM the file, not working WITH the file).
However, what you really need is to manipulate the Excel files directly, which is better done with openpyxl.
You can also read files (such as your text file) with Python's built-in open() function in a with statement; unlike pandas or openpyxl, it is not a third-party import.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: I or someone else may be able to give you a specific example using your data and code, but you would have to post them in full. Since you're learning, I suggest using the documentation or YouTube.
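A minimal openpyxl sketch of the task in the question, assuming the ledger's first sheet has a header row, email addresses in column B, and one address per line in the .txt file. The function names load_emails and copy_matching_rows are hypothetical, and matching case-insensitively with surrounding whitespace ignored is an assumption about how the addresses are typed.

```python
from openpyxl import Workbook, load_workbook

def load_emails(txt_path):
    """Hypothetical helper: return the set of addresses in txt_path (one per line), lowercased."""
    with open(txt_path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def copy_matching_rows(src_path, txt_path, dst_path, email_col=2):
    """Hypothetical helper: copy the header plus every row whose email (column B by default) is listed."""
    wanted = load_emails(txt_path)
    src_wb = load_workbook(src_path, read_only=True)  # stream rows; lighter on memory
    dst_wb = Workbook()
    dst_ws = dst_wb.active
    rows = src_wb.worksheets[0].iter_rows(values_only=True)
    header = next(rows, None)
    if header is not None:
        dst_ws.append(header)  # keep the column titles in the new ledger
    for row in rows:
        email = row[email_col - 1] if len(row) >= email_col else None
        # Assumption: compare case-insensitively, ignoring stray spaces
        if isinstance(email, str) and email.strip().lower() in wanted:
            dst_ws.append(row)  # copies cell values only, not formatting
    src_wb.close()  # read-only workbooks keep the file open until closed
    dst_wb.save(dst_path)
```

Usage would look like copy_matching_rows("ledger.xlsx", "emails.txt", "new_ledger.xlsx"), with those file names standing in for your own. Using a set for the addresses keeps each lookup fast even when the email list is long.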

How to append dataframe to xlsx file without loading workbook?

I'm working with fairly large data and I need to write it to an .xlsx file; sometimes these files can be 15 GB. My Python code receives data as DataFrames and writes it to Excel continuously, so I need to write into an existing workbook and an existing sheet. I was using openpyxl.
I faced two problems with that library.
Firstly, to append to an existing workbook it has to load the whole workbook, which is impossible for me because of the data size. I must use as little RAM as possible.
Secondly, the library is only useful for writing to different sheets. When I try to write data to the same sheet, even if I pass startrow when saving, it deletes the old data and writes the new data starting from that row.
I already tried the solution available here to address my problem but it doesn't fit my requirements.
Do you have any idea how I can do this?
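As far as I know, an .xlsx file is a zip archive of XML parts, so the common Python libraries cannot append rows to an existing sheet without reading the existing file. One workaround, sketched here under assumptions, is to keep a single openpyxl write-only workbook open for the whole run and stream each DataFrame's rows into it as they arrive, saving once at the end; write-only mode does not hold all rows in memory. Note that this writes a new file rather than appending to an old one, and that a single .xlsx sheet is capped at 1,048,576 rows, which may matter at 15 GB. The function name stream_frames_to_xlsx is hypothetical.

```python
import pandas as pd
from openpyxl import Workbook

EXCEL_MAX_ROWS = 1_048_576  # hard per-sheet row limit in .xlsx

def stream_frames_to_xlsx(frames, path, sheet_title="data"):
    """Hypothetical helper: write an iterable of DataFrames into one sheet, row by row."""
    wb = Workbook(write_only=True)  # rows are streamed out, not kept in RAM
    ws = wb.create_sheet(sheet_title)  # write-only workbooks start with no sheets
    rows_written = 0
    header_written = False
    for df in frames:
        if not header_written:
            ws.append(list(df.columns))  # header taken from the first chunk
            rows_written += 1
            header_written = True
        # name=None yields plain tuples, which openpyxl can append directly
        for row in df.itertuples(index=False, name=None):
            if rows_written >= EXCEL_MAX_ROWS:
                raise ValueError("sheet is full; start a new sheet or file")
            ws.append(row)
            rows_written += 1
    wb.save(path)
    return rows_written
```

Because frames can be a generator, the producer code can yield each DataFrame as soon as it is ready instead of building a list of them first.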

How to upload multiple excel documents into one dataset using python?

I am new to coding and would like to know whether it is possible to upload multiple Excel documents into one dataset using Python. If so, what is the code for this? All of the code I have seen uploads a single Excel document. Also, do I have to convert the data to CSV first, or can I convert it to CSV after uploading it?
I am using Jupyter Notebook in Anaconda to run my Python code.
Your assistance is greatly appreciated.
By uploading, do you mean reading a file? If so, just create a list or dictionary, open the files, and add them one by one to your list or dictionary. It can also help to create CSV files first; if you want to do that manually, you can easily save each file as CSV from Excel.
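A minimal pandas sketch of the read-into-a-list-then-combine idea, assuming every workbook is .xlsx, sits in one folder, and has the same columns on its first sheet. pd.read_excel reads .xlsx directly (it needs openpyxl installed), so in this sketch the CSV conversion is optional and can be done afterwards with to_csv. The function name combine_excel_files and the source_file column are hypothetical.

```python
from pathlib import Path
import pandas as pd

def combine_excel_files(folder, pattern="*.xlsx"):
    """Hypothetical helper: read each matching workbook's first sheet and stack them."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        if path.name.startswith("~$"):
            continue  # skip the lock files Excel creates for open workbooks
        df = pd.read_excel(path)  # first sheet by default
        df["source_file"] = path.name  # remember where each row came from
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    return pd.concat(frames, ignore_index=True)
```

In a notebook, usage might be df = combine_excel_files("data") followed by df.to_csv("combined.csv", index=False), where "data" stands in for your own folder.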

How can I update workbook links when using pd.read_excel()?

The question is pretty simple, actually.
I'm reading an Excel file using pandas. When I open it in Office's Excel on my desktop, I'm prompted to Enable Content and then Update Links [that is, update the values in cells that import information from cells in other workbooks and .xlsx files], so it reads other files in some other folders.
When using pd.read_excel('filename'), however, that option is not available, and I'm afraid it's importing the values previously stored in the spreadsheet without updating them. Is there a workaround?
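As far as I know, pd.read_excel cannot refresh external links: it only reads the values Excel cached the last time the file was saved. One workaround, sketched here under assumptions, is to let Excel itself do the update through xlwings (which requires a local Excel installation on Windows or macOS), save a refreshed copy, and read that copy with pandas. The function name and the refreshed_path argument are hypothetical; the linked source workbooks must still be reachable at the paths Excel expects, and depending on Excel's Trust Center settings the update may still be blocked.

```python
import pandas as pd

def read_excel_with_updated_links(path, refreshed_path, **read_kwargs):
    """Hypothetical helper: open in Excel, update external links, save a copy, then read it."""
    import xlwings as xw  # imported here because it needs Microsoft Excel installed

    app = xw.App(visible=False, add_book=False)  # headless Excel instance
    try:
        app.display_alerts = False  # avoid blocking dialogs while running unattended
        wb = app.books.open(path, update_links=True)  # ask Excel to refresh the links
        wb.save(refreshed_path)  # the saved copy holds the refreshed cached values
        wb.close()
    finally:
        app.quit()
    return pd.read_excel(refreshed_path, **read_kwargs)
```

Saving to a separate copy leaves the original workbook untouched, at the cost of an extra file on disk.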

Taking PDF Info and Exporting it to Specific Excel Cells

I've searched for an answer to this specific question and haven't found anything. The goal is taking data from invoices and pasting it into specific cells in excel. Anyone have a good resource for doing this?
Thanks!
Suggestions:
For getting data out of PDFs, use pdfminer https://pypi.python.org/pypi/pdfminer/; for writing the data into Excel files, use xlwt https://pypi.python.org/pypi/xlwt (which writes the legacy .xls format) and xlutils https://pypi.python.org/pypi/xlutils
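A rough sketch of the whole pipeline, under several assumptions. It swaps in pdfminer.six (the maintained Python 3 fork of pdfminer, whose pdfminer.high_level.extract_text returns a PDF's text) and openpyxl instead of xlwt, because xlwt only writes .xls files. The regular expressions, field names, target cells, and function names are all hypothetical placeholders; real invoices will need patterns written for their own layout, and scanned (image-only) PDFs have no text layer for pdfminer to extract.

```python
import re
from openpyxl import Workbook, load_workbook

# Hypothetical patterns; adapt them to your own invoice layout.
FIELD_PATTERNS = {
    "invoice_no": r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)",
    "date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",
}
# Hypothetical destination cell for each field.
TARGET_CELLS = {"invoice_no": "B2", "date": "B3", "total": "B4"}

def parse_invoice(text):
    """Pull each field out of the invoice text; None when a pattern does not match."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = re.search(pattern, text, re.IGNORECASE)
        fields[name] = m.group(1) if m else None
    return fields

def write_fields(fields, xlsx_path, template_path=None):
    """Write the fields into fixed cells, optionally starting from an existing template."""
    wb = load_workbook(template_path) if template_path else Workbook()
    ws = wb.active
    for name, cell in TARGET_CELLS.items():
        ws[cell] = fields.get(name)  # values are written as strings
    wb.save(xlsx_path)

def invoice_pdf_to_excel(pdf_path, xlsx_path, template_path=None):
    from pdfminer.high_level import extract_text  # provided by pdfminer.six
    write_fields(parse_invoice(extract_text(pdf_path)), xlsx_path, template_path)
```

Keeping the parsing separate from the PDF extraction makes it easy to check the patterns against text you paste in by hand before running them on real files.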
