Taking PDF Info and Exporting it to Specific Excel Cells - python

I've searched for an answer to this specific question and haven't found anything. The goal is taking data from invoices and pasting it into specific cells in excel. Anyone have a good resource for doing this?
Thanks!

Suggestions:
For getting data out of PDF, use the pdf-miner https://pypi.python.org/pypi/pdfminer/, for writing the data into excel files, use xlwt https://pypi.python.org/pypi/xlwt and xlutils https://pypi.python.org/pypi/xlutils

Related

Obtain textbox value inside shape from Excel in Python

I'm developing a function that allows users to upload .xls or .xlsx files to the server and save data from those files into a database.
I'm using openpyxl and xlrd libraries for reading data from Excel, but for some Excel files which contain text in textbook inside shapes, I'm currently unable to read those values.
I know maybe my question is a duplicate of this: Obtain textbox value from Excel in Python but the solution of the asker of that question is not a general solution.
Does any anyone know how to achieve this?

Django read excel (xlsx) file to display in a table

I have a simple excel file I am trying to figure out how I can get my app to read the excel file, so I can use the data to display in a template. I have looked into xlrd but does anyone have any suggestions on the best way to do this?
Using pandas's read_excel is very easy and efficient and will also give you liberty to manipulate the columns(Fields)

Reading excel file with pylightxl

My organization needs me to use pylightxl library to read some bulky excel xlsx files. I have never used this library before and I'm getting a strange error in pycharm. I simply do not understand what it is.
I've tried googling but there isn't much support for pylightxl on the web. Does anyone know how to help?
for completeness of this post it looks like there was a bug in early days of pylightxl for files that were converted from xls to xlsx, however this issue has been resolved with #31 with version 1.52+
For those who might meet the same issue in the future, here's how i solved my issue, or rather a work around of it.
The excel files i had been given were initially in xls format, which pylightxl does not support.
I converted them to xlsx by just clicking "save as" in excel and then tried to read them in pylightxl, of which i was getting the strange error above. Must be something to do with the format
So I ended up saving it in csv instead, of which reading them was successful.
So if anyone meets this error, try different formats for the document you're trying to read

Any way to save format when importing an excel file in Python?

I'm doing some work on the data in an excel sheet using python pandas. When I write and save the data it seems that pandas only saves and cares about the raw data on the import. Meaning a lot of stuff I really want to keep such as cell colouring, font size, borders, etc get lost. Does anyone know of a way to make pandas save such things?
From what I've read so far it doesn't appear to be possible. The best solution I've found so far is to use the xlsxwriter to format the file in my code before exporting. This seems like a very tedious task that will involve a lot of testing to figure out how to achieve the various formats and aesthetic changes I need. I haven't found anything but would said writer happen to in any way be able to save the sheet format upon import?
Alternatively, what would you suggest I do to solve the problem that I have described?
Separate data from formatting. Have a sheet that contains only the data – that's the one you will be reading/writing to – and another that has formatting and reads the data from the first sheet.

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!
You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

Categories

Resources