Take data from an xls sheet and add them into python commands - python

I've been asked to create a Python script to automate a server deployment for 80 retail stores.
As part of this script, I have a secondary script that I call to change multiple values in 9 XML files, however, the values are unique for each store, so this script needs to be changed each time, but after I am gone, this is going to be done by semi / non-technical people, so we don't want them to change the Python scripts directly for fear of breaking them.
This in mind, I would like to have these people input the store details into an XLS sheet, and a python file read this sheet and put the data it finds into the existing python script with the data to be changed.
The file will be 2 columns, with the required data in the 2nd one.
I'm sorry if this is a long explanation, but that is the gist of it. I'm using python 2.6. Does anyone have a clue about how I can do this? Or which language might be better for this. I also know Bash and Javascript.
Thanks in advance

Depending on the complexity and the volume of your data
for small Openpyxl,
for large pandas

Related

How to use Python to automate the movement of data between two Excel workbooks with specific parameters

Thanks for taking the time to read my question.
I am working on a personal project to learn python scripting for excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (The organizations is by row so every employee takes up one row). But the project is to have a selected number of people be transferred to a new ledger (another excel file). So I have a list of emails in a .txt file (it could even be another excel file but I thought .txt would be easier), and I would want the script to run through the .txt file, get the emails, and look for any rows that have a matching email address(all emails are in cell 'B'). And if any are found, then copy that entire row to the new excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to python so I am not even sure if this is possible. Would really appreciate some help!
You have essentially two packages that will allow manipulation of Excel files. For reading in data and performing analysis the standard package for use is pandas. You can save the files as .xlsx however you are only really working with base table data and not the file itself (IE, you are extracing data FROM the file, not working WITH the file)
However what you need is really to perform manipulation on Excel files directly which is better done with openpyxl
You can also read files (such as your text file) using with open function that is native to Python and is not a third party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible I or another person can give you a specific example using your data / code etc, but you would have to provide it fully. Since you're learning, I suggest using the documentation or youtube.

Easiest way to use database between two software

fellows! I hope you are doing fine these days.
Currently I am working on a project at work, whose final goal is to create an interface between 2 pieces of software (I cannot mention which software, as it is a research project). The steps are as follows:
There are 2 software which provide data about processes.
The goal is: when the data changes in the first software, the data corresponding to the same process should also change in the second software.
The data can be exported as Excel files or I can obtain it using Python (with scripts created by us). The leaders of the project proposed us to create a data base where we will store the data and compare the data from the two software in order to change it.
My question is: how it is the easiest way to do it and which database software is more suitable? Currently, I am used to work with Python, but I do not mind inserting the Excel files directly in the database.
Thank you in advance! I wish you a nice day, folks!

I need a starting point to code an app to extract text from pdf to excel

To start I just want to state that I'm an Electrical Engineer with basic knowledge of programming.
My requirement is as follows:
I want to create an app where I can load and view PDF files that
contain tables.
These PDF files tables are of irregular shapes and in a different
position on every page. (that's why tools like tabular couldn't help
me)
Each table entry is multiline and of irregular dimensions (I cannot
select a whole row at a time it has to be each element alone. simply
copying the lines to excel won't work either because it will need a
lot of formatting)
So I want to be able to select each table entry individually from the
table (like a selection or cropping box over the required text),
delete new line if there is a new line in the text and just keep spaces.
The generated excel (or access database I do not really mind any)
should be reviewable and saveable (if those are even words XD).
I have a good knowledge of python and a very elementary knowledge of Django and I'm seeking some expert who can tell me what do I really need to learn (and if possible where to learn it) to execute my project.
Is it very much for me to execute and if I can dedicate 10 hours a week, how much would it take me to execute such a project.
Thanks all for your help in advance.
Don't use Python, use Word. Open the pdf, then step through the tables collection to collect the data and put it into excel. See this for an example
Here are the advises i can provide you :
first of all, ask internet for questions :
https://lmddgtfy.net/?q=python%20library%20tabular%20pdf
-> Camelot , which is mentioned multiple time seems to be relevant
For the use of excel sheet, i present you one of the most famous library for manipulating DataFrame: Pandas
You can use small courses on internet which will offer you a quick ability to manage your project easier.
for the application, you can easily find on youtube courses on a library made by someone who will explain you how to do a basic application. It could offer you the entry point you are talking about. Then, You can just wonder what else do you need or simply want for making it better.
for the time needed, it depends on how much time do you need to understand the basics, how much time you spend on having a deeper comprehension. I think in one week, working during your free time with a real interest, it could be working( not perfect, but working, which is a good beginning)
PS: I am not sure if your question is relevant for the aims of stackoverflow. I suggest you to read this file. ( https://stackoverflow.com/help/how-to-ask)

Python XLWT: Excel generated by Python xlwt contains missing value

I'm quite new to Python and trying to fetch data in HTML and saved to excels using xlwt.
So far the program seems work well (all the output are correctly printed on the python console when running the program) except that when I open the excel file, an error message saying 'We found a problem with some content in FILENAME, Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' And after I click Yes, I found that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem begins to rise after that (In total around 15000 lines). And missing data fields concentrate at several columns with relative high data volume.
I'm thinking if it's related to sort of cache allocating mechanism of xlwt?
Thanks a lot for your help here.
seems like a caching issue.
Try sheet.flush_row_data() every 100 rows or so ?

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!
You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

Categories

Resources