how to read formulas with xlrd - python

I am trying to write a parser that reads several Excel files. The values I need are usually at the bottom, in the row that holds the sum of all the elements above it. So the cell content is actually a formula such as "=SUM()" or "=A5*0.5". To a user who opens the file in Excel it appears as a number, which is fine. But if I try to read this value with ws.cell(x, y).value, I do not get anything.
So my question is: how can I read this kind of field with xlrd, ideally with ws.cell(x, y).value or something similar?
Thanks.

According to the link I posted above for your question, the author of xlrd says: "The work is 'in-progress' but is not likely to be available soon as the focus of xlrd lies elsewhere". From this I assume that there is not much you can do about it. Note: this is based on the author's comment from January 2011.

Related

Parsing Excel sheet based on date, number of visitors, and printing email

I am trying to parse through an Excel sheet that has columns for the website name (column A), the number of visitors (F), a contact at that website's first name (B), one for last name (C), for email (E), and date it was last modified (L).
I want to write a Python script that goes through the sheet, looks at sites that have been modified in the last 3 months, and prints out the name of the website and an email.
It is pretty straightforward to do this. I think a little bit of googling can help you a lot. But in short, you need to use a library called pandas, which is a really powerful tool for handling spreadsheets, datasets, and table-based files.
The pandas documentation is very well written. You can use the tutorials provided within the documentation to work your way through the problem easily. However, I'll give you a brief overview of what you should do.
First, open the spreadsheet (Excel file) inside Python using pandas and load it into a data frame (read the docs and you'll understand).
Second, using one of the methods provided by pandas, called where (there are actually a couple of such methods), you can set a condition (for example, whether the date is older than some date) and get the masked data frame (which represents your spreadsheet) back from the method.
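The two steps above can be sketched roughly as follows. This is only a minimal illustration: the column names (website, email, last_modified), the sample rows, and the helper recently_modified are made up for the example, and it uses plain boolean indexing with .loc rather than where. In a real script the data frame would come from pandas.read_excel instead of being built by hand.

```python
import pandas as pd


def recently_modified(df, today, months=3):
    """Return website and email for rows modified within the last `months` months.

    The column names used here are assumptions made for this sketch.
    """
    cutoff = pd.Timestamp(today) - pd.DateOffset(months=months)
    modified = pd.to_datetime(df["last_modified"])
    # Boolean mask: True for rows whose date is on or after the cutoff.
    return df.loc[modified >= cutoff, ["website", "email"]]


# Stand-in for the spreadsheet; normally loaded with pd.read_excel(...).
sheet = pd.DataFrame(
    {
        "website": ["a.example", "b.example", "c.example"],
        "email": ["ann@a.example", "bob@b.example", "cy@c.example"],
        "last_modified": ["2023-01-15", "2023-05-20", "2023-06-01"],
    }
)

recent = recently_modified(sheet, today="2023-06-30")
for site, email in recent.itertuples(index=False):
    print(site, email)
```

Passing the reference date in explicitly (instead of calling pd.Timestamp.today() inside the function) keeps the filter easy to check against known data.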

Adding new text into excel cell with another format using python & xlsxwriter

Hello, I've recently embarked on a project that lets me input some data into a Python programme using Tkinter. This is the programme interface.
With this input, after clicking "Go", it opens an Excel spreadsheet and writes the start and end time for that specific date. My question is: how do I write code that gives the NEW text a different colour, without altering what was in the cell originally, using xlsxwriter? Here's an example.
This is the original text/format for, say, the 5th May cell in my Excel sheet.
And after clicking Go, I hope to achieve this:
I'm OK with the code for opening Excel, finding the cell, writing, and saving.
I hope this is a clear enough question and hopefully it gets an answer I can use!
Thanks!!
I think the GetCharacters function called on a Range, using win32com (pywin32), will do what you want.
ws.Range(cell/range as string).GetCharacters(start, length).Font.Color = [color ID]
After opening the workbook, I was able to do this to make characters red, starting at character 2 (note that the second argument is a length, not an end position):
ws.Range('A1').GetCharacters(2,5).Font.Color = -16776961
I got a lot of this from a previous question about bolding: How Do I Bold only part of a string in an excel cell with python
To get the color (and there is probably a better way), I went into Excel, used the macro recorder, changed the font to red, and looked at what the recorded macro called that color. So you could get the number ID for the colors you want that way.

Identify the edited location in the PDF modified by online editor www.ilovepdf.com using Python

I have an SBI bank statement PDF which is tampered/forged. Here is the link for the PDF.
This PDF is edited using online editor www.ilovepdf.com. The edited part is the first entry under the 'Credit' column. Original entry was '2,412.00' and I have modified it to '12.00'.
Is there any programmatic way either using Python or any other opensource technology to identify the edited/modified location/area of the PDF (i.e. BBOX(Bounding Box) around 12.00 credit entry in this PDF)?
Two things I already know:
1. Metadata (Info or XMP metadata) is not useful. The modify date in the metadata does not tell you whether the PDF was merely compressed or actually edited, because it changes in both cases. It also does not give the location of the edit.
2. The PyMuPDF SPANS JSON object is also not useful, as the edited entry does not come at the end of the SPANS JSON; instead it is in the proper order of the text inside the PDF. Here is the SPAN JSON file generated from PyMuPDF.
Kindly let me know if anyone has any opensource solution to resolve this problem.
iLovePDF completely changes the whole text in the document. You can even see this: just open the original and the manipulated PDFs in two Acrobat Reader tabs and switch back and forth between them, and you'll see nearly all letters move a bit.
Internally, iLovePDF also rewrote the PDF completely according to its own preferences, and the edit fits in perfectly.
Thus, no, you cannot recognize the manipulated text based on this document alone, because technically it is a completely different, completely new document.

Writing to specific cells in csv

I have to use CSVs to make a list of people's contact details, like emails, phone numbers and addresses.
I have a list of column names along the top: name, email, number, etc.
I need to write to a specific cell. Users can enter their name and then enter new information; for example, if they didn't have a phone number and now they do, they can enter it. I can find the row of a specific person, as it starts with their name, which I can search for, but then I don't know how to write to the phone number column.
My code is like this:
import csv
with open("csvfile.csv", "a") as file:
    reader = csv.reader(file)
    writer = csv.writer(file)
    for row in file:
        if row["First column"] == x:
            row[1] = "still don't have a phone"
            writer.writerow(row)
The problem seems to be that it can't both write and read at the same time, but I don't know what to do. I am using Python 3.
Because you're a student, I am going to push you to figure out the complete answer on your own. But the tool you will want to use is
DataFrame.iloc (from pandas)
This is integer-position-based indexing, and it could be as simple as
df.iloc[0, 1] = whatever you need it to be.
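As a rough illustration of that one-liner, here is a small sketch. The data frame contents and the column names (name, email, number) are invented for the example. It also shows .loc, which selects by label or condition and may be more convenient when the row is found by the person's name.

```python
import pandas as pd

# Hypothetical contact list; in practice this would come from pd.read_csv(...).
contacts = pd.DataFrame(
    {
        "name": ["Ann", "Bob"],
        "email": ["ann@example.com", "bob@example.com"],
        "number": ["", "555-0100"],
    }
)

# Positional assignment: row 0, column 2 (the "number" column).
contacts.iloc[0, 2] = "555-0199"

# Label/condition-based assignment: find the row by name, then set the column.
contacts.loc[contacts["name"] == "Bob", "number"] = "555-0123"

print(contacts)
```

After the change, contacts.to_csv(path, index=False) would write the updated table back to disk.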
Hopefully this gets you a step closer :)
Best,
Andy
EDIT: I realized you're just using the csv module.
If you can, I would recommend loading your data through pandas to work with the CSV. It's an overall more powerful tool that packs in a lot of what you will need to solve this issue.
If you want, I can help you set up pandas, but see this for the answer regarding the csv module:
Writing to a particular cell using csv module in python
Sorry for my mistake in not reading the question fully,
Best,
Andy
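For the csv-module route, one common pattern is to read every row into memory, change the cell, and then rewrite the whole file, since a CSV file cannot be edited in place cell by cell. The sketch below assumes the file has a header row with a "name" column; the function name update_cell and its parameters are made up for illustration.

```python
import csv


def update_cell(path, name, column, value):
    """Set `column` to `value` on every row whose "name" field equals `name`."""
    # Step 1: read the header and all rows into memory.
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Step 2: modify the matching row(s) in memory.
    for row in rows:
        if row["name"] == name:
            row[column] = value

    # Step 3: rewrite the whole file, header included.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as update_cell("contacts.csv", "Ann", "number", "555-0199") would then update Ann's phone number and leave the other rows unchanged.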

Python XLWT: Excel generated by Python xlwt contains missing value

I'm quite new to Python and am trying to fetch data from HTML and save it to Excel files using xlwt.
So far the program seems to work well (all the output is printed correctly on the Python console when running the program), except that when I open the Excel file, I get an error message saying 'We found a problem with some content in FILENAME, Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' After I click Yes, I find that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem begins after that (there are around 15000 lines in total). The missing data fields are concentrated in several columns with relatively high data volume.
I'm wondering whether it's related to some sort of cache allocation mechanism in xlwt?
Thanks a lot for your help here.
Seems like a caching issue.
Try calling sheet.flush_row_data() every 100 rows or so?
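A minimal sketch of that suggestion might look like the following. The function name write_rows, the sheet name, and the flush_every parameter are assumptions made for the example. Note that once a row has been flushed it should not be written to again, so this only suits row-by-row output.

```python
import xlwt


def write_rows(rows, path, flush_every=100):
    """Write a list of rows to an .xls file, flushing row data periodically."""
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet("data")  # sheet name chosen for this sketch
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)
        # Flush buffered row data after every `flush_every` rows.
        if (r + 1) % flush_every == 0:
            sheet.flush_row_data()
    workbook.save(path)
```

Whether this removes the "problem with some content" warning depends on what actually causes it, so it is worth trying on a small sample first.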
