This is a very straightforward issue. I added an invisible text layer using page.insert_text().
After saving the modified pdf, I can use page.get_text() to retrieve the created text layer.
I would like to be able to remove that layer, but couldn't find a function to do it.
The solution I've come up with is to render the pages as images and build a new PDF from them, but that seems very inefficient.
I would like to solve this without using any library other than fitz, and it feels like there should be a solution within fitz, considering that page.get_text() can access the exact information I'm trying to remove.
If you are certain of the whereabouts of your text on the page (and I understood that you are), simply use PDF redactions:
page.add_redact_annot(rect1) # remove text inside this rectangle
page.add_redact_annot(rect2)
...
page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
# the above removes everything intersecting any of the rects,
# but leaves images untouched
Obviously you can remove all text on the page by taking page.rect as the redaction rectangle.
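As a minimal sketch of that whole-page case, the helper below wraps the two calls from the answer. The name `remove_all_text` is made up for illustration, `page` is assumed to be a PyMuPDF page, and the default `images=0` is assumed to be the value of `fitz.PDF_REDACT_IMAGE_NONE`; passing the constant explicitly avoids relying on that.

```python
def remove_all_text(page, images=0):
    """Redact the full page area so that all text on it is removed.

    `page` is expected to be a PyMuPDF (fitz) page. `images` is forwarded
    to apply_redactions(); pass fitz.PDF_REDACT_IMAGE_NONE to leave images
    untouched (the default 0 is assumed to be that constant's value).
    """
    page.add_redact_annot(page.rect)      # mark the whole page for redaction
    page.apply_redactions(images=images)  # apply it with the given image mode


# Typical use (requires PyMuPDF; the file names are placeholders):
#   import fitz
#   doc = fitz.open("input.pdf")
#   for page in doc:
#       remove_all_text(page, images=fitz.PDF_REDACT_IMAGE_NONE)
#   doc.save("output.pdf")
```

The documentation of `add_redact_annot` also describes a `fill` argument that controls the color painted into the redacted rectangle, which is worth checking before redacting a whole page that contains images.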
Is there a way to insert a picture inside a table cell using python-pptx?
I'm also thinking of finding the coordinates of the cell and adjusting them to position the picture, but I cannot find anything on how to do that.
Thank you.
No, unfortunately not. Note that this is not a limitation of python-pptx, it is a limitation of PowerPoint in general. Only text can be placed in a table cell.
There is nothing stopping you from placing a picture shape above (in z-order) a table cell, which will look like the picture is inside. This is a common approach but unfortunately is somewhat brittle. In particular, the row height is not automatically adjusted to "fit" the picture and changes in the content of cells in prior rows can cause lower rows to "move down" and no longer be aligned with the picture. So this approach has some drawbacks.
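A rough sketch of the overlay approach: the position of a cell can be estimated by adding the widths of the preceding columns and the heights of the preceding rows to the table's own position. The helper name `cell_box` is hypothetical. It assumes `frame` is the graphic-frame shape holding the table (exposing `left`, `top` and `table`) and that rows are rendered at their nominal heights, which is exactly the assumption that makes this approach brittle.

```python
def cell_box(frame, row_idx, col_idx):
    """Return (left, top, width, height) of a table cell in EMU.

    `frame` is expected to be the python-pptx graphic-frame shape that
    holds the table. Rows that grow to fit their text will shift the real
    cell position away from the value computed here.
    """
    table = frame.table
    # Offsets are the sums of the widths/heights of preceding columns/rows.
    left = frame.left + sum(table.columns[i].width for i in range(col_idx))
    top = frame.top + sum(table.rows[i].height for i in range(row_idx))
    return left, top, table.columns[col_idx].width, table.rows[row_idx].height


# Typical use (requires python-pptx; "logo.png" is a placeholder):
#   left, top, width, height = cell_box(frame, 1, 2)
#   slide.shapes.add_picture("logo.png", left, top, width=width, height=height)
```

Passing both `width` and `height` stretches the picture to the cell; passing only one of them keeps the picture's aspect ratio.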
Another possible approach is to use a picture as the background for a cell (like you might use colored shading or a texture). There is no API support for this in python-pptx and it's not without its own problems, but might be an approach worth considering.
I am working on an automation program using TensorFlow. I need data to bypass a text-based CAPTCHA, so I am trying to gather data (images, actually) from sites. How can I take "clean" screenshots with the help of OpenCV? By "clean" I mean images without blank white areas.
Note: I know that we can take a screenshot of a desired web element using Selenium (refer to: https://www.lambdatest.com/blog/screenshots-with-selenium-webdriver/), but on this site there are two text-based CAPTCHAs, so the screenshot also includes blank white areas, which I don't want. I also tried capturing the images manually, but because my hands are not steady, those images include blank white areas as well.
When I tried to capture the web element using Selenium, I was not satisfied with the result because it contains blank white areas, which I don't want in my dataset.
Normally the images look like that.
All I want is to get two separate images without blank white areas, so that I can use them as training data. Could you please help me?
You could use Playwright and take an element screenshot with omitBackground enabled: https://playwright.dev/#version=v1.0.2&path=docs%2Fapi.md&q=elementhandlescreenshotoptions
I want to define a rectangular area on top of an image, with a specific width and height, into which a text should be pasted. I want the text to stay inside this text field. If the text is too long, its size should be decreased, and at some point it should be converted into multiline text. How can I define such a text field using Pillow in Python 3?
I am looking for a clean and minimalist solution.
Sadly, I have no idea how to do this, since I am very new to Python. I also didn't find any helpful and fitting information on how to do this on the Internet. Thanks in advance!
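One way to read the behaviour described above is as a two-phase search: shrink the font while the text still fits on one line, then switch to word wrapping and keep shrinking. The sketch below implements only that fitting logic. All names (`wrap_words`, `fit_text`, `wrap_size`, the `measure` callback) are hypothetical, and the measuring function is passed in, so that with Pillow it could be built on `ImageFont.truetype()` and `font.getbbox()`.

```python
def wrap_words(text, size, max_width, measure):
    """Greedy word wrap: start a new line when the next word would overflow."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and measure(candidate, size)[0] > max_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def fit_text(text, box_w, box_h, measure, max_size=40, wrap_size=20, min_size=8):
    """Return (font_size, lines) that fit the box, or None if nothing fits.

    measure(text, size) must return (width, height) of one rendered line.
    """
    # Phase 1: shrink the font while keeping the text on a single line.
    for size in range(max_size, wrap_size - 1, -1):
        width, height = measure(text, size)
        if width <= box_w and height <= box_h:
            return size, [text]
    # Phase 2: wrap the text and keep shrinking until the block fits.
    for size in range(wrap_size, min_size - 1, -1):
        lines = wrap_words(text, size, box_w, measure)
        widest = max((measure(line, size)[0] for line in lines), default=0)
        total_h = sum(measure(line, size)[1] for line in lines)
        if widest <= box_w and total_h <= box_h:
            return size, lines
    return None


# A possible Pillow-based measure function (the font path is a placeholder):
#   from PIL import ImageFont
#   def measure(text, size):
#       font = ImageFont.truetype("DejaVuSans.ttf", size)
#       left, top, right, bottom = font.getbbox(text)
#       return right - left, bottom - top
```

The returned lines can then be joined with newlines and drawn with `ImageDraw.Draw(image).multiline_text()`. The bounding-box height is a tight estimate, so some extra line spacing may be needed in practice.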
Is there a way to add a watermark containing some images (icons, data matrix codes, preferred in a vector format) and text to a PDF in a way that the original appearance of the PDF can be restored, whenever needed?
In other words, I want to implement the following:
- add a watermark to existing PDF
- remove this watermark whenever desired
The watermark is provided by me in whichever format it might be needed to achieve my goal.
I have found implementations written in Python, like this solution using PyPDF2, but I have not found a way to remove the added watermark afterwards. I have also found a solution for adding and removing watermarks using iText, which unfortunately is not a Python library.
I am accessing the twitter streaming API. I generate a map using Basemap in python.
I want only certain parts of the map to change over time (e.g. every second). Is that hard to do?
Do I need to leave Basemap and look for something else? Please help!
You can send an ajax request and update the html contents dynamically.
A possible approach: divide the map into tiles, and treat each one separately; use Basemap to generate just the map-tile that contains new data, then update just that tile on your webpage using Ajax.
Of course, depending on the nature of changes to the data on your map, this approach may or may not work for you -- gerrymandering is not really possible.
You would need to write logic to work out which tile the new data belongs to, then use Basemap to create a new image for that tile, then intelligently update the tiled image. You will also have to play with margins and padding (both in matplotlib and in CSS) to piece the tiles together cleanly.
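The tile-lookup part of that logic might look like the sketch below. The function names and the regular longitude/latitude grid are assumptions made for illustration; equal-sized tiles in degrees only line up with equal-sized images in a projection that is linear in longitude and latitude.

```python
def tile_index(lon, lat, bounds, n_cols, n_rows):
    """Return (row, col) of the tile containing the point (lon, lat).

    bounds = (lon_min, lat_min, lon_max, lat_max) of the whole map.
    Row 0 is the northernmost row, matching a top-to-bottom image layout.
    """
    lon_min, lat_min, lon_max, lat_max = bounds
    if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
        raise ValueError("point lies outside the map bounds")
    col = int((lon - lon_min) / (lon_max - lon_min) * n_cols)
    row = int((lat_max - lat) / (lat_max - lat_min) * n_rows)
    # Points exactly on the east or south edge belong to the last tile.
    return min(row, n_rows - 1), min(col, n_cols - 1)


def tile_bounds(row, col, bounds, n_cols, n_rows):
    """Return (lon_min, lat_min, lon_max, lat_max) of a single tile."""
    lon_min, lat_min, lon_max, lat_max = bounds
    d_lon = (lon_max - lon_min) / n_cols
    d_lat = (lat_max - lat_min) / n_rows
    return (lon_min + col * d_lon, lat_max - (row + 1) * d_lat,
            lon_min + (col + 1) * d_lon, lat_max - row * d_lat)
```

The four values returned by `tile_bounds` correspond to the `llcrnrlon`, `llcrnrlat`, `urcrnrlon` and `urcrnrlat` arguments of `Basemap`, so only the tile that received new data needs to be redrawn.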
...
When the approach gets this complicated, one should re-evaluate whether better tools are available. Basemap doesn't sound like a good fit for what you need to do.