I would love to have literary batch records on my computer, unfortunately this is not an option. I decided to write a program that records the layout of the letters on the board and the available letters on each line in a text file. My idea is to get it out of html with python, maybe with selenium. I know Python pretty well, but I can't say that about working with pages. Need an idea how to do it in the most efficient way (ie how to extract this data from an html page into an array or a list). The website is https://www.kurnik.pl/literaki.
I tried static methods, but it turned out to be a completely wrong approach and nothing came of it.
Thank you in advance.
If you have the algorithms and logic to convert the page into a list, I may have a solution. Try making a csv file (comma separated values), which makes a table sort-of layout that can also be converted back to an array with ease. E.g.
with open('board.csv', 'w') as f:
f.write(board data here)
f.close()
Related
How can I read pdf in python? I know one way of converting it to text, but I want to read the content directly from pdf.
Can anyone explain which module in python is best for pdf extraction?
I tried to use PyPDF2 package but it gives me inconsistent results. Also, i would like a lot to have a way to get the tables, the images, and remove the headers and the footers at least consistently, it doesn't need to happens 100% of the times. Thanks for your answers, i just need to find the right library. Thanks!
From another post that asked pretty much the same:
The answer depends if the question is general or specific to a single form. Your approach is reasonable for the general case, but there will be variability. If you have a pdf form that is a single form or report that has been created with different data at each iteration consider converting the form from pdf to postscript then see if you can parse the postscript.
Two utilities do this: pdf2ps and pdftops Try each. This approach may benefit if you know some postscript. With some luck the needed fields may be simple text strings. Worth a try.
I have an original text that I want to translate. I normally do it manually but I know I could save a lot of time translating automatically the most frequent words and expressions.
I will find out how to translate simple words, the problem is not here. I have read some books on python and I think using string manipulations can be done.
But I am lost about how to create the output file.
The output file will contain:
short empty forms ready to be filled wherever there is text that has not been translated
the translated words wherever they were in the original file
In the output file I will fill manually the empty forms, after pressing Tab the cursor should jump to the next exmpty form
I am lost here, I know how to do forms on html but the language I am used to is Python.
I would like to know what modules from Python I could use. I need some guidance on this.
Can you recommend me a book or a tool that explains how to do something similar to this?
This is what I want to do, assuming I have managed to create a simple database to translate colors from Spanish to English.
The first step contains the original file.
The second step contains the automatic translation.
In the third step I complete the manual translation.
After finishing everything is grouped into a normal txt file ready to be used.
I think it is quite clear. I don't expect people to tell me the code to do this, I just need to know what tools could be used to achieve my goal.
Thanks for editing.
To create an interface that works with a web browser, Flask for Python is a good method for creating webforms. There are tutorials available.
One method for storing data would be an SQLite file. That may be more than you need, so I'd recommend starting with a CSV file. Libraries exist in Python for both CSVs and SQLite.
Very general question here. Need ideas. I have a dataset with about 20 rows. I want to use python or R to automatically take each of these rows and create 1 pdf per row. The pdf is formatted in a particular way that I need to be able to play around with.
Imagine each row is a student's name, and I need to make a pdf "Report Card" for each student. The report card will have a designated spot that says "Math Grade" and then the value will come from the dataset.
I want to be able to hit run, and have all 20 of the pdfs save to a folder on my machine. Eventually, I may try to have this run on a server or something so it is fully automatic. The pdfs ultimately get emailed out to a distribution list.
I am very pretty familiar with R, and mildly familiar with Python. I have no experience in HTML, but is that what I need here?
Any tutorials, ideas, explanations of the process I should use would be appreciated.
I thought about using plot.ly.dash. But I think that is mostly for viewing in a web browser. I want pdfs, so I don't know if that will work.
To start I just want to state that I'm an Electrical Engineer with basic knowledge of programming.
My requirement is as follows:
I want to create an app where I can load and view PDF files that
contain tables.
These PDF files tables are of irregular shapes and in a different
position on every page. (that's why tools like tabular couldn't help
me)
Each table entry is multiline and of irregular dimensions (I cannot
select a whole row at a time it has to be each element alone. simply
copying the lines to excel won't work either because it will need a
lot of formatting)
So I want to be able to select each table entry individually from the
table (like a selection or cropping box over the required text),
delete new line if there is a new line in the text and just keep spaces.
The generated excel (or access database I do not really mind any)
should be reviewable and saveable (if those are even words XD).
I have a good knowledge of python and a very elementary knowledge of Django and I'm seeking some expert who can tell me what do I really need to learn (and if possible where to learn it) to execute my project.
Is it very much for me to execute and if I can dedicate 10 hours a week, how much would it take me to execute such a project.
Thanks all for your help in advance.
Don't use Python, use Word. Open the pdf, then step through the tables collection to collect the data and put it into excel. See this for an example
Here are the advises i can provide you :
first of all, ask internet for questions :
https://lmddgtfy.net/?q=python%20library%20tabular%20pdf
-> Camelot , which is mentioned multiple time seems to be relevant
For the use of excel sheet, i present you one of the most famous library for manipulating DataFrame: Pandas
You can use small courses on internet which will offer you a quick ability to manage your project easier.
for the application, you can easily find on youtube courses on a library made by someone who will explain you how to do a basic application. It could offer you the entry point you are talking about. Then, You can just wonder what else do you need or simply want for making it better.
for the time needed, it depends on how much time do you need to understand the basics, how much time you spend on having a deeper comprehension. I think in one week, working during your free time with a real interest, it could be working( not perfect, but working, which is a good beginning)
PS: I am not sure if your question is relevant for the aims of stackoverflow. I suggest you to read this file. ( https://stackoverflow.com/help/how-to-ask)
i have this little problem. Is there any way how to recognize where is the end of current page in text file ?
I have some PDF files which i convert to plain text. Now i want to split every page and work with this separated pages. Is there any solution to do it? I work with only converted documents - there are page numbers but that is not accurate when there are some other numbers in text.
PDFMiner which i use, have some variable for converting page, which i can use. But there is a lot of other programs for PDF converting and i want to write a program for general use.
have somebody please some advise ? Thank you