read a pdf in python [duplicate] - python

This question already has answers here:
How to read line by line in pdf file using PyPdf?
(3 answers)
Closed 7 years ago.
I want to read a PDF file in Python. I tried PdfReader and pdfquery, but neither gave me the result as a string. I want to extract some of the content from the PDF file. Is there any way to do that?

PDFminer is a tool for extracting information from PDF documents.

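Does it matter in your case whether the file is a PDF or not? If you only need the raw contents of the file, open it as you would any other file, but in binary mode, because a PDF is not plain text. Note that this gives you the raw bytes of the file, not the readable text of the document; for that you need a PDF library such as PDFMiner.
For example:
with open('my_file.pdf', 'rb') as file:
    content = file.read()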
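As a rough illustration of the binary-mode point, the sketch below reads only the first bytes of a file and checks for the `%PDF-` signature that PDF files start with. The helper name `looks_like_pdf` and the placeholder file are assumptions made for this example; the placeholder is not a valid PDF document.

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature bytes."""
    # Binary mode: a PDF is not plain text, so do not decode it.
    with open(path, "rb") as f:
        header = f.read(5)
    return header == b"%PDF-"


# Stand-in file for the demo: only a header line, not a real PDF.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n% placeholder bytes\n")
    demo_path = tmp.name

is_pdf = looks_like_pdf(demo_path)
os.remove(demo_path)
print(is_pdf)
```

This only confirms the file type. Getting readable text out of the pages still requires a PDF parsing library.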

Related

Write bold text to file [duplicate]

This question already has an answer here:
Plain-text formatting in python [closed]
(1 answer)
Closed 2 years ago.
I just want to write a string to a text file with part of it in bold.
Is there any way to do it?
An example of what I am asking for:
2021/02/19: this is an example
You cannot use any kind of markup in a .txt file, whether you use Python or not.
You might want to look at other file formats, such as .md files.
test = 'teste.rtf'
with open(test, 'w') as out_file:
    # A raw string keeps the RTF backslashes literal
    out_file.write(r"""{\rtf1
This is \b Bold \b0\line
}""")
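The RTF approach can be wrapped in a small helper. This is a minimal sketch: the function name `rtf_with_bold` is hypothetical, the choice of which part to make bold is only an illustration, and the helper does not escape backslashes or braces that might appear in the text itself.

```python
import os
import tempfile


def rtf_with_bold(before, bold, after):
    """Build a minimal RTF document containing one bold run."""
    # \b switches bold on and \b0 switches it off. The single space after
    # each control word is a delimiter and is not shown in the document.
    return r"{\rtf1\ansi " + before + r"\b " + bold + r"\b0 " + after + "}"


# The leading space in the last argument is the visible space after the date.
doc = rtf_with_bold("", "2021/02/19:", " this is an example")

path = os.path.join(tempfile.mkdtemp(), "example.rtf")
with open(path, "w") as out_file:
    out_file.write(doc)

with open(path) as in_file:
    written = in_file.read()
print(written)
```

Opening the resulting .rtf file in a word processor should show the date in bold and the rest in regular weight.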

Import from csv file in vs code [duplicate]

This question already has answers here:
How to read a file line-by-line into a list?
(28 answers)
Closed 3 years ago.
I want to change my code so that, instead of processing a single sentence held in a string, it reads a CSV file and processes it line by line.
This is my program in VS Code.
import paralleldots
paralleldots.set_api_key("API KEY")
# for single sentence
text="Come on, lets play together"
lang_code="en"
response=paralleldots.sentiment(text,lang_code)
print(response)
I expect the code to run on each sentence (one per line) in a given CSV file, instead of on a single string.
with open('C:/Users/User/Desktop/hj.txt', encoding='utf-8', errors='ignore') as fp:  # Open file in read mode
    lines = fp.read().split("\n")  # Create a list containing all lines
print(lines)
#print("----------------------------------------------...------\n")
print("\nThe emotion analysis results for each sentence in the file .txt :")
print("------------------------------------------------...------\n")
response=paralleldots.batch_emotion(lines)
print(response)
This worked well.
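If the input really is a CSV file rather than a plain text file, the standard `csv` module handles quoted commas inside a sentence. The sketch below is only an illustration: `analyze` is a hypothetical stand-in for the real API call (for example `paralleldots.sentiment(text, lang_code)`), and the in-memory sample replaces the actual file.

```python
import csv
import io


def analyze(sentence):
    # Hypothetical stand-in for the real per-sentence API call.
    return {"text": sentence, "length": len(sentence)}


def analyze_csv(file_obj, column=0):
    """Run analyze() on one column of every non-empty CSV row."""
    results = []
    for row in csv.reader(file_obj):
        # Skip blank lines and rows that lack the requested column.
        if len(row) <= column or not row[column].strip():
            continue
        results.append(analyze(row[column]))
    return results


# In-memory sample; with a real file use open(path, newline="", encoding="utf-8").
sample = io.StringIO('"Come on, lets play together"\nI am tired\n\n')
results = analyze_csv(sample)
for item in results:
    print(item)
```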

Read json file as input and output as pprint? [duplicate]

This question already has answers here:
How to prettyprint a JSON file?
(15 answers)
Closed 5 years ago.
I'm working with a large JSON file that is currently encoded as one long line.
This makes it unintelligible for other people to work with, so I want to render it using pprint.
At the moment I'm trying to import the full file and print it with pprint, but my output looks like this:
<_io.TextIOWrapper name='hash_mention.json' mode='r' encoding='UTF-8'>
My question is: what is that showing? How can I get it to output the JSON data with pprint?
The code I've written looks like this:
import pprint
with open('./hash_mention.json', 'r') as input_data_file:
    pprint.pprint(input_data_file)
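You opened the file in read mode but never read its contents, so pprint printed the file object itself.
Replace pprint.pprint(input_data_file) with pprint.pprint(input_data_file.read()) to print the contents. Note that read() returns a single string, so for a structured, indented view you should parse the file first with json.load(input_data_file) and pass the result to pprint.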
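A minimal sketch of parsing before printing, using only the standard library. The file contents here are invented sample data written to a temporary file, not the asker's actual hash_mention.json.

```python
import json
import os
import pprint
import tempfile

# Invented one-line JSON standing in for the real file.
raw = '{"hashtag": "python", "mentions": [{"user": "a", "count": 3}, {"user": "b", "count": 5}]}'
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf-8") as f:
    f.write(raw)

# Parse the file into Python objects instead of printing the file handle.
with open(path, "r", encoding="utf-8") as input_data_file:
    data = json.load(input_data_file)

pprint.pprint(data)                  # Python-style view of dicts and lists
pretty = json.dumps(data, indent=2)  # indented text that is still valid JSON
print(pretty)
```

`pprint` shows Python literals (single quotes, `True`/`None`), while `json.dumps(..., indent=2)` keeps the output valid JSON, which may be preferable if other people will reuse the file.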

How do I overwrite the text in a file using python3 so that the user can write more with out it interfering with old text? [duplicate]

This question already has answers here:
How to empty a file using Python
(2 answers)
Closed 6 years ago.
I'm confused and don't know what to do about this. I'm trying to overwrite the file's text.
When you open a file with the 'w' flag, the file is truncated and rewritten if it already exists.
with open('yourfile.ext', 'wt') as fileObj:
    fileObj.write(stuff)  # stuff is the new text to write
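To make the difference between overwriting and adding concrete, here is a small sketch comparing the 'w' and 'a' modes. The file name and text are placeholders chosen for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:   # creates the file
    f.write("old text\n")

with open(path, "w") as f:   # 'w' truncates: the old text is discarded
    f.write("new text\n")

with open(path, "a") as f:   # 'a' appends: existing text is kept
    f.write("more text\n")

with open(path) as f:
    content = f.read()
print(content)
```

After these three writes the file holds only "new text" followed by "more text"; the first line written is gone.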

Convert a PDF to text with Python [duplicate]

This question already has answers here:
How to extract text from a PDF file?
(33 answers)
Closed 2 months ago.
How can I get the content of a PDF file line by line in Python? I have searched Stack Overflow but could not find a good answer. Notes: pyPdf gives an assertion error; if possible, I would like something using slate and pdfminer.
From the command line: python /path/to/pdf2txt.py -o text.txt /path/to/yourpdf.pdf
You can then take the text file it produces and iterate over it with for line in file:
If you want to be more efficient, you would have to change pdf2txt.py so that outfp is an in-memory string buffer (such as io.StringIO), which avoids writing a file and then reading it back.
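The line-by-line step works the same way on a file and on an in-memory buffer. The sketch below uses `io.StringIO` with made-up text standing in for the converter's output; it does not call pdf2txt.py, and the helper name `nonblank_lines` is an assumption for this example.

```python
import io


def nonblank_lines(text_stream):
    """Yield each non-empty line of a text stream without its newline."""
    for line in text_stream:
        line = line.rstrip("\n")
        if line.strip():
            yield line


# Made-up text standing in for what the PDF converter would write.
buffer = io.StringIO()
buffer.write("First line of the page\n\nSecond line of the page\n")
buffer.seek(0)  # rewind before reading back

lines = list(nonblank_lines(buffer))
for line in lines:
    print(line)
```

The same function accepts a file opened with `open('text.txt')`, since both objects are iterated line by line.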
