I have a .txt file with text like the following:
Project Gutenberg Australia
a treasure-trove of literature
treasure found hidden with no evidence of ownership
but whenever I try to read this file in Python (using Flask - I'm uploading the file to a site), using the following lines:
if request.method == 'POST':
    f = request.files['file']
    f.save(secure_filename(f.filename))
    f.stream.seek(0)
    content = f.read()
    return render_template("book.html", text=content)
My "book.html" file is like the following:
<pre>
{{ text }}
</pre>
I get something like the following:
b'\xef\xbb\xbf\r\nProject Gutenberg Australia\r\na treasure-trove of literature\r\ntreasure found hidden with no evidence of ownership\r\n\r\n\r\n\r\n\r\...]
How can I fix this so that what is displayed on my site is just like what is displayed in the .txt file? Do I need to modify "book.html" or just read the file differently with Python?
Thanks!
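You just need to .decode() your bytes. The leading \xef\xbb\xbf is a UTF-8 byte order mark, so decoding with 'utf-8-sig' also strips it:
print(b'\xef\xbb\xbf\r\nProject Gutenberg Australia\r\na treasure-trove of literature\r\ntreasure found hidden with no evidence of ownership\r\n\r\n\r\n\r\n\r\...]'.decode('utf-8-sig'))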
gives:
Project Gutenberg Australia
a treasure-trove of literature
treasure found hidden with no evidence of ownership
\...]
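A minimal sketch of the same idea as a reusable helper, assuming the uploaded file is UTF-8 encoded (the function name decode_text_upload is hypothetical, not part of Flask):

```python
def decode_text_upload(raw):
    """Decode uploaded bytes as UTF-8, dropping a leading BOM if present."""
    # 'utf-8-sig' behaves like 'utf-8' but also removes a leading byte order mark
    text = raw.decode("utf-8-sig")
    # Normalise Windows line endings so each line renders once inside <pre>
    return text.replace("\r\n", "\n")


sample = b"\xef\xbb\xbf\r\nProject Gutenberg Australia\r\na treasure-trove of literature\r\n"
decoded = decode_text_upload(sample)
print(decoded)
```

In the view from the question, this would probably be used as content = decode_text_upload(f.read()), with book.html left unchanged.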
Related
I have a PDF file. It contains four columns, and the pages have no grid lines. The values are students' marks.
I would like to run some analysis on this distribution (histograms, line graphs, etc.).
I want to parse this PDF file into a spreadsheet or an HTML file (which I can then parse very easily).
The link to the PDF is:
Pdf
This is a public document and is openly available on this domain to anyone.
Note: I know that this can be done by exporting the file to text from Adobe Reader and then importing it into Libre Calc or Excel, but I want to do this using a Python script.
Kindly help me with this issue.
specs:
Windows 7
Python 2.7
Use PyPDF2:
from PyPDF2 import PdfFileReader

# Open the PDF in binary mode and split the first page's text into lines
with open('CT1-All.pdf', 'rb') as f:
    reader = PdfFileReader(f)
    contents = reader.getPage(0).extractText().split('\n')
When you print contents, it will look like this (I have trimmed it here):
[u'Serial NoRoll NoNameCT1 Marks (50)111MA20026KARADI KALYANI212AR10029MUKESH K
MAR5', u'312MI31004DEEPAK KUMAR7', u'413AE10008FADKE PRASAD DIPAK27', u'513AE10
22RAHUL DUHAN37', u'613AE30005HIMANSHU PRABHAT26.5', u'713AE30019VISHAL KUMAR39
, u'813AG10014HEMANT17', u'913AG10028SHRESTH KR KRISHNA37.51013AG30009HITESH ME
RA33.5', u'1113AG30023RACHIT MADHUKAR40.5', u'1213AR10002ACHARY SUDHEER11', u'1
13AR10004AMAN ASHISH20.5', u'1413AR10008ANKUR44', u'1513AR10010CHUKKA SHALEM RA
U11.5', u'1613AR10012DIKKALA VIJAYA RAGHAVA20.5', u'1713AR10014HRISHABH AMRODIA
1', u'1813AR10016JAPNEET SINGH CHAHAL19.5', u'1913AR10018K VIGNESH42.5', u'2013
R10020KAARTIKEY DWIVEDI49.5', u'2113AR10024LAKSHMISRI KEERTI MANNEY49', u'2213A
10026MAJJI DINESH9.5', u'2313AR10028MOUNIKA BHUKYA17.5', u'2413AR10030PARAS PRA
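
Each extracted string still has the four columns fused together. A rough sketch of splitting one such row back into columns, assuming every roll number follows the pattern seen in the sample output (two digits, two uppercase letters, five digits) and that names contain only uppercase letters and spaces. The helper parse_row is hypothetical. Strings where two rows were merged return None and would need separate handling:

```python
import re

# serial number, roll number, name, marks (integer or decimal)
ROW = re.compile(r"^(\d+?)(\d{2}[A-Z]{2}\d{5})([A-Z ]+?)(\d+(?:\.\d+)?)$")


def parse_row(row):
    """Split one fused row into (serial, roll, name, marks), or return None."""
    match = ROW.match(row.strip())
    if match is None:
        return None
    serial, roll, name, marks = match.groups()
    return int(serial), roll, name.strip(), float(marks)


print(parse_row("413AE10008FADKE PRASAD DIPAK27"))
print(parse_row("1113AG30023RACHIT MADHUKAR40.5"))
```

The resulting tuples can then be written out with the csv module and opened in a spreadsheet.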
I have a question about emails generated in my Django application. I created an HTML file, but I have a question about my txt file and the tags used in it.
In my HTML file:
links look like: My link
bold looks like: <b>My text</b>
variables look like: {{ my_variable }}
But do HTML tags work in my txt file? How can I display links and bold text in the .txt file that will be sent by email?
Thank you very much,
I have a piece of code that handles file uploads for me, and ideally I want to accept only text files (CSV, tab-delimited files, etc.). So I added this chunk of code:
mimetype = magic.from_buffer(request.FILES['docfile'].read(512), mime=True)
if form.is_valid() and mimetype == 'text/plain':
    ....
Just recently one of my users tried uploading a text file and the system rejected it. The MIME type reported for that file is:
file --mime-type -b input_file.txt
application/octet-stream
And of course, all of the previously uploaded files have been text/plain. What's the difference between these two? Is there a more "global" way to check if a file is a text file?
I found this answer which is probably relevant:
Yet another method based on file(1) behavior:
textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))
Example:
>>> is_binary_string(open('/usr/bin/python', 'rb').read(1024))
True
>>> is_binary_string(open('/usr/bin/dh_python3', 'rb').read(1024))
False
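As a self-contained sketch of that heuristic in Python 3 (the names TEXT_CHARS and looks_binary are hypothetical; this only approximates what file(1) does and is not a MIME check):

```python
# Bytes treated as "text": a few common control characters plus
# everything from 0x20 upward except DEL (0x7f).
TEXT_CHARS = bytes({7, 8, 9, 10, 12, 13, 27} | (set(range(0x20, 0x100)) - {0x7F}))


def looks_binary(chunk):
    """Return True if any byte in chunk falls outside TEXT_CHARS."""
    # translate(None, delete) removes every byte listed in delete;
    # anything left over is a byte we do not consider text.
    return bool(chunk.translate(None, TEXT_CHARS))


print(looks_binary(b"name\tvalue\r\nfoo\t1\r\n"))
print(looks_binary(b"\x00\x01\x02binary"))
```

In the upload handler from the question, the first 512 bytes read from request.FILES['docfile'] could be passed to looks_binary instead of comparing the detected MIME type with 'text/plain'.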
I have a form where the user can either fill in text to translate or attach a file. If the text to translate has been filled in, I want to create a txt file from it so that it looks as if the user uploaded a txt file.
if job_creation_form.is_valid():
    cleaned_data_job_creation_form = job_creation_form.cleaned_data
    try:
        with transaction.atomic():
            text = cleaned_data_job_creation_form.get('text_to_translate')
            if text:
                cleaned_data_job_creation_form['file']=create_txt_file(text)
            Job.objects.create(
                customer=request.user,
                text_to_translate=cleaned_data_job_creation_form['text_to_translate'],
                file=cleaned_data_job_creation_form['file']....
            )
    except Exception as e:
        RaiseHttp404(request, 'Something went wrong :(')
    return HttpResponseRedirect(reverse('review_orders'))
I thought about creating a txt file like:
with open('name.txt','a') as f:
    ...
But there can be many problems - the directory where the file is saved, the name of the file which uploading handles automatically etc.
Do you know a better way?
In short:
If the text to translate has been filled in, fake it so it looks like a txt file has been uploaded.
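Maybe use a tempfile?
import tempfile
from django.core.files import File

tmp = tempfile.TemporaryFile()
tmp.write(b"Hello World!\n")  # TemporaryFile opens in binary mode by default
tmp.seek(0)  # rewind so the content is read from the start
Job.objects.create(file=File(tmp, name='name.txt'),...)  # explicit name, since the temp file has no usable one
Hope this helps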
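If touching the disk is not needed at all, an in-memory buffer may be enough. A minimal stdlib-only sketch, assuming UTF-8 text (the helper text_to_file_object and the default file name are hypothetical):

```python
import io


def text_to_file_object(text, name="text_to_translate.txt"):
    """Wrap text in an in-memory binary file object that carries a name."""
    buf = io.BytesIO(text.encode("utf-8"))
    # Many consumers of file objects look for a .name attribute
    buf.name = name
    return buf


fake_upload = text_to_file_object("Hello World!\n")
print(fake_upload.name, fake_upload.getvalue())
```

In a Django project, such a buffer could be wrapped in django.core.files.File, or the text could be passed to django.core.files.base.ContentFile, before being assigned to the file field.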
I am generating a bunch of HTML emails in Django, and I want to save them into a model, in a FileField. I can quite easily generate the HTML content and dump it into a File, but I want to create something that can be opened in email clients, e.g. an eml file. Does anyone know of a Python or Django module to do this? Just to be clear, I'm not looking for an alternative email backend, as I also want the emails to be sent when they're generated.
Edit: After a bit of reading, it looks to me like EmailMessage.message() should return the content that should be stored in the eml file. However, if I try to save it like this, the generated file is empty:
import tempfile
name = tempfile.mkstemp()[1]
fh = open(name, 'wb')
fh.write(bytes(msg.message()))
fh.close()
output = File(open(name, 'rb'), msg.subject[:50])
I want to use a BytesIO instead of a temp file, but the temp file is easier for testing.
An EML file is actually a plain text file: header name/value pairs, a blank line, then the message body. A valid EML file looks like this:
From: test@example.com
To: test@example.com
Subject: Test

Hello world!
If you follow the above pattern and save it in a file with the .eml extension, email clients such as Thunderbird will parse and display it without any problem.
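To illustrate that layout without Django, here is a small sketch using the EmailMessage class from Python's standard email package (not Django's class of the same name). The helper build_eml is hypothetical and the addresses are placeholders:

```python
from email import message_from_bytes
from email.message import EmailMessage


def build_eml(sender, recipient, subject, body):
    """Return the raw bytes of a simple plain-text message in .eml form."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg.as_bytes()


eml_bytes = build_eml("test@example.com", "test@example.com", "Test", "Hello world!")
# Parse the bytes back to confirm the headers and body survive the round trip
parsed = message_from_bytes(eml_bytes)
print(parsed["Subject"], parsed.get_payload().strip())
```

Since the result is plain bytes, it can be written to a file opened in "wb" mode or wrapped in io.BytesIO when a temp file is not wanted.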
Django's EmailMessage.message().as_bytes() will return the content of the .eml file. Then you just need to save the file to the directory of your choice:
from django.core.mail import EmailMessage

msg = EmailMessage(
    'Hello',
    'Body goes here',
    'from@example.com',
    ['to3@example.com'],
)
eml_content = msg.message().as_bytes()
file_name = "/path/to/eml_output.eml"
with open(file_name, "wb") as outfile:
    outfile.write(eml_content)
I had a similar problem. I found a ticket on the Django site, and the last comment there suggests using django-eml-email-backend. It worked for me, and it is simple and very useful.
Example:
installing:
$ pip install django-eml-email-backend
using:
EMAIL_BACKEND = 'eml_email_backend.EmailBackend'
EMAIL_FILE_PATH = 'path/to/output/folder/'