Read and upload to GitHub non UTF-8 file Python

Read and upload to GitHub non UTF-8 file Python - python

I have code thats upload SQlite3 file to GitHub(module PyGithub).
import github
with open('server.db', 'r') as file:
content = file.read()
g = github.Github('token')
repo = g.get_user().get_repo("my-repo")
file = repo.get_contents("server.db")
repo.update_file("server.db", "Python Upload", content, file.sha, branch="main")
If you open this file through a text editor, then there will be characters that are not included in UTF-8, since this is a database file. I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd9 in position 99: invalid continuation byte
How i can fix it?
Maybe I can upload the file to GitHub so it is not text-based, like a PNG?

i use this.
f = open("yourtextfile", encoding="utf8")
contents = get_blob_content(repo, branch="main",path_name="yourfile")
repo.delete_file("yourfile", "test title", contents.sha)
repo.create_file("yourfile", "test title", f.read())
and this def
def get_blob_content(repo, branch, path_name):
# first get the branch reference
ref = repo.get_git_ref(f'heads/{branch}')
# then get the tree
tree = repo.get_git_tree(ref.object.sha, recursive='/' in path_name).tree
# look for path in tree
sha = [x.sha for x in tree if x.path == path_name]
if not sha:
# well, not found..
return None
# we have sha
return repo.get_git_blob(sha[0])

Related

Transposing files from columns to rows for multiple files

I have approximately 200 files (plus more in the future) that I need to transpose data from columns into rows. I'm a microbiologist, so coding isn't my forte (have worked with Linux and R in the past). One of my computer science friends was trying to help me write code in Python, but I have never used it before today.
The files are in .lvm format, and I'm working on a Mac. Items with 2 stars on either side are paths that I've hidden to protect my privacy.
The for loop is where I've been getting the error, but I'm not sure if that's where my problem lies or if it's something else.
This is the Python code I've been working on:
import os
lvm_directory = "/Users/**path**"
output_file = "/Users/**path**/Transposed.lvm"
newFile = True
output_delim = "\t"
for filename in os.listdir(lvm_directory):
header = []
data = []
f = open(lvm_directory + "/" + filename)
for l in f:
sl = l.split()
if (newFile):
header += [sl[1]]
f. close()
This is the error message I've been getting and I can't figure out how to work through it:
File "<pyshell#97>", line 5, in <module>
for l in f:
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 345: invalid continuation byte
The rest of the code after this error is as follows, but I haven't worked through it yet due to the above error:
f = open(output_file, 'w')
f.write(output_delim.join(header))
newFile = False
else:
f = open(output_file, 'a')
f.write("\n"+output_delim.join(data))
f.close()

Looks like your files have a different encoding than the default utf-8 format. Probably ASCII. You'd use something like:
with open(lvm_directory + "/" + filename, encoding="ascii") as f:
for l in f:
# rest of your code here
^ It's generally more "pythonic" to use a with statement to handle resource management (i.e. opening and closing a file), hence the with approach demonstrated above. If your files aren't ASCII, see if any other encoding work. There are command-line tools like chardet that can help you identify the file's encoding.

Heroku Changes Encoding Automatically

I have created a Python/Selenium app which runs perfectly on local. The app reads information from a file called 'config.txt' . When I deploy the files to Heroku and run my app, I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Function I used for opening the text file:
def config_parse(path):
with open(path, "r+") as file:
lines = file.readlines()
if lines:
for line in lines:
splitted = line.replace('"', "").replace("\n", "").split(",")
data = {'name': splitted[0], 'city': splitted[1], 'date': splitted[2],
"time": splitted[3], 'priority': splitted[4], 'guests': splitted[5]}
try:
data['frequency'] = splitted[6]
data['ping_date'] = splitted[7]
data['ping_time'] = splitted[8]
except:
pass
entries.append(data)
#print("Parsed Data: %s" % entries)
file.truncate(0)
The error simply comes from readlines() method.
I checked the file charset through the Heroku Console but it seems to be utf-16-le whereas it is utf-8 on my local and it is already an empty file. I tried to change locale and add config vars to Heroku but nothing resolved it. Why does it make all the files either us-ascii or utf-16-le?
When I edit the file on the Heroku console and open it through the python console, it opens successfully but when I run the script, it fails to open it.

How to return file contents from controller?

I'm trying to return the contents of an image file via a Python Connexion application generated from an OpenAPI v2 spec file using swagger-codegen and the python-flask language setting. In my controller module, I simply do the following:
def file_contents_get(file_id):
file = app.datastore.get_instance().get_file(file_id)
with open(file.path, "rb") as f:
return f.read()
However, this results in the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
What is the proper way to return a file's contents? Note that I don't want the file as an attachment but rather inline.

How to save an encoded pdf in a zip file using django?

I've read some posts about this problem, but most of them didn't help my case, I'm trying to save an encoded pdf in a zip file (I'm using Docraptor API for the pdf generation, which return the encoded pdf).
def toZip(request, ...):
...
response = docraptor_api_call() #api call to generate pdf (encoded pdf)
with open('creation.pdf', 'wb') as f:
f.write(response)
#decode pdf
with open(f.name, 'rb') as pdf:
# this will download the pdf to the user
# doc = HttpResponse(pdf.read(), content_type='application/pdf')
# doc['Content-Disposition'] = "attachment; filename=filename.pdf"
# return doc
zip_io = io.BytesIO()
# create zipFile
zf = zipfile.ZipFile(zip_io, mode='w')
# write PDF in ZIP ?
save_zf = zf.write(pdf.read())
# save zip to FileField
zip = ZipStore.objects.create(zip=save_zf)
While trying the code on top I get this error :
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 43: character maps to
I'm don't really get what am I doing wrong and how I should fix it, any suggestion ?

You've got an error in the way you're calling zf.write. You should be using:
# ZipFile.write would take the the file to write, not bytes to be written.
# f.name is the name of the file in the zip archive. So if I passed
# in "foo.txt", "1", I'd get a file named `foo.txt` after decompressing, and its
# contents would be 1
zf.writestr(f.name, pdf.read())
This method does not appear to return something, so you'll need to change this: zip = ZipStore.objects.create(zip=save_zf) probably to:
zip = ZipStore.objects.create(zip=zip_io)

UnicodeEncodeError when reading pdf with pyPdf

Guys i had posted a question earlier pypdf python tool .dont mark this as duplicate as i get this error indicated below
import sys
import pyPdf
def convertPdf2String(path):
content = ""
# load PDF file
pdf = pyPdf.PdfFileReader(file(path, "rb"))
# iterate pages
for i in range(0, pdf.getNumPages()):
# extract the text from each page
content += pdf.getPage(i).extractText() + " \n"
# collapse whitespaces
content = u" ".join(content.replace(u"\xa0", u" ").strip().split())
return content
# convert contents of a PDF file and store retult to TXT file
f = open('a.txt','w+')
f.write(convertPdf2String(sys.argv[1]))
f.close()
# or print contents to the standard out stream
print convertPdf2String("/home/tom/Desktop/Hindi_Book.pdf").encode("ascii", "xmlcharrefreplace")
I get this error for a the 1st pdf file
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
and the following error for this pdf http://www.envis-icpe.com/pointcounterpointbook/Hindi_Book.pdf
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 38: ordinal not in range(128)
How to resolve this

I tried it myself and got the same result. Ignore my comment, I hadn't seen that you're writing the output to a file as well. This is the problem:
f.write(convertPdf2String(sys.argv[1]))
As convertPdf2String returns a Unicode string, but file.write can only write bytes, the call to f.write tries to automatically convert the Unicode string using ASCII encoding. As the PDF obviously contains non-ASCII characters, that fails. So it should be something like
f.write(convertPdf2String(sys.argv[1]).encode("utf-8"))
# or
f.write(convertPdf2String(sys.argv[1]).encode("ascii", "xmlcharrefreplace"))
EDIT:
The working source code, only one line changed.
# Execute with "Hindi_Book.pdf" in the same directory
import sys
import pyPdf
def convertPdf2String(path):
content = ""
# load PDF file
pdf = pyPdf.PdfFileReader(file(path, "rb"))
# iterate pages
for i in range(0, pdf.getNumPages()):
# extract the text from each page
content += pdf.getPage(i).extractText() + " \n"
# collapse whitespaces
content = u" ".join(content.replace(u"\xa0", u" ").strip().split())
return content
# convert contents of a PDF file and store retult to TXT file
f = open('a.txt','w+')
f.write(convertPdf2String(sys.argv[1]).encode("ascii", "xmlcharrefreplace"))
f.close()
# or print contents to the standard out stream
print convertPdf2String("Hindi_Book.pdf").encode("ascii", "xmlcharrefreplace")

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read and upload to GitHub non UTF-8 file Python - python

Related

Transposing files from columns to rows for multiple files

Heroku Changes Encoding Automatically

How to return file contents from controller?

How to save an encoded pdf in a zip file using django?

UnicodeEncodeError when reading pdf with pyPdf

Categories

Resources