Python script for FTP upload of various types of files from a local Raspberry Pi to a remote webserver:
The original runs on several Raspberry Pis under Python 2.x and Raspbian Buster (and earlier Raspbian versions) without any problems.
The txt file for this upload is generated by a Lua script like the one below:
file = io.open("/home/pi/PVOutput_Info.txt", "w+")
-- Opens a file named PVOutput_Info.txt (stored under the designated sub-folder of Domoticz)
file:write(" === PV-generatie & Consumptie === \n")
file:write(" Datum = " .. WSDatum .. "\n")
file:write(" Tijd = " .. WSTijd .. "\n")
file:close() -- closes the open file
os.execute("chmod a+rw /home/pi/PVTemp_Info.txt")
I am trying to upgrade this simplest version for use with Python 3.x and Raspbian Bullseye, but I am stuck on the reported error.
It looks as if the codec now has a problem with the byte 0xb0 in the txt file.
Any remedy or hint to circumvent this problem?
#!/usr/bin/python3
# (c)2017 script compiled by Toulon7559 from various material from forums, version 0.1 for upload of *.txt to /
# Original script running under Python2.x and Raspbian_Buster
# Version 0165P3 of 20230201 is an experimental adaptation towards Python3.x and Raspbian_Bullseye
# --------------------------------------------------
# Line006 = Function for FTP_UPLOAD to Server
# --------------------------------------------------
# Imports for script-operation
import ftplib
import os
# Definition of Upload_function
def upload(ftp, file):
    ext = os.path.splitext(file)[1]
    if ext in (".txt", ".htm", ".html"):
        ftp.storlines("STOR " + file, open(file))
    else:
        ftp.storbinary("STOR " + file, open(file, "rb"), 1024)
# --------------------------------------------------
# Line020 = Actual FTP-Login & -Upload
# --------------------------------------------------
ftp = ftplib.FTP("<FTP_server>")
ftp.login("<Login_UN>", "<login_PW>")
# set path to destination directory
ftp.cwd('/')
# set path to source directory
os.chdir("/home/pi/")
# upload of TXT-files
upload(ftp, "PVTemp_Info.txt")
upload(ftp, "PVOutput_Info.txt")
# reset path to root
ftp.cwd('/')
print('End of script Misc_Upload_0165P3')
print()
PuTTY CLI command
sudo python3 /home/pi/domoticz/scripts/python/Misc_upload_0165P3a.py
Resulting report at PuTTY's CLI
Start of script Misc_Upload_0165P3
Traceback (most recent call last):
File "/home/pi/domoticz/scripts/python/Misc_upload_0165P3a.py", line 39, in <module>
upload(ftp, "PVTemp_Info.txt")
File "/home/pi/domoticz/scripts/python/Misc_upload_0165P3a.py", line 25, in upload
ftp.storlines("STOR " + file, open(file))
File "/usr/lib/python3.9/ftplib.py", line 519, in storlines
buf = fp.readline(self.maxline + 1)
File "/usr/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 175: invalid start byte
I'm afraid there's no easy mapping to Python 3. Two simple, but not 1:1, solutions for Python 3 would be:
Consider uploading all files using binary mode, i.e. get rid of the
if ext in (".txt", ".htm", ".html"):
ftp.storlines("STOR " + file, open(file))
else:
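A minimal sketch of what the simplified upload() could then look like, assuming every file may be sent in binary (image) mode:
def upload(ftp, file):
    # binary transfer for every file type: nothing is decoded, so the 0xb0 byte is sent as-is
    with open(file, "rb") as f:
        ftp.storbinary("STOR " + file, f, 1024)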
Or open the text file using the actual encoding that the files use (you will have to find out which):
open(file, encoding='cp1252')
See Error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
If you really need the exact functionality that you had in Python 2 (that is: upload any text file, in whatever encoding, using FTP text transfer mode), it is more complicated. The Python 2 code basically just translates any CR/LF EOL sequences in the file to CRLF (as the FTP specification requires), keeping the rest of the file intact.
You can copy the FTP.storbinary code and implement the above translation of buf byte-wise (without the decoding/encoding that the Python 3 FTP.storlines/readline does).
If the files are not huge, a simple implementation is to load the whole file into memory, convert it there and upload it. This is not difficult if you know that all your files use the same EOL sequence; if not, the translation gets more involved.
Or you may even give up on the translation, as most FTP servers do not care (they can handle any common EOL sequence). Just use the FTP.storbinary code as it is, only change TYPE I to TYPE A (which you need to do even if you implement the translation as per the previous point).
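A rough sketch of that last option: a hypothetical stor_text_raw() helper (not a stock ftplib call), modelled on FTP.storbinary but issuing TYPE A and sending raw bytes, leaving any EOL translation to the server:
def stor_text_raw(ftp, cmd, fp, blocksize=8192):
    ftp.voidcmd('TYPE A')               # ASCII transfer mode on the wire
    with ftp.transfercmd(cmd) as conn:  # open the data connection
        while True:
            buf = fp.read(blocksize)    # raw bytes, no decoding/encoding
            if not buf:
                break
            conn.sendall(buf)
    return ftp.voidresp()
It would be called with a file opened in binary mode, e.g. with open(file, "rb") as f: stor_text_raw(ftp, "STOR " + file, f).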
By the way, you also need to close the file in any case, so the correct code would look like:
with open(file) as f:
    ftp.storlines("STOR " + file, f)
Likewise for storbinary.
Related
I have approximately 200 files (plus more in the future) for which I need to transpose data from columns into rows. I'm a microbiologist, so coding isn't my forte (I have worked with Linux and R in the past). One of my computer science friends was trying to help me write code in Python, but I had never used it before today.
The files are in .lvm format, and I'm working on a Mac. Items with 2 stars on either side are paths that I've hidden to protect my privacy.
The for loop is where I've been getting the error, but I'm not sure if that's where my problem lies or if it's something else.
This is the Python code I've been working on:
import os
lvm_directory = "/Users/**path**"
output_file = "/Users/**path**/Transposed.lvm"
newFile = True
output_delim = "\t"
for filename in os.listdir(lvm_directory):
    header = []
    data = []
    f = open(lvm_directory + "/" + filename)
    for l in f:
        sl = l.split()
        if (newFile):
            header += [sl[1]]
            f.close()
This is the error message I've been getting and I can't figure out how to work through it:
File "<pyshell#97>", line 5, in <module>
for l in f:
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 345: invalid continuation byte
The rest of the code after this error is as follows, but I haven't worked through it yet due to the above error:
            f = open(output_file, 'w')
            f.write(output_delim.join(header))
            newFile = False
        else:
            f = open(output_file, 'a')
            f.write("\n" + output_delim.join(data))
        f.close()
Looks like your files have a different encoding than the default utf-8 format. Probably ASCII. You'd use something like:
with open(lvm_directory + "/" + filename, encoding="ascii") as f:
    for l in f:
        # rest of your code here
^ It's generally more "pythonic" to use a with statement to handle resource management (i.e. opening and closing a file), hence the with approach demonstrated above. If your files aren't ASCII, see if any other encodings work. There are command-line tools like chardet that can help you identify the file's encoding.
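If guessing encodings by hand gets tedious, here is a small sketch using the chardet Python package (assuming it is installed, e.g. via pip install chardet, and reusing lvm_directory and filename from the question):
import chardet

# read the raw bytes and let chardet guess the encoding
with open(lvm_directory + "/" + filename, "rb") as f:
    guess = chardet.detect(f.read())
print(guess)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}

# then reopen the file as text with the guessed encoding
with open(lvm_directory + "/" + filename, encoding=guess["encoding"]) as f:
    for l in f:
        pass  # rest of your code here
The detected name is only a guess, so treat it as a starting point rather than a guarantee.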
I'm pretty new to the world of Python. I decided to do a project but came to a stop after my script wouldn't execute the right way: the script that needs to be executed on its own through another script keeps giving me nothing, or a syntax error, instead of all the stuff that is supposed to be happening (converting files). The other script in question writes new lines into that script to change the file name (to be converted) to the newest file. It looks something like this:
import glob
import os.path
folder_path = r'C:\User\Desktop\Folder\Audio'
file_type = r'\*mp4'
files = glob.glob(folder_path + file_type)
max_file = max(files, key=os.path.getctime)
mp3_file = max_file.replace('.mp4', '')
with open ("file.py", 'w') as f:
f.write("")
with open ("file.py", 'w') as f:
f.write('from moviepy.editor import *\n' "mp4_file = '{}'\n"
"mp3_file = '{}.mp3'\n" 'videoclip = VideoFileClip(mp4_file)\n' 'audioclip = videoclip.audio\n'
'audioclip.write_audiofile(mp3_file)\n' 'audioclip.close()\n' 'videoclip.close()\n'.format(max_file, mp3_file))
exec(open("file.py").read())
Right now it gives this error:
Traceback (most recent call last):
File "C:\Users\Desktop\Folder\Audio\File Manager.py", line 19, in <module>
exec(open("file.py").read())
File "<string>", line 2
mp4_file = 'C:\User\Desktop\Folder\Audio\test.mp4'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
I don't plan on using that exact line of code to execute my Python file, since there are many alternatives, but if I was on the right trail then I might as well. The other file, which is supposed to be executed, has generic file-converting code:
from moviepy.editor import *
mp4_file = 'C:\User\Desktop\Folder\Audio\test.mp4'
mp3_file = 'C:\User\Desktop\Folder\Audio\test.mp3'
videoclip = VideoFileClip(mp4_file)
audioclip = videoclip.audio
audioclip.write_audiofile(mp3_file)
audioclip.close()
videoclip.close()
Other solutions mostly gave me a blank, inactive shell. If the answer to this problem is that it's impossible, then so be it and I'll take that as a valid answer, but please explain why.
Corrections
You are using different quote styles while writing to the file, switching between single quotes ' and double quotes "; update it to be more consistent.
The error suggests that while writing to the file you are also writing some Unicode characters which it cannot read back, hence the Unicode error (look at what the caret ^ is pointing at: it's a blank space, since it's not a printable character).
Suggestions
Don't just write to a file and then immediately read from it. Different operating systems have different behaviour for such repeated access, which will give you strange issues (this is not your issue, though).
Just create a function extractMp3FromVideoFile which takes two arguments, max_file and mp3_file (see the sketch after the solution below).
Instead of writing to a file and increasing the HDD IO, simply put the file's code into a variable and then exec it.
Solution
import glob
import os.path
folder_path = r'C:\User\Desktop\Folder\Audio'
file_type = r'\*mp4'
files = glob.glob(folder_path + file_type)
max_file = max(files, key=os.path.getctime)
mp3_file = max_file.replace('.mp4', '')
code = "from moviepy.editor import *\nmp4_file = '{}'\nmp3_file = '{}.mp3'\nvideoclip = VideoFileClip(mp4_file)\naudioclip = videoclip.audio\naudioclip.write_audiofile(mp3_file)\naudioclip.close()\nvideoclip.close()\n".format(max_file, mp3_file)
exec(code)
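As an alternative to building source text and exec'ing it, here is a minimal sketch of the extractMp3FromVideoFile function suggested above (assuming moviepy is installed; paths kept as in the question). It also sidesteps the backslash-escape problem entirely, because no path ever passes through a generated string literal:
import glob
import os.path
from moviepy.editor import VideoFileClip

def extractMp3FromVideoFile(mp4_path, mp3_path):
    # load the video, pull out the audio track and write it as an mp3
    videoclip = VideoFileClip(mp4_path)
    audioclip = videoclip.audio
    audioclip.write_audiofile(mp3_path)
    audioclip.close()
    videoclip.close()

folder_path = r'C:\User\Desktop\Folder\Audio'
files = glob.glob(folder_path + r'\*.mp4')
max_file = max(files, key=os.path.getctime)
extractMp3FromVideoFile(max_file, max_file.replace('.mp4', '.mp3'))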
So first of all, I saw similar questions, but nothing worked or was applicable to my problem.
I'm writing a program that takes in a text file with a lot of search queries to be searched on YouTube. The program iterates through the text file line by line, but these lines contain special UTF-8 characters that cannot be decoded, so at a certain point the program stops with a
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1826: character maps to <undefined>
As I cannot check every line of my entries, I want it to catch the error, print the line it was working on and continue from that point.
As the error does not happen inside the body of my for loop but in the for loop statement itself, I don't know how to wrap it in a try...except statement.
This is the code:
import urllib.request
import re
from unidecode import unidecode
with open('out.txt', 'r') as infh,\
     open("links.txt", "w") as outfh:
    for line in infh:
        try:
            clean = unidecode(line)
            search_keyword = clean
            html = urllib.request.urlopen("https://www.youtube.com/results?search_query=" + search_keyword)
            video_ids = re.findall(r"watch\?v=(\S{11})", html.read().decode())
            outfh.write("https://www.youtube.com/watch?v=" + video_ids[0] + "\n")
            #print("https://www.youtube.com/watch?v=" + video_ids[0])
        except:
            print("Error encounted with Line: " + line)
This is the full error message, to show that the for loop itself is causing the problem:
Traceback (most recent call last):
File "ytbysearchtolinks.py", line 6, in
for line in infh:
File "C:\Users\nfeyd\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1826: character maps to <undefined>
If you need an example of input I'm working with: https://pastebin.com/LEkwdU06
The try-except block looks correct and should allow you to catch all occurring exceptions.
The usage of unidecode probably won't help you, because non-ASCII characters must be encoded in a specific way in URLs; see, e.g., here.
One solution is to use urllib's quote() function. As per documentation:
Replace special characters in string using the %xx escape.
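For illustration (a hypothetical query, not taken from the question's data), quote() percent-encodes anything that is not URL-safe:
from urllib.parse import quote

print(quote("Beyoncé Halo\n"))  # -> Beyonc%C3%A9%20Halo%0A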
This is what works for me with the input you've provided:
import urllib.request
from urllib.parse import quote
import re
with open('out.txt', 'r', encoding='utf-8') as infh,\
     open("links.txt", "w") as outfh:
    for line in infh:
        search_keyword = quote(line)
        html = urllib.request.urlopen("https://www.youtube.com/results?search_query=" + search_keyword)
        video_ids = re.findall(r"watch\?v=(\S{11})", html.read().decode())
        outfh.write("https://www.youtube.com/watch?v=" + video_ids[0] + "\n")
        print("https://www.youtube.com/watch?v=" + video_ids[0])
EDIT:
After thinking about it, I believe you are running into the following problem:
You are running the code on Windows, and apparently Python will try to open the file with cp1252 encoding there, while the file that you shared is UTF-8 encoded:
$ file out.txt
out.txt: UTF-8 Unicode text, with CRLF line terminators
This would explain the exception you are getting and why it's not being caught by your try-except block (it occurs while reading lines in the for statement itself, outside the try).
Make sure that you are using encoding='utf-8' when opening the file.
I ran your code, but I didn't have any problems. Did you create a virtual environment with virtualenv and install all the packages you use?
Okay, I've been stuck on this one for hours which should have only taken a few minutes of work.
I have the following code which pulls a gzipped CSV file from a datastore:
from ftplib import FTP_TLS
import gzip
import csv
ftps = FTP_TLS('waws-prod.net')
ftps.login(user='foo', passwd='bar')
resp = ftps.retrbinary('RETR data/WFSIV0606201701.700.csv.gz', gzip.open('WFSIV0606201701.700.csv.gz', 'wb').write)
The file appears in the pwd, and I can even open it with my Mac's decompression tool, and the original CSV is decompressed perfectly.
However, if I try to decompress this file using the gzip library, I can't get a UTF-8 encoded string to parse:
f=gzip.GzipFile('WFSIV0606201701.700.csv.gz', 'rb')
s = f.read()
I get what appears to be a UTF-8 byte string, however the UTF-8 decoder can't parse it:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
BUT! If I download the file directly from the SFTP server using FileZilla and run the gzip.GzipFile code above, it reads perfectly. Something must be wrong with my downloader/reader, but I haven't a clue as to what.
resp = ftps.retrbinary('RETR data/WFSIV0606201701.700.csv.gz', gzip.open('WFSIV0606201701.700.csv.gz', 'wb').write)
This line downloads an already-compressed file, and then compresses it again when writing it to disk. That is also why the decoder chokes on 0x8b at position 1: after one round of decompression you are still looking at gzip data, and 0x1f 0x8b is the gzip magic number.
Replace gzip.open(...).write with open(...).write to write the compressed data to disk directly.
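A sketch of the corrected flow (same placeholder host, credentials and path as in the question): store the bytes exactly as received, then let gzip do the single round of decompression when reading back.
from ftplib import FTP_TLS
import gzip

ftps = FTP_TLS('waws-prod.net')
ftps.login(user='foo', passwd='bar')

# write the already-compressed bytes straight to disk
with open('WFSIV0606201701.700.csv.gz', 'wb') as out:
    resp = ftps.retrbinary('RETR data/WFSIV0606201701.700.csv.gz', out.write)

# decompress once and decode the CSV text
f = gzip.GzipFile('WFSIV0606201701.700.csv.gz', 'rb')
s = f.read().decode('utf-8')
f.close()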
I'm using Python 2.7.3 on Ubuntu 12 x64.
I have about 200,000 files in a folder on my filesystem. The file names of some of the files contain html encoded and escaped characters because the files were originally downloaded from a website. Here are examples:
Jamaica%2008%20114.jpg
thai_trip_%E8%B0%83%E6%95%B4%E5%A4%A7%E5%B0%8F%20RAY_5313.jpg
I wrote a simple Python script that goes through the folder and renames all of the files with encoded characters in the filename. The new filename is achieved by simply decoding the string that makes up the filename.
The script works for most of the files, but for some of them Python chokes and spits out the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128)
Traceback (most recent call last):
File "./download.py", line 53, in downloadGalleries
numDownloaded = downloadGallery(opener, galleryLink)
File "./download.py", line 75, in downloadGallery
filePathPrefix = getFilePath(content)
File "./download.py", line 90, in getFilePath
return cleanupString(match.group(1).strip()) + '/' + cleanupString(match.group(2).strip())
File "/home/abc/XYZ/common.py", line 22, in cleanupString
return HTMLParser.HTMLParser().unescape(string)
File "/usr/lib/python2.7/HTMLParser.py", line 472, in unescape
return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
File "/usr/lib/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
Here is my cleanupString function:
def cleanupString(string):
    string = urllib2.unquote(string)
    return HTMLParser.HTMLParser().unescape(string)
And here's the snippet of code that calls the cleanupString function (this code is not the same code in the traceback above but it produces the same error):
rootFolder = sys.argv[1]
pattern = r'.*\.jpg\s*$|.*\.jpeg\s*$'
reobj = re.compile(pattern, re.IGNORECASE)
imgs = []
for root, dirs, files in os.walk(rootFolder):
    for filename in files:
        foundFile = os.path.join(root, filename)
        if reobj.match(foundFile):
            imgs.append(foundFile)
for img in imgs:
    print 'Checking file: ' + img
    newImg = cleanupString(img)  # Code blows up here for some files
Can anyone provide me with a way to get around this error? I've already tried adding
# -*- coding: utf-8 -*-
to the top of the script but that has no effect.
Thanks.
Your filenames are byte strings that contain UTF-8 bytes representing Unicode characters. The HTML parser normally works with Unicode data instead of byte strings, particularly when it encounters an ampersand escape, so Python automatically tries to decode the value for you, but it uses ASCII for that decoding by default. This fails for UTF-8 data, as it contains bytes that fall outside the ASCII range.
You need to explicitly decode your string to a unicode object:
def cleanupString(string):
    string = urllib2.unquote(string).decode('utf8')
    return HTMLParser.HTMLParser().unescape(string)
Your next problem will be that you now have unicode filenames, but your filesystem will need some kind of encoding to work with these filenames. You can check what that encoding is with sys.getfilesystemencoding(); use this to re-encode your filenames:
def cleanupString(string):
    string = urllib2.unquote(string).decode('utf8')
    return HTMLParser.HTMLParser().unescape(string).encode(sys.getfilesystemencoding())
You can read up on how Python deals with Unicode in the Unicode HOWTO.
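If the goal is to rename the files on disk, a hypothetical usage sketch (Python 2, reusing the imgs list from the question's os.walk loop and the fixed cleanupString above):
import os

for img in imgs:
    newImg = cleanupString(img)
    if newImg != img:
        print 'Renaming ' + img + ' -> ' + newImg
        os.rename(img, newImg)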
Looks like you're bumping into this issue. I would try reversing the order you call unescape and unquote, since unquote would be adding non-ASCII characters into your filenames, although that may not fix the problem.
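A sketch of that reordered helper (it may still need the explicit decode from the other answer):
def cleanupString(string):
    # HTML-unescape first, percent-decode second
    return urllib2.unquote(HTMLParser.HTMLParser().unescape(string))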
What is the actual filename it is choking on?