Making a unicode file into a different file


Currently, I have a Python script that opens a file and appends tweets from Twitter to it. Part of my stream listener looks like this:
class listener(StreamListener):
    def __init__(self, api=None, path=None, outname='output', MAX_NUMBER_OF_TWEETS=100, TWEETS_PER_FILE=10, progress_bar=None):
        self.api = api
        self.path = path
        self.count = 0
        self.outname = outname
        self.progress_bar = progress_bar
        self.MAX_NUMBER_OF_TWEETS = MAX_NUMBER_OF_TWEETS
        self.TWEETS_PER_FILE = TWEETS_PER_FILE
    def on_data(self, data):
        all_data = json.loads(data)
        with open(filename, "a") as fid:
            print>>fid, all_data
However, the file that is written out is a Unicode file.
How would I get, say, a plain text file or a JSON file?

A Unicode file is a text file. What you are probably seeing is the output in the wrong encoding (say, the file is in UTF-8 and your console uses a different encoding).
Python lets you select the encoding of the output file if you use the io module: when you open a file for writing, you can choose the encoding you want to use. If you want to print to the screen, you should use the encoding the screen expects. (On Linux, use set, or the locale command, to check the locale. No idea how to do that in Windows.)
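A minimal sketch of the io approach, assuming Python 3 (where io.open is the same as the built-in open). It also assumes part of the confusion may come from print writing the parsed dict's Python repr (in Python 2 that shows u'...' prefixes) rather than JSON, so each tweet is serialised with json.dumps to give one JSON document per line. The function name append_tweet is hypothetical.

```python
import io
import json


def append_tweet(path, raw_data):
    """Append one tweet as a line of JSON to a UTF-8 encoded text file.

    Hypothetical helper: raw_data is assumed to be the JSON string that
    the stream listener's on_data receives.
    """
    tweet = json.loads(raw_data)
    # io.open lets you pick the file's encoding explicitly.
    with io.open(path, "a", encoding="utf-8") as fid:
        # json.dumps writes real JSON instead of Python's repr of the dict;
        # ensure_ascii=False keeps characters like "é" readable instead of \u00e9.
        fid.write(json.dumps(tweet, ensure_ascii=False) + "\n")
```

Inside on_data this would be called as append_tweet(filename, data); reading the file back line by line with json.loads recovers each tweet.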

Related

Pytsk - Sending files to a server from a disk image

I am trying to send each file from a disk image to a remote server using paramiko.
class Server:
    def __init__(self):
        self.ssh = paramiko.SSHClient()
        self.ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.ssh.connect('xxx', username='xxx', password='xxx')
    def send_file(self, i_node, name):
        sftp = self.ssh.open_sftp()
        serverpath = '/home/paul/Testing/'
        try:
            sftp.chdir(serverpath)
        except IOError:
            sftp.mkdir(serverpath)
            sftp.chdir(serverpath)
        serverpath = '/home/Testing/' + name
        sftp.putfo(fs.open_meta(inode = i_node), serverpath)
However, when I run this, I get an error saying that "pytsk.File has no attribute read".
Is there any other way of sending this file to the server?

After a quick investigation, I think I found what your problem is. Paramiko's sftp.putfo expects a Python file object as the first parameter. The file object of Pytsk3 is a completely different thing. Your sftp object tries to call "read" on it, but the Pytsk3 file object does not have a "read" method, hence the error.
You could in theory try extending the Pytsk3.File class and adding this method, but I would not hold my breath that it actually works.
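To illustrate the "add a read method" idea, here is a minimal sketch of a small file-like wrapper rather than a subclass. It assumes only what this answer uses: the pytsk3 file object has read_random(offset, length), and its size is available (e.g. as info.meta.size). The class name TskFileReader is hypothetical, and whether paramiko's putfo needs anything beyond read() has not been verified here.

```python
class TskFileReader(object):
    """Hypothetical file-like adapter exposing read() on top of read_random().

    Assumes `tsk_file` has read_random(offset, length) and that `size`
    is the file's length in bytes (e.g. tsk_file.info.meta.size).
    """

    def __init__(self, tsk_file, size):
        self._file = tsk_file
        self._size = size
        self._offset = 0

    def read(self, size=-1):
        # Read up to `size` bytes from the current offset; -1 means "the rest".
        remaining = self._size - self._offset
        if size is None or size < 0 or size > remaining:
            size = remaining
        if size <= 0:
            return b""  # end of file: signals the caller to stop reading
        data = self._file.read_random(self._offset, size)
        self._offset += len(data)
        return data
```

If it works with your paramiko version, usage would be something like sftp.putfo(TskFileReader(file_obj, file_obj.info.meta.size), serverpath), which avoids the temporary file.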
I would just read the file into a temporary one and send that. Something like this (you would need to make the temp file name handling more clever and delete the file afterwards, but you will get the idea):
serverpath = '/home/Testing/' + name
tmp_path = "/tmp/xyzzy"
file_obj = fs.open_meta(inode = i_node)
# Add here tests to confirm this is actually a file, not a directory
tha = open(tmp_path, "wb")
tha.write(file_obj.read_random(0, file_obj.info.meta.size))
tha.close()
rha = open(tmp_path, "rb")
sftp.putfo(rha, serverpath)
rha.close()
# Delete temp file here
Hope this helps. This will read the whole file from the fs image into memory to be written to the temp file, so if the file is massive you would run out of memory.
To work around that, you should read the file in chunks, looping through it with read_random (the parameters are the start offset and the amount of data to read), allowing you to construct the temp file in chunks of, for example, a couple of megabytes.
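A minimal sketch of that chunked loop, assuming the same read_random(offset, length) call and info.meta.size attribute used in the example above; the function name copy_in_chunks and the 4 MB default chunk size are hypothetical choices.

```python
def copy_in_chunks(file_obj, out, chunk_size=4 * 1024 * 1024):
    """Copy a pytsk3-style file object to an open binary file in fixed-size chunks.

    Assumes file_obj.read_random(offset, length) and file_obj.info.meta.size.
    Returns the number of bytes written.
    """
    size = file_obj.info.meta.size
    offset = 0
    while offset < size:
        # Never ask for more than what is left in the file.
        data = file_obj.read_random(offset, min(chunk_size, size - offset))
        if not data:
            break  # stop instead of looping forever on an empty read
        out.write(data)
        offset += len(data)
    return offset
```

With it, the temp file would be built with: with open(tmp_path, "wb") as tha: copy_in_chunks(file_obj, tha). Only one chunk is held in memory at a time.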
This is just a simple example to illustrate your problem.
Hannu

Why are my pictures corrupted after downloading and writing them in python?

Preface
This is my first post on stackoverflow so I apologize if I mess up somewhere. I searched the internet and stackoverflow heavily for a solution to my issues but I couldn't find anything.
Situation
What I am working on is creating a digital photo frame with my Raspberry Pi that will also automatically download pictures from my wife's Facebook page. Luckily I found someone who was working on something similar:
https://github.com/samuelclay/Raspberry-Pi-Photo-Frame
One month ago this gentleman added the download_facebook.py script. This is what I needed! So a few days ago I started working on this script to get it working in my Windows environment first (before I put it on the Pi). Unfortunately there is no documentation specific to that script and I am lacking in Python experience.
Based on the from urllib import urlopen statement, I assume that this script was written for Python 2.x, because in Python 3.x urlopen lives in urllib.request (from urllib import request).
So I installed the Python 2.7.9 interpreter, and I've had fewer issues than when I was attempting to work with the Python 3.4.3 interpreter.
Problem
I've gotten the script to download pictures from the facebook account; however, the pictures are corrupted.
Here are pictures of the problem: http://imgur.com/a/3u7cG
Now, I originally was using Python 3.4.3 and had issues with my method urlrequest(url) (see code at bottom of post) and how it was working with the image data. I tried decoding with different encodings such as utf-8 and utf-16, but according to the content headers, it is utf-8 (I think).
Conclusion
I'm not quite sure if the problem is with downloading the image or with writing the image to the file.
If anyone can help me with this I'd be forever grateful! Also let me know what I can do to improve my posts in the future.
Thanks in advance.
Code
from urllib import urlopen
from json import loads
from sys import argv
import dateutil.parser as dateparser
import logging
# plugin your username and access_token (Token can be obtained and
# modified in the Explorer's Get Access Token button):
# https://graph.facebook.com/USER_NAME/photos?type=uploaded&fields=source&access_token=ACCESS_TOKEN_HERE
FACEBOOK_USER_ID = "**USER ID REMOVED"
FACEBOOK_ACCESS_TOKEN = "** TOKEN REMOVED - GET YOUR OWN **"
def get_logger(label='lvm_cli', level='INFO'):
    """
    Return a generic logger.
    """
    format = '%(asctime)s - %(levelname)s - %(message)s'
    logging.basicConfig(format=format)
    logger = logging.getLogger(label)
    logger.setLevel(getattr(logging, level))
    return logger
def urlrequest(url):
    """
    Make a url request
    """
    req = urlopen(url)
    data = req.read()
    return data
def get_json(url):
    """
    Make a url request and return as a JSON object
    """
    res = urlrequest(url)
    data = loads(res)
    return data
def get_next(data):
    """
    Get next element from facebook JSON response,
    or return None if no next present.
    """
    try:
        return data['paging']['next']
    except KeyError:
        return None
def get_images(data):
    """
    Get all images from facebook JSON response,
    or return None if no data present.
    """
    try:
        return data['data']
    except KeyError:
        return []
def get_all_images(url):
    """
    Get all images using recursion.
    """
    data = get_json(url)
    images = get_images(data)
    next = get_next(data)
    if not next:
        return images
    else:
        return images + get_all_images(next)
def get_url(userid, access_token):
    """
    Generates a useable facebook graph API url
    """
    root = 'https://graph.facebook.com/'
    endpoint = '%s/photos?type=uploaded&fields=source,updated_time&access_token=%s' % \
        (userid, access_token)
    return '%s%s' % (root, endpoint)
def download_file(url, filename):
    """
    Write image to a file.
    """
    data = urlrequest(url)
    path = 'C:/photos/%s' % filename
    f = open(path, 'w')
    f.write(data)
    f.close()
def create_time_stamp(timestring):
    """
    Creates a pretty string from time
    """
    date = dateparser.parse(timestring)
    return date.strftime('%Y-%m-%d-%H-%M-%S')
def download(userid, access_token):
    """
    Download all images to current directory.
    """
    logger = get_logger()
    url = get_url(userid, access_token)
    logger.info('Requesting image direct link, please wait..')
    images = get_all_images(url)
    for image in images:
        logger.info('Downloading %s' % image['source'])
        filename = '%s.jpg' % create_time_stamp(image['created_time'])
        download_file(image['source'], filename)
if __name__ == '__main__':
    download(FACEBOOK_USER_ID, FACEBOOK_ACCESS_TOKEN)

Answering the question of why Alastair's solution from the comments worked:
f = open(path, 'wb')
From https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
(I was on a Mac, which explains why the problem wasn't reproduced for me.)
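A small sketch of the effect described in that quote, assuming Python 3. It simulates Windows newline translation on any OS by passing newline="\r\n" to a text-mode open; the sample bytes are made up, not a real JPEG. (In Python 3, writing bytes to a file opened with plain 'w' raises a TypeError instead, so binary mode is needed there as well.)

```python
# Made-up bytes that contain a newline byte (0x0A), as real JPEG data often does.
DATA = b"\xff\xd8\x00\x0a\x10\xff\xd9"


def write_binary(path, data):
    # 'wb': the bytes go to disk untouched on every platform.
    with open(path, "wb") as f:
        f.write(data)


def write_text_like_windows(path, data):
    # Text mode with newline="\r\n" mimics Windows: every "\n" is written as "\r\n".
    # latin-1 maps bytes 0-255 one-to-one to characters, so only the newline changes.
    with open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def read_back(path):
    with open(path, "rb") as f:
        return f.read()
```

Writing DATA with write_binary and reading it back gives the same bytes; writing it with write_text_like_windows inserts an extra 0x0D before the 0x0A, which is exactly the kind of change that corrupts an image.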

Alastair McCormack posted something that worked!
He said: "Try setting binary mode when you open the file for writing: f = open(path, 'wb')"
It now downloads the images correctly. Does anyone know why this worked?

wxPython FileOutputStream error

I'm trying to learn wxPython/Python and I want to save text in a file. I found this example:
def OnSaveAs(self, e):
    saveFileDialog = wx.FileDialog(self, "SAVE txt file", "", "", "Textdocument (*.txt)|*.txt", wx.FD_SAVE | wx.FD_OVERWRITE_PROMPT)
    if saveFileDialog.ShowModal() == wx.ID_CANCEL:
        return  # User canceled
    # save the current contents in the file
    # this can be done with e.g. wxPython output streams:
    output_stream = wx.FileOutputStream(saveFileDialog.GetPath())
    #My question: Insert what to write to output_stream here?
    if not out_stream.IsOk():
        wx.LogError("Cannot save current contents in file '%s'." % saveFileDialog.GetPath())
        return
I get the error
in OnSaveAs output_stream = wx.FileOutputStream(saveFileDialog.GetPath()) AttributeError 'module' object has no attribute 'FileOutputStream'
Shouldn't output_stream contain the path to the file I want to save? And then I write to output_stream to save text in the file?
Thanks in advance!

Just use the Python functions to open and write content to the file. Something like this:
with open(saveFileDialog.GetPath(), 'w') as output:
    output.write(stuff)
In almost all cases wxPython only wraps the wxWidgets classes and functions that do not already have an equivalent in Python, and the AttributeError is telling you that there is no wx.FileOutputStream class available.
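The original example also checked out_stream.IsOk() before reporting with wx.LogError. With plain Python files, failures are raised as exceptions (IOError/OSError) instead, so the check moves into an except clause. A minimal sketch, assuming Python 3; save_text is a hypothetical helper that returns an error message which OnSaveAs could pass to wx.LogError.

```python
def save_text(path, text, encoding="utf-8"):
    """Write `text` to `path`; return None on success or an error message.

    Hypothetical helper: in a wxPython handler, a returned message could be
    passed to wx.LogError instead of checking out_stream.IsOk().
    """
    try:
        # The with-statement closes the file even if write() fails.
        with open(path, "w", encoding=encoding) as output:
            output.write(text)
    except (IOError, OSError) as exc:
        return "Cannot save current contents in file '%s': %s" % (path, exc)
    return None
```

In OnSaveAs this might look like: error = save_text(saveFileDialog.GetPath(), stuff), followed by wx.LogError(error) when error is not None.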

Getting HTML body with cgitb

I'm using cgitb (Python 2.7) to create HTML documents on the server side. I have one file that runs a bunch of queries and then produces HTML. I'd like to be able to link to just the HTML, so if I could print the HTML to a new file and link to that, it would work.
Is there a way to get the html the page will generate at the end of processing so that I can put it in a new file without keeping track of everything I've done so far along the way?
Edit: Found a snippet here: https://stackoverflow.com/a/616686/1576740
import sys
class Tee(object):
    def __init__(self, name, mode):
        self.file = open(name, mode)
        self.stdout = sys.stdout
        sys.stdout = self
    def __del__(self):
        sys.stdout = self.stdout
        self.file.close()
    def write(self, data):
        self.file.write(data)
        self.stdout.write(data)
You have to call it after you import cgi, as it overrides stdout in what appears to be a less friendly way. But it works like a charm.
I just did import cgi;.......
then Tee(filename, "w"), and then I have a link to the file.
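The Tee above only restores sys.stdout and closes the file in __del__, which runs whenever the object happens to be garbage-collected. A minimal sketch of the same idea as a context manager, assuming Python 3, which restores stdout and closes the file at a predictable point; the names _Tee and tee_stdout are hypothetical.

```python
import contextlib
import sys


class _Tee(object):
    """Forward every write to two streams."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def write(self, data):
        self.first.write(data)
        self.second.write(data)

    def flush(self):
        self.first.flush()
        self.second.flush()


@contextlib.contextmanager
def tee_stdout(name, mode="w"):
    """Copy everything printed inside the with-block to the file `name`."""
    with open(name, mode) as f:
        original = sys.stdout
        sys.stdout = _Tee(original, f)
        try:
            yield f
        finally:
            # Always put the real stdout back, even if the block raises.
            sys.stdout = original
```

Usage would be: with tee_stdout(filename): followed by the code that prints the page, after which the same HTML is both sent to the browser and saved in filename.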

From the Python Documentation
Optionally, you can save this information to a file instead of sending it to the browser.
In this case you would want to use
cgitb.enable(display=1, logdir=directory)

import cgitb
import sys
try:
    ...
except Exception:
    # save the HTML traceback report to a file instead of sending it to the browser
    with open("/path/to/file.html", "w") as fd:
        fd.write(cgitb.html(sys.exc_info()))

Launch default editor (like 'webbrowser' module)

Is there a simple way to launch the system's default editor from a Python command-line tool, like the webbrowser module?

Under Windows you can simply "execute" the file and the default action will be taken:
os.system('c:/tmp/sample.txt')
For this example a default editor will spawn. Under UNIX there is an environment variable called EDITOR, so you need to use something like:
os.system('%s %s' % (os.getenv('EDITOR'), filename))
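A slightly more defensive sketch of the same idea, assuming Python 3. If EDITOR is unset, os.getenv returns None and the command above becomes "None filename", so this version falls back instead; it also splits EDITOR with shlex so values such as "code --wait" work, and uses os.startfile on Windows. The fallbacks (open -t on macOS, xdg-open elsewhere) and the function names are choices of this sketch, not part of the answer.

```python
import os
import shlex
import subprocess
import sys


def editor_command(filename, environ=None, platform=None):
    """Return the argument list used to open `filename` in an editor.

    Returns None on Windows, where os.startfile is used instead.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    editor = environ.get("EDITOR")
    if editor:
        # shlex.split keeps editor arguments such as "code --wait" intact.
        return shlex.split(editor) + [filename]
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", "-t", filename]  # -t: open with the default text editor
    return ["xdg-open", filename]  # may pick a non-editor application


def open_in_editor(filename):
    cmd = editor_command(filename)
    if cmd is None:
        os.startfile(filename)  # Windows only: run the file's default action
    else:
        subprocess.call(cmd)
```

Passing a list to subprocess.call avoids the quoting problems os.system has with file names that contain spaces.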

The modern Linux way to open a file is using xdg-open; however, it does not guarantee that a text editor will open the file. Using $EDITOR is appropriate if your program is command-line oriented (and so are your users).

If you need to open a file for editing, you could be interested in this question.

You can actually use the webbrowser module to do this. All the answers given so far, both here and in the linked question, do the same things the webbrowser module does under the hood.
The ONLY difference is if they have $EDITOR set, which is rare. So perhaps a better flow would be:
import os
import webbrowser
editor = os.getenv('EDITOR')
if editor:
    os.system(editor + ' ' + filename)
else:
    webbrowser.open(filename)
OK, now that I’ve told you that, I should let you know that the webbrowser module does state that it does not support this case.
Note that on some platforms, trying to open a filename using this function, may work and start the operating system's associated program. However, this is neither supported nor portable.
So if it doesn't work, don’t submit a bug report. But for most uses, it should work.

As the committer of the Python Y-Principle generator, I needed to check the generated files against the originals and wanted to call a diff-capable editor from Python.
My search led me to this question, and the most upvoted answer had some comments and follow-up issues that I also wanted to address:
make sure the EDITOR env variable is used if set
make sure things work on MacOS (defaulting to Atom in my case)
make sure a text can be opened in a temporary file
make sure that if a URL is opened, the HTML text is extracted by default
You'll find the solution in editor.py and the test case in test_editor.py in my project's repository.
Test Code
'''
Created on 2022-11-27
@author: wf
'''
from tests.basetest import Basetest
from yprinciple.editor import Editor
class TestEditor(Basetest):
    """
    test opening an editor
    """
    def test_Editor(self):
        """
        test the editor
        """
        if not self.inPublicCI():
            # open this source file
            Editor.open(__file__)
            Editor.open("https://stackoverflow.com/questions/1442841/lauch-default-editor-like-webbrowser-module")
            Editor.open_tmp_text("A sample text to be opened in a temporary file")
Source Code
'''
Created on 2022-11-27
@author: wf
'''
from sys import platform
import os
import tempfile
import webbrowser
from urllib.request import urlopen
from bs4 import BeautifulSoup
class Editor:
    """
    helper class to open the system defined editor
    see https://stackoverflow.com/questions/1442841/lauch-default-editor-like-webbrowser-module
    """
    @classmethod
    def extract_text(cls, html_text: str) -> str:
        """
        extract the text from the given html_text
        Args:
            html_text(str): the input for the html text
        Returns:
            str: the plain text
        """
        soup = BeautifulSoup(html_text, features="html.parser")
        # kill all script and style elements
        for script in soup(["script", "style"]):
            script.extract()  # rip it out
        # get text
        text = soup.get_text()
        # break into lines and remove leading and trailing space on each
        lines = (line.strip() for line in text.splitlines())
        # break multi-headlines into a line each (split on double spaces)
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        # drop blank lines
        text = '\n'.join(chunk for chunk in chunks if chunk)
        return text
    @classmethod
    def open(cls, file_source: str, extract_text: bool = True) -> str:
        """
        open an editor for the given file_source
        Args:
            file_source(str): the path to the file
            extract_text(bool): if True extract the text from html sources
        Returns:
            str: the path to the file e.g. a temporary file if the file_source points to an url
        """
        # handle urls
        # https://stackoverflow.com/a/45886824/1497139
        if file_source.startswith("http"):
            url_source = urlopen(file_source)
            # https://stackoverflow.com/a/19156107/1497139
            # fall back to utf-8 if the server does not declare a charset
            charset = url_source.headers.get_content_charset() or "utf-8"
            text = url_source.read().decode(charset)
            if extract_text:
                # https://stackoverflow.com/a/24618186/1497139
                text = cls.extract_text(text)
            return cls.open_tmp_text(text)
        editor_cmd = None
        editor_env = os.getenv('EDITOR')
        if editor_env:
            editor_cmd = editor_env
        if platform == "darwin":
            if not editor_env:
                # https://stackoverflow.com/questions/22390709/how-can-i-open-the-atom-editor-from-the-command-line-in-os-x
                editor_cmd = "/usr/local/bin/atom"
        if editor_cmd is None:
            # no EDITOR set and not macOS: let the webbrowser module pick the default application
            webbrowser.open(file_source)
        else:
            os_cmd = f"{editor_cmd} {file_source}"
            os.system(os_cmd)
        return file_source
    @classmethod
    def open_tmp_text(cls, text: str) -> str:
        """
        open an editor for the given text in a newly created temporary file
        Args:
            text(str): the text to write to a temporary file and then open
        Returns:
            str: the path to the temp file
        """
        # see https://stackoverflow.com/a/8577226/1497139
        # https://stackoverflow.com/a/3924253/1497139
        # write through the temporary file object itself instead of opening tmp.name a second time
        with tempfile.NamedTemporaryFile(mode="w", encoding="utf-8", delete=False) as tmp:
            tmp.write(text)
        return cls.open(tmp.name)
Stackoverflow answers applied
https://stackoverflow.com/a/45886824/1497139
https://stackoverflow.com/a/19156107/1497139
https://stackoverflow.com/a/24618186/1497139
How can I open the Atom editor from the command line in OS X?
https://stackoverflow.com/a/8577226/1497139
https://stackoverflow.com/a/3924253/1497139
