Python gzip gives error and asks for bytes-like input

I'm unable to get this to work:
import gzip
content = "Lots of content here"
f = gzip.open('file.txt.gz', 'a', 9)
f.write(content)
f.close()
I get output:
============== RESTART: C:/Users/Public/Documents/Trial_vr06.py ==============
Traceback (most recent call last):
File "C:/Users/Public/Documents/Trial_vr06.py", line 4, in <module>
f.write(content)
File "C:\Users\SONID\AppData\Local\Programs\Python\Python36\lib\gzip.py", line 260, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
This approach was linked ("Python Gzip - Appending to file on the fly") in an answer to "Is it possible to compress program output within python to reduce output file's size?"
I've also tried integer data, with no effect. What is the issue here?

By default, gzip streams are binary (see "Python 3: gzip.open() and modes").
This is no problem in Python 2, but Python 3 distinguishes between binary and text streams.
So either encode your string (or use the b prefix if it's a literal, as in your example; that is not always possible):
f.write(content.encode("ascii"))
or, maybe better for text only, open the gzip stream as text:
f = gzip.open('file.txt.gz', 'at', 9)
Note that append mode on a gzip file works but is not that efficient (see "Python Gzip - Appending to file on the fly").
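The two options above can be compared in a short sketch. It assumes Python 3 and writes into a temporary directory; the file names and the sample string are only illustrative.

```python
import gzip
import os
import tempfile

content = "Lots of content here"
tmp_dir = tempfile.mkdtemp()

# Option 1: binary mode (the default), so the str must be encoded first.
binary_path = os.path.join(tmp_dir, "binary.txt.gz")
with gzip.open(binary_path, "ab", compresslevel=9) as f:
    f.write(content.encode("utf-8"))

# Option 2: text mode ("at"), so gzip encodes the str itself.
text_path = os.path.join(tmp_dir, "text.txt.gz")
with gzip.open(text_path, "at", compresslevel=9, encoding="utf-8") as f:
    f.write(content)

# Both files decompress to the same text.
with gzip.open(binary_path, "rt", encoding="utf-8") as f:
    from_binary = f.read()
with gzip.open(text_path, "rt", encoding="utf-8") as f:
    from_text = f.read()
```

Using a with block also closes the stream when an exception is raised, which the explicit f.close() in the question does not guarantee.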

To write your string to a gzip stream opened in binary mode, it must be a bytes object. For a literal, simply put a "b" in front of the string. This tells the interpreter to treat it as a bytes literal rather than a str.
content = b"Lots of content here"
https://docs.python.org/3/library/gzip.html

Related

Python: Convert data file format to string

I have a file that has the following output when file command is run:
#file test.bin
#test.bin : data
#file -i test.bin
#test.bin: application/octet-stream; charset=binary
I want to read the contents of this file and forward to a python library that accepts this read-data as a string.
file = open("test.bin", "rb")
readBytes = file.read() # python type : <class 'bytes'>
output = test.process(readBytes) # process expects a string
I have tried str(readBytes), but that did not work. I see that there are also unprintable characters in the file test.bin, as running strings test.bin produces far less output than the actual bytes present in the file.
Is there a way to convert the bytes read into strings? Or am I trying to achieve something that makes no sense at all?
Try bitstring. It's a good package for reading bits.
# import module
from bitstring import ConstBitStream
# read file
x = ConstBitStream(filename='file.bin')
# read 5 bits
output = x.read(5)
# convert to unsigned int
int_val = output.uint
Do you mean this?
output = test.process(readBytes.decode('latin1'))
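Decoding with latin1 works for arbitrary binary data because that codec maps each of the 256 possible byte values to exactly one character. A minimal sketch of the round trip (the sample bytes are made up for illustration):

```python
# Every possible byte value, including the unprintable ones.
raw = bytes(range(256))

# latin1 never fails: byte N becomes the character with code point N.
as_text = raw.decode("latin1")

# Encoding with the same codec recovers the original bytes exactly.
round_trip = as_text.encode("latin1")
```

Whether test.process accepts such a string depends on what that library expects; the sketch only shows that the conversion itself loses nothing.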

Convert bytes to a file object in python

I have a small application that reads local files using:
open(diefile_path, 'r') as csv_file
open(diefile_path, 'r') as file
and also uses linecache module
I need to expand this to files that are sent from a remote server.
The content received from the server is of type bytes.
I couldn't find much information about handling the BytesIO type, and I was wondering if there is a way to convert the bytes chunk to a file-like object.
My goal is to use the API specified above (open, linecache).
I was able to convert the bytes into a string using data.decode("utf-8"),
but I can't use the methods above (open and linecache)
a small example to illustrate
data = b'First line\nSecond line\nThird line\n'
with open(data) as file:
    line = file.readline()
    print(line)
output:
First line
Second line
Third line
can it be done?
open is used to open actual files, returning a file-like object. Here, you already have the data in memory, not in a file, so you can instantiate the file-like object directly.
import io
data = b'First line\nSecond line\nThird line\n'
file = io.StringIO(data.decode())
for line in file:
    print(line.strip())
However, if what you are getting is really just a newline-separated string, you can simply split it into a list directly.
lines = data.decode().strip().split('\n')
The main difference is that the StringIO version is slightly lazier; it has a smaller memory footprint compared to the list, as it splits strings off as requested by the iterator.
The answer above, which uses StringIO, has to pick an encoding when it decodes the bytes, and a wrong choice may corrupt the conversion.
From the Python documentation, using BytesIO:
from io import BytesIO
f = BytesIO(b"some initial binary data: \x00\x01")
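If a text-mode file object is wanted without decoding the whole payload up front, one option is to wrap the BytesIO in io.TextIOWrapper. This is a sketch that assumes the payload is UTF-8; that encoding is a guess and has to match what the server actually sends.

```python
import io

data = b'First line\nSecond line\nThird line\n'

# BytesIO alone is a binary file object: readline() returns bytes.
binary_file = io.BytesIO(data)
first_raw = binary_file.readline()

# TextIOWrapper decodes on the fly and yields str lines.
text_file = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8')
lines = [line.rstrip('\n') for line in text_file]
```

Note that linecache looks lines up by file name, so it cannot be pointed at an in-memory object like this; for that API the bytes would have to be written to a real file first.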

readline() Produces Unexpected String

Getting some practice playing with dictionaries and file i/o today when a file gave me an unexpected output that I'm curious about. I wrote the following simple function that just takes the first line of a text file, breaks it into individual words, and puts each word into a dictionary:
def create_dict(file):
    dict = {}
    for i, item in enumerate(file.readline().split(' ')):
        dict[i]= item
    file.seek(0)
    return dict
print "Enter a file name:"
f = open(raw_input('-> '))
dict1 = create_dict(f)
print dict1
Simple enough, in every case it produces exactly the expected output. Every case except for one. I have one text file that was created by piping the output of another python script to a text file via the following shell command:
C:\> python script.py > textFile.txt
When I use textFile.txt with my dictionary script, I get an output that looks like:
{0: '\xff\xfeN\x00Y\x00', 1: '\x00S\x00t\x00a\x00t\x00e\x00', 2: '\x00h\x00a\x00s\x00:\x00', 3: '\x00', 4: '\x00N\x00e\x00w\x00', 5: '\x00Y\x00o\x00r\x00k\x00\r\x00\n'}
What is this output called? Why does piping the output of the script to a text file via the command line produce a different type of string than any other text file? Why are there no visible differences when I open this file in my text editor? I searched and searched but I don't even know what that would be called as I'm still pretty new.
Your file is UTF-16 encoded. The first 2 bytes, \xff and \xfe, are a byte order mark (BOM). You will also notice that each character appears to take 2 bytes, one of which is \x00.
You can use the codecs module to decode for you:
import codecs
f = codecs.open(raw_input('-> '), 'r', encoding='utf-16')
Or, if you are using Python 3 you can supply the encoding argument to open().
I guess the problem you ran into is a character encoding problem.
In Python 2, the default character encoding is ASCII, so when you use the open() function to read the file, the value is interpreted as ASCII.
The output, however, does not know what the encoding means, so you need to decode it to see the text as it normally looks.
Systems usually read text as UTF-8, so you can try decode(item, 'utf-8').
You can also search for more information about character encodings, ASCII, UTF-8, Unicode, and how to convert between them.
Hope this helps.
>>> import codecs
>>> codecs.BOM_UTF16_LE
'\xff\xfe'
To read a UTF-16 encoded file, you could use the io module:
import io
with io.open(filename, encoding='utf-16') as file:
    words = [word for line in file for word in line.split()]
The advantage compared to codecs.open() is that it supports the universal newline mode like the builtin open(), and io.open() is the builtin open() in Python 3.
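The whole situation can be reproduced in Python 3 with a small sketch. It builds a file shaped like the one in the question (a little-endian BOM followed by UTF-16 text; the sample sentence and the temporary path are illustrative), then reads it once as raw bytes and once with the encoding given to open().

```python
import codecs
import os
import tempfile

# Build a UTF-16 file: little-endian BOM followed by UTF-16-LE text.
path = os.path.join(tempfile.mkdtemp(), "textFile.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "NY State has: New York\r\n".encode("utf-16-le"))

# The raw bytes show the BOM and the interleaved \x00 bytes.
with open(path, "rb") as f:
    raw = f.read()

# The "utf-16" codec consumes the BOM and decodes the rest.
with open(path, encoding="utf-16") as f:
    words = dict(enumerate(f.readline().split()))
```

Here split() is called without an argument, so runs of whitespace and the trailing line ending do not end up in the dictionary.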

how to check if a file is a .gz file in Python

I am working on Python input/output and was given a CSV file (possibly gzipped). If it is gzipped, I have to decompress it and then read it.
I was trying to read the first two bytes like this:
def func(filename):
    fi = open(filename,"rb")
    byte1 = fi.read(1)
    byte2 = fi.read(1)
Then I will check whether byte1 and byte2 are equal to 0x1f and 0x8b; if so, I will decompress the file and print every line of it.
But when I run it, I got this error:
TypeError: 'NoneType' object is not iterable
I'm new to python, can anyone help?
Going by what you said in the comment ("that's all I have in the function"), I assume the issue is that the function has no return value. The caller probably tries to iterate over the result of the call, which is None, hence the NoneType error.
You can use endswith() to check whether a file in a folder has the .gz extension, then use the gzip module to decompress it and read its contents:
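import os
import gzip
directory = r"C:\Directory_name"
for file in os.listdir(directory):
    # only handle entries whose name ends in .gz
    if file.endswith(".gz"):
        print(file)
        # gzip.open decompresses transparently while reading
        with gzip.open(os.path.join(directory, file), 'rb') as f:
            file_content = f.read()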
So here the "file_content" variable holds the decompressed data of your gzipped CSV file.
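The magic-number check from the question can be sketched as follows; it does not rely on the file extension. The helper names is_gzipped and read_lines are hypothetical, and the UTF-8 default is an assumption about the CSV. Returning the lines explicitly also avoids handing None to the caller.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC

def read_lines(path, encoding="utf-8"):
    """Return the file's lines, decompressing first if it is gzipped."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]

# Demo with two made-up CSV files holding the same rows.
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "data.csv")
gz_path = os.path.join(tmp_dir, "data_compressed.csv")  # no .gz suffix on purpose
with open(plain_path, "w", encoding="utf-8") as f:
    f.write("a,b\n1,2\n")
with gzip.open(gz_path, "wt", encoding="utf-8") as f:
    f.write("a,b\n1,2\n")
```

Calling read_lines(plain_path) and read_lines(gz_path) gives the same list of rows, which can then be passed to the csv module or printed line by line.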

beginner python translator: I am unable to convert a list from shelve module into a string

I am trying to make a simple translator that from a dictionary in a shelve module I can type words in English and the program translates the input word by word and then puts the results into a .txt file. This is pretty much what I have so far.
import shelve
s = shelve.open("THAI.dat")
entry = input("English word")
define = input("Thai word")
s[entry]=define
text_file = open("THAI.txt", "w+")
trys = input("Input english word")
if trys in s:
    print(s[trys])
    part = s[trys]
    text_file.write(part)
This is where the error appears. I think the problem is that part is a list and it should be a string to be written to a .txt file. What should I do? I am just a beginner, so I am probably missing something basic. This is the error:
Traceback (most recent call last):
File "C:\Users\Austen\Desktop\phython fun\thai translator.py", line 29, in <module>
text_file.write(part)
TypeError: must be str, not list
>>>
At the end I would like to be able to do this:
text_file.readlines()
and then be able to even go into the text file and see the translation.
From your comments, besides not having s[entry]=[define], I think you need to read and write a Thai file using the right codec.
Assuming the file thai.dat was written with UTF-8 (an assumption), you now need to compare the strings using the same codec and then write your data file with the same codec.
As a start, try this line from your command shell:
python -c 'import sys; print(sys.getdefaultencoding())'
If it prints ascii then you may need to set your default encoding to UTF-8 or the string comparisons will not work properly.
Also, you need to open the output file in UTF-8 mode like so:
>>> import codecs
>>> f = codecs.open("THAI.txt", "w+", "utf-8")
Then write to this file as usual.
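Both points (the list value and the output encoding) can be combined in a rough Python 3 sketch. A plain dict stands in for the shelve file here, and the to_text helper and the sample words are hypothetical; the idea is to join a list into one string before writing, and to open the output file as UTF-8.

```python
import os
import tempfile

def to_text(value):
    """Join a list of words into one string; pass a str through unchanged."""
    if isinstance(value, list):
        return " ".join(value)
    return value

# Hypothetical stand-in for the shelve contents, stored as lists.
translations = {"hello": ["สวัสดี"], "thank you": ["ขอบคุณ", "ครับ"]}

path = os.path.join(tempfile.mkdtemp(), "THAI.txt")
with open(path, "w", encoding="utf-8") as text_file:
    for word in ("hello", "thank you"):
        text_file.write(to_text(translations[word]) + "\n")

# Reading the file back with the same encoding returns the written lines.
with open(path, encoding="utf-8") as text_file:
    written = text_file.read().splitlines()
```

In Python 3 the builtin open() accepts the encoding argument directly, so codecs.open is not needed there.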
