Python - Writing to a text file using functions? - python

I wrote a simple function to write into a text file, like this:
def write_func(var):
    var = str(var)
    myfile.write(var)
a= 5
b= 5
c= a + b
write_func(c)
This writes the output to the desired file.
Now I want the output in another format, say:
write_func("Output is :"+c)
so that the output has a meaningful label in the file. How do I do it?
And why can't I write an integer to a file? Do I have to do int = str(int) before writing to a file?

You can't add/concatenate a string and integer directly.
If you do anything more complicated than "string :"+str(number), I would strongly recommend using string formatting:
write_func('Output is: %i' % (c))

Python is a strongly typed language. This means, among other things, that you cannot concatenate a string and an integer. Therefore you'll have to convert the integer to a string before concatenating. This can be done using a format string (as Nick T suggested) or by passing the integer to the built-in str function (as NullUserException suggested).

Simple, you do:
write_func('Output is' + str(c))
You have to convert c to a string before you can concatenate it with another string. You can then also remove this line from your function:
var = str(var)
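A minimal sketch of the advice above, assuming Python 3. The explicit file-handle argument and the label parameter are hypothetical additions, and io.StringIO stands in for a real file opened with open(path, "w").

```python
import io


def write_func(out, label, value):
    """Write 'label: value' plus a newline to an open text file."""
    # str.format converts the value to text, so no manual str() is needed.
    out.write("{}: {}\n".format(label, value))


buffer = io.StringIO()  # stand-in for a real text file
a = 5
b = 5
write_func(buffer, "Output is", a + b)

result = buffer.getvalue()
print(result, end="")  # Output is: 10
```

Passing the file object in, rather than relying on a global myfile, keeps the function usable with any writable text stream.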
why is that i cant write an integer to a file? i do, int = str(int) before writing to a file?
You can write binary data to a file, but byte representations of numbers aren't really human readable. -2 for example is 0xfffffffe in a 2's complement 32-bit integer. It's even worse when the number is a float: 2.1 is 0x40066666.
If you plan on having a human-readable file, you need to write human-readable characters to it. In an ASCII file, '0.5' isn't a number (at least not as a computer understands numbers), but instead the characters '0', '.' and '5'. And that's why you need to convert your numbers to strings.
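To make the byte values above concrete, here is a small sketch (Python 3, standard struct module) that packs -2 as a big-endian 32-bit integer and 2.1 as a big-endian 32-bit float, then contrasts them with the text form. The variable names are illustrative only.

```python
import struct

# Big-endian 32-bit two's complement representation of -2.
raw_int = struct.pack(">i", -2)
# Big-endian IEEE 754 single-precision representation of 2.1.
raw_float = struct.pack(">f", 2.1)

print(raw_int.hex())    # fffffffe
print(raw_float.hex())  # 40066666

# The human-readable form is just a sequence of characters.
text = str(0.5)
print(list(text))       # ['0', '.', '5']
```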

From http://docs.python.org/library/stdtypes.html#file.write
file.write(str)
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
Note how the documentation specifies that write's argument must be a string.
So you should create a string yourself before passing it to file.write().
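As a quick illustration of that requirement, the following sketch (Python 3, with io.StringIO standing in for a text file) shows write() rejecting an integer and accepting its string form.

```python
import io

buffer = io.StringIO()  # stands in for a text file opened with open(path, "w")

try:
    buffer.write(10)          # not a string, so this is rejected
    error = None
except TypeError as exc:
    error = exc

buffer.write(str(10))         # convert first, then write
contents = buffer.getvalue()
print(type(error).__name__, repr(contents))  # TypeError '10'
```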

Related

hex header of file, magic numbers, python

I have a program in Python which analyses file headers and decides which file type it is. (https://github.com/LeoGSA/Browser-Cache-Grabber)
The problem is the following:
I read the first 24 bytes of a file:
with open (from_folder+"/"+i, "rb") as myfile:
    header=str(myfile.read(24))
then I look for a pattern in it:
if y[1] in header:
    shutil.move (from_folder+"/"+i,to_folder+y[2]+i+y[3])
where y = ['/video', r'\x47\x40\x00', '/video/', '.ts']
y[1] is the pattern and = r'\x47\x40\x00'
The file has it inside: its first three bytes are 47 40 00.
The program does NOT find this pattern (r'\x47\x40\x00') in the file header.
So I tried to print header, and Python shows the first two bytes as 'G@' instead of '\x47\x40'.
If I search for 'G@'+r'\x00' in header, everything is OK. It finds it.
Question: What am I doing wrong? I want to look for r'\x47\x40\x00' and find it, not for some strange 'G@'+r'\x00'.
OR
Why does Python see the first two bytes as 'G@' and not as '\x47\x40', though it shows the rest of the header in hex? Is there a way to fix it?
Update: I found a workaround by converting the header to a hex string:
with open (from_folder+"/"+i, "rb") as myfile:
    header=myfile.read(24)
    header = str(binascii.hexlify(header))[2:-1]
The result I get is:
4740001b0000b00d0001c100000001efff3690e23dffffff
And I can work with it.
P.S. But anyway, if anybody can explain what the problem was with the first 2 bytes, I would be grateful.
In Python 3 you'll get bytes from a binary read, rather than a string.
No need to convert it to a string by str.
Print will try to convert bytes to something human readable.
If you don't want that, convert your bytes to e.g. hex representations of the integer values of the bytes by:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (''.join ([hex (aByte) for aByte in aBytes]))
Output as redirected from the console:
b'\x00G@\x00\x13\x00\x00\xb0'
0x00x470x400x00x130x00x00xb0
You can't search aBytes for a str pattern with the in operator, since aBytes is a bytes object rather than a string; that raises a TypeError. A bytes pattern works directly, e.g. b'\x47\x40\x00' in aBytes.
If you want to apply a string search on '\x00\x47\x40', use:
aBytes = b'\x00\x47\x40\x00\x13\x00\x00\xb0'
print (aBytes)
print (r'\x'.join ([''] + ['%0.2x'%aByte for aByte in aBytes]))
Which will give you:
b'\x00G@\x00\x13\x00\x00\xb0'
\x00\x47\x40\x00\x13\x00\x00\xb0
So there are a number of separate issues at play here:
print tries to show something human readable, which works only for bytes that are printable ASCII characters (here 'G' and '@'); the rest are shown as hex escapes.
You can't search for a str pattern in a bytes object with in. Either use a bytes pattern, or convert the bytes to a string of fixed-length hex representations and search that, as shown.
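For comparison, here is a sketch of the search done entirely on bytes, assuming Python 3. The header value is rebuilt from the hex dump quoted in the question; the variable names are illustrative.

```python
# First 24 bytes of the file, taken from the hex dump in the question.
header = bytes.fromhex("4740001b0000b00d0001c100000001efff3690e23dffffff")

# A bytes pattern can be searched for directly in a bytes object.
pattern = b"\x47\x40\x00"
found = pattern in header
at_start = header.startswith(pattern)

# repr() of bytes produces text like b'G@\x00...': printable bytes are shown
# as characters, so the raw text \x47\x40\x00 never appears in it.
as_text = repr(header)
raw_pattern_found = r"\x47\x40\x00" in as_text

print(found, at_start, raw_pattern_found)  # True True False
```

This also explains the original failure: str() on a bytes object returns its printable representation, not the escaped source text, so a raw-string pattern made of backslash sequences cannot match it.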

Storing int or str in the list

I created a text file and opened it in Python using:
for word_in_line in open("test.txt"):
to loop through the lines in the text file.
The text file only has one line, which is:
int 111 = 3 ;
When I make a list using .split():
print("Input: {}".format(word_in_line))
line_list = word_in_line.split()
It creates:
['int', '111', '=', '3', ';']
And I was looking for a way to check if line_list[1] ('111') is an integer.
But when I try type(line_list[1]), it says that it's str, because of the quotes.
My goal is to read through the text file and determine whether each item is an integer, a str, or some other data type.
What you have in your list is a string, so the type you get is correct and expected.
What you are looking to do is check whether the string consists entirely of digits. To do that, use the isdigit string method:
line_list[1].isdigit()
Depending on what exactly you are trying to validate, there are cases where all you want is purely digits, and this solution provides exactly that.
There could be other cases where you want to check whether you have some kind of number, for example 10.5. This is where isdigit will fail. For cases like that, you can take a look at this answer, which provides an approach to check whether you have a float.
I don't agree with the above answer.
Any string parsing like @idjaw's answer of line_list[1].isdigit() will fail on an odd edge case. For example, what if the number is a float like .50 that starts with a dot? The above approach won't work. Technically we only care about ints in this example so this won't matter, but in general it is dangerous.
In general if you are trying to check whether a string is a valid number, it is best to just try to convert the string to a number and then handle the error accordingly.
def isNumber(string):
    try:
        val = int(string)
        return True
    except ValueError:
        return False
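Extending the same try/convert idea to the asker's goal, a possible sketch (Python 3) that labels each token as int, float, or str. The classify helper is hypothetical and not part of either answer.

```python
def classify(token):
    """Return 'int', 'float', or 'str' depending on what the token parses as."""
    try:
        int(token)
        return "int"
    except ValueError:
        pass
    try:
        float(token)
        return "float"
    except ValueError:
        return "str"


line_list = "int 111 = 3 ;".split()
kinds = [classify(token) for token in line_list]
print(kinds)  # ['str', 'int', 'str', 'int', 'str']
```

Unlike isdigit(), this handles inputs such as "-7" and ".50", because it relies on the same parsing rules as int() and float() themselves.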

why python not converts float or any other data type to string when written with string?

Suppose
p1="python"
p2="script"
r=0.242424
print(p1+" "+p2+" "+r) #wrong or error in python
print(p1+" "+p2+" "+str(r)) #correct
Why do we have to convert a float to a string explicitly in Python, when other languages like Java convert it implicitly?
You don't. You only need to convert it explicitly when concatenating. print() will use spaces to separate arguments on its own, converting each to a string:
print(p1, p2, r)
Quoting the print() documentation:
All non-keyword arguments are converted to strings like str() does and written to the stream, separated by sep and followed by end.
sep defaults to the ' ' string.
You usually use string formatting to interpolate values into a string:
print("Running a {} {} to show the value of r={:.6f}".format(p1, p2, r))
Otherwise, trying to concatenate strings with other values does not convert values implicitly; this goes against the Zen of Python:
Explicit is better than implicit.
If you want to join various data types into a string, concatenation with + or with the str.join method will require them to be strings before you do it.
As Martijn shows, it's not needed for printing, but that can be confusing because print by default separates its comma-separated arguments with a space and appends a newline.
If you want to implicitly convert them to strings, I recommend the str.format method:
>>> '{0} {1} {2}'.format(p1, p2, r)
'python script 0.242424'
This is quite a useful and efficient way to format text and not have to worry about the types, so long as you don't mind getting their default string representation.
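The options discussed in these answers can be compared side by side in a short sketch (Python 3). The error-capturing variable is only there for illustration.

```python
p1 = "python"
p2 = "script"
r = 0.242424

# Plain concatenation refuses to mix str and float.
try:
    p1 + " " + p2 + " " + r
    concat_error = None
except TypeError as exc:
    concat_error = exc

# str.join needs strings too, so convert each item explicitly first.
joined = " ".join(map(str, [p1, p2, r]))

# str.format converts each argument using its default string representation.
formatted = "{} {} {}".format(p1, p2, r)

print(joined)     # python script 0.242424
print(formatted)  # python script 0.242424
```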

Writing and reading headers with struct

I have a file header which I am reading and planning on writing which contains information about the contents; version information, and other string values.
Writing to the file is not too difficult, it seems pretty straightforward:
outfile.write(struct.pack('<s', "myapp-0.0.1"))
However, when I try reading back the header from the file in another method:
header_version = struct.unpack('<s', infile.read(struct.calcsize('s')))
I have the following error thrown:
struct.error: unpack requires a string argument of length 2
How do I fix this error and what exactly is failing?
Writing to the file is not too difficult, it seems pretty straightforward:
Not quite as straightforward as you think. Try looking at what's in the file, or just printing out what you're writing:
>>> struct.pack('<s', 'myapp-0.0.1')
'm'
As the docs explain:
For the 's' format character, the count is interpreted as the size of the string, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1.
So, how do you deal with this?
Don't use struct if it's not what you want. The main reason to use struct is to interact with C code that dumps C struct objects directly to/from a buffer/file/socket/whatever, or a binary format spec written in a similar style (e.g. IP headers). It's not meant for general serialization of Python data. As Jon Clements points out in a comment, if all you want to store is a string, just write the string as-is. If you want to store something more complex, consider the json module; if you want something even more flexible and powerful, use pickle.
Use fixed-length strings. If part of your file format spec is that the name must always be 255 characters or less, just write '<255s'. Shorter strings will be padded, longer strings will be truncated (you might want to throw in a check for that to raise an exception instead of silently truncating).
Use some in-band or out-of-band means of passing along the length. The most common is a length prefix. (You may be able to use the 'p' or 'P' formats to help, but it really depends on the C layout/binary format you're trying to match; often you have to do something ugly like struct.pack('<h{}s'.format(len(name)), len(name), name).)
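The length-prefix option can be sketched as follows, assuming Python 3 (where struct works on bytes), UTF-8 text, and a signed 16-bit little-endian length as in the '<h{}s' format above. The helper names pack_string and unpack_string are hypothetical.

```python
import struct


def pack_string(text):
    """Encode text as a 2-byte little-endian length followed by the bytes."""
    data = text.encode("utf-8")
    if len(data) > 0x7FFF:
        raise ValueError("string too long for a signed 16-bit length prefix")
    return struct.pack("<h{}s".format(len(data)), len(data), data)


def unpack_string(buffer, offset=0):
    """Decode one length-prefixed string; return (text, next_offset)."""
    (length,) = struct.unpack_from("<h", buffer, offset)
    start = offset + struct.calcsize("<h")
    (data,) = struct.unpack_from("<{}s".format(length), buffer, start)
    return data.decode("utf-8"), start + length


packed = pack_string("myapp-0.0.1")
version, next_offset = unpack_string(packed)
print(version, next_offset)  # myapp-0.0.1 13

# For contrast: a bare 's' keeps only the first byte.
truncated = struct.pack("<s", b"myapp-0.0.1")
print(truncated)  # b'm'
```

Returning the next offset makes it straightforward to read several fields in sequence from one buffer.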
As for why your code is failing, there are multiple reasons. First, read(11) isn't guaranteed to read 11 characters. If there's only 1 character in the file, that's all you'll get. Second, you're not actually calling read(11), you're calling read(1), because struct.calcsize('s') returns 1 (for reasons which should be obvious from the above). Third, either your code isn't exactly what you've shown above, or infile's file pointer isn't at the right place, because that code as written will successfully read in the string 'm' and unpack it as 'm'. (I'm assuming Python 2.x here; 3.x will have more problems, but you wouldn't have even gotten that far.)
For your specific use case ("file header… which contains information about the contents; version information, and other string values"), I'd just write the strings with newline terminators. (If the strings can have embedded newlines, you could backslash-escape them into \n, use C-style or RFC822-style continuations, quote them, etc.)
This has a number of advantages. For one thing, it makes the format trivially human-readable (and human-editable/-debuggable). And, while sometimes that comes with a space tradeoff, a single-character terminator is at least as efficient, possibly more so, than a length-prefix format would be. And, last but certainly not least, it means the code is dead-simple for both generating and parsing headers.
In a later comment you clarify that you also want to write ints, but that doesn't change anything. An 'i' int value will take 4 bytes, but most apps write a lot of small numbers, which only take 1-2 bytes (+1 for a terminator/separator) if you write them as strings. And if you're not writing small numbers, a Python int can easily be too large to fit in a C int, in which case struct cannot store it faithfully: depending on the Python version, it either silently overflows and writes just the low 32 bits or raises struct.error.
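A newline-terminated header of this kind might look like the sketch below (Python 3). The field values, the fixed field count, and the helper names are assumptions made for illustration; io.StringIO stands in for a real file.

```python
import io


def write_header(out, fields):
    """Write each field as text on its own line."""
    for value in fields:
        text = str(value)
        if "\n" in text:
            raise ValueError("embedded newlines need escaping first")
        out.write(text + "\n")


def read_header(inp, count):
    """Read back `count` newline-terminated fields as strings."""
    return [inp.readline().rstrip("\n") for _ in range(count)]


buf = io.StringIO()  # stand-in for a file opened in text mode
write_header(buf, ["myapp-0.0.1", 3, 42])
buf.seek(0)

fields = read_header(buf, 3)
version = fields[0]
numbers = [int(item) for item in fields[1:]]
print(version, numbers)  # myapp-0.0.1 [3, 42]
```

Integers are stored as their decimal text and converted back with int() on the way in, so their size is never limited by a C type.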

Is it possible to suppress Python's escape sequence processing on a given string without using the raw specifier?

Conclusion: It's impossible to override or disable Python's built-in escape sequence processing so that you can skip using the raw prefix specifier. I dug into Python's internals to figure this out. So if anyone tries designing objects that work on complex strings (like regex) as part of some kind of framework, make sure to specify in the docstrings that string arguments to the object's __init__() MUST include the r prefix!
Original question: I am finding it a bit difficult to force Python to not "change" anything about a user-inputted string, which may contain among other things, regex or escaped hexadecimal sequences. I've already tried various combinations of raw strings, .encode('string-escape') (and its decode counterpart), but I can't find the right approach.
Given an escaped, hexadecimal representation of the Documentation IPv6 address 2001:0db8:85a3:0000:0000:8a2e:0370:7334, using .encode(), this small script (called x.py):
#!/usr/bin/env python
class foo(object):
    __slots__ = ("_bar",)
    def __init__(self, input):
        if input is not None:
            self._bar = input.encode('string-escape')
        else:
            self._bar = "qux?"
    def _get_bar(self): return self._bar
    bar = property(_get_bar)
#
x = foo("\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
print x.bar
Will yield the following output when executed:
$ ./x.py
\x01\r\xb8\x85\xa3\x00\x00\x00\x00\x8a.\x03ps4
Note the \x20 got converted to an ASCII space character, along with a few others. This is basically correct due to Python processing the escaped hex sequences and converting them to their printable ASCII values.
This can be solved if the initializer to foo() was treated as a raw string (and the .encode() call removed), like this:
x = foo(r"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34")
However, my end goal is to create a kind of framework that can be used and I want to hide these kinds of "implementation details" from the end user. If they called foo() with the above IPv6 address in escaped hexadecimal form (without the raw specifier) and immediately print it back out, they should get back exactly what they put in w/o knowing or using the raw specifier. So I need to find a way to have foo's __init__() do whatever processing is necessary to enable that.
Edit: Per this SO question, it seems it's a defect of Python, in that it always performs some kind of escape sequence processing. There does not appear to be any kind of facility to completely turn off escape sequence processing, even temporarily. Sucks. I guess I am going to have to research subclassing str to create something like rawstr that intelligently determines what escape sequences Python processed in a string, and convert them back to their original format. This is not going to be fun...
Edit2: Another example, given the sample regex below:
"^.{0}\xcb\x00\x71[\x00-\xff]"
If I assign this to a var or pass it to a function without using the raw specifier, the \x71 gets converted to the letter q. Even if I add .encode('string-escape') or .replace('\\', '\\\\'), the escape sequences are still processed, resulting in this output:
"^.{0}\xcb\x00q[\x00-\xff]"
How can I stop this, again, without using the raw specifier? Is there some way to "turn off" the escape sequence processing or "revert" it after the fact thus that the q turns back into \x71? Is there a way to process the string and escape the backslashes before the escape sequence processing happens?
I think you have an understandable confusion about the difference between Python string literals (source code representation), Python string objects in memory, and how those objects can be printed (in what format they can be represented in the output).
If you read some bytes from a file into a bytestring you can write them back as is.
r"" exists only in source code; there is no such thing at runtime, i.e., r"\x" and "\\x" are equal, and they may even be the exact same string object in memory.
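The point about literals versus objects can be checked directly. A small sketch in Python 3 syntax (the thread itself uses Python 2), reusing the \x71 escape from the question:

```python
raw = r"\x71"        # raw literal: backslash, 'x', '7', '1'
escaped = "\\x71"    # same four characters, written with an escaped backslash
processed = "\x71"   # escape processed by the compiler: the single letter 'q'

print(raw == escaped)           # True
print(len(raw), len(processed)) # 4 1
print(processed)                # q
```

Once the source has been compiled, nothing in the resulting string object records which literal form produced it, which is why the original spelling cannot be recovered at runtime.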
To see that input is not corrupted, you could print each byte as an integer:
print " ".join(str(ord(c)) for c in raw_input("input something"))
Or just echo as is (there could be a difference but it is unrelated to your "string-escape" issue):
print raw_input("input something")
Identity function:
def identity(obj):
    return obj
If you do nothing to the string then your users will receive the exact same object back. You can provide examples in the docs of what you consider a concise, readable way to represent the input string as Python literals. If you find it confusing to work with binary strings such as "\x20\x01" then you could accept an ASCII hex representation instead: "2001" (you could use binascii.hexlify/unhexlify to convert one to the other).
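The hex-representation suggestion could look like this in Python 3, using the IPv6 address bytes from the question. Variable names are illustrative.

```python
import binascii

# The 16 address bytes from the question, built from plain hex text.
address = bytes.fromhex("20010db885a3000000008a2e03707334")

# bytes -> ASCII hex text that users can read and type without escapes.
hex_text = binascii.hexlify(address).decode("ascii")

# ASCII hex text -> the original bytes.
round_trip = binascii.unhexlify(hex_text)

print(hex_text)               # 20010db885a3000000008a2e03707334
print(round_trip == address)  # True
```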
The regex case is more complex because there are two languages:
Escape sequences are interpreted by Python according to its string literal syntax
The regex engine interprets the string object as a regex pattern that also has its own escape sequences
I think you will have to go the join route.
Here's an example:
>>> m = {chr(c): '\\x{0}'.format(hex(c)[2:].zfill(2)) for c in xrange(0,256)}
>>>
>>> x = "\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"
>>> print ''.join(map(m.get, x))
\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34
I'm not entirely sure why you need that though. If your code needs to interact with other pieces of code, I'd suggest that you agree on a defined format, and stick to it.
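The session above is Python 2 (xrange, print statement). An equivalent sketch for Python 3, assuming the input is a bytes object, needs no lookup table because iterating over bytes yields integers:

```python
x = b"\x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34"

# Format every byte as a two-digit \xNN escape, printable or not.
escaped = "".join("\\x{:02x}".format(byte) for byte in x)
print(escaped)
# \x20\x01\x0d\xb8\x85\xa3\x00\x00\x00\x00\x8a\x2e\x03\x70\x73\x34
```

Note that this reproduces a uniform \xNN spelling for every byte; it cannot tell whether the caller originally wrote \x70 or p in the source.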
