Python import txt formatting

I have an Excel file with a list of numbers and I saved it as a .txt file and then went to say:
open_file = open('list_of_numbers.txt','r')
for number in open_file:
    number = int(number)
    while x < 20000:
        if (x > number):
            print number
        x = x + 100
        y = y + 100
And I received this error message:
ValueError: invalid literal for int() with base 10: '2100.00\r\n'
How can I strip the ' and the \r\n'?
My ultimate goal is to create another column next to the column of numbers and, if the number is 145 for example,
145, '100-199'
167, '100-199'
1167, '1100-1199'
that sort of output.

Let's put it as an answer. The problem is not \r\n. The problem is that you are trying to parse a string that contains a float value as an integer. See (without the carriage return and newline characters):
>>> int("2100.00")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '2100.00'
(as you can see, the quotation marks ' are not part of the value, they just indicate that you are dealing with a string)
whereas
>>> int("2100\r\n")
2100
The documentation says:
If the argument is a string, it must contain a possibly signed decimal number representable as a Python integer, possibly embedded in whitespace.
where the Python integer literal definition can be found here.
Solution:
Use float:
>>> float("2100.00\r\n")
2100.0
then you can convert it to an integer if you want to (also consider round):
>>> int(float("2100.00\r\n"))
2100
Converting a float value to integer works (from the documentation):
Conversion of floating point numbers to integers truncates (towards zero).
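
To connect this back to the question's goal of labelling each number with its hundred-range, here is a minimal sketch. The helper name bucket_label and the exact label format are assumptions based on the sample output in the question, and the sketch assumes non-negative values.

```python
def bucket_label(raw):
    """Return a label such as '100-199' for one raw line of the text file."""
    # float() tolerates surrounding whitespace (including "\r\n") and a
    # decimal part; int() then truncates towards zero.
    value = int(float(raw))
    low = (value // 100) * 100
    return "%d-%d" % (low, low + 99)


for raw in ["145\r\n", "167\r\n", "1167\r\n", "2100.00\r\n"]:
    print("%d, '%s'" % (int(float(raw)), bucket_label(raw)))
```

If rounding is preferred over truncation, int(round(float(raw))) could replace the conversion line.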

To address your immediate problem, go with the answer by @Felix Kling.
If you are interested in your FUTURE problems, please read on.
(1) That \r is not part of the problem IN THIS PARTICULAR CASE, but is intriguing: are you creating the file on Windows and reading it on Linux/OSX/etc? If so, you should open your text file with "rU" (universal newlines), so that each input line in Python ends with only \n. (In Python 3, text mode translates newlines this way by default, and the "U" flag was removed in 3.11.)
(2) In any case, it's a very good idea to do line = line.rstrip('\n') ... otherwise, depending on how you split up the lines, you may end up with your last field containing an unwanted \n.
(3) You may prefer to use xlrd to read from an Excel file directly -- this saves all sorts of hassles. [Dis]claimer: I'm the author of xlrd.
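
Points (1) and (2) can be tried out with a short sketch, assuming Python 3, where the default text mode already applies universal newlines. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Write a small file with Windows-style line endings (hypothetical sample data).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"2100.00\r\n145\r\n")
    path = tmp.name

try:
    # Default text mode reads "\r\n" as "\n" on every platform.
    with open(path, "r") as handle:
        cleaned = [line.rstrip("\n") for line in handle]
finally:
    os.remove(path)

print(cleaned)
```

After the rstrip, each element holds only the field text, so later splitting cannot leave a stray newline on the last field.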

Try this:
number = int(number.strip(string.whitespace + "'"))
You will need to add import string to the beginning of your script. See also: http://docs.python.org/library/stdtypes.html#str.strip

Related

Python equivalent of Ruby's Array#pack, how to pack unknown string length and bytes together

I am working my way through the book "Building Git", which goes through building Git with Ruby. I decided to write it in Python while still following along in the book.
The author uses Ruby's Array#pack to pack a Git tree object. Git uses a binary representation of the 40-character blob hash to reduce it to 20 bytes. In the author's words:
Putting everything together, this generates a string for each entry consisting of the mode 100644,
a space, the filename, a null byte, and then twenty bytes for the object ID. Ruby’s Array#pack
supports many more data encodings and is very useful for generating binary representations of
values. If you wanted to, you could implement all the maths for reading pairs of digits from
the object ID and turning each pair into a single byte, but Array#pack is so convenient that I
usually reach for that first.
He uses the following code to implement this:
def to_s
  entries = @entries.sort_by(&:name).map do |entry|
    ["#{ MODE } #{ entry.name }", entry.oid].pack(ENTRY_FORMAT)
  end
with ENTRY_FORMAT = "Z*H40" and MODE = "100644".
entry is an instance of a class that has :name and :oid attributes, representing the name of a file and its SHA-1 hash.
And the format "Z*H40" means the following:
Our usage here consists of two separate encoding instructions:
Z*: this encodes the first string, "#{ MODE } #{ entry.name }", as an arbitrary-length null-padded string, that is, it represents the string as-is with a null byte appended to the end
H40: this encodes a string of forty hexadecimal digits, entry.oid, by packing each pair of digits into a single byte as we saw in Section 2.3.3, “Trees on disk”
I have tried for many hours to replicate this in Python using struct.pack and various other methods, but either I am not getting the format correct, or I am just missing something very obvious. In any case, this is what I currently have:
def to_s(self):
    entries = sorted(self.entries, key=lambda x: x.name)
    entries = [f"{self.MODE} {entry.name}" + entry.oid.encode() for entry in entries]
    packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
    return packed_entries
but obviously this will give a concat error from bytes() to str().
Traceback (most recent call last):
File "jit.py", line 67, in <module>
database.store(tree)
File "/home/maslin/jit/pyJit/database.py", line 12, in store
string = obj.to_s()
File "/home/maslin/jit/pyJit/tree.py", line 40, in to_s
entries = [f"{self.MODE} {entry.name}" + entry.oid.encode() for entry in entries]
File "/home/maslin/jit/pyJit/tree.py", line 40, in <listcomp>
entries = [f"{self.MODE} {entry.name}" + entry.oid.encode() for entry in entries]
TypeError: can only concatenate str (not "bytes") to str
So then I tried to keep everything as a string, and tried using struct.pack to format it for me, but it gave me a struct.error: bad char in struct format error.
def to_s(self):
    entries = sorted(self.entries, key=lambda x: x.name)
    entries = [f"{self.MODE} {entry.name}" + entry.oid for entry in entries]
    packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
    return packed_entries
And the traceback:
Traceback (most recent call last):
File "jit.py", line 67, in <module>
database.store(tree)
File "/home/maslin/jit/pyJit/database.py", line 12, in store
string = obj.to_s()
File "/home/maslin/jit/pyJit/tree.py", line 41, in to_s
packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
File "/home/maslin/jit/pyJit/tree.py", line 41, in <genexpr>
packed_entries = b"".join(pack("!Z*40s", entry) for entry in entries)
struct.error: bad char in struct format
How can I pack a string for each entry consisting of the mode 100644,
a space, the filename, a null byte, and then twenty bytes for the object ID?
The author notes above that this can be done by "implementing all the maths for reading pairs of digits from
the object ID and turning each pair into a single byte", so if your solution involves this method, that is also ok.
P.S. this question did not help me nor did this.
P.P.S. ChatGPT was no help either.

So, I had to look this up. The binary format is simple,
the mode as an ascii byte string,
an ascii space
the filename as a byte string,
a null byte
the sha digest in binary format.
So,
mode = b"100644"
Note, mode is a bytes object. You should probably just keep it as a bytes object, but if it is a string, you can just .encode it, and that will work with utf-8 since it will only contain characters in the ascii range.
Now, your filename is probably a string, e.g.:
filename = "foo.py"
Now, you didn't say exactly, but I presume your oid is the sha1 hexdigest, i.e. a length 40 string of the digest in hexadecimal. However, you probably should just work with the raw digest. Assuming you consumed
>>> import hashlib
>>> sha = hashlib.sha1(b"print('hello, world')")
>>> sha.hexdigest()
'da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e'
>>> sha.digest()
b'\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
You want just the .digest() directly. You should probably just keep the hash object around and get whatever you need from there, or you can convert back and forth, so if you have the hexdigest, you can get to the binary form using:
>>> oid = sha.hexdigest()
>>> oid
'da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e'
>>> int(oid, 16).to_bytes(20, "big")
b'\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
But really, if you are only going to keep one form around, I'd keep the binary one. To get back to the hex form, it seems most natural to me to convert to an int and then format that in hex:
>>> oid = sha.digest()
>>> oid
b'\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
>>> int.from_bytes(oid, "big")
1247667085693497210187506196029418989550863244446
>>> f"{int.from_bytes(oid, 'big'):x}"
'da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e'
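
As an alternative to going through int, bytes.fromhex and bytes.hex convert between the two forms directly. The sketch below uses a made-up object ID that starts with a zero byte, to show that formatting via int with a plain :x drops leading zeros unless a width is given.

```python
# Made-up 40-digit object ID with a leading zero byte (not a real hash).
hex_oid = "00ab53bb595a2bd0161f6470a4c3a82f6aa1dc9e"

raw_oid = bytes.fromhex(hex_oid)   # 40 hex digits -> 20 raw bytes
round_trip = raw_oid.hex()         # 20 raw bytes -> 40 hex digits

# Going through int loses leading zeros unless the width is padded to 40.
unpadded = f"{int.from_bytes(raw_oid, 'big'):x}"
padded = f"{int.from_bytes(raw_oid, 'big'):040x}"

print(len(raw_oid), round_trip == hex_oid, len(unpadded), padded == hex_oid)
```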
So, I'm going to assume you have:
>>> import hashlib
>>> mode = b"100644"
>>> filename = "foo.py"
>>> sha = hashlib.sha1(b"print('hello, world')")
>>> oid = sha.digest()
Now, there is no f-string-like interpolation for bytes-literals, but you can use the old-school % based formatting:
>>> entry = b"%s %s\x00%s" % (mode, filename.encode(), oid)
>>> entry
b'100644 foo.py\x00\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
Or since this is so simple, just concatenation:
>>> entry = mode + b" " + filename.encode() + b"\x00" + oid
>>> entry
b'100644 foo.py\x00\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
Now, you could use struct.pack here, but it's a bit unwieldy. There's no good way to add a space except as a single character. Also, you'd have to build the format string dynamically, since there is no format for "arbitrary sized, null terminated bytes string". But you can use an f-string and len(filename.encode()) + 1. So it would need to be something like:
>>> import struct
>>> struct.pack(f">6sc{len(filename.encode())+1}s20s", mode, b" ", filename.encode(), oid)
b'100644 foo.py\x00\xda\x8bS\xbbYZ+\xd0\x16\x1fdp\xa4\xc3\xa8/j\xa1\xdc\x9e'
>>> struct.pack(f">6sc{len(filename.encode())+1}s20s", mode, b" ", filename.encode(), oid) == entry
True
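
Putting the pieces together for a whole tree, a minimal sketch could look like the following. The Entry namedtuple is a hypothetical stand-in for the question's entry class, it assumes oid is stored as a 40-digit hex string, and the second object ID is a placeholder.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's entry class (name + 40-digit hex oid).
Entry = namedtuple("Entry", ["name", "oid"])

MODE = b"100644"


def pack_entry(entry):
    """Build mode, space, filename, null byte, then the 20-byte object ID."""
    return MODE + b" " + entry.name.encode() + b"\x00" + bytes.fromhex(entry.oid)


def pack_tree(entries):
    """Concatenate the packed entries, sorted by name as in the Ruby code."""
    return b"".join(pack_entry(e) for e in sorted(entries, key=lambda e: e.name))


entries = [
    Entry("foo.py", "da8b53bb595a2bd0161f6470a4c3a82f6aa1dc9e"),
    Entry("bar.py", "00" * 20),  # placeholder oid
]
data = pack_tree(entries)
print(data)
```

bytes.fromhex plays the role of Ruby's H40 directive here, and the explicit b"\x00" plays the role of Z*.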

How to find a randomly placed numerical value after a string match

I have a text file as follows:
GROSS WE GHT
MARKS AND NUMBERS:
PCS:
(KILO):
POW- 40162463
PAF. 128993.1
BOM
1 USTER QUANTUM 3
1.10
VIA MUMBAI
AIRPORT/INDIA
CO210044158
Using regex and Python, the output I want to print is "weight= 1.10 Kilos".
import re
with open('file_new1.txt') as fd:
    for line in fd:
        match = re.search(r'KILO', line)
        if match:
            print('found')
I wrote the code above to match KILO in the text file. My question is: how do I match the number 1.10 after I find the string 'KILO'? Please note:
1) 1.10 is a sample weight; it can also have a value of 2322.00 or an integer value.
2) 1.10 always occurs after KILO and on a new line.
3) The string can have the value KILO or KG.

The below loops through the file until it sees a line containing KILO or KG. It tries to convert every line after that one to a float, and returns that float when successful
def get_weight(fp):
    next(line for line in fp if 'KILO' in line or 'KG' in line)
    for line in fp:
        try:
            return float(line)
        except ValueError:
            continue
    raise ValueError("No numeric line after 'KILO'")
with open('file_new1.txt') as fd:
    print(get_weight(fd))
# 1.1
The search for KILO or KG is pretty basic, and prone to false positives. If you know it will always appear with other characteristics (Surrounded by parentheses, for example), you may want to include those in the search.
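
Since the question asks for a regex, here is a minimal sketch along the stricter lines suggested above. It assumes the unit always appears in parentheses, as in the sample, and that the weight is the first line after it consisting only of a number. The names find_weight, UNIT_RE and NUMBER_RE are made up for illustration.

```python
import re

UNIT_RE = re.compile(r"\((?:KILO|KG)\)")   # unit marker in parentheses
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")   # e.g. 1.10, 2322.00 or 7


def find_weight(lines):
    """Return the first all-numeric line after the unit marker, as text."""
    seen_unit = False
    for line in lines:
        text = line.strip()
        if not seen_unit:
            seen_unit = UNIT_RE.search(text) is not None
        elif NUMBER_RE.fullmatch(text):
            return text
    return None


sample = """GROSS WE GHT
MARKS AND NUMBERS:
PCS:
(KILO):
POW- 40162463
PAF. 128993.1
BOM
1 USTER QUANTUM 3
1.10
VIA MUMBAI
"""
weight = find_weight(sample.splitlines())
print("weight= %s Kilos" % weight)
```

Returning the matched text rather than a float keeps the trailing zero of 1.10 in the printed output.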

Placing variable in single quotes

I am getting an integer error when reading from my CSV sheet. It's giving me problems reading the last column. I know there are characters in the last column, but how do I define digit as a character? The API function psspy.two_winding_chng_4 requires an input in single quotes ' ', as shown below (the 3rd argument of that function).
Traceback (most recent call last):
File "C:\Users\RoszkowskiM\Desktop\win4.py", line 133, in <module>
psspy.two_winding_chng_4(from_,to,'%s'%digit,[_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i],[_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f, max_value, min_value,_f,_f,_f],[])
File ".\psspy.py", line 25578, in two_winding_chng_4
TypeError: an integer is required
ValueError: invalid literal for int() with base 10: 'T1'
The code:
for row in data:
    data_location, year_link, from_, to, min_value,max_value,name2,tla_2,digit = row[5:14]
    output = 'From Bus #: {}\tTo Bus #: {}\tVMAX: {} pu\tVMIN: {} pu\t'
    if year_link == year and data_location == location and tla_2==location:
        from_=int(from_)
        to=int(to)
        min_value=float(min_value)
        max_value=float(max_value)
        digit=int(digit)
        print(output.format(from_, to, max_value, min_value))
        _i=psspy.getdefaultint()
        _f=psspy.getdefaultreal()
        psspy.two_winding_chng_4(from_,to,'%s'%digit,[_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i,_i],[_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f,_f, max_value, min_value,_f,_f,_f],[])

The easiest and probably most usable option would be to use your own function to keep only the digits. Example:
def return_digits(string):
    return int(''.join([x for x in string if x.isdigit()]))
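
One caveat with this approach is that a value with no digits at all produces an empty string, and int('') raises ValueError. A slightly more defensive sketch is shown below; the name digits_or_default is made up, and whether the psspy call ultimately needs the identifier as a string or as a number is not something this page settles.

```python
def digits_or_default(text, default=None):
    """Return the decimal digits in text as an int, or default if there are none."""
    # isdecimal() is used so that every kept character is one int() accepts.
    digits = "".join(ch for ch in text if ch.isdecimal())
    return int(digits) if digits else default


print(digits_or_default("T1"))   # the 'T1' value from the traceback
print(digits_or_default("TX"))   # no digits: falls back to the default
```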

How can I convert hexadecimal file data into ASCII?

I am writing a program with a Python GUI. When the program runs, it asks the user to open a file (TASK.txt, which contains hexadecimal values) in read mode.
I am storing the data of one line in one variable.
How can I convert that data into ASCII values? I am new to Python. This is my code:
import binascii
import base64
from tkinter import *
from tkinter.filedialog import askopenfilename
def callback():
    with open(askopenfilename(),'r') as r:
        next(r)
        for x in r:
            z = str(x[1:-2])
            if len(z) % 2:
                z = '0' + 'x' + z
            print(binascii.unhexlify(z))
a = Button(text='select file', command=callback)
a.pack()
mainloop()
This is the error I am getting:
Exception in Tkinter callback
Traceback (most recent call last):
File "D:\python sw\lib\tkinter\__init__.py", line 1699, in __call__
return self.func(*args)
File "C:\Users\LENOVO\Downloads\hex2.py", line 16, in callback
print(binascii.unhexlify(z))
binascii.Error: Non-hexadecimal digit found

Just reread your question correctly, new answer:
Do not prefix with 0x since it does not work with unhexlify and won't even make the string-length even.
You need an even string length, since each pair of hex-digits represent one byte (being one character)
unhexlify returns a byte array, which can be decoded to a string using .decode()
As pointed out here you don't even need the import binascii and can convert hex-to-string with bytearray.fromhex("7061756c").decode()
list(map(lambda hx: bytearray.fromhex(hx).decode(),"H7061756c H7061756c61".replace("H","").split(" ")))
Returns ['paul', 'paula']
What I wrote before I thoroughly read your question
may still be of use
As PM 2Ring noted, unhexlify only works without prefixes like 0x.
Your hex-strings are separated by spaces and are prefixed with H, which must be removed. You already did this, but I think this can be done in a nicer way:
import binascii
r = "H247314748F8 HA010001FD" # one line in your file
z_array = r.replace("H","").split(" ")
# this returns ['247314748F8','A010001FD']
# now we can apply unhexlify to all those strings:
unhexed = map(binascii.unhexlify, z_array)
# and print it.
print(list(unhexed))
This will raise an Error: Odd-length string. Make sure you really want to unhexlify your data. As stated in the docs, you'll need an even number of hexadecimal characters, each pair representing a byte.
If you want to convert the hexadecimal numbers to decimal integers instead, try this one:
list(map(lambda hx: int(hx,16),"H247314748F8 HA010001FD".replace("H","").split(" ")))
int(string, base) will convert from one number system (hexadecimal has base 16) to decimal (with base 10).
** Off topic **
if len(z) % 2:
    z = '0' + 'x' + z
Will lead to z still being of uneven length, since you added an even amount of characters.
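
The points above can be combined into one small helper. This is a sketch, assuming each line holds space-separated tokens prefixed with H and that the bytes are meant to be ASCII text. The name decode_hex_tokens is made up, and rejecting odd-length tokens (rather than padding them) is a design choice, not something the question specifies.

```python
def decode_hex_tokens(line, prefix="H"):
    """Decode space-separated, prefix-marked hex tokens into text."""
    decoded = []
    for token in line.split():
        hex_digits = token[len(prefix):] if token.startswith(prefix) else token
        if len(hex_digits) % 2:
            # Each byte needs two hex digits, so an odd count is ambiguous.
            raise ValueError("odd number of hex digits in %r" % token)
        decoded.append(bytes.fromhex(hex_digits).decode("ascii"))
    return decoded


print(decode_hex_tokens("H7061756c H7061756c61"))
```

Bytes outside the ASCII range raise UnicodeDecodeError, which is a subclass of ValueError, so a caller can handle both failure modes with a single except clause.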

Why am I getting an IndexError in Python 3 when indexing a string and not slicing?

I'm new to programming, and experimenting with Python 3. I've found a few topics which deal with IndexError but none that seem to help with this specific circumstance.
I've written a function which opens a text file, reads it one line at a time, and slices the line up into individual strings which are each appended to a particular list (one list per 'column' in the record line). Most of the slices are multiple characters [x:y] but some are single characters [x].
I'm getting an IndexError: string index out of range message, when as far as I can tell, it isn't. This is the function:
def read_recipe_file():
    recipe_id = []
    recipe_book = []
    recipe_name = []
    recipe_page = []
    ingred_1 = []
    ingred_1_qty = []
    ingred_2 = []
    ingred_2_qty = []
    ingred_3 = []
    ingred_3_qty = []
    f = open('recipe-file.txt', 'r') # open the file
    for line in f:
        # slice out each component of the record line and store it in the appropriate list
        recipe_id.append(line[0:3])
        recipe_name.append(line[3:23])
        recipe_book.append(line[23:43])
        recipe_page.append(line[43:46])
        ingred_1.append(line[46])
        ingred_1_qty.append(line[47:50])
        ingred_2.append(line[50])
        ingred_2_qty.append(line[51:54])
        ingred_3.append(line[54])
        ingred_3_qty.append(line[55:])
    f.close()
    return recipe_id, recipe_name, recipe_book, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, ingred_3, \
        ingred_3_qty
This is the traceback:
Traceback (most recent call last):
File "recipe-test.py", line 84, in <module>
recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, ingred_3, ingred_3_qty = read_recipe_file()
File "recipe-test.py", line 27, in read_recipe_file
ingred_1.append(line[46])
The code which calls the function in question is:
print('To show list of recipes: 1')
print('To add a recipe: 2')
user_choice = input()
recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty, ingred_2, ingred_2_qty, \
    ingred_3, ingred_3_qty = read_recipe_file()
if int(user_choice) == 1:
    print_recipe_table(recipe_id, recipe_book, recipe_name, recipe_page, ingred_1, ingred_1_qty,
                       ingred_2, ingred_2_qty, ingred_3, ingred_3_qty)
elif int(user_choice) == 2:
    #code to add recipe
The failing line is this:
ingred_1.append(line[46])
There are more than 46 characters in each line of the text file I am trying to read, so I don't understand why I'm getting an out of bounds error (a sample line is below). If I change the code to this:
ingred_1.append(line[46:])
to read a slice, rather than a specific character, the line executes correctly, and the program fails on this line instead:
ingred_2.append(line[50])
This leads me to think it is somehow related to appending a single character from the string, rather than a slice of multiple characters.
Here is a sample line from the text file I am reading:
001Cheese on Toast Meals For Two 012120038005002
I should probably add that I'm well aware this isn't great code overall - there are lots of ways I could generally improve the program, but as far as I can tell the code should actually work.

This will happen if some of the lines in the file are empty or at least short. A stray newline at the end of the file is a common cause, since that comes up as an extra blank line. The best way to debug a case like this is to catch the exception, and investigate the particular line that fails (which almost certainly won't be the sample line you reproduced):
try:
    ingred_1.append(line[46])
except IndexError:
    print(line)
    print(len(line))
Catching this exception is also usually the right way to deal with the error: you've detected a pathological case, and now you can consider what to do. You might for example:
continue, which will silently skip processing that line,
Log something and then continue
Bail out by raising a new, more topical exception: eg raise ValueError("Line too short").
Printing something relevant, with or without continuing, is almost always a good idea if this represents a problem with the input file that warrants fixing. Continuing silently is a good option if it is something relatively trivial, that you know can't cause flow-on errors in the rest of your processing. You may want to differentiate between the "too short" and "completely empty" cases by detecting the "completely empty" case early such as by doing this at the top of your loop:
if not line.strip():
    # Skip blank lines (a blank line read from a file is '\n', not '')
    continue
And handling the error for the other case appropriately.
The reason changing it to a slice works is because string slices never fail. If both indexes in the slice are outside the string (in the same direction), you will get an empty string - eg:
>>> 'abc'[4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> 'abc'[4:]
''
>>> 'abc'[4:7]
''

Your code fails on line[46] because line contains fewer than 47 characters. The slice operation line[46:] still works because an out-of-range string slice returns an empty string.
You can verify that the line is too short by replacing
ingred_1.append(line[46])
with
try:
    ingred_1.append(line[46])
except IndexError:
    print('line = "%s", length = %d' % (line, len(line)))
