I am not sure what I am doing wrong here. I am trying to open a file, trace1.flow, read the header information, and then store the source and destination IPs in dictionaries. This is Python running on a Fedora VM. I am getting the following error:
(secs, nsecs, booted, exporter, mySourceIP, myDestinationIP) = struct.unpack('IIIIII',myBuf)
struct.error: unpack requires a string argument of length 24
Here is my code:
import struct
import socket
#Dictionaries
uniqSource = {}
uniqDestination = {}
def int2quad(i):
    z = struct.pack('!I', i)
    return socket.inet_ntoa(z)
myFile = open('trace1.flow')
myBuf = myFile.read(8)
(magic, endian, version, headerLen) = struct.unpack('HBBI', myBuf)
print "Magic: ", hex(magic), "Endian: ", endian, "Version: ", version, "Header Length: ", headerLen
myFile.read(headerLen - 8)
try:
    while(True):
        myBuf = myFile.read(24)
        (secs, nsecs, booted, exporter, mySourceIP, myDestinationIP) = struct.unpack('IIIIII', myBuf)
        mySourceIP = int2quad(mySourceIP)
        myDestinationIP = int2quad(myDestinationIP)
        if mySourceIP not in uniqSource:
            uniqSource[mySourceIP] = 1
        else:
            uniqSource[mySourceIP] += 1
        if myDestinationIP not in uniqDestination:
            uniqDestination[myDestinationIP] = 1
        else:
            uniqDestination[myDestinationIP] += 1
        myFile.read(40)
except EOFError:
    print "END OF FILE"
You seem to assume that file.read will raise EOFError at end of file, but that error is only raised by input() and raw_input(). file.read will simply return a string that's shorter than requested (possibly empty).
So you need to check the length after reading:
myBuf = myFile.read(24)
if len(myBuf) < 24:
    break
Perhaps you have reached end-of-file. Check the length of myBuf:
len(myBuf)
It's probably less than 24 chars long. Also, you don't need those extra parentheses, and you can collapse repeated format codes with a count, 'nI', like this:
secs, nsecs, booted, exporter, mySourceIP, myDestinationIP = struct.unpack('6I',myBuf)
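Putting the two fixes together, the record-reading loop might look like this (a sketch based on the question's Python 2 code; the 24-byte record prefix and the 40-byte skip are taken from the question, not verified against the flow file format):
while True:
    myBuf = myFile.read(24)
    if len(myBuf) < 24:  # short read: end of file (or a truncated record)
        print "END OF FILE"
        break
    secs, nsecs, booted, exporter, mySourceIP, myDestinationIP = struct.unpack('6I', myBuf)
    mySourceIP = int2quad(mySourceIP)
    myDestinationIP = int2quad(myDestinationIP)
    # dict.get with a default replaces the if/else counting
    uniqSource[mySourceIP] = uniqSource.get(mySourceIP, 0) + 1
    uniqDestination[myDestinationIP] = uniqDestination.get(myDestinationIP, 0) + 1
    myFile.read(40)  # skip the rest of the record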
I recently had to write a coding challenge for a company: merge three CSV files into one based on the first attribute of each (the attribute values repeat across all the files).
I wrote the code and sent it to them, but they said it took 2 minutes to run. That was funny, because it ran in 10 seconds on my machine. My machine had the same processor, 16 GB of RAM, and an SSD as well. Very similar environments.
I tried optimising it and resubmitted. This time they ran it on an Ubuntu machine and got 11 seconds, while on Windows 10 the code still ran for 100 seconds.
Another peculiar thing was that when I tried profiling it with the profile module, it went on forever; I had to terminate it after 450 seconds. I moved to cProfile and it completed in 7 seconds.
EDIT: The exact formulation of the problem is
Write a console program to merge the files provided in a timely and
efficient manner. File paths should be supplied as arguments so that
the program can be evaluated on different data sets. The merged file
should be saved as CSV; use the id column as the unique key for
merging; the program should do any necessary data cleaning and error
checking.
Feel free to use any language you’re comfortable with – only
restriction is no external libraries as this defeats the purpose of
the test. If the language provides CSV parsing libraries (like
Python), please avoid using them as well as this is a part of the
test.
Without further ado, here's the code:
#!/usr/bin/python3
import sys
from multiprocessing import Pool
HEADERS = ['id']
def csv_tuple_quotes_valid(a_tuple):
    """
    checks if the quotes in each attribute of an entry (i.e. a tuple) agree with the csv format
    returns True or False
    """
    for attribute in a_tuple:
        in_quotes = False
        attr_len = len(attribute)
        skip_next = False
        for i in range(0, attr_len):
            if not skip_next and attribute[i] == '\"':
                if i < attr_len - 1 and attribute[i + 1] == '\"':
                    skip_next = True
                    continue
                elif i == 0 or i == attr_len - 1:
                    in_quotes = not in_quotes
                else:
                    return False
            else:
                skip_next = False
        if in_quotes:
            return False
    return True
def check_and_parse_potential_tuple(to_parse):
    """
    receives a string and returns an array of the attributes of the csv line
    if the string was not a valid csv line, then returns False
    """
    a_tuple = []
    attribute_start_index = 0
    to_parse_len = len(to_parse)
    in_quotes = False
    i = 0
    #iterate through the string (line from the csv)
    while i < to_parse_len:
        current_char = to_parse[i]
        #this works the following way: if we meet a quote ("), it must be in one
        #of five cases: "" | ", | ," | "\0 | (start_of_string)"
        #in case we are inside a quoted attribute (i.e. "123"), then commas are ignored
        #the following code also extracts the tuples' attributes
        if current_char == '\"':
            if i == 0 or (to_parse[i - 1] == ',' and not in_quotes): # (start_of_string)" and ," case
                #not including the quote in the next attr
                attribute_start_index = i + 1
                #starting a quoted attr
                in_quotes = True
            elif i + 1 < to_parse_len:
                if to_parse[i + 1] == '\"': # "" case
                    i += 1 #skip the next " because it is part of a ""
                elif to_parse[i + 1] == ',' and in_quotes: # ", case
                    a_tuple.append(to_parse[attribute_start_index:i].strip())
                    #not including the quote and comma in the next attr
                    attribute_start_index = i + 2
                    in_quotes = False #the quoted attr has ended
                    #skip the next comma - we know what it is for
                    i += 1
                else:
                    #since we cannot have a random " in the middle of an attr
                    return False
            elif i == to_parse_len - 1: # "\0 case
                a_tuple.append(to_parse[attribute_start_index:i].strip())
                #reached end of line, so no more attr's to extract
                attribute_start_index = to_parse_len
                in_quotes = False
            else:
                return False
        elif current_char == ',':
            if not in_quotes:
                a_tuple.append(to_parse[attribute_start_index:i].strip())
                attribute_start_index = i + 1
        i += 1
    #in case the last attr was left empty or unquoted
    if attribute_start_index < to_parse_len or (not in_quotes and to_parse[-1] == ','):
        a_tuple.append(to_parse[attribute_start_index:])
    #line ended while parsing; i.e. a quote was opened but not closed
    if in_quotes:
        return False
    return a_tuple
def parse_tuple(to_parse, no_of_headers):
    """
    parses a string and returns an array with no_of_headers number of headers
    raises an error if the string was not a valid CSV line
    """
    #get rid of the newline at the end of every line
    to_parse = to_parse.strip()
    # return to_parse.split(',') #if we assume the data is in a valid format
    #the following checking of the format of the data increases the execution
    #time by a factor of 2; if the data is known to be valid, uncomment the return three lines above
    #if there are more commas than fields, then we must take into consideration
    #how the quotes parse and then extract the attributes
    if to_parse.count(',') + 1 > no_of_headers:
        result = check_and_parse_potential_tuple(to_parse)
        if result:
            a_tuple = result
        else:
            raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
    else:
        a_tuple = to_parse.split(',')
        if not csv_tuple_quotes_valid(a_tuple):
            raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
    #if the format is correct but more data fields were provided
    #the following works faster than an if statement that checks the length of a_tuple
    try:
        a_tuple[no_of_headers - 1]
    except IndexError:
        raise TypeError('Error while parsing CSV line %s. Unknown reason' % to_parse)
    #this replaces the use of my own hash tables to store the duplicated values for the attributes
    for i in range(1, no_of_headers):
        a_tuple[i] = sys.intern(a_tuple[i])
    return a_tuple
def read_file(path, file_number):
    """
    reads the csv file and returns (dict, int)
    the dict is the mapping of id's to attributes
    the integer is the number of attributes (headers) for the csv file
    """
    global HEADERS
    try:
        file = open(path, 'r')
    except FileNotFoundError as e:
        print('error in %s:\n%s\nexiting...' % (path, e))
        exit(1)
    main_table = {}
    headers = file.readline().strip().split(',')
    no_of_headers = len(headers)
    HEADERS.extend(headers[1:]) #keep the headers from the file
    lines = file.readlines()
    file.close()
    args = []
    for line in lines:
        args.append((line, no_of_headers))
    #pool is a pool of worker processes parsing the lines in parallel
    with Pool() as workers:
        try:
            all_tuples = workers.starmap(parse_tuple, args, 1000)
        except TypeError as e:
            print('Error in file %s:\n%s\nexiting thread...' % (path, e.args))
            exit(1)
    for a_tuple in all_tuples:
        #add quotes to key if needed
        key = a_tuple[0] if a_tuple[0][0] == '\"' else ('\"%s\"' % a_tuple[0])
        main_table[key] = a_tuple[1:]
    return (main_table, no_of_headers)
def merge_files():
    """
    produces a file called merged.csv
    """
    global HEADERS
    no_of_files = len(sys.argv) - 1
    processed_files = [None] * no_of_files
    for i in range(0, no_of_files):
        processed_files[i] = read_file(sys.argv[i + 1], i)
    out_file = open('merged.csv', 'w+')
    merged_str = ','.join(HEADERS)
    all_keys = {}
    #this is to ensure that we include all keys in the final file,
    #even those that are missing from some files and present in others
    for processed_file in processed_files:
        all_keys.update(processed_file[0])
    for key in all_keys:
        merged_str += '\n%s' % key
        for i in range(0, no_of_files):
            (main_table, no_of_headers) = processed_files[i]
            try:
                for attr in main_table[key]:
                    merged_str += ',%s' % attr
            except KeyError:
                print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
                merged_str += ',' * (no_of_headers - 1)
    out_file.write(merged_str)
    out_file.close()
if __name__ == '__main__':
    # merge_files()
    import cProfile
    cProfile.run('merge_files()')
    # import time
    # start = time.time()
    # print(time.time() - start)
Here is the profiler report I got on my Windows machine.
EDIT: The rest of the csv data provided is here. Pastebin was taking too long to process the files, so...
It might not be the best code, and I know that, but my question is: what slows the code down so much on Windows that doesn't slow it down on Ubuntu? The merge_files() function takes the longest, 94 seconds just by itself, not counting the calls to other functions. And there doesn't seem to be anything obvious to me that would explain why it is so slow.
Thanks
EDIT: Note: We both used the same dataset to run the code with.
It turns out that Windows and Linux handle very long strings differently. When I moved the out_file.write(merged_str) inside the outer for loop (for key in all_keys:) and stopped building one giant merged_str, it ran in 11 seconds as expected. I don't have enough knowledge of either OS's memory-management system to predict why it is so different.
But I would say that the second approach (writing row by row) is the more fail-safe method anyway, because it is unreasonable to keep a 30 MB string in memory. It just seems that Linux copes with it, not always trying to keep the string in cache or to rebuild it every time.
Funnily enough, I initially ran it a few times on my Linux machine with both writing strategies, and the one with the large string seemed to go faster, so I stuck with it. I guess you never know.
Here's the modified code
for key in all_keys:
    merged_str = '%s' % key
    for i in range(0, no_of_files):
        (main_table, no_of_headers) = processed_files[i]
        try:
            for attr in main_table[key]:
                merged_str += ',%s' % attr
        except KeyError:
            print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
            merged_str += ',' * (no_of_headers - 1)
    out_file.write(merged_str + '\n')
out_file.close()
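An alternative that avoids the pathological string growth on any OS is to build each row as a list and join it once; repeated str += can trigger reallocation and copying of the whole string, and CPython's in-place concatenation optimization depends on the platform allocator, which is a plausible (though unverified here) explanation for the Windows/Linux gap. A sketch using the same variables as above:
for key in all_keys:
    row = [key]
    for i in range(0, no_of_files):
        (main_table, no_of_headers) = processed_files[i]
        try:
            row.extend(main_table[key])  # KeyError raised here if the id is absent
        except KeyError:
            print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
            row.extend([''] * (no_of_headers - 1))
    out_file.write(','.join(row) + '\n')  # one linear-time join per row
out_file.close()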
When I run your solution on Ubuntu 16.04 with the three given files, it seems to take ~8 seconds to complete. The only modification I made was to uncomment the timing code at the bottom and use it.
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
8.039648056030273
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
7.78482985496521
I rewrote my first attempt without using csv from the standard library and am now getting times of ~4.3 seconds.
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.332579612731934
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.305467367172241
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.27345871925354
This is my solution code (lettuce_merge.py):
from collections import defaultdict


def split_row(csv_row):
    return [col.strip('"') for col in csv_row.rstrip().split(',')]


def merge_csv_files(files):
    file_headers = []
    merged_headers = []
    for i, file in enumerate(files):
        current_header = split_row(next(file))
        unique_key, *current_header = current_header
        if i == 0:
            merged_headers.append(unique_key)
        merged_headers.extend(current_header)
        file_headers.append(current_header)
    result = defaultdict(lambda: [''] * (len(merged_headers) - 1))
    for file_header, file in zip(file_headers, files):
        for line in file:
            key, *values = split_row(line)
            for col_name, col_value in zip(file_header, values):
                result[key][merged_headers.index(col_name) - 1] = col_value
        file.close()
    quotes = '"{}"'.format
    with open('lettuce_merged.csv', 'w') as f:
        f.write(','.join(quotes(a) for a in merged_headers) + '\n')
        for key, values in result.items():
            f.write(','.join(quotes(b) for b in [key] + values) + '\n')


if __name__ == '__main__':
    from argparse import ArgumentParser, FileType
    from time import time

    parser = ArgumentParser()
    parser.add_argument('files', nargs='*', type=FileType('r'))
    args = parser.parse_args()

    start_time = time()
    merge_csv_files(args.files)
    print(time() - start_time)
I'm sure this code could be optimized even further but sometimes just seeing another way to solve a problem can help spark new ideas.
I have a project in an internet security class. My partner started the project and wrote some Python code, and I have to continue from where he stopped. But I don't know Python, and I was planning to learn it by running his code and checking how it works. However, when I execute his code I get the error "IndexError: list index out of range".
import os

# Deauthenticate devices
os.system("python2 ~/Downloads/de_auth.py -s 00:22:b0:07:58:d4 -d & sleep 30; kill $!")
# renew DHCP on linux "sudo dhclient -v -r & sudo dhclient -v"
# Capture DHCP Packet
os.system("tcpdump -lenx -s 1500 port bootps or port bootpc -v > dhcp.txt & sleep 20; kill $!")

# read packet txt file
DHCP_Packet = open("dhcp.txt", "r")

# Get info from txt file of saved packet
line1 = DHCP_Packet.readline()
line1 = line1.split()
sourceMAC = line1[1]
destMAC = line1[3]
TTL = line1[12]
length = line1[8]

# Parse packet
line = DHCP_Packet.readline()
while "0x0100" not in line:
    line = DHCP_Packet.readline()

packet = line + DHCP_Packet.read()
packet = packet.replace("0x0100:", "")
packet = packet.replace("0x0110:", "")
packet = packet.replace("0x0120:", "")
packet = packet.replace("0x0130:", "")
packet = packet.replace("0x0140:", "")
packet = packet.replace("0x0150:", "")
packet = packet.replace("\n", "")
packet = packet.replace(" ", "")
packet = packet.replace(" ", "")
packet = packet.replace("000000000000000063825363", "")

# Locate option (55) = 0x0037
option = "0"
i = 0
length = 0
while option != "37":
    option = packet[i:i+2]
    hex_length = packet[i+2:i+4]
    length = int(packet[i+2:i+4], 16)
    i = i + length*2 + 4

i = i - int(hex_length, 16)*2

print "Option (55): " + packet[i:i+length*2] + "\nLength: " + str(length) + " Bytes"
print "Source MAC: " + sourceMAC
Thank you a lot
The index error probably means you are accessing a list index that doesn't exist. It's most likely in the loop at the bottom:
while option != "37":
option = packet[i:i+2]
hex_length = packet[i+2:i+4]
length = int(packet[i+2:i+4], 16)
i = i+ length*2 + 4
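If it is this loop, a defensive version fails loudly instead of scanning past the data (a sketch reusing the question's variables; the bounds check is my addition, not part of the original script):
while option != "37":
    if i + 4 > len(packet):  # ran out of hex digits without finding option 55
        raise SystemExit("DHCP option 55 not found in captured packet")
    option = packet[i:i+2]
    hex_length = packet[i+2:i+4]
    length = int(hex_length, 16)
    i = i + length*2 + 4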
Alternatively, it could be earlier in reading your text file:
# Get info from txt file of saved packet
line1 = DHCP_Packet.readline()
line1 = line1.split()
sourceMAC = line1[1]
destMAC = line1[3]
TTL = line1[12]
length = line1[8]
Try actually opening the text file and make sure all the lines are being referenced correctly.
If you're new to coding and not yet used to reading error messages or using a debugger, one way to find the problem area is to insert print('okay') between lines of code, moving it down progressively until the line no longer prints.
I'm pretty new to Python as well, but I find it easier to learn by writing your own code and googling what you want to achieve (especially when a partner leaves you code like that...). This site provides documentation on built-in commands (choose your version at the top): https://docs.python.org/3.4/contents.html,
and this site contains more in-depth tutorials for common functions: http://www.tutorialspoint.com/python/index.htm
I think the variable line1 that is being split does not contain as many as 13 fields, so you get an error when executing the statement TTL = line1[12].
Maybe you do not have the same environment your partner worked with, so the result of the os.system() calls (the file dhcp.txt) may be empty or badly formatted.
You should check the contents of dhcp.txt, or add print line1 after line1 = DHCP_Packet.readline() to check that it has the expected format.
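For example, a quick sanity check right after the split (a Python 2 sketch matching the question's code) makes this failure obvious:
line1 = DHCP_Packet.readline()
print line1  # inspect what tcpdump actually wrote
line1 = line1.split()
if len(line1) < 13:
    raise SystemExit("unexpected tcpdump output - is dhcp.txt empty or truncated?")
sourceMAC = line1[1]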
I am receiving a packet from a client, consisting of many fields. I read all the fields successfully, but when it comes to the last field, tag_end, Python gives me an error:
unpack_from requires a buffer of at least 4 bytes not found.
this is the code:
def set_bin(self, buf):
    """Reads a vector of bytes (probably received from network or
    read from file) and tries to construct the packet structure
    from it, by reading each packet member from the buffer. This
    is somehow like deserializing the packet.
    """
    assert isinstance(buf, bytearray), 'buffer type is not valid'
    offset = 0
    print("$$$$$$$$$$$$$$$$ set bin $$$$$$$$$$$$$$$$$")
    try:
        (self._tag_start, self._version, self._checksum, self._connection_id,
         self._packet_seq) = Packet.PACKER_1.unpack_from(str(buf), offset)
    except struct.error as e:
        print(e)
        raise DeserializeError(e)
    except ValueError as e:
        print(e)
        raise DeserializeError(e)
    #I=4 H=2 B=1
    offset = Packet.OFFSET_GUID #14 correct
    self._guid = buf[offset:offset+Packet.UUID_SIZE] #14-16 correct
    offset = Packet.OFFSET_GUID + Packet.UUID_SIZE
    print("$$$$$$$$$$$$$$$$ GUID read successfully $$$$$$$$$$$$$$$$$")
    try:
        (self._timestamp_sec, self._timestamp_microsec, self._command,
         self._command_seq, self._subcommand, self._data_seq,
         self._data_length) = Packet.PACKER_3.unpack_from(str(buf), offset)
    except struct.error as e:
        print(e)
        raise DeserializeError(e)
    except ValueError as e:
        print(e)
        raise DeserializeError(e)
    print("$$$$$$$$$$$$$$$$ timestamps read successfully $$$$$$$$$$$$$$$$$")
    offset = Packet.OFFSET_AUTHENTICATE
    self._username = buf[offset:offset + self.USERNAME_SIZE] #Saman
    offset += self.USERNAME_SIZE
    print("$$$$$$$$$$$$$$$$ username read successfully $$$$$$$$$$$$$$$$$")
    self._password = buf[offset:offset+self.USERNAME_SIZE]
    offset += self.PASSWORD_SIZE
    print("$$$$$$$$$$$$$$$$ password read successfully $$$$$$$$$$$$$$$$$")
    self._data = buf[offset:offset+self._data_length]
    offset = offset + self._data_length
    print("$$$$$$$$$$$$$$$$ data read successfully $$$$$$$$$$$$$$$$$")
    try:
        (self._tag_end,) = Packet.PACKER_4.unpack_from(str(buf), offset)
    except struct.error as e:
        print(e)
        raise DeserializeError(e)
    except ValueError as e:
        print(e)
        raise DeserializeError(e)
    print("$$$$$$$$$$$$$$$$ tag end read successfully $$$$$$$$$$$$$$$$$")
    if len(buf) != Packet.PACKER.size + self._data_length:
        print('failed to deserialize binary data correctly and construct the packet due to extra data')
    else:
        print('############### Deserialized Successfully')
and here are some constants used in the code:
STRUCT_FORMAT_STR = r'=IHIHH 16B IIHHHHH I 6c 9c' #Saman
STRUCT_FORMAT_STR_1 = r'=IHIHH'
STRUCT_FORMAT_STR_2 = r'=16B'
STRUCT_FORMAT_STR_3 = r'=IIHHHHH'
STRUCT_FORMAT_STR_4 = r'=I'
STRUCT_FORMAT_STR_5 = r'=6c'
STRUCT_FORMAT_STR_6 = r'=9c'
UUID_SIZE = 16
OFFSET_GUID = 14
#OFFSET_DATA = 48 #shifting offset data by 15 char
OFFSET_AUTHENTICATE = 48
PACKER = struct.Struct(str(STRUCT_FORMAT_STR)) #Saman
PACKER_1 = struct.Struct(str(STRUCT_FORMAT_STR_1))
PACKER_2 = struct.Struct(str(STRUCT_FORMAT_STR_2))
PACKER_3 = struct.Struct(str(STRUCT_FORMAT_STR_3))
PACKER_4 = struct.Struct(str(STRUCT_FORMAT_STR_4))
PACKER_5 = struct.Struct(str(STRUCT_FORMAT_STR_5))
PACKER_6 = struct.Struct(str(STRUCT_FORMAT_STR_6))
BYTES_TAG_START = PACKER_4.pack(TAG_START)
BYTES_TAG_END = PACKER_4.pack(TAG_END)
and here is the initialization of the packet object, where the fields are set:
def init(self, **kwargs):
    if 'buf' in kwargs:
        self.set_bin(kwargs['buf'])
    else:
        assert kwargs['command'] in Packet.RTCINET_COMMANDS.values() and kwargs['subcommand'] in Packet.RTCINET_COMMANDS.values(), 'Undefined protocol command'
        assert isinstance(kwargs['data'], bytearray), 'invalid type for data field'
        for field in ('command', 'subcommand', 'data'):
            setattr(self, '_' + field, kwargs[field])
        self._tag_start = Packet.TAG_START
        self._version = Packet.VERSION_CURRENT % (Packet.USHRT_MAX + 1)
        self._checksum = Packet.CRC_INIT
        self._connection_id = kwargs.get('connection_id', 0) % (Packet.USHRT_MAX + 1)
        self._packet_seq = Packet.PACKET_SEQ
        Packet.PACKET_SEQ = (Packet.PACKET_SEQ + 1) % (Packet.USHRT_MAX + 1)
        self._guid = uuid.uuid4().bytes
        dt = datetime.datetime.now()
        self._timestamp_sec = int(time.mktime(dt.timetuple()))
        self._timestamp_microsec = dt.microsecond
        # self._command = kwargs['command']
        self._command_seq = kwargs.get('command_seq', 0)
        # self._subcommand = kwargs['subcommand']
        self._data_seq = kwargs.get('data_seq', 0)
        self._data_length = len(kwargs['data'])
        self._username = Packet.USERNAME #Saman
        self._password = Packet.PASSWORD
I have made sure that I read all the fields in the order the client program wrote them into the packet, but I still couldn't manage to solve this problem.
Do you have any idea how this could be solved?
The problem seems to be that you're converting things to str all over the place for no good reason.
In some places, like PACKER_1 = struct.Struct(str(STRUCT_FORMAT_STR_1)), it makes your code less readable and understandable, but doesn't affect the actual output. For example, STRUCT_FORMAT_STR_1 is already a str, so str(STRUCT_FORMAT_STR_1) is the same str.
But in other places, it's far worse than that. In particular, look at all the lines like Packet.PACKER_1.unpack_from(str(buf), offset). There, buf is a bytearray. (It has to be, because you assert it.) Calling str on a bytearray gives you the string representation of that bytearray. For example:
>>> b = bytearray(b'abc')
>>> len(b)
3
>>> s = str(b)
>>> s
"bytearray(b'abc')"
>>> len(s)
17
That string representation is obviously not generally going to have the same length as the actual buffer you're representing. So it's no wonder that you get errors about the length being wrong. (And if you got really unlucky and didn't have any such errors, you'd be reading garbage values instead.)
So, what should you do to convert the bytearray into something the struct module can handle? Nothing! As the docs say:
Several struct functions (and methods of Struct) take a buffer argument. This refers to objects that implement the Buffer Protocol and provide either a readable or read-writable buffer. The most common types used for that purpose are bytes and bytearray…
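So the fix is simply to drop the str() calls and pass the bytearray straight through, for example (applying it to the first unpack in the question; the other unpack_from calls change the same way):
(self._tag_start, self._version, self._checksum, self._connection_id,
 self._packet_seq) = Packet.PACKER_1.unpack_from(buf, offset)
unpack_from accepts any object supporting the buffer protocol in both Python 2.7 and 3.x, so the bytearray works as-is, with its real length.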
Sorry - my question is: how can I change a file object used inside one function from a different function?
I've been trying to work out this error in my first Python script for too long now; Dr Google and the forums aren't helping me much, but I'm hoping you can.
I have a looping function that generates a lot of data, and I would like to output it to a text file, and create a new text file after the third loop.
I have 2 functions defined, one to create the data hashes, the other to create the new files.
The new files are being created as expected (aaa.txt, baa.txt...etc) but the "hashit" function only ever writes to the first file (aaa.txt) even though the others are being created.
I have tried fo.close() and fo.flush(), as well as referencing fo inside the functions, but I can't seem to make it work. I've also tried moving the fo.write from the function into the main body.
I have included a cut down version of the code that I've been using to troubleshoot this issue, the real one has several more loops increasing the string length.
Thanks in advance
import smbpasswd, hashlib

base = '''abcdefghijklmnopqrstuvwxyz '''
# base length 95
print(base)
baselen = len(base)

name = 'aaa.txt'
fo = open(name, "w")
print "Name of the file: ", fo.name
print "Closed or not : ", fo.closed
print "Opening mode : ", fo.mode
print "Softspace flag : ", fo.softspace

pw01 = 0
pw02 = 0
pw03 = 0

def hashit(passwd):
    #2
    # Need to install module
    # sudo apt-get install python-smbpasswd
    hex_dig_lm = smbpasswd.lmhash(passwd)
    hex_dig_ntlm = smbpasswd.nthash(passwd)
    #print '%s:%s' % smbpasswd.hash(passwd)
    hash_md5 = hashlib.md5(passwd)
    hex_dig_md5 = hash_md5.hexdigest()
    print(passwd)
    print(hex_dig_lm)
    print(hex_dig_ntlm)
    print(hex_dig_md5)
    hashstring = passwd + "," + hex_dig_lm + "," + hex_dig_md5 + '\n'
    fo.write(hashstring)

def newfile(name):
    fo.flush()
    fo = open(name, "a")
    print("-------newfile------")
    print "Name of the file: ", fo.name
    print "Closed or not : ", fo.closed
    print('NewFile : ' + name)

raw_input("\n\nPress the enter key to exit.")

# add 3rd digit
while (pw03 < baselen):
    pwc03 = base[pw03]
    name = pwc03 + 'aa.txt'
    fo.close
    newfile(name)
    pw03 += 1
    while (pw02 < baselen):
        pwc02 = base[pw02]
        pw02 += 1
        while (pw01 < baselen):
            pwc01 = base[pw01]
            pw01 += 1
            passwd = pwc03 + pwc02 + pwc01
            hashit(passwd)
        else:
            pw01 = 0
    else:
        pw02 = 0
else:
    pw03 = 0
In your newfile() function, add this line first:
global fo
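Without it, the fo = open(name, "a") inside newfile() creates a new local variable, and the module-level fo that hashit() writes to is never rebound, which is exactly why everything keeps going to aaa.txt. A minimal corrected version (a sketch; it also calls close() properly, whereas the bare fo.close in the main loop does nothing):
def newfile(name):
    global fo           # rebind the module-level file object
    fo.close()          # actually close the previous file
    fo = open(name, "a")
    print('NewFile : ' + name)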
I'm trying to use the Python code from http://grantcox.com.au/2012/01/decoding-b4u-binary-file-format/ to export .b4u files to HTML format, but at this point in the program:
# find the initial caret position - this changes between files for some reason - search for the "Cards" string
for i in range(3):
    addr = 104 + i*4
    if ''.join(self.parser.read('sssss', addr)) == 'Cards':
        caret = addr + 32
        break

if caret is None:
    return
I get the following error:
if ''.join(self.parser.read('sssss', addr)) == 'Cards':
TypeError: sequence item 0: expected str instance, bytes found
The Python version I'm using is: Python 3.3.1 (v3.3.1:d9893d13c628, Apr 6 2013, 20:25:12).
Any idea how to solve that problem?
I have got it working under Python 2.7.4. My Python 3.3.2 gives me the same error; I'll get back to you if I find out how to port this piece of code to Python 3.x. It must have something to do with Unicode being the default for strings in Python 3.
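The error itself is easy to reproduce: in Python 3, struct's 's' format yields bytes, and ''.join() (a str method) refuses to accept bytes items. A minimal sketch of the difference and the usual fix:
import struct

parts = struct.unpack_from('<sssss', b'Cards at 104', 0)  # (b'C', b'a', b'r', b'd', b's')
# ''.join(parts)                  # TypeError: sequence item 0: expected str instance, bytes found
b''.join(parts)                   # b'Cards'
b''.join(parts).decode('ascii')   # 'Cards'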
Here is a solution I came up with:
def read(self, fmt, offset):
    if self.filedata is None:
        return None
    read = struct.unpack_from('<' + fmt, self.filedata, offset)
    xread = []
    for each in range(0, len(read)):
        try:
            xread.append(read[each].decode())
        except:
            xread.append(read[each])
    read = xread
    if len(read) == 1:
        return read[0]
    return read

def string(self, offset):
    if self.filedata is None:
        return None
    s = u''
    if offset > 0:
        length = self.read('H', offset)
        for i in range(length):
            raw = self.read('H', offset + i*2 + 2)
            char = raw ^ 0x7E
            s = s + chr(char)
    return s

def plain_fixed_string(self, offset):
    if self.filedata is None:
        return None
    plain_bytes = struct.unpack_from('<ssssssssssssssssssssssss', self.filedata, offset)
    xplain_bytes = []
    for each in range(0, len(plain_bytes)):
        try:
            xplain_bytes.append(plain_bytes[each].decode())
        except:
            xplain_bytes.append(plain_bytes[each])
    plain_bytes = xplain_bytes
    plain_string = ''.join(plain_bytes).strip('\0x0')
    return plain_string
You can just use these methods instead of the ones provided by the original author.
Beware that you should also change unicode() to str() and unichr() to chr() wherever you see them, and remember that in Python 3 print is a function and cannot be used without parentheses.