Python: grouping items already in a list and reversing them

I have a binary file like this:
00 01 02 04 03 03 03 03 00 05 06 03 03 03 03 03 00 07 03 03 03 03 03 03 ...
and I would like to make groups of 8 items each:
[00 01 02 04 03 03 03 03] [00 05 06 03 03 03 03 03] [00 07 03 03 03 03 03 03]...
and then reverse the items inside each group like this:
[03 03 03 03 04 02 01 00] [03 03 03 03 03 06 05 00] [03 03 03 03 03 03 07 00]
I tried reverse(), but it reverses the whole list.
I imagined something like this: in a loop I would count to 8 (or 7), make a group, reverse it, then move on, count 8 more, reverse, and so on, but I am not able to code that.
I have tried
i = 0
for item in list_reverse:
    i += 1
    if i > 8:
        list_reverse.reverse()
        i = 0
but it doesn't work.
Maybe I should try a nested loop?

.split() the string, then loop through the lines.
t = """00 01 02 04 03 03 03 03
00 05 06 03 03 03 03 03
00 07 03 03 03 03 03 03"""
out = ""
lines = t.split("\n")
for n, line in enumerate(lines):
lst = line.split(" ")
c = 0
reversed_lst = ""
while c < len(lst):
reversed_lst += (lst[len(lst)- c -1]) + " "; c+=1
if n != len(lines) - 1:
out += reversed_lst + "\n"
else:
out += reversed_lst
print(out)
Output:
03 03 03 03 04 02 01 00
03 03 03 03 03 06 05 00
03 03 03 03 03 03 07 00
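A shorter variant of the same idea, assuming the data is one space-separated string rather than pre-split lines, is to chunk the split list and reverse each chunk with slicing:
t = "00 01 02 04 03 03 03 03 00 05 06 03 03 03 03 03 00 07 03 03 03 03 03 03"
items = t.split()
# take slices of 8 items and reverse each slice with [::-1]
groups = [items[i:i + 8][::-1] for i in range(0, len(items), 8)]
print("\n".join(" ".join(g) for g in groups))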

This is a good use case for Python's built-in bytearray. First, you can open the binary input file and use a bytearray to store its contents:
with open("binary.file", "rb") as f:
    bytes = bytearray(f.read())
Then, we'll use your algorithm to loop over the bytes, collecting them in a variable called group (which is also a bytearray) and starting a new group every 8 iterations:
i = 0
groups = []
group = bytearray()
for byte in bytes:
    i += 1
    group.append(byte)
    if i == 8:
        groups.append(group)
        i = 0
        group = bytearray()
Just as a quick sanity check, test the value of the variable i: if it's not zero by now, the final group would be less than 8 bytes long:
if i != 0:
    raise EOFError("Input file does not align to 8 byte boundary!")
Finally, we'll reverse each group and print the output:
for group in groups:
    group.reverse()
print(groups)
Depending on your use case, you could also concatenate the reversed bytes and store them in another file, or even overwrite the same file. My guess is that doing that to an ordinary binary file like a JPEG or an EXE would break it completely. Luckily, you could run the program again to restore it!
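For instance, a minimal sketch of writing the reversed groups to a new file (the output filename is just a placeholder):
# assumes the `groups` list built above; "reversed.file" is a placeholder name
with open("reversed.file", "wb") as out:
    for group in groups:
        out.write(group)  # each group was already reversed in place above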

Related

How do I match multiline expressions with junk in the middle?

I'm trying to match a multiline expression from some logs we have. The biggest problem is that, due to race conditions, we sometimes have to use a custom print function with a mutex, and sometimes (when that's not necessary) we just use printf. This results in two types of logs.
My solution was this monstrosity:
changed key '(\w+)' value: <((([0-9a-f]{2} *)+)(?:\n)*(?:<\d+> \w+ (?:.*?] \[\d+\])\s*)*)*>
Explanation of the above regex:
changed key '(\w+)' value: - This is how we detect a print (and save the keyname in a capture group).
<{regex}> - The value output starts with < and ends with >
([0-9a-f]{2} *) - The bytes are hexadecimal pairs followed by an optional space (because the last byte doesn't have a space). Let's call this capture group 4.
({group4}+) - One or more of group 4.
(?:\n)* - There can be 0 or more newlines after this "XX " pair. (non-capture)
(?:<\d+> \w+ (?:.*?] \[\d+\])\s*)* - There can be 0 or more prints of the timestamp. (non-capture)
This works for the Case 2 logs, but not for the Case 1 logs. In Case 1, for some reason only the last line is matched.
Essentially, I'm trying to match this (two capture groups):
changed key '(\w+)' value: <({only hexadecimal pairs})>
group 1: key
group 2: value
Below are the dummy cases (same value in all cases):
// Case 1
<22213> Nov 30 00:00:00.287 [D1] [128]changed key 'KEY_NAME' value: <ab ab ab ab 00 00 00
<22213> Nov 30 00:00:00.287 [D1] [128]
<22213> Nov 30 00:00:00.287 [D1] [128]00 04 00 00
<22213> Nov 30 00:00:00.287 [D1] [128]ff ff
<22213> Nov 30 00:00:00.287 [D1] [128]00 00 00 11 00 00 00 00 00 21>
// Case 2
changed key 'KEY_NAME' value: <ab ab ab ab 00 00 00 00 04 00 00 ff ff 00 00 00 11 00 00 00 00 00 21>
// Case 2 with some newlines in the middle
changed key 'KEY_NAME' value: <ab ab ab ab 00 00 00 00
04 00 00 ff
ff 00 00 00 11 00
00 00 00 00 21>
The key isn't always the same key, so the value (and the value length) can change.
This approach first strips out the leading log prefix of each line, leaving behind the content you want to target. After that, it does a re.findall search using a regex pattern similar to the one you are already using.
inp = """<22213> Nov 30 00:00:00.287 [D1] [128]changed key 'KEY_NAME' value: <ab ab ab ab 00 00 00
<22213> Nov 30 00:00:00.287 [D1] [128]
<22213> Nov 30 00:00:00.287 [D1] [128]00 04 00 00
<22213> Nov 30 00:00:00.287 [D1] [128]ff ff
<22213> Nov 30 00:00:00.287 [D1] [128]00 00 00 11 00 00 00 00 00 21>"""
inp = re.sub(r'^<.*?>.*?(?:\s+\[.*?\])+', '', inp, flags=re.M)
matches = re.findall(r"changed key '(\w+)' value: <(.*?)>", inp, flags=re.S)
matches = [(x[0], re.sub(r'\s+', ' ', x[1])) for x in matches]
print(matches)
This prints:
[('KEY_NAME', 'ab ab ab ab 00 00 00 00 04 00 00 ff ff 00 00 00 11 00 00 00 00 00 21')]
Assuming there could be unwanted values in between 'KEY_NAME' value: < and the closing >, we can use re.findall on the second group to match all hexadecimal values:
inp = re.sub(r'^<.*?>.*?(?:\s+\[.*?\])+', '', inp, flags=re.M)
matches = re.findall(r"changed key '(\w+)' value: <(.*?)>", inp, flags=re.S)
matches = [(x[0], ' '.join(re.findall(r'\b[a-f0-9]{2}\b', x[1]))) for x in matches]
print(matches) # output same as above

Regex find greedy and lazy matches and all in-between

I have a sequence such as '01 02 09 02 09 02 03 05 09 08 09 ', and I want to find a sequence that starts with 01 and ends with 09; in between there can be one to nine two-digit numbers, such as 02, 03, 04, etc. This is what I have tried so far.
I'm using \w{2}\s (\w{2} for matching the two digits, and \s for the whitespace). This can occur one to nine times, which leads to (\w{2}\s){1,9}. The whole regex becomes
(01\s(\w{2}\s){1,9}09\s). This returns the following result:
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>
If I use the lazy quantifier ?, it returns the following result:
<regex.Match object; span=(0, 9), match='01 02 09 '>
How can I obtain the results in between too? The desired result would include all of the following:
<regex.Match object; span=(0, 9), match='01 02 09 '>
<regex.Match object; span=(0, 15), match='01 02 09 02 09 '>
<regex.Match object; span=(0, 27), match='01 02 09 02 09 02 03 05 09 '>
<regex.Match object; span=(0, 33), match='01 02 09 02 09 02 03 05 09 08 09 '>
You can extract these strings using
import re

s = "01 02 09 02 09 02 03 05 09 08 09 "
m = re.search(r'01(?:\s\w{2})+\s09', s)
if m:
    print([x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])])
    # => ['01 02 09 02 09 02 03 05 09 08 09', '01 02 09 02 09 02 03 05 09', '01 02 09 02 09', '01 02 09']
See the Python demo.
With the 01(?:\s\w{2})+\s09 pattern and re.search, you can extract the substring from 01 to the last 09 (with any space-separated two-character word chunks in between).
The second step - [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])] - is to reverse the string and the pattern to get all overlapping matches from 09 to 01 and then reverse them to get final strings.
You may also reverse the final list if you add [::-1] at the end of the list comprehension: print( [x[::-1] for x in re.findall(r'(?=\b(90.*?10$))', m.group()[::-1])][::-1] ).
Here is a non-regex answer that post-processes the matching elements:
s = '01 02 09 02 09 02 03 05 09 08 09 '.strip().split()
assert s[0] == '01' \
    and s[-1] == '09' \
    and (3 <= len(s) <= 11) \
    and len(s) == len([elem for elem in s if len(elem) == 2 and elem.isdigit() and elem[0] == '0'])
[s[:i+1] for i in sorted({s.index('09', i) for i in range(2, len(s))})]
# [
# ['01', '02', '09'],
# ['01', '02', '09', '02', '09'],
# ['01', '02', '09', '02', '09', '02', '03', '05', '09'],
# ['01', '02', '09', '02', '09', '02', '03', '05', '09', '08', '09']
# ]

How to extract IPv6 field attribute values from the header present in hexadecimal string value in a file using Python?

An IPv6 header has the following value:
68 01 00 00 31 02 FF 2A 01 3F 4D 9C 7E 11 14 56 19 DE A0 BD CD 17 FF CD DF 01 03 04 BC 2B 3A 4E 9D AB DE 9D AE 07 FF (IN TXT FILE)
After removing white spaces:
680100003102FF2A013F4D9C7E11145619DEA0BDCD17FFCDDF010304BC2B3A4E9DABDE9DAE07FF
The above header is present in a file, and I'm trying to extract all the field attribute values from the string, as mentioned in the picture.
[picture: header field attribute details]
I tried slicing and checking whether each value falls below its upper limit (based on the field's byte size), but it doesn't work when letters appear in the hex values.
Is there an optimal and error-free way to do this generically in Python?
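For what it's worth, one way would be to strip the whitespace, convert the hex string to bytes, and slice the fixed 40-byte header at the standard field boundaries, converting each slice with int.from_bytes so hex letters are no problem. This is only a sketch assuming the standard RFC 8200 fixed-header layout (the referenced picture is not shown, and the sample above appears to be one byte short of a full 40-byte header); the file name is a placeholder:
import ipaddress

def parse_ipv6_fixed_header(hex_str):
    """Slice the 40-byte IPv6 fixed header out of a whitespace-separated hex string."""
    data = bytes.fromhex("".join(hex_str.split()))
    if len(data) < 40:
        raise ValueError("need at least 40 bytes for the IPv6 fixed header")
    first_word = int.from_bytes(data[0:4], "big")  # version, traffic class, flow label
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": int.from_bytes(data[4:6], "big"),
        "next_header": data[6],
        "hop_limit": data[7],
        "source": str(ipaddress.IPv6Address(data[8:24])),
        "destination": str(ipaddress.IPv6Address(data[24:40])),
    }

with open("header.txt") as f:  # placeholder file holding the hex string
    print(parse_ipv6_fixed_header(f.read()))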

Reading EEPROM addresses in Python and perform operations

I am currently trying to pattern-match an EEPROM dump text file to locate a certain address and then traverse 4 steps once I get a hit in the search. I have tried the following code for finding the pattern:
import re

regexp_list = ('A1 B2')
line = open("dump.txt", 'r').read()
pattern = re.compile(regexp_list)
matches = re.findall(pattern, line)
for match in matches:
    print(match)
This scans the dump for A1 B2 and displays it if found. I need to add more such addresses to the search criteria, for example 'C1 B2' and 'D1 F1'.
I tried making regexp_list a list instead of a tuple, but it didn't work.
That is one of the problems. Next, when the search gets a hit, I want to traverse 4 places and then read the addresses from there on (see below).
Input:
0120 86 1B 00 A1 B2 FF 15 A0 05 C2 D1 E4 00 25 04 00
Here, when the search finds the A1 B2 pattern, I want to move 4 places, i.e. save the data C2 D1 E4 from the dump.
Expected Output:
C2 D1 E4
I hope the explanation was clear.
Thanks to @kcorlidy.
Here's the final piece of code I had to add to delete the addresses in the first column:
newtxt = (text.split("A0 05")[1].split()[4:][:5])
for i in newtxt:
    if len(i) > 2:
        newtxt.remove(i)
and so the full code looks like:
import re

text = open('dump.txt').read()
regex = r"(A1\s+B2)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})"
for ele in re.findall(regex, text, re.MULTILINE):
    print(" ".join([ok for ok in ele[2].split() if len(ok) == 2]))

# selects the next 5 elements in the array, including the address in the 1st column
print(text.split("A1 B2")[1].split()[4:][:5])
newtxt = (text.split("A1 B2")[1].split()[4:][:5])
for i in newtxt:
    if len(i) > 2:
        newtxt.remove(i)
Input:
0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00
Output:
C2 0130 D1 E4 00
C2 D1 E4 00
Using a regex can extract the text, but you can also do it by splitting the text.
Regex:
(A1\s+B2) - the string starts with A1, then one or more spaces, then B2
(\s+\w+){4} - moves 4 places
((\s+\w{2}(\s\w{4})?){3}) - extracts 3 two-character chunks; an unneeded 4-character row address may appear between them and is filtered out afterwards. Then combine them into one.
Split:
Note: if you have a very long text or multiple lines, don't use this approach.
text.split("A1 B2")[1] - splits the text into two parts; the part after the match is what we need
.split() - splits on whitespace and gives the list ['FF', '15', 'A0', '05', 'C2', 'D1', 'E4', '00', '25', '04', '00']
[4:][:3] - moves 4 places and selects the first three
Test code:
import re

text = """0120 86 1B 00 A1 B2 FF 15 A0 05 C2 D1 E4 00 25 04 00
0120 86 1B 00 00 C1 FF 15 00 00 A1 B2 00 00 00 00 C2
0130 D1 E4 00 00 FF 04 01 54 00 EB 00 54 89 B8 00 00 """

regex = r"(A1\s+B2)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})"
for ele in re.findall(regex, text, re.MULTILINE):
    # remove the strings we do not need, such as whitespace, row addresses like 0130, and \n
    print(" ".join([ok for ok in ele[2].split() if len(ok) == 2]))

print(text.split("A1 B2")[1].split()[4:][:3])
Output:
C2 D1 E4
C2 D1 E4
['C2', 'D1', 'E4']
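The question also asks about searching for several address patterns at once (e.g. 'C1 B2', 'D1 F1'). Here is a minimal sketch of one way to do that, building on the answer's regex with an alternation group; the dump lines below are made up purely for illustration:
import re

# made-up dump lines for illustration only
text = """0120 86 1B 00 A1 B2 FF 15 A0 05 C2 D1 E4 00 25 04 00
0130 00 00 C1 B2 11 22 33 44 AA BB CC 00 00 00 00 00
0140 00 00 D1 F1 55 66 77 88 DD EE FF 00 00 00 00 00"""

# join the target byte pairs into one alternation group, then reuse the answer's pattern
targets = ['A1 B2', 'C1 B2', 'D1 F1']
alternation = "|".join(t.replace(" ", r"\s+") for t in targets)
regex = r"(%s)(\s+\w+){4}((\s+\w{2}(\s\w{4})?){3})" % alternation

for ele in re.findall(regex, text, re.MULTILINE):
    print(ele[0], "->", " ".join(ok for ok in ele[2].split() if len(ok) == 2))
# A1 B2 -> C2 D1 E4
# C1 B2 -> AA BB CC
# D1 F1 -> DD EE FF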

Negative look ahead python regex

I would like to regex-match a sequence of bytes when the string '02 d0' does not occur at a specific position in the string. The positions where these two bytes cannot occur are byte positions 6 and 7, counting from the 0th byte on the right-hand side.
This is what I have been using for testing:
#!/usr/bin/python
import re
p0 = re.compile('^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (([^0])| (0[^2])|(02 [^d])|(02 d[^0])) 01 c2 [\da-f]{2} [\da-f]{2} [\da-f]{2} 23')
p1 = re.compile('^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (([^0])|(0[^2])|(02 [^d])|(02 d[^0])) 01')
p2 = re.compile('^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (([^0])|(0[^2])|(02 [^d])|(02 d[^0]))')
p3 = re.compile('^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (?!02 d0) 01')
p4 = re.compile('^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} (?!02 d0)')
yes = '24 0f 03 01 42 ff 00 04 a2 01 c2 00 c5 e5 23'
no = '24 0f 03 01 42 ff 00 02 d0 01 c2 00 c5 e5 23'
print p0.match(yes) # fail
print p0.match(no) # fail
print '\n'
print p1.match(yes) # fail
print p1.match(no) # fail
print '\n'
print p2.match(yes) # PASS
print p2.match(no) # fail
print '\n'
print p3.match(yes) # fail
print p3.match(no) # fail
print '\n'
print p4.match(yes) # PASS
print p4.match(no) # fail
I looked at this example, but that method is less restrictive than I need. Could someone explain why I can only match properly when the negative lookahead is at the end of the string? What do I need to do to match when '02 d0' does not occur in this specific byte position?
Lookaheads are "zero-width", meaning they do not consume any characters. For example, these two expressions will never match:
(?=foo)bar
(?!foo)foo
To make sure a number is not some specific number, you could use:
(?!42)\d\d # will match two digits that are not 42
In your case it could look like:
(?!02)[\da-f]{2} (?!d0)[\da-f]{2}
or:
(?!02 d0)[\da-f]{2} [\da-f]{2}
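To see the placement in context, here is a minimal sketch (Python 3) that drops that last suggestion into the full pattern from the question, using the yes/no strings from above:
import re

p = re.compile(
    r'^24 [\da-f]{2} 03 (01|03) [\da-f]{2} [\da-f]{2} [\da-f]{2} '
    r'(?!02 d0)[\da-f]{2} [\da-f]{2} '  # bytes 6 and 7 (from the right) must not be "02 d0"
    r'01 c2 [\da-f]{2} [\da-f]{2} [\da-f]{2} 23'
)

yes = '24 0f 03 01 42 ff 00 04 a2 01 c2 00 c5 e5 23'
no = '24 0f 03 01 42 ff 00 02 d0 01 c2 00 c5 e5 23'

print(p.match(yes))  # match object
print(p.match(no))   # None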
