re.sub replace hexadecimal with string - python

I have a file that contains hexadecimal numbers that I want to convert to strings:
'\x73\x63\x6f\x72\x65\x73': '\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d',
'Status', ['\x75\x70\x64\x61\x74\x65']
But when using re.sub to replace each occurrence of a hexadecimal escaped number by its ascii representation, it doesn't seem to find the hexadecimal number in the first place.
I've tried using raw strings, but it didn't change anything. I still can't replace them.
import re, binascii
with open('hex.txt', 'r') as f:
    file = f.read()
hexList = re.findall(r"'([\\x\w+]*)'", file)
for item in hexList:
    file = re.sub(r"('{}')".format(item), str(binascii.unhexlify(item.replace('\\x', ''))), file)
    #file = re.sub("('"+item+"')".format(item), str(binascii.unhexlify(item.replace('\\x', ''))), file)
print(file)

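You can use the following code fragment:
import re, binascii
with open('hex.txt', 'r') as f:
    file = f.read()
# Collect every run of literal \xNN escapes.
hexList = re.findall(r'((?:\\x[0-9a-f][0-9a-f])+)', file)
for item in hexList:
    # Double the backslashes so the pattern matches them literally.
    pattern = r"('{}')".format(item.replace('\\', '\\\\'))
    # unhexlify returns bytes; decode them to get the text.
    decoded = binascii.unhexlify(item.replace('\\x', '')).decode('ascii')
    file = re.sub(pattern, decoded, file)
print(file)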
Your regex for finding the hex strings is wrong: the character class [\\x\w+] also matches ordinary word characters, so it even finds Status as a hex string.
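The same replacement can also be done in a single pass by giving re.sub a function instead of a replacement string. The following is only a sketch for Python 3: the names HEX_RUN and decode_hex_escapes are made up for illustration, the quotes around each value are kept, and latin-1 is an assumed choice so that every byte value maps to a character.

```python
import re

# One or more consecutive \xNN escapes written out literally in the text.
HEX_RUN = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')

def decode_hex_escapes(text):
    """Replace every run of literal \\xNN escapes with the characters it encodes."""
    def _decode(match):
        hex_digits = match.group(0).replace('\\x', '')
        # latin-1 maps each byte 0x00-0xFF to exactly one character.
        return bytes.fromhex(hex_digits).decode('latin-1')
    # A function replacement is inserted as-is, so backslashes need no escaping.
    return HEX_RUN.sub(_decode, text)

sample_lines = [
    r"'\x73\x63\x6f\x72\x65\x73': '\x4c\x6f\x72\x65\x6d\x20\x69\x70\x73\x75\x6d',",
    r"'Status', ['\x75\x70\x64\x61\x74\x65']",
]
for sample in sample_lines:
    print(decode_hex_escapes(sample))
# 'scores': 'Lorem ipsum',
# 'Status', ['update']
```

Because only the escape runs are matched, plain values such as 'Status' are left untouched.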

Related

python3.5.2 deleting all matching characters from a file

Given the following example, how can I remove all "a" characters from a file that has the following content:
asdasdasd \n d1233sss \n aaa \n 123
I wrote the following solution but it does not work:
with open("testfisier","r+") as file:
    for line in file:
        for index in range(len(line)):
            if line[index] is "a":line[index].replace("a","")
There weren't any changes because you didn't write it back to the file.
with open("testfisier", "r+") as file:
    # Strings are immutable, so build the cleaned lines first.
    new_lines = [line.replace("a", "") for line in file]
    # Write the changes.
    file.seek(0)
    file.writelines(new_lines)
    file.truncate()  # cut off what is left of the old, longer content
Or:
with open("testfisier") as f:
    text = f.read().replace("a", "")
with open("testfisier", "w") as f:
    f.write(text)
Try using regexp substitution. For instance, assuming you have read in the string and named it a_string:
import re
a_string = re.sub('a', '', a_string)
This would be one of many possible solutions.
Hope this helps!
You can try this:
import re
data = open("testfisier").read()
final_data = re.sub('a+', '', data)
You can call replace on a long string. No need to call it on single chars. Also, replace does not change a string, but returns a new one:
with open("testfisier", "r+") as file:
    text = file.read()
    text = text.replace("a", "") # replace a's in the entire text
    file.seek(0) # move file pointer back to start
    file.write(text)
    file.truncate() # the new text is shorter, so drop the leftover tail
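If the file is too large to read in one piece, the same idea also works line by line. This is a rough sketch rather than a definitive solution: the helper name remove_char_streaming is made up, and it assumes it is acceptable to write a temporary file next to the original and then swap it in.

```python
import os
import tempfile

def remove_char_streaming(path, char):
    """Remove every occurrence of `char` from the file, one line at a time."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the cleaned lines to a temporary file next to the original ...
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as dst:
        for line in src:
            dst.write(line.replace(char, ""))
    # ... then swap it in place of the original.
    os.replace(dst.name, path)

# Demonstration on a throwaway file with content similar to the question's.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("asdasdasd\nd1233sss\naaa\n123\n")
remove_char_streaming(demo_path, "a")
with open(demo_path) as f:
    cleaned = f.read()
os.remove(demo_path)
print(repr(cleaned))
```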

finding string in txt file and displaying nearest integer

I want my code to accept input that is a string (for example: "ABC") and then read through a txt file and find that string; once it finds the string, it should output the closest integer (for example: 456) to the string in the file. Is this possible?
So far, I've found code that can print "related lines," so lines that are 2 lines away from my string. This code is:
f = open("textfile.txt", "r")
searchlines = f.readlines()
f.close()
for i, line in enumerate(searchlines):
    if "string" in line:
        for l in searchlines[i:i+2]: print(l, end="")
        print()
This code outputs the two lines in front of and behind my string. However, I need to print a specific integer, so I'm not sure how to proceed from there.
For my purposes, I need the "closest integer" to the right of the string.
You could read the file into one string with f.read() and then get the integer via a regular expression that captures the integer into a group:
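import re
with open("textfile.txt", "r") as f:
    content = f.read()
find = 'hello' # the string to find
# re.escape keeps special characters in the search string literal;
# \D* skips any non-digits between the string and the number.
result = re.search(re.escape(find) + r'\D*(\d+)', content)
if result: # re.search returns None when nothing matches
    print(result.group(1))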
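To see what the pattern does without needing a file, here is a small sketch that wraps it in a function and runs it on an in-memory string. The function name first_int_after and the sample text are made up for illustration.

```python
import re

def first_int_after(text, needle):
    """Return the first integer that appears after `needle` in `text`, or None."""
    # \D matches any non-digit, including newlines, so the number
    # may also sit on a later line than the search string.
    match = re.search(re.escape(needle) + r'\D*(\d+)', text)
    return int(match.group(1)) if match else None

sample = "XYZ 123\nABC is followed by some words\nand then 456 and 789\n"
print(first_int_after(sample, "ABC"))      # 456
print(first_int_after(sample, "missing"))  # None
```

Note that 123 is ignored because it comes before the search string, which matches the requirement of taking the closest integer to the right.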

python write umlauts into file

I have the following output, which I want to write into a file:
l = ["Bücher", "Hefte", "Mappen"]
I do it like this:
f = codecs.open("testfile.txt", "a", stdout_encoding)
f.write(l)
f.close()
In my text file I want to see ["Bücher", "Hefte", "Mappen"] instead of B\xc3\xbccher.
Is there any way to do so without looping over the list and decoding each item? For example, by giving the write() function some parameter?
Many thanks
First, make sure you use unicode strings by adding the "u" prefix to each string:
l = [u"Bücher", u"Hefte", u"Mappen"]
Then you can write or append to a file. I recommend the io module, which is compatible with both Python 2 and Python 3:
import io
with io.open("testfile.txt", mode="a", encoding="UTF8") as fd:
    for line in l:
        fd.write(line + "\n")
To read your text file in one piece:
with io.open("testfile.txt", mode="r", encoding="UTF8") as fd:
    content = fd.read()
The resulting content is a Unicode string.
If you encode this string using the UTF8 encoding, you'll get a byte string like this:
b"B\xc3\xbccher"
Edit: using writelines.
The method writelines() writes a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value.
# add new lines
lines = [line + "\n" for line in l]
with io.open("testfile.txt", mode="a", encoding="UTF8") as fd:
    fd.writelines(lines)
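If the goal is to get the whole list onto one line, brackets and quotes included, one possible approach is to serialize it with the json module. This is a sketch that assumes Python 3 and that JSON-style output with double quotes is acceptable; the helper name append_list is made up.

```python
import io
import json

items = ["Bücher", "Hefte", "Mappen"]

# ensure_ascii=False keeps "ü" as a real character instead of a \u00fc escape.
line = json.dumps(items, ensure_ascii=False)
# line is now the text: ["Bücher", "Hefte", "Mappen"]

def append_list(path, values):
    """Append `values` to the file as one JSON-formatted line (hypothetical helper)."""
    with io.open(path, mode="a", encoding="UTF8") as fd:
        fd.write(json.dumps(values, ensure_ascii=False) + "\n")
```

Calling append_list("testfile.txt", items) then adds that single line to the file, and json.loads can turn it back into a list later.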

How to search for a different string in a different file using Python 3.x

I am trying to search a large group of text files (160K) for a specific string that changes for each file. I have a text file that has every file in the directory with the string value I want to search. Basically I want to use python to create a new text file that gives the file name, the string, and a 1 if the string is present and a 0 if it is not.
The approach I am using so far is to create a dictionary from a text file. From there I am stuck. Here is what I figure in pseudo-code:
**assign dictionary**
d = {}
with open('file.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
**loop through directory**
for filename in os.listdir(os.getcwd()):
    ***here is where I get lost***
    match file name to dictionary
    look for string
    write filename, string, 1 if found
    write filename, string, 0 if not found
Thank you. It needs to be somewhat efficient since it's a large amount of text to go through.
Here is what I ended up with:
import os
d = {}
with open('ibes.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
for filename in os.listdir(os.getcwd()):
    string = d.get(filename, "!##$%^&*")
    if string in open(filename, 'r').read():
        with open("ibes_in.txt", 'a') as out:
            out.write("{} {} {}\n".format(filename, string, 1))
    else:
        with open("ibes_in.txt", 'a') as out:
            out.write("{} {} {}\n".format(filename, string, 0))
As I understand your question, the dictionary relates file names to strings:
d = {
    "file1.txt": "widget",
    "file2.txt": "sprocket", #etc
}
If each file is not too large, you can read each file into memory:
for filename in os.listdir(os.getcwd()):
    if filename not in d:
        continue # skip files that have no search string
    string = d[filename]
    with open(filename, 'r') as f:
        found = string in f.read()
    print(filename, string, "1" if found else "0")
This example uses print, but you could write to a file instead. Open the output file before the loop with outfile = open("outfile.txt", 'w') and, instead of printing, use
outfile.write("{} {} {}\n".format(filename, string, 1))
On the other hand, if each file is too large to fit easily into memory, you could use mmap, as described in "Search for string in txt file Python".
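A minimal sketch of the mmap variant might look like this. The helper name file_contains is made up, and the sketch assumes the search string can be encoded as UTF-8 in the same way it is stored in the file.

```python
import mmap
import os
import tempfile

def file_contains(path, needle, encoding="utf-8"):
    """Return True if `needle` occurs in the file, without reading it all into memory."""
    if os.path.getsize(path) == 0:
        return False  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # mmap works on bytes, so the search string is encoded first.
            return mm.find(needle.encode(encoding)) != -1

# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("a widget and a sprocket\n")
print(file_contains(demo_path, "widget"))  # True
print(file_contains(demo_path, "gadget"))  # False
os.remove(demo_path)
```

In the loop above, file_contains(filename, string) would then stand in for the string in f.read() test.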

Python: Decode base64 multiple strings in a file

I'm new to python and I have a file like this:
cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA==
It's keyboard input encoded with base64, and now I want to decode it.
I tried the code below, but it stops after the first decoded character.
import base64
file = "my_file.txt"
fin = open(file, "rb")
binary_data = fin.read()
fin.close()
b64_data = base64.b64decode(binary_data)
b64_fname = "original_b64.txt"
fout = open(b64_fname, "w")
fout.write(b64_data)
fout.close
Any help is welcome. thanks
I assume that you created your test input string yourself.
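Each character seems to have been encoded separately, so every 4-character block carries its own == padding, and b64decode stops at the first padding it sees. That is why your code stops after the first character. If I split your test input string into blocks of 4 characters and decode each one separately, I get the following: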
>>> import base64
>>> s = 'cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=='
>>> b''.join(base64.b64decode(s[i:i+4]) for i in range(0, len(s), 4))
b'sdadasdasdasdasdtest'
However, the correct base64 encoding of your test string sdadasdasdasdasdtest is:
>>> base64.b64encode(b'sdadasdasdasdasdtest')
b'c2RhZGFzZGFzZGFzZGFzZHRlc3Q='
If you place this string in my_file.txt (and rewrite your code to be a bit more concise), then it all works:
import base64
# The decoded data is bytes, so the output file is opened in binary mode.
with open("my_file.txt") as f, open("original_b64.txt", 'wb') as g:
    encoded = f.read()
    decoded = base64.b64decode(encoded)
    g.write(decoded)
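If the input file cannot be regenerated and really contains separately encoded pieces written back to back, one option is to split the text at the padding and decode each piece. This is a sketch for Python 3 that assumes the standard base64 alphabet; the names CHUNK and decode_concatenated_b64 are made up.

```python
import base64
import re

# A run of base64 characters followed by up to two padding characters.
CHUNK = re.compile(r'[A-Za-z0-9+/]+={0,2}')

def decode_concatenated_b64(data):
    """Decode several base64 strings that were written back to back."""
    # Unpadded pieces are a multiple of 4 characters long, so when they are
    # merged with the piece that follows, the result still decodes correctly.
    return b''.join(base64.b64decode(chunk) for chunk in CHUNK.findall(data))

s = 'cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=='
print(decode_concatenated_b64(s))  # b'sdadasdasdasdasdtest'
```

Unlike slicing into fixed blocks of 4 characters, this also handles pieces that encode more than one byte.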
