I have a rather large text document and would like to replace all instances of hexadecimals inside with regular decimals. Or if possible convert them into text surrounded by '' e.g. 'I01A' instead of $49303141
The hexadecimals are currently marked by starting with $ but I can ctrl+F change that into 0x if that helps, and I need the program to detect the end of the number since some are short $A, while others are long like $568B1F
How could I do this with python, or is it not possible?
Thank you for the help thus far, hoping to clarify my request a bit more to hopefully get a complete solution.
I used a version of Grismar's answer and the output it gives me is
"if not (GetItemTypeId(GetSoldItem())==I0KB) then
set int1= 2+($3E8*3)"
However, I would like to add ' around the newly created text and convert hex strings shorter than 8 digits to decimals instead, so the output becomes
"if not (GetItemTypeId(GetSoldItem())=='I0KB') then
set int1= 2+(1000*3)"
Hoping for some more help to get the rest of the way.
def hex2dec(s):
    return int(s,16)
was my attempt to convert the shorter hexadecimals to decimal, but it clearly has not worked; it throws syntax errors instead.
Also, I will manually deal with the few $ not used to denote a hexadecimal.
import re
# just creating an example file
with open('D:\Deprotect\wc3\mpq editor\Work\\new 4.txt', 'w') as f:
    f.write('if not (GetItemTypeId(GetSoldItem())==$49304B42) then\n')
    f.write('set int1= 2+($3E8*3)\n')
def hex_match_to_string(m):
    return ''.join([chr(int(m.group(1)[i:i+2], 16)) for i in range(0, len(m.group(1)), 2)])
def hex2dec(s):
    return int(s,16)
# open the file for reading
with open('D:\Deprotect\wc3\mpq editor\Work\\new 4.txt', 'r') as file_in:
    # open the same file again for reading and writing
    with open('D:\Deprotect\wc3\mpq editor\Work\\new 4.txt', 'r+') as file_out:
        # start writing at the start of the existing file, overwriting the contents
        file_out.seek(0)
        while True:
            line = file_in.readline()
            if line == '':
                # end of file
                break
            # replace the parts of the string matching the regex
            line = re.sub(r'\$((?:\w\w\w\w\w\w\w\w)+)', hex_match_to_string, line)
            #line = re.sub(r'$\w+', hex2dec,line)
            file_out.write(line)
        # the resulting file is shorter, truncate it from the current position
        file_out.truncate()
See the answer https://stackoverflow.com/a/12597709/1780027 for how to use re.sub to replace specific content of a string with the output of a function. Using this, you could presumably use the int("FFFF", 16) code snippet you're talking about to perform the action you desire.
EG:
>>> import re
>>> def replace(match):
...     match = match.group(1)
...     return str(int(match, 16))
...
>>> sample = "here's a hex $49303141 and there's a nother $A34B and another $8FD0B"
>>> re.sub(r'\$([a-fA-F0-9]+)', replace, sample)
"here's a hex 1227895105 and there's a nother 41803 and another 589067"
Since you are replacing parts of the file with something that's shorter, you can write to the same file you're reading. But keep in mind that, if you were replacing those parts with something that was longer, you would need to write the result to a new file and replace the old file with the new file once you were done.
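As a rough sketch of that safer route (write to a new file, then swap it in), assuming the replacement text may be longer than what it replaces: the helper name rewrite_file and its parameters are made up for illustration and are not part of the answer.

```python
import os
import re
import tempfile

def rewrite_file(path, pattern, repl):
    """Apply re.sub(pattern, repl, line) to every line of path via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temporary file next to the original so the final swap stays on one drive
    with open(path, 'r') as file_in, \
            tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as file_out:
        for line in file_in:
            file_out.write(re.sub(pattern, repl, line))
    # both files are closed here; replace the original with the rewritten copy
    os.replace(file_out.name, path)
```

Because the original is only replaced after the new file is completely written, a crash halfway through leaves the original untouched.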
Also, from your description, it appears you are reading a text file, which makes reading the file line by line the easiest, but if your file was some sort of binary file, using re wouldn't be as convenient and you'd probably need a different solution.
Finally, your question doesn't mention whether $ might also appear elsewhere in the text file (not just in front of pairs of characters that should be read as hexadecimal numbers). This answer assumes $ only appears in front of strings of 2-character hexadecimal numbers.
Here's a solution:
import re
# just creating an example file
with open('test.txt', 'w') as f:
    f.write('example line $49303141\n')
    f.write('$49303141 example line, with more $49303141\n')
    f.write('\n')
    f.write('just some text\n')
def hex_match_to_string(m):
    return ''.join([chr(int(m.group(1)[i:i+2], 16)) for i in range(0, len(m.group(1)), 2)])
# open the file for reading
with open('test.txt', 'r') as file_in:
    # open the same file again for reading and writing
    with open('test.txt', 'r+') as file_out:
        # start writing at the start of the existing file, overwriting the contents
        file_out.seek(0)
        while True:
            line = file_in.readline()
            if line == '':
                # end of file
                break
            # replace the parts of the string matching the regex
            line = re.sub(r'\$((?:\w\w)+)', hex_match_to_string, line)
            file_out.write(line)
        # the resulting file is shorter, truncate it from the current position
        file_out.truncate()
The regex is simply r'\$((?:\w\w)+)', which matches any string starting with a literal $ (the backslash keeps it from being interpreted as 'the end of the string') followed by 1 or more (+) pairs of word characters (\w\w). Note that \w matches letters, digits and the underscore, not only hexadecimal digits.
The function hex_match_to_string(m) expects a regex match object and loops over pairs of characters in the first matched group. Each pair is turned into its decimal value by interpreting it as a hexadecimal string (int(pair, 16)) and that decimal value is then turned into a character with that ASCII value (chr(value)). All the resulting characters are joined into a single string (''.join(list)).
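To make those steps concrete, here is a small worked example on the value from the question, done outside of any regex. The function name decode_pairs is only an illustrative choice.

```python
def decode_pairs(digits):
    """Split a hex string into 2-character pairs and decode each pair as one character."""
    pairs = [digits[i:i+2] for i in range(0, len(digits), 2)]  # '49303141' -> ['49', '30', '31', '41']
    values = [int(pair, 16) for pair in pairs]                 # -> [73, 48, 49, 65]
    return pairs, values, ''.join(chr(v) for v in values)      # -> 'I01A'

pairs, values, text = decode_pairs('49303141')
print(pairs, values, text)
```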
A different way of writing hex_match_to_string(m):
def hex_match_to_string(m):
    hex_nums = iter(m.group(1))
    return ''.join([chr(int(a, 16) * 16 + int(b, 16)) for a, b in zip(hex_nums, hex_nums)])
This may perform a bit better, since it avoids manipulating strings, but it does the same thing.
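For the follow-up request in the question (quote the 4-character ids, turn shorter values into decimals), one possible sketch is a single callback that decides by length. It assumes that exactly 8 hex digits always mean a 4-character id and that every other length should become a decimal number; the names convert_hex and convert_line are illustrative.

```python
import re

def convert_hex(m):
    digits = m.group(1)
    if len(digits) == 8:
        # assumption: 8 hex digits encode a 4-character id, so decode and quote it
        chars = ''.join(chr(int(digits[i:i+2], 16)) for i in range(0, 8, 2))
        return "'" + chars + "'"
    # any other length is treated as a plain number
    return str(int(digits, 16))

def convert_line(line):
    # only real hex digits after the $ are matched, so the match ends where the number ends
    return re.sub(r'\$([0-9A-Fa-f]+)', convert_hex, line)

print(convert_line('if not (GetItemTypeId(GetSoldItem())==$49304B42) then'))
print(convert_line('set int1= 2+($3E8*3)'))
```

A decimal can be longer than the hex text it replaces (for example $FFFFFFF is 8 characters, 268435455 is 9), so with this callback it is safer to write the result to a new file than to overwrite the file while reading it.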
Imagine you have a textfile input.txt containing text and floats, but without a regular structure (such as header, .csv etc.), for instance :
Banana 1.4030391
(4.245, -345.2456)
4.245 -345.2456
Hello how are you?
Based on this file, you want to generate output.txt where each float has been rounded to 1 decimal, the remaining content left untouched. This would give
Banana 1.4
(4.2, -345.2)
4.2 -345.2
Hello how are you?
To achieve this in Python, you need the following steps.
Open the input file and read each line
f = open('input.txt')
f.readlines()
Extract the floats
How to proceed? The difficulty lies in the fact that there is no regular structure in the file.
Round the floats
np.round(myfloat)
Write the line to the output file
...
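Check this out. Use a regular expression to match floating point numbers, then replace them.
import re
# read the whole input file into one string
with open('input.txt') as f:
    string = f.read()
def check_string(string):
    # re.sub rounds each match where it stands; str.replace on the findall results
    # could also rewrite part of a longer number (e.g. the 1.26 inside 1.264)
    return re.sub(r"\d+\.\d+", lambda m: str(round(float(m.group()), 1)), string)
output = check_string(string)
# "w" overwrites output.txt instead of appending to the result of an earlier run
with open("output.txt", "w") as file2:
    file2.write(output)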
Since it seems you need ideas on how to extract floats from the text file, I can only contribute an idea.
I think it is simpler to create an empty list and add each word and number to it.
You can split each line of the text file on spaces and newlines. Then you can loop over the items in the list and check whether each one is a float.
Functions you can use are ".append", ".rstrip", "isinstance()"
The code below DOESN'T extract float numbers, but you can build on it to split the text file into items.
mylines = [] # Declare an empty list.
with open('text.txt', 'rt') as myfile: # Open txt for reading text.
    for myline in myfile: # For each line in the file,
        mylines.append(myline.rstrip()) # strip trailing whitespace (including the newline) and add to list.
for element in mylines:
    print(element)
    for item in element.split(): # each whitespace-separated word or number
        print(item)
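Building on that idea, here is a minimal sketch of the missing step. Since every item read from a text file is a string, isinstance() cannot tell numbers apart; the sketch instead tries float() on each item. The function name extract_floats and the set of stripped punctuation characters are assumptions made for this example.

```python
def extract_floats(text):
    """Return every whitespace-separated item of text that parses as a float."""
    found = []
    for token in text.split():
        cleaned = token.strip('(),;')  # drop brackets and separators stuck to the number
        try:
            found.append(float(cleaned))
        except ValueError:
            pass  # not a number, e.g. 'Banana' or 'you?'
    return found

sample = "Banana 1.4030391\n(4.245, -345.2456)\n4.245 -345.2456\nHello how are you?"
print(extract_floats(sample))
```

Note that float() also accepts words such as "nan" or "inf", so a text containing those as ordinary words would need an extra check.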
I am trying to write a program that solves simple math equations from a txt file and puts the answers in a different txt file. For example:
questions.txt:
1+2=
4+7=
10*2=
10/2=
And then the answers will be in a different txt file
answers.txt:
1+2=3
4+7=11
10*2=20
10/2=5
So there are simple math equations in a text file and the answers in a different one. The math equations are only number - operator - number
You can use eval to evaluate everything to the left of the equal sign.
with open('questions.txt') as fp:
    # keep only non-blank lines so that questions and answers stay aligned in zip below
    qs = [q for q in fp.readlines() if q.strip()]
answers = [eval(q.split('=')[0]) for q in qs]
with open('answers.txt', 'w') as fp:
    for q, a in zip(qs, answers):
        fp.write(q.strip() + str(a) + '\n')
answers is a list of the evaluated expressions on the left side of the equal sign. eval takes whatever string is given to it and tries to evaluate it as a Python expression. q.split('=')[0] splits each question into two parts: everything to the left of the equal sign (part 0) and everything to the right (part 1). We are grabbing only the first part to evaluate. The list comprehension that builds qs iterates over the questions in your file and checks that the line is not just an extra blank line.
Using zip matches up each question q to the corresponding answer a, so the for loop yield both the first q and a, then second q and a, etc. fp is a file object that we opened for writing. fp.write tells python to write to disk the string argument. I am using q.strip() to remove the newline characters, appending the answer as a string, and then adding a newline character to the end.
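Because eval runs whatever the file contains, it is only appropriate for files you trust. As an alternative sketch for the "number - operator - number" format described in the question, the operator can be looked up in a table instead. The names OPS and solve are illustrative, and the sketch assumes non-negative integer operands.

```python
import operator
import re

# map each operator symbol to the function that implements it
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul, '/': operator.truediv}

def solve(question):
    """Evaluate a line such as '10*2=' without using eval."""
    m = re.fullmatch(r'\s*(\d+)\s*([-+*/])\s*(\d+)\s*=\s*', question)
    if m is None:
        raise ValueError('not a number-operator-number question: %r' % question)
    left, op, right = m.groups()
    return OPS[op](int(left), int(right))

for q in ['1+2=', '4+7=', '10*2=', '10/2=']:
    print(q + str(solve(q)))
```

In Python 3 the division yields a float, so the last line prints 10/2=5.0 rather than 10/2=5.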
That's how I ended up doing it:
with open('questions.txt') as fp:
    qs = fp.readlines() # reading the question file
with open('answers.txt', 'w') as fp: # opening the answers file for writing as fp
    for q in qs:
        Deleteequal = q.split('=')
        a = eval(Deleteequal[0]) # always part 0 because I am reading line by line
        f = q + str(a)
        f = f.replace("\n", "") # q still ends with a newline, so with f.write(q+str(a)+'\n') the answer ended up one line down
        fp.write(f)
        fp.write('\n')
        # str(a) to write the final answer
I need a program to find a string (S) in a file (P) and return the number of times it appears in the file. To do this I decided to create a function:
def file_reading(P, S):
file1= open(P, 'r')
pattern = S
match1 = "re.findall(pattern, P)"
if match1 != None:
print (pattern)
I know it doesn't look very good, but for some reason it's not outputting anything, let alone the right answer.
There are multiple problems with your code.
First of all, calling open() returns a file object. It does not read the contents of the file. For that you need to use read() or iterate through the file object.
Secondly, if your goal is to count the number of matches of a string, you don't need regular expressions. You can use the string function count(). Even still, it doesn't make sense to put the regular expression call in quotes.
match1 = "re.findall(pattern, file1.read())"
Assigns the string "re.findall(pattern, file1.read())" to the variable match1.
Here is a version that should work for you:
def file_reading(file_name, search_string):
    # this will put the contents of the file into a string
    file1 = open(file_name, 'r')
    file_contents = file1.read()
    file1.close() # close the file
    # return the number of times the string was found
    return file_contents.count(search_string)
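To illustrate the difference between count() and a regular expression, here is a small sketch that counts the same search string both ways. The function name count_matches is made up for the example; re.escape is used so that characters like "." are matched literally.

```python
import re

def count_matches(text, search_string):
    """Count non-overlapping occurrences with str.count and with an escaped regex."""
    plain = text.count(search_string)
    with_regex = len(re.findall(re.escape(search_string), text))
    return plain, with_regex

text = 'a.b a.b axb'
print(count_matches(text, 'a.b'))
# without re.escape the dot matches any character, so 'axb' is counted too
print(len(re.findall('a.b', text)))
```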
You can read line by line instead of reading the entire file, count how many times the pattern occurs in each line, and add that to the total count c:
def file_reading(file_name, pattern):
    c = 0
    with open(file_name, 'r') as f:
        for line in f:
            c += line.count(pattern)
    if c: print(c)
There are a few errors; let's go through them one by one:
Anything in quotes is a string. Putting "re.findall(pattern, file1.read())" in quotes just makes a string. If you actually want to call the re.findall function, no quotes are needed :)
You check whether match1 is None or not, which is really great, but then you should return the matches, not the initial pattern.
The if-statement should not be indented.
Also:
Always close a file once you have opened it! Since most people forget to do this, it is better to use the with open(filename, action) syntax.
So, taken together, it would look like this (I've changed some variable names for clarity):
import re
def file_reading(input_file, pattern):
    with open(input_file, 'r') as text_file:
        data = text_file.read()
        matches = re.findall(pattern, data)
    if matches:
        print(matches) # prints a list of all strings found
I have a file like this below.
0 0 0
0.00254 0.00047 0.00089
0.54230 0.87300 0.74500
0 0 0
I want to modify this file: if a value is less than 0.05, it should become 1; otherwise it should become 0.
After the Python script runs, the file should look like
1 1 1
1 1 1
0 0 0
1 1 1
Would you please help me?
OK, since you're new to StackOverflow (welcome!) I'll walk you through this. I'm assuming your file is called test.txt.
with open("test.txt") as infile, open("new.txt", "w") as outfile:
opens the files we need, our input file and a new output file. The with statement ensures that the files will be closed after the block is exited.
for line in infile:
loops through the file line by line.
values = [float(value) for value in line.split()]
Now this is more complicated. Every line contains space-separated values. These can be split into a list of strings using line.split(). But they are still strings, so they must be converted to floats first. All this is done with a list comprehension. The result is that, for example, after the second line has been processed this way, values is now the following list: [0.00254, 0.00047, 0.00089].
results = ["1" if value < 0.05 else "0" for value in values]
Now we're creating a new list called results. Each element corresponds to an element of values, and it's going to be a "1" if that value < 0.05, or a "0" if it isn't.
outfile.write("       ".join(results))
converts the list of "integer strings" back to a string, separated by 7 spaces each.
outfile.write("\n")
adds a newline. Done.
The two list comprehensions could be combined into one, if you don't mind the extra complexity:
results = ["1" if float(value) < 0.05 else "0" for value in line.split()]
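Put together, the walkthrough above could be assembled roughly as follows. The function names threshold_line and convert, and the limit and sep parameters, are assumptions added for this sketch so that the line logic can be tried without touching a file.

```python
def threshold_line(line, limit=0.05, sep=" "):
    """Turn one line of space-separated numbers into '1'/'0' flags."""
    return sep.join("1" if float(value) < limit else "0" for value in line.split())

def convert(in_path, out_path):
    # both files are closed automatically when the with block ends
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            outfile.write(threshold_line(line) + "\n")

print(threshold_line("0.00254 0.00047 0.00089"))
print(threshold_line("0.54230 0.87300 0.74500"))
```

Calling convert("test.txt", "new.txt") would then process the whole file; pass a wider sep to threshold_line if the columns should be spaced further apart.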
If you can use libraries, I'd suggest numpy:
import numpy as np
myarray = np.genfromtxt("my_path_to_text_file.txt")
my_shape = myarray.shape # shape is an attribute, not a method
out_array = np.where(myarray < 0.05, 1, 0)
np.savetxt("my_path_to_output_file.txt", out_array, fmt="%d") # savetxt needs a target file; this name is a placeholder
You can pass formatting options as arguments to the savetxt function. The docstrings of the function are pretty self-explanatory.
If you are stuck with pure Python:
with open("my_path_to_text_file") as my_file:
    list_of_lines = my_file.readlines()
list_of_lines = [[int(float(x) < 0.05) for x in line.split()] for line in list_of_lines]
Then write that list to a file as you see fit.
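One possible way to do that last step, as a sketch: join each inner list with spaces and write one row per line. The function name write_rows and the single-space separator are choices made for this example.

```python
def write_rows(rows, path):
    """Write a list of lists to path, one space-separated row per line."""
    with open(path, "w") as out:
        for row in rows:
            out.write(" ".join(str(value) for value in row) + "\n")
```

With the list built above, write_rows(list_of_lines, "my_path_to_output_file") would produce lines such as "1 1 1".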
You can use this code
f_in=open("file_in.txt", "r") #opens a file in the reading mode
in_lines=f_in.readlines() #reads it line by line
out=[]
for line in in_lines:
    list_values=line.split() #separate elements by the spaces, returning a list with the numbers as strings
    for i in range(len(list_values)):
        list_values[i]=float(list_values[i]) #converts them to floats (float is safer than eval here)
        # print list_values[i],
        if list_values[i]<0.05: #your condition
            # print ">>", 1
            list_values[i]=1
        else:
            # print ">>", 0
            list_values[i]=0
    out.append(list_values) #stores the numbers in a list, where each list corresponds to a line's content
f_in.close() #closes the file
f_out=open("file_out.txt", "w") #opens a new file in the writing mode
for cur_list in out:
    for i in cur_list:
        f_out.write(str(i)+"\t") #writes each number, plus a tab
    f_out.write("\n") #writes a newline
f_out.close() #closes the file
The following code performs the replacements in place: for that, the file is opened in 'rb+' mode. It's absolutely mandatory to open it in binary mode b. The + in 'rb+' means that the file can be both read and written. Note that the mode can also be written 'r+b'.
But using 'rb+' is awkward:
if you read with for line in f, the file is read in chunks and several lines are held in the buffer, from which they are actually read one after the other until another chunk of data is loaded into the buffer. That makes it harder to perform transformations, because one must track the position of the file's pointer with tell() and move the pointer with seek(), and in fact I've not completely understood how it must be done.
Happily, there's a solution with replace(), because (I don't know why, but I trust the facts) when readline() reads a line, the file's pointer doesn't go further on disk than the end of the line (that is to say, it stops at the newline).
Now it's easy to move the file's pointer and to know its position.
to write after reading, seek() must be executed, even if it is only seek(0,1), meaning a move of 0 characters from the current position. That must change the state of the file's pointer, or something like that.
Well, for your problem, the code is as follows:
import re
from os import fsync
from os.path import getsize
reg = re.compile(r'[\d.]+')
def ripl(m):
    g = m.group()
    # the question asks for a threshold of 0.05; ljust pads to the original width
    return ('1' if float(g)<0.05 else '0').ljust(len(g))
path = ...........'
print 'length of file before : %d' % getsize(path)
with open('Copie de tixti.txt','rb+') as f:
    line = 'go'
    while line:
        line = f.readline()
        lg = len(line)
        f.seek(-lg,1)
        f.write(reg.sub(ripl,line))
        f.flush()
        fsync(f.fileno())
print 'length of file after : %d' % getsize(path)
flush() and fsync() must be executed to ensure that the instruction f.write(reg.sub(ripl,line)) actually writes at the moment it is ordered to.
Note that I've never handled a file in a Unicode encoding this way. It's certainly more difficult still, since a Unicode character may be encoded on several bytes (and in the case of UTF-8, a variable number of bytes depending on the character).
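The in-place approach only works because each replacement is padded to the width of the number it replaces, so no byte after it has to move. A minimal sketch of just that padding step, on a string instead of a file; the function name pad_replace is illustrative and the 0.05 threshold is taken from the question.

```python
import re

def pad_replace(line, limit=0.05):
    """Replace each number by '1' or '0', padded with spaces to the number's original width."""
    def ripl(m):
        g = m.group()
        return ('1' if float(g) < limit else '0').ljust(len(g))
    return re.sub(r'\d+(?:\.\d+)?', ripl, line)

line = '0.00254 0.00047 0.54230\n'
new_line = pad_replace(line)
print(repr(new_line))
print(len(line) == len(new_line))
```

The price of this trick is trailing spaces after each flag; if those are unwanted, writing a new file is the simpler option.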
I have a text file in this format:
abc? cdfde" nhj.cde' dfwe-df$sde.....
How can I ignore all the special characters, blanks, numbers, ends of lines, etc. and write only the letters to another file? For example, the above file becomes
abccdfdenhjcdedfwedfsde.....
And from this output file, I
Should be able to read character by character until the end of the file.
Should be able to read two characters at a time, like ab,bc,cc,cd,df,... from the above file
Should be able to read three characters at a time, like abc,bcc,ccd,cdf,... from the above file
First of all, how can I read only the letters and write them to an external file?
I can read character by character using f.read(1) until the end of the file. How can I apply this to read 2 or 3 characters at a time while advancing by only one character (that is, if I have abcd, I should read ab,bc,cd and not ab,cd, which I think f.read(2) would give)? Thanks. I am doing this for cryptanalysis work, to analyze ciphertexts by frequency.
If you need to peek ahead (read a few extra characters at a time), you need a buffered file object. The following class does just that:
import io
class AlphaPeekReader(io.BufferedReader):
    def readalpha(self, count):
        "Read one character, and peek ahead (count - 1) *extra* characters"
        val = [self.read1(1)]
        # Find first alpha character
        while not val[0].isalpha():
            if val == ['']:
                return '' # EOF
            val = [self.read1(1)]
        require = count - len(val)
        peek = self.peek(require * 3) # Account for a lot of garbage
        if peek == '': # EOF
            return val[0]
        for c in peek:
            if c.isalpha():
                require -= 1
                val.append(c)
                if not require:
                    break
        # There is a chance here that there were not 'require' alpha chars in peek
        # Return anyway.
        return ''.join(val)
This attempts to find extra characters beyond the one character you are reading, but doesn't make a guarantee it'll be able to satisfy your requirements. It could read fewer if we are at the end of the file or if there is a lot of non-alphabetic text in the next block.
Usage:
with AlphaPeekReader(io.open(filename, 'rb')) as alphafile:
    alphafile.readalpha(3)
Demo, using a file with your example input:
>>> f = io.open('/tmp/test.txt', 'rb')
>>> alphafile = AlphaPeekReader(f)
>>> alphafile.readalpha(3)
'abc'
>>> alphafile.readalpha(3)
'bcc'
>>> alphafile.readalpha(3)
'ccd'
>>> alphafile.readalpha(10)
'cdfdenhjcd'
>>> alphafile.readalpha(10)
'dfdenhjcde'
To use the readalpha() calls in a loop, where you get each and every character separately plus the next 2 bytes, use iter() with a sentinel:
for alpha_with_extra in iter(lambda: alphafile.readalpha(3), ''):
    # Do something with alpha_with_extra
    pass
To read a line from a file:
import fileinput
text_file = open("Output.txt", "w")
for line in fileinput.input("sample.txt"):
    outstring = ''.join(ch for ch in line if ch.isalpha())
    text_file.write("%s"%outstring)
text_file.close()
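If the cleaned text fits in memory, the overlapping 2- and 3-character reads from the question can be sketched with plain slicing instead of a peeking reader. The names letters_only and ngrams are illustrative, and collections.Counter is used here only as one convenient way to get the frequencies mentioned in the question.

```python
from collections import Counter

def letters_only(text):
    """Keep only alphabetic characters, dropping punctuation, blanks, digits and newlines."""
    return ''.join(ch for ch in text if ch.isalpha())

def ngrams(text, n):
    """Return every run of n consecutive characters, advancing one character at a time."""
    return [text[i:i+n] for i in range(len(text) - n + 1)]

cleaned = letters_only('abc? cdfde" nhj.cde\' dfwe-df$sde')
print(cleaned)
print(ngrams(cleaned, 2)[:5])
print(ngrams(cleaned, 3)[:4])
print(Counter(ngrams(cleaned, 2)).most_common(3))
```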