Imagine you have a text file input.txt containing text and floats, but without a regular structure (such as a header, .csv format, etc.), for instance:
Banana 1.4030391
(4.245, -345.2456)
4.245 -345.2456
Hello how are you?
Based on this file, you want to generate output.txt, where each float has been rounded to 1 decimal place and the remaining content is left untouched. This would give
Banana 1.4
(4.2, -345.2)
4.2 -345.2
Hello how are you?
To achieve this in Python, you need the following steps.
Open the input file and read each line
f = open('input.txt')
f.readlines()
Extract the floats
How to proceed? The difficulty lies in the fact that there is no regular structure in the file.
Round the floats
np.round(myfloat)
Write the line to the output file
...
Check this out. Use a regular expression to match floating-point numbers, then replace them.
import re

def check_string(string):
    # Round every match in a single pass, so a number that happens to be
    # a substring of another number is never replaced by accident.
    return re.sub(r"\d+\.\d+", lambda m: str(round(float(m.group()), 1)), string)

with open('input.txt') as f:
    string = f.read()

output = check_string(string)
# Open in 'w' mode so that rerunning the script does not append to old output.
with open("output.txt", "w") as file2:
    file2.write(output)
Since it seems you need ideas for how to extract floats from the text file, I can only contribute an idea.
I think it is simpler to create an empty list and add each word and number to it.
You can split the text into items wherever there is a space or a newline. Then you can loop over the items in the list and check whether each one is a float.
Functions you can use are ".append", ".rstrip", "isinstance()"
The code below DOESN'T extract the float numbers, but you can build on it to split up the items in the text file.
mylines = []  # Declare an empty list.
with open('text.txt', 'rt') as myfile:  # Open the file for reading text.
    for myline in myfile:  # For each line in the file,
        mylines.append(myline.rstrip('\n '))  # strip trailing newlines and spaces, then add to the list.
for element in mylines:
    print(element)
    for item in element:  # iterates over the characters of the line
        print(item)
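The idea above could be sketched with a small helper that tests whether a token parses as a float. This is only a sketch under assumptions: is_float is a hypothetical name, and splitting on whitespace will not separate a token such as "(4.245," from its punctuation, so it illustrates the check rather than solving the whole rounding problem. Also note that isinstance() probably won't help here, because everything read from a text file is a str until you convert it.

```python
def is_float(token):
    """Return True if the token can be parsed as a float (hypothetical helper)."""
    try:
        float(token)
        return True
    except ValueError:
        return False

line = "Banana 1.4030391 and -345.2456"
tokens = line.split()  # split on any run of whitespace
floats = [float(t) for t in tokens if is_float(t)]
print(floats)
```

Tokens with attached punctuation are rejected by float(), which is why the regular-expression approach tends to be the more practical route for unstructured text.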
I am new to the os module.
I've been stuck for hours on this question and would really appreciate any type of help :)
I have a file that contains 6 lines, where each line has 6 numbers separated by commas.
What I want to do is get all of these numbers into a list so that I can later convert them from str to int. My problem is that I can't get rid of "\n".
Here is my code
Thank you
def read_integers(path):
    content_lst=[]
    with open(path,'r') as file:
        content=file.read()
        for i in content.split('\n'):
            content_lst+=i.split(', ')
    return content_lst
Your file actually has multiple lines (in addition to the comma separated numbers). If you want all the numbers in a single "flat" list you'll need to process both line and comma separators. One way to do it simply is to replace all the end of line characters with commas so that you can use split(',') on a single string:
with open(path,'r') as file:
    content_lst = file.read().replace("\n",", ").split(", ")
You could also do it the other way around and change the commas to end of lines. Converting the values to integers can be done using the map function:
with open(path,'r') as file:
    content_lst = file.read().replace(",","\n").split()
content_lst = list(map(int,content_lst))
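If the file might contain blank lines, a trailing newline, or inconsistent spacing around the commas, a slightly more defensive variant may be worth considering. The sketch below works on a string so it can be tried without a file; parse_integers is a hypothetical name, and in practice you would pass it the result of file.read().

```python
def parse_integers(text):
    """Collect every comma-separated integer in text into one flat list."""
    numbers = []
    for line in text.splitlines():      # splitlines() drops the "\n" separators
        for token in line.split(","):
            token = token.strip()       # remove spaces around each number
            if token:                   # skip empty pieces (blank lines, trailing commas)
                numbers.append(int(token))
    return numbers

sample = "1, 2, 3\n4,5, 6\n\n"
print(parse_integers(sample))
```

Skipping empty tokens avoids the ValueError that int('') would raise when the file ends with a newline.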
I have a .txt file with values in it.
The values are listed like so:
Value1
Value2
Value3
Value4
My goal is to put the values in a list. When I do so, the list looks like this:
['Value1\n', 'Value2\n', ...]
The \n is not needed.
Here is my code:
t = open('filename.txt')
contents = t.readlines()
This should do what you want (file contents in a list, by line, without \n)
with open(filename) as f:
    mylist = f.read().splitlines()
I'd do this:
alist = [line.rstrip() for line in open('filename.txt')]
or:
with open('filename.txt') as f:
    alist = [line.rstrip() for line in f]
You can use .rstrip('\n') to only remove newlines from the end of the string:
alist = []
for i in contents:
    alist.append(i.rstrip('\n'))
This leaves all other whitespace intact. If you don't care about whitespace at the start and end of your lines, then the big heavy hammer is called .strip().
However, since you are reading from a file and are pulling everything into memory anyway, better to use the str.splitlines() method; this splits one string on line separators and returns a list of lines without those separators; use this on the file.read() result and don't use file.readlines() at all:
alist = t.read().splitlines()
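To make the difference between these options concrete, here is a small sketch on an in-memory list that stands in for the result of readlines(). The sample values and variable names are made up for illustration.

```python
lines = ["  Value1 \n", "Value2\n", "Value3"]   # stand-in for readlines() output

only_newline = [s.rstrip("\n") for s in lines]  # removes only the trailing newline
trailing = [s.rstrip() for s in lines]          # removes all trailing whitespace
both_ends = [s.strip() for s in lines]          # removes leading and trailing whitespace
via_split = "".join(lines).splitlines()         # splits the whole text on line boundaries

print(only_newline)
print(trailing)
print(both_ends)
print(via_split)
```

For this input, splitlines() gives the same result as rstrip('\n'): the spaces around Value1 survive in both.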
After opening the file, list comprehension can do this in one line:
fh=open('filename')
newlist = [line.rstrip() for line in fh.readlines()]
fh.close()
Just remember to close your file afterwards.
I used the strip function to get rid of the newline character, as splitlines was throwing memory errors on a 4 GB file.
Sample Code:
with open('C:\\aapl.csv','r') as apple:
    for apps in apple.readlines():
        print(apps.strip())
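One caveat: readlines() still builds a list of every line in memory, so for very large files it may be lighter to iterate over the file object directly, which yields one line at a time. A minimal sketch, using io.StringIO as a stand-in for a real file and a hypothetical generator name:

```python
import io

def stripped_lines(file_obj):
    """Yield each line without surrounding whitespace, one line at a time."""
    for line in file_obj:       # file objects yield lines lazily
        yield line.strip()

fake_file = io.StringIO("AAPL,1\nAAPL,2\n")   # stand-in for open('C:\\aapl.csv')
result = list(stripped_lines(fake_file))
print(result)
```

With a real file you would loop over stripped_lines(f) inside the with block instead of collecting everything into a list.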
For each string in your list, use .strip(), which removes whitespace from the beginning and end of the string:
alist = []
for i in contents:
    alist.append(i.strip())
But depending on your use case, you might be better off using something like numpy.loadtxt or even numpy.genfromtxt if you need a nice array of the data you're reading from the file.
with open('bvc.txt') as f:
    alist = list(map(str.rstrip, f))
Nota bene: rstrip() removes whitespace, that is to say: \f, \n, \r, \t, \v and blank,
but I suppose you're only interested in keeping the significant characters in the lines. Then plain map(str.strip, f) will fit better, removing the leading whitespace too.
If you really want to eliminate only the NL \n and CR \r symbols, do:
with open('bvc.txt') as f:
    alist = f.read().splitlines()
splitlines() without an argument doesn't keep the NL and CR symbols (Windows records files with CR NL at the end of lines, at least on my machine) but keeps the other whitespace, notably blanks and tabs.
with open('bvc.txt') as f:
    alist = f.read().splitlines(True)
has the same effect as
with open('bvc.txt') as f:
    alist = f.readlines()
that is to say, the NL and CR are kept.
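A quick way to see what the keepends argument does is to try it on a string that mixes Windows and Unix line endings. The sample text below is invented for the demonstration.

```python
text = "first\r\nsecond\tline \nthird"

without_ends = text.splitlines()       # line endings removed, other whitespace kept
with_ends = text.splitlines(True)      # keepends=True leaves the endings in place

print(without_ends)
print(with_ends)
```

Keep in mind that when a file is opened in text mode, Python 3 by default already translates "\r\n" to "\n" while reading, so you may never see the "\r" unless you open the file with newline='' or in binary mode.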
I had the same problem and I found the following solution to be very efficient. I hope that it will help you or anyone else who wants to do the same thing.
First of all, I would start with a "with" statement, as it ensures the file is properly opened and closed.
It should look something like this:
with open("filename.txt", "r") as f:
    contents = [x.strip() for x in f.readlines()]
If you want to convert those strings (every item in the contents list is a string) to integers or floats, you can do the following:
contents = [float(x) for x in contents]
Use int instead of float if you want to convert to integer.
It's my first answer in SO, so sorry if it's not in the proper formatting.
I recently used this to read all the lines from a file:
alist = open('maze.txt').read().split()
or you can use this for that little bit of extra added safety:
with open('maze.txt') as f:
    alist = f.read().split()
It doesn't work with whitespace in-between text in a single line, but it looks like your example file might not have whitespace splitting the values. It is a simple solution and it returns an accurate list of values, and does not add an empty string: '' for every empty line, such as a newline at the end of the file.
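The trade-off described above is easy to see side by side. In this sketch the sample text is made up; it contains a value with an inner space and an empty line.

```python
text = "Value1\nValue 2\n\nValue3\n"

by_whitespace = text.split()     # splits on any run of whitespace
by_line = text.splitlines()      # splits on line boundaries only

print(by_whitespace)
print(by_line)
```

split() breaks "Value 2" into two items but silently drops the empty line, while splitlines() keeps "Value 2" whole and reports the empty line as ''.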
with open('D:\\file.txt', 'r') as f1:
    lines = f1.readlines()
lines = [s[:-1] for s in lines]
The easiest way to do this is to write file.readline()[0:-1]
This reads one line and drops its last character, which is the newline (unless the final line of the file does not end with one, in which case a real character is lost).
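Slicing off the last character is only safe when every line really ends with a newline. The sketch below, with invented sample data, shows what happens when the final line does not, and how rstrip('\n') avoids the problem.

```python
lines = ["Value1\n", "Value2\n", "Value3"]   # the last line has no trailing newline

sliced = [s[:-1] for s in lines]             # blindly drops the last character
stripped = [s.rstrip("\n") for s in lines]   # drops it only if it is a newline

print(sliced)
print(stripped)
```

The sliced version turns "Value3" into "Value", so rstrip('\n') is probably the safer habit.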
I have a list of sample codes which I input into a website to get information about each of them (they are codes for stars, but it doesn't matter what the codes are, they are just long strings of numbers). All these numbers are in one column, one number per row. The website I need to input this file into expects the numbers to remain in a column, but with a comma after each number. This is an example:
Instead of:
164891738509173
184818483848283
18483943491u385
It's supposed to look like this:
164891738509173,
184818483848283,
18483943491u385,
I wanted to write a quick Python script to do that automatically for each number in the entire column. How do I do that? In theory I could do it manually if the number of stars were small, but unfortunately I need to input something like 60000 stars into the website (so 60000 of these numbers), so doing it manually is out of the question.
Very simple:
open('output.txt', 'w').writelines(  # open 'output.txt' for writing and write multiple lines
    line.rstrip('\n') + ',\n'        # append comma to each line
    for line in open('input.txt')    # read lines with numbers from 'input.txt'
)
You could do it more idiomatically and use a with block, but that's probably overkill for such a small task:
with open('input.txt') as In, open('output.txt', 'w') as Out:
    for line in In:
        Out.write(line.rstrip('\n') + ',\n')
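If the input might contain blank lines or stray spaces, you may not want a lone comma on an empty row. A possible variation, written as a function on a string so it can be tested without files; add_trailing_commas is a hypothetical name and skipping blank lines is my assumption about what the website expects.

```python
def add_trailing_commas(text):
    """Append a comma to every non-blank line of text."""
    return "".join(
        line.strip() + ",\n"             # trim spaces, then add the comma
        for line in text.splitlines()
        if line.strip()                  # skip blank lines
    )

sample = "164891738509173\n\n184818483848283 \n18483943491u385"
print(add_trailing_commas(sample), end="")
```

You would feed it the result of open('input.txt').read() and write the returned string to the output file.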
Is this what you want?
If you want to add a comma at the end of every entry while printing, you can do this:
>>> codes = ['164891738509173', '184818483848283', '18483943491u385']
>>> for code in codes:
...     print(code, end=',\n')
...
164891738509173,
184818483848283,
18483943491u385,
To add a comma to every item within the list,
>>> end_comma = [f"{code}," for code in codes]
>>> end_comma
['164891738509173,', '184818483848283,', '18483943491u385,']
I have a rather large text document and would like to replace all instances of hexadecimals inside with regular decimals. Or if possible convert them into text surrounded by '' e.g. 'I01A' instead of $49303141
The hexadecimals are currently marked by starting with $ but I can ctrl+F change that into 0x if that helps, and I need the program to detect the end of the number since some are short $A, while others are long like $568B1F
How could I do this with python, or is it not possible?
Thank you for the help thus far, hoping to clarify my request a bit more to hopefully get a complete solution.
I used a version of Grismar's answer and the output it gives me is
"if not (GetItemTypeId(GetSoldItem())==I0KB) then
set int1= 2+($3E8*3)"
However, I would like to add the ' around the newly created text and convert hex strings shorter than 8 digits to decimals instead, so the output becomes
"if not (GetItemTypeId(GetSoldItem())=='I0KB') then
set int1= 2+(1000*3)"
Hoping for some more help to get the rest of the way.
def hex2dec(s):
    return int(s,16)
was my attempt to convert the shorter hexadecimals to decimal, but it clearly has not worked; it throws syntax errors instead.
Also, I will manually deal with the few $ not used to denote a hexadecimal.
import re

# just creating an example file
with open(r'D:\Deprotect\wc3\mpq editor\Work\new 4.txt', 'w') as f:
    f.write('if not (GetItemTypeId(GetSoldItem())==$49304B42) then\n')
    f.write('set int1= 2+($3E8*3)\n')

def hex_match_to_string(m):
    return ''.join([chr(int(m.group(1)[i:i+2], 16)) for i in range(0, len(m.group(1)), 2)])

def hex2dec(s):
    return int(s,16)

# open the file for reading
with open(r'D:\Deprotect\wc3\mpq editor\Work\new 4.txt', 'r') as file_in:
    # open the same file again for reading and writing
    with open(r'D:\Deprotect\wc3\mpq editor\Work\new 4.txt', 'r+') as file_out:
        # start writing at the start of the existing file, overwriting the contents
        file_out.seek(0)
        while True:
            line = file_in.readline()
            if line == '':
                # end of file
                break
            # replace the parts of the string matching the regex
            line = re.sub(r'\$((?:\w\w\w\w\w\w\w\w)+)', hex_match_to_string, line)
            #line = re.sub(r'$\w+', hex2dec,line)
            file_out.write(line)
        # the resulting file is shorter, truncate it from the current position
        file_out.truncate()
See the answer https://stackoverflow.com/a/12597709/1780027 for how to use re.sub to replace specific content of a string with the output of a function. Using this, you could presumably use the int("FFFF", 16) code snippet you're talking about to perform the action you desire.
E.g.:
>>> import re
>>> def replace(match):
...     match = match.group(1)
...     return str(int(match, 16))
...
>>> sample = "here's a hex $49303141 and there's another $A34B and another $8FD0B"
>>> re.sub(r'\$([a-fA-F0-9]+)', replace, sample)
"here's a hex 1227895105 and there's another 41803 and another 589067"
Since you are replacing parts of the file with something that's shorter, you can write to the same file you're reading. But keep in mind that, if you were replacing those parts with something that was longer, you would need to write the result to a new file and replace the old file with the new file once you were done.
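For the case where the replacement could be longer than the original, one common pattern is to write to a temporary file in the same directory and then swap it in. The sketch below is an illustration only: rewrite_file and its transform parameter are hypothetical names, and it assumes the file fits the platform's default text encoding.

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Write transform(line) for every line to a temp file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            dst.write(transform(line))
    # Both files are closed here; replace the original with the new file.
    os.replace(dst.name, path)
```

Usage would look like rewrite_file('test.txt', lambda line: line.replace('a', 'aaa')). Because the original is only replaced at the very end, a crash halfway through leaves it untouched.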
Also, from your description, it appears you are reading a text file, which makes reading the file line by line the easiest, but if your file was some sort of binary file, using re wouldn't be as convenient and you'd probably need a different solution.
Finally, your question doesn't mention whether $ might also appear elsewhere in the text file (not just in front of pairs of characters that should be read as hexadecimal numbers). This answer assumes $ only appears in front of strings of 2-character hexadecimal numbers.
Here's a solution:
import re

# just creating an example file
with open('test.txt', 'w') as f:
    f.write('example line $49303141\n')
    f.write('$49303141 example line, with more $49303141\n')
    f.write('\n')
    f.write('just some text\n')

def hex_match_to_string(m):
    return ''.join([chr(int(m.group(1)[i:i+2], 16)) for i in range(0, len(m.group(1)), 2)])

# open the file for reading
with open('test.txt', 'r') as file_in:
    # open the same file again for reading and writing
    with open('test.txt', 'r+') as file_out:
        # start writing at the start of the existing file, overwriting the contents
        file_out.seek(0)
        while True:
            line = file_in.readline()
            if line == '':
                # end of file
                break
            # replace the parts of the string matching the regex
            line = re.sub(r'\$((?:\w\w)+)', hex_match_to_string, line)
            file_out.write(line)
        # the resulting file is shorter, truncate it from the current position
        file_out.truncate()
The regex is simple: r'\$((?:\w\w)+)' matches any string starting with an actual $ (the backslash avoids it being interpreted as 'the end of the string') and followed by 1 or more (+) pairs of letters and numbers (\w\w). Note that \w also matches characters that are not hexadecimal digits, such as g or _.
The function hex_match_to_string(m) expects a regex match object and loops over pairs of characters in the first matched group. Each pair is turned into its decimal value by interpreting it as a hexadecimal string (int(pair, 16)) and that decimal value is then turned into a character with that ASCII value (chr(value)). All the resulting characters are joined into a single string (''.join(list)).
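The same pair-by-pair conversion can probably also be expressed with the built-in bytes.fromhex. This is a sketch with one assumption that differs from the code above: the pattern is restricted to real hexadecimal digits, because bytes.fromhex raises ValueError on anything else, whereas \w\w would also match non-hex characters.

```python
import re

def hex_match_to_string(m):
    # bytes.fromhex parses pairs of hex digits into bytes; latin-1 maps each
    # byte to the character with the same code point, like chr(int(pair, 16)).
    return bytes.fromhex(m.group(1)).decode("latin-1")

line = "example line $49303141"
converted = re.sub(r"\$((?:[0-9A-Fa-f]{2})+)", hex_match_to_string, line)
print(converted)
```

For the sample line this prints "example line I01A", the same text the chr-based version produces.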
A different way of writing hex_match_to_string(m):
def hex_match_to_string(m):
    hex_nums = iter(m.group(1))
    return ''.join([chr(int(a, 16) * 16 + int(b, 16)) for a, b in zip(hex_nums, hex_nums)])
This may perform a bit better, since it avoids manipulating strings, but it does the same thing.
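Putting the ideas from this thread together, the updated request (8-digit values become quoted text, shorter values become decimals) could be sketched with a single replacement function. The names convert_hex, convert_line and HEX_PATTERN are hypothetical, and treating exactly 8 hex digits as text and everything else as a number is my reading of the question, not a confirmed rule.

```python
import re

HEX_PATTERN = re.compile(r"\$([0-9A-Fa-f]+)")

def convert_hex(match):
    digits = match.group(1)
    if len(digits) == 8:
        # four bytes -> four characters, wrapped in single quotes
        chars = "".join(chr(int(digits[i:i + 2], 16)) for i in range(0, 8, 2))
        return "'" + chars + "'"
    # any other length is treated as a plain number (assumption)
    return str(int(digits, 16))

def convert_line(line):
    return HEX_PATTERN.sub(convert_hex, line)

print(convert_line("if not (GetItemTypeId(GetSoldItem())==$49304B42) then"))
print(convert_line("set int1= 2+($3E8*3)"))
```

Since the decimal form can be longer than the hex form (for example $A becomes 10 only because the $ is dropped, but larger values grow), it is probably safer to write the result to a new file rather than overwriting in place.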