Replacing strings in a .data file - python

I have a seemingly simple problem. I have a dataset: archive.ics.uci.edu/ml/machine-learning-databases/acute/diagnosis.data
and I want to replace every "no" with "0" and every "yes" with "1".
I have tried this code:
fString = open("diagnosis.data","r")
fBool = open("diagnosis1.txt","w")
for line in fString:
    line.replace("no","0")
    line.replace("yes","1")
    fBool.write(line)
fString.close()
fBool.close()
The only thing that happens is that the last yes/no gets an ਍ഀ appended. I don't know why it's not working.

Since replace returns the modified string you need to assign it. The original is left untouched. I guess you need:
with open("diagnosis.data", "r") as fString, open("diagnosis1.txt", "w") as fBool:
    for line in fString:
        nline = line.replace("no", "0")
        nline = nline.replace("yes", "1")
        fBool.write(nline)

Iterating over the file object returned by open() already yields each line as a string, so for line in fString: is fine. If you prefer to work with a list of lines, you can also read the whole file and split it:
fString = open("diagnosis.data","r")
lines = fString.read().splitlines()
fBool = open("diagnosis1.txt","w")
for line in lines:
    newLine = line.replace("no","0")
    newLine = newLine.replace("yes","1")
    fBool.write(newLine + "\n")  # splitlines() drops the line endings, so add one back
fString.close()
fBool.close()
This approach gets a list of strings, each of which is a line of the file, and then iterates through that. Either way, you need to make sure you use the .replace() method correctly, because it returns the new string and doesn't modify the original string.

The .replace string method returns a new string with the replacements applied, but it doesn't change the original object, so:
>>> k="lolo"
>>> k.replace('l','k')
'koko'
>>> k
'lolo'
>>> k=k.replace('l','k')
>>> k
'koko'
What you want is:
line=line.replace("no","0")
or with an auxiliary variable:
aux=line.replace("no","0").replace("yes","1")
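Putting the answers above together, here is a minimal sketch that converts whole fields only, so that a "no" inside a longer token is left alone. It assumes the columns are tab-separated; the names convert_line and MAPPING are made up for illustration. The stray characters mentioned in the question hint that the file might be UTF-16 rather than ASCII; if so, passing encoding="utf-16" to open() may be needed, but that is a guess and not something the thread confirms.

```python
# Hypothetical helper: map whole tab-separated fields, not substrings.
MAPPING = {"no": "0", "yes": "1"}

def convert_line(line, sep="\t"):
    """Return the line with every field equal to 'no'/'yes' replaced by '0'/'1'."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]          # keep the original line ending
    fields = [MAPPING.get(field, field) for field in body.split(sep)]
    return sep.join(fields) + ending

# Sample rows are invented for the demonstration.
sample = ["35,5\tno\tyes\tno\n", "41,5\tyes\tnonsense\tyes\n"]
converted = [convert_line(line) for line in sample]
print(converted)
```

With a real file, the same function could be called on each line inside the with open(...) loop from the first answer.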

Related

How to split string with multiple delimiters in Python?

My input string is:
xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv
But I want a result like this:
bonding_err_bond0-if_eth2
I tried some code, but it does not seem to work correctly:
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
x = csv.rsplit('.', 4)[2]
print(x)
But the result I get is com-bonding_err_bond0-if_eth2-d, while what I want is bonding_err_bond0-if_eth2.
If you are allowed to use a solution other than regex,
you can break the problem into smaller parts to understand it better, and learn about join if you are not already aware of it. It will come in handy.
solution= '-'.join(csv.split('.', 4)[2].split('-')[1:3])
Thanks,
Shashank
You have probably got your answer already, but if you want a more generic method for any string data, you can do the following.
This way you won't be restricted to one string, and you can loop over the data as well.
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
first_index = csv.find("-")
second_index = csv.find("-d")
result = csv[first_index+1:second_index]
print(result)
# OUTPUT:
# bonding_err_bond0-if_eth2
You can just separate the string with -, remove the beginning and end, and then join them back into a string.
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
x = '-'.join(csv.split('-')[1:-1])
Output:
>>> x
'bonding_err_bond0-if_eth2'
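The "drop the first and last dash-separated piece" idea can also be sketched with str.partition and str.rpartition, which avoid hard-coding index positions. The function name strip_outer_fields is hypothetical, and the sketch assumes the wanted part always sits between the first and the last separator.

```python
def strip_outer_fields(name, sep="-"):
    """Drop everything up to the first sep and from the last sep onward."""
    _, _, rest = name.partition(sep)       # text after the first separator
    middle, _, _ = rest.rpartition(sep)    # text before the last separator
    return middle

csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
print(strip_outer_fields(csv))
```

If the string contains fewer than two separators, this sketch returns an empty string instead of raising an error.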

Get a value from a string in python

Program Details:
I am writing a program for python that will need to look through a text file for the line:
Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.
Problem:
Then after the program has found that line, it will then store the line into an array and get the value 19.612545, from f = 19.612545.
Question:
So far I have been able to store the line in an array once I have found it. However, I am not sure what to use to search through the stored string and extract the value of f. Does anyone have any suggestions or tips on how to accomplish this?
Depending upon how you want to go at it, CosmicComputer is right to refer you to Regular Expressions. If your syntax is this simple, you could always do something like:
line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
splitByComma=line.split(',')
fValue = splitByComma[1].replace('f= ', '').strip()
print(fValue)
Results in 19.612545 being printed (still a string though).
Split your line by commas, grab the 2nd chunk, and break out the f value. Error checking and conversions left up to you!
Using regular expressions here is madness. Just use string.find as follows (where string is the name of the variable that holds your string):
index = string.find('f=')
index = index + 2  # skip over 'f='
string = string[index:]  # cut off everything before the value
string = string.split(',')  # split the remaining text at commas
your_value = string[0].strip()  # first field, minus the leading space
I know it's ugly, but it's nothing compared with RE.
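For comparison, here is roughly what the regular-expression route mentioned above might look like. This is only a sketch: the helper name extract_value and the NUMBER pattern are illustrative, and it assumes each value is written as name= followed by an ordinary decimal or scientific-notation number.

```python
import re

# Matches an integer, decimal, or scientific-notation number.
NUMBER = r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?"

def extract_value(line, name):
    """Return the number that follows 'name=' in line as a float, or None."""
    match = re.search(r"\b" + re.escape(name) + r"=\s*(" + NUMBER + ")", line)
    return float(match.group(1)) if match else None

line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
print(extract_value(line, "f"))
```

Unlike the split-based versions, this returns a float directly and gives None when the field is missing.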

Python program treating dictionary like a string

I set up a dictionary, and filled it from a file, like so:
filedusers = {} # cheap way to keep track of users, not for production
FILE = open(r"G:\School\CS442\users.txt", "r")
filedusers = ast.literal_eval("\"{" + FILE.readline().strip() + "}\"")
FILE.close()
then later I did a test on it, like this:
if not filedusers.get(words[0]):
where words[0] is a string for a username, but I get the following error:
'str' object has no attribute 'get'
but I verified already that after the FILE.close() I had a dictionary, and it had the correct values in it.
Any idea what's going on?
literal_eval takes a string, and converts it into a python object. So, the following is true...
ast.literal_eval('{"a" : 1}')
>> {'a' : 1}
However, you are adding in some quotations that aren't needed. If your file simply contained an empty dictionary ({}), then the string you create would look like this...
ast.literal_eval('"{}"') # The quotes that are here make it return the string "{}"
>> '{}'
So, the solution would be to change the line to...
ast.literal_eval("{" + FILE.readline().strip() + "}")
...or...
ast.literal_eval(FILE.readline().strip())
...depending on your file layout. Otherwise, literal_eval sees your string as an ACTUAL string because of the quotes.
>>> import ast
>>> username = "asd: '123'"
>>> filedusers = ast.literal_eval("\"{" + username + "}\"")
>>> print filedusers, type(filedusers)
{asd: '123'} <type 'str'>
You don't have a dictionary, it just looks like one. You have a string.
Python is dynamically typed: it does not require you to define variables as a specific type. And it lets you define variables implicitly. What you are doing is defining filedusers as a dictionary, and then redefining it as a string by assigning the result of ast.literal_eval to it.
EDIT: You need to remove those quotes. ast.literal_eval('"{}"') evaluates to a string. ast.literal_eval('{}') evaluates to a dictionary.
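A small sketch of the fix, with a type check so the mistake shows up where the file is read and not later at .get(). The helper name parse_users and the sample line are hypothetical; the real layout of users.txt is not shown in the question, so the sketch assumes one line of quoted key/value pairs, with or without surrounding braces.

```python
import ast

def parse_users(line):
    """Parse a line such as '"alice": "pw1", "bob": "pw2"' into a dict."""
    text = line.strip()
    if not text.startswith("{"):
        text = "{" + text + "}"    # wrap bare key/value pairs in braces
    users = ast.literal_eval(text)
    if not isinstance(users, dict):
        raise ValueError("expected a dict literal, got %s" % type(users).__name__)
    return users

users = parse_users('"alice": "pw1", "bob": "pw2"\n')
print(users.get("alice"))
```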

Explicit line joining in Python

I am reading a file, line-by-line and doing some text processing in order to get output in a certain format
My string processing code goes as follows:
file1=open('/myfolder/testfile.txt')
scanlines=file1.readlines()
string = ''
for line in scanlines:
    if line.startswith('>from'):
        continue
    if line.startswith('*'):
        continue
    string.join(line.rstrip('\n'))
The output of this code is as follows:
abc
def
ghi
Is there a way to join these physical lines into one logical line, e.g:
abcdefghi
Basically, how can I concatenate multiple strings into one large string?
If I was reading from a file with very long strings is there the risk of an overflow by concatenating multiple physical lines into one logical line?
There are several ways to do this. For example, just using + should do the trick:
"abc" + "def" # produces "abcdef"
If you try to concatenate multiple strings you can do this with the join method:
', '.join(('abc', 'def', 'ghi')) # produces 'abc, def, ghi'
If you want no delimiter, call join on the empty string: ''.join(...).
Cleaning things up a bit, it would be easiest to append to a list and then return the joined result:
def joinfile(filename):
    sarray = []
    with open(filename) as fd:
        for line in fd:
            if line.startswith('>from') or line.startswith('*'):
                continue
            sarray.append(line.rstrip('\n'))
    return ''.join(sarray)
If you wanted to get really cute you could also do the following:
with open(filename) as fd:
    # named joined rather than str, so the built-in str type is not shadowed
    joined = ''.join([line.rstrip('\n') for line in fd if not (line.startswith('>from') or line.startswith('*'))])
Yes, of course: you could read a file big enough to exhaust memory.
Use string addition
>>> s = 'a'
>>> s += 'b'
>>> s
'ab'
I would prefer:
from functools import reduce  # built in on Python 2, needs this import on Python 3
oneLine = reduce(lambda x, y: x + y,
                 [line.rstrip('\n') for line in open('/myfolder/testfile.txt')
                  if not line.startswith('>from') and
                  not line.startswith('*')])
line.rstrip('\n') removes the trailing \n from each line (line[:-1] would also cut a real character from a last line that has no newline).
The second argument of reduce is a list comprehension that extracts all the lines you are interested in and removes the \n from them.
The reduce call (only if you actually need it) then makes one string from the list of strings.
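As a compact variant of the answers above, the filtering and joining can be sketched with a generator expression and the fact that str.startswith accepts a tuple of prefixes. The function name join_lines and the sample data are made up for illustration; any iterable of lines, including an open file, should work the same way.

```python
def join_lines(lines, skip_prefixes=(">from", "*")):
    """Concatenate lines without their newlines, skipping lines with the given prefixes."""
    return "".join(
        line.rstrip("\n")
        for line in lines
        if not line.startswith(skip_prefixes)   # startswith accepts a tuple
    )

sample = [">from header\n", "abc\n", "* comment\n", "def\n", "ghi"]
print(join_lines(sample))
```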

Workarounds when a string is too long for a .join. OverflowError occurs

I'm working through some python problems on pythonchallenge.com to teach myself python and I've hit a roadblock, since the string I am to be using is too large for python to handle. I receive this error:
my-macbook:python owner1$ python singleoccurrence.py
Traceback (most recent call last):
  File "singleoccurrence.py", line 32, in <module>
    myString = myString.join(line)
OverflowError: join() result is too long for a Python string
What alternatives do I have for this issue? My code looks like such...
#open file testdata.txt
#for each character, check if already exists in array of checked characters
#if so, skip.
#if not, character.count
#if count > 1, repeat recursively with first character stripped off of page.
# if count = 1, add to valid character array.
#when string = 0, print valid character array.
valid = []
checked = []
myString = ""
def recursiveCount(bigString):
    if len(bigString) == 0:
        print "YAY!"
        return valid
    myChar = bigString[0]
    if myChar in checked:
        return recursiveCount(bigString[1:])
    if bigString.count(myChar) > 1:
        checked.append(myChar)
        return recursiveCount(bigString[1:])
    checked.append(myChar)
    valid.append(myChar)
    return recursiveCount(bigString[1:])
fileIN = open("testdata.txt", "r")
line = fileIN.readline()
while line:
    line = line.strip()
    myString = myString.join(line)
    line = fileIN.readline()
myString = recursiveCount(myString)
print "\n"
print myString
string.join doesn't do what you think. join is used to combine a list of words into a single string with the given separator, e.g.:
>>> ",".join(('foo', 'bar', 'baz'))
'foo,bar,baz'
The code snippet you posted will attempt to insert myString between every character in the variable line. You can see how that will get big quickly :-). Are you trying to read the entire file into a single string, myString? If so, the way you want to concatenate the strings is like this:
myString = myString + line
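To see why the accumulated string blows up, here is a tiny sketch that repeats the acc = acc.join(line) pattern on three short lines and records the length after each step. The function name buggy_accumulate and the sample lines are made up for the demonstration.

```python
def buggy_accumulate(lines):
    """Repeat the question's acc = acc.join(line) pattern and record the lengths."""
    acc = ""
    lengths = []
    for line in lines:
        acc = acc.join(line)   # inserts the old acc between every character of line
        lengths.append(len(acc))
    return acc, lengths

result, lengths = buggy_accumulate(["abc", "def", "ghi"])
print(result)
print(lengths)
```

Each step multiplies the length by roughly the number of characters in the new line, so a file with many long lines gets out of hand quickly.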
While I'm here... since you're learning Python here are some other suggestions.
There are easier ways to read an entire file into a variable. For instance:
fileIN = open("testdata.txt", "r")
myString = fileIN.read()
(This won't have the exact behaviour of your existing strip() code, but may in fact do what you want.)
Also, I would never recommend that practical Python code use recursion to iterate over a string. Your code will make a function call (and a stack entry) for every character in the string, so a long input will run into Python's recursion limit. In addition, each bigString[1:] slice creates a new string in memory that's a copy of the original without the first character. The simplest way to process every character in a string is:
for mychar in bigString:
    ... do your stuff ...
Finally, you are using the list named "checked" to see if you've ever seen a particular character before. But the membership test on lists ("if myChar in checked") is slow, because it scans the whole list. In Python you're better off using a set (or a dictionary), where the lookup takes roughly constant time:
checked = set()
...
if myChar not in checked:
    checked.add(myChar)
...
This exercise you're doing is a great way to learn several Python idioms.
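Combining these suggestions, one possible non-recursive version counts every character in a single pass and then keeps those seen exactly once. This is a sketch of one reading of the question's goal (characters that occur only once in the text), not the challenge's official solution; the function name single_occurrence_chars is hypothetical.

```python
from collections import Counter

def single_occurrence_chars(text):
    """Return the characters that appear exactly once, in their original order."""
    counts = Counter(text)                      # one pass to count every character
    return [ch for ch in text if counts[ch] == 1]

print(single_occurrence_chars("aabcdd"))
```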
