Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm writing a program that will create a binary tree of the Morse Code alphabet (as well as a period and an apostrophe), and which will then read a line of Morse Code and translate it into English. (Yes, I know that a look-up table would be easier, but I need to sort out my binary trees). I think a good bit of my problem is that I want to put the values into the tree in alphabetical order, rather than by symbol order. But surely there must be a way to do that? Because if I had a million such values that weren't numeric, I wouldn't need to sort them into the simplest order for insertion...right?
It's reading from a text file where each line has one sentence in Morse Code.
- .... .. ...  .. ...  ..-. ..- -. .-.-.- for example, which is "This is fun."
1 space between symbols means it's a new letter, 2 spaces means it's a new word.
As it stands, I'm getting the output ".$$$" for that line given above, which means it's reading a period and then getting an error which is symbolized by ('$$$'), which is obviously wrong...
Like I said before, I know I'm being complicated, but surely there's a way to do this without sorting the values in my tree first, and I'd like to figure this out now, rather than when I'm in a time crunch.
Does anyone have any insight? Is this something so horribly obvious that I should be embarrassed for asking about it?
Welcome to SO and thanks for an interesting question. Yes, it looks to me like you're overcomplicating things a bit. For example, there's absolutely no need to use classes here. You can reuse existing Python data structures to represent a tree:
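def add(node, value, code):
    # Walk (or create) one nested dict per Morse mark, then store the letter.
    if code:
        add(node.setdefault(code[0], {}), value, code[1:])
    else:
        node['value'] = value

# alphabet: an iterable of (letter, morse_code) pairs, defined elsewhere
tree = {}
for value, code in alphabet:
    add(tree, value, code)

import pprint; pprint.pprint(tree)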
This gives you a nested dict with the keys '.', '-', and 'value', which will be easier to work with.
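As a rough sketch of how such a nested dict could then be used for decoding: the snippet below assumes a small hypothetical subset of the Morse alphabet (the question's full table is not shown), one space between letters and two spaces between words. The helpers decode_symbol and decode_line are illustrative names, not part of the original answer, and the '$$$' marker simply mirrors the error symbol mentioned in the question.

```python
# Hypothetical subset of the alphabet, just enough for the sample line.
alphabet = [('T', '-'), ('H', '....'), ('I', '..'), ('S', '...'),
            ('F', '..-.'), ('U', '..-'), ('N', '-.'), ('.', '.-.-.-')]

def add(node, value, code):
    # Same insertion routine as in the answer: one nested dict per mark.
    if code:
        add(node.setdefault(code[0], {}), value, code[1:])
    else:
        node['value'] = value

def decode_symbol(tree, code):
    # Follow one dict level per dot or dash; '$$$' marks an unknown code.
    node = tree
    for mark in code:
        node = node.get(mark)
        if node is None:
            return '$$$'
    return node.get('value', '$$$')

def decode_line(tree, line):
    # Two spaces separate words, one space separates letters.
    words = line.strip().split('  ')
    return ' '.join(
        ''.join(decode_symbol(tree, code) for code in word.split())
        for word in words
    )

tree = {}
for value, code in alphabet:
    add(tree, value, code)

decoded = decode_line(tree, '- .... .. ...  .. ...  ..-. ..- -. .-.-.-')
print(decoded)  # THIS IS FUN.
```

Note that the insertion order of alphabet does not matter here, because each code spells out its own path through the tree.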
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have some invalid characters in my file that I'm trying to remove. But I ran into a strange problem with one of them.
When I try to use the replace function, I get the error SyntaxError: EOL while scanning string literal.
I found that I was dealing with \x1d which is a group separator. I have this code to remove it:
import pandas as pd
df = pd.read_csv('C:/Users/tkp/Desktop/Holdings_Download/dws/example.csv',index_col=False, sep=';', encoding='utf-8')
print(df['col'][0])
df = df['col'][0].encode("utf-8").replace(b"\x1d", b"").decode()
df = pd.DataFrame([x.split(';') for x in df.split('\n')])
print(df[0][0])
Output:
Is there another way to do this? It seems to me that I couldn't do it any worse than this.
Notice that you are getting a SyntaxError. This means that Python never gets as far as actually running your program, because it can't figure out what the program is!
To be honest, I'm not quite sure why this happens in this case, but using "exotic" characters in string constants is always a bit iffy, because it makes you dependent on the character encoding of the source code and puts you at the mercy of all sorts of buggy editors. Therefore, I would recommend using the '\uXXXX' syntax to explicitly write the Unicode code point of the character you wish to replace. (It looks like what you have here is U+2194 LEFT RIGHT ARROW, so '\u2194' should do it.)
Having said that, I would first verify that this is actually the problem, by changing the '↔' bit to something more mundane, like 'x' and seeing whether that causes the same error. If it does, then your problem is somewhere else...
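A minimal sketch of the escape-sequence approach, using a hypothetical sample string rather than the asker's file. It assumes the unwanted characters are the group separator ('\x1d') mentioned in the question and the arrow ('\u2194') mentioned above; raw, cleaned, and unwanted are made-up names.

```python
# Hypothetical sample text containing a group separator (U+001D)
# and a left right arrow (U+2194); not taken from the asker's file.
raw = 'abc\x1d;def\u2194;ghi'

# Escapes keep the source file pure ASCII, so the editor's
# encoding cannot corrupt the string literal.
cleaned = raw.replace('\x1d', '').replace('\u2194', '')

# str.translate removes several characters in one pass:
# a code point mapped to None is deleted.
unwanted = {0x1D: None, 0x2194: None}
cleaned_once = raw.translate(unwanted)

print(cleaned)
print(cleaned_once)
```

Both variants work on plain str objects, so there is no need to encode to bytes and decode again.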
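Specify the encoding when you read the file (as you already do with encoding='utf-8'), and then replace the character on the DataFrame itself. Note that DataFrame.replace takes no encoding argument; pass regex=True so that the character is also removed when it sits inside a longer string:
df = df.replace('\x1d', '', regex=True)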
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
for x in s[:].split():
    s = s.replace(x, x.capitalize())
I want to know how the for loop will progress and what exactly s[:] means and will do?
Assuming s is a string, s[:] is a slice of the whole string, so it is equal to s itself. split() then splits that string on whitespace and returns a list of substrings, and the for loop iterates over that list.
The s[:] is actually unnecessary: split() returns a new list that is built once, before the loop starts, so even though the loop body rebinds s, the iterable is not re-evaluated and there is nothing to protect by copying.
s is very likely a string, because split is a method of str (although s could also be an instance of a user-defined class that supports slicing and has its own split method).
s[:] is ordinary slice syntax with both bounds omitted. The following may help you understand.
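A small sketch of that point, using a made-up sample string: the list produced by split() is built before the first iteration, so rebinding s inside the loop does not disturb it. The join-based variant at the end is only an alternative for comparison, and it assumes words are separated by single spaces (it would collapse longer runs of whitespace).

```python
s = 'abc AB C dd'  # hypothetical sample input

# The full slice changes nothing: both expressions produce the same list.
assert s[:].split() == s.split()

# The iterable is evaluated once, so reassigning s in the body is safe.
for x in s.split():
    s = s.replace(x, x.capitalize())
result = s

# Alternative that avoids repeated replace() calls on the whole string.
alternative = ' '.join(word.capitalize() for word in 'abc AB C dd'.split())

print(result)
print(alternative)
```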
s = 'abc AB C dd'
print(s)
print(s[:])    # same as s
print(s[:3])   # abc
print(s[4:6])  # AB
print(s[-1])   # d
for x in s[:].split():
    s = s.replace(x, x.capitalize())
print(s)  # Abc Ab C Dd (capitalize upper-cases the first letter and lower-cases the rest)
digression
The following is a digression.
I think this is a poor question.
First, it is very basic.
Second, its title is not descriptive.
Note that an ideal Stack Overflow question is specific and narrow -- the idea is to be a huge FAQ.
And now you are asking how a for loop works? A programmer is expected to know that already.
So when you ask a question, think twice about whether its title will help everyone, not only you.
I suggest deleting this question once you understand the answer.
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have a .txt file that contains a really long mRNA sequence. I don't know the exact length of the sequence.
What I need to do is extract the part of the sequence that is valid, meaning it starts with "AUG" and ends in "UAA", "UAG" or "UGA". Since the sequence is so long, I don't know the index of any of the letters or where the valid sequence is.
I need to save the new sequence in another variable.
Essentially, what you need to do, without coding the whole thing for you, is:
Example string:
rnaSequence = 'ACGUAFBHUAUAUAGAAAAUGGAGAGAGAAAAUUUGGGGGGGAAAAAAUAAAAAGGGUAUAUAGAUGAGAGAGA'
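You will want to find the index of 'AUG' and then the index of the first 'UAA', 'UAG', or 'UGA' that follows it. Something like this for the start:
rnaStart = rnaSequence.index('AUG')
Do the same for each stop codon, searching from rnaStart onward, and keep the smallest index you find as rnaEnd.
Then you'll need to assign the slice of the string to a new variable:
rnaSubstring = rnaSequence[rnaStart:rnaEnd+3]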
Which in my string above, returns:
AUGGAGAGAGAAAAUUUGGGGGGGAAAAAAUAA
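The steps above could be sketched as follows. This version makes one assumption the question does not state: the stop codon must lie in the same reading frame as the 'AUG', so the sequence is scanned three letters at a time. The function name extract_orf and the short sample sequence are hypothetical.

```python
STOP_CODONS = ('UAA', 'UAG', 'UGA')

def extract_orf(seq):
    """Return the stretch from the first AUG to the first in-frame stop codon."""
    start = seq.find('AUG')
    if start == -1:
        return None  # no start codon at all
    # Step through the sequence codon by codon, beginning right after AUG.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None  # start codon found, but no in-frame stop codon

# Hypothetical short sequence: AUG at index 2, then GCC, then the stop UAA.
sample = 'CCAUGGCCUAAGG'
orf = extract_orf(sample)
print(orf)  # AUGGCCUAA
```

If the reading frame does not matter for your purposes, searching for each stop codon with str.find(codon, start + 3) and taking the smallest non-negative result would match the description above more literally.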
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
With the following expected input:
[u'able,991', u'about,11', u'burger,15', u'actor,22']
How can I split each string by the comma and return the second half of the string as an int?
This is what I have so far:
def split_fileA(line):
    # split the input line in word and count on the comma
    <ENTER_CODE_HERE>
    # turn the count to an integer
    <ENTER_CODE_HERE>
    return (word, count)
One of the first things you'll need when learning to code is to get to know the functions and types the language gives you natively. Python's built-in functions are a good place to start. Also get into the habit of consulting the documentation for whatever you use. In this case you'll need split and int. split does pretty much what it says: it splits a given string into a list of substrings around a separator, and a quick web search turns up plenty of examples. int, on the other hand, converts a string into an integer (among other things it can do).
In your case, this is what it means:
def split_fileA(line):
    # split the input line in word and count on the comma
    word, count = line.split(',')
    # turn the count to an integer
    count = int(count)
    return (word, count)
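A short usage sketch, applying the function to the sample input from the question. It assumes every string contains exactly one comma; the names lines, pairs, and counts are introduced here for illustration only.

```python
def split_fileA(line):
    # split the input line in word and count on the comma
    word, count = line.split(',')
    # turn the count to an integer
    count = int(count)
    return (word, count)

# Sample input from the question.
lines = [u'able,991', u'about,11', u'burger,15', u'actor,22']

pairs = [split_fileA(line) for line in lines]
counts = [count for _, count in pairs]

print(pairs)
print(counts)  # [991, 11, 15, 22]
```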
You won't get this much here on Stack Overflow, as other users are often reluctant to do your homework for you. It seems to me that you are at the very beginning of learning how to code, so I hope this helps you get started, but remember that learning is also about trial and error.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I tried to get started with Biopython so that I can do my thesis with it, but this really makes me think twice. It shows features as missing; when I tried an integer value it did not work, and the same is the case with a string. Kindly help. Thank you.
Link:
http://imgur.com/87Gw9E5
Biopython seems pretty robust to me, the errors are probably due to your inexperience with it.
You have several errors. One of them is that you forgot to close the string literals before the comma, so the expressions ended up inside the strings. The following lines
print "location start, features[ftNum].location.start # note location.start"
print "feature qualifiers,features[ftNum].qualifiers"
should be corrected to
print "location start", features[ftNum].location.start # note location.start
print "feature qualifiers", features[ftNum].qualifiers
Furthermore, as Wooble pointed out, the condition in your while loop is wrong. I'm guessing you meant to invert the ">", that is, the number of features should be greater than zero.
Please add some example data and error messages.
The guys at Biopython actually made it easy to deal with the features. Your problem is string handling (plain Python). I've used format, but you can use the % operator instead.
Also, in Python you rarely have to keep a counter when looping. Python is not C.
from Bio import SeqIO

for record in SeqIO.parse("NG_009616.gb", "genbank"):
    # You don't have to track the number of features with a while loop.
    # Just loop over all of them.
    for feature in record.features:
        print("Attributes of feature")
        print("Type {0}".format(feature.type))
        print("Start {0}".format(feature.location.start))
        print("End {0}".format(feature.location.end))
        print("Qualifiers {0}".format(feature.qualifiers))
        # This is the right way to extract the sequence:
        print("Sequence {0}".format(feature.location.extract(record).seq))
        # sub_features is only available in older Biopython releases
        print("Sub-features {0}".format(feature.sub_features))
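The two plain-Python points above (loop over the items directly, and build the output with format or %) can be tried without Biopython. The sketch below uses hypothetical dictionaries standing in for features; none of the values come from the asker's GenBank file.

```python
# Hypothetical stand-ins for parsed features.
features = [
    {'type': 'source', 'start': 0, 'end': 120},
    {'type': 'gene', 'start': 10, 'end': 90},
]

lines = []
# enumerate supplies a running number, so no manual counter is needed.
for number, feature in enumerate(features, start=1):
    lines.append('Feature {0}: type {1}, start {2}, end {3}'.format(
        number, feature['type'], feature['start'], feature['end']))

# The % operator gives the same kind of result.
percent_line = 'Type %s' % features[0]['type']

for line in lines:
    print(line)
print(percent_line)
```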