Few weeks ago I needed a crawler for data collection and sorting so I started learning python.
Same day I wrote a simple crawler but the code looked ugly as hell. Mainly because I don't know how to do certain things and I don't know how to properly google them.
Example:
Instead of deleting [, ] and ' in one line I did
extra_nr = extra_nr.replace("'", '')
extra_nr = extra_nr.replace("[", '')
extra_nr = extra_nr.replace("]", '')
extra_nr = extra_nr.replace(",", '')
Because I couldn't do stuff to list object and when I did str(list object) It looked like ['this', 'and this'].
Now I'm creating discord bot that will upload data that I feed to it to google spreadsheet. The code is long and ugly. And it takes like 2-3 secs to start the bot (idk if this is normal, I think the more I write the more time it takes to start it which makes me think that code is garbage). Sometimes it works, sometimes it doesn't.
My question is how do I know that I wrote something good? And if I just keep adding stuff like in the example, how will it affect my program? If I have a really long code do I split it and call the parts of it only when they are needed or how does it work?
tl;dr to get good at Python and write good code, write a lot of Python and read other people's code. Learn multiple approaches to different problem types and get a feel for which to use and when. It's something that comes over time with a lot of practice. As far as resources, I highly recommend the book "Automate the Boring Stuff with Python".
As for your code sample, you could use translate for this:
def strip(my_string):
bad_chars = [*"[],'"]
return my_string.translate({ord(c): None for c in bad_chars})
translate does a character by character translation of the string given a translation table, so you create a small translation table with the characters you don't want set to None.
The list of characters you don't want is created by unpacking (splatting) a string of the characters.
>>> [*"abc"] == ["a", "b", "c"]
True
Another option would be using comprehensions:
def strip(my_string):
bad_chars = [*"[],'"]
return "".join(c for c in my_string if c not in bad_chars)
Here we use the comprehension format [x for x in y] to build a new list of xs from y, just specifying to drop the character if it appears in bad_chars. We then join the remaining list of characters into a string that doesn't have the specified characters in it.
You will definitely improve quickly from reading (or listening) up on Python best practices from resources like Real Python and Talk Python To Me.
Meanwhile, I'd recommend starting using some code analysers like pylint and bandit as part of your regular workflow.
In any case, welcome to the world of Python and enjoy! :-)
You can use maketrans() to define characters to remove (3rd parameter):
def clean(S): return S.translate(str.maketrans("","","[],'"))
clean("A['23']") # 'A23'
Related
I have a script that processes the output of a command (the aws help cli command).
I step through the output line-by-line and don't start the actual real parsing until I encounter the text "AVAILABLE COMMANDS" at which point I set a flag to true and start further processing on each line.
I've had this working fine - BUT on Ubuntu we encounter a problem which is this :
The CLI highlights the text in a way I have not seen before:
The output is very long, so I've grep'd the particular line in question - see below:
># aws ec2 help | egrep '^A'
>AVAILABLE COMMANDS
># aws ec2 help | egrep '^A' | cat -vet
>A^HAV^HVA^HAI^HIL^HLA^HAB^HBL^HLE^HE C^HCO^HOM^HMM^HMA^HAN^HND^HDS^HS$
What I haven't seen before is that each letter that is highligted is in the format X^HX.
I'd like to apply a simple transformation of the type X^HX --> X (for all a-zA-Z).
What have I tried so far:
well my workaround is this - first I remove control characters like this:
String = re.sub(r'[\x00-\x1f\x7f-\x9f]','',String)
but I still have to search for 'AAVVAAIILLAABBLLEE' which is totally ugly. I considered using a further regex to turn doubles to singles but that will catch true doubles and get messy.
I started writing a function with an iteration across a constructed list of alpha characters to translate as described, and I used hexdump to try to figure out the exact \x code of the control characters in question but could not get it working - I could remove H but not the ^.
I really don't want to use any additional modules because I want to make this available to people without them having to install extras. In conclusion I have a workaround that is quite ugly, but I'm sure someone must know a quick an easy way to do this translation. It's odd that it only seems to show up on Ubuntu.
After looking at this a little further I was able to put in place a solution:
from string import ascii_lowercase
from string import ascii_uppercase
def RemoveUbuntuHighlighting(String):
for Char in ascii_uppercase + ascii_lowercase:
Match = Char + '\x08' + Char
String = re.sub(Match,Char,String)
return(String)
I'm still a little confounded to see characters highlighted in the format (X\x08X), the arrangement does seem to repeat the same information unnecessarily.
The other thing I would advise to anyone not familiar with reading hexcode is that each pair of hexes is swapped around with respect to the order of their appearance.
A much simpler and more reliable fix is to replace a backspace and duplicate of any character.
I have also augmented this to handle underscores using the same mechanism (character, backspace, underscore).
String = re.sub(r'(.)\x08(\1|_)', r'\1', String)
Demo: https://ideone.com/yzwd2V
This highlighting was standard back when output was to a line printer; backspacing and printing the same character again would add pigmentation to produce boldface. (Backspacing and printing an underscore would produce underlining.)
Probably the AWS CLI can be configured to disable this by setting the TERM variable to something like dumb. There is also a utility col which can remove this formatting (try col-b; maybe see also colcrt). Though perhaps really the best solution would be to import the AWS Python code and extract the help message natively.
This is a rather generic question, but I have a textfile that I want to edit using a script.
What are some ways to format text, so that it will visually stand out but still be recognized by my script?
It works fine when I use text_to_be_replaced, but it is hard to find when you have a large file.
Tried searching, and it seems that the common ways are:
%text_to_be_replaced%
<text_to_be_replaced>
$(text_to_be_replaced)
But maybe there is a commonly used/widely accepted way to format text for visibility?
The language the script is written in is python, if that matters... but I'm looking for a more-or-less generic soluting which will work 90% of the time.
I'm not aware of any generic standard here, but if it's meant to be replaced, you can use the new string formatting method as follows:
string = 'some text {add_text_here} some more text'
Then to replace it when you need to:
value = 'formatted'
string = string.format(add_text_here=value)
Now print it out:
>>> string
'some text formatted some more text'
In fact, this quite neat at the addition of curly {brackets} around the text that needs to be replaced also may make it stand out a little.
At first I thought that {{curly braces}} would be fine, but than I went with $ALLCAPS.
First of all, caps really stands out, while lowercase may be confused with the rest of the code.
And while it $REALLYSTANDSOUT, it shouldn't cause any problems, since it's just a "bookmark" in a text file, and will be replaced with the appropriate stuff determined by the script.
I am teaching some neighborhood kids to program in Python. Our first project is to convert a string given as a Roman numeral to the Arabic value.
So we developed an function to evaluate a string that is a Roman numeral the function takes a string and creates a list that has the Arabic equivalents and the operations that would be done to evaluate to the Arabic equivalent.
For example suppose you fed in XI the function will return [1,'+',10]
If you fed in IX the function will return [10,'-',1]
Since we need to handle the cases where adjacent values are equal separately let us ignore the case where the supplied value is XII as that would return [1,'=',1,'+',10] and the case where the Roman is IIX as that would return [10,'-',1,'=',1]
Here is the function
def conversion(some_roman):
roman_dict = {'I':1,'V':5,'X':10,'L':50,'C':100,'D':500,'M',1000}
arabic_list = []
for letter in some_roman.upper():
if len(roman_list) == 0:
arabic_list.append(roman_dict[letter]
continue
previous = roman_list[-1]
current_arabic = roman_dict[letter]
if current_arabic > previous:
arabic_list.extend(['+',current_arabic])
continue
if current_arabic == previous:
arabic_list.extend(['=',current_arabic])
continue
if current_arabic < previous:
arabic_list.extend(['-',current_arabic])
arabic_list.reverse()
return arabic_list
the only way I can think to evaluate the result is to use eval()
something like
def evaluate(some_list):
list_of_strings = [str(item) for item in some_list]
converted_to_string = ''.join([list_of_strings])
arabic_value = eval(converted_to_string)
return arabic_value
I am a little bit nervous about this code because at some point I read that eval is dangerous to use in most circumstances as it allows someone to introduce mischief into your system. But I can't figure out another way to evaluate the list returned from the first function. So without having to write a more complex function.
The kids get the conversion function so even if it looks complicated they understand the process of roman numeral conversion and it makes sense. When we have talked about evaluation though I can see they get lost. Thus I am really hoping for some way to evaluate the results of the conversion function that doesn't require too much convoluted code.
Sorry if this is warped, I am so . . .
Is there a way to accomplish what eval does without using eval
Yes, definitely. One option would be to convert the whole thing into an ast tree and parse it yourself (see here for an example).
I am a little bit nervous about this code because at some point I read that eval is dangerous to use in most circumstances as it allows someone to introduce mischief into your system.
This is definitely true. Any time you consider using eval, you need to do some thinking about your particular use-case. The real question is how much do you trust the user and what damage can they do? If you're distributing this as a script and users are only using it on their own computer, then it's really not a problem -- After all, they don't need to inject malicious code into your script to remove their home directory. If you're planning on hosting this on your server, that's a different story entirely ... Then you need to figure out where the string comes from and if there is any way for the user to modify the string in a way that could make it untrusted to run. Hackers are pretty clever1,2 and so hosting something like this on your server is generally not a good idea. (I always assume that the hackers know python WAY better than I do).
1http://blog.delroth.net/2013/03/escaping-a-python-sandbox-ndh-2013-quals-writeup/
2http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
The only implementation of a safe expression evalulator that I've come across is:
https://pypi.org/project/simpleeval/
It supports a lot of basic Python-ish expressions and is quite restricted in what it allows you to do (so you don't blow up the interpreter or do something evil). It uses the python ast module for parsing, and evaluates the result itself.
Example:
from simpleeval import simple_eval
simple_eval("21 + 21")
Then you can extend it and give it access to the parts of your program that you want to:
simple_eval("x + y", names={"x": 22, "y": 48})
or
simple_eval("do_thing(11)", functions={"do_thing": my_callback})
and so on.
I need some help;
I'm trying to program a sort of command prompt with python
I need to split a text file into lines then split them into strings
example :
splitting
command1 var1 var2;
command2 (blah, bleh);
command3 blah (b bleh);
command4 var1(blah b(bleh * var2));
into :
line1=['command1','var1','var2']
line2=['command2']
line2_sub1=['blah','bleh']
line3=['blah']
line3_sub1=['b','bleh']
line4=['command4']
line4_sub1=['blah','b']
line4_sub2=['bleh','var2']
line4_sub2_operand=['*']
Would that be possible at all?
If so could some one explain how or give me a piece of code that would do it?
Thanks a lot,
It's been pointed out, that there appears to be no reasoning to your language. All I can do is point you to pyparsing, which is what I would use if I were solving a problem similar to this, here is a pyparsing example for the python language.
Like everyone else is saying, your language is confusingly designed and you probably need to simplify it. But I'm going to give you what you're looking for and let you figure that out the hard way.
The standard python file object (returned by open()) is an iterator of lines, and the split() method of the python string class splits a string into a list of substrings. So you'll probably want to start with something like:
for line in command_file
words = line.split(' ')
http://docs.python.org/3/library/string.html
you could use this code to read the file line by line and split it by spaces between words.
a= True
f = open(filename)
while a:
nextline=f.readline()
wordlist= nextline.split("")
print(wordlist)
if nextline=="\n":
a= False
What you're talking about is writing a simple programming language. It's not extraordinarily difficult if you know what you're doing, but it is the sort of thing most people take a full semester class to learn. The fact that you've got multiple different types of lexical units with what looks to be a non-trivial, recursive syntax means that you'll need a scanner and a parser. If you really want to teach yourself to do this, this might not be a bad place to start.
If you simplify your grammar such that each command only has a fixed number of arguments, you can probably get away with using regular expressions to represent the syntax of your individual commands.
Give it a shot. Just don't expect it to all work itself out overnight.
I have a text file with lots of lines and with this structure:
[('name_1a',
'name_1b',
value_1),
('name_2a',
'name_2b',
value_2),
.....
.....
('name_XXXa',
'name_XXXb',
value_XXX)]
I would like to convert it to:
name_1a, name_1b, value_1
name_2a, name_2b, value_2
......
name_XXXa, name_XXXb, value_XXX
I wonder what would be the best way, whether awk, python or bash.
Thanks
Jose
Tried evaluating it python? Looks like a list of tuples to me.
eval(your_string)
Note, it's massively unsafe! If there's code in there to delete your hard disk, evaluating it will run that code!
I would like to use Python:
lines = open('filename.txt','r').readlines()
n = len(lines) # n % 3 == 0
for i in range(0,n,3):
name1 = lines[i].strip("',[]\n\r")
name2 = lines[i+1].strip("',[]\n\r")
value = lines[i+2].strip("',[]\n\r")
print name1,name2,value
It looks like legal Python. You might be able to just import it as a module and then write it back out after formatting it.
Oh boy, here is a job for ast.literal_eval:
(literal_eval is safer than eval, since it restricts the input string to literals such as strings, numbers, tuples, lists, dicts, booleans and None:
import ast
filename='in'
with open(filename,'r') as f:
contents=f.read()
data=ast.literal_eval(contents)
for elt in data:
print(', '.join(map(str,elt)))
here's one way to do it with (g)awk
$ awk -vRS=")," ' { gsub(/\n|[\047\]\[)(]/,"") } 1' file
name_1a,name_1b,value_1
name_2a,name_2b,value_2
name_XXXa,name_XXXb,value_XXX
Awk is typically line oriented, and bash is a shell, with limited numbrer of string manipulation functions. It really depends on where your strength as a programmer lies, but all other things being equal, I would choose python.
Did you ever consider that by redirecting the time it took to post this on SO, you could have had it done?
"AWK is a language for processing
files of text. A file is treated as a
sequence of records, and by default
each line is a record. Each line is
broken up into a sequence of fields,
so we can think of the first word in a
line as the first field, the second
word as the second field, and so on.
An AWK program is of a sequence of
pattern-action statements. AWK reads
the input a line at a time. A line is
scanned for each pattern in the
program, and for each pattern that
matches, the associated action is
executed." - Alfred V. Aho[2]
Asking what's the best language for doing a given task is a very different question to say, asking: 'what's the best way of doing a given task in a particular language'. The first, what you're asking, is in most cases entirely subjective.
Since this is a fairly simple task, I would suggest going with what you know (unless you're doing this for learning purposes, which I doubt).
If you know any of the languages you suggested, go ahead and solve this in a matter of minutes. If you know none of them, now enters the subjective part, I would suggest learning Python, since it's so much more fun than the other 2 ;)
If the values are legal python values, you can take advantage of eval() since your data is a legal python data sucture. The following would work if values are integers, otherwise you might have to massage the print call a bit:
input = """[('name_1a',
'name_1b',
1),
('name_2a',
'name_2b',
2),
('name_XXXa',
'name_XXXb',
3)]"""
for e in eval(input):
print '%s,%s,%d' % e
P.S. using eval() is quite controversial since it will execute any valid python code that you pass into it, so take care.