Literate Python [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I liked the idea of Literate CoffeeScript and wanted to see if I could get something similar working with Python. I tried to accomplish something like this simple Literate Ruby and ended up with the following. The program reads Literate Python from standard input, a file, or a list of files and executes just the code.
import fileinput
import re

code = ''
for line in fileinput.input():
    # Keep only lines indented by four spaces or a tab, minus that indent
    match = re.match(r'([ ]{4}|\t)(.*)', line)
    if match:
        code += match.group(2) + '\n'
exec(code)
A simple Literate Python file:
# Python Hello World Program
A simple example of a Literate Python Hello World program.
    print("hello world")
I'm new to Python and wanted to know if I'm missing something or if there is a better solution.

As I said in a comment, eval and exec are evil; a better alternative is to use the code module, as in the following example. You may also prefer to compile the regex once up front, so that each iteration over the input only performs the matching instead of handing the pattern string to re again (re caches compiled patterns, so the saving is small).
import code
import fileinput
import re

console = code.InteractiveConsole(locals())
pattern = re.compile(r'([ ]{4}|\t)(.*)')
for line in fileinput.input():
    match = pattern.match(line)
    if match:
        # Feed each extracted code line to the console, minus its indent
        console.push(match.group(2))
Note that this example writes its results to sys.stdout, so if you need to capture them you may want to use a subclass that overrides the output handling, such as the one in this example.
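A minimal sketch of the capture idea, assuming Python 3. It does not subclass InteractiveConsole; instead it swaps in contextlib.redirect_stdout and runs the extracted code through code.InteractiveInterpreter. The helper names extract_code and run_literate and the sample lines are made up for illustration.

```python
import code
import contextlib
import io
import re

# Same rule as above: a code line starts with four spaces or a tab.
CODE_LINE = re.compile(r'(?: {4}|\t)(.*)')


def extract_code(lines):
    """Return the indented lines, minus their indent, as one source string."""
    kept = []
    for line in lines:
        match = CODE_LINE.match(line.rstrip('\n'))
        if match:
            kept.append(match.group(1))
    return '\n'.join(kept) + '\n'


def run_literate(lines):
    """Run the extracted code and return whatever it printed to stdout."""
    source = extract_code(lines)
    interpreter = code.InteractiveInterpreter({})
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        interpreter.runcode(compile(source, '<literate>', 'exec'))
    return buffer.getvalue()


# Hypothetical literate input, mirroring the hello world file above.
sample = [
    "# Python Hello World Program\n",
    "A simple example.\n",
    "    print('hello world')\n",
]
captured = run_literate(sample)
print(repr(captured))
```

Compiling the whole extracted source at once sidesteps the line-by-line push, which would need a blank line to close an indented block such as a loop body.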

Combining Python and Markdown-like syntax is best done using tools rather than changing the language. For example:
Sphinx (renders output from reStructuredText in docstrings or other files, both of which may have embedded code samples)
IPython Notebook (a combination of "cells" containing either Markdown or Python code)

Related

Is there any way to retrieve file name using Python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
In a Linux directory, I have several numbered files, such as "day1" and "day2". My goal is to write code that reads the numbers from the file names, adds 1 to the biggest number, and creates a new file with that number. So, for example, if there are files 'day1', 'day2' and 'day3', the code should read the list of files and add 'day4'. To do so, I at least need to know how to retrieve the numbers from the file names.
I'd use os.listdir to get all the file names, remove the "day" prefix, convert the remaining characters to integers, and take the maximum.
From there, it's just a matter of incrementing the number and appending it to the same prefix:
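import os

# Only consider names of the form "day<number>"; anything else would break int()
numbers = [int(f[3:]) for f in os.listdir('some_directory')
           if f.startswith('day') and f[3:].isdigit()]
new_file = 'day' + str(max(numbers) + 1)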
Get all file names with the os module (os.listdir) and then use a regex (the re module) to extract the numbers. If you don't want to look into regex, you could remove the letters from each name with replace() and convert what is left with int().
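The regex route could look roughly like this. It is only a sketch: the function name next_name and the sample list of names are invented for illustration, and the names are passed in as a list so the logic can be tried without touching a real directory.

```python
import re


def next_name(names, prefix='day'):
    """Return prefix + (largest existing number + 1), ignoring other names."""
    pattern = re.compile(re.escape(prefix) + r'(\d+)')
    # fullmatch rejects names such as "daydream" or "day0dream"
    matches = (pattern.fullmatch(name) for name in names)
    numbers = [int(m.group(1)) for m in matches if m]
    return prefix + str(max(numbers, default=0) + 1)


print(next_name(['day1', 'day2', 'day10', 'daydream', 'notes.txt']))
```

In real use the list would come from os.listdir. Converting to int before taking the maximum matters: compared as strings, 'day2' would sort after 'day10'.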
Glob would be good for this. It is a kind of pattern matching like regex, but simpler and made specifically for file search. Basically you just use * as a wildcard, and you can select digits too with [0-9]. Just google what exactly it is. It can be pretty powerful and is native to the bash shell, for example.
from glob import glob
from pathlib import Path

pattern = "day"
# "[0-9]*" requires one digit right after the prefix, then anything
last_file_number = max(int(f[len(pattern):]) for f in glob(pattern + "[0-9]*"))
Path("%s%d" % (pattern, last_file_number + 1)).touch()
You can also see that I use pathlib here. This is a library to deal with the file system in an OOP manner. Some people like it, some don't.
So, a little disclaimer: glob is not as powerful as regex. Here daydream, for example, won't be matched, but day0dream would still be matched (and would then break the int() conversion). You can also try day*[0-9], but then daydream0 would still be matched. Of course you can also use day[0-9] if you know you will stay below double digits. So, if your use case requires this, you can use glob and filter the results down with regex.
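Here is one possible sketch of "glob, then filter with regex". The function name create_next is made up, and the temporary directory with its sample files exists only so the example can run anywhere.

```python
import re
import tempfile
from pathlib import Path


def create_next(directory, prefix="day"):
    """Create prefix + (largest number + 1) in directory and return its path."""
    exact = re.compile(re.escape(prefix) + r"\d+")
    numbers = [
        int(path.name[len(prefix):])
        # glob narrows the candidates; the regex rejects e.g. "day0dream"
        for path in Path(directory).glob(prefix + "[0-9]*")
        if exact.fullmatch(path.name)
    ]
    new_path = Path(directory) / "{}{}".format(prefix, max(numbers, default=0) + 1)
    new_path.touch()
    return new_path


# Throwaway directory with hypothetical file names for demonstration
with tempfile.TemporaryDirectory() as tmp:
    for name in ("day1", "day2", "day3", "day0dream", "daydream"):
        (Path(tmp) / name).touch()
    created = create_next(tmp).name

print(created)
```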

Best practice: local variables in a function (explicit vs implicit) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Is there a recommended style when it comes to the use of local variables inside a function? Should we use more explicit local variables, as in style 1, or not, as in style 2?
Two possible styles:
Style 1:
import re

def doc_to_lower(url_raw):
    url_lower = [word.lower() for word in url_raw]
    return url_lower

def process_data(url_raw):
    url_split = re.split('//|/|-', url_raw)
    url_lower = doc_to_lower(url_split)
    return url_lower

url = 'http://www.bbc.com/sport/football/41653935'
tokens = process_data(url)
for token in tokens:
    print(token)
Style 2:
import re

def doc_to_lower(url_raw):
    return [word.lower() for word in url_raw]

def process_data(url_raw):
    return doc_to_lower(re.split('//|/|-', url_raw))

url = 'http://www.bbc.com/sport/football/41653935'
tokens = process_data(url)
for token in tokens:
    print(token)
Pretty sure this is a case where personal opinions will arise. But for me, style 2 is the more Pythonic way of writing this.
The main reason for my answer is that your function name says it all in this case. I declare a local variable only if I have to or if it helps readability.
Hope it helps
EDIT
To demonstrate my answer take this part of your code,
Style 1:
def process_data(url_raw):
    url_split = re.split('//|/|-', url_raw)
    url_lower = doc_to_lower(url_split)
    return url_lower
Style 2:
def process_data(url_raw):
    return doc_to_lower(re.split('//|/|-', url_raw))
If I were to reuse your code, at a glance I'd think that style 1 returns a lowered URL, and would understand that in style 2 the function is used to process data.
I'm not trying to say that I'm an expert or anything and this is debatable, I'm just trying to clarify my point.
I prefer style 2, because I find it easier to read.
I can think of two reasons to use style 1 in certain cases:
When the expression becomes very complex. Using style 1 you can split up parts of the expression and assign it a readable name.
When the value of a subexpression must be available for an assert statement, debugging or a test case.
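The second reason can be illustrated with a small sketch based on the question's own function. The assert is an invented check, added only to show that a named intermediate value is easy to test or to inspect in a debugger.

```python
import re


def process_data(url_raw):
    url_split = re.split('//|/|-', url_raw)
    # The named intermediate can be asserted on, logged, or inspected
    assert all(isinstance(part, str) for part in url_split)
    url_lower = [word.lower() for word in url_split]
    return url_lower


tokens = process_data('http://www.BBC.com/sport/football/41653935')
print(tokens)
```

With style 2 the same check would mean pulling the nested expression apart first, which is effectively rewriting it as style 1.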
Great question and well done for thinking about readability all the time, making it easier down the line.
I think my answer would have to be: follow the coding standard of your place of work where possible. This is the most important thing, because there should be consistency with the other developers you are working with.
If there is no coding standard, arrange a meeting and write one up together. That way you're all working from the same script (pardon the pun) and the code will be readable to everyone.
My personal preference would be the explicit version. For me it would be clearer what was going on and thus reduce my own errors. However I understand that some would see this as a slight overkill in simple examples. I guess it comes down to what languages you learnt first and how and where you learnt them.

Why is my perl script an order of magnitude faster than the equivalent python code [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I've recently taken up Python3 and was stumped by how much slower than other comparable dynamic languages (mainly Perl) it is.
While trying to learn Python I did several online coding challenges and Python would often be at least 10x slower than Perl and use at least 2x the memory.
Researching this curiosity I came across people asking why Python is slower than C/C++, which should be pretty obvious, but not any posts comparing it to other similar languages. There is also this informative but outdated benchmark http://raid6.com.au/~onlyjob/posts/arena/ which confirms it being rather slow.
I am explicitly asking about the standard Python implementation and NOT about PyPy or the like.
EDIT:
The reason I was surprised comes from the results page on codeeval.com.
Here are two scripts to capitalize the first character of every word in a line.
Python3 (3.4.3) v1
import sys
import re

def uc(m):
    c = m.group(1)
    return c.upper()

f = open(sys.argv[1], "r")
for line in f:
    print(re.sub(r"\b(\D)", uc, line))
Perl (5.18.2)
use strict;
use warnings "all";
open(my $fh, "<", "$ARGV[0]") or die;
while (<$fh>)
{
    s,\b(\D),uc $1,ge;
    print;
}
close $fh;
As I am not very familiar with Python yet I also tried a different version to see if there was any difference.
Python3 v2:
import sys
f = open(sys.argv[1], "r")
for line in f:
    lst = [word[0].upper() + word[1:] for word in line.split()]
    print(" ".join(lst))
The results are quite different, as can be seen in this image (the Python results in the image are from v1; v2 had nearly identical stats: +1 ms execution time, about the same memory usage).
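One way to compare the two Python variants in isolation, without interpreter start-up and file I/O, is a small timeit harness. This is only a sketch: the function names, the sample line and the repeat count are arbitrary choices, and it says nothing about how codeeval.com measured its numbers.

```python
import re
import timeit

WORD_START = re.compile(r"\b(\D)")


def capitalize_regex(line):
    # v1: upper-case every non-digit character that follows a word boundary
    return WORD_START.sub(lambda m: m.group(1).upper(), line)


def capitalize_split(line):
    # v2: split on whitespace and upper-case the first character of each word
    return " ".join(word[0].upper() + word[1:] for word in line.split())


line = "hello world from python"
regex_seconds = timeit.timeit(lambda: capitalize_regex(line), number=10000)
split_seconds = timeit.timeit(lambda: capitalize_split(line), number=10000)
print("regex: {:.4f}s  split: {:.4f}s".format(regex_seconds, split_seconds))
```

Timing the whole script from the shell as well would show how much of the total is start-up cost rather than the loop itself.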

How to cut link in python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I have the following link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg
How to take just this one part of the link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
and remove everything else? I also want to keep the extension.
I want to remove this part:
._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_
and keep this part:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
How can I do this in python?
You could use:
re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
This makes some assumptions but works on your input. The search starts at the ._ sequence, takes anything after that which is a letter, digit, dash, underscore, dot or comma, then matches the extension. I picked an explicit small group of possible extensions; you could also just use (\.\w+)$ at the end instead to widen the acceptable extensions to word characters.
Demo:
>>> import re
>>> inputurl = 'http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg'
>>> re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
'http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg'
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
parts = url.split(".")
# Drop the second-to-last dot-separated piece (the modifier block)
print(".".join(parts[:-2]) + ".{}".format(parts[-1]))
prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
The following should work:
import re
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
print(re.sub(r"(https?://.+?)\._.+(\.\w+)", r'\1\2', url))
The above code prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
An important detail: more example links are needed to find the correct pattern. I'm currently assuming you want everything up to the first ._
url = re.sub(r"(/[^./]+)\.[^/]*?(\.[^.]+)$", r"\1\2", url)
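Since every answer above rests on a single sample link, it may help to wrap the chosen pattern in a function and try it on a few more inputs. The sketch below uses a slightly different pattern from the ones above (everything from ._ to the final extension within the last path segment). The function name strip_modifiers and the example.com URLs are invented for illustration.

```python
import re

# "._", then anything that is not a slash, then the final extension
MODIFIER = re.compile(r"\._[^/]*(\.\w+)$")


def strip_modifiers(url):
    """Remove the '._...' modifier block in front of the file extension."""
    return MODIFIER.sub(r"\1", url)


amazon = ("http://ecx.images-amazon.com/images/I/51JXXb2vpDL."
          "_SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg")
print(strip_modifiers(amazon))
# A link without a modifier block is returned unchanged
print(strip_modifiers("http://example.com/images/I/plain.png"))
```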

Trying to call an executable with arguments from Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I am using third-party software that I can run from the command line, which consists of the .exe file and several switches that pass arguments through. The goal is to script this using Python, but I am a beginner at programming in Python and could use some help translating the following command line into Python. The arguments are separated by "/" and are /inbook1, /inbook2 and /report.
C:\Program Files(x86)\Florencesoftt\diffenginex\diffenginex.exe /inbook1:"c:\users\file.xlsx /inbook2: "c:\users\file2.xlsx /report:"c:\users\file3.xlsx"
So, would anyone be able to help me call this command using Python?
You want to use the subprocess module.
Exactly how you want to use it depends on exactly what you want to do. For example, do you want to let the program's output mix in with your output, or do you want to capture it to a string? Do you want to wait until it's done, or kick it off in the background?
Fortunately, the documentation is pretty clear, and explains how to do each thing you might want to do.
Meanwhile, I'm 95% sure you've gotten some of the quotes wrong on your command line. For example, the /inbook1 argument starts with a ", which isn't closed until the start of the /inbook2 argument.
Anyway, trying to guess what you might want, here's one possibility:
import subprocess

args = [r'C:\Program Files(x86)\Florencesoftt\diffenginex\diffenginex.exe',
        r'/inbook1:"c:\users\file.xlsx"',
        r'/inbook2:"c:\users\file2.xlsx"',
        r'/report:"c:\users\file3.xlsx"']
output = subprocess.check_output(args)
The check_output function runs the program, waits for it to finish, raises an exception if it finishes with an error return code, and returns the program's output as a string (or, in Python 3, a bytes object).
The extra double quotes probably aren't necessary here (especially since there are no spaces in your pathnames), but since they were in your original code, I left them as-is. Generally, Python will do whatever is necessary to get each separate argument treated as a single argument by the target program, even if you have arguments that have spaces or quotes in them, so you don't have to worry about that.
Meanwhile, if it's easier to write the arguments as one big string, instead of as a list of four separate strings, you can do that instead. (Only on Windows; don't do it on Unix unless you're using shell=True.) But usually that just means more opportunities to get the quoting wrong, and since you appear to have already gotten it wrong multiple times, I think you're better off this way.
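To see the "one list item per argument" behaviour without the third-party tool, here is a sketch, assuming Python 3.7 or later, that uses the Python interpreter itself as a stand-in for diffenginex.exe. The helper name run_tool, the echo script and the path containing a space are all made up for illustration.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command, wait for it, and return its standard output as text."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# Stand-in program: prints the arguments it received, separated by "|"
echo_script = "import sys; print('|'.join(sys.argv[1:]))"
output = run_tool([
    sys.executable, "-c", echo_script,
    r"/inbook1:c:\users\file.xlsx",
    r"/inbook2:c:\users\my file2.xlsx",  # a space, but still one argument
])
print(output)
```

Passing check=True makes subprocess.run raise CalledProcessError on a non-zero exit code, which matches what check_output does.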
I'm a newbie, but would you like to try this code:
---EDIT---
I've edited so much according to @Abarnert's suggestions that this code is more his than mine, so don't up-vote me. I'm leaving the solution here because it should work now.
The code:
import subprocess
basecommand = r"C:\Program Files(x86)\Florencesoftt\diffenginex\diffenginex.exe"
inbook1 = r"c:\users\file.xlsx"
inbook2 = r"c:\users\file2.xlsx"
report = r"c:\users\file3.xlsx"
inbook1 = r'/inbook1:"' + inbook1 + '"'
inbook2 = r'/inbook2:"' + inbook2 + '"'
report = r'/report:"' + report + '"'
subprocess.call([basecommand, inbook1, inbook2, report])
Thanks @Abarnert!
