Search inside IPython history - Python

IPython's %hist command prints the commands recently entered by the user. Is it possible to search within these commands? Something like this:
[c for c in %history if c.startswith('plot')]
EDIT: I am not looking for a way to rerun a command, but to locate it in the history list. Of course, sometimes I will want to rerun a command after locating it, either verbatim or with modifications.
EDIT: Searching with Ctrl-r and then typing plot gives the most recent command that starts with "plot". It won't list all the commands that start with it, nor can you search for text in the middle or at the end of a command.
Solution
Expanding on PreludeAndFugue's solution, here is what I was looking for:
[l for l in _ih if l.startswith('plot')]
Here, the if condition can be replaced with a regex match.
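For illustration, here is the regex variant run against a stand-in list; the real _ih only exists inside an IPython session, and the sample commands below are made up:

```python
import re

# Stand-in for IPython's input-history list `_ih`
_ih = ["import numpy as np", "plot(x, y)", "x = 2", "myplot = plot(x ** 2)"]

# The startswith version from above
starts = [l for l in _ih if l.startswith('plot')]

# The regex version: match "plot" anywhere in the line
anywhere = [l for l in _ih if re.search(r'plot', l)]

print(starts)    # ['plot(x, y)']
print(anywhere)  # ['plot(x, y)', 'myplot = plot(x ** 2)']
```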

Even better: %hist -g pattern greps your past history for pattern. You can additionally restrict the search to the current session, or to a particular range of lines. See %hist?
So for @BorisGorelik's question you would do:
%hist -g plot
Unfortunately you cannot do
%hist -g ^plot
nor
%hist -g "^plot"

If you want to re-run a command in your history, try Ctrl-r and then your search string.

I usually find myself wanting to search the entire IPython history, across all previous sessions as well as the current one. For this I use:
from IPython.core.history import HistoryAccessor
hista = HistoryAccessor()
z1 = hista.search('*numpy*corr*')
z1.fetchall()
OR (don't run both or you will corrupt/erase your history)
ip = get_ipython()
sqlite_cursor = ip.history_manager.search('*numpy*corr*')
sqlite_cursor.fetchall()
The search string is not a regular expression. The IPython history_manager uses SQLite's GLOB wildcard syntax instead, where * matches any number of characters.
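As a rough illustration of what that glob pattern means, the standard library's fnmatch module can translate the same kind of wildcard pattern into a regex. Note this is only to show the semantics; the actual matching is done by SQLite inside the history database, not by fnmatch:

```python
import fnmatch
import re

glob_pattern = '*numpy*corr*'
regex = fnmatch.translate(glob_pattern)  # a regex equivalent of the glob

# 'numpy' followed later by 'corr' matches:
assert re.match(regex, "c = numpy.corrcoef(a, b)")
# no 'corr' anywhere after 'numpy': no match
assert not re.match(regex, "import numpy as np")
```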

Similar to the first answer, you can do the following:
''.join(_ih).split('\n')
However, you can also iterate over the command-history items directly, and build your list comprehension from that:
for item in _ih:
    print(item)
This is documented in the following section of the documentation:
http://ipython.org/ipython-doc/dev/interactive/reference.html#input-caching-system

Here is another way you can do it:
''.join(_ip.IP.shell.input_hist).split('\n')
or, to prevent magic expansion:
''.join(_ip.IP.shell.input_hist_raw).split('\n')

from IPython.core.history import HistoryAccessor

def search_hist(pattern, print_matches=True, return_matches=True, wildcard=True):
    if wildcard:
        pattern = '*' + pattern + '*'
    matches = HistoryAccessor().search(pattern).fetchall()
    if not print_matches:
        return matches
    for i in matches:
        print('#' * 60)
        print(i[-1])
    if return_matches:
        return matches

%history [-n] [-o] [-p] [-t] [-f FILENAME] [-g [PATTERN [PATTERN ...]]]
[-l [LIMIT]] [-u]
[range [range ...]]
....
-g <[PATTERN [PATTERN …]]>
treat the arg as a glob pattern to search for in (full) history. This includes the saved history (almost all commands ever written). The pattern may contain ‘?’ to match one unknown character and ‘*’ to match any number of unknown characters. Use ‘%hist -g’ to show full saved history (may be very long).
Example (in my history):
In [23]: hist -g cliente*aza
655/58: cliente.test.alguna.update({"orden" : 1, "nuevo" : "azafran"})
655/59: cliente.test.alguna.update({"orden" : 1} , {$set : "nuevo" : "azafran"})
655/60: cliente.test.alguna.update({"orden" : 1} , {$set : {"nuevo" : "azafran"}})
Example (in my history):
In [24]: hist -g ?lie*aza
655/58: cliente.test.alguna.update({"orden" : 1, "nuevo" : "azafran"})
655/59: cliente.test.alguna.update({"orden" : 1} , {$set : "nuevo" : "azafran"})
655/60: cliente.test.alguna.update({"orden" : 1} , {$set : {"nuevo" : "azafran"}})

Related

Snakemake with one input but multiple parameters permutations

I have been trying to wrap my head around this problem, which probably has a very easy solution.
I am running a bioinformatics workflow where I have one file as input and I want to run a program on it. However, I want that program to be run with multiple parameter combinations. Let me explain.
I have file.fastq and I want to run cutadapt (in the shell) with two flags: --trim and -e. I want --trim with values 0 and 5, and -e with values 0.1 and 0.5.
Therefore, I want to run the following:
cutadapt file.fastq --trim0 -e0.5 --output ./outputs/trim0_error0.5/trimmed_file.fastq
cutadapt file.fastq --trim5 -e0.5 --output ./outputs/trim5_error0.5/trimmed_file.fastq
cutadapt file.fastq --trim0 -e0.1 --output ./outputs/trim0_error0.1/trimmed_file.fastq
cutadapt file.fastq --trim5 -e0.1 --output ./outputs/trim5_error0.1/trimmed_file.fastq
I thought snakemake would be perfect for this. So far I tried:
E = [0.1, 0.5]
TRIM = [5, 0]

rule cutadapt:
    input:
        "file.fastq"
    output:
        expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)
    params:
        trim = TRIM,
        e = E
    shell:
        "cutadapt {input} -e{params.e} --trim{params.trim} --output {output}"
However I get an error like this:
shell:
cutadapt file.fastq -e0.1 0.5 --trim0 5 --output {output}
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
So, as you can see, snakemake is not substituting each element of TRIM and E separately, but joining each whole list into a single string. How can I solve this problem? Thank you in advance.
When specifying params, you are currently providing full lists rather than specific values. Contrast the following parameter values:
E = [0.1, 0.5]
TRIM = [5, 0]

rule all:
    input: expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)

rule cutadapt:
    input:
        "file.fastq"
    output: "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"
    params:
        trim_list = TRIM,
        trim_value = lambda wildcards: wildcards.TRIM,
    shell:
        "cutadapt {input} -e{wildcards.E} --trim{wildcards.TRIM} --output {output}"
Note that in the shell directive there was no need to reference params, since this directive is aware of wildcards.
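For intuition about what expand() generates: it builds the cross-product of the supplied wildcard values. A rough pure-Python sketch of the resulting target list (this is not how Snakemake implements it, and the ordering of the list may differ):

```python
from itertools import product

E = [0.1, 0.5]
TRIM = [5, 0]
template = "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"

# Rough equivalent of expand(template, E=E, TRIM=TRIM): one target
# per (TRIM, E) combination
targets = [template.format(TRIM=t, E=e) for t, e in product(TRIM, E)]
for t in targets:
    print(t)
```

With two values per parameter this yields four targets, matching the four cutadapt commands the question wants to run.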
Expanding on @SultanOrzbayev's answer: the key was that I needed a rule all with the expanded targets in order to use wildcards in my cutadapt rule.
The params section can actually be removed entirely, so in the end:
E = [0.1, 0.5]
TRIM = [5, 0]

rule all:
    input: expand("../outputs/trim{TRIM}_error{E}/trimmed_file.fastq", E=E, TRIM=TRIM)

rule cutadapt:
    input:
        "file.fastq"
    output: "../outputs/trim{TRIM}_error{E}/trimmed_file.fastq"
    shell:
        "cutadapt {input} -e{wildcards.E} --trim{wildcards.TRIM} --output {output}"

Handling equivalent file extensions in Snakemake

Essentially, I want to know the recommended way of handling equivalent file extensions in Snakemake. For example, let's say I have a rule that counts the number of entries in a FASTA file. The rule might look something like:
rule count_entries:
    input:
        ["{some}.fasta"]
    output:
        ["{some}.entry_count"]
    shell:
        'grep -c ">" {input[0]} > {output[0]}'
This works great. But what if I want this rule to also permit "{some}.fa" as input?
Is there any clean way to do this?
EDIT:
Here is my best guess at the first proposed solution. This could probably be turned into a higher-order function to be more general-purpose, but this is the basic idea as I understand it. I don't think this approach fits the general use case, though, as it doesn't cooperate with other rules at the DAG-building stage.
import os

def handle_ext(wcs):
    base = wcs["base"]
    for file_ext in [".fasta", ".fa"]:
        if os.path.exists(base + file_ext):
            return [base + file_ext]

rule count_entries:
    input:
        handle_ext
    output:
        ["{base}.entry_count"]
    shell:
        'grep -c ">" {input[0]} > {output[0]}'
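The EDIT mentions that handle_ext could be turned into a higher-order function; one possible sketch of that generalization follows. The factory name input_with_exts is my own invention, not Snakemake API, and it keeps the same existence-check strategy (with the same DAG-stage caveat):

```python
import os

def input_with_exts(extensions):
    """Return an input function that picks the first existing file
    among base + ext, for the given list of extensions."""
    def handle_ext(wcs):
        base = wcs["base"]
        for file_ext in extensions:
            candidate = base + file_ext
            if os.path.exists(candidate):
                return [candidate]
        raise FileNotFoundError("no input found for %s" % base)
    return handle_ext

# In a Snakefile this might be used as:
#   rule count_entries:
#       input: input_with_exts([".fasta", ".fa"])
```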
EDIT 2: Here is the best current solution as I see it:
count_entries_cmd = 'grep -c ">" {input} > {output}'
count_entries_output = "{some}.entry_count"

rule count_entries_fasta:
    input:
        "{some}.fasta"
    output:
        count_entries_output
    shell:
        count_entries_cmd

rule count_entries_fa:
    input:
        "{some}.fa"
    output:
        count_entries_output
    shell:
        count_entries_cmd
One thing I noticed is that you specify lists of files in both the input and output sections, but your rule actually takes a single file and produces a single file.
I propose a straightforward solution: specify two separate rules for the different extensions:
rule count_entries_fasta:
    input:
        "{some}.fasta"
    output:
        "{some}.entry_count"
    shell:
        'grep -c ">" {input} > {output}'

rule count_entries_fa:
    input:
        "{some}.fa"
    output:
        "{some}.entry_count"
    shell:
        'grep -c ">" {input} > {output}'
These rules are not ambiguous unless you keep files with the same {some} name and different extension in the same folder (which I hope you don't do).
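As an aside, the grep -c ">" command works because each FASTA record begins with a '>' header line. A minimal Python equivalent, handy for sanity-checking the rule's output (this version counts header lines specifically, rather than any line containing '>'; for well-formed FASTA the two agree):

```python
def count_fasta_entries(text):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))

fasta = ">seq1\nACGT\n>seq2\nGGCC\n"
print(count_fasta_entries(fasta))  # 2
```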
One possible solution is to only allow the original rule to take .fasta files as input, but enable .fa files to be renamed to that. For example:
rule fa_to_fasta:
    input:
        "{some}.fa"
    output:
        temp("{some}.fasta")
    shell:
        """
        cp {input} {output}
        """
Clearly this has the disadvantage of making a temporary copy of the file. Also, if foo.fa and foo.fasta are both provided (not through the copying), then foo.fasta will silently overshadow foo.fa, even if they are different.
Even though the OP has edited the question and included the possible workaround via input functions, I think it is worth listing it here as an answer too, to highlight it as a possible solution. At least for me, it was :)
So, for example, if you have an annotation table for your samples which includes the respective file extension for each sample file (e.g. via PEP), then you can create a function that returns these entries and pass that function as input to a rule. My example:
# Function that indicates the needed input files, based on the given
# wildcards (here: sample) and the sample annotations
# (in my case the sample annotations were provided via PEP)
def get_files_dynamically(wildcards):
    sample_file1 = pep.sample_table["file1"][wildcards.sample]
    sample_file2 = pep.sample_table["file2"][wildcards.sample]
    return {"file1": sample_file1, "file2": sample_file2}

# 1. Perform trimming on fastq files
rule run_rule1:
    input:
        unpack(get_files_dynamically)  # Unpacking allows naming the inputs
    output:
        output1="output/somewhere/{sample}_1.xyz.gz",
        output2="output/somewhere/{sample}_2.xyz.gz"
    shell:
        "do something..."

How can we generate a list whose elements are dicts?

Background:
There are many Android applications, and many versions of the same application. I want to find the newest version of each application (the highest versionCode wins).
For example:
/path/to/apk1/history/v1/xxx.apk versionCode=1
/path/to/apk1/history/v1.3/xxx.apk versionCode=1.3
/path/to/apk/v2/xxx.apk versionCode=3
Here, /path/to/apk/v2/xxx.apk wins out.
What I do:
I extract the versionCode from each APK file with the function below:
import os
import re

def analysisApk(apkPath):
    outfile = os.popen("./aapt d badging %s" % apkPath, 'r').read()
    match = re.compile(r"package: name='(\S+)' versionCode='(\d+)' versionName='(\S+)'").match(outfile)
    packageName = match.group(1)
    versionCode = match.group(2)
Then I want to append a dict for each application to a global list that stores the applications I want.
I can generate a single dict, like this:
{packageName: "com.xxx.1", versionCode: 2, apkPath: "/path/to/1.apk"}
but unfortunately I do not know how to append such dicts to a list like the one below:
[{packageName: "com.xxx.1", versionCode: 2, apkPath: "/path/to/1.apk"}, {packageName: "com.xxx.2", versionCode: 2, apkPath: "/path/to/2.0.apk"}, {packageName: "com.xxx.2", versionCode: 1, apkPath: "/path/to/v2.apk"}, ..., {packageName: "com.xxx.n", versionCode: 2, apkPath: "/path/to/n.apk"}]
When a newer versionCode arrives, the superseded entry needs to be removed from the list. (This listing is not real output; it is only an example to help understand the question.)
My question:
How do I generate the dict list step by step, as described?
Are there other ideas for this problem?
I assumed, on the basis of your regular expression, that your input is a file with lines like these:
lines = [
    "package: name='/path/to/apk1/history/v1/xxx.apk' versionCode='1' versionName='one'",
    "package: name='/path/to/apk1/history/v1.3/xxx.apk' versionCode='1.3' versionName='two'",
    "package: name='/path/to/apk/v2/xxx.apk' versionCode='3' versionName='three'",
]
You can get the lines as a list via the .readlines() function of the filehandle.
Then you can iterate over them, use whatever regex you need to extract your values and put them in a dictionary which is recreated each loop (apk_dict).
Before going into the next loop we put our dictionary into a list outside the loop (apk_list) with .append().
import re

apk_list = []
for line in lines:
    apk_dict = {}
    match = re.match(r"package: name='(\S+)' versionCode='(\d+)' versionName='(\S+)'", line)
    if not match:
        continue
    apk_dict['name'] = match.group(1)
    apk_dict['versionCode'] = match.group(2)
    apk_dict['versionName'] = match.group(3)
    apk_list.append(apk_dict)
print(apk_list)
apk_list will then have:
[
    {
        'versionName': 'one',
        'versionCode': '1',
        'name': '/path/to/apk1/history/v1/xxx.apk'
    },
    {
        'versionName': 'three',
        'versionCode': '3',
        'name': '/path/to/apk/v2/xxx.apk'
    }
]
The second line got dropped for reasons I explain below.
Some things to notice:
You use .compile() on your regular expression; note that compile creates a pattern object, whose .match() method you then call against a string. Your code does chain the two, which works, but compiling pays off when you do it once and reuse the pattern across the loop.
Next, your (\d+) will not work for values like '1.3', because there is a non-digit character between the numbers; something like (\S+) will match it.
You also have to handle failed matching attempts; I simply skip the line with continue.
Since you just want the path to the APK with the highest versionCode, using dictionaries would be overkill.
Use two variables: path and version. Open the first APK, save its apkPath in path and its versionCode in version. Go to the next APK; if its versionCode is higher than version, update both path and version. Repeat until you have processed all APKs. Afterwards, path holds the path of the APK with the highest versionCode.
PS. To add something to a list, use append(); note that it modifies the list in place and returns None, so don't assign its result:
old_list.append(item)
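The two-variable loop described above can be sketched like this. The versionCode values are shown as plain integers for simplicity; in the real script they come out of the regex as strings and should be converted with int() first:

```python
def newest_apk(apks):
    """Return the apkPath with the highest versionCode.

    apks: iterable of (apkPath, versionCode) pairs.
    """
    path, version = None, None
    for apk_path, version_code in apks:
        if version is None or version_code > version:
            path, version = apk_path, version_code
    return path

print(newest_apk([("/path/v1/x.apk", 1), ("/path/v2/x.apk", 3), ("/path/v1.5/x.apk", 2)]))
# /path/v2/x.apk
```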
If I understand your question correctly, you could use a list comprehension:
list_of_dicts = [extract_all_the_info(line) for line in source]
where extract_all_the_info() applies the regular expression shown above and returns a dict.
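extract_all_the_info() is left undefined in the answer; here is one hypothetical sketch of what it might look like, using the same aapt line format. The dict keys and the None-on-no-match convention are my own choices:

```python
import re

PKG_RE = re.compile(r"package: name='(\S+)' versionCode='(\S+)' versionName='(\S+)'")

def extract_all_the_info(line):
    """Parse one aapt 'package:' line into a dict, or None if it doesn't match."""
    m = PKG_RE.match(line)
    if m is None:
        return None
    return {"packageName": m.group(1),
            "versionCode": m.group(2),
            "versionName": m.group(3)}

source = [
    "package: name='com.xxx.1' versionCode='2' versionName='v2'",
    "not a package line",
]
# Filter out the None results from non-matching lines
list_of_dicts = [d for d in (extract_all_the_info(line) for line in source) if d]
print(list_of_dicts)
```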

Incremental Saves

I am trying to write a script for incremental saves, but there are a few hiccups I am running into.
If the file name is "aaa.ma", I get the following error: ValueError: invalid literal for int() with base 10: 'aaa'. It does not happen if my file is named "aaa_0001".
This happens when I write my code in this format: Link
To rectify the problem, I added an if...else statement - Link. It seems to have resolved the issue at hand, but I was wondering whether there is a better approach?
Any advice will be greatly appreciated!
Use regexes for better flexibility especially for file rename scripts like these.
In your case, since you know the expected filename format is "some_file_name_<increment_number>", you can let regexes do the searching and matching for you. The reason is that users are not machines and may not stick to the exact naming convention your script expects. For example, a user may name the file aaa_01.ma or even aaa001.ma instead of the aaa_0001 your script currently expects. To build in this flexibility, you can use regexes. For your use case, you could do:
import os
import re

# name = lastIncFile.partition(".")[0]  # Use os.path.splitext instead
name, ext = os.path.splitext(lastIncFile)

match_object = re.search("([a-zA-Z]*)_*([0-9]*)$", name)
# Here ([a-zA-Z]*) is group(1) and would hold "aaa", for example,
# and ([0-9]*) is group(2) and would hold "0001", for example.
# _* means an underscore may or may not be present.
# The $ anchors ([0-9]*) to the END of the name.

padding = 4  # Parameterize as many components as possible for easy maintenance
default_starting = 1
verName = str(default_starting).zfill(padding)  # Default verName

if match_object:  # True if the name matched the expected pattern
    name = match_object.group(1)
    version_component = match_object.group(2)
    if version_component:
        verName = str(int(version_component) + 1).zfill(padding)

newFileName = "%s_%s%s" % (name, verName, ext)  # ext from splitext already includes the dot
incSaveFilePath = os.path.join(curFileDir, newFileName)
Check out this nice tutorial on Python regexes to get an idea what is going on in the above block. Feel free to tweak, evolve and build the regex based on your use cases, tests and needs.
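To see the flexibility in action, here is the same regex run against the naming variations mentioned above; this quick check runs outside Maya:

```python
import re

pattern = r"([a-zA-Z]*)_*([0-9]*)$"

# The regex tolerates several user naming styles:
for base in ["aaa", "aaa_0001", "aaa_01", "aaa001"]:
    mo = re.search(pattern, base)
    print(base, "->", mo.group(1), repr(mo.group(2)))
```

An empty group(2) (as for "aaa") is exactly the case the script's default verName handles.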
Extra tips:
Call cmds.file(renameToSave=True) at the beginning of the script. This will ensure that the file does not get saved over itself accidentally, and forces the script/user to rename the current file. Just a safety measure.
If you want to go a little fancier with your regex and make it more readable, you can use named groups:
match_object = re.search("(?P<name>[a-zA-Z]*)_*(?P<version>[0-9]*)$", name)
name = match_object.group('name')
version_component = match_object.group('version')
Here the ?P<var_name>... syntax assigns a name to the matching group, which makes access more readable: mo.group('version') is much clearer than mo.group(2).
Make sure to go through the official docs too.
Save using Maya's commands. This will ensure Maya does all its checks before and while saving:
cmds.file(rename=incSaveFilePath)
cmds.file(save=True)
Update 2:
If you want spaces to be handled as well, here's an updated regex:
match_object = re.search("(?P<name>[a-zA-Z]*)[_ ]*(?P<version>[0-9]*)$", name)
Here [_ ]* matches zero or more occurrences of _ or a space. For more regex practice, trying things out and learning on your own is the best way; check out the links in this post.
Hope this helps.

python-iptables how to specify multi argument matches

How do I specify multi-argument matches with python-iptables?
For example, the following iptables command:
-A INPUT -s 1.1.1.1 -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -j DROP
If I create the following:
import iptc
rule = iptc.Rule()
rule.src = '1.1.1.1'
rule.protocol = 'tcp'
t = rule.create_target('DROP')
m = rule.create_match('tcp')
m.tcp_flags = 'FIN,SYN,RST,ACK SYN'
it will complain:
ValueError: invalid value FIN,SYN,RST,ACK SYN
PS: I know that for my particular example, I can simply use m.syn = '1', but I'm trying to generalize on how to specify multi-argument matches.
Are you using the latest version? See this issue.
Okay... someone tried to post an answer, but they deleted it while I was commenting on it.
The answer attempt was:
m.tcp_flags = ['FIN', 'SYN', 'RST', 'ACK SYN']
which gave the wrong result:
print(m.parameters)
{u'tcp_flags': u'FIN SYN'}
However, that inspired me to try the following:
m.tcp_flags = ['FIN,SYN,RST,ACK', 'SYN']
which gives:
>>> m.parameters
{u'tcp_flags': u'FIN,SYN,RST,ACK SYN'}
Committing that rule into the INPUT chain and running iptables-save shows that it properly returns the rule I want.
So, thank you!
