is it possible to create custom AWS CLI commands? - python

Not sure this is the right place to ask but I'll go ahead anyway.
I want to modify the aws s3 sync command so that it does not overwrite newer files at the destination. Currently it compares the file size and timestamp, and if either is different the file at the source is copied to the destination.
Having a look at the AWS CLI source code, it seems possible to simply modify the Python code here:
class SizeAndLastModifiedSync(BaseSync):

    def determine_should_sync(self, src_file, dest_file):
        same_size = self.compare_size(src_file, dest_file)
        same_last_modified_time = self.compare_time(src_file, dest_file)
        should_sync = (not same_size) or (not same_last_modified_time)
        if should_sync:
            LOG.debug(
                "syncing: %s -> %s, size: %s -> %s, modified time: %s -> %s",
                src_file.src, src_file.dest,
                src_file.size, dest_file.size,
                src_file.last_update, dest_file.last_update)
        return should_sync
But not being much of a Python expert, I am not sure whether there are other considerations that would prevent these modifications from working.
It seems all I would need to do is add a check to see whether the dest_file timestamp is greater than the src_file timestamp, after resolving things like timezones.
Any pointers on the best approach would be appreciated.
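Something like the following is what I have in mind (an untested sketch; the naive timestamp comparison still needs the timezone handling mentioned above):

class SizeAndLastModifiedSync(BaseSync):

    def determine_should_sync(self, src_file, dest_file):
        same_size = self.compare_size(src_file, dest_file)
        same_last_modified_time = self.compare_time(src_file, dest_file)
        # Extra check: never overwrite a destination file that is already
        # newer than the source (naive comparison, timezones not handled).
        dest_is_newer = dest_file.last_update > src_file.last_update
        should_sync = ((not same_size) or (not same_last_modified_time)) \
            and not dest_is_newer
        if should_sync:
            LOG.debug(
                "syncing: %s -> %s, size: %s -> %s, modified time: %s -> %s",
                src_file.src, src_file.dest,
                src_file.size, dest_file.size,
                src_file.last_update, dest_file.last_update)
        return should_sync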

Related

How to Generate Python Bindings for Thrift in Bazel

I'm attempting to generate Python bindings for a Thrift service definition using Bazel. As far as I've been able to tell, there is no existing .bzl for doing this, so I'm somewhat on my own here. I've written .bzl rules in the past, but the situation I'm running into in this case is different.
The general issue is that I don't know the names of the files the thrift command will output before the build starts, which means I can't generate a py_library rule with its srcs attribute set correctly, since I don't have the names of the files. I've tried to follow examples whereby the output files are known ahead of time by generating a .zip file, but the py_library rule only allows .py files as srcs, so this doesn't work.
The only thing I can think of would be to use a repository_rule to generate the code and BUILD files, but what I'm trying to accomplish doesn't seem like much of a stretch and should be supported.
Someone has attempted this before.
Discussion here:
https://groups.google.com/forum/#!topic/bazel-dev/g3DVmhVhNZs
Code Here:
https://github.com/wt/bazel_thrift
I would start there.
Edit:
I started there. I did not get as far as I hoped. Bazel is being extended to support having multiple outputs generated by one input, but it does not allow that very easily just yet, per:
groups.google.com/forum/#!topic/bazel-discuss/3WQhHm194yU
Regardless, I did attempt something for C++ thrift bindings, which have the same issue. The Java example got around this by using the source jar as a build source, which won't work for us. To make it work, I passed in the list of source files I cared about that would be created by the thrift generator. I then reported these files as the output that would be generated in the impl. That seems to work. It is a bit nasty in that you have to know what files you are looking for before you build, but it does work. It would also be possible to have a small program read the thrift file and determine the output files it would make. That would be nicer, but I don't have the time. Plus, the current approach is nice in that it explicitly defines what files you are expecting thrift to generate, which makes the BUILD file a little easier to understand for a newbie like me.
First pass at some code, maybe I will clean it up and submit it as a patch (maybe not):
###########
# CPP gen
###########
# Create Generated cpp source files from thrift idl files.
#
def _gen_thrift_cc_src_impl(ctx):
    out = ctx.outputs.outs
    if not out:
        # empty set
        # nothing to do, no inputs to build
        return DefaultInfo(files=depset(out))
    # Used dir(out[0]) to see what
    # we had available in the object.
    # The dirname attribute tells us the directory
    # we should be putting stuff in, works nicely.
    # ctx.genfiles_dir is not the output directory
    # when called as an external repository.
    target_genfiles_root = out[0].dirname
    thrift_includes_root = "/".join(
        [target_genfiles_root, "thrift_includes"])
    gen_cpp_dir = "/".join([target_genfiles_root, "."])
    commands = []
    commands.append(_mkdir_command_string(target_genfiles_root))
    commands.append(_mkdir_command_string(thrift_includes_root))
    thrift_lib_archive_files = ctx.attr.thrift_library._transitive_archive_files
    for f in thrift_lib_archive_files:
        commands.append(
            _tar_extract_command_string(f.path, thrift_includes_root))
    commands.append(_mkdir_command_string(gen_cpp_dir))
    thrift_lib_srcs = ctx.attr.thrift_library.srcs
    for src in thrift_lib_srcs:
        commands.append(_thrift_cc_compile_command_string(
            thrift_includes_root, gen_cpp_dir, src))
    inputs = (
        list(thrift_lib_archive_files) + thrift_lib_srcs)
    ctx.action(
        inputs = inputs,
        outputs = out,
        progress_message = "Generating CPP sources from thrift archive %s" % target_genfiles_root,
        command = " && ".join(commands),
    )
    return DefaultInfo(files=depset(out))

thrift_cc_gen_src = rule(
    _gen_thrift_cc_src_impl,
    attrs = {
        "thrift_library": attr.label(
            mandatory=True, providers=['srcs', '_transitive_archive_files']),
        "outs": attr.output_list(mandatory=True, non_empty=True),
    },
    output_to_genfiles = True,
)
# wraps cc_library to generate a library from one or more .thrift files
# provided as a thrift_library bundle.
#
# Generates all src and hdr files needed, but you must specify the expected
# files. This is a bug in bazel: https://groups.google.com/forum/#!topic/bazel-discuss/3WQhHm194yU
#
# Instead of srcs and hdrs, this takes cpp_srcs and cpp_hdrs. These are required.
#
# Takes:
#   name: The library name, like cc_library
#
#   thrift_library: The library of source .thrift files from which our
#       code will be built.
#
#   cpp_srcs: The expected sources that will be generated and built. Passed to
#       cc_library as srcs.
#
#   cpp_hdrs: The expected header files that will be generated. Passed to
#       cc_library as hdrs.
#
# The rest of the options are documented in native.cc_library
#
def thrift_cc_library(name, thrift_library,
                      cpp_srcs=[], cpp_hdrs=[],
                      build_skeletons=False,
                      deps=[], alwayslink=0, copts=[],
                      defines=[], include_prefix=None,
                      includes=[], linkopts=[],
                      linkstatic=0, nocopts=None,
                      strip_include_prefix=None,
                      textual_hdrs=[],
                      visibility=None):
    # from our thrift_library tarball source bundle,
    # create a generated cpp source directory.
    outs = []
    for src in cpp_srcs:
        outs.append("//:" + src)
    for hdr in cpp_hdrs:
        outs.append("//:" + hdr)
    thrift_cc_gen_src(
        name = name + 'cc_gen_src',
        thrift_library = thrift_library,
        outs = outs,
    )
    # Then make the library for the given name.
    native.cc_library(
        name = name,
        deps = deps,
        srcs = cpp_srcs,
        hdrs = cpp_hdrs,
        alwayslink = alwayslink,
        copts = copts,
        defines = defines,
        include_prefix = include_prefix,
        includes = includes,
        linkopts = linkopts,
        linkstatic = linkstatic,
        nocopts = nocopts,
        strip_include_prefix = strip_include_prefix,
        textual_hdrs = textual_hdrs,
        visibility = visibility,
    )
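For reference, usage ends up looking roughly like this in a BUILD file (a sketch only; the thrift_library rule comes from the bazel_thrift code linked above, the load path and target name are hypothetical, and the list of generated files depends on your .thrift IDL):

load("//tools/build_rules:thrift.bzl", "thrift_library", "thrift_cc_library")

thrift_library(
    name = "service_thrift",
    srcs = ["service.thrift"],
)

thrift_cc_library(
    name = "service_cc",
    thrift_library = ":service_thrift",
    # You have to spell out what you expect the thrift generator to emit.
    cpp_srcs = [
        "Service.cpp",
        "service_constants.cpp",
        "service_types.cpp",
    ],
    cpp_hdrs = [
        "Service.h",
        "service_constants.h",
        "service_types.h",
    ],
)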

Seeming discrepancy in shutil.disk_usage()

I am using the shutil.disk_usage() function to find the current disk usage of a particular path (amount available, used, etc.). As far as I can find, this is a wrapper around os.statvfs() calls. I'm finding that it is not giving the answers I'd expect when compared to the output of du on Linux.
I have obscured some of the paths below for company privacy reasons, but the output and code are otherwise undoctored. I am using Python 3.3.2 64-bit version.
#!/apps/python/3.3.2_64bit/bin/python3
# test of shutil.disk_usage()
import shutil
BytesPerGB = 1024 * 1024 * 1024
(total, used, free) = shutil.disk_usage("/data/foo/")
print ("Total: %.2fGB" % (float(total)/BytesPerGB))
print ("Used: %.2fGB" % (float(used)/BytesPerGB))
(total1, used1, free1) = shutil.disk_usage("/data/foo/utils/")
print ("Total: %.2fGB" % (float(total1)/BytesPerGB))
print ("Used: %.2fGB" % (float(used1)/BytesPerGB))
Which outputs:
/data/foo/drivecode/me % disk_usage_test.py
Total: 609.60GB
Used: 291.58GB
Total: 609.60GB
Used: 291.58GB
As you can see, the main problem is I would expect the second amount for "Used" to be much smaller, as it is a subset of the first directory.
/data/foo/drivecode/me % du -sh /data/foo/utils
2.0G /data/foo/utils
As much as I trust "du," I find it hard to believe the Python module would be incorrect either. So perhaps it is just my understanding of Linux filesystems that could be the issue. :)
I wrote a module (based heavily on someone's code here at SO) which recursively gets the disk_usage, which I was using until now. It appears to match the "du" output but is MUCH, much slower than the shutil.disk_usage() function, so I'm hoping I can make that one work.
Thanks much in advance.
The problem is that shutil uses the statvfs system call underneath to determine the space used. This system call has no file-path granularity as far as I'm aware, only file-system granularity. What this means is that the path you provide only serves to identify the file system you want to query, not to measure the usage under that particular path.
In other words, you gave it the path /data/foo/utils and then it determined which file system backs this file path. Then it queried the file system. This becomes apparent when you consider how the used parameter is defined in shutil:
used = (st.f_blocks - st.f_bfree) * st.f_frsize
Where:
fsblkcnt_t f_blocks; /* size of fs in f_frsize units */
fsblkcnt_t f_bfree; /* # free blocks */
unsigned long f_frsize; /* fragment size */
This is why it's giving you the total space used on the entire file system.
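You can see the same thing by calling os.statvfs() directly and applying that formula yourself; any path on the same filesystem produces identical numbers (a small sketch using the question's paths):

import os

# Recompute "used" the way shutil.disk_usage does. Both paths give the
# same result because statvfs only identifies the backing filesystem.
for path in ("/data/foo/", "/data/foo/utils/"):
    st = os.statvfs(path)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    print("%s used: %.2fGB" % (path, used / (1024.0 ** 3)))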
Indeed, it seems to me like the du command itself also traverses the file structure and adds up the file sizes. Here is GNU coreutils du command's source code.
shutil.disk_usage returns the usage of the whole filesystem (i.e. the mount point which backs the path) and not the actual file usage under that path. It is the equivalent of running df /path/to/mount and not du /path/to/files. Notice that for both directories you got exactly the same usage.
From the docs: "Return disk usage statistics about the given path as a named tuple with the attributes total, used and free, which are the amount of total, used and free space, in bytes."
Update for anyone stumbling upon this after 2013:
Depending on your Python version and OS, shutil.disk_usage might support files and directories for the path variable. Here's the breakdown:
Windows:
3.3 - 3.5: only supports mountpoint/filesystem
3.6 - 3.7: directory support
3.8+: file & directory support
Unix:
3.3 - 3.5: only supports mountpoint/filesystem
3.6+: file & directory support
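And if what you actually need is per-directory totals the way du reports them, walking the tree is unavoidable; a rough sketch along those lines (it sums apparent file sizes rather than allocated blocks, so it still won't match du exactly):

import os

def directory_usage(path):
    """Rough 'du -s' equivalent: recursively sum file sizes under path."""
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # broken symlink, permission problem, etc.
    return total

print("%.2fGB" % (directory_usage("/data/foo/utils") / (1024.0 ** 3)))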

How do you escape a dash in Jython/Websphere?

I have a Jython script that is used to set up a JDBC datasource on a WebSphere 7.0 server. I need to set several properties on that datasource. I am using this code, which works, unless value is '-'.
def setCustomProperty(datasource, name, value):
    parms = ['-propertyName', name, '-propertyValue', value]
    AdminTask.setResourceProperty(datasource, parms)
I need to set the dateSeparator property on my datasource to just that - a dash. When I run this script with setCustomProperty(ds, 'dateSeparator', '-') I get an exception that says, "Invalid property: ". I figured out that it thinks that the dash means that another parameter/argument pair is expected.
Is there any way to get AdminTask to accept a dash?
NOTE: I can't set it via AdminConfig because I cannot find a way to get the id of the right property (I have multiple datasources).
Here is a solution that uses AdminConfig so that you can set the property value to the dash -. The solution accounts for multiple data sources, finding the correct one by specifying the appropriate scope (i.e. the server, but this could be modified if your datasource exists within a different scope) and then finding the datasource by name. The solution also accounts for modifying the existing "dateSeparator" property if it exists, or it creates it if it doesn't.
The code doesn't look terribly elegant, but I think it should solve your problem :
def setDataSourceProperty(cell, node, server, ds, propName, propVal):
    scopes = AdminConfig.getid("/Cell:%s/Node:%s/Server:%s/" % (cell, node, server)).splitlines()
    datasources = AdminConfig.list("DataSource", scopes[0]).splitlines()
    for datasource in datasources:
        if AdminConfig.showAttribute(datasource, "name") == ds:
            propertySet = AdminConfig.list("J2EEResourcePropertySet", datasource).splitlines()
            customProp = [["name", propName], ["value", propVal]]
            for property in AdminConfig.list("J2EEResourceProperty", propertySet[0]).splitlines():
                if AdminConfig.showAttribute(property, "name") == propName:
                    AdminConfig.modify(property, customProp)
                    return
            AdminConfig.create("J2EEResourceProperty", propertySet[0], customProp)

if (__name__ == "__main__"):
    setDataSourceProperty("myCell01", "myNode01", "myServer", "myDataSource", "dateSeparator", "-")
    AdminConfig.save()
Check the management console's preferences settings: there is an option to show the Jython equivalent of the commands the console generates for its own use. Do what you are attempting now through the console, then just copy the Jython it produces.
@Schemetrical's solution worked for me. Just giving another example with JVM args.
Not commenting on the actual answer because I don't have enough reputation.
server_name = 'server1'
AdminTask.setGenericJVMArguments('[ -serverName %s -genericJvmArguments "-agentlib:getClasses" ]' % (server_name))
Try using a String instead of an array to pass the parameters, surrounding the values that start with a dash in double quotes.
Example:
AdminTask.setVariable('-variableName JDK_PARAMS -variableValue "-Xlp -Xscm250M" -variableDescription "-Yes -I -can -now -use -dashes -everywhere :-)" -scope Cell=MyCell')
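Applied to the original helper, that would look something like the following sketch (whether AdminTask.setResourceProperty accepts the bracketed string form exactly as setVariable does is an assumption worth verifying against your WebSphere version):

# Sketch: pass one parameter string and quote the value so a leading dash
# is not parsed as the next parameter name.
def setCustomProperty(datasource, name, value):
    AdminTask.setResourceProperty(
        datasource,
        '[-propertyName %s -propertyValue "%s"]' % (name, value))

setCustomProperty(ds, 'dateSeparator', '-')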

Are there any problems with this symlink traversal code for Windows?

In my efforts to resolve Python issue 1578269, I've been working on trying to resolve the target of a symlink in a robust way. I started by using GetFinalPathNameByHandle as recommended here on stackoverflow and by Microsoft, but it turns out that technique fails when the target is in use (such as with pagefile.sys).
So, I've written a new routine to accomplish this using CreateFile and DeviceIoControl (as it appears this is what Explorer does). The relevant code from jaraco.windows.filesystem is included below.
The question is, is there a better technique for reliably resolving symlinks in Windows? Can you identify any issues with this implementation?
def relpath(path, start=os.path.curdir):
    """
    Like os.path.relpath, but actually honors the start path
    if supplied. See http://bugs.python.org/issue7195
    """
    return os.path.normpath(os.path.join(start, path))

def trace_symlink_target(link):
    """
    Given a file that is known to be a symlink, trace it to its ultimate
    target.

    Raises TargetNotPresent when the target cannot be determined.
    Raises ValueError when the specified link is not a symlink.
    """
    if not is_symlink(link):
        raise ValueError("link must point to a symlink on the system")
    while is_symlink(link):
        orig = os.path.dirname(link)
        link = _trace_symlink_immediate_target(link)
        link = relpath(link, orig)
    return link

def _trace_symlink_immediate_target(link):
    handle = CreateFile(
        link,
        0,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        None,
        OPEN_EXISTING,
        FILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_BACKUP_SEMANTICS,
        None,
    )
    res = DeviceIoControl(handle, FSCTL_GET_REPARSE_POINT, None, 10240)
    bytes = create_string_buffer(res)
    p_rdb = cast(bytes, POINTER(REPARSE_DATA_BUFFER))
    rdb = p_rdb.contents
    if not rdb.tag == IO_REPARSE_TAG_SYMLINK:
        raise RuntimeError("Expected IO_REPARSE_TAG_SYMLINK, but got %d" % rdb.tag)
    return rdb.get_print_name()
Unfortunately I can't test with Vista until next week, but GetFinalPathNameByHandle should work, even for files in use - what's the problem you noticed?
In your code above, you forget to close the file handle.
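For example, wrapping the DeviceIoControl call so the handle is always released might look like this (a sketch; it assumes a CloseHandle wrapper is available alongside CreateFile and DeviceIoControl):

def _trace_symlink_immediate_target(link):
    handle = CreateFile(
        link,
        0,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        None,
        OPEN_EXISTING,
        FILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_BACKUP_SEMANTICS,
        None,
    )
    try:
        res = DeviceIoControl(handle, FSCTL_GET_REPARSE_POINT, None, 10240)
    finally:
        CloseHandle(handle)  # release the handle even if the ioctl fails
    rdb = cast(create_string_buffer(res), POINTER(REPARSE_DATA_BUFFER)).contents
    if rdb.tag != IO_REPARSE_TAG_SYMLINK:
        raise RuntimeError("Expected IO_REPARSE_TAG_SYMLINK, but got %d" % rdb.tag)
    return rdb.get_print_name()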

trac-post-commit-hook on remote repository

Trying to set up the svn post-commit hook with Trac using this script.
It is being called without issue, but the problem is this line here:
144 repos = self.env.get_repository()
Because I am calling this remotely, self.env.get_repository() looks for the repository using the server drive and not the local drive mapping. That is, it is looking for E:/Projects/svn/InfoProj and not Y:/Projects/svn/InfoProj.
I noticed a changeset on the Trac site that allows get_repository() to be called with the path passed in as a variable, but it seems this hasn't made it into the latest stable release yet.
This version of the script (the one submitted by code monkey) appears to do things differently, but is throwing an error that seems related:
154 if url is None:
155 url = self.env.config.get('project', 'url')
156 self.env.href = Href(url)
157 self.env.abs_href = Href(url)
Lines 156 / 157 throw error: Warning: TypeError: 'str' object is not callable
The 10.3 stable version of the script throws a completely different error:
Warning: NameError: global name 'core' is not defined
I'm setting up trac for the first time on a Windows box with a remote repository. I'm using trac 0.11 stable with Python 2.6.
I thought there would have been a lot more people out there trying to commit across servers who had come across this problem. I've looked around and couldn't find a solution. I'm supposing Linux has a more graceful way of handling this.
Thanks in advance.
This is totally do-able and just requires a couple of small hacks... woo hoo!
The problem I was having is that get_repository reads the value of the svn repository from the trac.ini file. This was pointing at E:/ and not at Y:/. The simple fix involves a check to see if the repository is at repository_dir and, if not, to check a new variable, remote_repository_dir. The second part of the fix involves removing the error message from cache.py that checks whether the current repository address matches the one being passed in.
As always, use this at your own risk and back everything up before hand!!!
First open your trac.ini file and add a new variable 'remote_repository_dir' underneath the 'repository_dir' variable. The remote repository dir will point to the mapped drive on your local machine. It should now look something like this:
repository_dir = E:/Projects/svn/InfoProj
remote_repository_dir = Y:/Projects/svn/InfoProj
Next we will modify the api.py file to check for the new variable if it can't find the repository at the repository_dir location. Around :71 you should have something like this:
repository_dir = Option('trac', 'repository_dir', '',
    """Path to local repository. This can also be a relative path
    (''since 0.11'').""")
Underneath this line add:
remote_repository_dir = Option('trac', 'remote_repository_dir', '',
    """Path to remote repository.""")
Next near :156 you will have this:
rtype, rdir = self.repository_type, self.repository_dir
if not os.path.isabs(rdir):
    rdir = os.path.join(self.env.path, rdir)
Change that to this:
rtype, rdir = self.repository_type, self.repository_dir
if not os.path.isdir(rdir):
    rdir = self.remote_repository_dir
if not os.path.isabs(rdir):
    rdir = os.path.join(self.env.path, rdir)
Finally you will need to remove the alert in the cache.py file (note this is not the best way to do this, you should be able to include the remote variable as part of the check, but for now it works).
In cache.py near :97 it should look like this:
if repository_dir:
    # directory part of the repo name can vary on case insensitive fs
    if os.path.normcase(repository_dir) != os.path.normcase(self.name):
        self.log.info("'repository_dir' has changed from %r to %r"
                      % (repository_dir, self.name))
        raise TracError(_("The 'repository_dir' has changed, a "
                          "'trac-admin resync' operation is needed."))
elif repository_dir is None: #
    self.log.info('Storing initial "repository_dir": %s' % self.name)
    cursor.execute("INSERT INTO system (name,value) VALUES (%s,%s)",
                   (CACHE_REPOSITORY_DIR, self.name,))
else: # 'repository_dir' cleared by a resync
    self.log.info('Resetting "repository_dir": %s' % self.name)
    cursor.execute("UPDATE system SET value=%s WHERE name=%s",
                   (self.name, CACHE_REPOSITORY_DIR))
We are going to remove the first part of the if statement so it now should look like this:
if repository_dir is None: #
    self.log.info('Storing initial "repository_dir": %s' % self.name)
    cursor.execute("INSERT INTO system (name,value) VALUES (%s,%s)",
                   (CACHE_REPOSITORY_DIR, self.name,))
else: # 'repository_dir' cleared by a resync
    self.log.info('Resetting "repository_dir": %s' % self.name)
    cursor.execute("UPDATE system SET value=%s WHERE name=%s",
                   (self.name, CACHE_REPOSITORY_DIR))
Warning! Doing this will mean that it no longer gives you an error if your directory has changed and you need a resync.
Hope this helps someone.
