How to see changes for a Mercurial file context? - Python

I'm currently trying to write a script that will find all the files changed given a certain # in the task description, and I have gotten the script to work for that. But now I'm trying to sort it by whether the file was added, modified or removed. I've looked through the Mercurial API, but I can't find anything that can do what I want.
My code currently uses repo[revnum].description() and parses that to find which descriptions contain the #; for those that do, it adds the file context to a list.
This works fine and I can print a list of files, but I can't find a method to see what was done with each context. Can anyone help me out here, or point me to some better documentation?

Do you need to work with the Mercurial API? It is possible to do what you need by working with the output of hg log.
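For example, the template keywords file_adds, file_mods and file_dels expose exactly the added/modified/removed split you're after. A sketch of driving this from Python (the repo path and the '#123' task tag are placeholders; note the file lists come back space-separated, so this is rough with paths containing spaces):

# a sketch: ask hg log which files each revision added/modified/removed,
# keeping only revisions whose description mentions the task number
# ('/path/to/repo' and '#123' are placeholders)
import subprocess

template = '{rev}|{file_adds}|{file_mods}|{file_dels}|{desc|firstline}\n'
out = subprocess.check_output(
    ['hg', 'log', '-R', '/path/to/repo', '--template', template])
for line in out.decode('utf-8', 'replace').splitlines():
    rev, added, modified, removed, desc = line.split('|', 4)
    if '#123' in desc:
        print(rev, 'A:', added, 'M:', modified, 'R:', removed)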

In general, you should avoid writing scripts directly using the Mercurial API. It is better to write your scripts to use the CLI or perhaps even use hglib. As stated on the MercurialApi wiki:
For the vast majority of third party code, the best approach is to use Mercurial's published, documented, and stable API: the command line interface.
That being said, if you really need to use the API, you can use repo.status() to find the info you asked about:
modified, added, removed, deleted, unknown, ignored, clean = repo.status(revnum-1, revnum)
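For context, a fuller sketch of getting there from a repo path (the path and revision number are placeholders; note that the return value of repo.status() has changed shape across Mercurial versions, so check the version you're running):

# a minimal sketch, assuming the old tuple-returning API; newer Mercurial
# versions return a status object with .modified/.added/.removed attributes
from mercurial import ui, hg

repo = hg.repository(ui.ui(), '/path/to/repo')  # placeholder path
revnum = 42                                     # hypothetical revision number
modified, added, removed, deleted, unknown, ignored, clean = \
    repo.status(repo[revnum - 1].node(), repo[revnum].node())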

I ended up using something similar to what Tim said, although I did still use the API.
I imported commands from mercurial, and then called commands.status(repo.ui, repo, change=revnum)
I captured the output of this using repo.ui.pushbuffer() and repo.ui.popbuffer(), which was in the form:
A file_path1
R file_path2
R file_path3
A file_path4
M file_path5
I parsed this output and sorted it into added, removed, modified, etc.
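In case it helps anyone, the pattern looks roughly like this (a sketch; the repo path and revision number are placeholders):

# capture the output of the status command via the API
from mercurial import ui, hg, commands

u = ui.ui()
repo = hg.repository(u, '/path/to/repo')   # placeholder path
repo.ui.pushbuffer()                       # start capturing ui output
commands.status(repo.ui, repo, change=42)  # 42 = hypothetical revnum
output = repo.ui.popbuffer()               # stop capturing, get the text

changes = {}
for line in output.splitlines():
    code, path = line.split(' ', 1)        # e.g. 'A file_path1'
    changes.setdefault(code, []).append(path)
# changes now maps 'A'/'M'/'R' to lists of file paths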


How to customize Flyway so that it can handle CSV files as input as well?

Has anyone implemented CSV handling for Flyway? It was requested some time ago (Flyway specific migration with csv files). Flyway now mentions it as a possibility for the MigrationResolver and MigrationExecutor, but it does not seem to be implemented.
I've tried to do it myself with Flyway 4.2, but I'm not very good with Java. I got as far as creating my own jar using the sample and making it accessible to Flyway. But how does Flyway distinguish when to use the SqlMigrator and when to use my CsvMigrator? I thought I had to register my own prefix/suffix (as the question above describes), but FlywayConfiguration seems to be read-only; at least I did not see any API calls for doing this :(.
How do I connect the different Resolvers to the different migration file types (.sql to the SQL migration, .csv/.py to CSV loading and Python script execution)?
After some blood, sweat and tears, it looks like I came up with something. I can't make the whole code available because it uses a proprietary file format, but here are the main ideas:
Implement ConfigurationAware as well, and use the setFlywayConfiguration implementation to catalog the extra files you want to handle (i.e. .csv). This is executed only once during the run.
During this cataloging I could not use the scanner or LoadableResources; there's some Java magic I do not understand. All the classes and methods seem to be available and accessible, even when inspected with .getMethods() at runtime... but actually calling them during a run throws java.lang.NoSuchMethodError and java.lang.NoClassDefFoundError. I wasted a whole day on this - don't do that, just copy-paste the code from org.flywaydb.core.internal.util.scanner.filesystem.FileSystemScanner.
Use Set<String> instead of LoadableResources[]; it's way easier to work with, especially since there's no access to LoadableResources anyway and working with arrays was a nightmare.
The Python/shell call will go into execute(). Some tips:
Any exception or faulty exit code needs to be translated into an SQLException.
The build enforces Java 1.6, so new ProcessBuilder(cmd).inheritIO() cannot be used. If you want to print STDOUT/STDERR, look at these solutions: ProcessBuilder: Forwarding stdout and stderr of started processes without blocking the main thread.
To compile Flyway including your custom module, clone the whole Flyway repo from git, edit the main pom.xml to include your module as well, and compile with "mvn install -P-CommercialDBTest -P-CommandlinePlatformAssemblies -DskipTests=true" (I found this in another Stack Overflow question).
What I haven't done yet is the checksum part; I don't know yet what that needs.

How to merge the display of logs from several Mercurial repositories

Is there a way to merge the change logs from several different Mercurial repositories? By "merge" here I just mean integrate into a single display; this is nothing to do with merging in the source control sense.
In other words, I want to run hg log on several different repositories at once. The entries should be sorted by date regardless of which repository they're from, but be limited to the last n days (configurable), and should include entries from all branches of all the repositories. It would also be nice to filter by author and do this in a graphical client like TortoiseHg. Does anyone know of an existing tool or script that would do this? Or, failing that, a good way to access the log entries programmically? (Mercurial is written in Python, which would be ideal, but I can't find any information on a simple API for this.)
Background: We are gradually beginning to transition from SVN to Mercurial. The old repository was not just monolithic in the sense of one server, but also in the sense that there was one huge repository for all projects (albeit with a sensible directory structure). Our new Mercurial repositories are more focused! In general, this works much better, but we miss one useful feature from SVN: being able to use svn log at the root of the repository to see everything we have been working on recently. It's very useful for filling in timesheets, giving yourself a sense of purpose, etc.
I figured out a way of doing this myself. In short, I merge all the revisions into one mega-repo, and I can then look at this in TortoiseHG. Of course, it's a total mess, but it's good enough to get a summary of what happened recently.
I do this in three steps:
(Optional) Run hg convert on each source repository using the branchmap feature to rename each branch from original to reponame/original. This makes it easier later to identify which revision came from which source repository. (More faithful to SVN would be to use the filemap feature instead.)
On a new repository, run hg pull -f to force-pull from the individual repositories into one big one. This gets all the revisions in one place, but they show up in the wrong order.
Use the method described in this answer to create yet another repository that contains all the changes from the one created in step 2 but sorted into the right order. (Actually I use a slight variant: I get the hashes and compare against the hashes in the destination, check that the destination has a prefix of the source's, and only copy the new ones across.)
This is all done from a Python script, but although Mercurial is written in Python, I just drive the command-line interface via the subprocess module. Running through the three steps only copies the new revisions, without rebuilding everything from scratch, unless you add a new repo.
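Roughly, steps 1 and 2 come down to something like this (a sketch: the repo paths are placeholders, the merged repo is assumed to already exist, and step 3 is omitted):

# steps 1 and 2: rename branches per-repo, then force-pull everything together
import subprocess

sources = ['/repos/alpha', '/repos/beta']  # placeholder source repos
merged = '/repos/merged'                   # assumes `hg init /repos/merged` was run once

for src in sources:
    name = src.rstrip('/').split('/')[-1]
    # step 1: a branchmap file renames each branch to reponame/original
    # (in practice, one line per branch in the source repo)
    with open('branchmap.txt', 'w') as f:
        f.write('default %s/default\n' % name)
    subprocess.check_call(['hg', '--config', 'extensions.convert=',
                           'convert', '--branchmap', 'branchmap.txt',
                           src, src + '-converted'])
    # step 2: force-pull the converted repo into the big one
    subprocess.check_call(['hg', 'pull', '-f', '-R', merged, src + '-converted'])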

Dangerous Python Keywords?

I am about to get a bunch of python scripts from an untrusted source.
I'd like to be sure that no part of the code can hurt my system, meaning:
(1) the code is not allowed to import ANY MODULE
(2) the code is not allowed to read or write any data, connect to the network etc
(the purpose of each script is to loop through a list, compute some data from input given to it and return the computed value)
Before I execute such code, I'd like to have a script 'examine' it and make sure that there's nothing dangerous there that could hurt my system.
I thought of using the following approach: check that the word 'import' is not used (so we are guaranteed that no modules are imported).
Yet, it would still be possible for the user (if desired) to write code to read/write files etc. (say, using open).
Then here comes the question:
(1) Where can I get a 'global' list of Python methods (like open)?
(2) Is there some code that I could add to each script that is sent to me (at the top) that would make some 'global' methods invalid for that script (for example, any use of the keyword open would lead to an exception)?
I know that there are some solutions for Python sandboxing, but please try to answer this question as I feel this is the more relevant approach for my needs.
EDIT: Suppose that I make sure that no import is in the file, and that no possibly hurtful methods (such as open, eval, etc.) are in it. Can I conclude that the file is SAFE? (Can you think of any other 'dangerous' ways that built-in methods can be run?)
This point hasn't been made yet, and should be:
You are not going to be able to secure arbitrary Python code.
A VM is the way to go unless you want security issues up the wazoo.
You can still obfuscate import without using eval:
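# builds the string '__import__' in two pieces, then fetches it from builtins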
s = '__imp'
s += 'ort__'
f = globals()['__builtins__'].__dict__[s]
** BOOM **
Things to look for: built-in functions and keywords.
Note that you'll need to look for both "file" and "open", as both can open files.
Also, as others have noted, this isn't 100% certain to stop someone determined to insert malicious code.
An approach that should work better than string matching is to use the ast module: parse the Python code, do your whitelist filtering on the tree (e.g. allow only basic operations), then compile and run the tree.
See this nice example by Andrew Dalke on manipulating ASTs.
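A rough sketch of that idea (the whitelist here is illustrative and far from complete; as the rest of this thread shows, even this needs care):

# parse untrusted source, allow only whitelisted node types, then run it
# with an emptied __builtins__ (illustrative, not a hardened sandbox)
import ast

ALLOWED = (ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
           ast.BinOp, ast.UnaryOp, ast.operator, ast.unaryop, ast.Constant)

def check(source):
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError('disallowed syntax: %s' % type(node).__name__)
    return compile(tree, '<untrusted>', 'exec')

code = check('x = 1 + 2')
exec(code, {'__builtins__': {}})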
Built-in functions/keywords to look for:
eval
exec
__import__
open
file
input
execfile
print can be dangerous if you have one of those dumb shells that execute code on seeing certain output
stdin
__builtins__
globals() and locals() must be blocked otherwise they can be used to bypass your rules
There's probably tons of others that I didn't think about.
Unfortunately, crap like this is possible...
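# climbs from a bare object through __reduce__ into a function's globals to recover eval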
object().__reduce__()[0].__globals__["__builtins__"]["eval"]("open('/tmp/l0l0l0l0l0l0l','w').write('pwnd')")
So it turns out that keywords, import restrictions, and in-scope-by-default symbols alone are not enough; you need to verify the entire object graph...
Use a Virtual Machine instead of running it on a system that you are concerned about.
Without a sandboxed environment, it is impossible to prevent a Python file from doing harm to your system aside from not running it.
It is easy to create a Cryptominer, delete/encrypt/overwrite files, run shell commands, and do general harm to your system.
If you are on Linux, you should be able to use docker to sandbox your code.
For more information, see this GitHub issue: https://github.com/raxod502/python-in-a-box/issues/2.
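For instance, a sketch of a throwaway, network-less container run (the image name, mounted path and resource limits are illustrative):

# run the untrusted script in a disposable container with no network access
import subprocess

subprocess.run([
    'docker', 'run', '--rm',               # delete the container afterwards
    '--network', 'none',                   # no network access
    '--memory', '128m', '--cpus', '0.5',   # illustrative resource caps
    '-v', '/path/to/script.py:/sandbox/script.py:ro',  # mount read-only
    'python:3', 'python', '/sandbox/script.py',
], timeout=30)                             # kill it if it runs too long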
I did come across this on GitHub, so something like it could be used, but that has a lot of limits.
Another approach would be to create another Python file which parses the original one, removes the bad code, and runs the file. However, that would still be hit-and-miss.

Create a user group in Linux using Python

I want to create a user group using Python on a CentOS system. When I say 'using Python' I mean I don't want to do something like os.system and give the Unix command to create a new group. I would like to know if there is any Python module that deals with this.
Searching on the net did not reveal much about what I want, except for Python user groups... so I had to ask this.
I learned about the grp module by searching here on SO, but couldn't find anything about creating a group.
EDIT: I don't know if I have to start a new question for this, but I would also like to know how to add (existing) users to the newly created group.
Any help appreciated.
Thank you.
I don't know of a Python module to do it, but the /etc/group and /etc/gshadow formats are pretty standard, so if you wanted you could just open the files, parse their current contents, and then add the new group if necessary.
Before you go doing this, consider:
What happens if you try to add a group that already exists on the system
What happens when multiple instances of your program try to add a group at the same time
What happens to your code when an incompatible change is made to the group format a couple releases down the line
NIS, LDAP, Kerberos, ...
If you're not willing to deal with these kinds of problems, just use the subprocess module and run groupadd. It will be way less likely to break your customers' machines.
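For example (a sketch; the group and user names are placeholders, both commands need root, and this also covers the edit about adding existing users):

# create a group and add an existing user to it via the standard tools
import subprocess

def create_group(name):
    subprocess.check_call(['groupadd', name])

def add_user_to_group(user, group):
    # -aG appends the group to the user's supplementary groups
    subprocess.check_call(['usermod', '-aG', group, user])

create_group('developers')                # placeholder group name
add_user_to_group('alice', 'developers')  # placeholder user name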
Another thing you could do that would be less fragile than writing your own would be to wrap the code in groupadd.c (in the shadow package) in Python and do it that way. I don't see this buying you much versus just exec'ing it, though, and it would add more complexity and fragility to your build.
I think you should use the command-line programs from your program; a lot of care has gone into making sure that they don't break the groups file if something goes wrong.
However, the file format is quite straightforward, should you choose to write something yourself.
There are no library calls for creating a group. This is because there's really no such thing as creating a group. A GID is simply a number assigned to a process or a file. All these numbers exist already - there is nothing you need to do to start using a GID. With the appropriate privileges, you can call chown(2) to set the GID of a file to any number, or setgid(2) to set the GID of the current process (there's a little more to it than that, with effective IDs, supplementary IDs, etc).
Giving a name to a GID is done by an entry in /etc/group on basic Unix/Linux/POSIX systems, but that's really just a convention adhered to by the Unix/Linux/POSIX userland tools. Other network-based directories also exist, as mentioned by Jack Lloyd.
The man page group(5) describes the format of the /etc/group file, but it is not recommended that you write to it directly. Your distribution will have policies on how unnamed GIDs are allocated, such as reserving certain spaces for different purposes (fixed system groups, dynamic system groups, user groups, etc). The range of these number spaces differs on different distributions. These policies are usually encoded in the command-line tools that a sysadmin uses to assign unnamed GIDs.
This means the best way to add a group locally is to use the command-line tools.
If you are looking at Python, then try this program. It's fairly simple to use, and the code can easily be customized: http://aleph-null.tv/downloads/mpb-adduser-1.tgz

Bash alias to Python script -- is it possible?

The particular alias I'm looking to "class up" into a Python script happens to be one that makes use of the curl -o (output to file) option. I suppose I could as easily turn it into a Bash function, but someone advised me that I could avoid the quirks and pitfalls of the different versions and "flavors" of Bash by taking my ideas and making them Python scripts.
Coincident with this idea is another notion I had: making platform-independent a feature of legacy Mac OS (officially known as "OS 9" or "Classic") pertaining to downloads, namely writing the URL to some part of the file visible from one's file navigator (Konqueror, Dolphin, Nautilus, Finder or Explorer). I know that only a scant few file types support this kind of thing using some other command-line tools (exiv2, wrjpgcom, etc.). Which is perfectly fine with me, as I only use this alias to download single-page image files such as JPEGs anyway.
I reckon I might as well take full advantage of the power of Python by having the script pass the string which is the source URL of the download (entered by the user and used first by curl) to something like exiv2, which could write it to the Comment block, EXIF User Comment block, and (taking as a first and worst example) Windows XP's File Description field. Starting small is sometimes a good way to start.
Hope someone has advice or suggestions.
BZT
The relevant section from the Bash manual states:
Aliases allow a string to be substituted for a word when it is used as the first word of a simple command.
So, there should be nothing preventing you from doing e.g.
$ alias geturl="python /some/cool/script.py"
Then you could use it like any other shell command:
$ geturl http://example.com/excitingstuff.jpg
And this would simply call your Python program.
I thought Pycurl might be the answer. Ah, Daniel Stenberg and his innocent presumption that everybody knows what he does. I asked on the list whether pycurl had a "curl -o" analogue, and then asked 'If so: how would one go about coding it in a Python script?' His reply was the following:
"curl.setopt(pycurl.WRITEDATA, fp)
possibly combined with:
curl.setopt(pycurl.WRITEFUNCITON, callback) "
...along with Sourceforge links to two revisions of retriever.py. I can barely recall where easy_install put the one I've got; how am I supposed to compare them?
It's pretty apparent this gentleman never had a helpdesk or phone tech support job in the Western Hemisphere, where you have to assume the 'customer' just learned how to use their comb yesterday and be prepared to walk them through everything and anything. One-liners (or three-liners with abstruse links as chasers) don't do it for me.
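For anyone else who lands here: fleshed out, the hint amounts to something like this (a minimal sketch; the URL and output filename are placeholders):

# a "curl -o" analogue: stream the response body straight into a local file
import pycurl

url = 'http://example.com/excitingstuff.jpg'  # placeholder URL
with open('excitingstuff.jpg', 'wb') as fp:   # placeholder output filename
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, fp)  # the "-o" part: write the body to fp
    c.perform()
    c.close()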
BZT
