I'm kind of new to Python, but something I find myself doing in bash a lot is prepending and appending strings to filenames with parameter expansion.
e.g.
for file in *.txt ; do mkdir "${file%.*}" ; mv "$file" "${file%.*}/" ; done
That example strips the extension off a load of files, makes a directory named after each one, and then moves each file into its namesake folder.
If I want to achieve a similar thing, such as renaming the output of a function based on the input file name (below is an example using a Biopython function), I've seen a few ways to do it with string concatenation etc., but without bracketing and so on it looks confusing, and it seems like it could create parsing errors, with spaces, quotes and so on potentially being all over the place.
SeqIO.convert(genbank, 'genbank', genbank[:-3]+'tmp', 'fasta')
There are other threads on here about using rsplit, string concatenation and so on, but is one of these more 'correct' than another?
String concatenation is really nice and works great in simple commands like print(), but when adding to commands that are expecting separated values, it strikes me as a little messy.
You can use os.path.splitext, which is built especially for file names:
>>> import os
>>>
>>> fname = '/tmp/foo/bar.baz'
>>> sp = os.path.splitext(fname)
>>> sp
('/tmp/foo/bar', '.baz')
Extracting the name of the file without extension:
>>> os.path.basename(sp[0])
'bar'
And formatting a new file name:
>>> "{}.txt".format(os.path.basename(sp[0]))
'bar.txt'
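Applied to the question's Biopython example, that might look like this (a sketch; it assumes genbank holds a path such as 'reads.gb'):
base = os.path.splitext(genbank)[0]  # the path minus whatever extension it has
SeqIO.convert(genbank, 'genbank', base + '.tmp', 'fasta')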
In general, when manipulating file names and paths I try to just use os.path, since it already handles edge cases and can normalize paths containing things like /..//./././, etc.
I have a main script which sources a properties file with variables pointing to paths.
The properties file looks like this:
TMP_PATH=/$COMPANY/someProject/tmp
OUTPUT_PATH=/$COMPANY/someProject/output
SOME_PATH=/$COMPANY/someProject/some path
The problem is SOME_PATH, I must use a path with spaces (I can't change it).
I tried escaping the whitespace and using quotes, but no solution so far.
I edited the paths; the problem with single quotes is that I'm using another variable, $COMPANY, in the path, and single quotes would keep it from being expanded.
Use one of these three variants:
SOME_PATH="/mnt/someProject/some path"
SOME_PATH='/mnt/someProject/some path'
SOME_PATH=/mnt/someProject/some\ path
I see, Federico, that you've found the solution by yourself.
The problem was in two places. Assignments need proper quoting; in your case
SOME_PATH="/$COMPANY/someProject/some path"
is one of possible solutions.
But the shell does not store those quotes in memory,
so when you want to use this variable, you need to quote it again, for example:
NEW_VAR="$SOME_PATH"
because otherwise the space will be expanded at the command level, like this:
NEW_VAR=/YourCompany/someProject/some path
which is not what you want.
For more info you can check out my article about it: http://www.cofoh.com/white-shell
You can escape the "space" char by putting a \ right before it.
SOME_PATH=/mnt/someProject/some\ path
should work
If the file contains only parameter assignments, you can use the following loop in place of sourcing it:
# Instead of source file.txt
while IFS="=" read -r name value; do
    declare "$name=$value"
done < file.txt
This saves you from having to quote anything in the file, and is also more secure, since you don't risk executing arbitrary code from file.txt.
If the path in Ubuntu is "/home/ec2-user/Name of Directory", then do this:
1) Java's build.properties file:
build_path='/home/ec2-user/Name\\ of\\ Directory'
Where ~/ is equal to /home/ec2-user
2) Jenkinsfile:
build_path=buildprops['build_path']
echo "Build path= ${build_path}"
sh "cd ${build_path}"
I am writing an AI that runs commands off of text-file modules. In the folder where my Python program is located is a group of text files. Each contains sets of keyword-command pairs formatted like this:
keyword 1,function 1|keyword 2,function 2
My program loops through all these files and creates a list of keyword-command sets. For example, from 2 text files,
keyword 1,function 1|keyword 2,function 2 and keyword 3,function 3,
the list generated is
[['keyword 1', 'function 1'], ['keyword 2', 'function 2'], ['keyword 3', 'function 3']].
Now the function portions are commands run via exec, but I would like the ability to execute multiple lines of code for each function. I am thinking I will accomplish this by adding a special symbol to stand for a newline, adding the commands to a list, and then iterating through them. My question is: is there any symbol I could safely use that won't mess up other commands that may use it? For example, if I use %, it would mess up the modulo operator.
Here is my code as of now in case you need it, although I don't really think you will.
# Setup
import os
import locale

# Load modules
functions = []
print(str(os.getcwd()))
print(str(os.getcwd().replace('ZAAI.py', '')))
for file in os.listdir(os.getcwd().replace('ZAAI.py', '')):
    if file.endswith('.txt'):
        openFile = open(os.getcwd().replace('ZAAI.py', '') + file, encoding=locale.getpreferredencoding())
        openFileText = openFile.read()
        print(openFileText)
        for item in openFileText.split('|'):
            functions.append(item.split(','))
print(functions)
Well, Python supports multiple statements on a single line using a semicolon:
a = 1; b = 2; c = a + b; print(c)
So you don't need to create your own newline symbol to handle multiline Python scripts. That being said, you should probably not do this.
You're essentially creating a somewhat limited plugin architecture. People have done this before; there are lots of options for doing it in Python. I can just imagine the frustration of someone looking at one of your "plugin" files with dozens of commands, each with a 30-line Python script on a single line.
According to the documentation on string literals, the $ and ? characters are not used in Python for any purpose other than string literals and comments.
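If you do go the separator route, here is a minimal sketch of the idea, using $ as a (hypothetical) line marker:
# The stored 'function' uses '$' to stand for a newline;
# rebuild real newlines before handing the string to exec.
command = 'a = 1$b = 2$print(a + b)'
exec(command.replace('$', '\n'))  # prints 3
Note this still breaks if a $ ever appears inside one of your stored string literals, which is one more reason to prefer a real plugin mechanism.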
I'm trying to find all *.txt files in a directory with glob(). In some cases, glob.glob('some\path\*.txt') returns an empty list, despite files existing in the given directory. This is especially true if the path is all lower-case or numeric.
As a minimal example I have two folders a and A on my C: drive both holding one Test.txt file.
import glob
files1 = glob.glob('C:\a\*.txt')
files2 = glob.glob('C:\A\*.txt')
yields
files1 = []
files2 = ['C:\\A\\Test.txt']
If this is by design, is there any other directory name, that leads to such unexpected behaviour?
(I'm working on win 7, with Python 2.7.10 (32bit))
EDIT: (2019) Added an answer for Python 3 using pathlib.
The problem is that \a has a special meaning in string literals (bell char).
Just double backslashes when inserting paths in string literals (i.e. use "C:\\a\\*.txt").
Python is different from C here: when you use a backslash with a character that doesn't have a special meaning (e.g. "\s"), Python keeps both the backslash and the letter (in C you would get just the "s").
This sometimes hides the issue, because things just work anyway with a single backslash, depending on what the first letter of the directory name is...
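A quick interpreter session shows the difference:
>>> '\a'        # special meaning: the ASCII bell character
'\x07'
>>> len('\a')
1
>>> '\s'        # no special meaning: backslash and letter both kept
'\\s'
>>> len('\s')
2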
I personally avoid using double-backslashes in Windows and just use Python's handy raw-string format. Just change your code to the following and you won't have to escape the backslashes:
import glob
files1 = glob.glob(r'C:\a\*.txt')
files2 = glob.glob(r'C:\A\*.txt')
Notice the r at the beginning of the string.
As already mentioned, the \a is a special character in Python. Here's a link to a list of Python's string literals:
https://docs.python.org/2/reference/lexical_analysis.html#string-literals
Since my original answer attracted more views than expected and some time has passed, I wanted to add an answer that reliably solves this kind of problem and is also cross-platform compatible. It's in Python 3 on Windows 10, but should also work on *nix systems.
from pathlib import Path
filepath = Path(r'C:\a')
filelist = list(filepath.glob('*.txt'))
--> [WindowsPath('C:/a/Test.txt')]
I like this solution better, as I can copy and paste paths directly from windows explorer, without the need to add or double backslashes etc.
I am looking for a few scripts that would allow one to manipulate generic CSV files...
typically something like:
add-row FILENAME INSERT_ROW
get-row FILENAME GREP_ROW
replace-row FILENAME GREP_ROW INSERT_ROW
delete-row FILENAME GREP_ROW
where
FILENAME is the name of a CSV file, with the first row containing headers and "" used to delimit strings that might contain ','
GREP_ROW is a string of pairs field1=value1[,fieldN=valueN,...] used to identify a row by its field values in a CSV file
INSERT_ROW is a string of pairs field1=value1[,fieldN=valueN,...] used to replace (or add) the fields of a row.
preferably in Python using the csv package...
ideally leveraging Python to associate each field with a variable, allowing more advanced GREP rules like fieldN > XYZ...
Perl has a tradition of in-place editing derived from the unix philosophy.
We could, for example, write a simple add-row-by-num.pl command as follows:
#!/usr/bin/perl -pi
BEGIN { $ln=shift; $line=shift; }
print "$line\n" if $ln==$.;
close ARGV if eof;
Replace the third line by $_="$line\n" if $ln==$.; to replace lines. Eliminate the $line=shift; and replace the third line by $_ = "" if $ln==$.; to delete lines.
We could write a simple add-row-by-regex.pl command as follows :
#!/usr/bin/perl -pi
BEGIN { $regex=shift; $line=shift; }
print "$line\n" if /$regex/;
Or simply the one-liner perl -pi -e 'print "LINE\n" if /REGEX/' FILES. Again, we may replace the print with $_="$line\n" or $_ = "" for replace or delete, respectively.
We do not need the close ARGV if eof; line anymore because we need not reset the $. counter after each file is processed.
Is there some reason the ordinary unix grep utility does not suffice? Recall that the regular expression (PATTERN){n} matches PATTERN exactly n times, i.e. (\s*\S+\s*,){6}(\s*777\s*,) demands a 777 in the 7th column (use grep -E, since the unescaped {n} repetition needs extended syntax).
There is even a perl regular expression to transform your fieldN=value pairs into this regular expression, although I'd use split, map, and join myself.
Btw, File::Inplace provides inplace editing for file handles.
Perl has the DBD::CSV driver, which lets you access a CSV file as if it were an SQL database. I've played with it before, but haven't used it extensively, so I can't give a thorough review of it. If your needs are simple enough, this may work well for you.
App::CCSV does some of that.
The usual way in Python is to use csv.reader to load the data into a list of rows, do your add/replace/get/delete operations on that native Python object, and then use csv.writer to write the file back out.
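For instance, a minimal sketch of that pattern in Python 3 (the file name data.csv and the field names id/status are made up for illustration):
import csv

# Read the whole file; DictReader maps each row to the header names.
with open('data.csv', newline='') as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    rows = list(reader)

# A 'replace-row' style operation: update fields of the matching rows.
for row in rows:
    if row['id'] == '42':
        row['status'] = 'done'

# Write everything back out.
with open('data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)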
In-place operations on CSV files wouldn't make much sense anyway. Since the records are not typically of fixed length, there is no easy way to insert, delete, or modify a record without moving all the other records at the same time.
That being said, Python's fileinput module has a mode for in-place file updates.
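As a small illustration of that mode (the file name and the filter rule here are made up): with inplace=True, fileinput redirects stdout into the file, so whatever you print becomes the new contents.
import fileinput

# A 'delete-row' style operation: drop every line containing ',777,'.
for line in fileinput.input('data.csv', inplace=True):
    if ',777,' not in line:
        print(line, end='')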
I need to generate a tar file, but as a string in memory rather than as an actual file. What I have as input is a single filename and a string containing the associated contents. I'm looking for a Python lib I can use, to avoid having to roll my own.
A little more work found these functions, but using a memory stream object seems a little... inelegant. And making it accept input from strings looks like even more... inelegant. OTOH it works, I assume, as most of it is new to me. Anyone see any bugs in it?
Use tarfile in conjunction with cStringIO:
import tarfile
import cStringIO

c = cStringIO.StringIO()
t = tarfile.open(mode='w', fileobj=c)
# here: do your work on t, then close it to flush the end-of-archive blocks:
t.close()
s = c.getvalue() # extract the bytestring you need
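For the question's exact case (one filename plus its contents in a string), something along these lines should work; fname and contents are placeholders:
import tarfile
import cStringIO

fname = 'hello.txt'
contents = 'some file contents\n'

c = cStringIO.StringIO()
t = tarfile.open(mode='w', fileobj=c)
info = tarfile.TarInfo(name=fname)             # describes the archive member
info.size = len(contents)                      # size must be set before addfile
t.addfile(info, cStringIO.StringIO(contents))  # file data comes from a second memory stream
t.close()                                      # flushes the end-of-archive blocks

tar_string = c.getvalue()                      # the whole tar archive as a string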