Python: Spaces to Tabs?

I have a python file (/home/test.py) that has a mixture of spaces and tabs in it.
Is there a programmatic way (note: programmatic, NOT using an editor) to convert this file to use only tabs (meaning, replace any existing 4-space occurrences with a single tab)?
I would be grateful for either a Python code sample or a Linux command to do the above. Thanks.

Sounds like a task for sed:
sed -e 's/    /\t/g' test.py > test.new
[Put a real tab instead of \t]
However...
Use 4 spaces per indentation level.
--PEP 8 -- Style Guide for Python Code

You can try iterating over the file and doing the replacement, e.g.
import fileinput
import sys

for line in fileinput.FileInput("file", inplace=1):
    # write() avoids the extra newline that "print line" would add
    sys.stdout.write(line.replace("    ", "\t"))
or you can try a *nix tool like sed/awk
$ awk '{gsub(/    /,"\t")}1' file > temp && mv temp file
$ ruby -i.bak -pe '$_.gsub!(/    /,"\t")' file
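If you only want to convert indentation (and leave any 4-space runs inside string literals alone), a small Python sketch along these lines is safer; note the output filename test.new is just a made-up placeholder:
# Convert each leading group of 4 spaces to a tab; the rest of
# the line (string literals, comments) is left untouched.
with open("/home/test.py") as src, open("/home/test.new", "w") as dst:
    for line in src:
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        dst.write("\t" * (indent // 4) + " " * (indent % 4) + stripped)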


How to add new columns of zeroes to a file?

I have a file of 10000 rows, e.g.,
1.2341105289455E+03 1.1348135000000E+00
I would like to have
1.2341105289455E+03 0.0 1.1348135000000E+00 0.0
and insert columns of '0.0' in it.
I tried replacing each space with ' 0.0 ', and it works, but I don't think it is the best solution. I tried with awk, but I was only able to add '0.0' at the end of the file.
I bet there is a better solution to it. Do you know how to do it? awk? python? emacs?
Use this Perl one-liner:
perl -lane 'print join "\t", $F[0], "0.0", $F[1], "0.0"; ' in_file > out_file
The perl one-liner uses these command line flags:
-e : tells Perl to look for code in-line, instead of in a file.
-n : loop over the input one line at a time, assigning it to $_ by default.
-l : strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : split $_ into array @F on whitespace or on the regex specified in the -F option.
SEE ALSO:
perlrun: command line switches
with awk
awk '{print $1,"0.0",$2,"0.0"}' file
If you want to modify the file in place, you can do it either with GNU awk by adding the -i inplace option, or by adding > tmp && mv tmp file to the existing command. But always run it first without replacing, to test it and confirm the output.
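Since the question also asks about Python: a minimal sketch doing the same column insertion (it assumes exactly two whitespace-separated columns per line; in_file and out_file are placeholder names):
# Insert a "0.0" column after each of the two input columns.
with open("in_file") as src, open("out_file", "w") as dst:
    for line in src:
        a, b = line.split()
        dst.write(" ".join([a, "0.0", b, "0.0"]) + "\n")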

python -c vs python - << heredoc

I am trying to run some Python code in a Bash script, so I wanted to understand the difference between:
#!/bin/bash
#your bash code
python -c "
#your py code
"
vs
python - <<DOC
#your py code
DOC
I checked the web but couldn't piece together a clear answer. Do you think one is better than the other?
If you wanted to return a value from the Python code block to your Bash script, is a heredoc the only way?
The main flaw of using a here document is that the script's standard input will be the here document. So if you have a script which wants to process its standard input, python -c is pretty much your only option.
On the other hand, using python -c '...' ties up the single-quote for the shell's needs, so you can only use double-quoted strings in your Python script; using double-quotes instead to protect the script from the shell introduces additional problems (strings in double-quotes undergo various substitutions, whereas single-quoted strings are literal in the shell).
As an aside, notice that you probably want to single-quote the here-doc delimiter, too, otherwise the Python script is subject to similar substitutions.
python - <<'____HERE'
print("""Look, we can have double quotes!""")
print('And single quotes! And `back ticks`!')
print("$(and what looks to the shell like process substitutions and $variables!)")
____HERE
As an alternative, escaping the delimiter works identically, if you prefer that (python - <<\____HERE)
If you are using bash, you can avoid heredoc problems if you apply a little more boilerplate:
python <(cat <<EoF
name = input()
print(f'hello, {name}!')
EoF
)
This will let you run your embedded Python script without giving up the standard input. The overhead is about the same as using cmda | cmdb. This technique is known as Process Substitution.
If you want to be able to somehow validate the script, I suggest that you dump it to a temporary file:
#!/bin/bash
temp_file=$(mktemp my_generated_python_script.XXXXXX.py)
cat > $temp_file <<EoF
# embedded python script
EoF
python3 $temp_file && rm $temp_file
This will keep the script if it fails to run.
If you prefer to use python -c '...' without having to escape double quotes, you can first load the code into a bash variable using a here-document:
read -r -d '' CMD << '--END'
print ("'quoted'")
--END
python -c "$CMD"
The python code is loaded verbatim into the CMD variable and there's no need to escape double quotes.
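As for the question's second part: a heredoc is not the only way to return a value. Both forms write to stdout, so ordinary command substitution works with either. A minimal sketch:
# Capture whatever the Python block prints into a shell variable.
result=$(python - <<'EOF'
print(21 * 2)
EOF
)
echo "Python said: $result"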
How to use here-docs with input
tripleee's answer has all the details, but there are Unix tricks to work around this limitation:
So if you have a script which wants to process its standard input, python -c is pretty much your only option.
This trick applies to all programs that want to read from a redirected stdin (e.g., ./script.py < myinputs) and also take user input:
python - <<'____HERE'
import os
os.dup2(1, 0)
print(input("--> "))
____HERE
Running this works:
$ bash heredocpy.sh
--> Hello World!
Hello World!
If you want to get the original stdin, run os.dup(0) first.
This works because, as long as either stdout or stderr is a tty, one can read from it as well as write to it. (Otherwise, you could just open /dev/tty. This is what less does.)
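The /dev/tty variant mentioned above would look something like this (a sketch; it reads directly from the controlling terminal, so it only works when one exists):
python - <<'____HERE'
# stdin is the here document, so read the user's input
# from the controlling terminal instead.
with open("/dev/tty") as tty:
    line = tty.readline()
print(line.rstrip().upper())
____HERE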
In case you want to process inputs from a file instead, that's possible too -- you just have to use a new fd:
Example with a file
cat <<'____HERE' > file.txt
With software there are only two possibilities:
either the users control the programme
or the programme controls the users.
____HERE
python - <<'____HERE' 4< file.txt
import os
for line in os.fdopen(4):
    print(line.rstrip().upper())
____HERE
Example with a command
Unfortunately, pipelines don't work here -- but process substitution does:
python - <<'____HERE' 4< <(fortune)
import os
for line in os.fdopen(4):
    print(line.rstrip().upper())
____HERE

batch linux file rename - how to rename multiple files in linux console

I would like to rename circa 1000 files that are named like:
66-123123.jpg -> abc-123123-66.jpg. So in general the file format is:
xx-yyyyyy.jpg -> abc-yyyyyy-xx.jpg, where xx and yyyyyy are numbers, abc is string.
Can someone help me with bash or py script?
Try doing this:
rename 's/(\d{2})-(\d{6})\.jpg/abc-$2-$1.jpg/' *.jpg
There are other tools with the same name which may or may not be able to do this, so be careful.
If you run the following command (linux)
$ file $(readlink -f $(type -p rename))
and you have a result like
.../rename: Perl script, ASCII text executable
then this seems to be the right tool =)
If not, to make it the default (usually already the case) on Debian and derivatives like Ubuntu:
$ sudo update-alternatives --set rename /path/to/rename
(replace /path/to/rename with the path to your Perl rename command.)
If you don't have this command, search your package manager to install it or do it manually.
Last but not least, this tool was originally written by Larry Wall, Perl's father.
for file in ??-??????.jpg ; do
    [[ $file =~ (..)-(......)\.jpg ]]
    mv "$file" "abc-${BASH_REMATCH[2]}-${BASH_REMATCH[1]}.jpg"
done
This requires bash 3.0 or later for the =~ regex support. For POSIXy shells, this will do
for f in ??-??????.jpg ; do
    g=${f%.jpg}   # remove the extension
    a=${g%-*}     # remove the trailing "-yyyyyy"
    b=${g#*-}     # remove the leading "xx-"
    mv "$f" "abc-$b-$a.jpg"
done
You could use the rename command, which renames multiple files using regular expressions. In this case you would like to write
rename 's/(\d\d)-(\d\d\d\d\d\d)/abc-$2-$1/' *
where \d means a digit, and $1 and $2 refer to the values matched by the first and second parentheses.
Being able to do things like this easily is why I name my files the way I do. Using a + sign lets me cut them all up into variables, and then I can just re-arrange them with echo.
#!/usr/bin/env bash
set -x
find *.jpg -type f | while read -r files
do
    newname=$(echo "${files}" | sed s'#-#+#'g | sed s'#\.jpg#+.jpg#'g)
    field1=$(echo "${newname}" | cut -d'+' -f1)
    field2=$(echo "${newname}" | cut -d'+' -f2)
    field3=$(echo "${newname}" | cut -d'+' -f3)
    finalname="abc-${field2}-${field1}.${field3}"
    mv "${files}" "${finalname}"
done
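Since the question also asks for a Python script, here is a minimal sketch using re and os.rename; run it from the directory containing the files, and test it on a copy first:
import os
import re

# xx-yyyyyy.jpg -> abc-yyyyyy-xx.jpg
pattern = re.compile(r'^(\d{2})-(\d{6})\.jpg$')
for name in os.listdir('.'):
    m = pattern.match(name)
    if m:
        os.rename(name, 'abc-%s-%s.jpg' % (m.group(2), m.group(1)))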

pythonic equivalent of this sed command

I have this awk/sed command
awk '{full=full$0}END{print full;}' initial.xml | sed 's|</Product>|</Product>\
|g' > final.xml
to break up an XML doc containing a large number of tags,
such that the new file will have all the contents of each Product node on a single line.
I am trying to run it using os.system and the subprocess module; however, this wraps all the contents of the file into one line.
Can anyone convert it into equivalent python script?
Thanks!
Something like this?
from __future__ import print_function
import fileinput
for line in fileinput.input('initial.xml'):
    print(line.rstrip('\n').replace('</Product>','</Product>\n'),end='')
I'm using the print function because the default print in Python 2.x will add a space or newline after each set of output. There are various other ways to work around that, some of which involve buffering your output before printing it.
For the record, your problem could equally well be solved with a simple Awk script.
awk '{ gsub(/<\/Product>/,"&\n"); printf "%s", $0 }' initial.xml
Printing output as it arrives, without a trailing newline, is going to be a lot more efficient than buffering the whole file and then printing it at the end, and of course, Awk has all the necessary facilities to do the substitution as well. (gsub is not available in all dialects of Awk, though.)
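If you still need to drive the original pipeline from Python (the one-line wrapping usually comes from the embedded newline getting mangled in quoting), here is a sketch using subprocess with shell=True; it assumes GNU sed, which accepts \n in the replacement and so avoids the literal newline entirely:
import subprocess

# Run the awk | sed pipeline unchanged via the shell.
cmd = (r"awk '{full=full$0}END{print full;}' initial.xml"
       r" | sed 's|</Product>|</Product>\n|g' > final.xml")
subprocess.call(cmd, shell=True)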

Python equivalent to perl -pe?

I need to pick some numbers out of some text files. I can pick out the lines I need with grep, but didn't know how to extract the numbers from the lines. A colleague showed me how to do this from bash with perl:
cat results.txt | perl -pe 's/.+(\d\.\d+)\.\n/\1 /'
However, I usually code in Python, not Perl. So my question is, could I have used Python in the same way? I.e., could I have piped something from bash to Python and then gotten the result straight to stdout? ... if that makes sense. Or is Perl just more convenient in this case?
Yes, you can use Python from the command line. python -c <stuff> will run <stuff> as Python code. Example:
python -c "import sys; print sys.path"
There isn't a direct equivalent to the -p option for Perl (the automatic input/output line-by-line processing), but that's mostly because Python doesn't use the same concept of $_ and whatnot that Perl does - in Python, all input and output is done manually (via raw_input()/input(), and print/print()).
For your particular example:
cat results.txt | python -c "import re, sys; print ''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in sys.stdin)"
(Obviously somewhat more unwieldy. It's probably better to just write the script to do it in actual Python.)
You can use:
$ python -c '<your code here>'
You can in theory, but Python doesn't have anywhere near as much regex magic as Perl does, so the resulting command will be much more unwieldy, especially since you can't use regular expressions without importing re (and you'll probably need sys for sys.stdin too).
The Python equivalent of your colleague's Perl one-liner is approximately:
import sys, re
for line in sys.stdin:
    print re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line)
You have a problem which can be solved several ways.
I think you should consider using regular expressions (what Perl is doing in your example) directly from Python. Regular expressions are in the re module. An example would be:
import re
filecontent = open('somefile.txt').read()
print re.findall(r'.+(\d\.\d+)\.$', filecontent, re.M)
(I would prefer using $ with the re.M flag instead of '\n' for line endings, because line endings differ between operating systems and file encodings.)
If you want to call bash commands from inside Python, you could use:
import os
os.system(mycommand)
where mycommand is the shell command string. I use it all the time, because some operations are better performed in bash than in Python.
Finally, if you want to extract the numbers with grep, use the -o option, which prints only the matched part.
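For example, an ERE version of the \d\.\d+ pattern (a sketch; adjust the regex to your data):
$ grep -oE '[0-9]\.[0-9]+' results.txt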
Perl (or sed) is more convenient. However it is possible, if ugly:
python -c 'import sys, re; print "\n".join(re.sub(r".+(\d\.\d+)\.\n", r"\1 ", l) for l in sys.stdin)'
Quoting from https://stackoverflow.com/a/12259852/411282:
for ln in __import__("fileinput").input(): print ln.rstrip()
See the explanation linked above, but this does much more of what perl -p does, including support for multiple file names and stdin when no filename is given.
https://docs.python.org/3/library/fileinput.html#fileinput.input
You can use python to execute code directly from your bash command line by using python -c, or you can process input piped to stdin using sys.stdin.
