removing numeric value from a string - python

I have a file of 2000 rows and 1 column:
1007_s_at1
1007_s_at2
1007_s_at3
1007_s_at4
1007_s_at5
1007_s_at6
1007_s_at7
1007_s_at8
1007_s_at9
1007_s_at10
As shown above, I want to remove the trailing numeric value after "at". In principle, whatever number comes last should be truncated.
I have tried things like splitting the strings and rejoining them, but that just complicates the problem and leaves me far from an answer.
Could you please suggest something in bash, shell, Python, or Perl to solve this?
An output like the one below is desired:
1007_s_at
1007_s_at
1007_s_at
1007_s_at
1007_s_at
1007_s_at
1007_s_at
1007_s_at
1007_s_at
1007_s_at
Thank you

With Perl:
perl -p -e "s/\d+$//" input.txt > output.txt

With sed, editing the file in place:
sed -i -e 's/[[:digit:]]*$//' filename

Just pass string.digits to .rstrip() to remove digits from the right-hand side of your strings:
import string

with open('inputfile') as infile, open('outputfile', 'w') as outfile:
    for line in infile:
        outfile.write(line.rstrip().rstrip(string.digits) + '\n')

If only the number at the end changes, you could simply slice the string, since the prefix has a fixed length:
>>> a = '1007_s_at1'
>>> a[0:9]
'1007_s_at'

Python
Just strip all digits from the end:
>>> "1007_s_at10".rstrip('0123456789')
'1007_s_at'

If you are using Linux or Unix, a simple one-liner solution would be:
perl -i.bak -pe 's/\d+$//g' file.txt
On Windows:
perl -i.bak -pe "s/\d+$//g" file.txt
If you already know what it is doing, then well and good; otherwise, in very simple terms, the -i switch with .bak first creates a backup of file.txt and names it file.txt.bak.
The -p option then loops over the lines in the file and prints/saves the output into file.txt after s/\d+$//g removes the digits at the end.
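The same regex-based removal can also be done in Python; here is a minimal sketch (the names input.txt and output.txt are placeholders):
import re

with open('input.txt') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        # Drop any run of digits at the end of the line, mirroring s/\d+$//
        outfile.write(re.sub(r'\d+$', '', line.rstrip('\n')) + '\n')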

Nobody's suggested a bash solution yet:
shopt -s extglob
while read line; do
    echo "${line%%*([0-9])}"
done < filename

Related

How to add new columns of zeroes to a file?

I have a file of 10000 rows, e.g.,
1.2341105289455E+03 1.1348135000000E+00
I would like to have
1.2341105289455E+03 0.0 1.1348135000000E+00 0.0
and insert columns of '0.0' into it.
I tried replacing each 'space' with '0.0'; it works, but I don't think it is the best solution. I tried awk, but I was only able to add '0.0' at the end of the file.
I bet there is a better solution. Do you know how to do it? awk? python? emacs?
Use this Perl one-liner:
perl -lane 'print join "\t", $F[0], "0.0", $F[1], "0.0"; ' in_file > out_file
The Perl one-liner uses these command-line flags:
-e : tells Perl to look for code in-line, instead of in a file.
-n : loop over the input one line at a time, assigning it to $_ by default.
-l : strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : split $_ into array @F on whitespace or on the regex specified in the -F option.
SEE ALSO:
perlrun: command line switches
With awk:
awk '{print $1,"0.0",$2,"0.0"}' file
If you want to modify the file in place, you can either use GNU awk with the -i inplace option, or add > tmp && mv tmp file to the existing command. But always run it first without replacing, to test it and confirm the output.
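Since the question also mentions Python, here is a minimal sketch of the same column insertion (in_file and out_file are placeholder names):
with open('in_file') as infile, open('out_file', 'w') as outfile:
    for line in infile:
        fields = line.split()
        # Interleave a "0.0" column after each of the two original columns
        outfile.write(' '.join([fields[0], '0.0', fields[1], '0.0']) + '\n')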

How to turn text file into python list form

I'm trying to turn a list of numbers in a text file into python list form. For example, I want to make
1
2
3
4
5
into
[1,2,3,4,5]
I found something that almost worked in another post using sed:
sed '1s/^/[/;$!s/$/,/;$s/$/]/' file
but this didn't remove the newline after every number. How can I modify this sed command to get it to do what I want? An explanation of the components of the sed command would also be appreciated. Thanks
With GNU sed for -z to read the whole file at once:
sed -z 's/\n/,/g; s/^/[/; s/,$/]\n/' file
[1,2,3,4,5]
With any awk in any shell on any UNIX box:
$ awk '{printf "%s%s", (NR>1 ? "," : "["), $0} END{print "]"}' file
[1,2,3,4,5]
You can append all the lines into the pattern space first before performing substitutions:
sed ':a;N;$!ba;s/\n/,/g;s/^/\[/;s/$/\]/' file
This outputs:
[1,2,3,4,5]
This might work for you (GNU sed):
sed '1h;1!H;$!d;x;s/\n/,/g;s/.*/[&]/' file
Copy the first line to the hold space, append copies of subsequent lines and delete the originals. At the end of the file, swap to the hold space, replace newlines by commas, and surround the remaining string by square brackets.
If you want the list using Python, a simple implementation is:
with open('./num.txt') as f:
    num = [int(line) for line in f]
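If what you actually need is the bracketed text rather than the in-memory list, you can then print it in that form, for example:
print('[' + ','.join(str(n) for n in num) + ']')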

Add filename before first occurrence of a character in all lines for all files in a given folder

I have a folder full of files with lines which look like this:
S149.sh
sox preaching.wav _001 trim 889.11 891.23
sox preaching.wav _002 trim 891.45 893.92
sox preaching.wav _003 trim 1599.95 1606.78
And I want to add the filename without its extension (which is S149) right before the first occurrence of the _ character in every line, so that it ends up looking like this:
sox preaching.wav S149_001 trim 889.11 891.23
sox preaching.wav S149_002 trim 891.45 893.92
sox preaching.wav S149_003 trim 1599.95 1606.78
And I want to automatically do this for every *.sh file in a given folder.
How do I achieve that with either bash (this includes awk, grep, sed, etc.) or python? Any help will be greatly appreciated.
One possibility, using ed, the standard editor and a loop:
for i in *.sh; do
    printf '%s\n' ",g/_/ s/_/${i%.sh}&/" w q | ed -s -- "$i"
done
The parameter expansion ${i%.sh} expands to $i where the suffix .sh is removed.
The ed commands are, in the case i=S149.sh:
,g/_/ s/_/S149&/
w
q
,g/_/ marks all lines containing an underscore and s/_/S149&/ replaces the underscore with S149_. Then w writes the file and q quits the editor.
A sed version:
for i in *.sh; do
    sed -i "s/_/${i%.*}_/g" "$i"
done
${i%.*} expands to the filename minus its extension, which is what the in-place replacement inserts before each underscore.
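If you would rather stay in Python (which the question allows), a rough sketch of the same idea using pathlib, assuming the files are small enough to read into memory, could be:
from pathlib import Path

for path in Path('.').glob('*.sh'):
    prefix = path.stem  # filename without its extension, e.g. 'S149'
    lines = path.read_text().splitlines()
    # Insert the prefix before the first '_' on each line
    new_lines = [line.replace('_', prefix + '_', 1) for line in lines]
    path.write_text('\n'.join(new_lines) + '\n')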
With GNU awk for inplace editing:
awk -i inplace 'FNR==1{f=gensub(/\.[^.]+$/,"",1,FILENAME)} {$3=f$3} 1' *.sh
If you're considering using a shell loop instead, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
@Ruran: In case you do not have an awk that can edit the Input_file while reading it, the following may help.
awk '(FILENAME != P && P && Q){close(P);system("mv " Q OFS P)} {Q=P=FILENAME;sub(/\..*/,X,Q);sub(/_/,Q"&");print > Q;} END{system("mv " Q OFS P)}' *.sh
The logic behind it is simple: it changes the first occurrence of the _ character and writes the reformatted lines into a tmp file; when it starts reading the next Input_file, it renames that tmp file over the previous Input_file.
One more point I have not seen in the posts above: since we are using *.sh, if you have thousands of Input_files the code may give an error because too many Input_files would be opened without being closed, so I am closing them too. Let me know if this helps you.
A non-one-liner form of the solution follows.
awk '(FILENAME != P && P && Q){
close(P);
system("mv " Q OFS P)
}
{
Q=P=FILENAME;
sub(/\..*/,X,Q);
sub(/_/,Q"&");
print > Q;
}
END {
system("mv " Q OFS P)
}
' *.sh

Python Script to Change Folder Names

I'm on OS X and I'm fed up with our labeling system where I work. The labels are mm/dd/yy and I think that they should be yy/mm/dd. Is there a way to write a script to do this? I understand a bit of Python with lists and how to change the position of characters.
Any suggestions or tips?
What I have now:
083011-HalloweenBand
090311-ViolaClassRecital
090411-JazzBand
What I want:
110830-HalloweenBand
110903-ViolaClassRecital
110904-JazzBand
Thanks
Assuming the script is in the same directory as the files you want to rename, and you already have the list of files that you want to rename, you can do this:
import os

for file in rename_list:
    os.rename(file, file[4:6] + file[:2] + file[2:4] + file[6:])
There is a Q&A with information on traversing directories with Python that you could modify to do this. The key method is walk(), but you'll need to add the appropriate calls to rename().
As a beginner it is probably best to start by traversing the directories and writing out the new directory names before attempting to change the directory names. You should also make a backup and notify anyone who might care about this change before doing it.
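A minimal sketch of that approach (labeled_dirs is a placeholder for the top-level directory, and the pattern assumes names like 083011-HalloweenBand) might be:
import os
import re

TOP = 'labeled_dirs'  # placeholder: the directory containing the dated folders

# Walk bottom-up so child directories are renamed before their parents
for dirpath, dirnames, filenames in os.walk(TOP, topdown=False):
    for name in dirnames:
        m = re.match(r'^(\d{2})(\d{2})(\d{2})-(.*)$', name)  # mmddyy-Description
        if m:
            mm, dd, yy, rest = m.groups()
            os.rename(os.path.join(dirpath, name),
                      os.path.join(dirpath, yy + mm + dd + '-' + rest))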
I know you asked for Python, but I would do it from the shell. This is a simple one-liner:
ls | awk '{print "mv " $0 FS substr($1,5,2) substr($1,1,4) substr($1,7) }' | bash
I do not use OS X, but I think its shell is bash. You may need to change bash to sh, or awk to gawk.
What that line does is pipe the directory listing to awk, which prints "mv ", $0 (the line), a space (FS = field separator, which defaults to space), and then three substrings that rearrange the date.
substr(s,c,n) returns the substring of string s starting from character position c, up to a maximum length of n characters. If n is not supplied, the rest of the string from c is returned.
Lastly, this is piped to the shell, allowing it to be executed. This works without problems on Ubuntu, and I use variations of this command quite a bit. A version of awk (awk, nawk, gawk) should be installed on OS X, which I believe uses bash.

Python: Spaces to Tabs?

I have a python file (/home/test.py) that has a mixture of spaces and tabs in it.
Is there a programmatic way (note: programmatic, NOT using an editor) to convert this file to use only tabs (meaning, replace any existing 4-space occurrences with a single tab)?
I would be grateful for either a Python code sample or a Linux command to do the above. Thanks.
Sounds like a task for sed:
sed -e 's/    /\t/g' test.py > test.new
[Put a real tab instead of \t]
However...
Use 4 spaces per indentation level.
--PEP 8 -- Style Guide for Python Code
You can try iterating over the file and doing the replacement, e.g.:
import fileinput

for line in fileinput.FileInput("file", inplace=1):
    print line.replace("    ", "\t"),
Or you can try a *nix tool like sed/awk/ruby:
$ awk '{gsub(/    /,"\t")}1' file > temp && mv temp file
$ ruby -i.bak -pe '$_.gsub!(/    /,"\t")' file
