How to turn a text file into Python list form - python

I'm trying to turn a list of numbers in a text file into Python list form. For example, I want to make
1
2
3
4
5
into
[1,2,3,4,5]
I found something that almost worked in another post using sed.
sed '1s/^/[/;$!s/$/,/;$s/$/]/' file
but this didn't remove the newline after every number. How can I modify this sed command to get it to do what I want? An explanation of the components of the sed command would also be appreciated. Thanks

With GNU sed for -z to read the whole file at once:
sed -z 's/\n/,/g; s/^/[/; s/,$/]\n/' file
[1,2,3,4,5]
Here -z makes sed split input on NUL bytes instead of newlines, so the whole file lands in the pattern space at once; the script then turns every newline into a comma, prepends [, and turns the trailing comma into ]\n.
With any awk in any shell on any UNIX box:
$ awk '{printf "%s%s", (NR>1 ? "," : "["), $0} END{print "]"}' file
[1,2,3,4,5]

You can append all the lines into the pattern space first before performing substitutions:
sed ':a;N;$!ba;s/\n/,/g;s/^/\[/;s/$/\]/' file
Here :a defines a label, N appends the next input line to the pattern space, and $!ba branches back to a on every line but the last, so the whole file is collected before the substitutions run. This outputs:
[1,2,3,4,5]

This might work for you (GNU sed):
sed '1h;1!H;$!d;x;s/\n/,/g;s/.*/[&]/' file
Copy the first line to the hold space, append copies of subsequent lines and delete the originals. At the end of the file, swap to the hold space, replace newlines by commas, and surround the remaining string by square brackets.

If you want the list using Python, a simple implementation is
with open('./num.txt') as f:
    num = [int(line) for line in f]
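If the file might contain blank lines or stray whitespace, a slightly more defensive version of the same idea (still plain Python, no extra libraries) is:
with open('./num.txt') as f:
    # skip blank lines; int() already tolerates surrounding whitespace
    num = [int(line) for line in f if line.strip()]
print(num)  # [1, 2, 3, 4, 5]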

Related

How to add header text with adjacent content in an unformatted data set, side by side as delimiter-separated values, using sed/awk/python

I have a long list of unformatted data, say data.txt, where each set starts with a header and ends with a blank line, like:
TypeA/Price:20$
alexmob
moblexto
unkntom
TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret
TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox
Now, I want to add the header of each set next to its content, side by side, comma-separated, like:
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
so that whenever I grep for a keyword, the relevant content comes together with its header, like:
$ grep mob data.txt
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
moblexto2,TypeB/Price:25$
alexmob3,TypeB/Price:25$
mobryhox,TypeC/Price:30$
I am a newbie at bash scripting as well as Python and only recently started learning both, so I would really appreciate any simple bash (sed/awk) or Python scripting.
Using sed
$ sed '/Type/{h;d;};/[a-z]/{G;s/\n/,/}' input_file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
Match lines containing Type, copy them into the hold space (h) and delete them (d).
Match lines containing lowercase letters and append (G) the contents of the hold space. Finally, substitute the newline with a comma.
I would use GNU AWK for this task in the following way. Let file.txt content be
TypeA/Price:20$
alexmob
moblexto
unkntom
TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret
TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox
then
awk '/^Type/{header=$0;next}{print /./?$0 ";" header:$0}' file.txt
output
alexmob;TypeA/Price:20$
moblexto;TypeA/Price:20$
unkntom;TypeA/Price:20$
moblexto2;TypeB/Price:25$
unkntom0;TypeB/Price:25$
alexmob3;TypeB/Price:25$
poptop9;TypeB/Price:25$
tyloret;TypeB/Price:25$
rtyuoper0;TypeC/Price:30$
kunlohpe6;TypeC/Price:30$
mobryhox;TypeC/Price:30$
Explanation: if a line starts with (^) Type, set header to that line ($0) and go to the next line. For every other line, if it contains at least one character (/./), print the line ($0) concatenated with ; and the header; otherwise print the line ($0) as is.
(tested in GNU Awk 5.0.1)
Using any awk in any shell on every Unix box regardless of which characters are in your data:
$ awk -v RS= -F'\n' -v OFS=',' '{for (i=2;i<=NF;i++) print $i, $1; print ""}' file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
An empty RS puts awk in paragraph mode, so each blank-line-separated block is one record, and -F'\n' makes each line of the block a field: $1 is the header and $2 through $NF are its members.
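Since the question also invites Python, here is a minimal sketch of the same idea (it assumes, as in the sample data, that header lines start with Type and that a header precedes its members):
with open('data.txt') as f:
    header = None
    for line in f:
        line = line.rstrip('\n')
        if line.startswith('Type'):   # a header line starts a new set
            header = line
        elif line:                    # a non-blank member line: print it with its header
            print(line + ',' + header)
Redirect the output to a new file and grep that, as in the question.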

How to add new columns of zeroes to a file?

I have a file of 10000 rows, e.g.,
1.2341105289455E+03 1.1348135000000E+00
I would like to have
1.2341105289455E+03 0.0 1.1348135000000E+00 0.0
and insert columns of '0.0' in it.
I tried replacing the space with ' 0.0 '; it works, but I don't think it is the best solution. I tried with awk, but I was only able to add '0.0' at the end of the file.
I bet there is a better solution to it. Do you know how to do it? awk? python? emacs?
Use this Perl one-liner:
perl -lane 'print join "\t", $F[0], "0.0", $F[1], "0.0"; ' in_file > out_file
The Perl one-liner uses these command-line flags:
-e : tells Perl to look for code in-line, instead of in a file.
-n : loop over the input one line at a time, assigning it to $_ by default.
-l : strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : split $_ into array @F on whitespace or on the regex specified in the -F option.
SEE ALSO:
perlrun: command line switches
with awk
awk '{print $1,"0.0",$2,"0.0"}' file
If you want to modify the file in place, you can do it either with GNU awk by adding the -i inplace option, or by appending > tmp && mv tmp file to the existing command. But always run it first without replacing, to test and confirm the output.
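Since the question also asks about Python, a minimal sketch of the same reformatting (it writes to stdout, so redirect to a file to keep it; assumes exactly two whitespace-separated columns per line):
with open('file') as f:
    for line in f:
        a, b = line.split()        # the two existing columns
        print(a, '0.0', b, '0.0')  # print() separates the fields with single spaces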

How can I concatenate multiple text or xml files but omit specific lines from each file?

I have a number of xml files (which can be considered as text files in this situation) that I wish to concatenate. Normally I think I could do something like this from a Linux command prompt or bash script:
cat somefile.xml someotherfile.xml adifferentfile.xml > out.txt
Except that in this case, I need to copy the first file in its entirety EXCEPT for the very last line, but in all subsequent files omit exactly the first four lines and the very last line (technically, I do need the last line from the last file but it is always the same, so I can easily add it with a separate statement).
In all these files the first four lines and the last line are always the same, but the contents in between varies. The names of the xml files can be hardcoded into the script or read from a separate data file, and the number of them may vary from time to time but always will number somewhere around 10-12.
I'm wondering what would be the easiest and most understandable way to do this. I think I would prefer either a bash script or maybe a python script, though I generally understand bash scripts a little better. What I can't get my head around is how to trim off just those first four lines (on all but the first file) and the last line of every file. My suspicion is there's some Linux command that can do this, but I have no idea what it would be. Any suggestions?
sed '$d' firstfile > out.txt
sed --separate '1,4d; $d' file2 file3 file4 >> out.txt
sed '1,4d' lastfile >> out.txt
It's important to use the --separate (or shorter -s) option so that the range statements 1,4 and $ apply to each file individually.
From GNU sed manual:
-s, --separate
By default, sed will consider the files specified on the command line as a single continuous long stream. This GNU sed extension allows the user to consider them as separate files.
Do it in two steps:
use head and tail to get the lines you want (e.g. GNU head -n -1 drops the last line, tail -n +5 skips the first four)
use cat to combine the results
You could use temp files or bash trickery such as process substitution.
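For comparison, a Python sketch of the same slicing (the question allows hardcoding the names; this assumes each file fits comfortably in memory):
files = ['somefile.xml', 'someotherfile.xml', 'adifferentfile.xml']

with open('out.txt', 'w') as out:
    for i, name in enumerate(files):
        with open(name) as f:
            lines = f.readlines()
        start = 0 if i == 0 else 4       # keep the four header lines only from the first file
        out.writelines(lines[start:-1])  # always drop the last line
    out.write(lines[-1])                 # re-add the shared closing line, taken from the last file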

Add filename before first occurrence of a character in all lines for all files in a given folder

I have a folder full of files with lines which look like this:
S149.sh
sox preaching.wav _001 trim 889.11 891.23
sox preaching.wav _002 trim 891.45 893.92
sox preaching.wav _003 trim 1599.95 1606.78
And I want to add the filename without its extension (which is S149) right before the first occurrence of the _ character in every line, so that it ends up looking like this:
sox preaching.wav S149_001 trim 889.11 891.23
sox preaching.wav S149_002 trim 891.45 893.92
sox preaching.wav S149_003 trim 1599.95 1606.78
And I want to automatically do this for every *.sh file in a given folder.
How do I achieve that with either bash (this includes awk, grep, sed, etc.) or python? Any help will be greatly appreciated.
One possibility, using ed, the standard editor, and a loop:
for i in *.sh; do
printf '%s\n' ",g/_/ s/_/${i%.sh}&/" w q | ed -s -- "$i"
done
The parameter expansion ${i%.sh} expands to $i with the suffix .sh removed.
The ed commands are, in the case i=S149.sh:
,g/_/ s/_/S149&/
w
,g/_/ marks all lines containing an underscore and s/_/S149&/ replaces the first underscore on each such line with S149_ (the & stands for the matched text). Then w writes the file.
A sed version:
for i in *.sh; do
sed -i "s/_/${i%.*}_/" "$i"
done
${i%.*} expands to the filename minus the extension, which the in-place substitution inserts before the first underscore on each line (no g flag, so only the first occurrence is replaced).
With GNU awk for inplace editing:
awk -i inplace 'FNR==1{f=gensub(/\.[^.]+$/,"",1,FILENAME)} {$3=f$3} 1' *.sh
If you're considering using a shell loop instead, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
@Ruran: in case you do not have an awk that can edit the Input_file while reading it, the following may help.
awk '(FILENAME != P && P && Q){close(P);system("mv " Q OFS P)} {Q=P=FILENAME;sub(/\..*/,X,Q);sub(/_/,Q"&");print > Q;} END{system("mv " Q OFS P)}' *.sh
The logic behind it is simple: it changes the first occurrence of the _ character and writes the reformatted lines to a temp file; when it moves on to the next Input_file, it renames that temp file over the previous Input_file.
One more point not covered in the posts above: since we are using *.sh, if you have thousands of Input_files the code may fail because too many files end up open at once if we do NOT close them, so I am closing them too; let me know if this helps you.
A non-one-liner form of the solution follows:
awk '(FILENAME != P && P && Q){
close(P);
system("mv " Q OFS P)
}
{
Q=P=FILENAME;
sub(/\..*/,X,Q);
sub(/_/,Q"&");
print > Q;
}
END {
system("mv " Q OFS P)
}
' *.sh
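A Python version of the same task, since the question allows it (a sketch: it rewrites each file in place and, like the ed answer, touches only the first underscore on each line):
import glob
import os

for path in glob.glob('*.sh'):
    stem = os.path.splitext(path)[0]   # 'S149.sh' -> 'S149'
    with open(path) as f:
        lines = f.readlines()
    with open(path, 'w') as f:
        # replace only the first '_' on each line with, e.g., 'S149_'
        f.writelines(line.replace('_', stem + '_', 1) for line in lines)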

fasta file header lines into column

I have a fasta file that contains sequence headers and their corresponding sequences as so:
>ID101_hg19
ATGGGTGTATCGTACCC
>ID102_hg19
AGCTTTAGCGGGGTACA
I want to change the header line to be another tab separated column next to the sequence. Here's the desired output:
>ID101_hg19 ATGGGTGTATCGTACCC
>ID102_hg19 AGCTTTAGCGGGGTACA
Any ideas on how to do this task?
Using sed, you could do it like this:
sed 'N;s/\n/\t/' file.txt
Using awk, you could do the following:
awk '{getline a; printf("%s\t%s", $0, a);}' file.txt
A slight correction to SMA's answer...
awk '{getline a; printf("%s\t%s\n", $0, a);}' file.txt
This adds the missing newline after each record.
In general, each header line in a FASTA file can be followed by more than one line of data, so one might want to handle such cases. If the goal is to string together all the contiguous data lines, then the following would do the job:
awk '/^>/ {if (prev) {print prev;}; prev=$0 "\t"; next}
{prev=prev $0;}
END {print prev}'
If, on the other hand, the header is to be attached to just one line of data, then assuming the $'...' syntax is available, the sed command to use would be:
sed $'/^>/ {N;s/\\n/\t/;}'
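And for anyone who prefers Python, a plain-Python sketch of the general multi-line case described above (reads from stdin, so run it as python script.py < file.txt; no libraries assumed):
import sys

header, chunks = None, []
for line in sys.stdin:
    line = line.rstrip('\n')
    if line.startswith('>'):           # a new header: flush the previous record
        if header is not None:
            print(header + '\t' + ''.join(chunks))
        header, chunks = line, []
    else:                              # data line: accumulate it
        chunks.append(line)
if header is not None:                 # flush the final record
    print(header + '\t' + ''.join(chunks))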
