Joining two different datasets using multiple key values - python
I have two sets of data.
The first dataset looks like:
Storm_ID,Cell_ID,Wind_speed
2,10236258,27
2,10236300,58
2,10236301,25
3,10240400,51
The second dataset looks like:
Storm_ID,Cell_ID,Storm_surge
2,10236299,0.27
2,10236300,0.27
2,10236301,0.35
2,10240400,0.35
2,10240401,0.81
4,10240402,0.11
Now I want an output which looks something like this:
Storm_ID,Cell_ID,Wind_speed,Storm_surge
2,10236258,27,0
2,10236299,0,0.27
2,10236300,58,0.27
2,10236301,25,0.35
2,10240400,0,0.35
2,10240401,0,0.81
3,10240400,51,0
4,10240402,0,0.11
I tried the join command in Linux to perform this task and failed badly: join skipped the rows that didn't have a match in the other dataset. I could use Matlab, but the data is more than 100 GB, which makes that very difficult.
Can someone please guide me on this one? Can I use SQL or Python to complete this task? I appreciate your help. Thanks.
I think you want a full outer join:
select storm_id, cell_id,
       coalesce(d1.wind_speed, 0) as wind_speed,
       coalesce(d2.storm_surge, 0) as storm_surge
from dataset1 d1 full join
     dataset2 d2
     using (storm_id, cell_id);
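Since you also ask about Python: one way to drive the same full outer join from Python is to load both files into an on-disk SQLite database, so nothing has to fit in memory. This is only a minimal sketch, not a tuned solution; the file names dataset1.csv/dataset2.csv, the table names d1/d2 and the output path joined.csv are placeholders, it assumes both CSVs carry the headers shown above, and it emulates the full join with two LEFT JOINs because older SQLite versions lack FULL JOIN.
import csv
import sqlite3

con = sqlite3.connect("storms.db")   # on-disk database, placeholder name
# keep the values as TEXT so the original number formatting is preserved
con.execute("CREATE TABLE d1 (storm_id TEXT, cell_id TEXT, wind_speed TEXT)")
con.execute("CREATE TABLE d2 (storm_id TEXT, cell_id TEXT, storm_surge TEXT)")

def load(path, table):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                  # skip the header row
        con.executemany("INSERT INTO %s VALUES (?, ?, ?)" % table, reader)

load("dataset1.csv", "d1")            # placeholder file names
load("dataset2.csv", "d2")
con.execute("CREATE INDEX idx1 ON d1 (storm_id, cell_id)")
con.execute("CREATE INDEX idx2 ON d2 (storm_id, cell_id)")
con.commit()

# Full outer join emulated as: all d1 rows left-joined to d2,
# plus the d2 rows that have no match in d1.
query = """
SELECT d1.storm_id, d1.cell_id, d1.wind_speed, COALESCE(d2.storm_surge, 0)
FROM d1 LEFT JOIN d2
     ON d1.storm_id = d2.storm_id AND d1.cell_id = d2.cell_id
UNION ALL
SELECT d2.storm_id, d2.cell_id, 0, d2.storm_surge
FROM d2 LEFT JOIN d1
     ON d1.storm_id = d2.storm_id AND d1.cell_id = d2.cell_id
WHERE d1.cell_id IS NULL
ORDER BY 1, 2
"""

with open("joined.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["Storm_ID", "Cell_ID", "Wind_speed", "Storm_surge"])
    for row in con.execute(query):
        writer.writerow(row)
Loading 100 GB this way takes a while, but the join itself is done by SQLite on disk, which is the main point of the sketch.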
Shell-Only Solution
Make a backup of your files first: the perl -pi commands below edit them in place.
Assuming your files are called wind1.txt and wind2.txt, you could apply this set of shell commands:
perl -pi -E "s/,/_/" wind*              # merge Storm_ID and Cell_ID into a single join key
perl -pi -E 's/(.$)/$1,0/' wind1.txt    # append a ,0 placeholder column for Storm_surge
perl -pi -E "s/,/,0,/" wind2.txt        # insert a 0 placeholder column for Wind_speed
join --header -a 1 -a 2 wind1.txt wind2.txt > outfile.txt
Intermediate Result
Storm_ID_Cell_ID,Wind_speed,0
2_10236258,27,0
2_10236299,0,0.27
2_10236300,0,0.27
2_10236300,58,0
2_10236301,0,0.35
2_10236301,25,0
2_10240400,0,0.35
2_10240401,0,0.81
3_10240400,51,0
4_10240402,0,0.11
Now rename the ,0 in the header line to Storm_surge, and replace the first _ with , again (in the data rows and in the header):
perl -pi -E "s/Wind_speed,0/Wind_speed,Storm_surge/" outfile.txt
perl -pi -E 's/^(\d+)_/$1,/' outfile.txt
perl -pi -E "s/Storm_ID_Cell_ID/Storm_ID,Cell_ID/" outfile.txt
Intermediate result:
Storm_ID,Cell_ID,Wind_speed,Storm_surge
2,10236258,27,0
2,10236299,0,0.27
2,10236300,0,0.27
2,10236300,58,0
2,10236301,0,0.35
2,10236301,25,0
2,10240400,0,0.35
2,10240401,0,0.81
3,10240400,51,0
4,10240402,0,0.11
Finally, run this to collapse the duplicate keys into a single row per key, summing the Wind_speed and Storm_surge columns separately (the header line is skipped here, so prepend it again if you need it):
awk 'BEGIN { FS=OFS=SUBSEP="," } NR>1 { wind[$1,$2]+=$3; surge[$1,$2]+=$4 } END { for (i in wind) print i, wind[i], surge[i] }' outfile.txt | sort
(Sorry - Q was closed while answering)
awk -F, -v OFS=, '{x = $1 "," $2} FNR == NR {a[x] = $3; b[x] = 0; next} {b[x] = $3} !a[x] {a[x] = 0} END {for (i in a) print i, a[i], b[i]}' f1 f2 | sort -n
Since awk's for (i in a) loop visits keys in arbitrary order, the output is sorted at the end.
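The same two-pass idea translates almost line for line into Python. Here is a minimal sketch under the same assumptions (input files named f1 and f2, each with the header shown in the question); like the awk version, it keeps one entry per key in memory, so it only suits data whose key set fits in RAM:
import csv

wind, surge = {}, {}

with open("f1", newline="") as f:            # Storm_ID,Cell_ID,Wind_speed
    reader = csv.reader(f)
    next(reader)                             # skip header
    for storm, cell, speed in reader:
        wind[(storm, cell)] = speed
        surge.setdefault((storm, cell), "0")

with open("f2", newline="") as f:            # Storm_ID,Cell_ID,Storm_surge
    reader = csv.reader(f)
    next(reader)
    for storm, cell, s in reader:
        surge[(storm, cell)] = s
        wind.setdefault((storm, cell), "0")

print("Storm_ID,Cell_ID,Wind_speed,Storm_surge")
# assumes both ID columns are numeric, as in the sample data
for storm, cell in sorted(wind, key=lambda k: (int(k[0]), int(k[1]))):
    print(",".join([storm, cell, wind[(storm, cell)], surge[(storm, cell)]]))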
Related
access multiple output array of python in bash
I have a Python script that prints out 3 different lists. How can I access them in bash? For example, the Python output is:
[1,2,3,4][a,b,c,d][p,q,r,s]
Now in bash I want to access them as:
list1=[1,2,3,4]
list2=[a,b,c,d]
list3=[p,q,r,s]
So far, I tried something like:
x=$(python myscript.py input.csv)
Now, if I use echo $x I can see the above mentioned lists:
[1,2,3,4][a,b,c,d][p,q,r,s]
How could I get 3 different lists? Thanks for the help.
The Python output does not match the bash syntax. If you cannot print the bash syntax directly from the Python script, you will need to parse the output first. I suggest using the sed command for parsing the output into bash arrays:
echo $x | sed 's|,| |g; s|\[|list1=(|; s|\[|list2=(|; s|\[|list3=(|;s|\]|)\n|g;'
Command explanation:
sed 's|,| |g;       # replaces `,` by blank space
     s|\[|list1=(|; # replaces the 1st `[` by `list1=(`
     s|\[|list2=(|; # replaces the 2nd `[` by `list2=(`
     s|\[|list3=(|; # replaces the 3rd `[` by `list3=(`
     s|\]|)\n|g;'   # replaces all `]` by `)`
The output would be something like:
list1=(1 2 3 4)
list2=(a b c d)
list3=(p q r s)
At this point, these lines are not actual bash arrays yet. To turn the output into bash commands, you can surround the whole command with eval $(...); then the output will be evaluated as bash commands. Putting it all together:
$ eval $(echo $x | sed 's|,| |g; s|\[|list1=(|; s|\[|list2=(|; s|\[|list3=(|;s|\]|)\n|g;')
$ echo ${list1[@]}
1 2 3 4
$ echo ${list2[@]}
a b c d
$ echo ${list3[@]}
p q r s
Here is one approach using bash.
#!/usr/bin/env bash

##: This line is a simple test that it works.
##: IFS='][' read -ra main_list <<< [1,2,3,4][a,b,c,d][p,q,r,s]

IFS='][' read -ra main_list < <(python myscript.py input.csv)

n=1
while read -r list; do
  [[ $list ]] || continue
  read -ra list$((n++)) <<< "${list//,/ }"
done < <(printf '%s\n' "${main_list[@]}")

declare -p list1 list2 list3
Output
declare -a list1=([0]="1" [1]="2" [2]="3" [3]="4")
declare -a list2=([0]="a" [1]="b" [2]="c" [3]="d")
declare -a list3=([0]="p" [1]="q" [2]="r" [3]="s")
As per Philippe's comment, a for loop is also an option.
IFS='][' read -ra main_list < <(python myscript.py input.csv)

n=1
for list in "${main_list[@]}"; do
  [[ $list ]] || continue
  read -ra list$((n++)) <<< "${list//,/ }"
done

declare -p list1 list2 list3
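If you can change the Python script itself, the parsing can be avoided entirely by printing bash array syntax directly, as the first answer hints. A minimal sketch (the list contents here are only illustrative; a real script would build them from input.csv):
# myscript.py -- hypothetical example that emits bash array assignments
lists = [[1, 2, 3, 4], ["a", "b", "c", "d"], ["p", "q", "r", "s"]]

for n, items in enumerate(lists, start=1):
    # bash arrays are space-separated values wrapped in parentheses
    print("list%d=(%s)" % (n, " ".join(str(x) for x in items)))
In bash you can then run eval "$(python myscript.py input.csv)" and use list1, list2 and list3 directly.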
How to find matching rows of the first column and add quantities of the second column? Bash
I have a csv file that looks like this:
SKU,QTY
KA006-001,2
KA006-001,33
KA006-001,46
KA009-001,22
KA009-001,7
KA010-001,18
KA014-001,3
KA014-001,42
KA015-001,1
KA015-001,16
KA020-001,6
KA022-001,56
The first column is SKU. The second column is the QTY number. Some lines (in the SKU column only) are identical. I need to achieve the following:
SKU,QTY
KA006-001,81 (2+33+46)
KA009-001,29 (22+7)
KA010-001,18
KA014-001,45 (3+42)
and so on...
I tried different things, loop statements and arrays. Got so lost, got a headache. My code:
#!/bin/bash

while IFS=, read sku qty
do
  echo "SKU='$sku' QTY='$qty'"
  if [ "$sku" = "$sku" ]
  then
    #x=("$sku" != "$sku")
    for i in {0..3}; do
      echo $sku[$i]=$qty;
    done
  fi
done < 2asg.csv
I'd use awk:
awk -F, 'NR==1{print} NR>1{a[$1] += $2} END{for (i in a) print i","a[i]}' file
If you want to ignore blank lines, you can either ignore lines with fewer than 2 columns:
awk -F, 'NR==1{print} NR>1 && NF>1{a[$1] += $2} END{for (i in a) print i","a[i]}' file
or ignore ones without exactly 2 columns:
awk -F, 'NR==1{print} NR>1 && NF==2{a[$1] += $2} END{for (i in a) print i","a[i]}' file
Alternatively, you can check that the second column begins with a digit:
awk -F, 'NR==1{print} NR>1 && $2~/^[0-9]/{a[$1] += $2} END{for (i in a) print i","a[i]}' file
For Bash 4:
#!/bin/bash

declare -A astr

while IFS=, read -r col1 col2
do
  if [ "$col1" != "SKU" ] && [ "$col1" != "" ]
  then
    (( astr[$col1] += col2 ))
  fi
done < 2asg.csv

echo "SKU,QTY"

for i in "${!astr[@]}"
do
  echo "$i,${astr[$i]}"
done | sort -t : -k 2n
https://github.com/tigertv/stackoverflow-answers
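For reference, the same aggregation in Python is only a few lines with the csv module. A minimal sketch, assuming the input file is named 2asg.csv as above and that QTY is always an integer:
import csv

totals = {}
order = []                                # remember SKUs in first-seen order

with open("2asg.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)                          # skip the SKU,QTY header
    for row in reader:
        if len(row) != 2:                 # ignore blank or malformed lines
            continue
        sku, qty = row
        if sku not in totals:
            order.append(sku)
            totals[sku] = 0
        totals[sku] += int(qty)

print("SKU,QTY")
for sku in order:
    print("{},{}".format(sku, totals[sku]))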
Split csv file vertically using command line
Is it possible to split a csv file vertically into multiple files? I know we can split a single large file into smaller files by number of rows using the command line. I have csv files in which the columns repeat after a certain column number, and I want to split those files column-wise. Is that possible with the command line? If not, how can we do it with Python?
For example, consider the sample above, in which site and address are present multiple times vertically. I want to create 3 different csv files, each containing a single site and a single address.
Any help would be highly appreciated. Thanks.
Assuming your input file is named ~/Downloads/sites.csv and looks like this:
Google,google.com,Google,google.com,Google,google.com
MS,microsoft.com,MS,microsoft.com,MS,microsoft.com
Apple,apple.com,Apple,apple.com,Apple,apple.com
You can use cut to create 3 files, each containing one pair of company/site:
cut -d "," -f 1-2 < ~/Downloads/sites.csv > file1.csv
cut -d "," -f 3-4 < ~/Downloads/sites.csv > file2.csv
cut -d "," -f 5-6 < ~/Downloads/sites.csv > file3.csv
Explanation: for the cut command, we declare the comma (,) as the separator, which splits every line into a set of fields. We then specify, for each output file, which fields we want included.
HTH!
If the site-address pairs are regularly repeated, how about:
awk '{
  n = split($0, ary, ",");
  for (i = 1; i <= n; i += 2) {
    j = (i + 1) / 2;
    print ary[i] "," ary[i+1] >> "file" j ".csv";
  }
}' input.csv
The following script produces what you want (based on the SO answer, adjusted for your needs: number of columns, field separator). It splits the original file vertically into 2-column chunks (note n=2) and creates 3 different files (tmp.examples.1, tmp.examples.2, tmp.examples.3, or whatever you specify for the f variable):
awk -F "," -v f="tmp.examples" '{for (i=1; i<=NF; i++) printf (i%n==0||i==NF)?$i RS:$i FS > f "." int((i-1)/n+1) }' n=2 example.txt
This assumes your example.txt file has the following data:
site,address,site,address,site,address
Google,google.com,MS,microsoft.com,Apple,apple.com
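Since the question also asks how to do this in Python, here is a minimal sketch of the same column-wise split. It assumes the input is example.txt, that the columns repeat in groups of two (change PAIR_WIDTH otherwise), and uses placeholder output names file1.csv, file2.csv, and so on:
import csv

PAIR_WIDTH = 2                      # number of columns that belong together
writers = {}                        # one csv writer per output chunk
handles = []

with open("example.txt", newline="") as f:
    for row in csv.reader(f):
        # walk the row in chunks of PAIR_WIDTH columns
        for chunk_no, start in enumerate(range(0, len(row), PAIR_WIDTH), start=1):
            if chunk_no not in writers:
                out = open("file{}.csv".format(chunk_no), "w", newline="")
                handles.append(out)
                writers[chunk_no] = csv.writer(out)
            writers[chunk_no].writerow(row[start:start + PAIR_WIDTH])

for out in handles:
    out.close()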
python: Remove trailing 0's and decimal point from awk command
I need to remove the trailing zeros from an export. The code reads the original tempFile; I need columns 2 and 6, which contain:
12|9781624311390|1|1|0|0.0000
13|9781406273687|1|1|0|99.0000
14|9781406273717|1|1|0|104.0000
15|9781406273700|1|1|0|63.0000
The awk command changes the format to comma separated and dumps columns 2 and 6 into tempFile2 - and I need to remove the trailing zeros from column 6 so the end result looks like this:
9781624311390,0
9781406273687,99
9781406273717,104
9781406273700,63
I believe this should do the trick but have had no luck implementing it:
awk '{sub("\\.*0+$",""); print}'
Below is the code I need to adjust; $6 is the column to remove zeros from:
if not isError:
    print "Translating SQL output to tab delimited format"
    awkRunSuccess = os.system(
        "awk -F\"|\" '{print $2 \"\\,\" $6}' %s > %s" % (tempFile, tempFile2)
    )
    if awkRunSuccess != 0:
        isError = True
You can use gsub("\\.*0+$","",$2) to do this, as per the following transcript:
pax> echo '9781624311390|0.0000
9781406273687|99.0000
9781406273717|104.0000
9781406273700|63.0000' | awk -F'|' '{gsub("\\.*0+$","",$2);print $1","$2}'
9781624311390,0
9781406273687,99
9781406273717,104
9781406273700,63
However, given you're already within Python (and it's no slouch when it comes to regexes), you'd probably want to use it natively rather than start up an awk process.
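A rough sketch of doing it natively in Python, under the assumptions of the question (pipe-separated input, columns 2 and 6 wanted); the tempFile paths are placeholders and the regex simply mirrors the awk pattern above:
import re

tempFile = "tempFile.txt"        # placeholder paths; the real script builds these elsewhere
tempFile2 = "tempFile2.txt"

with open(tempFile) as src, open(tempFile2, "w") as dst:
    for line in src:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 6:
            continue                              # skip malformed lines
        isbn, price = fields[1], fields[5]
        # like the awk regex: strip an optional dot plus trailing zeros,
        # e.g. 99.0000 -> 99 and 0.0000 -> 0
        price = re.sub(r"\.?0+$", "", price) or "0"
        dst.write("{},{}\n".format(isbn, price))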
Try this awk command:
awk -F '[|.]' '{print $2","$(NF-1)}' FileName
Output:
9781624311390,0
9781406273687,99
9781406273717,104
9781406273700,63
Combine lines with matching keys
I have a text file with the following structure:
ID,operator,a,b,c,d,true
WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
WCBP12236,S1,77.2,81.5,69.4,84.1,82.1
WCBP12236,S2,68.0,68.0,53.2,68.5,82.1
WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
WCBP12234,J2,68.6,68.4,41.4,68.9,75.3
WCBP12234,S1,81.8,82.7,67.0,87.5,75.3
WCBP12234,S2,66.6,67.9,53.0,70.7,75.3
WCBP12238,J1,78.6,79.0,56.2,82.1,84.1
WCBP12239,J2,66.6,72.9,79.5,76.6,82.1
WCBP12239,S1,86.6,87.8,23.0,23.0,82.1
WCBP12239,S2,86.0,86.9,62.3,89.7,82.1
WCBP12239,J1,70.9,71.3,66.0,73.7,82.1
WCBP12238,J2,75.1,75.2,54.3,76.4,84.1
WCBP12238,S1,65.9,66.0,40.2,66.5,84.1
WCBP12238,S2,72.7,73.2,52.6,73.9,84.1
Each ID corresponds to a dataset which is analysed by an operator several times, i.e. J1 and J2 are the first and second attempts by operator J. The measures a, b, c and d use 4 slightly different algorithms to measure a value whose true value lies in the column true.
What I would like to do is to create 3 new text files comparing the results for J1 vs J2, S1 vs S2 and J1 vs S1. Example output for J1 vs J2:
ID,operator,a1,a2,b1,b2,c1,c2,d1,d2,true
WCBP12236,75.7,76.3,80.6,79.6,65.9,61.7,83.2,81.9,82.1
WCBP12234,63.7,68.6,67.7,68.4,72.2,41.4,71.6,68.9,75.3
where a1 is measurement a for J1, etc. Another example is for S1 vs S2:
ID,operator,a1,a2,b1,b2,c1,c2,d1,d2,true
WCBP12236,77.2,68.0,81.5,68.0,69.4,53.2,84.1,68.5,82.1
WCBP12234,81.8,66.6,82.7,67.9,67.0,53,87.5,70.7,75.3
The IDs will not be in alphanumerical order, nor will the operators be clustered for the same ID. I'm not certain how best to approach this task - using Linux tools or a scripting language like perl/python. My initial attempt using Linux quickly hit a brick wall.
First find all unique IDs (sorted):
awk -F, '/^WCBP/ {print $1}' file | uniq | sort -k 1.5n > unique_ids
Loop through these IDs and sort J1, J2:
foreach i (`more unique_ids`)
    grep $i test.txt | egrep 'J[1-2]' | sort -t',' -k2
end
This gives me the data sorted:
WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
WCBP12234,J2,68.6,68.4,41.4,68.9,80.4
WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
WCBP12238,J1,78.6,79.0,56.2,82.1,82.1
WCBP12238,J2,75.1,75.2,54.3,76.4,82.1
WCBP12239,J1,70.9,71.3,66.0,73.7,75.3
WCBP12239,J2,66.6,72.9,79.5,76.6,75.3
I'm not sure how to rearrange this data to get the desired structure. I tried adding an additional pipe to awk in the foreach loop:
awk 'BEGIN {RS="\n\n"} {print $1, $3,$10,$4,$11,$5,$12,$6,$13,$7}'
Any ideas? I'm sure this can be done in a less cumbersome manner using awk, although it may be better using a proper scripting language.
You can use the Perl CSV module Text::CSV to extract the fields and then store them in a hash, where ID is the main key, the second field is the secondary key and all the fields are stored as the value. It should then be trivial to do whatever comparisons you want. If you want to retain the original order of your lines, you can use an array inside the first loop.
use strict;
use warnings;
use Text::CSV;

my %data;
my $csv = Text::CSV->new({
    binary => 1,   # safety precaution
    eol    => $/,  # important when using $csv->print()
});

while ( my $row = $csv->getline(*ARGV) ) {
    my ($id, $J) = @$row;    # first two fields
    $data{$id}{$J} = $row;   # store line
}
Python Way:
import os, sys, re, itertools

info = ["WCBP12236,J1,75.7,80.6,65.9,83.2,82.1",
        "WCBP12236,J2,76.3,79.6,61.7,81.9,82.1",
        "WCBP12236,S1,77.2,81.5,69.4,84.1,82.1",
        "WCBP12236,S2,68.0,68.0,53.2,68.5,82.1",
        "WCBP12234,J1,63.7,67.7,72.2,71.6,75.3",
        "WCBP12234,J2,68.6,68.4,41.4,68.9,80.4",
        "WCBP12234,S1,81.8,82.7,67.0,87.5,75.3",
        "WCBP12234,S2,66.6,67.9,53.0,70.7,72.7",
        "WCBP12238,J1,78.6,79.0,56.2,82.1,82.1",
        "WCBP12239,J2,66.6,72.9,79.5,76.6,75.3",
        "WCBP12239,S1,86.6,87.8,23.0,23.0,82.1",
        "WCBP12239,S2,86.0,86.9,62.3,89.7,82.1",
        "WCBP12239,J1,70.9,71.3,66.0,73.7,75.3",
        "WCBP12238,J2,75.1,75.2,54.3,76.4,82.1",
        "WCBP12238,S1,65.9,66.0,40.2,66.5,80.4",
        "WCBP12238,S2,72.7,73.2,52.6,73.9,72.7"]

def extract_data(operator_1, operator_2):
    operator_index = 1
    id_index = 0
    data = {}
    result = []
    ret = []
    for line in info:
        conv_list = line.split(",")
        if len(conv_list) > operator_index and (
                (operator_1.strip().upper() == conv_list[operator_index].strip().upper()) or
                (operator_2.strip().upper() == conv_list[operator_index].strip().upper())):
            if data.has_key(conv_list[id_index]):
                iters = [iter(conv_list[int(operator_index)+1:]),
                         iter(data[conv_list[id_index]])]
                data[conv_list[id_index]] = list(it.next() for it in itertools.cycle(iters))
                continue
            data[conv_list[id_index]] = conv_list[int(operator_index)+1:]
    return data

ret = extract_data("j1", "s2")
print ret
O/P:
{'WCBP12239': ['70.9', '86.0', '71.3', '86.9', '66.0', '62.3', '73.7', '89.7', '75.3', '82.1'],
 'WCBP12238': ['72.7', '78.6', '73.2', '79.0', '52.6', '56.2', '73.9', '82.1', '72.7', '82.1'],
 'WCBP12234': ['66.6', '63.7', '67.9', '67.7', '53.0', '72.2', '70.7', '71.6', '72.7', '75.3'],
 'WCBP12236': ['68.0', '75.7', '68.0', '80.6', '53.2', '65.9', '68.5', '83.2', '82.1', '82.1']}
I didn't use Text::CSV like TLP did. If you needed to you could, but for this example, since there were no embedded commas in the fields, I did a simple split on ','. Also, the true fields from both operators are listed (instead of just 1) as I thought the special case of the last value complicates the solution.
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw/ mesh /;

my %data;

while (<DATA>) {
    chomp;
    my ($id, $op, @vals) = split /,/;
    $data{$id}{$op} = \@vals;
}

my @ops = ([qw/J1 J2/], [qw/S1 S2/], [qw/J1 S1/]);

for my $id (sort keys %data) {
    for my $comb (@ops) {
        open my $fh, ">>", "@$comb.txt" or die $!;
        my $a1 = $data{$id}{ $comb->[0] };
        my $a2 = $data{$id}{ $comb->[1] };
        print $fh join(",", $id, mesh(@$a1, @$a2)), "\n";
        close $fh or die $!;
    }
}

__DATA__
WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
WCBP12236,S1,77.2,81.5,69.4,84.1,82.1
WCBP12236,S2,68.0,68.0,53.2,68.5,82.1
WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
WCBP12234,J2,68.6,68.4,41.4,68.9,75.3
WCBP12234,S1,81.8,82.7,67.0,87.5,75.3
WCBP12234,S2,66.6,67.9,53.0,70.7,75.3
WCBP12239,J1,78.6,79.0,56.2,82.1,82.1
WCBP12239,J2,66.6,72.9,79.5,76.6,82.1
WCBP12239,S1,86.6,87.8,23.0,23.0,82.1
WCBP12239,S2,86.0,86.9,62.3,89.7,82.1
WCBP12238,J1,70.9,71.3,66.0,73.7,84.1
WCBP12238,J2,75.1,75.2,54.3,76.4,84.1
WCBP12238,S1,65.9,66.0,40.2,66.5,84.1
WCBP12238,S2,72.7,73.2,52.6,73.9,84.1
The output files produced are below.
J1 J2.txt
WCBP12234,63.7,68.6,67.7,68.4,72.2,41.4,71.6,68.9,75.3,75.3
WCBP12236,75.7,76.3,80.6,79.6,65.9,61.7,83.2,81.9,82.1,82.1
WCBP12238,70.9,75.1,71.3,75.2,66.0,54.3,73.7,76.4,84.1,84.1
WCBP12239,78.6,66.6,79.0,72.9,56.2,79.5,82.1,76.6,82.1,82.1
S1 S2.txt
WCBP12234,81.8,66.6,82.7,67.9,67.0,53.0,87.5,70.7,75.3,75.3
WCBP12236,77.2,68.0,81.5,68.0,69.4,53.2,84.1,68.5,82.1,82.1
WCBP12238,65.9,72.7,66.0,73.2,40.2,52.6,66.5,73.9,84.1,84.1
WCBP12239,86.6,86.0,87.8,86.9,23.0,62.3,23.0,89.7,82.1,82.1
J1 S1.txt
WCBP12234,63.7,81.8,67.7,82.7,72.2,67.0,71.6,87.5,75.3,75.3
WCBP12236,75.7,77.2,80.6,81.5,65.9,69.4,83.2,84.1,82.1,82.1
WCBP12238,70.9,65.9,71.3,66.0,66.0,40.2,73.7,66.5,84.1,84.1
WCBP12239,78.6,86.6,79.0,87.8,56.2,23.0,82.1,23.0,82.1,82.1
Update: To get only 1 true value, the for loop could be written like this:
for my $id (sort keys %data) {
    for my $comb (@ops) {
        local $" = '';
        open my $fh, ">>", "@$comb.txt" or die $!;
        my $a1 = $data{$id}{ $comb->[0] };
        my $a2 = $data{$id}{ $comb->[1] };
        pop @$a2;
        my @mesh = grep defined, mesh(@$a1, @$a2);
        print $fh join(",", $id, @mesh), "\n";
        close $fh or die $!;
    }
}
Update: Added 'defined' for the test in the grep expr. as it is the proper way (instead of just testing '$_', which could possibly be 0 and wrongly excluded from the list by grep).
Any problem that awk or sed can solve, there is no doubt that python, perl, java, go, c++ or c can solve too. However, it is not necessary to write a complete program in any of them; use awk in a one-liner.
VERSION 1
For most use cases, VERSION 1 is good enough.
tail -n +2 file |  # the call to `tail` to remove the 1st line is not necessary
sort -t, -k 1,1 |
awk -F ',+' -v OFS=, '$2==x{id=$1;a=$3;b=$4;c=$5;d=$6}
                      id==$1 && $2==y{$3=a","$3; $4=b","$4; $5=c","$5; $6=d","$6; $2=""; $0=$0; $1=$1; print}' \
    x=J1 y=S1
Just replace the values of the arguments x and y with what you like. Please note the values of x and y must follow alphabetical order, e.g., x=J1 y=S1 is OK, but x=S1 y=J1 doesn't work.
VERSION 2
The limitation mentioned in VERSION 1, that you have to specify x and y in alphabetical order, is removed. So x=S1 y=J1 is OK now.
tail -n +2 file |  # the call to `tail` to remove the 1st line is not necessary
sort -t, -k 1,1 |
awk -F ',+' -v OFS=, 'id!=$1 && ($2==x||$2==y){z=$2==x?y:x; id=$1; a=$3;b=$4;c=$5;d=$6}
                      id==$1 && $2==z{$3=a","$3;$4=b","$4;$5=c","$5;$6=d","$6; $2=""; $0=$0; $1=$1; print}' \
    x=S1 y=J1
However, the data of J1 is still put before the data of S1, which means the column a1 in the resulting output is always the column a of J1 in the input file, and a2 in the resulting output is always the column a of S1 in the input file.
VERSION 3
The limitation mentioned in VERSION 2 is removed. Now with x=S1 y=J1, the output column a1 would be the input column a of S1, and a2 would be the a of J1.
tail -n +2 file |  # the call to `tail` to remove the 1st line is not necessary
sort -t, -k 1,1 |
awk -F ',+' -v OFS=, 'id!=$1 && ($2==x||$2==y){z=$2==x?y:x; id=$1; a=$3;b=$4;c=$5;d=$6}
                      id==$1 && $2==z{if (z==y) {$3=a","$3;$4=b","$4;$5=c","$5;$6=d","$6}
                                      else {$3=$3","a;$4=$4","b;$5=$5","c;$6=$6","d}
                                      $2=""; $0=$0; $1=$1; print}' \
    x=S1 y=J1
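Reading the file directly in Python is also an option. Below is a minimal sketch that writes one comparison file per operator pair; it assumes the input is named data.txt with the header shown in the question, skips any ID that lacks one of the two operators, and uses placeholder output names such as J1_vs_J2.csv:
import csv

pairs = [("J1", "J2"), ("S1", "S2"), ("J1", "S1")]
rows = {}                                  # rows[id][operator] = [a, b, c, d, true]
order = []                                 # keep IDs in first-seen order

with open("data.txt", newline="") as f:
    reader = csv.reader(f)
    next(reader)                           # skip the header line
    for rec in reader:
        id_, op, vals = rec[0], rec[1], rec[2:]
        if id_ not in rows:
            order.append(id_)
            rows[id_] = {}
        rows[id_][op] = vals

for op1, op2 in pairs:
    with open("{}_vs_{}.csv".format(op1, op2), "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ID", "a1", "a2", "b1", "b2", "c1", "c2", "d1", "d2", "true"])
        for id_ in order:
            v1, v2 = rows[id_].get(op1), rows[id_].get(op2)
            if v1 is None or v2 is None:
                continue                   # skip IDs missing one of the operators
            interleaved = [x for pair in zip(v1[:4], v2[:4]) for x in pair]
            writer.writerow([id_] + interleaved + [v1[4]])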