I want to write a script to:
fetch information and return a JSON file,
filter the JSON file,
then convert that JSON to CSV.
I have figured out steps 1 and 2, but am stuck on step 3. Currently I have to use an online JSON-to-CSV converter to get the desired output.
The online JSON-to-CSV tool has users connect to its API with Python to use the conversion tool, which possibly means the tool itself is a Python module.
JSON file to convert
[{
  "matchId": "2068447050405",
  "timestamp": 1658361314,
  "clubs": {
    "39335": {
      "toa": "486",
      "details": {
        "name": "Team one",
        "clubId": 39335
      }
    },
    "111655": {
      "toa": "229",
      "details": {
        "name": "Team two",
        "clubId": 111655
      }
    }
  },
  "players": {
    "39335": {
      "189908959": {
        "position": "defenseMen",
        "toiseconds": "3600",
        "playername": "player one"
      },
      "828715674": {
        "position": "rightWing",
        "toiseconds": "3600",
        "playername": "player two"
      }
    },
    "111655": {
      "515447555": {
        "position": "defenseMen",
        "toiseconds": "3600",
        "playername": "player three"
      },
      "806370074": {
        "position": "center",
        "toiseconds": "3600",
        "playername": "player four"
      }
    }
  }
}]
Desired CSV output
"matchId","timestamp","clubs__|","clubs__|__toa","clubs__|__details__name","clubs__|__details__clubId","players__|","players__||","players__||__position","players__||__toiseconds","players__||__playername"
"2068447050405","1658361314","39335","486","Team one","39335","39335","189908959","defenseMen","3600","player one"
"2068447050405","1658361314","111655","229","Team two","111655","39335","828715674","rightWing","3600","player two"
"2068447050405","1658361314","","","","","111655","515447555","defenseMen","3600","player three"
"2068447050405","1658361314","","","","","111655","806370074","center","3600","player four"
How it looks in a spreadsheet
Sheet example
Some believe the filter is affecting how the CSV output is formatted, so here are links to the full JSON file and the CSV output of that file; the code is too long to post on this page.
Original JSON before filter
Original JSON
CSV output of original JSON file
CSV output
Edit
I should have mentioned this: the "JSON file to convert" above is only a small sample of the actual JSON I wish to convert. I assumed I would be able to simply add to the code used in the answer; I was wrong.
The JSON I intend to use has 9 total columns for clubs and 52 columns for players.
I'm working hard to really grok jq, so here you go, with no explanation:
jq -r '
.[]
| [.matchId, .timestamp] as [$matchId, $timestamp]
| (.players | [to_entries[] | .key as $id1 | .value | to_entries[] | [$id1, .key, .value.position, .value.toiseconds, .value.playername]]) as $players
| (.clubs | [to_entries[] | [.key, .value.toa, .value.details.name, .value.details.clubId]]) as $clubs
| range([$players, $clubs] | map(length) | max)
| [$matchId, $timestamp] + ($clubs[.] // ["","","",""]) + ($players[.] // ["","","","",""])
| @csv
' file.json
"2068447050405",1658361314,"39335","486","Team one",39335,"39335","189908959","defenseMen","3600","player one"
"2068447050405",1658361314,"111655","229","Team two",111655,"39335","828715674","rightWing","3600","player two"
"2068447050405",1658361314,"","","","","111655","515447555","defenseMen","3600","player three"
"2068447050405",1658361314,"","","","","111655","806370074","center","3600","player four"
The default arrays of empty strings need to be the same size as the number of "real" fields you're grabbing.
Since this is a PITA to keep aligned, an update:
jq -r '
def empty_strings: reduce range(length) as $i ([]; . + [""]);
.[]
| [.matchId, .timestamp] as [$matchId, $timestamp]
| (.players | [to_entries[] | .key as $id1 | .value | to_entries[] | [$id1, .key, .value.position, .value.toiseconds, .value.playername]]) as $players
| (.clubs | [to_entries[] | [.key, .value.toa, .value.details.name, .value.details.clubId]]) as $clubs
| range([$players, $clubs] | map(length) | max)
| [$matchId, $timestamp]
+ ($clubs[.] // ($clubs[0] | empty_strings))
+ ($players[.] // ($players[0] | empty_strings))
| @csv
' file.json
I have the following string
string = "OGC Number | LT No | Job /n 9625878 | EPP3234 | 1206545/n" and continues on
I am trying to write it to a .CSV file where it will look like this:
OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454
where each newline in the string is a new row
where each "|" in the sting is a new column
I am having trouble getting the formatting.
I think I need to use:
string.split('/n')
string.split('|')
Thanks.
Windows 7, Python 2.6
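The two splits can be combined before any CSV writing; here is a minimal sketch using the literal '/n' separators from the string as quoted (I'm assuming those really are the two-character sequence shown, not '\n'):

```python
raw = "OGC Number | LT No | Job /n 9625878 | EPP3234 | 1206545/n"

# split on the literal '/n' markers, dropping empty trailing pieces
rows = [r for r in raw.split('/n') if r.strip()]
# then split each row on '|' and strip the surrounding spaces
table = [[c.strip() for c in r.split('|')] for r in rows]
# table[0] is the header row; table[1:] are the data rows
```

Each inner list can then be passed straight to a csv.writer's writerow.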
Untested:
import csv

text = """
OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454"""

lines = text.strip().splitlines()
with open('outputfile.csv', 'wb') as fout:
    csvout = csv.writer(fout)
    csvout.writerow([col.strip() for col in lines[0].split('|')])  # header
    for row in lines[2:]:  # content, skipping the dashed separator line
        csvout.writerow([col.strip() for col in row.split('|')])
If you are interested in using a third-party module, PrettyTable is very useful and has a nice set of features for dealing with and printing tabular data.
EDIT: Oops, I misunderstood your question!
The code below uses two regular expressions to do the modifications.
import re

text = """OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454
"""
# just setup above

# remove all lines of at least 4 dashes
text = re.sub(r'----+\n', '', text)

# replace all pipe symbols and their
# surrounding spaces with single semicolons
text = re.sub(r' +\| +', ';', text)

print text
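If an actual .csv file is wanted rather than the printed semicolon text, the same cleaned-up lines can be handed to the csv module instead. A sketch (the output filename is my choice; on the question's Python 2.6 the file would be opened with 'wb' and no newline argument):

```python
import csv
import re

text = """OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
"""

text = re.sub(r'----+\n', '', text)           # drop the dashed separator line
rows = [[c.strip() for c in line.split('|')]  # split columns on the pipes
        for line in text.splitlines()]
with open('outputfile.csv', 'w', newline='') as fout:
    csv.writer(fout).writerows(rows)
```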
I just started learning python scripting yesterday and I've already gotten stuck. :(
So I have a data file with a lot of different information in various fields.
Formatted basically like...
Name (tab) Start# (tab) End# (tab) A bunch of fields I need but do not do anything with
Repeat
I need to write a script that takes the start and end numbers, and add/subtract a number accordingly depending on whether another field says + or -.
I know that I can replace words with something like this:
x = open("infile")
y = open("outfile","a")
while 1:
line = f.readline()
if not line: break
line = line.replace("blah","blahblahblah")
y.write(line + "\n")
y.close()
But I've looked at all sorts of different places and I can't figure out how to extract specific fields from each line, read one field, and change other fields. I read that you can read the lines into arrays, but can't seem to find out how to do it.
Any help would be great!
EDIT:
Example of two lines from the data (each | represents a tab character):
chr21 | 33025905 | 33031813 | ENST00000449339.1 | 0 | **-** | 33031813 | 33031813 | 0 | 3 | 1835,294,104, | 0,4341,5804,
chr21 | 33036618 | 33036795 | ENST00000458922.1 | 0 | **+** | 33036795 | 33036795 | 0 | 1 | 177, | 0,
The second and third columns (the start and end numbers) are the ones I'd need to read/change; the bolded sixth column is the +/- field.
You can use csv to do the splitting, although for these sorts of problems I usually just use str.split:
with open('infile') as fin, open('outfile', 'w') as fout:
    for line in fin:
        # use line.split('\t', 3) if the name field can contain spaces
        name, start, end, rest = line.split(None, 3)
        # do something to change start and end here.
        # Note that `start` and `end` are strings, but they can easily be
        # converted using the `int` or `float` builtins.
        fout.write('\t'.join((name, start, end, rest)))
csv is nice if you want to split lines like this:
this is a "single argument"
into:
['this','is','a','single argument']
but it doesn't seem like you need that here.
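For the "do something to change start and end" step, here is a minimal sketch of the add/subtract against the +/- field. The offset value and the column positions are assumptions based on the sample lines in the question:

```python
line = "chr21\t33025905\t33031813\tENST00000449339.1\t0\t-\n"
offset = 10  # hypothetical amount to add or subtract

fields = line.rstrip("\n").split("\t")
sign = 1 if fields[5] == "+" else -1             # column 6 holds the +/- flag
fields[1] = str(int(fields[1]) + sign * offset)  # start
fields[2] = str(int(fields[2]) + sign * offset)  # end
new_line = "\t".join(fields) + "\n"
```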
Could somebody help me figure out a simple way of doing this using any script? I will be running the script on Linux.
1) I have a file1 which has the following lines:
(Bank8GntR[3] | Bank8GntR[2] | Bank8GntR[1] | Bank8GntR[0] ),
(Bank7GntR[3] | Bank7GntR[2] | Bank7GntR[1] | Bank7GntR[0] ),
(Bank6GntR[3] | Bank6GntR[2] | Bank6GntR[1] | Bank6GntR[0] ),
(Bank5GntR[3] | Bank5GntR[2] | Bank5GntR[1] | Bank5GntR[0] ),
2) I need the contents of file1 to be modified as follows and written to a file2:
(Bank15GntR[3] | Bank15GntR[2] | Bank15GntR[1] | Bank15GntR[0] ),
(Bank14GntR[3] | Bank14GntR[2] | Bank14GntR[1] | Bank14GntR[0] ),
(Bank13GntR[3] | Bank13GntR[2] | Bank13GntR[1] | Bank13GntR[0] ),
(Bank12GntR[3] | Bank12GntR[2] | Bank12GntR[1] | Bank12GntR[0] ),
So I have to:
read each line from file1,
search with a regular expression
to match Bank[0-9]+GntR,
replace the matched number with "7 added to the number matched",
insert it back into the line,
write the line into a new file.
How about something like this in Python:
import re

# a function that adds 7 to a matched group.
# group 1 is "Bank" and group 2 is the bank number; grabbing (Bank) first
# avoids catching the digits inside the square brackets.
def plus7(matchobj):
    return '%s%d' % (matchobj.group(1), int(matchobj.group(2)) + 7)

# iterate over the input file, writing modified lines to the output file.
with open('in.txt') as fhi, open('out.txt', 'w') as fho:
    for line in fhi:
        fho.write(re.sub(r'(Bank)(\d+)', plus7, line))
Assuming you don't have to use Python, you can do this using awk (note the three-argument match is a GNU awk feature):
awk 'match($0, /Bank([0-9]+)GntR/, nums) { d = nums[1] + 7; gsub(/Bank[0-9]+GntR\[/, "Bank" d "GntR["); print }' test.txt
This gives the desired output.
The point here is that match will match your data and capture groups which you can use to extract the number. As awk supports arithmetic, you can add 7 within awk and then do a replacement on all the values in the rest of the line. Note, I've assumed all the bank references on a line contain the same number.