How to trim a json doc to single line - python

I'm using json2yaml to convert a yaml doc to a json string. Now i want to pass the json doc as single-line argument as ansible_extravars in order to override the repository settings.
For example:
Yaml:
container:
name: "summary"
version: "1.0.0"
some_level: "3"
another_nested:
schema: "summary"
items_string: "really just a string of words"
json doc: (this was generated by the 'json2yaml' webpage)
{ "container": {
"name": "summary",
"version": "1.0.0",
"some_level": "3",
"another_nested": {
"schema": "summary",
"items_string": "really just a string of words"
}
}
}
I was using a shell command as follows:
% cat json_text.txt | tr -d '[:space:]'
Which is obviously also stripping white-space from the container.another_nested.items_string
Output:
{"container":{"name":"summary","version":"1.0.0","some_level":"3","another_nested":{"schema":"summary","items_string":"reallyjustastringofwords"}}}
How can i convert a json doc to single line and preserve white-space in quoted strings?

You only need to remove the line breaks with tr -d '[\r\n]' instead of all white space.

I came up with the following python script:
#!/usr/bin/env python
import sys
filter_chars = [' ', '\t', '\n', '\r']
def is_filtered_char(c):
return c in filter_chars
def is_filtered_eol_char(c):
return c in ['\n', '\r']
inside_double_quoted_string = inside_single_quoted_string = False
some_chars = sys.stdin.read()
for c in some_chars:
if c == '"':
if not inside_double_quoted_string:
inside_double_quoted_string = True
else:
inside_double_quoted_string = False
if c == "'":
if not inside_single_quoted_string:
inside_single_quoted_string = True
else:
inside_single_quoted_string = False
if not inside_double_quoted_string and not inside_single_quoted_string and not is_filtered_char(c):
sys.stdout.write(c)
elif (inside_double_quoted_string or inside_single_quoted_string) and not is_filtered_eol_char(c):
sys.stdout.write(c)
```
Usgage:
% cat blah.txt | ./remove_ws.py
{"container":{"name":"summary","version":"1.0.0","some_level":"3","another_nested":{"schema":"summary","items_string":"really just a string of words"}}}

Related

Python How to add nested fields to Yaml file

I need to modify a YAML file and add several fields.I am using the ruamel.yaml package.
First I load the YAML file:
data = yaml.load(file_name)
I can easily add new simple fields, like-
data['prop1'] = "value1"
The problem I face is that I need to add a nested dictionary incorporate with array:
prop2:
prop3:
- prop4:
prop5: "Some title"
prop6: "Some more data"
I tried to define-
record_to_add = dict(prop2 = dict(prop3 = ['prop4']))
This is working, but when I try to add beneath it prop5 it fails-
record_to_add = dict(prop2 = dict(prop3 = ['prop4'= dict(prop5 = "Value")]))
I get
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?
What am I doing wrong?
The problem has little to do with ruamel.yaml. This:
['prop4'= dict(prop5 = "Value")]
is invalid Python as a list ([ ]) expects comma separated values. You would need to use something like:
record_to_add = dict(prop2 = dict(prop3 = dict(prop4= [dict(prop5 = "Some title"), dict(prop6='Some more data'),])))
As your program is incomplete I am not sure if you are using the old API or not. Make sure to use
import ruamel.yaml
yaml = ruamel.yaml.YAML()
and not
import ruamel.yaml as yaml
Its because of having ['prop4'= <> ].Instead record_to_add = dict(prop2 = dict(prop3 = [dict(prop4 = dict(prop5 = "Value"))])) should work.
Another alternate would be,
import yaml
data = {
"prop1": {
"prop3":
[{ "prop4":
{
"prop5": "some title",
"prop6": "some more data"
}
}]
}
}
with open(filename, 'w') as outfile:
yaml.dump(data, outfile, default_flow_style=False)

Converting dictionary into a list of dictionaries

So, I've been tasked with converting a string into a dict (has to be using regex). I've done a findall to separate each element but not sure how to put it together.
I have the following code:
import re
def edata():
with open("employeedata.txt", "r") as file:
employeedata = file.read()
IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
username_field = re.findall (r"[a-z]+\d+|- -", employeedata)
date_field = re.findall (r"\d+\/[A-Z][a-z][0-9]+\/\d\d\d\d:\d+:\d+:\d+ -\d+", employeedata)
type_field = re.findall (r'"(.*)?"', employeedata)
Fields = ["IP","username","date","type"]
Fields2 = IP_field, username_field, date_field, type_field
dictionary = dict(zip(Fields,Fields2))
return dictionary
print(edata())
Current output:
{ "IP": ["190.912.120.151", "190.912.120.151"], "username": ["skynet10001", "skynet10001"] etc }
Expected output:
[{ "IP": "190.912.120.151", "username": "skynet10001" etc },
{ "IP": "190.912.120.151", "username": "skynet10001" etc }]
Another solution that uses the dictionary that you have already constructed. This code uses list comprehension and the zip function to produce a list of dictionaries from the existing dictionary variable.
import re
def edata():
with open("employeedata.txt", "r") as file:
employeedata = file.read()
IP_field = re.findall(r"\d+[.]\d+[.]\d+[.]\d+", employeedata)
username_field = re.findall (r"[a-z]+\d+|- -", employeedata)
date_field = re.findall (r"\[(.*?)\]", employeedata) ## changed your regex for the date field
type_field = re.findall (r'"(.*)?"', employeedata)
Fields = ["IP","username","date","type"]
Fields2 = IP_field, username_field, date_field, type_field
dictionary = dict(zip(Fields,Fields2))
result_dictionary = [dict(zip(dictionary, i)) for i in zip(*dictionary.values())] ## convert to list of dictionaries
return result_dictionary
print(edata())
You can use
import re
rx = re.compile(r'^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"')
def edata():
results = []
with open("downloads/employeedata.txt", "r") as file:
for line in file:
match = rx.search(line)
if match:
results.append(match.groupdict())
return results
print(edata())
See the online Python demo. For the file = ['190.912.120.151 - skynet10001 [19/Jan/2012] "Temp"', '221.143.119.260 - terminator002 [16/Feb/2021] "Temp 2"'] input, the output will be:
[{'IP': '190.912.120.151', 'Username': 'skynet10001', 'Date': '19/Jan/2012', 'Type': 'Temp'}, {'IP': '221.143.119.260', 'Username': 'terminator002', 'Date': '16/Feb/2021', 'Type': 'Temp 2'}]
The regex is
^(?P<IP>\d+(?:\.\d+){3})\s+\S+\s+(?P<Username>[a-z]+\d+)\s+\[(?P<Date>[^][]+)]\s+"(?P<Type>[^"]*)"
See the regex demo. Details:
^ - start of string
(?P<IP>\d+(?:\.\d+){3}) - Group "IP": one or more digits and then three occurrences of a . and one or more digits
\s+\S+\s+ - one or more non-whitespace chars enclosed with one or more whitespace chars on both ends
(?P<Username>[a-z]+\d+) - Group "Username": one or more lowercase ASCII letters and then one or more digits
\s+ - one or more whitespaces
\[ - a [ char
(?P<Date>[^][]+) - Group "Date": one or more chars other than [ and ]
]\s+" - a ] char, one or more whitespaces, "
(?P<Type>[^"]*) - Group "Type": zero or more chars other than "
" - a " char.

A regex for CSV parsing? Python3 Re module

Is there a regex (Python re compatible) that I can use for parsing csv?
EDIT: I didn't realize there was a csv module in Python's standard library
Here's the regex: (?<!,\"\w)\s*,(?!\w\s*\",). It's python compatible and JavaScript compatible. Here's the full parsing script (as a python function):
def parseCSV(csvDoc, output_type="dict"):
from re import compile as c
from json import dumps
from numpy import array
# This is where all the parsing happens
"""
To parse csv files.
Arguments:
csvDoc - The csv document to parse.
output_type - the output type this
function will return
"""
csvparser = c('(?<!,\"\\w)\\s*,(?!\\w\\s*\",)')
lines = str(csvDoc).split('\n')
# All the lines are not empty
necessary_lines = [line for line in lines if line != ""]
All = array([csvparser.split(line) for line in necessary_lines])
if output_type.lower() in ("dict", "json"): # If you want JSON or dict
# All the python dict keys required (At the top of the file or top row)
top_line = list(All[0])
main_table = {} # The parsed data will be here
main_table[top_line[0]] = {
name[0]: {
thing: name[
# The 'actual value' counterpart
top_line.index(thing)
] for thing in top_line[1:] # The requirements
} for name in All[1:]
}
return dumps(main_table, skipkeys=True, ensure_ascii=False, indent=1)
elif output_type.lower() in ("list",
"numpy",
"array",
"matrix",
"np.array",
"np.ndarray",
"numpy.array",
"numpy.ndarray"):
return All
else:
# All the python dict keys required (At the top of the file or top row)
top_line = list(All[0])
main_table = {} # The parsed data will be here
main_table[top_line[0]] = {
name[0]: {
thing: name[
# The 'actual value' counterpart
top_line.index(thing)
] for thing in top_line[1:] # The requirements
} for name in All[1:]
}
return dumps(main_table, skipkeys=True, ensure_ascii=False, indent=1)
Dependancies: NumPy
All you need to do is chuck in the raw text of the csv file and then the function will return a json (or a 2-dimension list if you wish) in this format:
{"top-left-corner name":{
"foo":{"Item 1 left to foo":"Item 2 of the top row",
"Item 2 left to foo":"Item 3 of the top row",
...}
"bar":{...}
}
}
And here's an example of it:
CSV.csv
foo,bar,zbar
foo_row,foo1,,
barie,"2,000",,
and it outputs:
{
"foo": {
"foo_row": {
"bar": "foo1",
"zbar": ""
},
"barie": {
"bar": "\"2,000\"",
"zbar": ""
}
}
}
It should work if your csv file is formatted correctly (The ones I tested was made by apple's Numbers)

Dictionary from a String with particular structure

I am using python 3 to read this file and convert it to a dictionary.
I have this string from a file and I would like to know how could be possible to create a dictionary from it.
[User]
Date=10/26/2003
Time=09:01:01 AM
User=teodor
UserText=Max Cor
UserTextUnicode=392039n9dj90j32
[System]
Type=Absolute
Dnumber=QS236
Software=1.1.1.2
BuildNr=0923875
Source=LAM
Column=OWKD
[Build]
StageX=12345
Spotter=2
ApertureX=0.0098743
ApertureY=0.2431899
ShiftXYZ=-4.234809e-002
[Text]
Text=Here is the Text files
DataBaseNumber=The database number is 918723
..... (There are more than 1000 lines per file) ...
On the text I have "Name=Something" and then I would like to convert it as follows:
{'Date':'10/26/2003',
'Time':'09:01:01 AM'
'User':'teodor'
'UserText':'Max Cor'
'UserTextUnicode':'392039n9dj90j32'.......}
The word between [ ] can be removed, like [User], [System], [Build], [Text], etc...
In some fields there is only the first part of the string:
[Colors]
Red=
Blue=
Yellow=
DarkBlue=
What you have is an ordinary properties file. You can use this example to read the values into map:
try (InputStream input = new FileInputStream("your_file_path")) {
Properties prop = new Properties();
prop.load(input);
// prop.getProperty("User") == "teodor"
} catch (IOException ex) {
ex.printStackTrace();
}
EDIT:
For Python solution, refer to the answerred question.
You can use configparser to read .ini, or .properties files (format you have).
import configparser
config = configparser.ConfigParser()
config.read('your_file_path')
# config['User'] == {'Date': '10/26/2003', 'Time': '09:01:01 AM'...}
# config['User']['User'] == 'teodor'
# config['System'] == {'Type': 'Abosulte', ...}
Can easily be done in python. Assuming your file is named test.txt.
This will also work for lines with nothing after the = as well as lines with multiple =.
d = {}
with open('test.txt', 'r') as f:
for line in f:
line = line.strip() # Remove any space or newline characters
parts = line.split('=') # Split around the `=`
if len(parts) > 1:
d[parts[0]] = ''.join(parts[1:])
print(d)
Output:
{
"Date": "10/26/2003",
"Time": "09:01:01 AM",
"User": "teodor",
"UserText": "Max Cor",
"UserTextUnicode": "392039n9dj90j32",
"Type": "Absolute",
"Dnumber": "QS236",
"Software": "1.1.1.2",
"BuildNr": "0923875",
"Source": "LAM",
"Column": "OWKD",
"StageX": "12345",
"Spotter": "2",
"ApertureX": "0.0098743",
"ApertureY": "0.2431899",
"ShiftXYZ": "-4.234809e-002",
"Text": "Here is the Text files",
"DataBaseNumber": "The database number is 918723"
}
I would suggest to do some cleaning to get rid of the [] lines.
After that you can split those lines by the "=" separator and then convert it to a dictionary.

Python string.Template: Replace field containing spaces

Using Python's string.Template class - how might I utilize the ${} for fields in a dictionary that contain spaces?
E.g.
t = string.Template("hello ${some field}")
d = { "some field": "world" }
print( t.substitute(d) ) # Returns "invalid placeholder in string"
Edit: Here's the closest I could get, with the caveat being that all variables need to be wrapped in a brackets (otherwise all space separated words would be matched).
class MyTemplate(string.Template):
delimiter = '$'
idpattern = '[_a-z][\s_a-z0-9]*'
t = MyTemplate("${foo foo} world ${bar}")
s = t.substitute({ "foo foo": "hello", "bar": "goodbye" })
# hello world goodbye
Just in case this might be helpful to somebody else. In python 3 you can use format_map:
t = "hello {some field}"
d = { "some field": "world" }
print( t.format_map(d) )
# hello world
From Doc it says that we could use Template option
https://docs.python.org/dev/library/string.html#template-strings
import string
class MyTemplate(string.Template):
delimiter = '%'
idpattern = '[a-z]+ [a-z]+'
t = MyTemplate('%% %with_underscore %notunderscored')
d = { 'with_underscore':'replaced',
'notunderscored':'not replaced',
}
print t.safe_substitute(d)

Categories

Resources