I have file of data and i want to convert it to list like this :
example_dict = {"host":"146.204.224.152",
"user_name":"feest6811",
"time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"}
Example of one object of temp_list data:
[{'host': '197.109.77.178',
'user_name': 'kertzmann3129',
'time': '[21/Jun/2019:15:45:25 -0700]',
'request': '"DELETE /virtual/solutions/target/web+services HTTP/2.0"'}]
I tried to use this code but get error list index out of range :
import re
def logs():
with open("logdata.txt", "r") as file:
logdata = file.read()
return logdata
pattern1 = re.compile(r"\n")
pattern2 = re.compile(r"\s")
Temp_List = []
Final_List = []
All_Data = pattern1.split(logs())
for item in All_Data:
temp = pattern2.split(item)
Temp_List = { "host": temp[0], "user_name": temp[2],"time":temp[3]+' '+temp[4],"request":temp[5]+' '+temp[6]+' '+temp[7]}
Final_List.append(Temp_List)
Final_List
As Patrick Artner pointed it out, when your code executes the following line, there is no guarantee that you have sufficient number of elements in your temp list:
Temp_List = { "host": temp[0], "user_name": temp[2],"time":temp[3]+' '+temp[4],"request":temp[5]+' '+temp[6]+' '+temp[7]}
To avoid this, you should check whether you have the required number of elements in the array by puting an if statement in the code, similar to the one below:
if len(temp) < 8:
Temp_List = { "host": temp[0], "user_name": temp[2],"time":temp[3]+' '+temp[4],"request":temp[5]+' '+temp[6]+' '+temp[7]}
Other suggestion:
You don't need the regex library to split a string like that. You can simply call
my_string.split(r'\s')
Related
So I have a code that converts my category tree to a list and I wanted to convert it to CSV/json. Each item on list can have more ids as shown below.
def paths(tree):
tree_name = next(iter(tree.keys()))
if tree_name == 'children':
for child in tree['children']:
for descendant in paths(child):
yield (tree['id'],) + descendant
else:
yield (tree['id'],)
pprint.pprint(list(paths(tree)))
Output
[(461123, 1010022280, 10222044, 2222871,2222890),
(461123, 129893, 119894, 1110100250),
(461123, 98943, 944894, 9893445),
(461123, 9844495)]
Is there any way I can improve my code or have another code that converts list to json that looks below output?
Output should look like this
{
{
"column1": "462312",
"column2": "1010022280",
"column3": "10222044",
"column4": "2222871",
"column5": "2222890"
},
{
"column1": "461123",
"column2": "129893",
"column3": "119894",
"column4": "1110100250"
}
and so on...
}
if csv should look like this. ** Can be up to column 10
column1
column2
column3
column4
461123
129893
119894
1110100250
461123
129893
119894
Following is the code to convert list of tuple to a list of dict which you can convert to json and the second function turns the data to a csv
data = [(461123, 1010022280, 10222044, 2222871,2222890),
(461123, 129893, 119894, 1110100250),
(461123, 98943, 944894, 9893445),
(461123, 9844495)]
def convert_to_list_of_dicts(data):
output_list = []
for i in data:
data_dict = {}
for count,j in enumerate(i) :
data_dict["column" + str(count+1)] = j
output_list.append(data_dict)
return output_list
# print(convert_to_list_of_dicts(data))
def convert_to_csv(data):
max_column_num = 0
for i in data:
if len(i) > max_column_num:
max_column_num = len(i)
columns = ["column" + str(i+1) for i in range(max_column_num)]
newdata = [tuple(columns)]
for tup in data:
newdata.append(tup)
with open('output.csv', 'w', newline='') as csvfile:
writer = csv.writer(csvfile)
writer.writerows(newdata)
# convert_to_csv(data)
This is my json file input.
{"Report":{"id":101,"type":"typeA","Replist":[{"rptid":"r001","subrpt":{"subid":74,"subname":"name1","subval":113},"RelsubList":[{"Relid":8,"Relsubdetails":{"Rel_subname":"name8","Rel_Subval":65}},{"Relid":5,"Relsubdetails":{"Rel_subname":"name5","Rel_Subval":40}}],"fldA":30,"fldB":23}]}}
...
I am writing python program to convert the input into the below format in my dictionary.
I am new to python.
Expected output:
out: {"id": "101", "type": "typeA", "rptid": "r001", "subrpt_subid": "74", "subrpt_subname": "name1", "subrpt_subval":"113","Relid":"8","Rel_subname":"name8","Rel_Subval":"65","Relid":"5","Rel_subname":"name5","Rel_Subval":"40","fldA":"30","fldB":"23"
I used the following logic to convert the output till subrpt.
Current output:
out: {'id': '101', 'type': 'typeA', 'rptid': 'r001', 'subrpt_subid': '74', 'subrpt_subname': 'name1', 'subrpt_subval': '113'}
But I am struggling to get the logic of RelsubList(it looks like it has both list and dictionary[{}] ).
please help me to get the logic for the same.
import json
list1 = []
dict1 = {}
dict2 = {}
data_file = "samp1.json"
file = open(data_file)
for line in file:
json_line = json.loads(line)
json_line = json_line["Report"]
dict1["id"]=str(json_line["id"])
dict1["type"] = str(json_line["type"])
json_line = json_line["Replist"]
dict1["rptid"]= str(json_line[0]["rptid"])
dict1["subrpt_subid"] = str(json_line[0]["subrpt"]["subid"])
dict1["subrpt_subname"] = str(json_line[0]["subrpt"]["subname"])
dict1["subrpt_subval"] = str(json_line[0]["subrpt"]["subval"])
print("out:", dict1)
Some of your logic is confusing to me, i.e. why are you doing json.loads(line) in every loop?
Anyway, the following should get you the logic for RealsubList:
import json
f = open("data.json")
data = json.load(f)
for line in data:
relsublist = data["Report"]["Replist"][0]["RelsubList"]
print(relsublist)
Results in:
[{'Relid': 8, 'Relsubdetails': {'Rel_subname': 'name8', 'Rel_Subval': 65}}, {'Relid': 5, 'Relsubdetails': {'Rel_subname': 'name5', 'Rel_Subval': 40}}]
The reason for the [0] index after ["Replist"] is Replist contains an array of nested dictionaries, so you need to call it out by index. In this case its only a single array, so it would be 0
I have this list of hierarchical URLs:
data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]
And I want to dynamically make a nested dictionary for every URL for which there exists another URL that is a subdomain/subfolder of it.
I already tried the follwoing but it is not returning what I expect:
result = []
for key,d in enumerate(data):
form_dict = {}
r_pattern = re.search(r"(http(s)?://(.*?)/)(.*)",d)
r = r_pattern.group(4)
if r == "":
parent_url = r_pattern.group(3)
else:
parent_url = r_pattern.group(3) + "/"+r
print(parent_url)
temp_list = data.copy()
temp_list.pop(key)
form_dict["name"] = parent_url
form_dict["children"] = []
for t in temp_list:
child_dict = {}
if parent_url in t:
child_dict["name"] = t
form_dict["children"].append(child_dict.copy())
result.append(form_dict)
This is the expected output.
{
"name":"https://python-rq.org/",
"children":[
{
"name":"https://python-rq.org/a",
"children":[
{
"name":"https://python-rq.org/a/b",
"children":[
]
}
]
},
{
"name":"https://python-rq.org/c",
"children":[
]
}
]
}
Any advice?
This was a nice problem. I tried going on with your regex method but got stuck and found out that split was actually appropriate for this case. The following works:
data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]
temp_list = data.copy()
# This removes the last "/" if any URL ends with one. It makes it a lot easier
# to match the URLs and is not necessary to have a correct link.
data = [x[:-1] if x[-1]=="/" else x for x in data]
print(data)
result = []
# To find a matching parent
def find_match(d, res):
for t in res:
if d == t["name"]:
return t
elif ( len(t["children"])>0 ):
temp = find_match(d, t["children"])
if (temp):
return temp
return None
while len(data) > 0:
d = data[0]
form_dict = {}
l = d.split("/")
# I removed regex as matching the last parentheses wasn't working out
# split does just what you need however
parent = "/".join(l[:-1])
data.pop(0)
form_dict["name"] = d
form_dict["children"] = []
option = find_match(parent, result)
if (option):
option["children"].append(form_dict)
else:
result.append(form_dict)
print(result)
[{'name': 'https://python-rq.org', 'children': [{'name': 'https://python-rq.org/a', 'children': [{'name': 'https://python-rq.org/a/b', 'children': []}]}, {'name': 'https://python-rq.org/c', 'children': []}]}]
When i try to send the bulk_data to the local elasticsearch, my data isn't loaded because of the SerializationError.
I already tried to fill the empty cells in the csv file, but that wasn't the solution.
from elasticsearch import Elasticsearch
bulk_data = []
header = []
count = 0
for row in csv_file_object:
if count > 0 :
data_dict = {}
for i in range(len(row)):
row = row.rstrip()
data_dict[header[i]] = row[i]
op_dict = {
"index": {
"_index": INDEX_NAME,
"_type": TYPE_NAME,
}
}
bulk_data.append(op_dict)
bulk_data.append(data_dict)
else:
header = row
count = count+1
# create ES client, create index
es = Elasticsearch(hosts = [ES_HOST])
if es.indices.exists(INDEX_NAME):
print("deleting '%s' index..." % (INDEX_NAME))
res = es.indices.delete(index = INDEX_NAME)
res = es.bulk(index = INDEX_NAME, body = bulk_data, refresh = True)
See image for the SerializationError and bulk_data values:
Please note: the \n is added by the serialization process itself.
I try to repond to you but I can't understand one thing. How you retrieve your field name from data? In your code I see that you retrieve it from a list called header that is empty? I can't understand how you take this value.. Check my answer i don't know if i understand well
from elasticsearch import Elasticsearch
from elasticsearch import helpers
index_name = "your_index_name"
doc_type = "your_doc_type"
esConnector = Elasticsearch(["http://192.168.1.1:9200/"])
# change your ip here
count = 0
def generate_data(csv_file_object)
with open(csv_file_object, "r") as f:
for line in f:
line = line.split(",").rstrip()
data_dict = {header[count]: line}
obj={
'_op_type': 'index',
'_index': index_name,
'_type': doc_type,
'_id': count+1,
'_source': data_dict
}
count +=1
yield obj
for success, info in helpers.parallel_bulk(client=esConnector, actions=generate_data(csv_file_object), thread_count=4):
if not success:
print 'Doc failed', info
I am new to python and am trying to read a file and create a dictionary from it.
The format is as follows:
.1.3.6.1.4.1.14823.1.1.27 {
TYPE = Switch
VENDOR = Aruba
MODEL = ArubaS3500-48T
CERTIFICATION = CERTIFIED
CONT = Aruba-Switch
HEALTH = ARUBA-Controller
VLAN = Dot1q INSTRUMENTATION:
Card-Fault = ArubaController:DeviceID
CPU/Memory = ArubaController:DeviceID
Environment = ArubaSysExt:DeviceID
Interface-Fault = MIB2
Interface-Performance = MIB2
Port-Fault = MIB2
Port-Performance = MIB2
}
The first line OID (.1.3.6.1.4.1.14823.1.1.27 { ) I want this to be the key and the remaining lines are the values until the }
I have tried a few combinations but am not able to get the correct regex to match these
Any help please?
I have tried something like
lines = cache.readlines()
for line in lines:
searchObj = re.search(r'(^.\d.*{)(.*)$', line)
if searchObj:
(oid, cert ) = searchObj.groups()
results[searchObj(oid)] = ", ".join(line[1:])
print("searchObj.group() : ", searchObj.group(1))
print("searchObj.group(1) : ", searchObj.group(2))
You can try this:
import re
data = open('filename.txt').read()
the_key = re.findall("^\n*[\.\d]+", data)
values = [re.split("\s+\=\s+", i) for i in re.findall("[a-zA-Z0-9]+\s*\=\s*[a-zA-Z0-9]+", data)]
final_data = {the_key[0]:dict(values)}
Output:
{'\n.1.3.6.1.4.1.14823.1.1.27': {'VENDOR': 'Aruba', 'CERTIFICATION': 'CERTIFIED', 'Fault': 'MIB2', 'VLAN': 'Dot1q', 'Environment': 'ArubaSysExt', 'HEALTH': 'ARUBA', 'Memory': 'ArubaController', 'Performance': 'MIB2', 'CONT': 'Aruba', 'MODEL': 'ArubaS3500', 'TYPE': 'Switch'}}
You could use a nested dict comprehension along with an outer and inner regex.
Your blocks can be separated by
.numbers...numbers.. {
// values here
}
In terms of regular expression this can be formulated as
^\s* # start of line + whitespaces, eventually
(?P<key>\.[\d.]+)\s* # the key
{(?P<values>[^{}]+)} # everything between { and }
As you see, we split the parts into key/value pairs.
Your "inner" structure can be formulated like
(?P<key>\b[A-Z][-/\w]+\b) # the "inner" key
\s*=\s* # whitespaces, =, whitespaces
(?P<value>.+) # the value
Now let's build the "outer" and "inner" expressions together:
rx_outer = re.compile(r'^\s*(?P<key>\.[\d.]+)\s*{(?P<values>[^{}]+)}', re.MULTILINE)
rx_inner = re.compile(r'(?P<key>\b[A-Z][-/\w]+\b)\s*=\s*(?P<value>.+)')
result = {item.group('key'):
{match.group('key'): match.group('value')
for match in rx_inner.finditer(item.group('values'))}
for item in rx_outer.finditer(string)}
print(result)
A demo can be found on ideone.com.