Increment id during loop in list of collections - python

I am trying to append a dictionary with an incremented id to a list on each iteration of a loop:
ads = []
page = {}
page['titre'] = "Title here"
page['nombre_pages'] = 396
i = 1
total = 3
while i <= total:
    print(i)
    page['id'] = i
    ads.append(page)
    i += 1
This returns:
[{'titre': 'Title here', 'nombre_pages': 396, 'id': 3}, {'titre': 'Title here', 'nombre_pages': 396, 'id': 3}, {'titre': 'Title here', 'nombre_pages': 396, 'id': 3}]
I don't understand why I get the same id three times instead of id: 1, id: 2, id: 3.
Printing page['id'] inside the loop shows the expected increment, and ads.append(page['id']) also gives the expected values.
Can you help?
Thanks

You're only creating a single "page" object, i.e. by doing:
page = {}
and referring to it from several index locations in ads, so every entry shows the last id that was assigned. You probably want to be doing something closer to:
ads = []
i = 1
total = 3
while i <= total:
    print(i)
    page = {}
    page['titre'] = "Title here"
    page['nombre_pages'] = 396
    page['id'] = i
    ads.append(page)
    i += 1
or slightly more idiomatically:
ads = []
total = 3
# range(1, total + 1) yields 1, 2, 3, matching the while loop above
for i in range(1, total + 1):
    ads.append({
        'nombre_pages': 396,
        'titre': "Title here",
        'id': i,
    })
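To make the aliasing described in the answer above visible, here is a minimal sketch. The names shared, template, same_object, shared_ids and copied_ids are illustrative only and do not come from the question. It first reproduces the problem (one dict appended three times), then builds a fresh dict on each iteration.

```python
# Appending the same dict object three times: every list slot aliases it.
page = {"titre": "Title here", "nombre_pages": 396}
shared = []
for i in range(1, 4):
    page["id"] = i          # mutates the one shared dict
    shared.append(page)     # stores a reference, not a copy

same_object = all(entry is page for entry in shared)
shared_ids = [entry["id"] for entry in shared]

# Building a new dict per iteration gives each slot its own object.
template = {"titre": "Title here", "nombre_pages": 396}
ads = []
for i in range(1, 4):
    ads.append({**template, "id": i})   # new dict on every pass

copied_ids = [entry["id"] for entry in ads]
print(same_object, shared_ids, copied_ids)
# True [3, 3, 3] [1, 2, 3]
```

The {**template, "id": i} form makes a shallow copy, which is enough here because the values are strings and integers; nested mutable values would need copy.deepcopy instead.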

Related

Convert complex comma-separated string into Python dictionary

I am getting the following string format from a CSV file in Pandas:
"title = matrix, genre = action, year = 2000, rate = 8"
How can I change the string value into a Python dictionary like this?
movie = "title = matrix, genre = action, year = 2000, rate = 8"
movie = {
    "title": "matrix",
    "genre": "action",
    "year": "2000",
    "rate": "8"
}
You can split the string and then convert it into a dictionary.
Sample code is given below:
movie = "title = matrix, genre = action, year = 2000, rate = 8"
movie = movie.split(",")
# print(movie)
# Split each "key = value" fragment into a [key, value] pair
tempMovie = [i.split("=") for i in movie]
movie = {}
for i in tempMovie:
    # strip() removes the spaces left around keys and values
    movie[i[0].strip()] = i[1].strip()
print(movie)
Alternatively, you can use a regex:
import re
input_user = "title = matrix, genre = action, year = 2000, rate = 8"
# Create a pattern to match the key-value pairs.
# The value group is \w+ (no comma), so the separator is not captured.
pattern = re.compile(r"(\w+) = (\w+)")
# Find all matches in the input string
matches = pattern.findall(input_user)
# Convert the matches to a dictionary
result = {key: value for key, value in matches}
print(result)
The result:
{'title': 'matrix', 'genre': 'action', 'year': '2000', 'rate': '8'}
I hope this solves your problem.
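movie = "title = matrix, genre = action, year = 2000, rate = 8"
# df is the DataFrame read from the CSV file and str_movie_column is the
# name of the column holding strings like the one above (neither is defined here).
dict_all_movies = {}
for idx in df.index:
    str_movie = df.at[idx, str_movie_column]
    movie_dict = dict(item.split(" = ") for item in str_movie.split(", "))
    dict_all_movies[str(idx)] = movie_dict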
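The split-based answers above can be wrapped in a small helper. This is only a sketch: parse_pairs and its parameters item_sep and kv_sep are hypothetical names, and it assumes values never contain the item separator. It uses str.partition so that a value containing "=" is kept whole, and it raises on fragments with no key-value separator rather than failing with an IndexError.

```python
def parse_pairs(text, item_sep=",", kv_sep="="):
    """Parse 'k = v, k2 = v2' into a dict of stripped strings."""
    result = {}
    for item in text.split(item_sep):
        if not item.strip():
            continue  # skip empty fragments such as a trailing comma
        # partition splits at the first kv_sep only
        key, sep, value = item.partition(kv_sep)
        if not sep:
            raise ValueError(f"missing {kv_sep!r} in {item!r}")
        result[key.strip()] = value.strip()
    return result


movie = parse_pairs("title = matrix, genre = action, year = 2000, rate = 8")
print(movie)
# {'title': 'matrix', 'genre': 'action', 'year': '2000', 'rate': '8'}
```

With a DataFrame, the same helper could be applied to a column with Series.apply, assuming the column holds strings in this format.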

For Loop - Failing to Iterate Over Elements

Issue: The for loop in this function is not iterating over all elements. It stops after one. I used some diagnostic print statements to count the number of iterations, and it stops at 1. I have reviewed the indentation and the loop but cannot find the issue.
def process_data(data):
    """Analyzes the data, looking for maximums.
    Returns a list of lines that summarize the information.
    """
    loop_count = 0
    year_by_sales = dict()
    max_revenue = {"revenue": 0}
    # ----------->This is where the Loop Issue Exists <-----
    for item in data:
        item_price = locale.atof(item["price"].strip("$"))
        item_revenue = item["total_sales"] * item_price
        if item["car"]["car_year"] not in year_by_sales.keys():
            year_by_sales[item["car"]["car_year"]] = item["total_sales"]
            loop_count += 1
            if item_revenue > max_revenue["revenue"]:
                item["revenue"] = item_revenue
                max_revenue = item
                most_sold_model = item['car']['car_model']
                highest_total_sales = item["total_sales"]
        else:
            year_by_sales[item["car"]["car_year"]] += item["total_sales"]
            loop_count +=1
        most_popular_year = max(year_by_sales, key=year_by_sales.get)
        summary = [
            "The {} generated the most revenue: ${}".format(
                format_car(max_revenue["car"]), max_revenue["revenue"]
            ),
            f"The {most_sold_model} had the most sales: {highest_total_sales}",
            f"The most popular year was {most_popular_year} with {highest_total_sales} sales.",
        ]
        print(loop_count)
        print(year_by_sales)
        return summary
Input data:
[{
    "id": 1,
    "car": {
        "car_make": "Ford",
        "car_model": "Club Wagon",
        "car_year": 1997
    },
    "price": "$5179.39",
    "total_sales": 446
},
{
    "id": 2,
    "car": {
        "car_make": "Acura",
        "car_model": "TL",
        "car_year": 2005
    },
    "price": "$14558.19",
    "total_sales": 589
},
{
    "id": 3,
    "car": {
        "car_make": "Volkswagen",
        "car_model": "Jetta",
        "car_year": 2009
    },
    "price": "$14879.11",
    "total_sales": 825
}]
The entire codebase for this script is https://replit.com/join/dkuzpdujne-terry-brooksjr
The problem is that your return statement is inside the for loop, so the function returns during the first iteration.
It should run fine if you move it outside the loop, something like this:
def process_data(data):
    """Analyzes the data, looking for maximums.
    Returns a list of lines that summarize the information.
    """
    loop_count = 0
    year_by_sales = dict()
    max_revenue = {"revenue": 0}
    # ----------->This is where the Loop Issue Exists <-----
    for item in data:
        item_price = locale.atof(item["price"].strip("$"))
        item_revenue = item["total_sales"] * item_price
        if item["car"]["car_year"] not in year_by_sales.keys():
            year_by_sales[item["car"]["car_year"]] = item["total_sales"]
            loop_count += 1
            if item_revenue > max_revenue["revenue"]:
                item["revenue"] = item_revenue
                max_revenue = item
                most_sold_model = item['car']['car_model']
                highest_total_sales = item["total_sales"]
        else:
            year_by_sales[item["car"]["car_year"]] += item["total_sales"]
            loop_count +=1
    most_popular_year = max(year_by_sales, key=year_by_sales.get)
    summary = "1"
    print(loop_count)
    print(year_by_sales)
    return summary # move this out of for loop
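The effect of the return placement can be seen in isolation with a much smaller sketch. The functions total_inside and total_outside are hypothetical and are not part of the original script; the numbers are the total_sales values from the input data in the question.

```python
def total_inside(values):
    total = 0
    for v in values:
        total += v
        return total      # runs during the first iteration and ends the function


def total_outside(values):
    total = 0
    for v in values:
        total += v
    return total          # runs once, after the loop has finished


sales = [446, 589, 825]
print(total_inside(sales))   # 446
print(total_outside(sales))  # 1860
```

The only difference between the two functions is the indentation of the return line, which is why this kind of bug is easy to miss on review.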

IP URL Mapping in JSON log file

I have a JSON log file and want to print and count the number of times each URL (requestUrl) has been hit by each IP (remoteIp) in that log file. The output should look like this:
IP(remoteIp): URL1-(Count), URL2-(Count), URL3...
127.0.0.1: http://www.google.com - 12, www.bing.com/servlet-server.jsp - 2, etc..
A sample of the log file is below:
"insertId": "kdkddkdmdkd",
"jsonPayload": {
    "#type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
    "enforcedSecurityPolicy": {
        "configuredAction": "DENY",
        "outcome": "DENY",
        "preconfiguredExprIds": [
            "owasp-crs-v030001-id942220-sqli"
        ],
        "name": "shbdbbddjdjdjd",
        "priority": 2000
    },
    "statusDetails": "body_denied_by_security_policy"
},
"httpRequest": {
    "requestMethod": "POST",
    "requestUrl": "https://dknnkkdkddkd/token",
    "requestSize": "3004",
    "status": 403,
    "responseSize": "274",
    "userAgent": "okhttp/3.12.2",
    "remoteIp": "127.0.0.1",
    "serverIp": "123.123.33.31",
    "latency": "0.018728s"
}
The solution that I am using is below. I am able to get the total hits per IP, or the total number of times a URL has been hit, but not the per-IP count for each URL.
import json
from collections import Counter
unique_ip = {}
request_url = {}
def getAndSaveValueSafely(freqTable, searchDict, key):
    try:
        tmp = searchDict['httpRequest'][key]
        if tmp in freqTable:
            freqTable[tmp] += 1
        else:
            freqTable[tmp] = 1
    except KeyError:
        # Count entries that have no httpRequest or no such key
        if 'not_present' in freqTable:
            freqTable['not_present'] += 1
        else:
            freqTable['not_present'] = 1
with open("threat_intel_1.json") as file:
    data = json.load(file)
for d2 in data:
    getAndSaveValueSafely(unique_ip, d2, 'remoteIp')
    getAndSaveValueSafely(request_url, d2, 'requestUrl')
mc_unique_ip = (dict(Counter(unique_ip).most_common()))
mc_request_url = (dict(Counter(request_url).most_common()))
def printing():
    a = str(len(unique_ip))
    b = str(len(request_url))
    with open("output.txt", "w") as f1:
        # minTs and maxTs are not defined in this excerpt
        print(
            f' Start Time of log = {minTs}'
            f' \n\n End Time of log = {maxTs} \n\n\n {a} Unique IP List = {mc_unique_ip} \n\n\n {b} Unique URL = {mc_request_url}', file=f1)
I don't think you need to use Counter; you are unlikely to see any benefit from it.
import json
from collections import defaultdict
result = {} # start empty
with open("threat_intel_1.json") as file:
    data = json.load(file)
for d2 in data:
    req = d2.get('httpRequest',None)
    if not req:
        continue  # skip entries without an httpRequest block
    url = req['requestUrl']
    ip = req['remoteIp']
    # one inner counter dict per URL, keyed by IP
    result.setdefault(url,defaultdict(int))[ip] += 1
print(result)
# {"/endpoint.html": {"127.2.3.4":15,"222.11.31.22":2}}
If you want it the other way around instead (URLs grouped by IP), that's easy too:
for d2 in data:
    req = d2.get('httpRequest',None)
    if not req:
        continue
    url = req['requestUrl']
    ip = req['remoteIp']
    result.setdefault(ip,defaultdict(int))[url] += 1
# {"127.1.2.3": {"/endpoint1.html":15,"/endpoint2.php":1}, "33.44.55.66": {"/endpoint1.html":5}, ...}
Instead of using defaultdict, you could replace that line with two lines:
# result.setdefault(ip,defaultdict(int))[url] += 1
result.setdefault(ip,{})
result[ip][url] = result[ip].get(url,0) + 1
which is arguably more readable anyway...
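To get from the nested counts to the "IP: URL - count" lines requested in the question, something like the following sketch could be used. The records list is made-up sample data shaped like the log entry in the question, and hits and lines are illustrative names. Note that this sketch does use Counter for the inner mapping, purely for its most_common() ordering; a plain dict as in the answer above works equally well for counting.

```python
from collections import Counter, defaultdict

# Hypothetical records shaped like the log sample in the question.
records = [
    {"httpRequest": {"remoteIp": "127.0.0.1", "requestUrl": "https://a.example/token"}},
    {"httpRequest": {"remoteIp": "127.0.0.1", "requestUrl": "https://a.example/token"}},
    {"httpRequest": {"remoteIp": "127.0.0.1", "requestUrl": "https://a.example/login"}},
    {"httpRequest": {"remoteIp": "10.0.0.5", "requestUrl": "https://a.example/login"}},
    {"insertId": "no-http-request"},
]

hits = defaultdict(Counter)  # ip -> Counter mapping url -> count
for rec in records:
    req = rec.get("httpRequest")
    if not req or "remoteIp" not in req or "requestUrl" not in req:
        continue  # skip entries missing either field
    hits[req["remoteIp"]][req["requestUrl"]] += 1

# One line per IP, URLs ordered from most to least hit.
lines = [
    f"{ip}: " + ", ".join(f"{url} - {n}" for url, n in urls.most_common())
    for ip, urls in hits.items()
]
print("\n".join(lines))
# 127.0.0.1: https://a.example/token - 2, https://a.example/login - 1
# 10.0.0.5: https://a.example/login - 1
```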

How to extract a couple of fields nested in response using python

I'm a Python beginner. I would like to ask for help retrieving data from a response. Here's my script:
import pandas as pd
import re
import time
import requests as re
import json
response = re.get(url, headers=headers, auth=auth)
data = response.json()
Here's part of the JSON response:
{'result': [{'display': '',
'closure_code': '',
'service_offer': 'Integration Platforms',
'updated_on': '2022-04-23 09:05:53',
'urgency': '2',
'business_service': 'Operations',
'updated_by': 'serviceaccount45',
'description': 'ALERT returned 400 but expected 200',
'sys_created_on': '2022-04-23 09:05:53',
'sys_created_by': 'serviceaccount45',
'subcategory': 'Integration',
'contact_type': 'Email',
'problem_type': 'Design: Availability',
'caller_id': '',
'action': 'create',
'company': 'aaaa',
'priority': '3',
'status': '1',
'opened': 'smith.j',
'assigned_to': 'doe.j',
'number': '123456',
'group': 'blabla',
'impact': '2',
'category': 'Business Application & Databases',
'caused_by_change': '',
'location': 'All Locations',
'configuration_item': 'Monitor',
},
I would like to extract the data for only one group, 'blabla'. Then I would like to extract fields such as:
number = data['number']
group = data['group']
service_offer = data['service_offer']
updated = data['updated_on']
urgency = data['urgency']
username = data['created_by']
short_desc = data['description']
How should this be done?
I know that to check the first value I should use:
service_offer = data['result'][0]['service_offer']
I've tried to create a dictionary, but I'm getting an error:
data_result = response.json()['result']
payload = {
    'number': data_result['number'],
    'group': data_result['group'],
    'service_offer': data_result['service_offer'],
    'updated': data_result['updated_on'],
    'urgency': data_result['urgency'],
    'username': data_result['created_by'],
    'short_desc': data_result['description'],
}
TypeError: list indices must be integers or slices, not str
So I've started to write something like the code below, but I'm stuck:
get_data = []
if len(data) > 0:
    for item in range(len(data)):
        get_data.append(data[item])
May I ask for help?
If data is your decoded JSON response from the question, then you can do:
# find group `blabla` in result:
g = next(d for d in data["result"] if d["group"] == "blabla")
# get data from the `blabla` group:
number = g["number"]
group = g["group"]
service_offer = g["service_offer"]
updated = g["updated_on"]
urgency = g["urgency"]
username = g["sys_created_by"]
short_desc = g["description"]
print(number, group, service_offer, updated, urgency, username, short_desc)
Prints:
123456 blabla Integration Platforms 2022-04-23 09:05:53 2 serviceaccount45 ALERT returned 400 but expected 200
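The answer above returns the first matching record and raises StopIteration if no record belongs to the group. As a hedged variation, the sketch below collects every matching record and shows next() with a default. The data dict is a shortened, made-up stand-in for the response in the question, and wanted, rows and missing are illustrative names.

```python
# Hypothetical data shaped like the response in the question.
data = {
    "result": [
        {"number": "123456", "group": "blabla", "urgency": "2"},
        {"number": "123457", "group": "other", "urgency": "3"},
        {"number": "123458", "group": "blabla", "urgency": "1"},
    ]
}

wanted = ("number", "group", "urgency")

# All records for the group, reduced to the fields of interest.
rows = [
    {field: rec.get(field) for field in wanted}
    for rec in data["result"]
    if rec.get("group") == "blabla"
]

# next() with a default returns None instead of raising StopIteration.
missing = next((r for r in data["result"] if r.get("group") == "nope"), None)

print(rows)
print(missing)
```

Since pandas is already imported in the question's script, pd.DataFrame(rows) would turn the filtered records into a table.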

How to make a nested dictionary based on a list of URLs?

I have this list of hierarchical URLs:
data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]
I want to dynamically build a nested dictionary in which each URL lists, as its children, the URLs that are subfolders of it.
I already tried the following, but it is not returning what I expect:
import re
result = []
for key,d in enumerate(data):
    form_dict = {}
    r_pattern = re.search(r"(http(s)?://(.*?)/)(.*)",d)
    r = r_pattern.group(4)
    if r == "":
        parent_url = r_pattern.group(3)
    else:
        parent_url = r_pattern.group(3) + "/"+r
    print(parent_url)
    temp_list = data.copy()
    temp_list.pop(key)
    form_dict["name"] = parent_url
    form_dict["children"] = []
    for t in temp_list:
        child_dict = {}
        if parent_url in t:
            child_dict["name"] = t
            form_dict["children"].append(child_dict.copy())
    result.append(form_dict)
This is the expected output:
{
    "name":"https://python-rq.org/",
    "children":[
        {
            "name":"https://python-rq.org/a",
            "children":[
                {
                    "name":"https://python-rq.org/a/b",
                    "children":[
                    ]
                }
            ]
        },
        {
            "name":"https://python-rq.org/c",
            "children":[
            ]
        }
    ]
}
Any advice?
This was a nice problem. I tried going on with your regex method but got stuck and found out that split was actually appropriate for this case. The following works:
data = ["https://python-rq.org/","https://python-rq.org/a","https://python-rq.org/a/b","https://python-rq.org/c"]
temp_list = data.copy()
# This removes the last "/" if any URL ends with one. It makes it a lot easier
# to match the URLs and is not necessary to have a correct link.
data = [x[:-1] if x[-1]=="/" else x for x in data]
print(data)
result = []
# To find a matching parent
def find_match(d, res):
    for t in res:
        if d == t["name"]:
            return t
        elif ( len(t["children"])>0 ):
            temp = find_match(d, t["children"])
            if (temp):
                return temp
    return None
while len(data) > 0:
    d = data[0]
    form_dict = {}
    l = d.split("/")
    # I removed regex as matching the last parentheses wasn't working out
    # split does just what you need however
    parent = "/".join(l[:-1])
    data.pop(0)
    form_dict["name"] = d
    form_dict["children"] = []
    option = find_match(parent, result)
    if (option):
        option["children"].append(form_dict)
    else:
        result.append(form_dict)
print(result)
[{'name': 'https://python-rq.org', 'children': [{'name': 'https://python-rq.org/a', 'children': [{'name': 'https://python-rq.org/a/b', 'children': []}]}, {'name': 'https://python-rq.org/c', 'children': []}]}]
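As an alternative to the recursive find_match search in the answer above, the parent lookup can be done with a dictionary that maps each URL to its node. This is only a sketch under stated assumptions: build_tree, nodes and roots are hypothetical names, trailing slashes are stripped as in the answer, and a URL whose parent is not in the list becomes a root.

```python
def build_tree(urls):
    """Nest URLs under their parent path; return the list of root nodes."""
    nodes = {}   # normalized URL -> node dict
    roots = []
    # A parent is a strict prefix of its child, so sorting by length
    # guarantees every parent is processed before its children.
    for url in sorted((u.rstrip("/") for u in urls), key=len):
        node = {"name": url, "children": []}
        nodes[url] = node
        parent = url.rsplit("/", 1)[0]   # drop the last path segment
        if parent in nodes:
            nodes[parent]["children"].append(node)
        else:
            roots.append(node)
    return roots


data = [
    "https://python-rq.org/",
    "https://python-rq.org/a",
    "https://python-rq.org/a/b",
    "https://python-rq.org/c",
]
tree = build_tree(data)
print(tree)
```

For the sample list this produces the same nesting as the output shown above, with one dictionary lookup per URL instead of a tree walk.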
