get first element from dictionary which is an array - python

I have a dictionary that stores header information:
{'x-frame-options': {'defined': True, 'warn': 0, 'contents': 'SAMEORIGIN'}, 'strict-transport-security': {'defined': True, 'warn': 0, 'contents': 'max-age=15552000'}, 'access-control-allow-origin': {'defined': False, 'warn': 1, 'contents': ''}, 'content-security-policy': {'defined': True, 'warn': 0, 'contents': "upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com"}, 'x-xss-protection': {'defined': False, 'warn': 1, 'contents': ''}, 'x-content-type-options': {'defined': False, 'warn': 1, 'contents': ''}}
I want to get the first element (the 'defined' value) of each entry in the dictionary.
# headers is the returned dict that stores all header information
headers = headersecurity.verify_header_existance(url, 0)
for header in headers:
    if header.find("x-frame-options"):
        for headerSett in header:
            defined = [elem[0] for elem in headerSett.values()]  # here I don't get the first element
            print(defined)
The expected result is:
x-frame-options : defined = True;
access-control-allow-origin : defined = True;
x-content-type-options : defined = True;
....
Thanks!

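I think it's safer to access the values by key, like so:
headers['x-frame-options']['defined']
That way you do not rely on the ordering inside the dict (dicts only preserve insertion order from Python 3.7 onward, and even then looking entries up by key is clearer than relying on their position).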
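If you really do want the "first" entry rather than a specific key, a minimal sketch follows. It assumes Python 3.7+ (where dicts keep insertion order) and uses a shortened copy of the question's headers dict; next(iter(...)) is plain Python and nothing specific to headersecurity.

```python
# Shortened copy of the header data from the question (assumption: same shape).
headers = {
    'x-frame-options': {'defined': True, 'warn': 0, 'contents': 'SAMEORIGIN'},
    'strict-transport-security': {'defined': True, 'warn': 0, 'contents': 'max-age=15552000'},
    'access-control-allow-origin': {'defined': False, 'warn': 1, 'contents': ''},
}

# Access by key: does not depend on ordering at all.
xfo_defined = headers['x-frame-options']['defined']

# "First" entry in insertion order (Python 3.7+); next(iter(...)) avoids
# building a whole list just to take element 0.
first_name, first_info = next(iter(headers.items()))

print(xfo_defined)                        # True
print(first_name, first_info['defined'])  # x-frame-options True
```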
EDIT: I just saw your edit and the output you expect; here is a simple way to get it:
for key, value in headers.items():
    if "defined" in value:
        print(f"{key} : defined = {value['defined']}")
Output:
x-frame-options : defined = True
strict-transport-security : defined = True
access-control-allow-origin : defined = False
content-security-policy : defined = True
x-xss-protection : defined = False
x-content-type-options : defined = False
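Building on that loop, the same data can be collected into a dict of flags, which also makes it easy to list the headers that are missing. A sketch using the header data from the question; defined_flags and missing are illustrative names:

```python
# Header data from the question (Python 3.7+ keeps this insertion order).
headers = {
    'x-frame-options': {'defined': True, 'warn': 0, 'contents': 'SAMEORIGIN'},
    'strict-transport-security': {'defined': True, 'warn': 0, 'contents': 'max-age=15552000'},
    'access-control-allow-origin': {'defined': False, 'warn': 1, 'contents': ''},
    'content-security-policy': {'defined': True, 'warn': 0,
                                'contents': "upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com"},
    'x-xss-protection': {'defined': False, 'warn': 1, 'contents': ''},
    'x-content-type-options': {'defined': False, 'warn': 1, 'contents': ''},
}

# Map each header name to its 'defined' flag; .get() tolerates entries without the key.
defined_flags = {name: info.get('defined') for name, info in headers.items()}

# Headers whose flag is False (or absent).
missing = [name for name, is_defined in defined_flags.items() if not is_defined]

for name, is_defined in defined_flags.items():
    print(f"{name} : defined = {is_defined}")
print("missing:", missing)
```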

Related

Django: How to search through django.template.context.RequestContext

I'm writing tests in Django and ran into <class 'django.template.context.RequestContext'>, which I'm trying to iterate through to find a <class 'ecom.models.Product'> object inside.
test.py
def test_ProductDetail_object_in_context(self):
    response = self.client.get(reverse('product_detail', args=[1]))
    # assertEqual - test passes
    self.assertEqual(response.context[0]['object'], Product.objects.get(id=1))
    # assertIn - test fails
    self.assertIn(Product.objects.get(id=1), response.context[0])
views.py
class ProductDetailView(DetailView):
    model = Product
    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        data = cartData(self.request)
        cartItems = data['cartItems']
        context['cartItems'] = cartItems
        return context
What's inside response.context:
[
[
{'True': True, 'False': False, 'None': None},
{'csrf_token': <SimpleLazyObject: <function csrf.<locals>._get_val at 0x7fd80>>,
'request': <WSGIRequest: GET '/1/'>,
'user': <SimpleLazyObject: <django.contrib.auth.models.AnonymousUser object at 0x7fd820>>,
'perms': <django.contrib.auth.context_processors.PermWrapper object at 0x7fd80>,
'messages': <django.contrib.messages.storage.fallback.FallbackStorage object at 0x7fd8290>,
'DEFAULT_MESSAGE_LEVELS': {'DEBUG': 10, 'INFO': 20, 'SUCCESS': 25, 'WARNING': 30, 'ERROR': 40}
},
{},
{'object': <Product: Pen>,
'product': <Product: Pen>,
'view': <ecom.views.ProductDetailView object at 0x7fd8210>,
'cartItems': 0}
],
[
{'True': True, 'False': False, 'None': None},
{'csrf_token': <SimpleLazyObject: <function csrf.<locals>._get_val at 0x7fd8240>>,
'request': <WSGIRequest: GET '/1/'>,
'user': <SimpleLazyObject: <django.contrib.auth.models.AnonymousUser object at 0x7fd8250>>,
'perms': <django.contrib.auth.context_processors.PermWrapper object at 0x7fd8250>,
'messages': <django.contrib.messages.storage.fallback.FallbackStorage object at 0x7fd8290>,
'DEFAULT_MESSAGE_LEVELS': {'DEBUG': 10, 'INFO': 20, 'SUCCESS': 25, 'WARNING': 30, 'ERROR': 40}
},
{},
{'object': <Product: Pen>,
'product': <Product: Pen>,
'view': <ecom.views.ProductDetailView object at 0x7fd8210>,
'cartItems': 0}
]
]
Type of response.context:
<class 'django.template.context.RequestContext'>
Product.objects.get(id=1) prints as: Pen
Type of Product.objects.get(id=1): <class 'ecom.models.Product'>
I don't understand why:
it found the Product object with self.assertEqual(response.context[0]['object'], Product.objects.get(id=1)), but not with self.assertIn(Product.objects.get(id=1), response.context[0]['object']), which raises "TypeError: argument of type 'Product' is not iterable";
it also didn't find it with self.assertIn(Product.objects.get(id=1), response.context[0]), which fails with "AssertionError: <Product: Pen> not found in [....here goes contents of response.context[0]....]";
and it didn't find it with self.assertIn(Product.objects.get(id=1), response.context[0][3]) either, which raises "in __getitem__ raise KeyError(key), KeyError: 3".
How do I work with the RequestContext class? Is it JSON-like?
Sorry for the somewhat mixed-up question; I'm just trying to understand how to work with RequestContext.
Thank you in advance!
I think your test is failing because assertIn looks through the keys, not the values. The solution would be:
self.assertIn(Product.objects.get(id=1), response.context[0].values())
A little more explanation: response.context[0] behaves like key-value storage, i.e. a dict. When you do response.context[0]["object"], you access the value stored under the key "object". An in check on a dictionary, however, only looks at its keys, which is also why response.context[0][3] raises KeyError: 3.
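The keys-versus-values behaviour can be checked without Django, using a plain dict as a stand-in for the template context; the Product class below is a hypothetical stand-in for the model. For a real Django Context object, flatten() returns the whole context stack merged into one plain dict, so response.context[0].flatten().values() is an option if calling .values() directly doesn't work in your Django version.

```python
class Product:
    """Hypothetical stand-in for ecom.models.Product; compares by pk like a model instance."""

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Product) and self.pk == other.pk

    def __hash__(self):
        return hash(self.pk)

    def __repr__(self):
        return f"<Product: {self.name}>"


pen = Product(1, "Pen")
context = {'object': pen, 'product': pen, 'cartItems': 0}

# `in` on a dict tests the keys only...
print(pen in context)           # False
print('object' in context)      # True
# ...so search the values explicitly.
print(pen in context.values())  # True
# A fresh instance with the same pk is equal, so it is found as well.
print(Product(1, "Pen") in context.values())  # True
```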

How to efficiently filter a huge list by multiple rules?

I am writing an open-source PyPI package that filters AWS EC2 instances.
In my function ec_compare__from_dict, I am filtering a list of 350+ elements that takes 364 KB on disk.
The following example call returns 1 filtered element:
>>> ec_compare__from_dict(_partial=_partial,InstanceType='z1d',FreeTierEligible=False,SupportedUsageClasses='spot',BareMetal=True)
[{'InstanceType': 'z1d.metal', 'CurrentGeneration': True, 'FreeTierEligible': False, 'SupportedUsageClasses': ['on-demand', 'spot'], 'SupportedRootDeviceTypes': ['ebs'], 'BareMetal': True, 'ProcessorInfo': {'SupportedArchitectures': ['x86_64'], 'SustainedClockSpeedInGhz': 4.0}, 'VCpuInfo': {'DefaultVCpus': 48}, 'MemoryInfo': {'SizeInMiB': 393216}, 'InstanceStorageSupported': True, 'InstanceStorageInfo': {'TotalSizeInGB': 1800, 'Disks': [{'SizeInGB': 900, 'Count': 2, 'Type': 'ssd'}]}, 'EbsInfo': {'EbsOptimizedSupport': 'default', 'EncryptionSupport': 'supported'}, 'NetworkInfo': {'NetworkPerformance': '25 Gigabit', 'MaximumNetworkInterfaces': 15, 'Ipv4AddressesPerInterface': 50, 'Ipv6AddressesPerInterface': 50, 'Ipv6Supported': True, 'EnaSupport': 'required'}, 'PlacementGroupInfo': {'SupportedStrategies': ['cluster', 'partition', 'spread']}, 'HibernationSupported': False, 'BurstablePerformanceSupported': False, 'DedicatedHostsSupported': True, 'AutoRecoverySupported': False}]
My problem is the following:
I want to apply all the filters, each with its own rule, in one single list comprehension.
But I am losing readability and ending up with spaghetti code. Please point me toward better design decisions.
from typing import List
def ec2keys(*arg) -> List:
    values = {'str': ['InstanceType', 'Hypervisor'], 'bool': ['FreeTierEligible', 'HibernationSupported', 'CurrentGeneration', 'BurstablePerformanceSupported', 'AutoRecoverySupported', 'DedicatedHostsSupported', 'InstanceStorageSupported', 'BareMetal'], 'list': ['SupportedUsageClasses', 'SupportedRootDeviceTypes'], 'dict': ['InstanceStorageInfo', 'VCpuInfo', 'EbsInfo', 'FpgaInfo', 'PlacementGroupInfo', 'GpuInfo', 'InferenceAcceleratorInfo', 'MemoryInfo', 'NetworkInfo', 'ProcessorInfo'], 'other': []}
    return [elem for k, v in values.items() if k in arg or not arg for elem in v]
def ec_compare__from_dict(_partial: List, **kwargs):
    _instance_type = kwargs.get('InstanceType')
    flat_keys = set(ec2keys('str', 'bool')).intersection(
        set(kwargs.keys())) - {'InstanceType'}
    complex_filter_keys = set(ec2keys()).intersection(
        set(kwargs.keys()))
    list_keys_dict = {k: list(
        (lambda x: x if isinstance(x, list) else [x])(kwargs.get(k)))
        for k in set(ec2keys('list')).intersection(
            set(kwargs.keys()))
    }
    # here I started with a list comprehension
    _partial = [x for x in _partial
                if all(elem in x.keys() for elem in flat_keys)
                and all(elem in x.keys() for elem in complex_filter_keys)
                and all(x[elem] == kwargs[elem] for elem in flat_keys)
                ]
    # this re-applies a filter to all remaining elements
    if isinstance(_instance_type, str) and _instance_type:
        _partial = [x for x in _partial
                    if str(x['InstanceType']).startswith(_instance_type)
                    ]
    elif isinstance(_instance_type, (list, set)) and _instance_type:
        _partial = [x for x in _partial
                    if any(str(x['InstanceType']).startswith(elem)
                           for elem in _instance_type)
                    ]
    # this is how I filter list values
    if list_keys_dict:
        _partial = [x for x in _partial
                    if any(set(x[k]).intersection(v) for k, v in list_keys_dict.items())
                    ]
    return _partial
Example data
_partial = [{'InstanceType': 'z1d.metal', 'CurrentGeneration': True, 'FreeTierEligible': False, 'SupportedUsageClasses': ['on-demand', 'spot'], 'SupportedRootDeviceTypes': ['ebs'], 'BareMetal': True, 'ProcessorInfo': {'SupportedArchitectures': ['x86_64'], 'SustainedClockSpeedInGhz': 4.0}, 'VCpuInfo': {'DefaultVCpus': 48}, 'MemoryInfo': {'SizeInMiB': 393216}, 'InstanceStorageSupported': True, 'InstanceStorageInfo': {'TotalSizeInGB': 1800, 'Disks': [{'SizeInGB': 900, 'Count': 2, 'Type': 'ssd'}]}, 'EbsInfo': {'EbsOptimizedSupport': 'default', 'EncryptionSupport': 'supported'}, 'NetworkInfo': {'NetworkPerformance': '25 Gigabit', 'MaximumNetworkInterfaces': 15, 'Ipv4AddressesPerInterface': 50, 'Ipv6AddressesPerInterface': 50, 'Ipv6Supported': True, 'EnaSupport': 'required'}, 'PlacementGroupInfo': {'SupportedStrategies': ['cluster', 'partition', 'spread']}, 'HibernationSupported': False, 'BurstablePerformanceSupported': False, 'DedicatedHostsSupported': True, 'AutoRecoverySupported': False}]
Because of your nested list and dict structure, I don't think a class-based comparison is the easiest approach. With a class you could write a separate comparison method for every item, which would split the big function into many small ones, but that leads to maintenance issues whenever the interface changes.
Your dictionary comparison approach is better in that case, but I would rewrite it using recursion for the nested dictionaries. Recursion lets you flatten the nesting considerably.
Using your provided input:
data = {
'InstanceType': 'z1d.metal',
'CurrentGeneration': True,
'FreeTierEligible': False,
'SupportedUsageClasses': ['on-demand', 'spot'],
'SupportedRootDeviceTypes': ['ebs'],
'BareMetal': True,
'ProcessorInfo': {'SupportedArchitectures': ['x86_64'],
'SustainedClockSpeedInGhz': 4.0},
'VCpuInfo': {'DefaultVCpus': 48},
'MemoryInfo': {'SizeInMiB': 393216},
'InstanceStorageSupported': True,
'InstanceStorageInfo': {'TotalSizeInGB': 1800,
'Disks': [{'SizeInGB': 900, 'Count': 2, 'Type': 'ssd'}]},
'EbsInfo': {'EbsOptimizedSupport': 'default', 'EncryptionSupport': 'supported'},
'NetworkInfo': {'NetworkPerformance': '25 Gigabit',
'MaximumNetworkInterfaces': 15,
'Ipv4AddressesPerInterface': 50,
'Ipv6AddressesPerInterface': 50,
'Ipv6Supported': True,
'EnaSupport': 'required'},
'PlacementGroupInfo': {'SupportedStrategies': ['cluster', 'partition', 'spread']},
'HibernationSupported': False,
'BurstablePerformanceSupported': False,
'DedicatedHostsSupported': True,
'AutoRecoverySupported': False}
I generated a few possible filters (valid are True, invalid are False):
data_check_valid = {
'InstanceType': 'z1d.metal',
'InstanceStorageInfo': {'TotalSizeInGB': 1800},
'PlacementGroupInfo': {'SupportedStrategies': ['spread']},
}
data_check_invalid_strategy = {
'InstanceType': 'z1d.metal',
'InstanceStorageInfo': {'TotalSizeInGB': 1800},
'PlacementGroupInfo': {'SupportedStrategies': ['clustering']}, # Clustering is not supported.
}
data_check_invalid_count = {
'InstanceType': 'z1d.metal',
'InstanceStorageInfo': {'TotalSizeInGB': 1800,
'Disks': [{'SizeInGB': 900, 'Count': 4, 'Type': 'ssd'}]}, # Counts are unequal
}
Then we compare the two dictionaries element by element, including nested elements.
For this, the following function is used:
def verify_element(original, check) -> bool:
    # Compare the types
    if type(original) != type(check):
        return False
    # Recursively call this function for every element in the dictionary (if the key exists)
    if isinstance(check, dict):
        for key, value in check.items():
            if key not in original:
                return False
            # Keep the (original, check) argument order in the recursive call
            if not verify_element(original[key], value):
                return False
        return True
    # The value inside check has to occur in any of the original elements.
    # This behaviour is required because we do not know where the check element is positioned.
    if isinstance(check, (tuple, list)):
        for element in check:
            if not any(verify_element(each, element) for each in original):
                return False
        return True
    # Verify the element directly.
    if isinstance(check, (str, bool, int, float)):
        return original == check
    # Handle any unknown data types.
    raise TypeError(f"Type {type(check)}, with value {check} cannot be compared.")
To compare both dictionaries with each other, the final check will then look like this:
if __name__ == '__main__':
    print(verify_element(data, data_check_valid))  # True
    print(verify_element(data, data_check_invalid_strategy))  # False
    print(verify_element(data, data_check_invalid_count))  # False
    # When you change 'Count' to 2, the answer becomes True
If you want to use this cleanly, you can put it in a class and compare every element individually using the above function. This also makes it possible to include custom validators, such as "must be greater than" or "must be less than" the original value, which is impossible with the code above.

Scraping a list with scrapy and structure it

I'm trying to scrape every title and score from this page https://myanimelist.net/animelist/MoonlessMidnite?status=7 and return data in this form:
{"user" : moonlessmidnite, "anime" : A, "score" : x
"user" : moonlessmidnite, "anime" : B, "score" : x
"user" : moonlessmidnite, "anime" : C, "score" : x }
...etc.
I managed to get the table:
table = response.xpath('.//tr[@class = "list-table-data"]')
score = table.xpath('.//td[@class = "data score"]//a/text()').extract()
title = table.xpath('.//td//a[@class = "link sort"]').extract()
but when I try to scrape the title or score I get weird output like:
['\n ', '\n ', '${ item.anime_title }']
Look at the raw HTML of the website:
You can see that it indeed contains ${ item.anime_title }.
That indicates that the content is generated via JavaScript.
There's no easy solution for that; you'll have to look at the XHR requests that are being made and see if you can get something meaningful from them.
If you look closely at the HTML, you will see that the data is contained in a big JSON string in the table's data-items attribute.
Try this in the Scrapy shell:
fetch('https://myanimelist.net/animelist/MoonlessMidnite?status=7')
import json
json.loads(response.xpath('//table[@class="list-table"]/@data-items').extract_first())
This outputs entries like these (one dict per anime):
{'status': 2,
'score': 0,
'tags': '',
'is_rewatching': 0,
'num_watched_episodes': 1,
'anime_title': 'Hidan no Aria Special',
'anime_num_episodes': 1,
'anime_airing_status': 2,
'anime_id': 10604,
'anime_studios': None,
'anime_licensors': None,
'anime_season': None,
'has_episode_video': False,
'has_promotion_video': True,
'has_video': True,
'video_url': '/anime/10604/Hidan_no_Aria_Special/video',
'anime_url': '/anime/10604/Hidan_no_Aria_Special',
'anime_image_path': 'https://cdn.myanimelist.net/r/96x136/images/anime/2/29138.jpg?s=90cb8381c58c92d39862ac700c43f7b5',
'is_added_to_list': False,
'anime_media_type_string': 'Special',
'anime_mpaa_rating_string': 'PG-13',
'start_date_string': None,
'finish_date_string': None,
'anime_start_date_string': '12-21-11',
'anime_end_date_string': '12-21-11',
'days_string': None,
'storage_string': '',
'priority_string': 'Low'},
{'status': 6,
'score': 0,
'tags': '',
'is_rewatching': 0,
'num_watched_episodes': 0,
'anime_title': '.hack//Roots',
'anime_num_episodes': 26,
'anime_airing_status': 2,
'anime_id': 873,
'anime_studios': None,
'anime_licensors': None,
'anime_season': None,
'has_episode_video': False,
'has_promotion_video': True,
'has_video': True,
'video_url': '/anime/873/hack__Roots/video',
'anime_url': '/anime/873/hack__Roots',
'anime_image_path': 'https://cdn.myanimelist.net/r/96x136/images/anime/3/13050.jpg?s=db9ff70bf19742172f1d0140c95c4a65',
'is_added_to_list': False,
'anime_media_type_string': 'TV',
'anime_mpaa_rating_string': 'PG-13',
'start_date_string': None,
'finish_date_string': None,
'anime_start_date_string': '04-06-06',
'anime_end_date_string': '09-28-06',
'days_string': None,
'storage_string': '',
'priority_string': 'Low'}
Then you just use these dicts to get the info you need.
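Turning the decoded entries into the {"user", "anime", "score"} records from the question is then a simple comprehension. A sketch under assumptions: raw stands in for the string returned by the data-items XPath above (two entries from the output, trimmed to a few fields), the user name is taken from the profile URL, and to_records is a hypothetical helper name.

```python
import json
from typing import Dict, List, Union

# Hypothetical stand-in for the data-items attribute value (two entries, trimmed).
raw = json.dumps([
    {'status': 2, 'score': 0, 'anime_title': 'Hidan no Aria Special', 'anime_id': 10604},
    {'status': 6, 'score': 0, 'anime_title': '.hack//Roots', 'anime_id': 873},
])


def to_records(raw_json: str, user: str) -> List[Dict[str, Union[str, int]]]:
    """Build one {"user", "anime", "score"} dict per entry in the data-items JSON."""
    items = json.loads(raw_json)
    return [{'user': user, 'anime': item['anime_title'], 'score': item['score']}
            for item in items]


records = to_records(raw, user='moonlessmidnite')
for record in records:
    print(record)
# {'user': 'moonlessmidnite', 'anime': 'Hidan no Aria Special', 'score': 0}
# {'user': 'moonlessmidnite', 'anime': '.hack//Roots', 'score': 0}
```

Inside a Scrapy parse callback, you could yield each record instead of printing it.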

How to escape true/false booleans in a Python JSON string

I have the following code:
headers = {'Content-Type': 'application/json', 'cwauth-token': token}
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld'', 'dmrDomain': 'domain.tld'', 'customParam1Available': false, 'realtimeRoutingAPI': false, 'rootRedirect': false}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
r = requests.post('http://stackoverflow.com', json=payload, headers=headers)
When I run it, I get this error:
NameError: name 'false' is not defined
How can I escape those false booleans in the payload?
Python doesn't use false, it uses False, hence you're getting a NameError because Python is looking for a variable called false which doesn't exist.
Replace false with False in your dictionary. You've also got a few too many quotes in places, so I've removed those:
payload = {'namePostfix': 'test99682', 'costModel': 'NOT_TRACKED', 'clickRedirectType': 'REGULAR', 'trafficSource':{'id': '3a7ff9ec-19af-4996-94c1-7f33e036e7af'}, 'redirectTarget': 'DIRECT_URL', 'client':{'id': 'clentIDc', 'clientCode': 'xxx', 'mainDomain': 'domain.tld', 'defaultDomain': 'domain.tld', 'dmrDomain': 'domain.tld', 'customParam1Available': False, 'realtimeRoutingAPI': False, 'rootRedirect': False}, 'country':{'code': 'UK'}, 'directRedirectUrl': 'http://google.co.uk'}
Likewise, the opposite boolean value is True (not true), and the "null" data type is None.
False is the correct name to use in Python: the boolean values are the capitalized True and False, not true and false.
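With json=payload, requests serializes the dict to JSON for you, and that is where the Python values become JSON literals. The mapping can be seen with the standard json module; the small payload below is a trimmed, hypothetical version of the one in the question.

```python
import json

# Python literals in the dict...
payload = {'customParam1Available': False, 'rootRedirect': True, 'dmrDomain': None}

# ...become JSON literals when serialized (false, true, null).
body = json.dumps(payload)
print(body)  # {"customParam1Available": false, "rootRedirect": true, "dmrDomain": null}

# Decoding goes the other way: false/true/null -> False/True/None.
decoded = json.loads('{"realtimeRoutingAPI": false, "country": null}')
print(decoded)  # {'realtimeRoutingAPI': False, 'country': None}
```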

dict is not callable in python-kwargs error

I have the following code, which reads from a YAML file:
if 'parameters' in options:
    for name, parameter_options in options['parameters'].items():
        make_parameters = injector.parameters()
        print parameter_options
        parameter_injected = make_parameters(**parameter_options)
        parameters = cft.add_parameters()
        parameters(key, **parameter_injected)
which gives me the error:
parameter_injected = make_parameters(**parameter_options) TypeError: 'dict' object is not callable
parameter_options is a dictionary read from the YAML; printed out, it is:
{'constraint_description': 'Malformed input-Parameter MyParameter must only contain upper and lower case letters', 'min_length': 12, 'description': 'to do some stuff', 'default': '10.201.22.33', 'max_value': 34, 'min_value': 12, 'allowed_values': ['sdd', 'asas'], 'max_length': 23, 'allowed_pattern': '[A-Za-z0-9]+', 'no_echo': True, 'type': 'String'}
So when I do **parameter_options, shouldn't that just convert the dict to kwargs?
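My bad: injector.parameters() already returns a dict, so calling its result as make_parameters(...) is what raised the TypeError. I corrected my code to:
if 'parameters' in options:
    for name, parameter_options in options['parameters'].items():
        parameter_injected = injector.parameters(**parameter_options)
        parameters = cft.add_parameters(name, **parameter_injected)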
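The traceback points at the call, not at the ** unpacking: the object being called was a dict. A self-contained reproduction, where parameters is a hypothetical stand-in for injector.parameters that returns its keyword arguments as a dict (the real injector API is not shown in the question):

```python
def parameters(**options):
    """Hypothetical stand-in for injector.parameters: returns its kwargs as a dict."""
    return dict(options)


parameter_options = {'type': 'String', 'min_length': 12, 'no_echo': True}

# Wrong: call with no arguments, then try to call the returned dict.
make_parameters = parameters()  # this is a dict, not a function
error_message = None
try:
    make_parameters(**parameter_options)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'dict' object is not callable

# Right: unpack the options straight into the function call.
parameter_injected = parameters(**parameter_options)
print(parameter_injected)  # {'type': 'String', 'min_length': 12, 'no_echo': True}
```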
