How to insert values from JSON data into a table - Python

I have the following JSON data:
[[{'Design Project': 'Not Available',
'Design Target*': 'Not Available',
'Median Property*': '50',
'Metric': 'ENERGY STAR score (1-100)'},
{'Design Project': 'Not Available',
'Design Target*': '35.4',
'Median Property*': '141.4',
'Metric': 'Source EUI (kBtu/ft²)'},
{'Design Project': 'Not Available',
'Design Target*': '15.8',
'Median Property*': '63.1',
'Metric': 'Site EUI (kBtu/ft²)'},
{'Design Project': 'Not Available',
'Design Target*': '3,536.0',
'Median Property*': '14,144.1',
'Metric': 'Source Energy Use (kBtu)'},
{'Design Project': 'Not Available',
'Design Target*': '1,578.7',
'Median Property*': '6,314.9',
'Metric': 'Site Energy Use (kBtu)'},
{'Design Project': 'Not Available',
'Design Target*': '34.61',
'Median Property*': '138.44',
'Metric': 'Energy Cost ($)'},
{'Design Project': '0.0',
'Design Target*': '0.2',
'Median Property*': '0.6',
'Metric': 'Total GHG Emissions (Metric Tons CO2e)'}],
[{'Energy Type': ['Energy Not Entered',
'Assumed Mix Based on State & Property Type:',
'',
'Electric - Grid (56.9%)',
'Natural Gas (43.1%)'],
'Target': ' Target % Better than Median: 75',
'Title': "About this Property's Design",
'Uses:': 'Other - Education (100.0%)'}],
[{'Your Design Score ': ' N/A'}]]
I want to drop the keys from the JSON, because the keys are already the column names in my Postgres table, and insert only the values into those columns.
I would like to do this in Python.

def f(l_of_l_of_ds):
    # Flatten the outer lists and keep only the values of each dict.
    return [list(l2.values())
            for l1 in l_of_l_of_ds
            for l2 in l1]
Which yields:
[['Not Available', 'Not Available', '50', 'ENERGY STAR score (1-100)'],
['Not Available', '35.4', '141.4', 'Source EUI (kBtu/ft²)'],
['Not Available', '15.8', '63.1', 'Site EUI (kBtu/ft²)'],
['Not Available', '3,536.0', '14,144.1', 'Source Energy Use (kBtu)'],
['Not Available', '1,578.7', '6,314.9', 'Site Energy Use (kBtu)'],
['Not Available', '34.61', '138.44', 'Energy Cost ($)'],
['0.0', '0.2', '0.6', 'Total GHG Emissions (Metric Tons CO2e)'],
[['Energy Not Entered',
'Assumed Mix Based on State & Property Type:',
'',
'Electric - Grid (56.9%)',
'Natural Gas (43.1%)'],
' Target % Better than Median: 75',
"About this Property's Design",
'Other - Education (100.0%)'],
[' N/A']]
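Once the values are separated from the keys, the insert itself could look roughly like the sketch below. It is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for Postgres so the example runs anywhere, the table name metrics is hypothetical, and only two of the metric records are used. With psycopg2 the placeholder would be %s instead of ?.

```python
import sqlite3

# Two records copied from the first group in the question.
records = [
    {'Design Project': 'Not Available', 'Design Target*': '35.4',
     'Median Property*': '141.4', 'Metric': 'Source EUI (kBtu/ft²)'},
    {'Design Project': '0.0', 'Design Target*': '0.2',
     'Median Property*': '0.6', 'Metric': 'Total GHG Emissions (Metric Tons CO2e)'},
]

# The dict keys are the column names; quote them because they contain
# spaces and '*' characters.
columns = list(records[0].keys())
quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
placeholders = ", ".join("?" for _ in columns)

# One tuple of values per record, in column order.
rows = [tuple(r[c] for c in columns) for r in records]

conn = sqlite3.connect(":memory:")  # stand-in for the real Postgres connection
conn.execute(f"CREATE TABLE metrics ({quoted})")  # 'metrics' is a hypothetical table name
conn.executemany(f"INSERT INTO metrics ({quoted}) VALUES ({placeholders})", rows)
inserted = conn.execute("SELECT * FROM metrics").fetchall()
```

Building the rows from an explicit column list, rather than relying on the order of values(), keeps every value aligned with its column even if a record lists its keys in a different order.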

Related

How to stop a letter from being matched more than once in Python

I am writing a program that takes a jumbled word and returns the unjumbled word. data.json contains a list of words. I take the words one at a time, check whether each one contains all the characters of the input, and then check whether the lengths are the same. The problem is that when I enter a word such as helol, the l is checked twice, so I get other outputs in addition to the correct one (hello). I know why this happens, but I can't find a fix.
import json
val = open("data.json")
val1 = json.load(val)  # loads the list
a = input("Enter a Jumbled word ")  # takes a word from the user
a = list(a)  # changes it into a list to iterate over
for x in val1:  # iterates over the words in the list
    for somethin in a:  # iterates over the letters of the input
        if somethin in list(x):  # checks if the letter is in the current word
            continue
        else:
            break
    else:  # runs only if the inner loop finished without a break (all letters were found)
        if len(a) != len(list(x)):  # checks if it has the same number of letters
            continue  # skips this word
        else:
            print(x)  # prints the match; the outer loop continues to look for more
EDIT: Many people asked for the JSON file, so here it is:
['Torres Strait Creole', 'good bye', 'agon', "queen's guard", 'animosity', 'price list', 'subjective', 'means', 'severe', 'knockout', 'life-threatening', 'entry into the war', 'dominion', 'damnify', 'packsaddle', 'hallucinate', 'lumpy', 'inception', 'Blankenese', 'cacophonous', 'zeptomole', 'floccinaucinihilipilificate', 'abashed', 'abacterial', 'ableism', 'invade', 'cohabitant', 'handicapped', 'obelus', 'triathlon', 'habitue', 'instigate', 'Gladstone Gander', 'Linked Data', 'seeded player', 'mozzarella', 'gymnast', 'gravitational force', 'Friedelehe', 'open up', 'bundt cake', 'riffraff', 'resourceful', 'wheedle', 'city center', 'gorgonzola', 'oaf', 'auf', 'oafs', 'galoot', 'imbecile', 'lout', 'moron', 'news leak', 'crate', 'aggregator', 'cheating', 'negative growth', 'zero growth', 'defer', 'ride back', 'drive back', 'start back', 'shy back', 'spring back', 'shrink back', 'shy away', 'abderian', 'unable', 'font manager', 'font management software', 'consortium', 'gown', 'inject', 'ISO 639', 'look up', 'cross-eyed', 'squinting', 'health club', 'fitness facility', 'steer', 'sunbathe', 'combatives', 'HTH', 'hope that helps', 'How The Hell', 'distributed', 'plum cake', 'liberalization', 'macchiato', 'caffè macchiato', 'beach volley', 'exult', 'jubilate', 'beach volleyball', 'be beached', 'affogato', 'gigabyte', 'terabyte', 'petabyte', 'undressed', 'decameter', 'sensual', 'boundary marker', 'poor man', 'cohabitee', 'night sleep', 'protruding ears', 'three quarters of an hour', 'spermophilus', 'spermophilus stricto sensu', "devil's advocate", 'sacred king', 'sacral king', 'myr', 'million years', 'obtuse-angled', 'inconsolable', 'neurotic', 'humiliating', 'mortifying', 'theological', 'rematch', 'varıety', 'be short', 'ontological', 'taxonomic', 'taxonomical', 'toxicology testing', 'on the job training', 'boulder', 'unattackable', 'inviolable', 'resinous', 'resiny', 'ionizing radiation', 'citrus grove', 'comic book shop', 'preparatory measure', 'written account', 
'brittle', 'locker', 'baozi', 'bao', 'bau', 'humbow', 'nunu', 'bausak', 'pow', 'pau', 'yesteryear', 'fire drill', 'rotted', 'putto', 'overthrow', 'ankle monitor', 'somewhat stupid', 'a little stupid', 'semordnilap', 'pangram', 'emordnilap', 'person with a sunlamp tan', 'tittle', 'incompatible', 'autumn wind', 'dairyman', 'chesty', 'lacustrine', 'chronophotograph', 'chronophoto', 'leg lace', 'ankle lace', 'ankle lock', 'Babelfy', 'ventricular', 'recurrent', 'long-lasting', 'long-standing', 'long standing', 'sea bass', 'reap', 'break wind', 'chase away', 'spark', 'speckle', 'take back', 'Westphalian', 'Aeolic Greek', 'startup', 'abseiling', 'impure', 'bottle cork', 'paralympic', 'work out', 'might', 'ice-cream man', 'ice cream man', 'ice cream maker', 'ice-cream maker', 'traveling', 'special delivery', 'prizefighter', 'abs', 'ab', 'churro', 'pilfer', 'dehumanize', 'fertilize', 'inseminate', 'digitalize', 'fluke', 'stroke of luck', 'decontaminate', 'abandonware', 'manzanita', 'tule', 'jackrabbit', 'system administrator', 'system admin', 'springtime lethargy', 'Palatinean', 'organized religion', 'bearing puller', 'wheel puller', 'gear puller', 'shot', 'normalize', 'palindromic', 'lancet window', 'terminological', 'back of head', 'dragon food', 'barbel', 'Central American Spanish', 'basis', 'birthmark', 'blood vessel', 'ribes', 'dog-rose', 'dreadful', 'freckle', 'free of charge', 'weather verb', 'weather sentence', 'gipsy', 'gypsy', 'glutton', 'hump', 'low voice', 'meek', 'moist', 'river mouth', 'turbid', 'multitude', 'palate', 'peak of mountain', 'poetry', 'pure', 'scanty', 'spicy', 'spicey', 'spruce', 'surface', 'infected', 'copulate', 'dilute', 'dislocate', 'grow up', 'hew', 'hinder', 'infringe', 'inhabit', 'marry off', 'offend', 'pass by', 'brother of a man', 'brother of a woman', 'sister of a man', 'sister of a woman', 'agricultural farm', 'result in', 'rebel', 'strew', 'scatter', 'sway', 'tread', 'tremble', 'hog', 'circuit breaker', 'Southern Quechua', 'safety 
pin', 'baby pin', 'college student', 'university student', 'pinus sibirica', 'Siberian pine', 'have lunch', 'floppy', 'slack', 'sloppy', 'wishi-washi', 'turn around', 'bogeyman', 'selfish', 'Talossan', 'biomembrane', 'biological membrane', 'self-sufficiency', 'underevaluation', 'underestimation', 'opisthenar', 'prosody', 'Kumhar Bhag Paharia', 'psychoneurotic', 'psychoneurosis', 'levant', "couldn't-care-less attitude", 'noctambule', 'acid-free paper', 'decontaminant', 'woven', 'wheaten', 'waste-ridden', 'war-ridden', 'violence-ridden', 'unwritten', 'typewritten', 'spoken', 'abiogenetically', 'rasp', 'abstractly', 'cyclically', 'acyclically', 'acyclic', 'ad hoc', 'spare tire', 'spare wheel', 'spare tyre', 'prefabricated', 'ISO 9000', 'Barquisimeto', 'Maracay', 'Ciudad Guayana', 'San Cristobal', 'Barranquilla', 'Arequipa', 'Trujillo', 'Cusco', 'Callao', 'Cochabamba', 'Goiânia', 'Campinas', 'Fortaleza', 'Florianópolis', 'Rosario', 'Mendoza', 'Bariloche', 'temporality', 'papyrus sedge', 'paper reed', 'Indian matting plant', 'Nile grass', 'softly softly', 'abductive reasoning', 'abductive inference', 'retroduction', 'Salzburgian', 'cymotrichous', 'access point', 'wireless access point', 'dynamic DNS', 'IP address', 'electrolyte', 'helical', 'hydrometer', 'intranet', 'jumper', 'MAC address', 'Media Access Control address', 'nickel–cadmium battery', 'Ni-Cd battery', 'oscillograph', 'overload', 'photovoltaic', 'photovoltaic cell', 'refractor telescope', 'autosome', 'bacterial artificial chromosome', 'plasmid', 'nucleobase', 'base pair', 'base sequence', 'chromosomal deletion', 'deletion', 'deletion mutation', 'gene deletion', 'chromosomal inversion', 'comparative genomics', 'genomics', 'cytogenetics', 'DNA replication', 'DNA repair', 'DNA sequence', 'electrophoresis', 'functional genomics', 'retroviral', 'retroviral infection', 'acceptance criteria', 'batch processing', 'business rule', 'code review', 'configuration management', 'entity–relationship model', 'lifecycle', 
'object code', 'prototyping', 'pseudocode', 'referential', 'reusability', 'self-join', 'timestamp', 'accredited', 'accredited translator', 'certify', 'certified translation', 'computer-aided design', 'computer-aided', 'computer-assisted', 'management system', 'computer-aided translation', 'computer-assisted translation', 'machine-aided translation', 'conference interpreter', 'freelance translator', 'literal translation', 'mother-tongue', 'whispered interpreting', 'simultaneous interpreting', 'simultaneous interpretation', 'base anhydride', 'binary compound', 'absorber', 'absorption coefficient', 'attenuation coefficient', 'active solar heater', 'ampacity', 'amorphous semiconductor', 'amorphous silicon', 'flowerpot', 'antireflection coating', 'antireflection', 'armored cable', 'electric arc', 'breakdown voltage','casing', 'facing', 'lining', 'assumption of Mary', 'auscultation']
This is just a sample; the full file contains many more items.
As I understand it you are trying to identify all possible matches for the jumbled string in your list. You could sort the letters in the jumbled word and match the resulting list against sorted lists of the words in your data file.
sorted_jumbled_word = sorted(a)
for word in val1:
    if len(sorted_jumbled_word) == len(word) and sorted(word) == sorted_jumbled_word:
        print(word)
Checking by length first reduces unnecessary sorting. If doing this repeatedly, you might want to create a dictionary of the words in the data file with their sorted versions, to avoid having to repeatedly sort them.
There are spaces and punctuation in some of the terms in your word list. If you want to make the comparison ignoring spaces then remove them from both the jumbled word and the list of unjumbled words, using e.g. word = word.replace(" ", "")
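The dictionary of pre-sorted words mentioned above could be sketched as follows. The function names build_anagram_index and unjumble are hypothetical, the short word list is a sample taken from the question's data, and lowercasing and removing spaces are assumptions about how the comparison should behave.

```python
from collections import defaultdict


def normalise(word):
    # Ignore spaces and letter case, then sort the letters into a key.
    return "".join(sorted(word.replace(" ", "").lower()))


def build_anagram_index(words):
    # Map each sorted-letter key to every word that shares those letters.
    index = defaultdict(list)
    for word in words:
        index[normalise(word)].append(word)
    return index


def unjumble(jumbled, index):
    # A single dictionary lookup replaces sorting every word on each query.
    return index.get(normalise(jumbled), [])


words = ["hello", "means", "names", "good bye", "oaf"]
index = build_anagram_index(words)
matches = unjumble("helol", index)
```

Because the key keeps repeated letters, helol can only match words that contain exactly two l characters, which avoids the double-counting problem from the question.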

Is there any other way (to combine values of one column into different groups), instead of using 'df.replace( )' several times in the below problem?

In :
char_df['Loan_Title'].unique()
Out:
array(['debt consolidation', 'credit card refinancing',
'home improvement', 'credit consolidation', 'green loan', 'other',
'moving and relocation', 'credit cards', 'medical expenses',
'refinance', 'credit card consolidation', 'lending club',
'debt consolidation loan', 'major purchase', 'vacation',
'business', 'credit card payoff', 'credit card',
'credit card refi', 'personal loan', 'cc refi', 'consolidate',
'medical', 'loan 1', 'consolidation', 'card consolidation',
'car financing', 'debt', 'home buying', 'freedom', 'consolidated',
'get out of debt', 'consolidation loan', 'dept consolidation',
'personal', 'cards', 'bathroom', 'refi', 'credit card loan',
'credit card debt', 'house', 'debt consolidation 2013',
'debt loan', 'cc refinance', 'home', 'cc consolidation',
'credit card refinance', 'credit loan', 'payoff',
'bill consolidation', 'credit card paydown', 'credit card pay off',
'get debt free', 'myloan', 'credit pay off', 'my loan', 'loan',
'bill payoff', 'cc-refinance', 'debt reduction', 'medical loan',
'wedding loan', 'credit', 'pay off bills', 'refinance loan',
'debt payoff', 'car loan', 'pay off', 'pool', 'credit payoff',
'credit card refinance loan', 'cc loan', 'debt free', 'conso',
'home improvement loan', 'loan consolidation', 'lending loan',
'relief', 'cc', 'loan1', 'getting ahead', 'home loan', 'bills'],
dtype=object)
In :
char_df=char_df.replace(['debt consolidation','debt consolidation loan','dept consolidation','debt consolidation 2013'], 'dept_consolidation')
char_df = char_df.replace(['personal','personal loan'],'personal_loan')
char_df = char_df.replace(['credit card refinancing','credit card refi','credit card refinance','credit card refinance loan'],'credit_card_refinance')
IIUC (it is hard to read) you could try the following:
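import pandas as pd
# Regex patterns are the keys; the replacement strings are the values.
# Each pattern is anchored with ^ so it only matches from the start of the title.
patterns = {
    r'^de[bp]t consolidation.*': 'dept_consolidation',  # also covers the 'dept' misspelling
    r'^personal.*': 'personal_loan',
    r'^credit card refi.*': 'credit_card_refinance',
}
char_df['Loan_Title'] = char_df['Loan_Title'].replace(regex=patterns)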
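If regular expressions feel too broad, another option is to list each group once and invert it into a flat lookup. The sketch below uses plain Python so it runs without pandas; the group labels and members are copied from the question, while the names groups, lookup and grouped are hypothetical.

```python
# Each target label maps to the raw titles that should be merged into it.
groups = {
    'dept_consolidation': ['debt consolidation', 'debt consolidation loan',
                           'dept consolidation', 'debt consolidation 2013'],
    'personal_loan': ['personal', 'personal loan'],
    'credit_card_refinance': ['credit card refinancing', 'credit card refi',
                              'credit card refinance', 'credit card refinance loan'],
}

# Invert the groups into {raw title: label} so one lookup handles every group.
lookup = {title: label for label, titles in groups.items() for title in titles}

# Titles that are not in any group are left unchanged.
titles = ['personal', 'credit card refi', 'vacation', 'dept consolidation']
grouped = [lookup.get(t, t) for t in titles]
```

With pandas, the same lookup dict can be passed in a single call, for example char_df['Loan_Title'].replace(lookup), instead of calling replace once per group.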

Python generators: how to iterate correctly and drop records whose key value is not present in a separate list

I'm new to generators, and I'm struggling to apply changes to the records in the generator object returned by the RISparser module.
I understand that a generator yields one record at a time and doesn't hold the whole data set in memory, but I'm having a tough time iterating over it effectively and applying my changes.
I want to drop every record whose ['doi'] value is not contained in a list of DOIs [doi_match].
doi_match = ['10.1002/14651858.CD008259.pub2','10.1002/14651858.CD011552','10.1002/14651858.CD011990']
The generator object returned from RISparser contains the following information. These are just the first 2 records of a few hundred. I want to iterate over it and compare each record's 'doi' key with the list of DOIs.
{'type_of_reference': 'JOUR', 'title': "The CoRe Outcomes in WomeN's health (CROWN) initiative: Journal editors invite researchers to develop core outcomes in women's health", 'secondary_title': 'Neurourology and Urodynamics', 'alternate_title1': 'Neurourol. Urodyn.', 'volume': '33', 'number': '8', 'start_page': '1176', 'end_page': '1177', 'year': '2014', 'doi': '10.1002/nau.22674', 'issn': '07332467 (ISSN)', 'authors': ['Khan, K.'], 'keywords': ['Bias (epidemiology)', 'Clinical trials', 'Consensus', 'Endpoint determination/standards', 'Evidence-based medicine', 'Guidelines', 'Research design/standards', 'Systematic reviews', 'Treatment outcome', 'consensus', 'editor', 'female', 'human', 'medical literature', 'Note', 'outcomes research', 'peer review', 'randomized controlled trial (topic)', 'systematic review (topic)', "women's health", 'outcome assessment', 'personnel', 'publication', 'Female', 'Humans', 'Outcome Assessment (Health Care)', 'Periodicals as Topic', 'Research Personnel', "Women's Health"], 'publisher': 'John Wiley and Sons Inc.', 'notes': ['Export Date: 14 July 2020', 'CODEN: NEURE'], 'type_of_work': 'Note', 'name_of_database': 'Scopus', 'custom2': '25270392', 'language': 'English', 'url': 'https://www.scopus.com/inward/record.uri?eid=2-s2.0-84908368202&doi=10.1002%2fnau.22674&partnerID=40&md5=b220702e005430b637ef9d80a94dadc4'}
{'type_of_reference': 'JOUR', 'title': "The CROWN initiative: Journal editors invite researchers to develop core outcomes in women's health", 'secondary_title': 'Gynecologic Oncology', 'alternate_title1': 'Gynecol. Oncol.', 'volume': '134', 'number': '3', 'start_page': '443', 'end_page': '444', 'year': '2014', 'doi': '10.1016/j.ygyno.2014.05.005', 'issn': '00908258 (ISSN)', 'authors': ['Karlan, B.Y.'], 'author_address': 'Gynecologic Oncology and Gynecologic Oncology Reports, India', 'keywords': ['clinical trial (topic)', 'decision making', 'Editorial', 'evidence based practice', 'female infertility', 'health care personnel', 'human', 'outcome assessment', 'outcomes research', 'peer review', 'practice guideline', 'premature labor', 'priority journal', 'publication', 'systematic review (topic)', "women's health", 'editorial', 'female', 'outcome assessment', 'personnel', 'publication', 'Female', 'Humans', 'Outcome Assessment (Health Care)', 'Periodicals as Topic', 'Research Personnel', "Women's Health"], 'publisher': 'Academic Press Inc.', 'notes': ['Export Date: 14 July 2020', 'CODEN: GYNOA', 'Correspondence Address: Karlan, B.Y.; Gynecologic Oncology and Gynecologic Oncology ReportsIndia'], 'type_of_work': 'Editorial', 'name_of_database': 'Scopus', 'custom2': '25199578', 'language': 'English', 'url': 'https://www.scopus.com/inward/record.uri?eid=2-s2.0-84908351159&doi=10.1016%2fj.ygyno.2014.05.005&partnerID=40&md5=ab5a4d26d52c12d081e38364b0c79678'}
I tried iterating over the generator and applying the changes. But the records that have matches are not being placed in the match list.
match = []
for entry in ris_records:
    if entry['doi'] in doi_match:
        match.append(entry)
    else:
        del entry
Any advice on how to iterate over a generator correctly would be appreciated. Thanks.
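A minimal sketch of one way to filter such a generator is shown below. It is not a diagnosis of the loop above: fake_ris_records is a hypothetical stand-in for the RISparser output, and using .get('doi') is an assumption that some records may lack a 'doi' key. One thing worth checking in the original code is whether the generator was already consumed (for example by printing or listing it) before the loop ran, since a generator can only be iterated once.

```python
def keep_matching(records, wanted_dois):
    # A set makes each membership test fast, regardless of list length.
    wanted = set(wanted_dois)
    for entry in records:
        # .get() returns None for records without a 'doi' key instead of raising.
        if entry.get('doi') in wanted:
            yield entry


def fake_ris_records():
    # Hypothetical stand-in for the generator returned by RISparser.
    yield {'title': 'A', 'doi': '10.1002/14651858.CD008259.pub2'}
    yield {'title': 'B', 'doi': '10.1002/nau.22674'}
    yield {'title': 'C'}  # record with no DOI at all


doi_match = ['10.1002/14651858.CD008259.pub2',
             '10.1002/14651858.CD011552',
             '10.1002/14651858.CD011990']

# Materialise the filtered generator only once, at the point it is needed.
match = list(keep_matching(fake_ris_records(), doi_match))
```

Non-matching records do not need to be deleted; they are simply never yielded, so they are discarded as the loop moves on.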

How to check frequency of every unique value from pandas data-frame?

I have a data frame of 2000 rows in which, say, brand has 142 unique values. I want to count the frequency of every unique value, from 1 to 142. The values should update dynamically.
brand=clothes_z.brand_name
brand.describe(include="all")
unique_brand=brand.unique()
brand.describe(include="all"),unique_brand
Output:
(count 2613
unique 142
top Mango
freq 54
Name: brand_name, dtype: object,
array(['Jack & Jones', 'TOM TAILOR DENIM', 'YOURTURN', 'Tommy Jeans',
'Alessandro Zavetti', 'adidas Originals', 'Volcom', 'Pier One',
'Superdry', 'G-Star', 'SIKSILK', 'Tommy Hilfiger', 'Karl Kani',
'Alpha Industries', 'Farah', 'Nike Sportswear',
'Calvin Klein Jeans', 'Champion', 'Hollister Co.', 'PULL&BEAR',
'Nike Performance', 'Even&Odd', 'Stradivarius', 'Mango',
'Champion Reverse Weave', 'Massimo Dutti', 'Selected Femme Petite',
'NAF NAF', 'YAS', 'New Look', 'Missguided', 'Miss Selfridge',
'Topshop', 'Miss Selfridge Petite', 'Guess', 'Esprit Collection',
'Vero Moda', 'ONLY Petite', 'Selected Femme', 'ONLY', 'Dr.Denim',
'Bershka', 'Vero Moda Petite', 'PULL & BEAR', 'New Look Petite',
'JDY', 'Even & Odd', 'Vila', 'Lacoste', 'PS Paul Smith',
'Redefined Rebel', 'Selected Homme', 'BOSS', 'Brave Soul', 'Mind',
'Scotch & Soda', 'Only & Sons', 'The North Face',
'Polo Ralph Lauren', 'Gym King', 'Selected Woman', 'Rich & Royal',
'Rooms', 'Glamorous', 'Club L London', 'Zalando Essentials',
'edc by Esprit', 'OYSHO', 'Oasis', 'Gina Tricot',
'Glamorous Petite', 'Cortefiel', 'Missguided Petite',
'Missguided Tall', 'River Island', 'INDICODE JEANS',
'Kings Will Dream', 'Topman', 'Esprit', 'Diesel', 'Key Largo',
'Mennace', 'Lee', "Levi's®", 'adidas Performance', 'jordan',
'Jack & Jones PREMIUM', 'They', 'Springfield', 'Benetton', 'Fila',
'Replay', 'Original Penguin', 'Kronstadt', 'Vans', 'Jordan',
'Apart', 'New look', 'River island', 'Freequent', 'Mads Nørgaard',
'4th & Reckless', 'Morgan', 'Honey punch', 'Anna Field Petite',
'Noisy may', 'Pepe Jeans', 'Mavi', 'mint & berry', 'KIOMI', 'mbyM',
'Escada Sport', 'Lost Ink', 'More & More', 'Coffee', 'GANT',
'TWINTIP', 'MAMALICIOUS', 'Noisy May', 'Pieces', 'Rest',
'Anna Field', 'Pinko', 'Forever New', 'ICHI', 'Seafolly', 'Object',
'Freya', 'Wrangler', 'Cream', 'LTB', 'G-star', 'Dorothy Perkins',
'Carhartt WIP', 'Betty & Co', 'GAP', 'ONLY Tall', 'Next', 'HUGO',
'Violet by Mango', 'WEEKEND MaxMara', 'French Connection'],
dtype=object))
This shows only the frequency of Mango (54), because it is the most frequent value. I want the frequency of every value, e.g. Jack & Jones, TOM TAILOR DENIM, YOURTURN, and so on, and the values should update dynamically.
You can simply do:
clothes_z.brand_name.value_counts()
This lists the unique values together with the frequency of each one in that pandas Series.
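Alternatively, using collections.Counter:
from collections import Counter
import pandas as pd
ll = clothes_z.brand_name.tolist()  # the list of brands from the question
c = Counter(ll)
# you can do whatever you want with your counted values
df = pd.DataFrame.from_dict(c, orient='index', columns=['counted'])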
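The unique values listed in the question include spellings that differ only by letter case, such as 'Jordan' and 'jordan' or 'G-Star' and 'G-star', and these are counted as separate brands. The sketch below shows the difference on a small hand-made sample; the brands list is illustrative data, and folding case is an assumption about what should count as the same brand.

```python
from collections import Counter

# Small illustrative sample using spellings that appear in the question.
brands = ['Mango', 'Jordan', 'jordan', 'G-Star', 'G-star', 'Mango', 'Vans']

# Exact counting treats 'Jordan' and 'jordan' as different brands.
exact = Counter(brands)

# Folding case first merges spellings that differ only by capitalisation.
normalized = Counter(b.casefold() for b in brands)
```

The counts are recomputed from whatever data is passed in, so they follow the data frame as it changes.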

How to extract multiple table tag using beautifulsoup?

I'm trying to extract data from cvedetails.com for the product Windows 10. In the page source there is a table with one tr for the details of each vulnerability and one tr for its description.
I want to extract both tr elements together, since they are correlated.
#!/usr/bin/python
import requests
r = requests.get('https://www.cvedetails.com/vulnerability-list.php? vendor_id=26&product_id=32238&version_id=&page=1&hasexp=0&opdos=0&opec=0&opov= 0&opcsrf=0&opgpriv=0&opsqli=0&opxss=0&opdirt=0&opmemc=0&ophttprs=0&opbyp=0&opfileinc=0&opginf=0&cvssscoremin=0&cvssscoremax=0&year=0&month=0&cweid=0&order=1&trc=845&sha=41e451b72c2e412c0a1cb8cb1dcfee3d16d51c44')
#print(r.text[0:500])
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text,'html.parser')
#results = soup.find_all('tr',attrs={'class':'srrowns'})
#resultdesc = soup.find_all('td',attrs={'class':'cvesummarylong'})
#print(results[0:3])
#print(resultdesc[0:3])
results = soup.find_all(('tr',attrs={'class':'srrowns'}),('td',attrs=
{'class':'cvesummarylong'}))
print(results[0:3])
The commented-out lines above are the ones that worked, but they return the two sets of values separately.
</tr>
<tr class="srrowns">
<td class="num">
<a name="y2019"> </a>
1 </td>
<td nowrap>CVE-2019-0879</td>
<td>119</td>
<td class="num">
<b style="color:red">
</b>
</td>
<td>
Exec Code Overflow </td>
<td>2019-04-09</td>
<td>2019-05-08</td>
<td><div class="cvssbox" style="background-color:#ff9c20">7.2</div></td>
<td align="center">None</td>
<td align="center">Local</td>
<td align="center">Low</td>
<td align="center">Not required</td>
<td align="center">Complete</td>
<td align="center">Complete</td>
<td align="center">Complete</td>
</tr>
<tr>
<td class="cvesummarylong" colspan="20">
A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0846, CVE-2019-0847, CVE-2019-0851, CVE-2019-0877. </td>
</tr>
<tr class="srrowns">
<td class="num">
<a name="y2019"> </a>
2 </td>
<td nowrap>CVE-2019-0877</td>
<td>119</td>
<td class="num">
<b style="color:red">
</b>
</td>
<td>
Exec Code Overflow </td>
<td>2019-04-09</td>
<td>2019-05-08</td>
<td><div class="cvssbox" style="background-color:#ff9c20">7.2</div></td>
<td align="center">None</td>
<td align="center">Local</td>
<td align="center">Low</td>
<td align="center">Not required</td>
<td align="center">Complete</td>
<td align="center">Complete</td>
<td align="center">Complete</td>
</tr>
<tr>
<td class="cvesummarylong" colspan="20">
A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0846, CVE-2019-0847, CVE-2019-0851, CVE-2019-0879. </td>
</tr>
I want each result on one line, with the CVE number, severity, etc. together with the description,
but the only approach that extracted both returned them separately.
In the end, I need the table details together with the description, and I need to be able to write them to a .csv file.
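The pairing and CSV steps described above can be sketched independently of the scraping. The example assumes the cell texts have already been extracted into a list where detail rows and description rows alternate, as in the HTML shown; the headers list is trimmed to three columns, the summaries are abbreviated, and pair_rows is a hypothetical helper name.

```python
import csv
import io

# Trimmed headers and abbreviated cell texts, for illustration only.
headers = ['#', 'CVE ID', 'Score']
rows = [
    ['1', 'CVE-2019-0879', '7.2'],
    ['Jet Database Engine Remote Code Execution Vulnerability'],
    ['2', 'CVE-2019-0877', '7.2'],
    ['Jet Database Engine Remote Code Execution Vulnerability'],
]


def pair_rows(headers, rows):
    # Even positions hold the detail cells, odd positions hold the description.
    paired = []
    for detail, summary in zip(rows[0::2], rows[1::2]):
        record = dict(zip(headers, detail))
        record['summary'] = summary[0]
        paired.append(record)
    return paired


records = pair_rows(headers, rows)

# Write to an in-memory buffer here; a real script would open a .csv file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=headers + ['summary'])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
```

Each CSV line then holds one vulnerability with its description in the last column.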
It's not clear which information exactly you are looking for, but there are a number of tables in that page and you can extract them - without regex. For example:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
r = requests.get('your url')
soup = bs(r.content, 'lxml')
tables = soup.find_all('table')
my_table = pd.read_html(str(tables[4]))
To get the first row of that particular table:
print(my_table[0].iloc[0,:].dropna(axis=0,how='all'))
Output:
# 1
CVE ID CVE-2019-0879
CWE ID 119
Vulnerability Type(s) Exec Code Overflow
Publish Date 2019-04-09
Update Date 2019-05-08
Score 7.2
Gained Access Level None
Access Local
Complexity Low
Authentication Not required
Conf. Complete
Integ. Complete
Avail. Complete
You can play with the index numbers of the tables, and see what else you can discover...
You can scrape the full table, and then access your desired parameters:
from bs4 import BeautifulSoup as soup
import requests, re
d = soup(requests.get('https://www.cvedetails.com/vulnerability-list.php?%20vendor_id=26&product_id=32238&version_id=&page=1&hasexp=0&opdos=0&opec=0&opov=%200&opcsrf=0&opgpriv=0&opsqli=0&opxss=0&opdirt=0&opmemc=0&ophttprs=0&opbyp=0&opfileinc=0&opginf=0&cvssscoremin=0&cvssscoremax=0&year=0&month=0&cweid=0&order=1&trc=845&sha=41e451b72c2e412c0a1cb8cb1dcfee3d16d51c44').text, 'html.parser')
_t = d.find('table', {'id':'vulnslisttable'})
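# Header names come from the <th> cells.
headers = [re.sub(r'^[\t\n]+|[\t\n]+$', '', i.text) for i in _t.find_all('th')]
# The first <tr> is the header row and has no <td> cells, so it is dropped.
_, *data = [[re.sub(r'^[\s\t\n]+|[\t\n]+$', '', i.text) for i in b.find_all('td')] for b in _t.find_all('tr')]
# Rows alternate between details and summary, so pair data[i] with data[i+1].
result = [{**dict(zip(headers, data[i])), 'summary':data[i+1][0]} for i in range(0, len(data), 2)]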
Output (shortened, due to SO's character limit):
[{'#': '1', 'CVE ID': 'CVE-2019-0879', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0846, CVE-2019-0847, CVE-2019-0851, CVE-2019-0877."}, {'#': '2', 'CVE ID': 'CVE-2019-0877', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0846, CVE-2019-0847, CVE-2019-0851, CVE-2019-0879."}, {'#': '3', 'CVE ID': 'CVE-2019-0859', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "An elevation of privilege vulnerability exists in Windows when the Win32k component fails to properly handle objects in memory, aka 'Win32k Elevation of Privilege Vulnerability'. 
This CVE ID is unique from CVE-2019-0685, CVE-2019-0803."}, {'#': '4', 'CVE ID': 'CVE-2019-0856', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.0', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Low', 'Authentication': 'Single system', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when Windows improperly handles objects in memory, aka 'Windows Remote Code Execution Vulnerability'."}, {'#': '5', 'CVE ID': 'CVE-2019-0853', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-15', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists in the way that the Windows Graphics Device Interface (GDI) handles objects in the memory, aka 'GDI+ Remote Code Execution Vulnerability'."}, {'#': '6', 'CVE ID': 'CVE-2019-0851', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. 
This CVE ID is unique from CVE-2019-0846, CVE-2019-0847, CVE-2019-0877, CVE-2019-0879."}, {'#': '7', 'CVE ID': 'CVE-2019-0849', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '4.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory, aka 'Windows GDI Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0802."}, {'#': '8', 'CVE ID': 'CVE-2019-0848', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the win32k component improperly provides kernel information, aka 'Win32k Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0814."}, {'#': '9', 'CVE ID': 'CVE-2019-0847', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. 
This CVE ID is unique from CVE-2019-0846, CVE-2019-0851, CVE-2019-0877, CVE-2019-0879."}, {'#': '10', 'CVE ID': 'CVE-2019-0846', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Windows Jet Database Engine improperly handles objects in memory, aka 'Jet Database Engine Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0847, CVE-2019-0851, CVE-2019-0877, CVE-2019-0879."}, {'#': '11', 'CVE ID': 'CVE-2019-0845', 'CWE ID': '20', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code ', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the IOleCvt interface renders ASP webpage content, aka 'Windows IOleCvt Interface Remote Code Execution Vulnerability'."}, {'#': '12', 'CVE ID': 'CVE-2019-0844', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows kernel improperly handles objects in memory, aka 'Windows Kernel Information Disclosure Vulnerability'. 
This CVE ID is unique from CVE-2019-0840."}, {'#': '13', 'CVE ID': 'CVE-2019-0842', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists in the way that the VBScript engine handles objects in memory, aka 'Windows VBScript Engine Remote Code Execution Vulnerability'."}, {'#': '14', 'CVE ID': 'CVE-2019-0841', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-15', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "An elevation of privilege vulnerability exists when Windows AppX Deployment Service (AppXSVC) improperly handles hard links, aka 'Windows Elevation of Privilege Vulnerability'. This CVE ID is unique from CVE-2019-0730, CVE-2019-0731, CVE-2019-0796, CVE-2019-0805, CVE-2019-0836."}, {'#': '15', 'CVE ID': 'CVE-2019-0840', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows kernel improperly handles objects in memory, aka 'Windows Kernel Information Disclosure Vulnerability'. 
This CVE ID is unique from CVE-2019-0844."}, {'#': '16', 'CVE ID': 'CVE-2019-0839', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Terminal Services component improperly discloses the contents of its memory, aka 'Windows Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0838."}, {'#': '17', 'CVE ID': 'CVE-2019-0838', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when Windows Task Scheduler improperly discloses credentials to Windows Credential Manager, aka 'Windows Information Disclosure Vulnerability'. 
This CVE ID is unique from CVE-2019-0839."}, {'#': '18', 'CVE ID': 'CVE-2019-0837', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when DirectX improperly handles objects in memory, aka 'DirectX Information Disclosure Vulnerability'."}, {'#': '19', 'CVE ID': 'CVE-2019-0836', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '4.6', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'Partial', 'Avail.': 'Partial', 'summary': "An elevation of privilege vulnerability exists when Windows improperly handles calls to the LUAFV driver (luafv.sys), aka 'Windows Elevation of Privilege Vulnerability'. This CVE ID is unique from CVE-2019-0730, CVE-2019-0731, CVE-2019-0796, CVE-2019-0805, CVE-2019-0841."}, {'#': '20', 'CVE ID': 'CVE-2019-0821', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '4.0', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Low', 'Authentication': 'Single system', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists in the way that the Windows SMB Server handles certain requests, aka 'Windows SMB Information Disclosure Vulnerability'. 
This CVE ID is unique from CVE-2019-0703, CVE-2019-0704."}, {'#': '21', 'CVE ID': 'CVE-2019-0814', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-11', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the win32k component improperly provides kernel information, aka 'Win32k Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0848."}, {'#': '22', 'CVE ID': 'CVE-2019-0805', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '4.6', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'Partial', 'Avail.': 'Partial', 'summary': "An elevation of privilege vulnerability exists when Windows improperly handles calls to the LUAFV driver (luafv.sys), aka 'Windows Elevation of Privilege Vulnerability'. This CVE ID is unique from CVE-2019-0730, CVE-2019-0731, CVE-2019-0796, CVE-2019-0836, CVE-2019-0841."}, {'#': '23', 'CVE ID': 'CVE-2019-0803', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "An elevation of privilege vulnerability exists in Windows when the Win32k component fails to properly handle objects in memory, aka 'Win32k Elevation of Privilege Vulnerability'. 
This CVE ID is unique from CVE-2019-0685, CVE-2019-0859."}, {'#': '24', 'CVE ID': 'CVE-2019-0802', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '4.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory, aka 'Windows GDI Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0849."}, {'#': '25', 'CVE ID': 'CVE-2019-0797', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-08', 'Update Date': '2019-05-08', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "An elevation of privilege vulnerability exists in Windows when the Win32k component fails to properly handle objects in memory, aka 'Win32k Elevation of Privilege Vulnerability'. This CVE ID is unique from CVE-2019-0808."}, {'#': '26', 'CVE ID': 'CVE-2019-0796', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-05-08', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'None', 'Integ.': 'Partial', 'Avail.': 'None', 'summary': "An elevation of privilege vulnerability exists when Windows improperly handles calls to the LUAFV driver (luafv.sys), aka 'Windows Elevation of Privilege Vulnerability'. 
This CVE ID is unique from CVE-2019-0730, CVE-2019-0731, CVE-2019-0805, CVE-2019-0836, CVE-2019-0841."}, {'#': '27', 'CVE ID': 'CVE-2019-0795', 'CWE ID': '611', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-11', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Microsoft XML Core Services MSXML parser processes user input, aka 'MS XML Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0790, CVE-2019-0791, CVE-2019-0792, CVE-2019-0793."}, {'#': '28', 'CVE ID': 'CVE-2019-0794', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-11', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when OLE automation improperly handles objects in memory, aka 'OLE Automation Remote Code Execution Vulnerability'."}, {'#': '29', 'CVE ID': 'CVE-2019-0793', 'CWE ID': '611', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Microsoft XML Core Services MSXML parser processes user input, aka 'MS XML Remote Code Execution Vulnerability'. 
This CVE ID is unique from CVE-2019-0790, CVE-2019-0791, CVE-2019-0792, CVE-2019-0795."}, {'#': '30', 'CVE ID': 'CVE-2019-0792', 'CWE ID': '611', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Microsoft XML Core Services MSXML parser processes user input, aka 'MS XML Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0790, CVE-2019-0791, CVE-2019-0793, CVE-2019-0795."}, {'#': '31', 'CVE ID': 'CVE-2019-0791', 'CWE ID': '611', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Microsoft XML Core Services MSXML parser processes user input, aka 'MS XML Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0790, CVE-2019-0792, CVE-2019-0793, CVE-2019-0795."}, {'#': '32', 'CVE ID': 'CVE-2019-0790', 'CWE ID': '611', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code ', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists when the Microsoft XML Core Services MSXML parser processes user input, aka 'MS XML Remote Code Execution Vulnerability'. 
This CVE ID is unique from CVE-2019-0791, CVE-2019-0792, CVE-2019-0793, CVE-2019-0795."}, {'#': '33', 'CVE ID': 'CVE-2019-0786', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-09', 'Update Date': '2019-04-11', 'Score': '7.5', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'Partial', 'Avail.': 'Partial', 'summary': "An elevation of privilege vulnerability exists in the Microsoft Server Message Block (SMB) Server when an attacker with valid credentials attempts to open a specially crafted file over the SMB protocol on the same machine, aka 'SMB Server Elevation of Privilege Vulnerability'."}, {'#': '34', 'CVE ID': 'CVE-2019-0784', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-10', 'Score': '7.6', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'High', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists in the way that the ActiveX Data objects (ADO) handles objects in memory, aka 'Windows ActiveX Remote Code Execution Vulnerability'."}, {'#': '35', 'CVE ID': 'CVE-2019-0782', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows kernel fails to properly initialize a memory address, aka 'Windows Kernel Information Disclosure Vulnerability'. 
This CVE ID is unique from CVE-2019-0702, CVE-2019-0755, CVE-2019-0767, CVE-2019-0775."}, {'#': '36', 'CVE ID': 'CVE-2019-0776', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the win32k component improperly provides kernel information, aka 'Win32k Information Disclosure Vulnerability'."}, {'#': '37', 'CVE ID': 'CVE-2019-0775', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '1.9', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows kernel improperly handles objects in memory, aka 'Windows Kernel Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0702, CVE-2019-0755, CVE-2019-0767, CVE-2019-0782."}, {'#': '38', 'CVE ID': 'CVE-2019-0774', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '4.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows GDI component improperly discloses the contents of its memory, aka 'Windows GDI Information Disclosure Vulnerability'. 
This CVE ID is unique from CVE-2019-0614."}, {'#': '39', 'CVE ID': 'CVE-2019-0772', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists in the way that the VBScript engine handles objects in memory, aka 'Windows VBScript Engine Remote Code Execution Vulnerability'. This CVE ID is unique from CVE-2019-0665, CVE-2019-0666, CVE-2019-0667."}, {'#': '40', 'CVE ID': 'CVE-2019-0767', 'CWE ID': '200', '# of Exploits': '', 'Vulnerability Type(s)': '+Info ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-10', 'Score': '2.1', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Partial', 'Integ.': 'None', 'Avail.': 'None', 'summary': "An information disclosure vulnerability exists when the Windows kernel improperly initializes objects in memory.To exploit this vulnerability, an authenticated attacker could run a specially crafted application, aka 'Windows Kernel Information Disclosure Vulnerability'. This CVE ID is unique from CVE-2019-0702, CVE-2019-0755, CVE-2019-0775, CVE-2019-0782."}, {'#': '41', 'CVE ID': 'CVE-2019-0766', 'CWE ID': '264', '# of Exploits': '', 'Vulnerability Type(s)': '', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-09', 'Score': '7.2', 'Gained Access Level': 'None', 'Access': 'Local', 'Complexity': 'Low', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "An elevation of privilege vulnerability exists in Windows AppX Deployment Server that allows file creation in arbitrary locations. 
To exploit the vulnerability, an attacker would first have to log on to the system, aka 'Microsoft Windows Elevation of Privilege Vulnerability'."}, {'#': '42', 'CVE ID': 'CVE-2019-0765', 'CWE ID': '119', '# of Exploits': '', 'Vulnerability Type(s)': 'Exec Code Overflow ', 'Publish Date': '2019-04-08', 'Update Date': '2019-04-10', 'Score': '9.3', 'Gained Access Level': 'None', 'Access': 'Remote', 'Complexity': 'Medium', 'Authentication': 'Not required', 'Conf.': 'Complete', 'Integ.': 'Complete', 'Avail.': 'Complete', 'summary': "A remote code execution vulnerability exists in the way that comctl32.dll handles objects in memory, aka 'Comctl32 Remote Code Execution Vulnerability'."}, ...]
Something like the following? With bs4 4.7.1+ you can use :nth-child(odd) and :nth-child(even) to separate the alternating table rows, then append each description to its matching data row.
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs


def clean(text):
    # Strip the ends, then replace each run of tabs/newlines with a single space.
    return re.sub(r'[\t\n]+', ' ', text.strip())


r = requests.get('https://www.cvedetails.com/vulnerability-list.php?%20vendor_id=26&product_id=32238&version_id=&page=1&hasexp=0&opdos=0&opec=0&opov=%200&opcsrf=0&opgpriv=0&opsqli=0&opxss=0&opdirt=0&opmemc=0&ophttprs=0&opbyp=0&opfileinc=0&opginf=0&cvssscoremin=0&cvssscoremax=0&year=0&month=0&cweid=0&order=1&trc=845&sha=41e451b72c2e412c0a1cb8cb1dcfee3d16d51c44')
soup = bs(r.content, 'lxml')

# Odd rows hold the descriptions ([1:] skips the header row); even rows hold the data cells.
descs = [clean(item.text) for item in soup.select('#vulnslisttable tr:nth-child(odd)')[1:]]
items = soup.select('#vulnslisttable tr:nth-child(even)')

results = []
for item, desc in zip(items, descs):
    # One output row = the cells of the data row plus the description that follows it.
    row = [clean(td.text) for td in item.select('td')]
    row.append(desc)
    results.append(row)

headers = [clean(th.text) for th in soup.select('#vulnslisttable th')]
headers.append('description')
df = pd.DataFrame(results, columns=headers)
print(df)
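The pairing logic can be tried without hitting the site. Below is a minimal sketch using plain lists as a stand-in for the table rows; `pair_rows`, `clean`, and the toy `rows` data are hypothetical names and values made up for illustration, and it assumes the layout is a header row followed by alternating data and description rows.

```python
import re


def clean(text):
    """Strip the ends, then replace each run of tabs/newlines with one space."""
    return re.sub(r'[\t\n]+', ' ', text.strip())


def pair_rows(rows):
    """Pair each data row with the description row that follows it.

    `rows` is a hypothetical stand-in for the table's <tr> elements:
    rows[0] is the header, after which data and description rows alternate.
    """
    header, body = rows[0], rows[1:]
    data_rows = body[0::2]   # 2nd, 4th, ... table rows (the "even" rows)
    desc_rows = body[1::2]   # 3rd, 5th, ... table rows (the "odd" rows, header skipped)
    columns = [clean(h) for h in header] + ['description']
    records = []
    for cells, desc in zip(data_rows, desc_rows):
        values = [clean(c) for c in cells] + [clean(desc[0])]
        records.append(dict(zip(columns, values)))
    return records


# Toy data: the descriptions are placeholders, not real CVE summaries.
rows = [
    ['#', 'CVE ID', 'Score'],
    ['1', '\n\tCVE-2019-0849\t', '4.3'],
    ['placeholder\n\tdescription one'],
    ['2', 'CVE-2019-0848', '2.1'],
    ['placeholder\n\tdescription two'],
]
records = pair_rows(rows)
print(records)
```

Each record is a dict keyed by column name, which is the same shape as the list of dicts shown in the question.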