search a list of parameter in JSON list/object Python - python

I am using below data set which uses SQL to query the data.
The results are in JSON list and not object.
I try to do slicing but couldn't make it work on the list. So used SQL queries(Query 1) to filter the columns instead.
Now I am trying to take a list of parameters and search it in the query results so my results are filtered to only items in the list.
Example my_list['60', '61', '62', '63', '66', '67', '68', '69', '70', '71', '72]
I want the result to shows the 5 columns including only the above precincts.
Any hep appreciated, because I been stuck for days.
data_url='data.cityofnewyork.us'
data_set='nc67-uf89'
url = Socrata(data_url)
JSON_results = url.get(data_set, limit=100)
print(JSON_results)
Query1 = url.get(data_set, select="precinct, violation, fine_amount, interest_amount, issuing_agency" where= interest_amount>0",order="interest_amount DESC", limit=10)
print(Query1)
# this is the result of Query 1 {'precinct': '013', 'violation': 'COMML PLATES-UNALTERED VEHICLE', 'fine_amount': '495', 'interest_amount': '349.72', 'issuing_agency': 'TRAFFIC'}
if Query1['precinct']==mylist:
for i in Query1:
print (i)
#Below is output for JSON_results
[{'plate': 'GDP4579', 'state': 'NY', 'license_type': 'PAS', 'summons_number': '5104469750', 'issue_date': '11/17/2018', 'violation_time': '11:37A', 'violation': 'FAILURE TO STOP AT RED LIGHT', 'fine_amount': '50', 'penalty_amount': '25', 'interest_amount': '0', 'reduction_amount': '0', 'payment_amount': '75', 'amount_due': '0', 'precinct': '000', 'county': 'BK', 'issuing_agency': 'DEPARTMENT OF TRANSPORTATION', 'summons_image': {'url': 'http://nycserv.nyc.gov/NYCServWeb/ShowImage?searchID=VGxSRmQwNUVVVEpQVkdNeFRVRTlQUT09&locationName=_____________________', 'description': 'View Summons'}}]

I had to make some assumptions about the data structure of JSON_Results but hopefully this can get you in the right direction.
# This is the structure I am assuming for json results
import json
JSON_RESULTS = json.loads(
'[{"precinct":"001","fine":32},{"precinct":"002","fine":44},{"precinct":"003","fine":12}]')
print(JSON_RESULTS)
returns
[{'precinct': '001', 'fine': 32}, {'precinct': '002', 'fine': 44}, {'precinct': '003', 'fine': 12}]
if thats the case, then you should be able to filter these results in your for loop
my_list = ['60', '61', '62', '63', '66', '67', '68', '69', '70', '71', '72']
for jsonResult in JSON_RESULTS:
if jsonResult['precinct'] in my_list:
print(f"{jsonResult['precinct']},{jsonResult['violation']},{jsonResult['fine_amount']},{jsonResult['interest_amount']},{jsonResult['issuing_agency']}")
*edit: just print the columns you want

#### make sure my_list is str '013' not a int 013 cause '013' not equal to 013.
Query_1 = {'precinct': '013', 'violation': 'COMML PLATES-UNALTERED VEHICLE', 'fine_amount': '495', 'interest_amount': '349.72', 'issuing_agency': 'TRAFFIC'}
my_list = ['013','60', '61', '62', '63', '66', '67', '68', '69', '70', '71', '72']
if Query_1['precinct'] in my_list:
print(Query_1)

Related

Printing Specific Elements from a Nested List

I have a nested list that looks like this:
[['Student', 'Exam 1', 'Exam 2', 'Exam 3'],
['Thorny', '100', '90', '80'],
['Mac', '88', '99', '111'],
['Farva', '45', '56', '67'],
['Rabbit', '59', '61', '67'],
['Ursula', '73', '79', '83'],
['Foster', '89', '97', '101']]
I was able to pull out student names using the following:
`def get_students(grades):
student_names = []
for i in range(1,len(grades)):
student_names.append(grades[i][0])
return student_names `
The output was ['Thorny', 'Mac', 'Farva', 'Rabbit', 'Ursula', 'Foster']
Now I need to pull out the "Exam" headers. I used similar code but it is getting very different results.
`def get_assignments(grades):
exam_list = []
for i in range(6, len(grades)):
exam_list.append(grades[0][1 : ])
return exam_list`
This returns the output:
`[['Exam 1', 'Exam 2', 'Exam 3']]`
I have been looking here and at GeeksForGeeks to try and figure out what I am doing wrong. It seemed like it should be a simple tweaking of the first code to access a different part of the nested list but I can't seem to get it. Any tips would be appreciated.
EDIT: Apologies, my problem may not have been clear. The desired output is
['Exam 1', 'Exam 2', 'Exam 3']
Thank you!
this is a short solution with list Comprehension
lst=[['Student', 'Exam 1', 'Exam 2', 'Exam 3'], ['Thorny', '100', '90', '80'], ['Mac', '88', '99', '111'], ['Farva', '45', '56', '67'], ['Rabbit', '59', '61', '67'], ['Ursula', '73', '79', '83'], ['Foster', '89', '97', '101']]
lstStudents=[nlst[0] for nlst in lst]
lstExams=[nlst[1:] for nlst in lst[1:]]
print(lstStudents)
print(lstExams)
the out put will be:
['Student', 'Thorny', 'Mac', 'Farva', 'Rabbit', 'Ursula', 'Foster']
[['100', '90', '80'], ['88', '99', '111'], ['45', '56', '67'], ['59', '61', '67'], ['73', '79', '83'], ['89', '97', '101']]
Maybe consider turning your data in to a list of dictionaries, then we can make use of Python's list comprehensions to retrieve whatever you need.
grades = [['Student', 'Exam 1', 'Exam 2', 'Exam 3'], ['Thorny', '100', '90', '80'], ['Mac', '88', '99', '111'], ['Farva', '45', '56', '67'], ['Rabbit', '59', '61', '67'], ['Ursula', '73', '79', '83'], ['Foster', '89', '97', '101']]
headers = grades[0]
grades_list = [dict(zip(headers, student)) for student in grades[1:]]
print(*(grade for grade in grades_list if grade['Student']=='Rabbit'))
# Output {'Student': 'Rabbit', 'Exam 1': '59', 'Exam 2': '61', 'Exam 3': '67'}

How to iterate through all tags of a website in Python with Beautifulsoup?

I'm a newbie in this sector. Here is the website I need to crawling "http://py4e-data.dr-chuck.net/comments_1430669.html" and here is it source code "view-source:http://py4e-data.dr-chuck.net/comments_1430669.html"
It's a simple website for practice. The HTML code look something like:
<html>
<head>
<title>Welcome to the comments assignment from www.py4e.com</title>
</head>
<body>
<h1>This file contains the actual data for your assignment - good luck!</h1>
<table border="2">
<tr>
<td>Name</td><td>Comments</td>
</tr>
<tr><td>Melodie</td><td><span class="comments">100</span></td></tr>
<tr><td>Machaela</td><td><span class="comments">100</span></td></tr>
<tr><td>Rhoan</td><td><span class="comments">99</span></td></tr>
I need to get the number between comments and span (100,100,99)
Below is my code:
html=urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup=BeautifulSoup(html,'html.parser')
tag=soup.span
print(tag) #<span class="comments">100</span>
print(tag.string) #100
I got the number 100 but only the first one, now I want to get all of them by iterating through a list or sth like that. What is the method to do this with beautifulsoup?
import urllib.request
from bs4 import BeautifulSoup
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup = BeautifulSoup(html,'html.parser')
tags = soup.find_all("span")
for i in tags:
print(i.string)
You can use find_all() function and then iterate it to get the numbers.
If you want names also you can use python dictionary :
import urllib.request
from bs4 import BeautifulSoup
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup = BeautifulSoup(html,'html.parser')
tags = soup.find_all("span")
comments = {}
for index, tag in enumerate(tags):
commentorName = tag.find_previous('tr').text
commentorComments = tag.string
comments[commentorName] = commentorComments
print(comments)
This will give you the output as :
{'Melodie100': '100', 'Machaela100': '100', 'Rhoan99': '99', 'Murrough96': '96', 'Lilygrace93': '93', 'Ellenor93': '93', 'Verity89': '89', 'Karlie88': '88', 'Berlin85': '85', 'Skylar84': '84', 'Benny84': '84', 'Crispin81': '81', 'Asya79': '79', 'Kadi76': '76', 'Dua74': '74', 'Stephany73': '73', 'Eila71': '71', 'Jennah70': '70', 'Eduardo67': '67', 'Shannan61': '61', 'Chymari60': '60', 'Inez60': '60', 'Charlene59': '59', 'Rosalin54': '54', 'James53': '53', 'Rhy53': '53', 'Zein52': '52', 'Ayren50': '50', 'Marissa46': '46', 'Mcbride46': '46', 'Ruben45': '45', 'Mikee41': '41', 'Carmel38': '38', 'Idahosa37': '37', 'Brooklin37': '37', 'Betsy36': '36', 'Kayah34': '34', 'Szymon26': '26', 'Tea24': '24', 'Queenie24': '24', 'Nima23': '23', 'Eassan23': '23', 'Haleema21': '21', 'Rahma17': '17', 'Rob17': '17', 'Roma16': '16', 'Jeffrey14': '14', 'Yorgos12': '12', 'Denon11': '11', 'Jasmina7': '7'}
Try the following approach:
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_1430669.html').read()
soup = BeautifulSoup(html, 'html.parser')
data = []
for tr in soup.find_all('tr'):
row = [td.text for td in tr.find_all('td')]
data.append(row[1]) # or data.append(row) for both
print(data)
Giving you data holding a list containing just the one column:
['Comments', '100', '100', '99', '96', '93', '93', '89', '88', '85', '84', '84', '81', '79', '76', '74', '73', '71', '70', '67', '61', '60', '60', '59', '54', '53', '53', '52', '50', '46', '46', '45', '41', '38', '37', '37', '36', '34', '26', '24', '24', '23', '23', '21', '17', '17', '16', '14', '12', '11', '7']
First locate all of the table <tr> rows. Then extract all of the <td> values for each row. As you only want the second one, append row[1] to a data list holding your values.
You can skip the first one if needed with data[1:].
This approach would let you also save the name at the same time by appending the whole of row. e.g. use data.append(row) instead...
You could then display the entries using:
for name, comment in data[1:]:
print(name, comment)
Giving output starting:
Melodie 100
Machaela 100
Rhoan 99
Murrough 96
Lilygrace 93
Ellenor 93
Verity 89
Karlie 88

Python Loop: string Index Out of Range

Given the following csv file:
['offre_bfr.entreprise', 'offre_bfr.nombreemp', 'offre_bfr.ca2020', 'offre_bfr.ca2019', 'offre_bfr.ca2018', 'offre_bfr.benefice2020', 'offre_bfr.benefice2019', 'offre_bfr.benefice2018', 'offre_bfr.tauxrenta2020', 'offre_bfr.tauxrenta2019', 'offre_bfr.tauxrenta2018', 'offre_bfr.tauximposition', 'offre_bfr.chargesalariale', 'offre_bfr.chargesfixes', 'offre_bfr.agedirigeant', 'offre_bfr.partdirigeant', 'offre_bfr.agemoyact', 'offre_bfr.parttotaleact', 'offre_bfr.mtdmdcred', 'offre_bfr.creditusuel', 'offre_bfr.capipropres', 'offre_bfr.dettefin', 'offre_bfr.dettenonfin', 'offre_bfr.stock', 'offre_bfr.creances', 'offre_bfr.actifimmobilise', 'offre_bfr.passiftotal', 'offre_bfr.tresorerie', 'offre_bfr.capitalisation2020', 'offre_bfr.capitalisation2019', 'offre_bfr.capitalisation2018', 'offre_bfr.nivrisque', 'offre_bfr.indconfiance', 'offre_bfr.indperseverance', 'offre_bfr.score']
['1', '15', '1.84', '5.18', '7.96', '0.48', '1.19', '0.11', '26.086956', '22.972973', '1.3819095', '17.9', '0.035295', '1.2', '55', '33', '69', '67', '10', '14.98', '0.05', '0.04', '0.21', '0.1', '0.08', '0.41', '0.8', '0.0', '7.5', '52.8', '0.16', 'Bas', '4', '4', '5.0']
['3', '3030', '546.7', '589.7', '430.9', '62.58', '20.63', '99.06', '11.446863', '3.498389', '22.989092', '17.4', '7.12959', '270.9', '46', '37', '69', '73', '2973', '1567.3', '46.97', '13.39', '61.92', '3.0', '8.0', '145.0', '278.4', '-51.0', '1063.5', '3047.8', '538.08', 'Eleve', '4', '4', '3.0']
['4', '42', '4.28', '9.13', '8.99', '0.45', '0.59', '0.08', '10.514019', '6.4622126', '0.8898776', '31.5', '0.098826', '2.2', '70', '32', '53', '68', '9', '22.4', '0.13', '0.06', '0.31', '0.1', '0.07', '0.92', '1.7', '-0.3', '42.5', '69.5', '2.73', 'Eleve', '4', '4', '3.0']
['5', '497', '92.2', '62.5', '40.3', '20.14', '6.91', '4.92', '21.843819', '11.056', '12.208437', '32.2', '1.169441', '5.1', '64', '32', '70', '68', '197', '195.0', '6.07', '1.83', '12.49', '5.9', '3.83', '16.41', '16.5', '-2.7', '1048.3', '618.8', '11.24', 'Moyen', '4', '4', '4.0']
['8', '122', '67.8', '24.5', '91.4', '12.67', '5.69', '8.43', '18.687315', '23.22449', '9.223195', '24.8', '0.287066', '19.5', '53', '35', '61', '65', '424', '183.7', '1.64', '1.92', '6.48', '4.9', '2.45', '23.6', '23.7', '-3.5', '204.2', '109.5', '5.33', 'Eleve', '4', '4', '3.0']
['11', '310', '77.5', '78.7', '24.9', '8.05', '21.76', '1.79', '10.387096', '27.649302', '7.188755', '29.0', '0.72943', '12.0', '47', '32', '65', '68', '38', '181.1', '6.55', '3.27', '8.16', '5.1', '2.08', '15.09', '36.3', '-7.0', '669.8', '705.3', '22.95', 'Eleve', '4', '4', '3.0']
['14', '283', '91.9', '52.9', '51.9', '10.48', '7.01', '12.57', '11.4037', '13.251418', '24.219654', '24.2', '0.665899', '2.3', '61', '29', '58', '71', '60', '196.7', '8.02', '2.93', '7.79', '7.0', '3.87', '25.1', '42.7', '-4.4', '434.0', '143.4', '17.18', 'Eleve', '4', '4', '3.0']
['16', '41', '5.54', '6.48', '5.5', '1.55', '1.51', '0.73', '27.97834', '23.30247', '13.272727', '15.9', '0.096473', '2.4', '71', '39', '56', '61', '29', '17.52', '0.41', '0.11', '0.62', '0.3', '0.17', '1.47', '2.4', '0.0', '36.7', '76.0', '4.2', 'Bas', '4', '4', '5.0']
I would like to create a bar chart from columns 0 and 34 of the csv file.
Here is the python script I am running:
# -*-coding:Latin-1 -*
#!/usr/bin/python
#!/usr/bin/env python
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
import csv
x = []
y = []
Bfr = csv.reader(open('/home/cloudera/PMGE/Bfr.csv​'))
linesBfr = list(Bfr)
i=1
for l in linesBfr:
x.append(l[i][0])
y.append(int(l[i][34]))
plt.bar(x, y, color = 'g', width = 0.72, label = "Score")
plt.xlabel('Entreprise')
plt.ylabel('Scores')
plt.title('Scores des entreprises en BFR')
plt.legend()
plt.show()
But i'm getting the following error:
Traceback (most recent call last):
File "barplot.py", line 20, in <module>
y.append(int(l[i][34]))
IndexError: string index out of range
Can someone help me out?
Python lists are zero-indexed. You are trying to iterate to the 35th element in a 34 element list.
Firstly, there are 35 elements from 0 to 34. This means that starting your indexing i at i=1 will look for an element at the 35th index, which does not exist, or be an "index out of range". To be more specific, your code is looking for a list that does not exist. Secondly, this is not the standard way to use 2d lists in python. I suggest using a method more as such:
https://www.kite.com/python/answers/how-to-append-to-a-2d-list-in-python#:~:text=Append%20a%20list%20to%20a,list%20to%20the%202D%20list.
Hope this was helpful.
You probably meant to write this:
x = []
y = []
Bfr = csv.reader(open('/home/cloudera/PMGE/Bfr.csv​'))
next(Bfr , None) # skip the header
for l in Bfr:
x.append(int(l[0]))
y.append(int(l[34]))
...
(See this question about skipping the header of a csv)

Combining two dictionaries into one if the value from the one dictionary exist in another

I am trying to construct a dictionary called author_venues in which author names are the keys and values are the list of venues where they have published.
I was given two dictionaries:
A sample author_pubs dictionary where author name is the key and a list of publication ids is the value
defaultdict(list,
{'José A. Blakeley': ['2',
'25',
'2018',
'2185',
'94602',
'145114',
'182779',
'182780',
'299422',
'299426',
'299428',
'299558',
'302125',
'511816',
'521294',
'597967',
'598123',
'598125',
'598130',
'598132',
'598134',
'598136',
'598620',
'600180',
'600221',
'642049',
'643606',
'808458',
'832249',
'938531',
'939047',
'1064640',
'1064641',
'1065929',
'1118153',
'1269074',
'2984279',
'3154713',
'3169639',
'3286099',
'3494140'],
'Yuri Breitbart': ['3',
'4',
'76914',
'113875',
'140847',
'147900',
'147901',
'150951',
'176221',
'176896',
'182963',
'200336',
'262940',
'285098',
'285564',
'299526',
'301313',
'303418',
'304160',
'400040',
'400041',
'400174',
'400175',
'402178',
'482506',
'482785',
'544757',
'545233',
'545429',
'559737',
'559761',
'559765',
'559783',
'559785',
'597889',
'598201',
'598202',
'598203',
'599325',
'599899',
'620806',
'636455',
'641884',
'642157',
'654200',
'654201',
'740600',
'740602',
'833336',
'844280',
'856032',
'856222',
'888870',
'934979',
'938228',
'941484',
'945339',
'949548',
'971592',
'971593',
'972813',
'972958',
'1064100',
'1064690',
'1064691',
'1064693',
'1064694',
'1078369',
'1078370',
'1089675',
'1095084',
'1121956',
'1122006',
'1122610',
'1127610',
'1138059',
'1138061',
'1141938',
'1227365',
'1278703',
'1319498',
'2818906',
'2876867',
'2978458',
'3015058',
'3223418'],
A sample venue_pubs dictionary where venue name is the key and a list of publication ids is the value
defaultdict(list,
{'Modern Database Systems': ['2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'10',
'11',
'12',
'13',
'14',
'15',
'16',
'17',
'18',
'19',
'20',
'21',
'22',
'23',
'24',
'25',
'26',
'27',
'28',
'29',
'30',
'31',
'32',
'33',
'34',
'1203459',
'3000615',
'3000616',
'3000617',
'3000618',
'3000619',
'3000620',
'3000621',
'3000622',
'3000623',
'3000624',
'3000625',
'3000626'],
'Object-Oriented Concepts, Databases, and Applications': ['36',
'37',
'38',
'39',
'40',
'41',
'42',
'43',
'44',
'45',
'46',
'47',
'48',
'49',
'50',
'51',
'52',
'53',
'54',
'55',
'56',
'57',
'58',
'59'],
'The INGRES Papers': ['60',
'61',
'62',
'63',
'64',
'65',
'66',
'67',
'68',
'69'],
'Temporal Databases': ['168',
'169',
'170',
'171',
'172',
'173',
'174',
'175',
'176',
'177',
'178',
'179',
'180',
'181',
'182',
'183',
'184',
'185',
'186',
'187',
'188',
'189',
'190',
'627582',
'627584',
'627588',
'627589',
'627591',
'627592',
'627593',
'627594',
'627596',
'627600',
'627601',
'627602',
'627603',
'627604',
'627605',
'627608',
'627613',
'627615',
'627616',
'627617'],
The resulting dictionary should look like {'author':['venue1','venue2','venue3']}
author_venue = defaultdict(list)
This is code I wrote:
for k,v in author_pubs.items():
for item in v:
for x,y in venue_pubs.items():
if item in y:
venue = x
author_venue[k].append(venue)
But this loop takes forever since I have over 3million records
please help!
You can "invert" the dictionary venue_pubs to speed up the search:
from collections import defaultdict
author_pubs = {
"author1": [1, 2, 3],
"author2": [3, 4, 5],
}
venue_pubs = {
"xxx1": [1, 4, 20],
"xxx2": [4, 30, 40],
}
# "invert" dictionary `venue_pubs`:
tmp = defaultdict(list)
for k, v in venue_pubs.items():
for val in v:
tmp[val].append(k)
author_venue = defaultdict(list)
for k, v in author_pubs.items():
for item in v:
venues = tmp.get(item)
if not venues is None:
author_venue[k].extend(venues)
print(author_venue)
Prints:
defaultdict(<class 'list'>, {'author1': ['xxx1'], 'author2': ['xxx1', 'xxx2']})
EDIT: To remove duplicates:
# ...
for k in author_venue:
author_venue[k] = list(set(author_venue[k]))
print(author_venue)

OverWrite an existing list in CSV files Python

I have a csv files with player attributes:
['Peter Regin', '2', 'DAN', 'N', '1987', '6', '6', '199', '74', '2', '608000', '', '77', '52', '74', '72', '58', '72', '71', '72', '70', '72', '74', '68', '74', '41', '40', '51']
['Andrej Sekera', '8', 'SVK', 'N', '1987', '6', '6', '198', '72', '3', '1323000', '', '65', '39', '89', '78', '75', '70', '72', '56', '53', '56', '57', '72', '57', '59', '70', '51']
For example, I want to check if a player is a CENTER ('2' in position 1 in my list) and after I want to modify the 12 element (which is '77' for Peter Regin)
How can I do that using the CSV module ?
import csv
class ManipulationFichier:
def __init__(self, fichier):
self.fichier = fichier
def read(self):
with open(self.fichier) as f:
reader = csv.reader(f)
for row in reader:
print(row)
def write(self):
with open(self.fichier) as f:
writer = csv.writer(f)
for row in f:
if row[1] == 2:
writer.writerows(row[1] for row in f)
Which do nothing important..
Thanks,
In general, CSV files cannot be reliably modified in-place.
Read the entire file into memory (usually a list of lists, as in your example), modify the data, then write the entire file back.
Unless your file is really huge, and you do this really often, the performance hit will be negligible.

Categories

Resources