I have code that should write information to Excel using Selenium. I have one list with some information, and I need to write all of it to Excel. I have a solution, but when I tried to use it I got 'DataFrame' object is not callable. How can I solve it?
All of this code is inside an iteration:
for schools in List:  # in the List I have data from an Excel file with the names of schools
    data = pd.DataFrame()
    data({
        "School Name": School_list_result[0::17],
        "Principal": School_list_result[1::17],
        "Principal's E-mail": School_list_result[2::17],
        "Type": School_list_result[8::17],
        "Grade Span": School_list_result[3::17],
        "Address": School_list_result[4::17],
        "Phone": School_list_result[14::17],
        "Website": School_list_result[13::17],
        "Associations/Communities": School_list_result[5::17],
        "GreatSchools Summary Rating": School_list_result[6::17],
        "U.S.News Rankings": School_list_result[12::17],
        "Total # Students": School_list_result[15::17],
        "Full-Time Teachers": School_list_result[16::17],
        "Student/Teacher Ratio": School_list_result[17::17],
        "Charter": School_list_result[9::17],
        "Enrollment by Race/Ethnicity": School_list_result[7::17],
        "Enrollment by Gender": School_list_result[10::17],
        "Enrollment by Grade": School_list_result[11::17],
    })
    data.to_excel("D:\Schools.xlsx")
In School_list_result I have this data:
'Cape Elizabeth High School',
'Mr. Jeffrey Shedd',
'No data.',
'9-12',
'345 Ocean House Road, Cape Elizabeth, ME 04107',
'Cape Elizabeth Public Schools',
'8/10',
'White\n91%\nAsian\n3%\nTwo or more races\n3%\nHispanic\n3%\nBlack\n1%',
'Regular school',
'No',
' Male Female\n Students 281 252',
' 9 10 11 12\n Students 139 135 117 142',
'#5,667 in National Rankings',
'https://cehs.cape.k12.me.us/',
'Tel: (207)799-3309',
'516 students',
'47 teachers',
'11:1',
Please follow the syntax for creating a DataFrame:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
So your code should be modified as:
for schools in List:  # in the List I have data from an Excel file with the names of schools
    data = pd.DataFrame(data={
        "School Name": School_list_result[0::17],
        "Principal": School_list_result[1::17],
        "Principal's E-mail": School_list_result[2::17],
        "Type": School_list_result[8::17],
        "Grade Span": School_list_result[3::17],
        "Address": School_list_result[4::17],
        "Phone": School_list_result[14::17],
        "Website": School_list_result[13::17],
        "Associations/Communities": School_list_result[5::17],
        "GreatSchools Summary Rating": School_list_result[6::17],
        "U.S.News Rankings": School_list_result[12::17],
        "Total # Students": School_list_result[15::17],
        "Full-Time Teachers": School_list_result[16::17],
        "Student/Teacher Ratio": School_list_result[17::17],
        "Charter": School_list_result[9::17],
        "Enrollment by Race/Ethnicity": School_list_result[7::17],
        "Enrollment by Gender": School_list_result[10::17],
        "Enrollment by Grade": School_list_result[11::17],
    })
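After building the DataFrame you can write it out as before. Using a raw string (or doubled backslashes) for the Windows path avoids escape-sequence surprises, and index=False is optional if you don't want the row index in the sheet:

    data.to_excel(r"D:\Schools.xlsx", index=False)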
Do you want to append to an existing xlsx file?
First, create the dictionary and then call the DataFrame constructor, like this:
r = {"column1":["data"], "column2":["data"]}
data = pd.DataFrame(r)
Related
I am struggling with a query where I have to count the total CO2 emissions for each airline.
I have managed to get all the data, which prints like this:
Flight Number is 2HX and the airline is IT and the aircraft is E195 going from EDDF to LIMF
Flight distance is 542.93 km
Flight CO2 emissions is 16.87 kg
Flight Number is 8031 and the airline is ES and the aircraft is B752 going from LEBL to EDDP
Flight distance is 1365.97 km
Flight CO2 emissions is 31.07 kg
Flight Number is 39DV and the airline is ES and the aircraft is A320 going from LEPA to LEMD
Flight distance is 546.33 km
Flight CO2 emissions is 16.92 kg
All calculations are done for all of the flights, but I would like to group them by airline, accumulating the totals for each one and printing them accordingly.
Any ideas on how I could start?
The loaded JSON file looks like this:
[{"hex": "150694", "reg_number": "RA-67220", "flag": "RU", "lat": 51.633911, "lng": 50.050518, "alt": 11582, "dir": 290, "speed": 761, "v_speed": 0.3, "squawk": "0507", "flight_number": "9004", "flight_icao": "TUL9004", "dep_icao": "VIDP", "dep_iata": "DEL", "airline_icao": "PLG", "aircraft_icao": "CRJ2", "updated": 1675528289, "status": "en-route"}, {"hex": "152038", "reg_number": "RA-73784", "flag": "RU", "lat": 43.352108, "lng": 35.634342, "alt": 11277, "dir": 4, "speed": 881, "v_speed": 0, "squawk": "7313", "flight_number": "427", "flight_icao": "AFL427", "flight_iata": "SU427", "dep_icao": "HESH", "dep_iata": "SSH", "arr_icao": "UUEE", "arr_iata": "SVO", "airline_icao": "AFL", "airline_iata": "SU", "aircraft_icao": "A333", "updated": 1675528054, "status": "en-route"}, {"hex": "152052", "reg_number": "RA-73810", "flag": "RU", "lat": 59.739784, "lng": 85.652138, "alt": 9745, "dir": 89, "speed": 801, "v_speed": 0, "squawk": "5521", "flight_number": "173", "flight_icao": "SVR173", "flight_iata": "U6173", "dep_icao": "USSS", "dep_iata": "SVX", "arr_icao": "UHHH", "arr_iata": "KHV", "airline_icao": "SVR", "airline_iata": "U6", "aircraft_icao": "A319", "updated": 1675528294, "status": "en-route"}
Basically, the function for listing flights looks like this, but I would like to group the flights by airline and add the CO2 emissions value to each individual result:
def list_all_flights(self):
    # List all flights
    total_result = 0
    for i in self.flights_list.read_data_file():  # JSON file
        if i.get('dep_icao') and i.get('arr_icao'):
            print(f"Flight Number is {i['flight_number']} and the airline is {i['flag']} and the aircraft is {i['aircraft_icao']} going from {i['dep_icao']} to {i['arr_icao']}")
I have managed to count all occurrences of each airline in a new dictionary, and it works:
if 'flag' in i:
    temp[i['flag']] = temp.get(i['flag'], 0) + 1
Now I would like to add up the CO2 emission results as a total per airline.
By making use of the pandas module (you can install it via pip install pandas), I made this:
import pandas as pd
df = pd.read_json("data.json")
result = {a_iata:[] for a_iata in df.airline_iata.unique()}
for a_iata in result:
    result[a_iata] = df.loc[df.airline_iata == a_iata]
Where data.json is the data that you have provided. The code essentially filters every entry by its airline_iata value and stores the entries in individual DataFrames. You can see the data by using result['AIRLINE_CODE'], which returns the corresponding DataFrame.
When you are constructing your message, you can use something like this:
temp_df = result['AIRLINE_CODE']
message = f"Flight reg number is {temp_df.reg_number}..."
You can fill the message out however you like.
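To get the total emissions per airline on top of that, a minimal sketch (compute_co2 below is a hypothetical placeholder for your own distance/emissions calculation) adds the emissions as a column and lets pandas group and sum them:

import pandas as pd

def compute_co2(flight):
    # Hypothetical placeholder: replace with your own distance/emissions calculation per row.
    return 0.0

df = pd.read_json("data.json")
df["co2_kg"] = df.apply(compute_co2, axis=1)

# Sum emissions per airline; rows without an airline_iata are dropped by groupby by default.
totals = df.groupby("airline_iata")["co2_kg"].sum()
print(totals)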
I have a dataframe
import pandas as pd
data = {
    "ID": [123123, 222222, 333333],
    "Main Authors": ["[Jim Allen, Tim H]", "[Rob Garder, Harry S, Tim H]", "[Wo Shu, Tee Ru, Fuu Wan, Gee Han]"],
    "Abstract": ["This is paper about hehe", "This paper is very nice", "Hello there paper from kellogs"],
    "paper IDs": ["[123768, 123123]", "[123432, 34345, 353545, 454545]", "[123123, 3433434, 55656655, 988899]"],
}
and I am trying to export it to a JSON schema. I do so via
df = pd.DataFrame(data)
df.to_json(orient='records')
'[{"ID":123123,"Main Authors":"[Jim Allen, Tim H]","Abstract":"This is paper about hehe","paper IDs":"[123768, 123123]"},
{"ID":222222,"Main Authors":"[Rob Garder, Harry S, Tim H]","Abstract":"This paper is very nice","paper IDs":"[123432, 34345, 353545, 454545]"},
{"ID":333333,"Main Authors":"[Wo Shu, Tee Ru, Fuu Wan, Gee Han]","Abstract":"Hello there paper from kellogs","paper IDs":"[123123, 3433434, 55656655, 988899]"}]'
but this is not in the right format for JSON. How can I get my output to look like this
{"ID": "123123", "Main Authors": ["Jim Allen", "Tim H"], "Abstract": "This is paper about hehe", "paper IDs": ["123768", "123123"]}
{and so on for paper 2...}
I can't find an easy way to achieve this schema with the basic functions.
to_json returns a proper JSON document. What you want is not a JSON document.
Add lines=True to the call:
df.to_json(orient='records', lines=True)
The output you desire is not valid JSON. It's a very common way to stream JSON objects though: write one unindented JSON object per line.
Streaming JSON is an old technique, used to write JSON records to logs, send them over the network, and so on. There's no single specification for it, although plenty of people have tried to claim the name, even creating sites that mirror Douglas Crockford's original JSON site or mimic the language of RFCs.
Streaming JSON formats are used a lot in IoT and event processing applications, where events will arrive over a long period of time.
PS: I remember seeing a question about json-seq a few months ago. It seems there was an attempt to standardize streaming JSON in RFC 7464 as JSON Text Sequences, using the MIME type application/json-seq.
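Note that with the data exactly as posted, "Main Authors" and "paper IDs" are stored as plain strings, so lines=True will still emit them as strings rather than JSON arrays. A minimal sketch (assuming the bracketed strings always use ", " as the separator) that turns them into real lists before exporting:

import pandas as pd

data = {
    "ID": [123123, 222222, 333333],
    "Main Authors": ["[Jim Allen, Tim H]", "[Rob Garder, Harry S, Tim H]", "[Wo Shu, Tee Ru, Fuu Wan, Gee Han]"],
    "Abstract": ["This is paper about hehe", "This paper is very nice", "Hello there paper from kellogs"],
    "paper IDs": ["[123768, 123123]", "[123432, 34345, 353545, 454545]", "[123123, 3433434, 55656655, 988899]"],
}
df = pd.DataFrame(data)

# Strip the surrounding brackets and split on ", " to get real Python lists.
for col in ("Main Authors", "paper IDs"):
    df[col] = df[col].str.strip("[]").str.split(", ")

print(df.to_json(orient="records", lines=True))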
You can convert the DataFrame to a list of dictionaries first.
import pandas as pd
data = {
    "ID": [123123, 222222, 333333],
    "Main Authors": [["Jim Allen", "Tim H"], ["Rob Garder", "Harry S", "Tim H"], ["Wo Shu", "Tee Ru", "Fuu Wan", "Gee Han"]],
    "Abstract": ["This is paper about hehe", "This paper is very nice", "Hello there paper from kellogs"],
    "paper IDs": [[123768, 123123], [123432, 34345, 353545, 454545], [123123, 3433434, 55656655, 988899]],
}
df = pd.DataFrame(data)
df.to_dict('records')
The result:
[{'ID': 123123,
'Main Authors': ['Jim Allen', 'Tim H'],
'Abstract': 'This is paper about hehe',
'paper IDs': [123768, 123123]},
{'ID': 222222,
'Main Authors': ['Rob Garder', 'Harry S', 'Tim H'],
'Abstract': 'This paper is very nice',
'paper IDs': [123432, 34345, 353545, 454545]},
{'ID': 333333,
'Main Authors': ['Wo Shu', 'Tee Ru', 'Fuu Wan', 'Gee Han'],
'Abstract': 'Hello there paper from kellogs',
'paper IDs': [123123, 3433434, 55656655, 988899]}]
Is that what you are looking for?
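If you then want one JSON object per line, as in your desired output, a small follow-up sketch using the standard json module (papers.jsonl is just a hypothetical output filename):

import json

records = df.to_dict('records')
with open("papers.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")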
I am learning how to get data from arrays and I am slightly stuck on an easy way of locating where that data is to pull it from the array. It feels like there should be an easier way than counting on the screen.
Here is what I have:
r2 = requests.get(
    f'https://www.thesportsdb.com/api/v1/json/{apiKey}/lookupevent.php?id={id}')
arr_events = np.array([r2.json()])
# print(arr_events)

event_id = arr_events[0]['events'][0]['idEvent']
locate = arr_events.index('strHomeTeam')
print(locate)
The problem is, on the console this prints out a massive array that looks like (I'll give one line, you probably get the idea):
[{'events': [{'idEvent': '1032723', 'idSoccerXML': None, 'idAPIfootball': '592172', 'strEvent': 'Aston Villa vs Liverpool', 'strEventAlternate': 'Liverpool # Aston Villa', 'strFilename': 'English Premier League 2020-10-04 Aston Villa vs Liverpool'...}]}]
It's a sizeable array, enough to cause a minor slowdown if I need to pull some info.
So, idEvent was easy to pull using the method above. And if I wanted some of the others in the top line, it probably wouldn't be hard to count to 5 or 6. But I know there must be an easier way for Python to just locate the ones I want. For instance, I want the home and away team:
'strHomeTeam': 'Aston Villa', 'strAwayTeam': 'Liverpool',
So is there an easier way to just pull the 'strHomeTeam' rather than counting all the way to the point in the array?
I realise this is a basic question - and I have searched and searched, but everything seems to be in a single, really small array and they don't seem to explain getting the data from big arrays easily.
The JSON file is here: https://www.thesportsdb.com/api/v1/json/1/lookupevent.php?id=1032723
Thank you for your help on this - I appreciate it.
So is there an easier way to just pull the 'strHomeTeam' rather than counting all the way to the point in the array?
Try the below
data = {"events": [
{"idEvent": "1032723", "idSoccerXML": "", "idAPIfootball": "592172", "strEvent": "Aston Villa vs Liverpool",
"strEventAlternate": "Liverpool # Aston Villa",
"strFilename": "English Premier League 2020-10-04 Aston Villa vs Liverpool", "strSport": "Soccer",
"idLeague": "4328", "strLeague": "English Premier League", "strSeason": "2020-2021",
"strDescriptionEN": "Aston Villa and Liverpool square off at Villa Park, where last season, these teams produced one of the most exciting finishes of the campaign, as Liverpool scored twice late on to overturn an early Trezeguet goal.",
"strHomeTeam": "Aston Villa", "strAwayTeam": "Liverpool", "intHomeScore": "7", "intRound": "4",
"intAwayScore": "2", "intSpectators": "", "strOfficial": "", "strHomeGoalDetails": "", "strHomeRedCards": "",
"strHomeYellowCards": "", "strHomeLineupGoalkeeper": "", "strHomeLineupDefense": "",
"strHomeLineupMidfield": "", "strHomeLineupForward": "", "strHomeLineupSubstitutes": "",
"strHomeFormation": "", "strAwayRedCards": "", "strAwayYellowCards": "", "strAwayGoalDetails": "",
"strAwayLineupGoalkeeper": "", "strAwayLineupDefense": "", "strAwayLineupMidfield": "",
"strAwayLineupForward": "", "strAwayLineupSubstitutes": "", "strAwayFormation": "", "intHomeShots": "",
"intAwayShots": "", "strTimestamp": "2020-10-04T18:15:00+00:00", "dateEvent": "2020-10-04",
"dateEventLocal": "2020-10-04", "strDate": "", "strTime": "18:15:00", "strTimeLocal": "19:15:00",
"strTVStation": "", "idHomeTeam": "133601", "idAwayTeam": "133602", "strResult": "", "strVenue": "Villa Park",
"strCountry": "England", "strCity": "", "strPoster": "", "strFanart": "",
"strThumb": "https:\/\/www.thesportsdb.com\/images\/media\/event\/thumb\/r00vzl1601721606.jpg", "strBanner": "",
"strMap": "", "strTweet1": "https:\/\/twitter.com\/brfootball\/status\/1312843172385521665",
"strTweet2": "https:\/\/twitter.com\/TomJordan21\/status\/1312854281444306946",
"strTweet3": "https:\/\/twitter.com\/FutbolBible\/status\/1312847622592442370",
"strVideo": "https:\/\/www.youtube.com\/watch?v=0Nbw3jSafGM", "strStatus": "Match Finished", "strPostponed": "no",
"strLocked": "unlocked"}]}
filtered_data = [{'home': entry['strHomeTeam'], 'away': entry['strAwayTeam']} for entry in data['events']]
print(filtered_data)
output
[{'home': 'Aston Villa', 'away': 'Liverpool'}]
Ug... I tried something different and it worked - sigh... I am sorry.
event_id = arr_events[0]['events'][0]['idEvent']
home_team = arr_events[0]['events'][0]['strHomeTeam']
away_team = arr_events[0]['events'][0]['strAwayTeam']
home_score = arr_events[0]['events'][0]['intHomeScore']
away_score = arr_events[0]['events'][0]['intAwayScore']
I assume this is the right way to do it.
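For what it's worth, since r2.json() already returns a plain dict, you can skip numpy entirely and index the parsed response directly; a minimal sketch:

event = r2.json()['events'][0]

home_team = event['strHomeTeam']
away_team = event['strAwayTeam']
home_score = event['intHomeScore']
away_score = event['intAwayScore']
print(f"{home_team} {home_score} - {away_score} {away_team}")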
You should look into
https://python-json-pointer.readthedocs.io/en/latest/tutorial.html
Inspect the JSON, work out the path to the value you want, and then use https://github.com/stefankoegl/python-json-pointer to resolve it.
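For example, a minimal sketch (assuming the package is installed with pip install jsonpointer) that resolves the home team from the parsed response in the question:

from jsonpointer import resolve_pointer

doc = r2.json()  # the parsed API response from the question
home_team = resolve_pointer(doc, '/events/0/strHomeTeam')
print(home_team)  # 'Aston Villa'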
I'm trying to figure out why my loop is printing everything in the dict and not just the values
films = {
"2005": ["Munich", "Steven Spielberg"],
"2006": [["The Prestige", "Christopher Nolan"], ["The Departed", "Martin Scorsese"]]
}
for year in films:
    print(year)
    for x in films[year]:
        print(films[year])
I would like it to print like this
2005
Munich, Steven Spielberg
2006
The prestige, Christopher Nolan
the Departed, Martin Scorsese
But instead it's printing like this, with brackets and apostrophes:
2005
['Munich', 'Steven Spielberg']
You're not using x (I suggest a better name).
This code works in both Python 2.7 and Python 3.x:
films = {
    "2005": ["Munich", "Steven Spielberg"],
    "2006": [["The Prestige", "Christopher Nolan"], ["The Departed", "Martin Scorsese"]]
}

for year in sorted(films.keys()):
    print(year)
    if isinstance(films[year][0], list):
        films_list = films[year]
    else:
        films_list = [films[year]]
    for film in films_list:
        print(", ".join(film))
    print("")
Instead of
print(films[year])
use
print(", ".join(x))
This joins the members of the inner list x (for example ["The Prestige", "Christopher Nolan"]) using the string ", " (comma and a space) as the separator between individual members. Note that the 2005 entry is a flat list of strings, so for that year you would print ", ".join(films[year]) once instead of looping over it.
I am trying to get the country name from the latitude and longitude points from my pandas dataframe.
Currently I have used geolocator.reverse(latitude,longitude) to get the full address of the geographic location. But there is no option to retrieve the country name from the full address as it returns a list.
Method used:
def get_country(row):
    pos = str(row['StartLat']) + ', ' + str(row['StartLong'])
    locations = geolocator.reverse(pos)
    return locations
Call to get_country by passing the dataframe:
df4['country'] = df4.apply(lambda row: get_country(row), axis = 1)
Current output:
StartLat StartLong Address
52.509669 13.376294 Potsdamer Platz, Mitte, Berlin, Deutschland, Europe
Just wondering whether there is some Python library to retrieve the country when we pass the geographic points.
Any help would be appreciated.
In your get_country function, your return value location will have an attribute raw, which is a dict that looks like this:
{
'address': {
'attraction': 'Potsdamer Platz',
'city': 'Berlin',
'city_district': 'Mitte',
'country': 'Deutschland',
'country_code': 'de',
'postcode': '10117',
'road': 'Potsdamer Platz',
'state': 'Berlin'
},
'boundingbox': ['52.5093982', '52.5095982', '13.3764983', '13.3766983'],
'display_name': 'Potsdamer Platz, Mitte, Berlin, 10117, Deutschland',
... and so on ...
}
so location.raw['address']['country'] gives 'Deutschland'
If I read your question correctly, a possible solution could be:
def get_country(row):
    pos = str(row['StartLat']) + ', ' + str(row['StartLong'])
    location = geolocator.reverse(pos)
    return location.raw['address']['country']
EDIT: The format of the location.raw object will differ depending on which geolocator service you are using. My example uses geopy.geocoders.Nominatim, from the example on geopy's documentation site, so your results might differ.
My code, hopefully that helps:
from geopy.geocoders import Nominatim
nm = Nominatim()
place, (lat, lng) = nm.geocode("3995 23rd st, San Francisco,CA 94114")
print('Country' + ": " + place.split()[-1])
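For the original question (country from latitude/longitude), a minimal sketch using Nominatim's reverse geocoding instead; note that newer geopy versions require a user_agent string, and "example-app" below is just a placeholder:

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="example-app")  # placeholder user agent
location = geolocator.reverse("52.509669, 13.376294")
print(location.raw['address'].get('country'))  # 'Deutschland'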
I'm not sure what service you're using with geopy, but as a small plug which I'm probably biased towards, this I think could be a simpler solution for you.
https://github.com/Ziptastic/ziptastic-python
from ziptastic import Ziptastic
# Set API key.
api = Ziptastic('<your api key>')
result = api.get_from_coordinates('42.9934', '-84.1595')
Which will return a list of dictionaries like so:
[
{
"city": "Owosso",
"geohash": "dpshsfsytw8k",
"country": "US",
"county": "Shiawassee",
"state": "Michigan",
"state_short": "MI",
"postal_code": "48867",
"latitude": 42.9934,
"longitude": -84.1595,
"timezone": "America/Detroit"
}
]
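Building on the snippet above, result[0]['country'] then gives the country for a coordinate pair, so applying it to the dataframe from the question might look like this sketch (reusing df4, StartLat and StartLong from the question):

df4['country'] = df4.apply(
    lambda row: api.get_from_coordinates(row['StartLat'], row['StartLong'])[0]['country'],
    axis=1,
)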