I would like to seek help regarding my Google Colaboratory notebook. The error occurs in the fourth cell.
Context:
We're web scraping BTC's historical data.
Here's my code:
First cell (Executed successfully)
# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
Second cell (Executed successfully)
#sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
#request the page
page = requests.get(url)
#creating a soup object and the parser
soup = BeautifulSoup(page.text, 'lxml')
#creating a table body to pass on the soup to find the table
table_body = soup.find('table')
#creating an empty list to store information
row_data = []
#creating a table
for row in table_body.find_all('tr'):
    col = row.find_all('td')
    col = [ele.text.strip() for ele in col]  # stripping the whitespaces
    row_data.append(col)  # append the column
# extracting all data on table entries
df = pd.DataFrame(row_data)
df
Third cell (Executed successfully)
headers = []
for i in soup.find_all('th'):
    col_name = i.text.strip().lower().replace(" ", "_")
    headers.append(col_name)
headers
Fourth cell (Execution failed)
df = pd.DataFrame(row_data, columns=headers)
df
#into a file
df.to_csv('/content/file.csv')
The error! :(
AssertionError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
563 try:
--> 564 columns = _validate_or_indexify_columns(content, columns)
565 result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
AssertionError: 13 columns passed, passed data had 7 columns
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
565 result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
566 except AssertionError as e:
--> 567 raise ValueError(e) from e
568 return result, columns
569
ValueError: 13 columns passed, passed data had 7 columns
To load the table you can simply use pd.read_html(), which parses every <table> on the page and returns a list of DataFrames. For example:
import pandas as pd
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
df = pd.read_html(url)[0]
print(df)
df.to_csv("data.csv")
This creates data.csv (screenshot from LibreOffice omitted).
To correct your example: soup.find_all('th') also picks up header cells from other tables on the page (13 in total), while each data row of the table you scraped has only 7 columns, hence the 13-vs-7 mismatch. Restrict both the header lookup and the row lookup to the same table, and skip the header-only rows:
# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
# sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
# request the page
page = requests.get(url)
# creating a soup object and the parser
soup = BeautifulSoup(page.text, "lxml")
# creating a table body to pass on the soup to find the table
table_body = soup.find("table")
# creating an empty list to store information
row_data = []
# creating a table
for row in table_body.select("tr:has(td)"):
col = row.find_all("td")
col = [ele.text.strip() for ele in col] # stripping the whitespaces
row_data.append(col) # append the column
# extracting all data on table entries
df = pd.DataFrame(row_data)
headers = []
for i in table_body.select("th"):
col_name = i.text.strip().lower().replace(" ", "_")
headers.append(col_name)
df = pd.DataFrame(row_data, columns=headers)
print(df)
df.to_csv("/content/file.csv")
Alternatively, the site exposes a JSON API that you can read directly:
import pandas as pd

df = pd.read_json(
    'https://www.bitrates.com/api/node/v1/symbols/USDTUSD/bitrates/series?aggregate=3&period=lastMonth'
).T['series'].to_dict()['data']
print(pd.DataFrame(df))
Output:
date open close ... supply market_volume24 btc_ratio
0 2021-04-11T06:00:00.000Z 0.999212 0.999114 ... 4.584629e+10 3.146109e+08 0.000016
1 2021-04-12T00:00:00.000Z 0.999114 0.999317 ... 4.584629e+10 2.100706e+09 0.000016
2 2021-06-04T18:00:00.000Z 0.999317 1.000613 ... 6.447629e+10 7.298208e+08 0.000025
3 2021-06-05T12:00:00.000Z 1.000613 1.000328 ... 0.000000e+00 6.502947e+09 0.000025
4 2021-06-06T06:00:00.000Z 1.000328 1.000499 ... 6.447629e+10 6.649574e+08 0.000025
5 2021-06-07T00:00:00.000Z 1.000499 1.000408 ... 6.447629e+10 8.272473e+09 0.000025
6 2021-06-07T18:00:00.000Z 1.000408 1.000338 ... 6.447629e+10 1.090599e+09 0.000025
7 2021-06-08T12:00:00.000Z 1.000338 1.000840 ... 6.447177e+10 2.196249e+09 0.000028
8 2021-06-09T06:00:00.000Z 1.000840 1.001088 ... 0.000000e+00 1.080053e+10 0.000028
9 2021-06-10T00:00:00.000Z 1.001088 1.000618 ... 6.447177e+10 4.158914e+09 0.000026
10 2021-06-10T18:00:00.000Z 1.000618 1.000436 ... 6.447177e+10 6.713012e+08 0.000026
11 2021-06-11T12:00:00.000Z 1.000436 1.000234 ... 6.447177e+10 4.093096e+09 0.000025
12 2021-06-12T06:00:00.000Z 1.000234 1.000385 ... 6.447177e+10 5.042653e+09 0.000026
13 2021-06-13T00:00:00.000Z 1.000385 1.000302 ... 0.000000e+00 5.502808e+09 0.000026
14 2021-06-13T18:00:00.000Z 1.000302 1.000110 ... 6.447177e+10 1.008952e+10 0.000024
15 2021-06-14T12:00:00.000Z 1.000110 1.000309 ... 6.447177e+10 7.405940e+09 0.000024
16 2021-06-15T06:00:00.000Z 1.000309 1.000205 ... 6.447177e+10 4.256491e+09 0.000023
17 2021-06-16T00:00:00.000Z 1.000205 1.000104 ... 0.000000e+00 1.495518e+09 0.000023
18 2021-06-16T18:00:00.000Z 1.000104 0.999833 ... 0.000000e+00 3.033091e+09 0.000024
19 2021-06-17T12:00:00.000Z 0.999833 1.000016 ... 6.447177e+10 1.449031e+08 0.000024
20 2021-07-10T00:00:00.000Z 1.000016 1.000100 ... 6.446977e+10 7.586923e+08 0.000025
21 2021-07-10T18:00:00.000Z 1.000100 1.000199 ... 6.446977e+10 2.312489e+09 0.000025
22 2021-07-11T12:00:00.000Z 1.000199 1.000134 ... 6.446977e+10 2.236517e+09 0.000024
23 2021-07-12T06:00:00.000Z 1.000134 1.000192 ... 6.446977e+10 8.140557e+09 0.000024
24 2021-07-13T00:00:00.000Z 1.000192 1.000290 ... 6.446977e+10 3.846952e+09 0.000026
25 2021-07-13T18:00:00.000Z 1.000290 1.000411 ... 6.446977e+10 1.278604e+09 0.000026
26 2021-07-14T12:00:00.000Z 1.000411 1.000315 ... 6.446977e+10 3.279535e+09 0.000026
27 2021-07-15T06:00:00.000Z 1.000315 1.000142 ... 6.446977e+10 8.086642e+08 0.000026
28 2021-07-16T00:00:00.000Z 1.000142 1.000295 ... 6.446977e+10 1.187211e+09 0.000027
29 2021-07-16T18:00:00.000Z 1.000295 1.000610 ... 6.446977e+10 7.721854e+08 0.000027
30 2021-07-17T12:00:00.000Z 1.000610 1.000535 ... 6.446977e+10 4.535049e+09 0.000027
31 2021-07-18T06:00:00.000Z 1.000535 1.000610 ... 6.446977e+10 2.345491e+09 0.000026
32 2021-07-19T00:00:00.000Z 1.000610 1.000386 ... 6.446977e+10 4.725531e+09 0.000027
33 2021-07-19T18:00:00.000Z 1.000386 1.000215 ... 6.446977e+10 3.314499e+09 0.000028
34 2021-07-20T12:00:00.000Z 1.000215 1.000324 ... 6.446977e+10 5.315525e+09 0.000030
35 2021-07-21T06:00:00.000Z 1.000324 1.000277 ... 6.446977e+10 7.141479e+09 0.000028
36 2021-07-22T00:00:00.000Z 1.000277 1.000255 ... 6.446977e+10 2.533840e+09 0.000028
37 2021-07-22T18:00:00.000Z 1.000255 1.000325 ... 6.446977e+10 2.699050e+09 0.000027
38 2021-07-23T12:00:00.000Z 1.000325 1.000363 ... 6.446977e+10 2.681340e+09 0.000026
39 2021-07-24T06:00:00.000Z 1.000363 1.000644 ... 6.446974e+10 6.241232e+08 0.000026
[40 rows x 10 columns]
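If you also want the CSV that the original question writes out, the same frame can be saved directly; a small sketch reusing the series loaded above (the file path is just an example):
btc_df = pd.DataFrame(df)
btc_df['date'] = pd.to_datetime(btc_df['date'])  # optional: turn the ISO strings into real datetimes
btc_df.to_csv('/content/file.csv', index=False)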
Related
Here I'm trying to read the page and create a CSV with the respective columns, but I'm unable to use the find function on the parsed data: the soup doesn't contain the data shown on the webpage.
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.fancraze.com/marketplace/sales/mornemorkel1?tab=latest-sales"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
The site uses an API to fetch its data, so you can call that API directly:
import pandas as pd
import requests
url = 'https://api.faze.app/v1/latestSalesInAGroup/mornemorkel1'
result = []
response = requests.get(url=url)
for data in response.json()['data']:
    data = {
        'id': data['momentId']['id'],
        'seller': data['sellerAddress']['userName'],
        'buyer': data['buyerAddress']['userName'],
        'price': data['price'],
        'created': data['createdAt']
    }
    result.append(data)
df = pd.DataFrame(result)
print(df)
OUTPUT:
id seller ... price created
0 1882 singal22 ... 8 2022-06-22T14:34:39.403Z
1 1737 olive_creepy2343 ... 7 2022-06-22T14:09:32.070Z
2 1256 tomato_wicked3294 ... 10 2022-06-22T13:49:20.895Z
3 1931 aquamarine_productive9244 ... 6 2022-06-22T13:41:49.153Z
4 1603 aquamarine_productive9244 ... 9 2022-06-22T13:28:01.624Z
.. ... ... ... ... ...
95 1026 olive_creepy2343 ... 7 2022-04-16T18:00:00.662Z
96 1719 Hhassan136 ... 5 2022-04-14T23:14:12.037Z
97 2054 Cricket101 ... 5 2022-04-14T21:30:13.185Z
98 1961 emzeden_9 ... 6 2022-04-14T18:02:05.194Z
99 1194 amaranth_curious1871 ... 5 2022-04-14T17:45:25.266Z
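The question also asks for a CSV with the respective columns; once the DataFrame above is built, saving it is one more line (the filename is just an example):
df.to_csv('latest_sales.csv', index=False)  # index=False drops the numeric row index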
I'm practicing some web scraping and for this project I'm scraping this website: https://assetdash.com/?all=true
I'm getting and parsing the HTML code as follows.
from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'https://assetdash.com/?all=true'
client = urlopen(my_url)
page_html = client.read()
client.close()
soup = BeautifulSoup(page_html, 'html.parser')
rows = soup.findAll("tr", {"class":"Table__Tr-sc-1pfmqa-5 gNrtPb"})
print(len(rows))
This returns a length of 0 whereas it should be returning a much higher value. Have I done something wrong with the parsing or am I retrieving the rows incorrectly?
The page is dynamic and JavaScript-rendered. Go straight to the source of the data.
Code:
import requests
import pandas as pd

my_url = 'https://assetdash.herokuapp.com/assets?currentPage=1&perPage=200&typesOfAssets[]=Stock&typesOfAssets[]=ETF&typesOfAssets[]=Cryptocurrency'
data = requests.get(my_url).json()
df = pd.DataFrame(data['data'])
Output:
print (df)
id ticker ... peRatio rank
0 60 AAPL ... 35.17 1
1 2287 MSFT ... 34.18 2
2 251 AMZN ... 91.52 3
3 1527 GOOGL ... 33.79 4
4 1276 FB ... 31.09 5
.. ... ... ... ... ...
195 537 BMWYY ... 15.06 196
196 3756 WBK ... 35.57 197
197 1010 DG ... 23.40 198
198 1711 HUM ... 12.77 199
199 1194 EQNR ... -15.82 200
[200 rows x 13 columns]
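If 200 rows is not enough, the currentPage and perPage query parameters in the URL suggest the endpoint is paginated; a sketch that walks the pages until an empty response comes back (assuming those parameters behave as their names imply):
import requests
import pandas as pd

base = ('https://assetdash.herokuapp.com/assets?currentPage={page}&perPage=200'
        '&typesOfAssets[]=Stock&typesOfAssets[]=ETF&typesOfAssets[]=Cryptocurrency')

frames = []
page = 1
while True:
    chunk = requests.get(base.format(page=page)).json().get('data', [])
    if not chunk:  # assumption: an empty 'data' list marks the last page
        break
    frames.append(pd.DataFrame(chunk))
    page += 1

all_assets = pd.concat(frames, ignore_index=True)
print(all_assets.shape)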
I'm trying to scrape data from a website, but the table has two sets of data: the first 2-3 rows are in thead and the rest are in tbody. I can easily extract data from one of them at a time, but when I try both I get errors like TypeError or AttributeError. BTW, I'm using Python.
Here is the code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url="https://www.worldometers.info/world-population/"
r=requests.get(url)
print(r)
html=r.text
soup=BeautifulSoup(html,'html.parser')
print(soup.title.text)
print()
print()
live_data=soup.find_all('div',id='maincounter-wrap')
print(live_data)
for i in live_data:
    print(i.text)
table_body=soup.find('thead')
table_rows=table_body.find_all('tr')
table_body_2=soup.find('tbody')
table_rows_2=soup.find_all('tr')
year_july1=[]
population=[]
yearly_change_in_perchantage=[]
yearly_change=[]
median_age=[]
fertillity_rate=[]
density=[]#density (p\km**)
urban_population_in_perchantage=[]
urban_population=[]
for tr in table_rows:
    td=tr.find_all('td')
    year_july1.append(td[0].text)
    population.append(td[1].text)
    yearly_change_in_perchantage.append(td[2].text)
    yearly_change.append(td[3].text)
    median_age.append(td[4].text)
    fertillity_rate.append(td[5].text)
    density.append(td[6].text)
    urban_population_in_perchantage.append(td[7].text)
    urban_population.append(td[8].text)
for tr in table_rows_2:
    td=tr.find_all('td')
    year_july1.append(td[0].text)
    population.append(td[1].text)
    yearly_change_in_perchantage.append(td[2].text)
    yearly_change.append(td[3].text)
    median_age.append(td[4].text)
    fertillity_rate.append(td[5].text)
    density.append(td[6].text)
    urban_population_in_perchantage.append(td[7].text)
    urban_population.append(td[8].text)
headers=['year_july1','population','yearly_change_in_perchantage','yearly_change','median_age','fertillity_rate','density','urban_population_in_perchantage','urban_population']
data_2= pd.DataFrame(list(zip(year_july1,population,yearly_change_in_perchantage,yearly_change,median_age,fertillity_rate,density,urban_population_in_perchantage,urban_population)),columns=headers)
print(data_2)
data_2.to_csv("C:\\Users\\data_2.csv")
You can try the code below; it generates the required data. Do let me know if you need any clarification:
import requests
import pandas as pd
url = 'https://www.worldometers.info/world-population/'
html = requests.get(url).content
df_list = pd.read_html(html, header=0)
df = df_list[0]
#print(df)
df.to_csv("data.csv", index=False)
This gives me the output below:
print(df)
Year (July 1) Population ... Urban Pop % Urban Population
0 2020 7794798739 ... 56.2 % 4378993944
1 2019 7713468100 ... 55.7 % 4299438618
2 2018 7631091040 ... 55.3 % 4219817318
3 2017 7547858925 ... 54.9 % 4140188594
4 2016 7464022049 ... 54.4 % 4060652683
5 2015 7379797139 ... 54.0 % 3981497663
6 2010 6956823603 ... 51.7 % 3594868146
7 2005 6541907027 ... 49.2 % 3215905863
8 2000 6143493823 ... 46.7 % 2868307513
9 1995 5744212979 ... 44.8 % 2575505235
10 1990 5327231061 ... 43.0 % 2290228096
11 1985 4870921740 ... 41.2 % 2007939063
12 1980 4458003514 ... 39.3 % 1754201029
13 1975 4079480606 ... 37.7 % 1538624994
14 1970 3700437046 ... 36.6 % 1354215496
15 1965 3339583597 ... N.A. N.A.
16 1960 3034949748 ... 33.7 % 1023845517
17 1955 2773019936 ... N.A. N.A.
[18 rows x 9 columns]
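If you'd rather stay with BeautifulSoup than pd.read_html, one way to cover both sections is to iterate the thead and tbody rows of the same table and accept either cell type, since header-section rows may use th where body rows use td (a sketch, assuming the first table on the page is the one you want and that its first row holds the column names):
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.worldometers.info/world-population/"
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

table = soup.find('table')  # assumption: the first table on the page is the target one
rows = []
for tr in table.select('thead tr, tbody tr'):
    cells = [c.get_text(strip=True) for c in tr.find_all(['th', 'td'])]
    if cells:
        rows.append(cells)

df = pd.DataFrame(rows[1:], columns=rows[0])  # assumption: rows[0] is the header row
print(df)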
I'm not able to get the data, only the headers, from the JSON data.
I have tried to use json_normalize, which creates a DataFrame from JSON data, but when I loop and append the data, the result is that I only get the headers.
import pandas as pd
import json
import requests
from pandas.io.json import json_normalize
import numpy as np
# importing json data
def get_json(file_path):
    r = requests.get('https://www.atg.se/services/racinginfo/v1/api/games/V75_2019-09-29_5_6')
    jsonResponse = r.json()
    with open(file_path, 'w', encoding='utf-8') as outfile:
        json.dump(jsonResponse, outfile, ensure_ascii=False, indent=None)
# Run the function and choose where to save the json file
get_json('../trav.json')
# Open the json file and print a list of the keys
with open('../trav.json', 'r') as json_data:
    d = json.load(json_data)
    print(list(d.keys()))
[Out]:
['#type', 'id', 'status', 'pools', 'races', 'currentVersion']
To get all data for the starts in one race I can use json_normalize function
race_1_starts = json_normalize(d['races'][0]['starts'])
race_1_starts_df = race_1_starts.drop('videos', axis=1)
print(race_1_starts_df)
[Out]:
distance driver.birth ... result.prizeMoney result.startNumber
0 1640 1984 ... 62500 1
1 1640 1976 ... 11000 2
2 1640 1968 ... 500 3
3 1640 1953 ... 250000 4
4 1640 1968 ... 500 5
5 1640 1962 ... 18500 6
6 1640 1961 ... 7000 7
7 1640 1989 ... 31500 8
8 1640 1960 ... 500 9
9 1640 1954 ... 500 10
10 1640 1977 ... 125000 11
11 1640 1977 ... 500 12
Above we get a DataFrame with data on all starts from one race. However, when I loop through all races to get the data on all starts for every race, I only get the headers from each race and not the data on the starts:
all_starts = []
for t in range(len(d['races'])):
    all_starts.append([t+1, json_normalize(d['races'][t]['starts'])])
all_starts_df = pd.DataFrame(all_starts, columns = ['race', 'starts'])
print(all_starts_df)
[Out]:
race starts
0 1 distance ... ...
1 2 distance ... ...
2 3 distance ... ...
3 4 distance ... ...
4 5 distance ... ...
5 6 distance ... ...
6 7 distance ... ...
In the output I want a DataFrame that merges the data on all starts from all races. Note that the number of columns can differ between races; if one race has 21 columns and another has 20, then all_starts_df should contain all columns, and where a race has no data for a column it should show NaN.
Expected result:
[Out]:
race distance driver.birth ... result.column_20 result.column_22
1 1640 1984 ... 12500 1
1 1640 1976 ... 11000 2
2 2140 1968 ... NaN 1
2 2140 1953 ... NaN 2
3 3360 1968 ... 1500 NaN
3 3360 1953 ... 250000 NaN
If you want all columns you can try this (I find a lot more than 20 columns, so I might have something wrong):
all_starts = []
headers = []
for idx, race in enumerate(d['races']):
    df = json_normalize(race['starts'])
    df['race'] = idx
    all_starts.append(df.drop('videos', axis=1))
    headers.append(set(df.columns))
# Create set of all columns for all races
columns = set.union(*headers)
# If columns are missing from one dataframe add it (as np.nan)
for df in all_starts:
    for c in columns - set(df.columns):
        df[c] = np.nan
# Concatenate all dataframes for each race to make one dataframe
df_all_starts = pd.concat(all_starts, axis=0, sort=True)
Alternatively, if you know the names of the columns you want to keep, try this
columns = ['race', 'distance', 'driver.birth', 'result.prizeMoney']
all_starts = []
for idx, race in enumerate(d['races']):
    df = json_normalize(race['starts'])
    df['race'] = idx
    all_starts.append(df[columns])
# Concatenate all dataframes for each race to make one dataframe
df_all_starts = pd.concat(all_starts, axis=0)
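As a side note, pd.concat with sort=True already aligns DataFrames that have different columns and fills the missing entries with NaN, so the manual column-filling loop above can also be skipped; a minimal sketch of the shorter variant:
all_starts = []
for idx, race in enumerate(d['races']):
    df = json_normalize(race['starts']).drop('videos', axis=1)
    df['race'] = idx
    all_starts.append(df)

# concat aligns mismatched columns across races and inserts NaN where a column is absent
df_all_starts = pd.concat(all_starts, axis=0, sort=True)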
I am doing some exploratory data analysis using finish-time data scraped from the 2018 KONA IRONMAN. I used JSON to format the data and pandas to read it into a CSV. The 'swim', 'bike', and 'run' columns should be formatted as HH:MM:SS to be usable; however, I am receiving a ValueError: ('Unknown string format:', '--:--:--').
print(data.head(2))
print(kona.info())
print(kona.describe())
Name div_rank ... bike run
0 Avila, Anthony 2470 138 ... 05:27:59 04:31:56
1 Lindgren, Mikael 1050 151 ... 05:17:51 03:49:20
swim 2472 non-null object
bike 2472 non-null object
run 2472 non-null object
Name div_rank ... bike run
count 2472 2472 ... 2472 2472
unique 2472 288 ... 2030 2051
top Jara, Vicente 986 -- ... --:--:-- --:--:--
freq 1 165 ... 122 165
How should I use pd.to_datetime to properly format the 'bike', 'swim', and 'run' columns, and, for future use, sum these columns and append a 'Total Finish Time' column? Thanks!
The reason for the error is that it can't parse a time from '--:--:--'. So you'd need to convert all of those to '00:00:00', but that would imply they did the event in zero time. The other option is to convert only the times that are present, leaving a null where there is no time. This also introduces a date of 1900-01-01 when you convert to datetime, so I added .dt.time so that only the time displays.
timed_events = ['bike', 'swim', 'run']
for event in timed_events:
    result[event] = pd.to_datetime(result[result[event] != '--:--:--'][event], format="%H:%M:%S").dt.time
The problem with this, though, is that I remember you wanted to sum those times, which would require some extra conversions. So I suggest using .to_timedelta() instead. It works the same way, in that you still need to exclude the '--:--:--' entries, but then you can sum the times. I also added a column with the number of events completed, so that if you want to sort by best times you can filter out anyone who hasn't competed in all three events, as they would otherwise appear to have better times simply because they are missing entire events (there's a small sketch of that filter after the output at the end).
I'll also add, regarding the comment of:
"You think providing all the code will be helpful but it does not. You
will get a quicker and more useful response if you keep the code
minimum that can replicate your issue.stackoverflow.com/help/mcve –
mad_ "
I'll give him the benefit of the doubt: he saw the whole code and didn't realize that what you provided was already the minimal code needed to replicate your issue, since no one wants to write code just to generate your data to work with. Sometimes you can explicitly state that in your question.
i.e.:
Here's the code to generate my data:
CODE PART 1
import bs4
import pandas as pd
code...
But now that I have the data, here's where I'm having trouble:
df = pd.to_timedelta()...
...
Luckily I remembered helping you earlier on this, so I knew I could go back and get that code. So the code you originally had was fine.
But here's the full code I used, which stores the CSV in a different way than you originally had. You can change that part, but the end part is what you'll need:
from bs4 import BeautifulSoup, Comment
from collections import defaultdict
import requests
import pandas as pd
sauce = 'http://m.ironman.com/triathlon/events/americas/ironman/world-championship/results.aspx'
r = requests.get(sauce)
data = r.text
soup = BeautifulSoup(data, 'html.parser')
def parse_table(soup):
    result = defaultdict(list)
    my_table = soup.find('tbody')
    for node in my_table.children:
        if isinstance(node, Comment):
            # Get content and strip comment "<!--" and "-->"
            # Wrap the rows in "table" tags as well.
            data = '<table>{}</table>'.format(node[4:-3])
            break
    table = BeautifulSoup(data, 'html.parser')
    for row in table.find_all('tr'):
        name, _, swim, bike, run, div_rank, gender_rank, overall_rank = [col.text.strip() for col in row.find_all('td')[1:]]
        result[name].append({
            'div_rank': div_rank,
            'gender_rank': gender_rank,
            'overall_rank': overall_rank,
            'swim': swim,
            'bike': bike,
            'run': run,
        })
    return result

jsonObj = parse_table(soup)

result = pd.DataFrame()
for k, v in jsonObj.items():
    temp_df = pd.DataFrame.from_dict(v)
    temp_df['name'] = k
    result = result.append(temp_df)

result = result.reset_index(drop=True)
result.to_csv('C:/data.csv', index=False)

# However you read in your csv/dataframe, use the code below on it to get those times
timed_events = ['bike', 'swim', 'run']
for event in timed_events:
    result[event] = pd.to_timedelta(result[result[event] != '--:--:--'][event])

result['total_events_participated'] = 3 - result.isnull().sum(axis=1)
result['total_times'] = result[timed_events].sum(axis=1)
Output:
print (result)
bike div_rank ... total_events_participated total_times
0 05:27:59 138 ... 3 11:20:06
1 05:17:51 151 ... 3 10:16:17
2 06:14:45 229 ... 3 14:48:28
3 05:13:56 162 ... 3 10:19:03
4 05:19:10 6 ... 3 09:51:48
5 04:32:26 25 ... 3 08:23:26
6 04:49:08 155 ... 3 10:16:16
7 04:50:10 216 ... 3 10:55:47
8 06:45:57 71 ... 3 13:50:28
9 05:24:33 178 ... 3 10:21:35
10 06:36:36 17 ... 3 14:36:59
11 NaT -- ... 0 00:00:00
12 04:55:29 100 ... 3 09:28:53
13 05:39:18 72 ... 3 11:44:40
14 04:40:41 -- ... 2 05:35:18
15 05:23:18 45 ... 3 10:55:27
16 05:15:10 3 ... 3 10:28:37
17 06:15:59 78 ... 3 11:47:24
18 NaT -- ... 0 00:00:00
19 07:11:19 69 ... 3 15:39:51
20 05:49:02 29 ... 3 10:32:36
21 06:45:48 4 ... 3 13:39:17
22 04:39:46 -- ... 2 05:48:38
23 06:03:01 3 ... 3 11:57:42
24 06:24:58 193 ... 3 13:52:57
25 05:07:42 116 ... 3 10:01:24
26 04:44:46 112 ... 3 09:29:22
27 04:46:06 55 ... 3 09:32:43
28 04:41:05 69 ... 3 09:31:32
29 05:27:55 68 ... 3 11:09:37
... ... ... ... ...
2442 NaT -- ... 0 00:00:00
2443 05:26:40 3 ... 3 11:28:53
2444 05:04:37 19 ... 3 10:27:13
2445 04:50:45 74 ... 3 09:15:14
2446 07:17:40 120 ... 3 14:46:05
2447 05:26:32 45 ... 3 10:50:48
2448 05:11:26 186 ... 3 10:26:00
2449 06:54:15 185 ... 3 14:05:16
2450 05:12:10 22 ... 3 11:21:37
2451 04:59:44 45 ... 3 09:29:43
2452 06:03:59 96 ... 3 12:12:35
2453 06:07:27 16 ... 3 12:47:11
2454 04:38:06 91 ... 3 09:52:27
2455 04:41:56 14 ... 3 08:58:46
2456 04:38:48 85 ... 3 09:18:31
2457 04:42:30 42 ... 3 09:07:29
2458 04:40:54 110 ... 3 09:32:34
2459 06:08:59 37 ... 3 12:15:23
2460 04:32:20 -- ... 2 05:31:05
2461 04:45:03 96 ... 3 09:30:06
2462 06:14:29 95 ... 3 13:38:54
2463 06:00:20 164 ... 3 12:10:03
2464 05:11:07 22 ... 3 10:32:35
2465 05:56:06 188 ... 3 13:32:48
2466 05:09:26 2 ... 3 09:54:55
2467 05:22:15 7 ... 3 10:26:14
2468 05:53:14 254 ... 3 12:34:21
2469 05:00:29 156 ... 3 10:18:29
2470 04:30:46 7 ... 3 08:38:23
2471 04:34:59 39 ... 3 09:04:13
[2472 rows x 9 columns]
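As mentioned above, once total_events_participated and total_times are in place you can filter out anyone who missed an event and then rank by total time; a minimal sketch reusing the result DataFrame from the code above:
# keep athletes who completed all three events, then sort fastest first
finishers = result[result['total_events_participated'] == 3]
fastest = finishers.sort_values('total_times')
print(fastest[['name', 'total_times']].head(10))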