Python 2.7 csv download from URL - python

I'm trying to do some basic analysis on historical Ether prices for a school project. My problem is, I think, quite a simple one. I wrote a function that downloads the data from the URL, but the format is wrong: I get a DataFrame of shape (0, ~14k). So the download works, but I'm not sure how to get the data into a form I can use.
I see two possibilities. Either I reformat the DataFrame after the download, which I will try to do, or I download it in the correct format in the first place, which would be the better and more elegant solution.
My problem is that I don't know how to do the second, and I may not succeed with the first, which is why I'm posting this.
def get_stock_price_csv_from_poloniex():
    import requests
    from pandas import DataFrame
    from io import StringIO
    url = 'https://poloniex.com/public?command=returnChartData&currencyPair=USDT_ETH&start=1435699200&end=9999999999&period=14400'
    csv = requests.get(url)
    if csv.ok:
        return DataFrame.from_csv(StringIO(csv.text), sep=',')
    else:
        return None

The source data is not CSV, it's JSON. Luckily, pandas provides facilities for working with JSON as well.
import requests
from pandas.io.json import json_normalize
url = 'https://poloniex.com/public?command=returnChartData&currencyPair=USDT_ETH&start=1435699200&end=9999999999&period=14400'
resp = requests.get(url)
data_frame = json_normalize(resp.json())
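
Once the JSON is loaded, the rows usually still need a usable time index. Here is a minimal sketch, assuming the endpoint returns a list of flat records with a Unix-epoch date field. The sample records are made up for illustration and are not real prices:

```python
import pandas as pd

# Hypothetical sample shaped like the list of records the API is assumed to return.
sample = [
    {"date": 1435699200, "open": 0.5, "close": 0.6},
    {"date": 1435713600, "open": 0.6, "close": 0.7},
]

df = pd.DataFrame(sample)
# Convert epoch seconds to timestamps and use them as the index.
df["date"] = pd.to_datetime(df["date"], unit="s")
df = df.set_index("date")
```

With a datetime index in place, time-based operations such as resampling or slicing by date range become available.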

Related

How to get json file into panda dataframe python

How can I load this JSON file into a pandas DataFrame? https://data.cdc.gov/resource/8xkx-amqh.json
I tried reading the data with Socrata and it worked. However, it has a row limit and I need the whole dataset.
This is what I have:
import pandas as pd
from sodapy import Socrata
client = Socrata("data.cdc.gov", app_token=None)
# First 5000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
vcounty = client.get_all("8xkx-amqh", limit=5000)
# Convert to pandas DataFrame
vcounty_df = pd.DataFrame.from_records(vcounty)
But I want the whole dataset, and from what I understand Socrata has a limit that is lower than what I need.
The API is limited for unauthorized users, but you can download all of the data in CSV format and load it into a DataFrame. There are more than 1.5 million rows.
# pip install requests
# pip install pandas
import requests
import pandas as pd
import io
urlData = requests.get('https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD').content
df = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
df
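
If 1.5+ million rows turn out to be too much to hold comfortably in memory, reading the CSV in chunks is one option. This is only a sketch: the tiny in-memory CSV and its column names are hypothetical stand-ins for the real download.

```python
import io
import pandas as pd

# Small in-memory stand-in for the downloaded CSV text (hypothetical columns).
csv_text = "county,doses\nA,1\nB,2\nC,3\nD,4\nE,5\n"

# chunksize makes read_csv return an iterator of DataFrames instead of one big frame.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)
total_rows = 0
total_doses = 0
for chunk in chunks:
    # Each chunk is an ordinary DataFrame, so it can be aggregated or filtered here.
    total_rows += len(chunk)
    total_doses += int(chunk["doses"].sum())
```

In practice the chunk size would be much larger (for example 100,000 rows), and each chunk would be filtered or aggregated before moving on to the next.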

Solution for redirecting link which is present inside the csv file

I want a solution that reads a CSV file and follows the links it contains.
The solution can be in any programming language.
First, the CSV file, which contains the links, should be read.
Then the program should open each of those links.
In Python, you can use pandas and urllib to do this.
Example:
import pandas as pd
from urllib.request import urlopen
df = pd.read_csv("<your_filename>", index_col=None)
for index, row in df.iterrows():
    urlopen(row["<column_name_containing_links>"])
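
If the goal is to find out where each link finally redirects to, one possible extension is to record the final URL and tolerate broken links. This is a sketch, not a definitive implementation: resolve_links and its fetch parameter are hypothetical names introduced here, and fetch exists only so the function can be tried without network access.

```python
import pandas as pd
from urllib.error import URLError
from urllib.request import urlopen


def resolve_links(df, column, fetch=urlopen):
    """Return the final URL for each non-empty link in df[column], or None on failure."""
    final_urls = []
    for link in df[column].dropna():
        try:
            # urlopen follows HTTP redirects; geturl() reports where it ended up.
            with fetch(link) as response:
                final_urls.append(response.geturl())
        except (URLError, ValueError):
            # Unreachable host, HTTP error, or malformed URL.
            final_urls.append(None)
    return final_urls
```

A call such as resolve_links(df, "<column_name_containing_links>") would then return one entry per non-empty link.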

How to extract and save a table shown in a specific tab from a website using pandas and python?

I want to extract this table http://pfam.xfam.org/family/PF00018#tabview=tab9 using python and pandas to dump into a csv file. I have tried:
import requests
import pandas as pd
url = 'http://pfam.xfam.org/family/PF00018#tabview=tab9'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[0]
I tried every index available in df_list, but the table of interest is not present.
It seems the table you want is loaded by JavaScript. Open your browser's developer tools and you will see that it is loaded via an AJAX request to http://pfam.xfam.org/family/PF00018/mapping
Building off of @hunzter's answer, here's some code to load a table from that page:
import pandas as pd
tables = pd.read_html("http://pfam.xfam.org/family/PF00018/mapping")
print(tables[0])

Converting HTML table to CSV file using python

I am very new to pandas. I wanted to convert this HTML table to a CSV file with pandas; however, my CSV file contains a weird sign, and not all of the table was converted to CSV.
Here's my code. I read about using BeautifulSoup, but I'm not too sure how to use it.
import as pandas
df = pd.read_html('https://aim-sg.caas.gov.sg/aip/2020-10-13/final/2020-09-10-Non-AIR'
'AC/html/eAIP/ENR-3.1-en-GB.html?s=B2EE1C5E1D2A684224A194E69D18338A560504FC#ENR-3.1')
df[0].to_csv('ENR3.0.csv')
Thank you!
Edit: I have changed my import to import pandas as dp, but I still did not manage to convert the whole HTML table to a CSV file.
Greatly appreciate all your help!
You can use pandas itself to do this. The import statement is wrong. Here is the correct version:
import pandas as pd
df = pd.read_html('https://aim-sg.caas.gov.sg/aip/2020-10-13/final/2020-09-10-Non-AIR'
'AC/html/eAIP/ENR-3.1-en-GB.html?s=B2EE1C5E1D2A684224A194E69D18338A560504FC#ENR-3.1')
df[0].to_csv('ENR3.0.csv', index = False)
If you want to save all the DataFrames in the list df, replace the last line with this:
for x in range(len(df)):
    df[x].to_csv(f"CSV_File_{x+1}.csv", index = False)
There is an issue in the import statement.
It should be import pandas as pd and not import as pandas, since you use the alias pd in the code below.
Read up on Beautiful Soup and use the lxml parser to parse the required data (it is very fast).
This link might help you out:
BeautifulSoup different parsers
If you need any other help, leave a comment on this post and I will try to sort out your issue :)
Corrected code:
import pandas as pd
df = pd.read_html('https://aim-sg.caas.gov.sg/aip/2020-10-13/final/2020-09-10-Non-AIR'
'AC/html/eAIP/ENR-3.1-en-GB.html?s=B2EE1C5E1D2A684224A194E69D18338A560504FC#ENR-3.1')
df[0].to_csv('ENR3.0.csv')
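
The "weird sign" mentioned in the question may be an encoding problem, for example non-ASCII characters shown incorrectly when the CSV is opened in Excel. That is an assumption, since the question does not show the output. If it applies, writing the file with a byte-order mark is one thing to try. The table contents and file name below are hypothetical:

```python
import os
import tempfile
import pandas as pd

# Hypothetical table containing a non-ASCII character.
table = pd.DataFrame({"route": ["A1"], "remark": ["25°N"]})

path = os.path.join(tempfile.mkdtemp(), "ENR3.1.csv")
# "utf-8-sig" prepends a byte-order mark, which Excel uses to detect UTF-8.
table.to_csv(path, index=False, encoding="utf-8-sig")

# Inspect the first three bytes to confirm the byte-order mark was written.
with open(path, "rb") as f:
    first_bytes = f.read(3)
```

Reading the file back with pd.read_csv(path, encoding="utf-8-sig") strips the mark again.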

How to print json data from url to excel?

import urllib
import json
import re
import csv
from bs4 import BeautifulSoup
game_code = open("/Users//Desktop/PYTHON/gc.txt").read()
game_code = game_code.split("\r")
for gc in game_code:
    htmltext = urllib.urlopen("http://cluster.leaguestat.com/feed/index.php?feed=gc&key=f109cf290fcf50d4&client_code=ohl&game_id="+gc+"&lang_code=en&fmt=json&tab=pxpverbose")
    soup = BeautifulSoup(htmltext, "html.parser")
    j = json.loads(soup.text)
    summary = ['GC'],['Pxpverbose']
    for event in summary:
        print gc, ["event"]
I cannot seem to access the lib to print the proper headers and rows. I ultimately want to export specific rows to CSV. I downloaded Python two days ago, so I am very new. I need this one data set for a project. Any advice or direction would be greatly appreciated.
Here are a few game codes if anyone wants to take a look. Thanks
21127,20788,20922,20752,21094,21196,21295,21159,21128,20854,21057
Here are a few thoughts:
I'd like to point out the excellent requests as an alternative to urllib for all your HTTP needs in Python (you may need to pip install requests).
requests comes with a built-in json decoder (you don't need BeautifulSoup).
In fact, you have already imported a great module (csv) to print headers and rows of data. You can also use this module to write the data to a file.
Your data is returned as a dictionary (dict) in Python, a data structure indexed by keys. You can access the values (I think this is what you mean by "specific rows") in your data with these keys.
One of many possible ways to accomplish what you want:
import requests
import csv
game_code = open("/Users//Desktop/PYTHON/gc.txt").read()
game_code = game_code.split("\r")
for gc in game_code:
    r = requests.get("http://cluster.leaguestat.com/feed/index.php?feed=gc&key=f109cf290fcf50d4&client_code=ohl&game_id="+gc+"&lang_code=en&fmt=json&tab=pxpverbose")
    data = r.json()
    with open("my_data.csv", "a") as csvfile:
        wr = csv.writer(csvfile, delimiter=',')
        for summary in data["GC"]["Pxpverbose"]:
            wr.writerow([gc, summary["event"]])
            # add keys to write additional values;
            # e.g. summary["some-key"]. Example:
            # wr.writerow([gc,summary["event"],summary["id"]])
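
Splitting the file contents on "\r" only works when the file uses bare carriage returns as line endings, and the sample codes in the question are comma-separated. A more forgiving way to read the codes might look like this sketch, where parse_game_codes is a hypothetical helper name:

```python
import re


def parse_game_codes(text):
    """Split on commas or any run of whitespace (including \r and \n), dropping empty pieces."""
    return [code for code in re.split(r"[,\s]+", text) if code]


codes = parse_game_codes("21127,20788\r\n20922\n")
```

This keeps a trailing newline or a stray blank line from producing an empty game code and a malformed request URL.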
You don't need beautiful soup for this; the data can be read directly from the URL into JSON format.
import urllib, json
response = urllib.urlopen("http://cluster.leaguestat.com/feed/index.php?feed=gc&key=f109cf290fcf50d4&client_code=ohl&game_id=" + gc +"&lang_code=en&fmt=json&tab=pxpverbose")
data = json.loads(response.read())
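
The snippet above is Python 2. In Python 3, urlopen lives in urllib.request and returns bytes. A rough equivalent is sketched below: build_feed_url and fetch_game are hypothetical helper names, and the query parameters are copied from the URL in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://cluster.leaguestat.com/feed/index.php"


def build_feed_url(game_id):
    """Assemble the feed URL for one game from its query parameters."""
    params = {
        "feed": "gc",
        "key": "f109cf290fcf50d4",
        "client_code": "ohl",
        "game_id": game_id,
        "lang_code": "en",
        "fmt": "json",
        "tab": "pxpverbose",
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_game(game_id):
    """Download and parse the JSON feed for one game (requires network access)."""
    with urlopen(build_feed_url(game_id)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the query string with urlencode avoids manual string concatenation and escapes any special characters in the values.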
At this point, data is the parsed JSON of your web page.
Excel can read CSV files, so the easiest route is to export the data you want to a CSV file using the csv module.
This should be enough to get you started. Modify fieldnames to choose which event details appear as columns in the CSV file.
import csv
with open('my_games.csv', 'w') as csvfile:
    fieldnames = ['event', 'id']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames,
                            extrasaction='ignore')
    writer.writeheader()
    for event in data['GC']['Pxpverbose']:
        writer.writerow(event)
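
To see what extrasaction='ignore' does without touching the network or the disk, here is a small sketch that writes to an in-memory buffer. The event dictionaries are hypothetical; the real feed entries are assumed to carry more keys than the two written out.

```python
import csv
import io

# Hypothetical events; "period" stands in for any key not listed in fieldnames.
data = {"GC": {"Pxpverbose": [
    {"event": "goal", "id": "1", "period": "1"},
    {"event": "shot", "id": "2", "period": "1"},
]}}

buffer = io.StringIO()
# Keys missing from fieldnames are silently dropped instead of raising ValueError.
writer = csv.DictWriter(buffer, fieldnames=["event", "id"], extrasaction="ignore")
writer.writeheader()
writer.writerows(data["GC"]["Pxpverbose"])
lines = buffer.getvalue().splitlines()
```

When writing to a real file in Python 3, open it with newline="" so the csv module controls the line endings.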
