downloading CSV file in python using pandas - python

I am trying to download a CSV file with Python, but for some reason I cannot do it. I suppose I need to pass an additional argument to read_csv?
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)

Your code downloads the content from the URL and loads it into the DataFrame named 'df'; it does not write anything to disk.
To save the data as a CSV file, add the last line shown below. The output file is written to the current working directory, which is usually the directory you run the script from.
import pandas as pd
url = "https://raw.githubusercontent.com/UofGAnalyticsData/"\
"DPIP/main/assesment_datasets/assessment3/starwars.csv"
df = pd.read_csv(url)
df.to_csv('output.csv')
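If the goal is only to get the file onto disk, the bytes can also be saved without parsing them into a DataFrame first. A minimal sketch using the standard library's urllib.request; the helper name download_file is made up for illustration and is not part of pandas:

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_file(url, dest):
    """Copy the raw bytes at `url` into the local file `dest` and return its path."""
    dest = Path(dest)
    # urlopen returns a file-like response; copyfileobj streams it in chunks
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Calling download_file(url, "starwars.csv") with the URL from the question would write the file unchanged, so no index column is added and no values are reformatted.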

Related

Issue with pandas dataframe

When I tried to read data from a CSV file, the result came out in a weird format.
import pandas as pd
df = pd.read_csv('HDMA Boston Housing Data.csv')
df
The file seems to be a .txt file that was renamed and is being read as a .csv file.

How to Read CSV from url in pandas? - error tokenizing data

How can I download the following file in Python? I have no issue doing this in R. I believe the problem is the last row in the file, which will change. How can I change the code to make it work?
import pandas as pd
url = "https://ark-funds.com/wp-content/uploads/funds-etf-csv/ARK_INNOVATION_ETF_ARKK_HOLDINGS.csv"
test = pd.read_csv(url)
It is better to download the CSV file first using the requests module.
You can then read the file from the download directory by passing its file path instead of the URL (pd.read_csv(download_path)).
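Once the file is on disk, the parser can also be told to ignore trailing non-data rows, which are one common cause of a tokenizing error. A sketch, assuming the problem really is a fixed number of footer lines (the question only suspects this); read_csv_without_footer and footer_rows are illustrative names:

```python
import pandas as pd


def read_csv_without_footer(path, footer_rows=1):
    """Read a CSV while ignoring the last `footer_rows` lines of the file."""
    # skipfooter is only supported by the slower pure-Python parser engine
    return pd.read_csv(path, skipfooter=footer_rows, engine="python")
```

If the number of footer lines varies from day to day, a fixed count will not be enough and the lines would have to be filtered some other way.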

Solution for redirecting link which is present inside the csv file

I want a solution that reads a CSV file and follows the links inside it.
A solution in any programming language is fine.
First, the CSV file, which contains the links, should be read.
Then it should redirect to (open) each of the links.
In Python, you can do this with pandas and urllib.
Example:
import pandas as pd
from urllib.request import urlopen
df = pd.read_csv("<your_filename>", index_col=None)
for index, row in df.iterrows():
    urlopen(row["<column_name_containing_links>"])
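The loop above stops at the first link that fails to open. A slightly more defensive sketch, assuming the links sit in one named column; resolve_links and the timeout value are illustrative choices, not part of the original answer:

```python
import pandas as pd
from urllib.request import urlopen


def resolve_links(csv_path, column, timeout=10):
    """Open every URL in `column` and return (original, final_url_or_None) pairs."""
    df = pd.read_csv(csv_path)
    results = []
    for link in df[column].dropna():
        try:
            # urlopen follows HTTP redirects itself; geturl() is the final address
            with urlopen(link, timeout=timeout) as response:
                results.append((link, response.geturl()))
        except (OSError, ValueError):
            # OSError covers URLError, HTTP error statuses, and timeouts;
            # ValueError is raised for a malformed URL
            results.append((link, None))
    return results
```

Each pair shows where a link ended up after any redirects, with None marking links that could not be opened.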

Reading in a stata .dta file as a python pandas data frame using pd.read_stata()

I want to read in a .dta file as a pandas data frame.
I've tried using code from https://www.fragilefamilieschallenge.org/using-dta-files-in-python/ but it gives me an error.
Thanks for any help!
import pandas as pd
df_path = "https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1"
df = None
with open(df_path, "r") as f:
    df = pd.read_stata(f)
print df.head()
open can be used when you have a file saved locally on your machine. With pd.read_stata this is not necessary, however, as you can pass the file path directly as a parameter.
In this case you want to read a .dta file from a URL, which open cannot do, so this does not apply. The solution is simple, though: pd.read_stata can read files from URLs directly.
import pandas as pd
url = 'https://zenodo.org/record/3635384/files/B-PROACT1V%20Year%204%20%26%206%20child%20BP%2C%20BMI%20and%20PA%20dataset.dta?download=1'
df = pd.read_stata(url)
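For a file that is already saved locally, both forms work: passing the path, or passing a file object opened in binary mode ("rb" rather than "r", since .dta is a binary format). A small round-trip sketch; the file name and the sample data are made up:

```python
import os
import tempfile

import pandas as pd

# Made-up sample data written to a temporary .dta file
sample = pd.DataFrame({"child_id": [1, 2, 3], "bmi": [16.2, 17.8, 15.9]})
dta_path = os.path.join(tempfile.mkdtemp(), "sample.dta")
sample.to_stata(dta_path, write_index=False)

# Option 1: hand the path straight to read_stata
from_path = pd.read_stata(dta_path)

# Option 2: open the file yourself, in binary mode
with open(dta_path, "rb") as f:
    from_handle = pd.read_stata(f)
```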

Download multiple CSV files from a list in a single CSV (Python)

I have a 2 column CSV with download links in the first column and company symbols in the second column. For example:
http://data.com/data001.csv, BHP
http://data.com/data001.csv, TSA
I am trying to loop through the list so that Python opens each CSV via the download link and saves it separately as the company name. Therefore each file should be downloaded and saved as follows:
BHP.csv
TSA.csv
Below is the code I am using. It currently exports the entire CSV into a single row tabbed format, then loops back and does it again and again in an infinite loop.
import pandas as pd
data = pd.read_csv('download_links.csv', names=['download', 'symbol'])
file = pd.DataFrame()
cache = []
for d in data.download:
    df = pd.read_csv(d,index_col=None, header=0)
    cache.append(df)
    file = pd.DataFrame(cache)
    for s in data.symbol:
        file.to_csv(s+'.csv')
print("done")
Up until I convert the list 'cache' into the DataFrame 'file' to export it, the data is formatted perfectly. It's only when it gets converted to a DataFrame that the trouble starts.
I'd love some help on this one as I've been stuck on it for a few hours.
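import pandas as pd
# The links file has no header row, so name the two columns explicitly;
# skipinitialspace drops the space that follows each comma
data = pd.read_csv('download_links.csv', names=['download', 'symbol'], skipinitialspace=True)
links = data.download
file_names = data.symbol
for link, file_name in zip(links, file_names):
    # to_csv returns None when given a path, so there is nothing to assign
    pd.read_csv(link).to_csv(file_name + '.csv', index=False)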
Alternatively, iterate over both fields in parallel:
for download, symbol in data.itertuples(index=False):
    df = pd.read_csv(download, index_col=None, header=0)
    df.to_csv('{}.csv'.format(symbol))
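With many links, one bad URL would abort the whole run. A sketch of the same per-row loop that records failures and carries on, assuming the two-column, header-less layout from the question; save_each and out_dir are illustrative names:

```python
from pathlib import Path

import pandas as pd


def save_each(links_csv, out_dir="."):
    """Save every linked CSV as <symbol>.csv; return the symbols that failed."""
    links = pd.read_csv(links_csv, names=["download", "symbol"],
                        skipinitialspace=True)  # drop the space after each comma
    failed = []
    for download, symbol in links.itertuples(index=False):
        try:
            df = pd.read_csv(download)
        except Exception:
            # Broad on purpose: network and parsing errors are both just recorded
            failed.append(symbol)
            continue
        df.to_csv(Path(out_dir) / f"{symbol}.csv", index=False)
    return failed
```

The returned list makes it easy to retry only the symbols whose download or parse failed.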
