Read CSV file with a variable name from a website - python

I am using Jupyter and I would like to read a CSV file from a website.
The problem I'm facing is that the file changes its name according to the current time. For example, if it is now 11/21/2019, 02:45:33, then the name will be "Visao_329465_11212019_024533.csv".
So I can't simply use this:
import pandas as pd
url="https://anythint.csv"
c=pd.read_csv(url)
This returns the error: ParserError: Error tokenizing data. C error: Expected 1 fields in line 31, saw 2
Any idea?

Try:
import pandas as pd
url="https://anythint.csv"
c=pd.read_csv(url, error_bad_lines=False)  # in pandas >= 1.3, use on_bad_lines="skip" instead
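If the blocker is the timestamped filename itself, a sketch that rebuilds the URL from the current time may help. The "Visao_329465_" prefix and timestamp format are taken from the question; the host name is a placeholder, not the real site:

```python
import pandas as pd
from datetime import datetime

# The site names the file after the current time, e.g.
# Visao_329465_11212019_024533.csv for 11/21/2019, 02:45:33.
stamp = datetime.now().strftime("%m%d%Y_%H%M%S")
url = "https://example.com/Visao_329465_{}.csv".format(stamp)  # base URL is hypothetical
# c = pd.read_csv(url, on_bad_lines="skip")  # fetch once the real host is filled in
```

Note that this only works if the request lands within the same second the server stamps the file with; otherwise you would need to list the directory or scrape the link from the page.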

Related

How to import the json file in python

I am trying to import a JSON file into Python and then export it to an Excel file using the following code:
import pandas as pd
df = pd.read_json('pub_settings.json')
df.to_excel('pub_settings.xlsx')
but I am getting the following error:
Can anyone please tell me what I am doing wrong?
First, import the JSON file as a dictionary using the following code:
import json
with open("pub_settings.json") as f:
    data = json.load(f)
Then you can use the following library to convert it to xlsx:
https://pypi.org/project/tablib/0.9.3/
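tablib works, but pandas alone can also do the conversion. A minimal sketch, with an inline dict standing in for pub_settings.json since its real structure is unknown:

```python
import pandas as pd

# Stand-in for the contents of pub_settings.json; the real structure is unknown
data = {"name": "pub1", "settings": {"width": 10, "height": 20}}

# json_normalize flattens nested keys into dotted column names
df = pd.json_normalize(data)  # columns: name, settings.width, settings.height
# df.to_excel("pub_settings.xlsx", index=False)  # requires openpyxl installed
```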

Reading a .cat file that is a table into pandas

I am trying to read numerous ".cat" (catalog) files into pandas tables so I can manipulate the data more easily. I am not familiar with ".cat" files, but in this case each file looks like a text table with columns and data. I tried using pd.read_csv(filename), since I figured it was space-separated rather than comma-separated, but otherwise similar.
clustname = ["SpARCS-0035", "SpARCS-0219", "SpARCS-0335", "SpARCS-1034", "SpARCS-1051", "SpARCS-1616",
             "SpARCS-1634", "SpARCS-1638", "SPTCL-0205", "SPTCL-0546", "SPTCL-2106"]
for iclust in range(len(clustname)):
    rfcolpath = restframe + 'RESTFRAME_MASTER_' + clustname[iclust] + '_indivredshifts.cat'
    rfcol_table[iclust] = pd.read_csv(rfcolpath[iclust], engine='python')
rfcol_table.head()
And I got "ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'." as an error. So I tried adding engine='python' to the read_csv command and got this error: "IsADirectoryError: [Errno 21] Is a directory: '/'". I don't know what this means or how to fix it. How should I read in each individual file? Thanks for any help!
Try this:
import io
import subprocess
import pandas as pd

# Stream the file through cat and parse the piped bytes
task = subprocess.Popen(["cat", "file.txt"], stdout=subprocess.PIPE)
df = pd.read_csv(io.BytesIO(task.stdout.read()), sep=r"\s+")
task.wait()
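For what it's worth, the IsADirectoryError in the question comes from rfcolpath[iclust]: rfcolpath is a plain string, so indexing it returns a single character ('/' when the path starts with a slash). Passing the whole path, with sep=r"\s+" for whitespace-separated columns, should be enough. A sketch with an invented sample table standing in for one ".cat" file:

```python
import io
import pandas as pd

# Invented stand-in for one ".cat" file: a whitespace-separated text table
sample = """id  ra       dec       z
1   8.91234  -4.41234  0.871
2   8.91891  -4.40987  0.869
"""

# sep=r"\s+" splits on any run of spaces or tabs between columns
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
```

In the question's loop, pd.read_csv(rfcolpath, sep=r"\s+") (without the [iclust]) would read each file into rfcol_table[iclust].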

How to parse_dates for a CSV file imported from the local drive in Google Colab?

In Google Colab I have imported the CSV file from the local drive using the code below:
from google.colab import files
uploaded = files.upload()
Then, to read the CSV file and parse the dates, I have this code:
import pandas as pd
import io
df = pd.read_csv(io.StringIO(uploaded['Auto.csv'], parse_dates = ['Date'],date_parser=parse))
print(df)
It shows the error message below:
TypeError: StringIO() takes at most 2 arguments (3 given)
But when importing the file from GitHub it works fine, as shown below:
df = pd.read_csv('https://raw.githubusercontent.com/master/dataset/electricity_consumption.csv', parse_dates = ['Bill_Date'],date_parser=parse) #this code works good from github
So how can I parse_dates for the CSV file imported from the local drive? Kindly help me with this.
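A likely fix: files.upload() returns a dict mapping each filename to the file's raw bytes, so the content has to be wrapped in io.BytesIO, and parse_dates is an argument of read_csv itself, not of StringIO (the misplaced parenthesis is what produces the TypeError). A sketch with inline bytes standing in for the upload; the column names here are invented:

```python
import io
import pandas as pd

# Stand-in for files.upload(): it maps the filename to the file's raw bytes
uploaded = {"Auto.csv": b"Date,kwh\n2019-01-01,230\n2019-02-01,215\n"}

# Wrap the bytes in BytesIO; parse_dates goes to read_csv, not to the buffer
df = pd.read_csv(io.BytesIO(uploaded["Auto.csv"]), parse_dates=["Date"])
```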
Data set looks like this: (screenshot omitted)

How to load a json file in jupyter notebook using pandas?

I am trying to load a json file in my jupyter notebook
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import json
%matplotlib inline
with open("pud.json") as datafile:
    data = json.load(datafile)
dataframe = pd.DataFrame(data)
I am getting the following error
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Please help
If you want to load a json file use pandas.read_json.
pandas.read_json("pud.json")
This will load the json as a dataframe.
The function usage is as shown below
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')
You can get more information about the parameters here
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
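As a concrete sketch of read_json, with an inline list of records standing in for pud.json:

```python
import io
import pandas as pd

# Inline JSON standing in for pud.json: a list of records
payload = '[{"city": "Austin", "pop": 978}, {"city": "Boston", "pop": 692}]'

# Each record becomes a row; keys become columns
df = pd.read_json(io.StringIO(payload))
```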
Another way using json!
import pandas as pd
import json
with open('File_location.json') as f:
    data = json.load(f)
df = pd.DataFrame(data)
with open('pud.json', 'r') as file:
    variable_name = json.load(file)
The JSON file will be loaded as a Python dictionary.
The code you are writing here is completely okay. The problem is that the .json file you are loading is not valid JSON. Kindly check that file.

How to download an Excel file from behind a paywall into a pandas dataframe?

I have this website that requires a login to access data.
import pandas as pd
import requests
r = requests.get(my_url, cookies=my_cookies) # my_cookies are imported from a selenium session.
df = pd.io.excel.read_excel(r.content, sheetname=0)
Response:
IOError: [Errno 2] No such file or directory: 'Ticker\tAction\tName\tShares\tPrice\...
Apparently, the string is processed as a filename. Is there a way to process it as a file? Alternatively, can we pass cookies to pd.read_html?
EDIT: After further processing we can now see that this is actually a csv file. The content of the downloaded file is:
In [201]: r.content
Out [201]: 'Ticker\tAction\tName\tShares\tPrice\tCommission\tAmount\tTarget Weight\nBRSS\tSELL\tGlobal Brass and Copper Holdings Inc\t400.0\t17.85\t-1.00\t7,140\t0.00\nCOHU\tSELL\tCohu Inc\t700.0\t12.79\t-1.00\t8,953\t0.00\nUNTD\tBUY\tUnited Online Inc\t560.0\t15.15\t-1.00\t-8,484\t0.00\nFLXS\tBUY\tFlexsteel Industries Inc\t210.0\t40.31\t-1.00\t-8,465\t0.00\nUPRO\tCOVER\tProShares UltraPro S&P500\t17.0\t71.02\t-0.00\t-1,207\t0.00\n'
Notice that it is tab delimited. Still, trying:
# csv version 1
df = pd.read_csv(r.content)
# Returns error, file does not exist. Apparently read_csv() is also trying to read it as a file.
# csv version 2
fh = io.BytesIO(r.content)
df = pd.read_csv(fh) # ValueError: No columns to parse from file.
# csv version 3
s = StringIO(r.content)
df = pd.read_csv(s)
# No error, but the resulting df is not parsed properly; \t's show up in the text of the dataframe.
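Since r.content turned out to be tab-delimited text, version 3 is one sep argument away from working. A sketch with the first rows from the question inlined; thousands="," handles values like "7,140":

```python
import io
import pandas as pd

# First rows of r.content from the question, tab-delimited
content = ("Ticker\tAction\tName\tShares\tPrice\tCommission\tAmount\tTarget Weight\n"
           "BRSS\tSELL\tGlobal Brass and Copper Holdings Inc\t400.0\t17.85\t-1.00\t7,140\t0.00\n")

# sep="\t" splits on tabs; thousands="," lets "7,140" parse as the number 7140
df = pd.read_csv(io.StringIO(content), sep="\t", thousands=",")
```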
Simply wrap the file contents in a BytesIO:
with io.BytesIO(r.content) as fh:
df = pd.io.excel.read_excel(fh, sheetname=0)
This functionality was included in an update from 2014. According to the documentation it is as simple as providing the url:
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/workbook.xlsx
Based on the code you've provided, it looks like you are using pandas 0.13.x? If you can upgrade to a newer version (code below is tested with 0.16.x) you can get this to work without the additional utilization of the requests library. This was added in 0.14.1
data2 = pd.read_excel(data_url)
As an example of a full script (with the example XLS document taken from the original bug report stating that read_excel didn't accept a URL):
import pandas as pd
data_url = "http://www.eia.gov/dnav/pet/xls/PET_PRI_ALLMG_A_EPM0_PTC_DPGAL_M.xls"
data = pd.read_excel(data_url, "Data 1", skiprows=2)
