Reading a .cat file that is a table into pandas - Python

I am trying to read numerous ".cat" (catalog) files into pandas tables so I can manipulate the data more easily. I am not familiar with ".cat" files, but in this case each file looks like a text-file table with columns and data. I tried using pd.read_csv(filename), since I figured it was space-separated rather than comma-separated but otherwise similar.
clustname = ["SpARCS-0035", "SpARCS-0219", "SpARCS-0335", "SpARCS-1034", "SpARCS-1051", "SpARCS-1616",
             "SpARCS-1634", "SpARCS-1638", "SPTCL-0205", "SPTCL-0546", "SPTCL-2106"]
for iclust in range(len(clustname)):
    rfcolpath = restframe + 'RESTFRAME_MASTER_' + clustname[iclust] + '_indivredshifts.cat'
    rfcol_table[iclust] = pd.read_csv(rfcolpath[iclust], engine='python')
rfcol_table.head()
And I got "ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'." as an error. So I tried adding in "engine = 'python'" to the read_csv command and got this error: "IsADirectoryError: [Errno 21] Is a directory: '/'". I don't know what this means or how to fix it, how should I read in each individual file? Thanks for any help!

Try this
import subprocess
import pandas as pd

task = subprocess.Popen(["cat", "file.txt"], stdout=subprocess.PIPE)
rows = []
for line in task.stdout:
    rows.append(line.decode().split())
task.wait()
df = pd.DataFrame(rows)
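Shelling out to cat isn't strictly necessary, though. If each .cat file really is a whitespace-separated text table, read_csv can parse it directly. Here is a minimal sketch under that assumption; it also passes the full path string to read_csv rather than indexing into it with rfcolpath[iclust], which is what produced the '/' IsADirectoryError in the question:

import pandas as pd

# Minimal sketch: assumes restframe and clustname are defined as in the question
# and that each .cat file is a whitespace-separated text table.
rfcol_tables = {}
for name in clustname:
    path = restframe + 'RESTFRAME_MASTER_' + name + '_indivredshifts.cat'
    # sep=r'\s+' splits on any run of spaces/tabs; comment='#' skips commented header lines, if any
    rfcol_tables[name] = pd.read_csv(path, sep=r'\s+', comment='#')

rfcol_tables[clustname[0]].head()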

Pandas, Python - Problem with converting xlsx to csv

I have a problem with converting an .xlsx file to .csv using the pandas library.
Here is the code:
import pandas as pd
# If pandas is not installed: pip install pandas

class Program:
    def __init__(self):
        # file = input("Insert file name (without extension): ")
        file = "Daty"
        self.namexlsx = "D:\\" + file + ".xlsx"
        self.namecsv = "D:\\" + file + ".csv"
        Program.export(self.namexlsx, self.namecsv)

    def export(namexlsx, namecsv):
        try:
            read_file = pd.read_excel(namexlsx, sheet_name='Sheet1', index_col=0)
            read_file.to_csv(namecsv, index=False, sep=',')
            print("Conversion to .csv file has been successful.")
        except FileNotFoundError:
            print("File not found, check file name again.")
            print("Conversion to .csv file has failed.")

Program()
After running the code the console shows the error: ValueError: File is not a recognized excel file.
The file I have in that directory is "Daty.xlsx". I tried a couple of things, like looking at the documentation and other examples around the internet, but most had similar code.
Edit&Update
What i intend afterwards is use the created csv file for conversion to .db file. So in the end the line of import will go .xlsx -> .csv -> .db. The idea of such program came as a training, but i cant get past point described above.
You can use it like this:
import pandas as pd
data_xls = pd.read_excel('excelfile.xlsx', 'Sheet1', index_col=None)
data_xls.to_csv('csvfile.csv', encoding='utf-8', index=False)
I checked the xlsx itself, and apparently for some reason it was corrupted, with columns in the initial file being merged into one column. After opening and correcting the cells in the file, everything runs smoothly.
Thank you for your time and apologies for the inconvenience.
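For the .xlsx -> .csv -> .db pipeline mentioned in the edit, a rough sketch of the last step using pandas' to_sql with the standard-library sqlite3 module (the file paths mirror the question; the table name "daty" is made up):

import sqlite3
import pandas as pd

df = pd.read_csv("D:\\Daty.csv")  # the csv produced by the conversion above
with sqlite3.connect("D:\\Daty.db") as conn:
    # Writes the frame into a SQLite table, replacing it if it already exists.
    df.to_sql("daty", conn, if_exists="replace", index=False)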

Filtering large CSV files using Python - KeyError: False Exception

Hoping for some insight and help.
I am running into an exception whilst trying to filter multiple very large CSV files and output the results into a new csv file (or files).
Initially I was getting an "AttributeError: 'list' object has no attribute 'query'".
After some research, I understood this was because "query" is a pandas DataFrame method, so I added the line "df = pd.DataFrame(data)", which resolved that error. However, the program now throws a "KeyError: False" exception and I don't fully understand the reason for it nor how to solve it.
Any insight as how to resolve this would be greatly appreciated !
import pandas as pd
from glob import glob

filenames = glob("mySourceFiles*")
data = [pd.read_csv(f, dtype=unicode, engine='python', sep=',', quotechar='"',
                    error_bad_lines=False, encoding='Latin-1') for f in filenames]
df = pd.DataFrame(data)
df.query('"Location " == "LN"').to_csv('myOutputFile.csv')

Read from website csv file with variable name

I am using Jupyter and I would like to read a csv file from a web site.
The problem I'm facing is that the file's name changes according to the time it is generated. For example, if now is 11/21/2019, 02:45:33, then the name will be "Visao_329465_11212019_024533.csv".
So I can't just use this:
import pandas as pd
url="https://anythint.csv"
c=pd.read_csv(url)
Returns the error: ParserError: Error tokenizing data. C error: Expected 1 fields in line 31, saw 2
Any idea?
Try:
import pandas as pd
url="https://anythint.csv"
c=pd.read_csv(url, error_bad_lines=False)
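Note that error_bad_lines was deprecated in pandas 1.3; on newer versions the equivalent is:

c = pd.read_csv(url, on_bad_lines="skip")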

Unable to save csv file with Pandas

Sorry for the dummy question, but I have read lots of topics and my code still does not create and save a .csv file.
import pandas as pd

def save_csv(lista):
    try:
        print("Salvando...")
        name_path = time.strftime('%d%m%y') + '01' + '.csv'
        df = pd.DataFrame(lista, columns=["column"])
        df.to_csv(name_path, index=False)
    except:
        pass

dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
save_csv(dados)
Path name is 'DayMonthYear01.csv' (20121701.csv).
When I run the code it finishes but no file is saved.
The output of the code is just:
>>>
RESTART: C:\Users\eduhz\AppData\Local\Programs\Python\Python36-32\testeCSV.py
Salvando...
>>>
Does anyone know what I am missing?
First, as answered by @Abdou, I changed the code so it would show me what the error was.
import pandas as pd
import time

def save_csv(lista):
    try:
        print("Salvando...")
        name_path = time.strftime('%d%m%y') + '01' + '.csv'
        df = pd.DataFrame(lista, columns=["column"])
        df.to_csv(name_path, index=False)
    except Exception as e:
        print(e)

dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
save_csv(dados)
Then I found out it was due to a permission error:
[Errno 13] Permission denied:
caused by the fact that Notepad (when not opened as Administrator) does not have access to some directories, and therefore anything run inside it wouldn't be able to write to those directories.
I tried running Notepad as administrator but it didn't work.
The solution was running the code with the Python IDLE.
Did you import the time module? All I did was add that, and it made a 21121701.csv with the 3 entries in one column in the current working directory.
import pandas as pd
import time

def save_csv(lista):
    print("Salvando...")
    name_path = time.strftime('%d%m%y') + '01' + '.csv'
    df = pd.DataFrame(lista, columns=["column"])
    df.to_csv(name_path, index=False)

dados = [-0.9143399074673653, -1.0944355744868517, -1.1022400576621294]
save_csv(dados)
Removing the try/except gives a file permission error if you already have a file of the same name open. You have to close any file you are trying to write to (on Windows, at least).
Per Abdou's comment, if you (or the program) don't have write access to the directory then that would cause a permission error too.
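If the working directory itself is not writable, building an explicit path under the user's home directory is one way to sidestep the permission error; a minimal sketch:

import time
from pathlib import Path
import pandas as pd

def save_csv(lista):
    print("Salvando...")
    # Write into the home directory, which is normally writable for the current user.
    name_path = Path.home() / (time.strftime('%d%m%y') + '01.csv')
    pd.DataFrame(lista, columns=["column"]).to_csv(name_path, index=False)

save_csv([-0.9143399074673653, -1.0944355744868517, -1.1022400576621294])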

How to download an Excel file from behind a paywall into a pandas dataframe?

I have this website that requires logging in to access data.
import pandas as pd
import requests
r = requests.get(my_url, cookies=my_cookies) # my_cookies are imported from a selenium session.
df = pd.io.excel.read_excel(r.content, sheetname=0)
Response:
IOError: [Errno 2] No such file or directory: 'Ticker\tAction\tName\tShares\tPrice\...
Apparently, the str is processed as a filename. Is there a way to process it as a file? Alternatively can we pass cookies to pd.get_html?
EDIT: After further processing we can now see that this is actually a csv file. The content of the downloaded file is:
In [201]: r.content
Out [201]: 'Ticker\tAction\tName\tShares\tPrice\tCommission\tAmount\tTarget Weight\nBRSS\tSELL\tGlobal Brass and Copper Holdings Inc\t400.0\t17.85\t-1.00\t7,140\t0.00\nCOHU\tSELL\tCohu Inc\t700.0\t12.79\t-1.00\t8,953\t0.00\nUNTD\tBUY\tUnited Online Inc\t560.0\t15.15\t-1.00\t-8,484\t0.00\nFLXS\tBUY\tFlexsteel Industries Inc\t210.0\t40.31\t-1.00\t-8,465\t0.00\nUPRO\tCOVER\tProShares UltraPro S&P500\t17.0\t71.02\t-0.00\t-1,207\t0.00\n'
Notice that it is tab delimited. Still, trying:
# csv version 1
df = pd.read_csv(r.content)
# Returns error, file does not exist. Apparently read_csv() is also trying to read it as a file.
# csv version 2
fh = io.BytesIO(r.content)
df = pd.read_csv(fh) # ValueError: No columns to parse from file.
# csv version 3
s = StringIO(r.content)
df = pd.read_csv(s)
# No error, but the resulting df is not parsed properly; \t's show up in the text of the dataframe.
Simply wrap the file contents in a BytesIO:
with io.BytesIO(r.content) as fh:
    df = pd.io.excel.read_excel(fh, sheetname=0)
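Given that the edit shows the payload is actually tab-delimited text rather than a real workbook, parsing it with read_csv and an explicit tab separator may be more direct; a minimal sketch (thousands=',' is an assumption based on the comma-grouped Amount column):

import io
import pandas as pd

df = pd.read_csv(io.BytesIO(r.content), sep='\t', thousands=',')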
This functionality was included in an update from 2014. According to the documentation it is as simple as providing the url:
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/workbook.xlsx
Based on the code you've provided, it looks like you are using pandas 0.13.x. If you can upgrade to a newer version (the code below is tested with 0.16.x), you can get this to work without additionally using the requests library; URL support was added in 0.14.1.
data2 = pd.read_excel(data_url)
As an example of a full script (the example XLS document is taken from the original bug report stating that read_excel didn't accept a URL):
import pandas as pd
data_url = "http://www.eia.gov/dnav/pet/xls/PET_PRI_ALLMG_A_EPM0_PTC_DPGAL_M.xls"
data = pd.read_excel(data_url, "Data 1", skiprows=2)
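On current pandas versions, the sheetname keyword used in the snippets above has been renamed to sheet_name, so the equivalent call is pd.read_excel(data_url, sheet_name="Data 1", skiprows=2).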
