importing a dat file into pandas dataframe

I have a dat file downloaded from the below address:
ratings
I need to import it as a pandas DataFrame. I used the code below:
ratings = pd.read_csv('ratings.dat', sep='::', header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'])
But the resulting DataFrame is wrong.
Should I use another method to import dat files?
I've also checked the link below, but it doesn't help.
Read data (.dat file) with Pandas

Works well for me:
DAT = pd.read_csv('https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch02/movielens/ratings.dat',
                  sep='::', engine='python',  # a multi-character separator needs the python engine
                  header=None, names=['user_id', 'movie_id', 'rating', 'timestamp'])
DAT.head(5)
Are you sure you read the proper file (the raw one from GitHub)?
The URL you mention does not point to the proper file; use the raw one.
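
For anyone who wants to check the parsing itself without downloading anything, here is a minimal sketch. The three sample rows are made up for illustration (they only mimic the `user::movie::rating::timestamp` layout), and `engine="python"` is passed explicitly because a multi-character separator is not handled by the default C engine.

```python
import io

import pandas as pd

# Made-up sample rows in the same "::"-separated layout as ratings.dat.
sample = io.StringIO(
    "1::1193::5::978300760\n"
    "1::661::3::978302109\n"
    "2::1357::5::978298709\n"
)

ratings = pd.read_csv(
    sample,
    sep="::",
    engine="python",  # needed for a separator longer than one character
    header=None,      # the file has no header row
    names=["user_id", "movie_id", "rating", "timestamp"],
)
print(ratings.head())
```

If the same call on your downloaded file gives a frame full of HTML tags, the file is most likely the web page rather than the raw data.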

Related

Cannot read content in CSV File in Pandas

I have a dataset from the State Security Department in my county that has some problems.
I can't read any records from the file that is made available as CSV; pandas returns only empty records. When I convert the file to XLSX, it reads fine.
I would like to know if there is any possible solution to the above problem.
The dataset is available at: here or here.
I tried the code below, but I only get nulls, except for the first row in the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!

If you use utf-16 as the encoding, it seems to work. Note, however, that the year rows complicate the parsing, so depending on what you want to do with the data, you may need some extra manipulation of the CSV to work around them.
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')

Try using 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())
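
If you are not sure which encoding a file uses, one rough check is to look for a byte-order mark (BOM) at the start of the file. The sketch below is only an illustration: `guess_encoding` is a hypothetical helper, the demo file and its contents are made up, and a file without a BOM simply falls back to the default you pass in.

```python
import codecs
import os
import tempfile

import pandas as pd


def guess_encoding(path, default="latin1"):
    """Guess an encoding from the BOM, falling back to `default`.

    Hypothetical helper: it only recognises UTF-16 and UTF-8 BOMs and
    does not try to tell UTF-32 apart from UTF-16.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default


# Made-up demo file written as UTF-16 (Python adds the BOM itself).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_utf16.csv")
with open(demo_path, "w", encoding="utf-16", newline="") as fh:
    fh.write("city;count\nRio;3\nNiterói;5\n")

encoding = guess_encoding(demo_path)
demo_df = pd.read_csv(demo_path, sep=";", encoding=encoding)
print(encoding)
print(demo_df)
```

Reading a UTF-16 file as latin1 leaves NUL bytes between the characters, which could explain seeing mostly nulls.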

Pandas read_excel - How to try two different sheet names to import

I am attempting to import a large group of Excel files, and the code that selects what to import is included below.
df = pd.read_excel(file, sheet_name=['Sheet1', 'Sheet2'])
I know that each Excel file uses either Sheet1 or Sheet2, but never both. This makes my code error out. Is there any way to tell pandas to try importing Sheet1 and, if that errors, try Sheet2?
Thanks for any help.

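try:
    # a single sheet name (not a list) returns a DataFrame rather than a dict
    df = pd.read_excel(file, sheet_name='Sheet1')
except ValueError:  # raised by recent pandas when the sheet is missing; older versions may raise something else
    df = pd.read_excel(file, sheet_name='Sheet2')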

Assuming your Excel files aren't too large to import everything, you could do this:
df = pd.read_excel(file, sheet_name=None)
That would return all the sheets in the file as a dict, where each key is a sheet name and each value is the corresponding DataFrame. You can then test for the key you want, use that sheet, and drop the rest.
(Edit: I'll note that this may be a heavy-handed approach, but I tried to generalize the answer to cover selecting one or more sheets when you aren't sure of their names.)
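
A minimal sketch of the "test for the key you want" step. To keep it runnable without an actual workbook, the `sheets` dict is built by hand here and only stands in for what `pd.read_excel(file, sheet_name=None)` returns; `pick_sheet` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def pick_sheet(sheets, candidates=("Sheet1", "Sheet2")):
    """Return the first sheet from `candidates` that exists in `sheets`."""
    for name in candidates:
        if name in sheets:
            return sheets[name]
    raise KeyError(f"none of {candidates} found; available: {list(sheets)}")


# Stand-in for: sheets = pd.read_excel(file, sheet_name=None)
sheets = {
    "Sheet2": pd.DataFrame({"a": [1, 2]}),
    "Notes": pd.DataFrame(),
}

df = pick_sheet(sheets)
print(df)
```

If loading every sheet is too expensive, `pd.ExcelFile(file).sheet_names` lists the names first, so you can pass just the one you need to `read_excel`.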

combining multiple files into a single file with DataFrame

I have been able to generate several CSV files through an API. Now I am trying to combine all the CSVs into a single master file so that I can then work on it, but it does not work. The code below is what I have attempted. What am I doing wrong?
import glob
import pandas as pd
from pandas import read_csv
master_df = pd.DataFrame()
for file in files:
    df = read_csv(file)
    master_df = pd.concat([master_df, df])
    del df
master_df.to_csv("./master_df.csv", index=False)

Although it is hard to tell what the precise problem is without more information (i.e., error message, pandas version), I believe it is that in the first iteration, master_df and df do not have the same columns. master_df is an empty DataFrame, whereas df has whatever columns are in your CSV. If this is indeed the problem, then I'd suggest storing all your data-frames (each of which represents one CSV file) in a single list, and then concatenating all of them. Like so:
import pandas as pd
df_list = [pd.read_csv(file) for file in files]
pd.concat(df_list, sort=False).to_csv("./master_df.csv", index=False)
Don't have time to find/generate a set of CSV files and test this right now, but am fairly sure this should do the job (assuming pandas version 0.23 or compatible).
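
For completeness, here is a self-contained sketch of the list-then-concat approach that also builds the `files` list with `glob`, since the question never shows where `files` comes from. The `part_*.csv` pattern, the temporary directory, and the sample data are all assumptions made for the demo; `combine_csvs` is a hypothetical helper.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_csvs(pattern, output_path):
    """Read every CSV matching `pattern`, stack them, and write one file."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_csv(path) for path in files]
    master = pd.concat(frames, ignore_index=True, sort=False)
    master.to_csv(output_path, index=False)
    return master


# Made-up input files, written to a temporary directory for the demo.
work_dir = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
    os.path.join(work_dir, "part_1.csv"), index=False
)
pd.DataFrame({"id": [3], "value": [30]}).to_csv(
    os.path.join(work_dir, "part_2.csv"), index=False
)

master_path = os.path.join(work_dir, "master_df.csv")
master_df = combine_csvs(os.path.join(work_dir, "part_*.csv"), master_path)
print(master_df)
```

Keep the output file name outside the glob pattern (or in another folder), otherwise a second run would read the master file back in as an input.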

EmptyDataError: No columns to parse from file

I am currently getting the error below. I have tried the solutions in the following posts:
Solution 1
Solution 2
But I am not able to resolve the error. My Python code is below:
import pandas as pd
testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
I tried to print the value in testdata but it doesn't show any output.
The following is my csvfile:

First, pass the file name to read_csv as a string, and make sure the file is either in the working directory or that you have the correct file path.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.

First, you probably don't need header=None as you seem to have headers in the file.
Also try removing the blank line between the headers and the first line of data.
Check and double check your file name.
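
To confirm whether the file is really empty (which is what this error usually points to), you can catch the exception explicitly. A rough sketch, assuming a whitespace-delimited file; `read_whitespace_table` and the two demo files are made up for illustration, and `sep=r"\s+"` is used as the equivalent of `delim_whitespace=True`.

```python
import os
import tempfile

import pandas as pd
from pandas.errors import EmptyDataError


def read_whitespace_table(path):
    """Read a whitespace-delimited file; return None if it has no data."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path!r} does not exist; check the file name")
    try:
        return pd.read_csv(path, header=None, sep=r"\s+")
    except EmptyDataError:
        # pandas raises this when there is nothing to parse in the file
        return None


tmp_dir = tempfile.mkdtemp()

empty_path = os.path.join(tmp_dir, "empty.csv")
open(empty_path, "w").close()  # zero-byte file

data_path = os.path.join(tmp_dir, "data.csv")
with open(data_path, "w") as fh:
    fh.write("1 2 3\n4 5 6\n")

empty_result = read_whitespace_table(empty_path)
data_result = read_whitespace_table(data_path)
print(empty_result)
print(data_result)
```

If the helper returns None for your file, the path probably points at a different (empty) file than the one you are looking at in your editor.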

[Python]; Parser error: Too many columns specified

I just want to read a simple .csv file with a header specifying the column types.
The following is the code:
import pandas as pd
url="https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=0"
names=['User country','Nr. reviews','Nr. hotel reviews','Helpful votes',
       'Score','Period of stay','Traveler Type','Pool','Gym','Tennis court',
       'Spa','Casino','Free internet','Hotel name','Hotel stars','Nr. rooms',
       'User continent','Member years','Review month','Review weekday']
data=pd.read_csv(url, names=names, header=0, delimiter=';',
                 error_bad_lines=False)
print(data.shape)
OUT:-
ParserError: Too many columns specified: expected 20 and found 2
P.S.: The URL is public and can be accessed.

The problem is that the URL doesn't lead directly to the .csv file. It leads to an entire HTML page.
You can see that by removing the names argument:
pd.read_csv(url, header=0, delimiter=';', error_bad_lines=False)
This executes successfully, but when you inspect the returned values, you'll see HTML code and JavaScript.
What you need to do is make sure you provide actual CSV as input (try another source for the .csv file).

In the Dropbox URL, just replace dl=0 with dl=1, as below:
https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=1
This makes the file download directly.
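
If you have several such links, the dl=0 to dl=1 swap can be done in code rather than by hand. A small sketch using only the standard library; `force_direct_download` is a hypothetical helper name, and it does nothing beyond rewriting the query string (it does not check that the link is reachable).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(url):
    """Return `url` with its `dl` query parameter set to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # dl=1 requests the file itself instead of the preview page
    return urlunsplit(parts._replace(query=urlencode(query)))


share_url = (
    "https://www.dropbox.com/s/n6yt908tgetuq63/"
    "LasVegasTripAdvisorReviews-Dataset.csv?dl=0"
)
direct_url = force_direct_download(share_url)
print(direct_url)
```

The resulting `direct_url` can then be passed to `pd.read_csv` in place of the original share link.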
