How do I convert an NDFrame to a DataFrame?

I can't find any information on converting an NDFrame to a DataFrame. I want to do this because I can't write an NDFrame to a CSV file. I've tried the code below, but it still returns an NDFrame. How do I make this conversion, or how do I write an NDFrame to CSV?
df = pd.DataFrame(df)
Here is the error I'm receiving:
ImportError: cannot import name 'get_compression_method' from 'pandas.io.common'
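A minimal sketch for narrowing this down, not a confirmed diagnosis. It assumes a working pandas install and uses a hypothetical two-column frame. DataFrame is a subclass of pandas' internal NDFrame base class, so no conversion should be needed before calling to_csv. An ImportError raised from inside pandas may instead point to a broken or mixed-version installation, which printing the version can help check.

```python
import io

import pandas as pd
# NDFrame is an internal base class; this import path is an assumption
# about pandas internals and is used here only for the isinstance check.
from pandas.core.generic import NDFrame

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A DataFrame already is an NDFrame, so both checks describe the same object.
print(isinstance(df, NDFrame))
print(type(pd.DataFrame(df)).__name__)

# Writing to an in-memory buffer exercises to_csv without touching disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())

# If to_csv raises an ImportError, the installed version is worth checking.
print(pd.__version__)
```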

Related

String to pandas dataframe Name too long error

I have a comma-separated string. When I export it to a CSV file and then load the resulting CSV in pandas, I get the result I want. But when I try to load the string directly into a pandas DataFrame, I get a "Filename too big" error. Please help me solve it.

I found the problem. It only showed that behaviour for large strings, because pandas was treating the string as a file name.
I used
import io
df = pd.read_csv(io.StringIO(str), sep=',', engine='python')
That solved the issue. Here str is the name of the variable holding the string.
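A minimal sketch of the StringIO approach above, using hypothetical sample data and the variable name csv_text (rather than str, which shadows the Python builtin):

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the comma-separated string.
csv_text = "name,score\nalice,1\nbob,2\n"

# Passing the raw string would make read_csv treat it as a file path.
# Wrapping it in StringIO gives pandas a file-like object instead.
df_from_text = pd.read_csv(io.StringIO(csv_text), sep=",")
print(df_from_text)
```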

Problems working with a column of a CSV file

I am reading a CSV file using this piece of code:
import pandas as pd
import os
#Open document (.csv format)
path=os.getcwd()
database=pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0)
#Date in the required format
size=len(database.Date)
I get the following error: 'DataFrame' object has no attribute 'Date'
As you can see in the image, the first column of mydocument.csv is called Date. This is weird because I used this same procedure to work with this document, and it worked.

Try using delimiter=','. The file may be comma-separated rather than semicolon-separated.

(Can't post comments yet, but think I can help).
The explanation for your problem is simply that you don't have a column called Date. Has pandas interpreted Date as an index column?
If Date is spelled correctly (no trailing whitespace or anything else that might confuse things), then try this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=False)
or if that fails this:
pd.read_csv(path+"/mydocument.csv",delimiter=';',header=0, index_col=None)
In my experience, I've had pandas unexpectedly infer an index_col.
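One way to check what pandas actually parsed is to print the column labels. This is a sketch under assumptions: the in-memory text below is a hypothetical stand-in for mydocument.csv, with a trailing space deliberately placed in the "Date " header to show one way the attribute lookup can fail.

```python
import io

import pandas as pd

# Hypothetical file contents; note the trailing space in the "Date " header.
raw = "Date ;Value\n2020-01-01;1\n2020-01-02;2\n"
database = pd.read_csv(io.StringIO(raw), delimiter=";", header=0)

# repr() of each label exposes stray whitespace. A single long label
# containing ';' or ',' would instead suggest the wrong delimiter.
print([repr(c) for c in database.columns])

# Strip whitespace from the labels, then access the column by name.
database.columns = database.columns.str.strip()
size = len(database["Date"])
print(size)
```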

Saving the DateTime format applied in the .csv file in Pandas

I have imported a CSV file in pandas that contains fields that look like 'datetime' but are initially parsed as 'object'. I make the required conversion from 'object' to 'datetime' using 'df.X = pd.to_datetime(df.X)'.
Now, when I try to save these changes by writing the data out to a new .csv file and importing that, the type is still 'object'. Is there any way to fix its datatype so that I don't have to perform the conversion every time I import it? My dataset is quite big and the conversion takes some time, which I want to save.

Date parsing can be expensive, so pandas doesn't parse dates by default. A CSV file stores only text, so the datetime type is not saved with it. You need to specify the parse_dates argument when you call read_csv:
df = pd.read_csv('my_file.csv', parse_dates=['date_column'])
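The round trip can be sketched as follows, using an in-memory buffer in place of my_file.csv and a hypothetical two-row frame. The same CSV text is read twice, once without and once with parse_dates, to compare the resulting dtypes.

```python
import io

import pandas as pd

# Hypothetical frame with a real datetime column.
df = pd.DataFrame(
    {
        "date_column": pd.to_datetime(["2021-01-01", "2021-01-02"]),
        "value": [1, 2],
    }
)

# Write to an in-memory buffer; the datetime type is lost in the CSV text.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Without parse_dates the column comes back as plain strings.
buffer.seek(0)
plain = pd.read_csv(buffer)
print(plain["date_column"].dtype)

# With parse_dates the column is converted while reading.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["date_column"])
print(parsed["date_column"].dtype)
```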

Convert dataframe column into integers from within loop

I'm trying loop through a folder of csv's and put them into a dataframe, change certain columns into an integer, before passing them through a Django model. Here is my code:
import glob
import pandas as pd
path = 'DIV1FCS_2017/*/*'
for fname in glob.glob(path):
    df = pd.read_csv(fname)
    df['Number'].apply(pd.to_numeric)
I am receiving the following: ValueError: Unable to parse string
Does anybody know if I can convert a column of strings into integers using pd.to_numeric from within a loop? Outside of the loop it seems to work properly.

I think you probably have some non-numeric data stored in your dataframe, and that's what's causing the error.
You can examine your data and make sure everything's fine. In the meantime, you can also do pd.to_numeric(df['Number'], errors="ignore") to ignore errors for now.
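To find the offending values, one option is errors="coerce", which is swapped in here for errors="ignore" (recent pandas releases deprecate the "ignore" mode, as far as I know). This is a sketch with a hypothetical Number column. It also assigns the result back, because neither apply nor to_numeric changes the column in place.

```python
import pandas as pd

# Hypothetical column with one non-numeric entry.
df = pd.DataFrame({"Number": ["1", "2", "n/a", "4"]})

# errors="coerce" turns unparseable values into NaN instead of raising.
numbers = pd.to_numeric(df["Number"], errors="coerce")

# Show the original rows that failed to parse so they can be inspected.
bad_rows = df[numbers.isna()]
print(bad_rows)

# Assign the converted values back to the column.
df["Number"] = numbers
print(df["Number"].dtype)
```

Inside the loop from the question, the same two steps would run once per file after read_csv.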

Proper way of writing and reading Dataframe to file in Python

I would like to write and later read a dataframe in Python.
df_final.to_csv(self.get_local_file_path(hash,dataset_name), sep='\t', encoding='utf8')
...
df_final = pd.read_table(self.get_local_file_path(hash,dataset_name), encoding='utf8',index_col=[0,1])
But then I get:
sys:1: DtypeWarning: Columns (7,17,28) have mixed types. Specify dtype
option on import or set low_memory=False.
I found this question, whose bottom line is that I should specify the field types when I read the file because "low_memory" is deprecated... I find that very inefficient.
Isn't there a simple way to write & later read a Dataframe? I don't care about the human-readability of the file.

You can pickle your dataframe:
df_final.to_pickle(self.get_local_file_path(hash,dataset_name))
Read it back later:
df_final = pd.read_pickle(self.get_local_file_path(hash,dataset_name))
If your dataframe is big and this gets too slow, you might have more luck using the HDF5 format. Note that to_hdf needs a key naming the object inside the file ('df_final' here is just an example name):
df_final.to_hdf(self.get_local_file_path(hash,dataset_name), key='df_final')
Read it back later:
df_final = pd.read_hdf(self.get_local_file_path(hash,dataset_name), key='df_final')
You might need to install PyTables first.
Both ways store the data along with their types. Therefore, this should solve your problem.
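A minimal sketch of the pickle round trip, assuming a hypothetical three-column frame and a temporary file in place of get_local_file_path(...). The HDF5 variant is left out because it depends on PyTables being installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with mixed column types.
df_final = pd.DataFrame(
    {
        "when": pd.to_datetime(["2021-01-01", "2021-01-02"]),
        "count": [1, 2],
        "label": ["a", "b"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; any writable path works.
    file_path = os.path.join(tmp_dir, "df_final.pkl")
    df_final.to_pickle(file_path)
    restored = pd.read_pickle(file_path)

# The dtypes survive the round trip, so no re-parsing is needed.
print(restored.dtypes)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.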

The warning appears because pandas has detected conflicting data types among the values in those columns. If you wish, you can specify the data types when reading the file by adding a dtype argument to the read call:
, dtype={'FIELD': int, 'FIELD2': str}
and so on for the other columns.
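As a sketch of the dtype argument in a complete call, the example below reads hypothetical tab-separated text with the placeholder column names FIELD and FIELD2. Declaring FIELD2 as str keeps values such as "001" intact instead of letting pandas guess a numeric type for part of the column.

```python
import io

import pandas as pd

# Hypothetical tab-separated contents with a column mixing digits and text.
raw = "FIELD\tFIELD2\n1\t001\n2\tabc\n"

# Explicit dtypes stop pandas from inferring types chunk by chunk.
df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    dtype={"FIELD": int, "FIELD2": str},
)
print(df.dtypes)
```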
