I'm facing an issue where I need to read a file, but instead it gives me the error:
"Unable to allocate 243. MiB for an array with shape (5, 6362620) and data type float64"
Here is my code:
import numpy as np
import pandas as pd
import os
for dirname, _, filenames in os.walk('D:/School/Classes/2nd Sem/Datasets/fraud.csv'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
df = pd.read_csv('D:/School/Classes/2nd Sem/Datasets/fraud.csv')
When I run the last line of code, it gives me the error above.
P.S. I am using Python 3 in a Jupyter notebook on Windows 10 Home Single Language.
The MemoryError happens because your file is too large to load into memory in one go. To solve this, you can use the chunksize parameter. Note that with chunksize set, read_csv returns an iterator over DataFrame chunks rather than a single DataFrame:
import pandas as pd
reader = pd.read_csv("D:/School/Classes/2nd Sem/Datasets/fraud.csv", chunksize=1000)
Link for more help:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
I'm getting the following error:
MemoryError: Unable to allocate array with shape (118, 840983) and data type float64
in my Python code whenever I run the pandas.read_csv() function to read a text file. Why is this?
This is my code:
import pandas as pd
df = pd.read_csv("LANGEVIN_DATA.txt", delim_whitespace=True)
The MemoryError means your file is too large to read into memory all at once; you need to use the chunksize parameter to avoid the error. Note that read_csv then returns an iterator over DataFrame chunks rather than a single DataFrame, like this:
import pandas as pd
reader = pd.read_csv("LANGEVIN_DATA.txt", delim_whitespace=True, chunksize=1000)
You can read the official documentation for more help:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
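Alternatively, since the error mentions float64, you can roughly halve the memory needed by reading the data as float32. This is a sketch, not from the original answer, and it assumes all columns are numeric and the reduced precision is acceptable:
import numpy as np
import pandas as pd

# A single dtype applies float32 to every column; pass a dict of
# column -> type instead if some columns are not numeric.
df = pd.read_csv("LANGEVIN_DATA.txt", delim_whitespace=True, dtype=np.float32)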
I am trying to import an Excel sheet from my computer into Jupyter Notebook. I am using the below code:
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\ALL my folders\Budget_2021_Twinsies.xlsx')
I get a lengthy error message, the essence of which is:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\User\\Desktop\\ALL my folders\\Budget_2021_Twinsies.xlsx'
I do have this file on my computer and I typed the directory in correctly, yet I still get this error.
Could anyone shed some light?
A good practice is to use the pathlib module, which always selects the correct path conventions for your OS:
import pandas as pd
import pathlib

# Path handles Windows backslashes for you; build the path from the folder and the file name.
mydir = pathlib.Path(r'C:\Users\User\Desktop\ALL my folders')
myfile = 'Budget_2021_Twinsies.xlsx'
file = pathlib.Path(mydir, myfile)
df = pd.read_excel(file)
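A quick way to check whether the path actually resolves before reading it, using only the standard library:
import pathlib

file = pathlib.Path(r'C:\Users\User\Desktop\ALL my folders\Budget_2021_Twinsies.xlsx')
# If this prints False, the path is not what you think it is.
print(file.exists())
# List the folder's contents to spot typos or hidden extensions.
print(list(file.parent.iterdir()))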
While trying to load a big CSV file (150 MB) I get the error "Kernel died, restarting". The only code that I use is the following:
import pandas as pd
from pprint import pprint
from pathlib import Path
from datetime import date
import numpy as np
import matplotlib.pyplot as plt
basedaily = pd.read_csv('combined_csv.csv')
It used to work before, but I do not know why it no longer does. I tried to fix it using engine="python" as follows:
basedaily = pd.read_csv('combined_csv.csv', engine='python')
But it gives me an "execution aborted" error.
Any help would be welcome!
Thanks in advance!
You may have gotten this error because of a lack of memory. You can split your data into many DataFrames, do your work on each, and then merge them back together. Below is some useful code that you may use:
import pandas as pd

# the number of rows in each chunk;
# you can put any value here according to your situation
chunksize = 1000

# the list that will contain all the dataframes
list_of_dataframes = []

for df in pd.read_csv('combined_csv.csv', chunksize=chunksize):
    # process your data frame here,
    # then add the current data frame to the list
    list_of_dataframes.append(df)

# if you want all the dataframes together, here it is
# (note: this only saves memory if you shrink each chunk in the
# processing step above, e.g. by dropping columns or filtering rows)
result = pd.concat(list_of_dataframes)
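For instance, here is a minimal sketch that keeps only part of each chunk so the merged result stays small; the column name 'value' is hypothetical and stands in for whatever filter fits your data:
import pandas as pd

pieces = []
for chunk in pd.read_csv('combined_csv.csv', chunksize=1000):
    # 'value' is a hypothetical column; keep only the rows you need.
    pieces.append(chunk[chunk['value'] > 0])
result = pd.concat(pieces)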
I am trying to read into a pandas dataframe from a csv. The data is in the format:
date,total_bytes
2018-08-27,1.84E+14
2018-08-30,1.90E+14
2018-08-31,1.93E+14
My code looks like:
from pandas import read_csv
from pandas import datetime
from matplotlib import pyplot
series = read_csv(r'/Users/taylorjewell/Desktop/dataset_size_daily.csv', header=0)
print(series.head())
series.plot()
pyplot.show()
Despite that path existing (I have checked countless times), I am getting a file not found exception for some reason:
FileNotFoundError: File b'/Users/taylorjewell/Desktop/dataset_size_daily' does not exist
I am running this on a Mac, if that is relevant. Any help you are able to offer would be much appreciated!
For file paths, I would suggest using pathlib:
from pathlib import Path
from pandas import read_csv

data_file = Path("/Users/taylorjewell/Desktop/dataset_size_daily.csv")
series = read_csv(data_file, header=0)
However, it also depends on where you are trying to access the file from.
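For example, to see which directory relative paths resolve against, and whether the file is visible from there:
from pathlib import Path

# The working directory that relative paths are resolved against.
print(Path.cwd())
# If this prints False, check the file name and extension carefully.
print(Path("/Users/taylorjewell/Desktop/dataset_size_daily.csv").exists())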
I don't think you need the r (raw string) prefix on a Mac, since macOS paths use forward slashes rather than backslashes. Try:
read_csv('/Users/taylorjewell/Desktop/dataset_size_daily.csv', header=0)
Just ran into this issue today and wanted to share: if you download a CSV file to a Mac but then open and save it (e.g. in Numbers), the file extension changes to .numbers. So make sure you just move the file without opening it, and double-check that the file extension is .csv.
My code looks like this:
import pandas as pd
import os
import glob
import numpy as np
# Reading files and getting Dataframes
PathCurrentPeriod = '/home/sergio/Documents/Energyfiles'
allFiles = glob.glob(PathCurrentPeriod + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_)
    list_.append(df)

frame = pd.concat(list_, axis='rows')
However, there are about 300 files. I think I get a "Killed" response from the terminal when I run this in VS Code because keeping all 300 files in "frame" makes the virtual machine where I run it go out of RAM.
Is there a workaround? Is it possible to use the hard drive as the memory for processing instead of the RAM?
The problem is not the size of each .csv itself, in which case I could read them in chunks; the problem is that I'm appending too many.
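One possible workaround, sketched here under the assumption that all the files share the same columns, is to stream each file straight into a single combined CSV on disk, so only one file's rows are in RAM at a time (the output path is hypothetical):
import glob
import pandas as pd

PathCurrentPeriod = '/home/sergio/Documents/Energyfiles'
out_path = '/home/sergio/Documents/combined.csv'  # hypothetical output path

first = True
for file_ in glob.glob(PathCurrentPeriod + "/*.csv"):
    df = pd.read_csv(file_)
    # Append to the on-disk file; write the header only for the first file.
    df.to_csv(out_path, mode='w' if first else 'a', header=first, index=False)
    first = False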