I keep getting a NameError in Anaconda, and even after trying import numpy as nan the error does not change. Can anybody point me in the right direction?
Code snippet shared below:
import pandas as pd
import lzhw
import time
#Start counting the time
start = time.perf_counter()
#Begin compression
chunks = int(gc.shape[0] / 4) ## to have 4 chunks
compressed_chunks = lzhw.CompressedFromCSV("Fake\\File\\Path\\sensor_readings.csv", chunksize = chunks)
print("Execution Complete")
The easiest way is to import nan from numpy:
from numpy import nan
In your snippet, import numpy as nan only creates a short-hand alias for the numpy module itself; it does not give you the nan value. The conventional alias is np:
import numpy as np
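For illustration, a minimal check showing that both spellings refer to the same value once the import is fixed:
import numpy as np
from numpy import nan
# np.nan and the bare nan name point to the same floating-point "not a number" value
print(np.nan is nan)   # True
print(np.isnan(nan))   # True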
I have the following MATLAB code that works fine using text data files, and now I am trying to rewrite it in Python but I am running into errors. I have results that I am trying to perform some calculations on (data analysis). My results are in the form of binary files, and I have a specific package I am using to help me import the data. For example, here ne is a 1024x256 array, and there are 159 output files per iteration. So, in MATLAB I can simply do the following:
% Load data:
frame = 6; % output file number to load
ne_bg = load([DirPath '/ne_unpert.txt']);
ne_p = load([DirPath '/ne_' num2str(frame) '.txt']);
% perform calculations on data:
ne = ne_bg + ne_p;
dn_over_n = ne_p ./ ne;
Since MATLAB handles multi-dimensional arrays and matrices so easily, I am struggling to translate this to Python.
My Python code:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.gridspec as gridspec
import matplotlib.colors as colors
import matplotlib.patches as patches
import scipy.optimize as opt
from scipy.special import erf, comb, gamma, gammainc
import scipy.constants as const
from scipy.integrate import odeint
import sys
from glob import glob
from mpl_toolkits.axes_grid1 import make_axes_locatable
import Package as pg
# Initialize sizes
ne = np.zeros((1024,256))
ne_p = np.zeros((1024,256))
# Data
data = pg.GData('ne_p.bp')
dg = pg.GInterpModal(data, 2, 'ms')
#dg.interpolate(overwrite=True)
ne_p = data.getValues()
data = pg.GData('ne0.gkyl')
dg = pg.GInterpModal(data, 2, 'ms')
#dg.interpolate(overwrite=True)
ne_bg = data.getValues()
for i in range(1,159): # would like to look at files starting from 1, not 0
    data = pg.GData('ne{:d}.gkyl'.format(i))
    dg = pg.GInterpModal(data, 2, 'ms')
    ne[i,:] = data.getValues() # ERROR HERE
dn_over_n = ne_p/ne # get
....
Error message:
ValueError Traceback (most recent call last)
<ipython-input-35-d6134fb807e8> in <module>
48 dg = pg.GInterpModal(data, 2, 'ms')
49 #dg.interpolate(overwrite=True)
---> 50 ne[i,:] = data.getValues()
ValueError: could not broadcast input array from shape (1024,256,1) into shape (256)
Can someone show me how to fix this and explain what it means?
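Not knowing that specific package, here is a plain-numpy sketch of what the shapes in the error mean and one way around the mismatch, assuming getValues() really returns a (1024, 256, 1) array as the traceback reports:
import numpy as np
ne = np.zeros((159, 1024, 256))       # one 1024x256 slice per output file, instead of a single (1024, 256) array
values = np.zeros((1024, 256, 1))     # stand-in for data.getValues()
ne[1, :, :] = np.squeeze(values, axis=-1)   # drop the trailing singleton axis so the shapes match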
I have a function that I created, and I want to apply it to several different values using a for loop or something similar.
How do I create a for loop that takes each value but stores them in different arrays?
I have this so far:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import netCDF4 as s
import numpy.ma as ma
fwf_tot = fwf_ice + ds.runoff_tundra*ds.LSMGr # data input I am using
# function I want to apply to the data
def ob_annual(ob_monthly, id_number):
    ann_sum = ob_monthly.where(ds.ocean_basins == id_number).resample(TIME='1AS').sum().sum(dim=('X','Y'))
    return ann_sum
This is where my problem is: creating the for loop that saves the result for each of these different values. I think this for loop is just saving the function applied to the last value (87) and not the others. How might I fix this? I expected an output of 7 arrays, each of size 59.
obs = np.array([26,28,29,30,76,84,87])
total_obs = []
for i in obs:
    total_obs = ob_annual(fwf_tot_grnl, i)
    print(total_obs.shape)
(59)
You overwrite your list total_obs on each iteration. You must append each value to it instead:
for i in obs:
    total_obs.append(ob_annual(fwf_tot_grnl, i))
or use a list comprehension:
total_obs = [ob_annual(fwf_tot_grnl, i) for i in obs]
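As a quick illustration of the difference with plain Python values (the numbers below are just placeholders for the annual sums):
values = [26, 28, 29]
overwritten = []
for v in values:
    overwritten = v * 10          # rebinds the name each pass; only the last result survives
print(overwritten)                # 290
appended = []
for v in values:
    appended.append(v * 10)       # grows the list each pass
print(appended)                   # [260, 280, 290]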
Whenever I run this code it runs successfully, but the kernel dies when the last 4 lines of code are executed, where the Apriori algorithm is used for market basket analysis:
Data set:
https://archive.ics.uci.edu/ml/datasets/Online+Retail
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import mlxtend as ml
print("test1")
retail_df=pd.read_excel("Online Retail.xlsx",sheet_name="Online Retail")
retail_df.head()
rslt_df = retail_df[retail_df['Quantity'] > 5]
rslt_df=rslt_df.iloc[:10000]
rslt_df.shape
rslt_df.head()
df = rslt_df.groupby(['Quantity','Description']).size().reset_index(name='count')
df.head()
basket = df.groupby(['Quantity', 'Description'])['count'].sum().unstack().reset_index().fillna(0).set_index('Quantity')
basket
#The encoding function
def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1
basket_sets = basket.applymap(encode_units)
basket_sets
# THE NOTEBOOK CRASHES FOR THE BELOW 4 LINES OF CODE
frequent_itemsets = apriori(basket_sets, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift")
rules.sort_values('confidence', ascending = False, inplace = True)
rules.head(10)
Please help in resolving this.
I tried every method I could find, but nothing worked for me.
I had the same problem as you.
I changed min_support to 0.5, or something else larger than 0.01, and it worked!
I think it is because such a low support threshold produces a very large amount of output.
I hope this will help you as well.
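A minimal sketch of the adjusted call, reusing basket_sets from the question; the exact threshold is something to tune for your data, and low_memory=True can also help if the itemsets still blow up:
# a higher min_support keeps only itemsets that appear in a larger share of baskets,
# which drastically shrinks the search space and the memory footprint
frequent_itemsets = apriori(basket_sets, min_support=0.05, use_colnames=True, low_memory=True)
rules = association_rules(frequent_itemsets, metric="lift")
rules.sort_values('confidence', ascending=False).head(10)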
I am new to Python and I am trying to perform a spline interpolation. My data contains 3 columns, with a number of rows having NaN in one of the columns. I need to ignore/remove the NaN values without reducing the length. I have tried a number of ways, but each time the length is reduced. Any help or advice would be gratefully received.
import numpy as np
import pandas as pd
import scipy.linalg
import matplotlib.style
import math
data = pd.read_excel('prob_data.xlsx')
np.array(data['A'])
np.array(data['B'])
np.array(data['C'])
x = data['A'][~np.isnan(data['A'])]
print(len(x))
z = data['B'][~np.isnan(data['B'])]
print(len(z))
y = data['C'][~np.isnan(data['C'])]
print(len(y))
You can use the SimpleImputer class:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='median')
data = pd.read_excel('prob_data.xlsx')
nice_data = pd.DataFrame(imputer.fit_transform(data))
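A slightly fuller sketch, assuming prob_data.xlsx holds only the numeric columns A, B and C; keeping the original column names and index makes the imputed frame easier to work with:
import pandas as pd
from sklearn.impute import SimpleImputer
data = pd.read_excel('prob_data.xlsx')
# replace each NaN with the column median, keeping the original length and labels
imputer = SimpleImputer(strategy='median')
nice_data = pd.DataFrame(imputer.fit_transform(data), columns=data.columns, index=data.index)
print(len(data), len(nice_data))  # same number of rows before and after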
I used np.argmax to search for the index of the highest value of this array:
And it returned 720. It was supposed to be 721. I tried to google the problem but haven't found the solution yet.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from matplotlib.pylab import rcParams
from statsmodels.tsa.stattools import acf, pacf
dir='C:\\Users\\DELL\\Google Drive\\JVN couse materials\\Projects\\Practice projects\\Time series project\\energydata_complete.csv'
rawdata=pd.read_csv(dir, index_col='date')
timeseries=pd.DataFrame(rawdata['Appliances'])
timeseries.index=pd.to_datetime(timeseries.index)
timeseries['Log scale']=np.log10(timeseries['Appliances'])
lag_pacf = pacf(timeseries.loc['2016-01-12':'2016-01-21','Log scale'], nlags=1439, method='ols')
highest_pacf_lag=np.argmax(lag_pacf[1:]) ###this is where the problem happens
The CSV file (like a spreadsheet) numbers its rows from 1, while Python (and numpy and pandas too) is zero-indexed. Hence cell no. 721 is shown as 720 in Python.
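A minimal illustration of the two conventions, with a hypothetical array standing in for lag_pacf:
import numpy as np
values = np.array([3, 7, 2, 9, 5])
print(np.argmax(values))       # 3 -> zero-based position of the maximum
print(np.argmax(values) + 1)   # 4 -> the 1-based row number a spreadsheet would show
# note: slicing also shifts positions, so np.argmax(lag_pacf[1:]) is relative to lag_pacf[1:], not lag_pacf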