Read SAS data into Python - python

I am trying to read a SAS dataset using Python, but I get this error:
"IndexError: list assignment index out of range"
I am not sure what the cause could be. Can anyone help me out?
Here is the code I am using to read the SAS data (several million records) into Python:
import pandas as pd
import numpy as np
from sas7bdat import SAS7BDAT
with SAS7BDAT('/dat_xyz/mdpqr/data_test.sas7bdat') as m:
    mdata = m.to_data_frame()
Let me know the solution.
Thanks,
Surya
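The post doesn't include a full traceback, so the cause inside the third-party sas7bdat package isn't clear from here. One alternative worth trying (a sketch, not a confirmed fix) is pandas' built-in pd.read_sas, which can read a .sas7bdat file in chunks so a multi-million-row file isn't decoded in one pass. The helper name read_sas_in_chunks and its default chunk size are illustrative assumptions. Without an encoding argument, text columns come back as bytes.

```python
import pandas as pd


def read_sas_in_chunks(path, chunksize=500_000):
    """Read a .sas7bdat file chunk by chunk and return a single DataFrame.

    read_sas_in_chunks and its default chunksize are illustrative choices,
    not part of pandas.
    """
    # With chunksize set, pd.read_sas returns a reader object that yields
    # DataFrames of at most `chunksize` rows instead of one big frame.
    reader = pd.read_sas(path, format="sas7bdat", chunksize=chunksize)
    chunks = []
    try:
        for chunk in reader:
            chunks.append(chunk)
    finally:
        # Release the underlying file handle even if parsing fails midway.
        reader.close()
    # Stitch the pieces back together with a fresh 0..n-1 index.
    return pd.concat(chunks, ignore_index=True)


# Usage with the path from the question:
# mdata = read_sas_in_chunks('/dat_xyz/mdpqr/data_test.sas7bdat')
```

If pandas raises a similar error on this file, that points to the file itself rather than to the sas7bdat package.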

Related

Inconsistency in results when using numpy.median on the dataframe vs list

I want to find the median of a dataset using np.median, but for unexpected reasons the NumPy results differ from each other. If I convert the DataFrame column into a list and then use np.median(li), I get 1.0791015625 as a result. However, if I use np.median(df['diesel']), I get 1.079. Interestingly, statistics.median() works for both versions (a list or a DataFrame column). Does anyone know what I did wrong or what could have caused this problem?
import pandas as pd
import numpy as np
import statistics
import math
df = pd.read_csv("2020-08-09-prices.csv",sep=',', usecols=['diesel'], dtype={'diesel': np.float16})
df.info()
li=df['diesel'].tolist()
print(df.describe())
print(np.median(li))
print(statistics.median(df['diesel']))
print(np.median(df['diesel']))
This is where I got the csv file from: https://dev.azure.com/tankerkoenig/_git/tankerkoenig-data?path=%2Fprices%2F2020%2F08
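Assuming the column really holds float16 values (that is what dtype={'diesel': np.float16} requests), the two medians are most likely the same number printed at different precisions. np.median on the float16 column returns an np.float16, and NumPy prints the shortest decimal string that identifies a float16 value. tolist() converts to ordinary Python floats (float64), so np.median(li) prints every digit. The three prices below are made up for illustration.

```python
import numpy as np

# Hypothetical prices, stored as float16 like the 'diesel' column
arr = np.array([1.069, 1.079, 1.099], dtype=np.float16)
li = arr.tolist()  # plain Python floats holding the exact float16 values

med16 = np.median(arr)  # np.float16 result
med64 = np.median(li)   # np.float64 result

print(med16.dtype, med16)  # float16 1.079
print(med64.dtype, med64)  # float64 1.0791015625

# Same value underneath; only the printed precision differs
print(float(med16) == float(med64))  # True
```

If the extra precision matters, reading the column as the default float64 (dropping the dtype argument) avoids rounding the prices to float16 in the first place.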

How do I import datasets in Python?

I'm trying to import some datasets into my code. I've tried a lot of tutorials and web pages, but I am still getting errors. I use the Spyder IDE and Python 3.7:
import numpy as np
import pandas as pd
import tensorflow as tf
import os
dts1=pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
dts1
This works for me. If you are still experiencing errors, please post them.
import pandas as pd
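# Read data from the file 'sample_submission.csv' on the Desktop
# (a bare relative name such as 'sample_submission.csv' would instead be
# looked up in the directory your Python process was started from)
# read_csv also lets you control delimiters, rows to skip and column names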
data = pd.read_csv(r"C:\Users\Cucu\Desktop\sample_submission.csv")
# Preview the first 5 lines of the loaded data
print(data.head())
You can also write the path in either of these ways:
pd.read_csv("C:\\Users\\Cucu\\Desktop\\sample_submission.csv")
pd.read_csv("C:/Users/Cucu/Desktop/sample_submission.csv")

How to read a Time-series data in Python 3.7

Today I started working with time-series data in Python. First, I tried reading the data from a CSV file using the pandas library (imported as pd).
Unfortunately, I keep getting the error shown below. Any help on this would be highly appreciated.
PS: I am using Python 3.73
import pandas as pd
address = 'C:/Users/Anih John/Desktop/Python-workstation/ff/Superstore-Sales.csv'
Superstore = pd.read_csv(address, index_col='Order Date', parse_dates=True)
print(Superstore)
Then I get the following error:
Unable to open 'parsers.pyx': Unable to read file (Error: File not found (c:\users\anih john\desktop\python-workstation\ff\pandas\_libs\parsers.pyx)).
I would try with the following commands:
import pandas as pd
address = 'C:\\Users\\Anih John\\Desktop\\Python-workstation\\ff\\Superstore-Sales.csv'
Superstore = pd.read_csv(address, index_col='Order Date',parse_dates=True, engine='python',sep=',')
print(Superstore)
EDIT: Or simply pass the file path directly to read_csv:
import pandas as pd
Superstore = pd.read_csv('C:\\Users\\Anih John\\Desktop\\Python-workstation\\ff\\Superstore-Sales.csv', index_col='Order Date',parse_dates=True, engine='python',sep=',')
print(Superstore)
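For reference, here is what index_col='Order Date' together with parse_dates=True should give you once the file is read: a DataFrame indexed by a DatetimeIndex, which is what pandas' time-series tools (resampling, date slicing) build on. The tiny CSV below is made up and read from memory with io.StringIO. The Sales column name is an assumption about the Superstore file.

```python
import io

import pandas as pd

# Hypothetical stand-in for Superstore-Sales.csv
csv_text = """Order Date,Sales
2014-01-06,100
2014-01-20,250
2014-02-03,40
"""

superstore = pd.read_csv(io.StringIO(csv_text), index_col="Order Date", parse_dates=True)

# parse_dates=True parses the index column into datetimes
print(type(superstore.index).__name__)  # DatetimeIndex

# A DatetimeIndex enables time-series operations such as monthly totals
# ("MS" groups by calendar month, labelled with the month's first day)
monthly = superstore["Sales"].resample("MS").sum()
print(monthly)
```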

How to convert a csv file to a dataframe in Python 3.6 [duplicate]

This question already exists:
Reading CSV files in Python, using Jupyter Notebook through IntelliJ IDEA
Closed 4 years ago.
I'm trying to tackle the Kaggle Titanic challenge. Bear with me, as I'm fairly new to data science. I was previously struggling to get the following code to work (see my previous question, Reading CSV files in Python 3.6, using IntelliJ IDEA):
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
titanic_df = pd.read_csv('train.csv')
titanic_df.head()
However, with the code below I am able to open the file and read/print its contents, but I need to convert the data to a DataFrame so that I can work with it. Any suggestions?
file_path = '/Volumes/LACIE SETUP/Data_Science/Data_Analysis_Viz_InPython/Example_Projects/train.csv'
with open(file_path) as train_fp:
    for line in train_fp:
        print(line)
The code above prints out the data, but when I tried passing
'file_path' to:
titanic_df = pd.read_csv('file_path.csv')
I received the same error as before. I'm not sure what I'm doing wrong. I KNOW the file 'train.csv' exists in that location because 1) I put it there and 2) its contents can be printed when I point to its location.
So what the heck am I doing wrong??? :/
read_csv returns a pandas DataFrame, so as long as your file path is correct, the following code should work. Also, make sure to pass the file_path variable, not the string "file_path.csv".
import pandas as pd
file_path = '/Volumes/LACIE SETUP/Data_Science/Data_Analysis_Viz_InPython/Example_Projects/train.csv'
titanic_df = pd.read_csv(file_path)
titanic_df.head()
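The difference between the two calls is that pd.read_csv('file_path.csv') asks pandas for a file literally named file_path.csv in the current working directory, while pd.read_csv(file_path) uses the path stored in the variable. A small sketch that makes the failure explicit: the helper load_csv is hypothetical, and the demo writes a made-up two-column train.csv to a temporary directory instead of using the real Kaggle file.

```python
import os
import tempfile

import pandas as pd


def load_csv(file_path):
    """Load a CSV into a DataFrame, failing early with a clear message."""
    # Pass the *variable*; quoting it ('file_path.csv') would make pandas
    # look for a file literally named file_path.csv in the working directory.
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"No file at {file_path!r} (cwd is {os.getcwd()!r})")
    return pd.read_csv(file_path)


# Demonstration with a small hypothetical train.csv in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    file_path = os.path.join(tmpdir, "train.csv")
    with open(file_path, "w") as fp:
        fp.write("PassengerId,Survived\n1,0\n2,1\n")
    titanic_df = load_csv(file_path)
    print(titanic_df.head())
```

The space in '/Volumes/LACIE SETUP/...' needs no special handling; it is just part of the string.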

Editing a csv file with python

OK, so I'm looking to create a program that will interact with an Excel spreadsheet. The approach that seemed to work best is converting it to a CSV file. I've managed to write a program that prints the data, but I want it to edit the data and thus change the contents of the CSV file itself.
Sorry if it's a bit confusing as my programming skills aren't great.
Here's the code:
import csv
with open('wert.csv') as csvfile:
    freq = csv.reader(csvfile, delimiter=',')
    for row in freq:
        print(row[0], row[1], row[2])
If anyone has a better idea on how to make this program work then it would be greatly appreciated.
Thanks
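Since the question asks how to change the CSV file itself, here is a minimal sketch with the same csv module: read every row into a list, modify the list, then write it back over the original file. The file name wert.csv comes from the question. The sample rows, the helper name edit_csv_in_place, and the rule "double the value in the third column" are made-up placeholders for whatever edit you need, and the sketch assumes the file has a header row. The demo works on a copy in a temporary directory so it cannot overwrite a real wert.csv.

```python
import csv
import os
import tempfile


def edit_csv_in_place(path):
    """Read all rows, change them, and overwrite the same file (hypothetical helper)."""
    # newline="" is what the csv module expects for files it reads and writes
    with open(path, newline="") as csvfile:
        rows = list(csv.reader(csvfile, delimiter=","))

    header, data = rows[0], rows[1:]
    for row in data:
        # Placeholder edit: double the third column (assumed numeric)
        row[2] = str(float(row[2]) * 2)

    # Opening with "w" truncates the file, so every row is written back
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile, delimiter=",")
        writer.writerow(header)
        writer.writerows(data)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "wert.csv")  # hypothetical sample copy
    with open(path, "w", newline="") as f:
        f.write("name,unit,value\nA,kg,1.5\nB,kg,2\n")
    edit_csv_in_place(path)
    with open(path, newline="") as f:
        result = list(csv.reader(f))

print(result)
```

Reading everything into memory first keeps the logic simple. For very large files, writing to a new file and renaming it afterwards avoids holding all rows at once.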
You could try using the pandas package, a widely used data analysis/manipulation library.
import pandas as pd
data = pd.read_csv('foo.csv')
# change data here; see the pandas documentation
# index=False stops pandas from writing the row index as an extra column
data.to_csv('bar.csv', index=False)
See the pandas documentation for details.
If your CSV file contains only numbers (floats), or numbers plus a header line, you can try reading it with:
import numpy as np
data=np.genfromtxt('name.csv',delimiter=',',skip_header=1)
Then modify your data in python, and save it with:
data_modified=data**2 #for example
np.savetxt('name_modified.csv', data_modified, delimiter=',', header='whatever,header,you,want')
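One detail to watch: np.savetxt prefixes the header with '# ' by default, so the first line of the new file starts with a comment marker. Passing comments='' writes a plain header that spreadsheet programs and pd.read_csv treat as column names, and fmt='%g' only shortens the numbers compared with the default scientific format. A sketch of the full round trip, using a made-up two-column file in a temporary directory:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "name.csv")
    dst = os.path.join(tmpdir, "name_modified.csv")

    # Hypothetical numeric CSV with one header line
    with open(src, "w") as f:
        f.write("x,y\n1,2\n3,4\n")

    data = np.genfromtxt(src, delimiter=",", skip_header=1)
    data_modified = data**2  # example edit

    # comments="" keeps the header line free of the default "# " prefix
    np.savetxt(dst, data_modified, delimiter=",", header="x,y", comments="", fmt="%g")

    with open(dst) as f:
        written = f.read()

print(written)
```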
You can also read the Excel file directly with pandas and do the processing there:
import pandas
measured_data = pandas.read_excel(filename)
