I'm trying to import some values from a csv-file to a numpy array in python.
So far I've read the CSV-file with pandas but I can't succeed with creating a numpy array with the values from the csv columns.
Just found the answer: I only had to use DataFrame.values.
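For reference, a minimal sketch of that approach. The CSV content and the column names `a` and `b` are made up for illustration. `DataFrame.to_numpy()` is the accessor the pandas documentation now recommends over `.values`; both return the same array here.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "a,b\n1,2\n3,4\n"

df = pd.read_csv(io.StringIO(csv_text))

# Both calls return a NumPy array holding the column values.
arr = df.values
arr_too = df.to_numpy()

print(arr.shape)
```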
So I've been tasked with creating a suitable 2D array to contain all of the data from a csv with data on rainfall from the whole year. In the csv file, the rows represent the weeks of the year and the columns represent the day of the week.
I'm able to display the data I want using the following code.
import csv
data = list(csv.reader(open("rainfall.csv")))
print(data[1][2])
My issue is I'm not sure how to store this data in a 2D array.
I'm not sure how to go about doing this. Help would be appreciated, thanks!
You could use numpy for that. It looks like data is already a list of lists, so you can create a 2D numpy array from it directly:
import numpy as np
# A Python name cannot start with a digit, so use data_2d rather than 2d_data
data_2d = np.array(data)
Or you could even try to directly read the file with numpy:
import numpy as np
# Use the appropriate delimiter here
data_2d = np.genfromtxt("rainfall.csv", delimiter=",")
With pandas:
import pandas as pd
# Use the appropriate delimiter here
df = pd.read_csv("rainfall.csv")
# read_csv returns a DataFrame; to_numpy() gives the 2D array
data_2d = df.to_numpy()
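One thing to watch for with the first option: csv.reader yields strings, so np.array(data) produces an array of strings rather than numbers. A minimal sketch of converting to floats, assuming the file has no header row and every cell is numeric (the sample values below are invented):

```python
import csv
import io

import numpy as np

# Invented sample: two weeks (rows) of seven daily rainfall values.
csv_text = "0.0,1.2,0.4,0.0,3.1,0.0,0.8\n2.5,0.0,0.0,1.1,0.0,0.6,0.0\n"

rows = list(csv.reader(io.StringIO(csv_text)))

as_strings = np.array(rows)             # dtype is a Unicode string type
rainfall = np.array(rows, dtype=float)  # parse every cell as a float

print(as_strings.dtype.kind)  # 'U' marks a Unicode string array
print(rainfall.shape)
print(rainfall[1][2])         # week index 1, day index 2
```

If the real file has a header row or text labels, those rows or columns would need to be skipped before the float conversion.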
I am trying to import a csv file as an array in python using the "numpy.loadtxt" method. It keeps returning "ValueError: could not convert string to float: ''" despite there not being any blank cells in the csv file. Here is my code:
import csv
import torch
import numpy as np
import pandas as pd
array = np.loadtxt("HIP Only 2.csv", dtype=np.float32, delimiter=",", skiprows=1)
It seems that some cells contain non-numeric values. These may be literal strings, or errors left by a failed formula, e.g. #DIV/0! in Excel files, which appears when the referenced cells are empty or a number is divided by zero. numpy.loadtxt is meant for files with no missing data. If getting an array is the main goal and numpy.loadtxt itself is not a requirement, numpy.genfromtxt is more flexible and can be used instead, e.g.:
array = np.genfromtxt("HIP Only 2.csv", dtype=np.float32, delimiter=",", skip_header=1)
Hope it is helpful.
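A small sketch of what this looks like in practice, using made-up in-memory content instead of the real file. Empty cells come back as NaN, so np.isnan can be used to locate them. Note that genfromtxt treats # as a comment character by default, so a cell such as #DIV/0! may need the comments argument changed to be read as a field.

```python
import io

import numpy as np

# Made-up file content: a header row, then one row with an empty cell.
csv_text = "x,y,z\n1.0,,3.0\n4.0,5.0,6.0\n"

array = np.genfromtxt(
    io.StringIO(csv_text),
    dtype=np.float32,
    delimiter=",",
    skip_header=1,
)

# Cells that could not be read as numbers come back as NaN.
missing = np.isnan(array)
print(array)
print(np.argwhere(missing))  # row/column indices of the problem cells
```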
I want to find the median of a dataset using np.median, but the numpy results unexpectedly differ from each other. If I convert the dataframe column into a list and then use np.median(li), I get 1.0791015625. However, if I use np.median(df['diesel']), I get 1.079. Interestingly, statistics.median() works for both versions (using a list or a dataframe). Does anyone know what I did wrong or what could have caused this problem?
import pandas as pd
import numpy as np
import statistics
import math
df = pd.read_csv("2020-08-09-prices.csv",sep=',', usecols=['diesel'], dtype={'diesel': np.float16})
df.info()
li=df['diesel'].tolist()
print(df.describe())
print(np.median(li))
print(statistics.median(df['diesel']))
print(np.median(df['diesel']))
This is where I got the csv file from: https://dev.azure.com/tankerkoenig/_git/tankerkoenig-data?path=%2Fprices%2F2020%2F08
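One possible explanation, offered as a sketch rather than a confirmed diagnosis: the two results may be the same number printed at different precisions. tolist() yields Python floats, so np.median(li) works in 64-bit precision and prints every digit, while np.median(df['diesel']) stays in float16 and prints the shortest representation that identifies the value. The prices below are invented, and plain NumPy arrays stand in for the DataFrame column.

```python
import numpy as np

# Invented prices stored as float16, mirroring dtype={'diesel': np.float16}.
prices = np.array([1.059, 1.079, 1.099], dtype=np.float16)

median_f16 = np.median(prices)           # result stays float16
median_f64 = np.median(prices.tolist())  # list of Python floats -> float64

print(median_f16, median_f16.dtype)
print(median_f64, median_f64.dtype)

# Widening the float16 result is exact, so the two medians compare equal.
print(float(median_f16) == float(median_f64))
```

If that is what is happening, reading the column as np.float32 or np.float64 instead of np.float16 should make the printed values agree more closely with the numbers in the file.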
I have a couple of 1D arrays and a 2D array, that I want to view in an excel file. I am generating and manipulating these arrays in python but I want to view them finally in an excel file.
Is there a way I can just export the arrays to excel, rather than copying these arrays element by element using xlsxwriter, like they've shown it here?
I would agree with the above as the trick is to use pandas. My example below shows the creation of an excel file with each array in a different sheet.
import numpy as np
import pandas as pd
# create two 1 D arrays
A1d1 = np.full((5),fill_value=1)
B1d2 = np.full((5),fill_value=2)
# create one 2 D array
C2d3 = np.full((5,5),fill_value=3)
# convert to pandas DataFrames
A1d1_df = pd.DataFrame(A1d1)
B1d2_df = pd.DataFrame(B1d2)
C2d3_df = pd.DataFrame(C2d3)
# Use pandas Excel Writer to create one Excel file with
# a sheet for each array
with pd.ExcelWriter('yourexcelfile.xlsx') as writer:
    A1d1_df.to_excel(writer, sheet_name='A1d1')
    B1d2_df.to_excel(writer, sheet_name='B1d2')
    C2d3_df.to_excel(writer, sheet_name='C2d3')
One solution could be using the pandas package and then creating a CSV.
import pandas
dataframe_array= pandas.DataFrame(your_array)
dataframe_array.to_csv(your_path)
and then just view the csv in excel
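A slightly fuller sketch of the CSV route, assuming a small numeric 2D array. The in-memory buffer stands in for a real path (something like "out.csv", which is only a placeholder name), and the index=False and header=False arguments are an addition here so that only the array values end up in the file.

```python
import io

import numpy as np
import pandas as pd

# Example 2D array; any numeric array works the same way.
your_array = np.arange(6).reshape(2, 3)

# index=False and header=False write only the values, so Excel shows
# the array without an extra index column or a 0,1,2 header row.
buffer = io.StringIO()  # stands in for a real file path
pd.DataFrame(your_array).to_csv(buffer, index=False, header=False)

print(buffer.getvalue())
```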
I am trying to compute the mean of data from a CSV file. I read the data from the CSV into a list of strings, then convert it to an array with numpy. This works perfectly when I plot graphs.
But when I calculate the mean, I get errors with my data.
If I use NumPy I get:
TypeError: cannot perform reduce with flexible type
If I use the statistics library I get:
TypeError: can't convert type 'string' to numerator/denominator
If I check my array with the type command in IPython, I see that it is of type numpy.ndarray.
What's wrong with my array? Can you explain why the numpy.asarray conversion works perfectly for matplotlib but gives the wrong type for other operations?
import csv
import numpy as np
import statistics as stat
life_exp=[]
with open('country.csv') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        if datareader.line_num != 1:
            life_exp.append(row[1])
array_life_exp = np.asarray(life_exp)
print(stat.mean(array_life_exp))
print(np.mean(array_life_exp))
Try this:
from pandas import read_csv
data = read_csv('country.csv')
print(data.iloc[:,1].mean())
This code reads your csv into a pandas DataFrame with automatic type conversion and prints the mean of the second column.
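As for why the original code fails: csv.reader returns every field as a string, so np.asarray(life_exp) builds an array of strings, and np.mean cannot reduce over a string dtype. A minimal sketch of converting first, assuming every value parses as a number (the values below are invented):

```python
import numpy as np

# Invented values, as they would arrive from csv.reader: all strings.
life_exp = ["70.5", "65.0", "80.5"]

as_strings = np.asarray(life_exp)
print(as_strings.dtype.kind)  # 'U': a string array, not a numeric one

# Convert to floats before doing arithmetic.
array_life_exp = as_strings.astype(float)
print(np.mean(array_life_exp))
```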