I just want to import graphs (data series) from external sources into Python and read the corresponding x and y values.
Is this possible with any Python module, and if so, in what formats can the data be imported?
I searched for such modules but could only find articles about plotting graphs.
You can use the pandas library for loading .csv, .xls, .xlsx and some other file formats.
You can install it with pip install pandas. This is an example of loading a file:
import pandas as pd
df = pd.read_csv("file.csv")
df.head()
For loading CSV columns into Python lists you can use:
import pandas as pd
df = pd.read_csv("file.csv")
x_list = df["x"].tolist() # read the column named "x" and convert it to a list
y_list = df["y"].tolist() # read the column named "y" and convert it to a list
And for loading them into NumPy arrays you can use:
import pandas as pd
df = pd.read_csv("file.csv")
x_array = df["x"].to_numpy() # read the column named "x" and convert it to a NumPy array
y_array = df["y"].to_numpy() # read the column named "y" and convert it to a NumPy array
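As a self-contained sketch of the whole round trip (the file name and column values here are invented for illustration), you can write a small CSV first and then read it back:

```python
import pandas as pd

# Create a small example CSV (made-up data; in practice you would
# already have the file exported from your graphing tool).
with open("example.csv", "w") as f:
    f.write("x,y\n0,0\n1,1\n2,4\n3,9\n")

df = pd.read_csv("example.csv")

x_list = df["x"].tolist()     # column "x" as a plain Python list
y_array = df["y"].to_numpy()  # column "y" as a NumPy array

print(x_list)         # [0, 1, 2, 3]
print(y_array.sum())  # 14
```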
I have a CSV that I want to graph.
However, to get this graph, I need to first assign a column to a list (or array) and then go on from there. I need to assign the first column to that list. That column contains many repeats of the numbers 1 through 45 (so in code that would be range(1, 46)).
Currently, I have written this so far (for Weekly Sales against Date):
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1, 46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1, 46):
    b = a[printval - 1]
NOTE: walmart (the variable holding the dataset) has already been read in with pd.read_csv; that worked and produced output.
I do not know what to do from here. I want to graph the data per store as well.
The dataset can be found at: https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction
There are many ways to do this, but the easiest that comes to mind is using a pandas DataFrame.
First you need to install pandas in your environment. I see you tagged anaconda, so this would be something like:
$ conda install pandas
Then import it in your Python file (presumably a Jupyter notebook):
import pandas as pd
Then you would import the CSV into a DataFrame using the built-in read_csv function (you can do many cool things with it, so check out the docs).
In your case, assume you want to import just, say, columns number 3 and 5 and then plot them. If the first row of your CSV contains the header (say 'col3' and 'col5'), it is read automatically and stored as the column names. (If you want to skip the header row, add the option skiprows=1; if you want the columns to be named something else, use the option names=['newname3', 'newname5'].)
data = pd.read_csv('path/to/my.csv', usecols=[3, 5], header=0, names=['col1', 'col2'])
Then you can access the columns by name and plot them using data['colname']:
import matplotlib.pyplot as plt
plt.scatter(data['col1'], data['col2'])
plt.show()
Or you can use the built in function of pandas dataframes:
data.plot.scatter(x='col1', y='col2')
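To see how usecols and the naming options interact end to end, here is a minimal sketch; the file name, column layout, and values are all invented for the example:

```python
import pandas as pd

# Build a small CSV with six columns so that usecols=[3, 5] selects something.
with open("my.csv", "w") as f:
    f.write("c0,c1,c2,col3,c4,col5\n")
    f.write("a,b,c,1,d,10\n")
    f.write("e,f,g,2,h,20\n")

# Keep only columns 3 and 5; header=0 consumes the header row so that
# names=... renames the selected columns instead of treating the header as data.
data = pd.read_csv("my.csv", usecols=[3, 5], header=0, names=["col1", "col2"])
print(data["col1"].tolist())  # [1, 2]
print(data["col2"].tolist())  # [10, 20]
```

Without header=0, passing names would make pandas read the original header line as a data row, which is a common gotcha with this combination.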
I have found out what I need to do to get this to work. The following code describes my situation.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1, 46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1, 46):
    b = a[printval - 1]
    w = b[b.Store == printval]
    ws = w["Weekly_Sales"]
    tp = w["Date"]
    plt.scatter(tp, ws)
    plt.xlabel('Date')
    plt.ylabel('Weekly Sales')
    plt.title('Store_' + str(printval))
    plt.savefig('Store_' + str(printval) + '.png')  # to save the file if needed
    plt.show()
Again, I have already imported the CSV file and assigned it to walmart; there was no error doing that.
Again, the dataset can be found at https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction.
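As a design note, the manual list-building loop plus a second filter can also be written with DataFrame.groupby, which yields one sub-frame per store. Here is a sketch with a tiny made-up stand-in for the walmart DataFrame (no plotting, to keep it self-contained):

```python
import pandas as pd

# A tiny stand-in for the walmart DataFrame (illustrative values only).
walmart = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05", "2010-02-12", "2010-02-05"],
    "Weekly_Sales": [100.0, 150.0, 200.0, 250.0, 300.0],
})

# groupby replaces both the list-building loop and the per-store filter:
for store, w in walmart.groupby("Store"):
    ws = w["Weekly_Sales"]
    tp = w["Date"]
    # here you would call plt.scatter(tp, ws), set the labels and title,
    # and save one figure per store, exactly as in the question's loop
    print(store, ws.sum())
```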
I have a (theoretically) simple task. I need to pull out a single column of 4000ish names from a table and use it in another table.
I'm trying to extract the column using pandas and I have no idea what is going wrong. It keeps flagging an error:
TypeError: string indices must be integers
import pandas as pd
file ="table.xlsx"
data = file['Locus tag']
print(data)
You have only assigned the file name to a variable and defined the path, but you never actually load the file with pandas' read_excel function. First call read_excel from pandas; that reads the data, and then you can extract the column, etc.
Sample Code
import pandas as pd
import os
p = os.path.dirname(os.path.realpath("C:\Car_sales.xlsx"))
name = 'C:\Car_sales.xlsx'
path = os.path.join(p, name)
Z = pd.read_excel(path)
Z.head()
Sample Code
import pandas as pd
df = pd.read_excel("add the path")
df.head()
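Once the file is actually loaded with read_excel, pulling out the single column is one line. A sketch with an in-memory DataFrame standing in for the spreadsheet; the 'Locus tag' column name comes from the question, the values are invented:

```python
import pandas as pd

# Stand-in for df = pd.read_excel("table.xlsx") -- the rows are made up.
df = pd.DataFrame({"Locus tag": ["b0001", "b0002", "b0003"],
                   "Product": ["thrL", "thrA", "thrB"]})

data = df["Locus tag"].tolist()  # the ~4000 names as a plain Python list
print(data)  # ['b0001', 'b0002', 'b0003']
```

The original TypeError happened because `file` was still a string, and indexing a string with 'Locus tag' is not valid; indexing only works on the DataFrame.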
I read my ARFF data from here https://archive.ics.uci.edu/ml/machine-learning-databases/00426/ like this:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('Autism-Adult-Data.arff')
df = pd.DataFrame(data[0])
df.head()
But my dataframe has b' prefixes on all values in all columns.
How do I remove them?
When I try this, it doesn't work either:
from scipy.io import arff
import pandas as pd
data = arff.loadarff('Autism-Adult-Data.arff')
df = pd.DataFrame(data[0].str.decode('utf-8'))
df.head()
It says AttributeError: 'numpy.ndarray' object has no attribute 'str'.
As you can see, .str.decode('utf-8') from Removing b'' from string column in a pandas dataframe didn't solve the problem.
This doesn't work either:
df.index = df.index.str.encode('utf-8')
As you can see, both the strings and the numbers are bytes objects.
I was looking at the same dataset and had a similar issue, and I found a workaround that may be helpful. Rather than use from scipy.io import arff, I used another library called liac-arff. So the code should be like:
pip install liac-arff
or whatever pip command works for your operating system or IDE, and then:
import arff
import pandas as pd
data = arff.load(open('Autism-Adult-Data.arff'))
data is a dictionary. To find out what keys it has, you do:
data.keys()
and you will find that all arff files have the following keys
['description', 'relation', 'attributes', 'data']
Here 'data' is the actual data and 'attributes' has the column names and the possible values of those columns. So to get a DataFrame you need to do the following:
colnames = []
for i in range(len(data['attributes'])):
    colnames.append(data['attributes'][i][0])
df = pd.DataFrame.from_dict(data['data'])
df.columns = colnames
df.head()
I went a bit overboard with building the DataFrame here, but this returns a DataFrame with no b'' issues, and the key is using import arff from liac-arff.
The GitHub repository for the library I used can be found here.
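The column-name loop can be condensed to a comprehension. A sketch using a hand-built dictionary shaped like liac-arff's output (the attribute names and rows are invented, so the real library is not needed to follow it):

```python
import pandas as pd

# A dictionary shaped like what liac-arff's arff.load returns.
data = {
    "description": "",
    "relation": "example",
    "attributes": [("age", "NUMERIC"), ("gender", ["m", "f"])],
    "data": [[25, "m"], [31, "f"]],
}

# One line: the attribute names become the column labels.
df = pd.DataFrame(data["data"], columns=[a[0] for a in data["attributes"]])
print(df.columns.tolist())  # ['age', 'gender']
```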
Although Shimon shared an answer, you could also give this a try:
df.apply(lambda x: x.str.decode('utf8'))
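Note that this apply will raise on purely numeric columns (they have no .str accessor), so it can help to decode only the object-dtype columns. A sketch on a small made-up frame mimicking scipy's loadarff output:

```python
import pandas as pd

# A small frame mimicking loadarff output: byte strings plus numbers.
df = pd.DataFrame({"name": [b"alice", b"bob"], "score": [1.5, 2.5]})

# Decode only the object-dtype (byte string) columns; numbers are untouched.
obj_cols = df.select_dtypes([object]).columns
df[obj_cols] = df[obj_cols].apply(lambda col: col.str.decode("utf-8"))
print(df["name"].tolist())  # ['alice', 'bob']
```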
I have loaded an ARFF file into Python using this code:
import pandas as pd, scipy as sp
from scipy.io import arff
datos,meta = arff.loadarff(open('selectividad.arff', 'r'))
d = pd.DataFrame(datos)
When I use the head function to view the DataFrame, this is how it looks:
However, those b' prefixes are not present in the ARFF file, as we can see below:
https://gyazo.com/3123aa4c7007cb4d6f99241b1fc41bcb
What is the problem here? Thank you very much
For one column, apply the following code:
data['name_column'] = data['name_column'].str.decode('utf-8')
For a whole DataFrame, apply:
str_df = df.select_dtypes([object])
str_df = str_df.stack().str.decode('utf-8').unstack()
I was able to load the .arff file using the following commands, but I was not able to extract the data from the object and convert it into a DataFrame. I need this to apply machine learning algorithms to the data.
Command:
import arff
import pandas as pd
dataset = pd.DataFrame(arff.load(open('Training Dataset.arff')))
print(dataset)
Please help me to convert the data from here into a dataframe.
import numpy as np
import pandas as pd
from scipy.io.arff import loadarff
raw_data = loadarff('Training Dataset.arff')
df_data = pd.DataFrame(raw_data[0])
Try this. Hope it helps
from scipy.io.arff import loadarff
import pandas as pd
data = loadarff('Training Dataset.arff')
df = pd.DataFrame(data[0])
Similar to the answer above, but with no need to import numpy.