I'm trying to display the descriptive statistics of a data frame with 12 variables. For readability, my goal is to have all the variables on the same line. The following is my code:
# Descriptive statistic
from pandas import read_csv
from pandas import set_option
filename = 'winequality-red.csv'
data = read_csv(filename,sep = ';')
set_option('display.max_columns',500)
descriptions = data.describe()
print(descriptions)
The output splits the variables across 2 lines.
Please, I need help to achieve my goal.
Thanks.
If you are using an IDE or a notebook, putting everything on one line will probably hurt readability rather than help it.
Anyway, it can be done by transposing the result: descriptions.T.
If you are using an IDE or a notebook, you can use IPython.display instead of print:
from IPython.display import display
display(descriptions) # OR - display(descriptions.T)
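If the goal is specifically to keep all twelve columns on one printed line, note that the wrapping is governed by pandas' display.width option rather than by display.max_columns. A minimal sketch, assuming a stand-in DataFrame with hypothetical column names v0 to v11 in place of the wine data:

```python
import numpy as np
import pandas as pd

# Stand-in data: 12 numeric columns with hypothetical names v0..v11
# (the real code would use the frame returned by read_csv).
columns = [f"v{i}" for i in range(12)]
data = pd.DataFrame(np.random.default_rng(0).random((50, 12)), columns=columns)
descriptions = data.describe()

# display.width is the character budget per printed line; raising it
# (together with max_columns=None) keeps every column in one block.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    wide_text = repr(descriptions)

print(wide_text)
```

With the original code, the equivalent would be one extra call such as set_option('display.width', 1000) before printing; the value 1000 is just an arbitrary "wide enough" choice.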
I have some source code that dynamically generates tables from Python, each with a headline, and writes them into a LaTeX file. It looks like this:
import pandas as pd
import numpy as np
def generate_tables():
    with open('tables.tex', 'w') as f:
        # header
        f.write("% !TeX TS-program = lualatex\n")
        f.write("\\documentclass{article}\n")
        f.write("\\usepackage{booktabs}\n")
        f.write("\\usepackage{unicode-math}\n")
        f.write("\\begin{document}\n")
        fooder = "\\end{document}\n"
        for i in range(10):
            df = pd.DataFrame(np.random.random((5, 5)))
            latex_table = df.to_latex(index=False,header=False,escape=False)
            f.write(f"Table {i}:")
            f.write("\n\\\\\n")
            f.write(latex_table)
            f.write("\\\\[2\\baselineskip]\n")
        f.write(fooder)
generate_tables()
Is it possible to prevent the following warning:
FutureWarning: In future versions `DataFrame.to_latex` is expected to utilise the base implementation of `Styler.to_latex` for formatting and rendering. The arguments signature may therefore change. It is recommended instead to use `DataFrame.style.to_latex` which also contains additional functionality.
latex_table = df.to_latex(index=False,header=False,escape=False)
without losing the parameters index=False,header=False,escape=False? In df.style.to_latex the parameters that can be set are very different from those of df.to_latex.
I found a solution.
Replace
latex_table = df.to_latex(index=False,header=False,escape=False)
by
latex_table = df.style.hide(axis='index').hide(axis='columns').to_latex()
escape=False does not need a replacement, because Styler.to_latex does not escape by default, so symbols like $ are passed through unchanged.
I have a university project on building a decision tree. I already have the code that creates the tree, but I want to print it. Can anyone help me?
#IMPORT ALL NECESSARY LIBRARIES
import Chefboost as chef
import pandas as pd
archivo = input("INSERT FILE NAMED FOLLOWED BY .CSV:\n")
# READ THE DATA SET FROM THE CSV FILE
df = pd.read_csv(str(archivo))
df.columns = ['ph', 'soil_temperature', 'soil_moisture', 'illuminance', 'env_temperature','env_humidity','Decision']
# print(df.head(10)) #UNCOMMENT IF WANT FIRST 10 ROWS PRINTED OUT
config = {'algorithm':'ID3'} # CONFIGURE THE ALGORITH. CHOOSE BETWEEN ID3, C4.5, CART, Regression
model = chef.fit(df.copy(), config) #CREATE THE DECISION TREE BASED OF THE CONFIGURATION ABOVE
resultados = pd.DataFrame(columns = ["Real", "Predicción"]) #CREATE AN EMPTY PANDAS DATAFRAME
# SAVE ALL REAL VS ESTIMATED VALUES IN THE ABOVE DATAFRAME
for i in range(1,372):
    l = []
    l.append(df.iloc[i]['Decision'])
    feature = df.iloc[i]
    prediction = chef.predict(model, feature)
    l.append(prediction)
    resultados.loc[i] = l
    print(l)
Not knowing the Chefboost library, I can't directly answer your question, but when I am working with a new library, I will often use a few tools to help me understand what the library is giving me. Use dir(object) to get a listing of the attributes and methods of the object.
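For instance, a small inspection session might look like the sketch below. The model object here is a hypothetical stand-in (a plain dict made up for illustration), not what Chefboost actually returns; substitute the real model.

```python
# Stand-in for whatever a library call returned (hypothetical example
# object; replace it with the real `model`).
model = {"trees": [], "config": {"algorithm": "ID3"}}

# What kind of object is it?
print(type(model))

# List the public attributes and methods worth trying.
public_names = [name for name in dir(model) if not name.startswith("_")]
print(public_names)

# If it turns out to be a dict, its keys show what is stored inside.
if isinstance(model, dict):
    print(list(model.keys()))
```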
You might also get a little more specific about what you want to see when you "Print the decision tree." Are you trying to print the model, or the predictions? What trouble are you having or what errors are you seeing?
Hope this helps.
QUESTION #1) I am new to Python and coding in general. I want to take my data from a CSV which has a column labeled "U.S. OSHA Recordable?". In that column every answer is either "yes" or "no". I want to display a plot.bar that shows, for example, 23 "yes" and 7 "no": that is, add up the totals of "yes" and "no" in the column, then display them in one clean bar graph with 2 bars and the total on top of each bar. The problem is that right now the X axis has a separate entry for every row, reading "no, yes, no, yes, yes, no" about 27 individual times. I want users to easily see one bar graph with only 2 bars and the total on top, like this image.
This is my code. I am not sure what I need to sum up the Yes and No values in the column.
import pandas as pd # powerful data visualization library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) #skiprows allows you to skip the comments on top... & ecoding allows pandas to work on this CSV
data.head() # this will give the first row that you want it to read the header
data.plot.bar(x='U.S. OSHA Recordable?') #creates a plot in pandas
plt.show() # shows the plot to the user
df['Val'].value_counts().plot(kind='bar')
Here Val is the name of the column that contains 'Yes' & 'No'
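A slightly fuller sketch of the same idea, including the totals on top of the bars. The sample data (23 "Yes", 7 "No") and the output filename are made up for illustration, and bar_label assumes matplotlib 3.4 or newer:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the CSV: 23 "Yes" and 7 "No" answers (made-up sample data).
data = pd.DataFrame({"U.S. OSHA Recordable?": ["Yes"] * 23 + ["No"] * 7})

# value_counts collapses the column into one total per distinct answer.
counts = data["U.S. OSHA Recordable?"].value_counts()

ax = counts.plot(kind="bar", rot=0)
ax.bar_label(ax.containers[0])  # write each total above its bar
ax.set_ylabel("Count")
plt.savefig("osha_counts.png")  # hypothetical filename; use plt.show() interactively
```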
import pandas as pd # powerful data visualization library
import numpy as np
import matplotlib.pyplot as plt # allows us to plot things
import csv # allows us to import and use CSV commands which are simple but effective
import seaborn as sns # it counts everything for you and outputs it exactly like I want
# This website saved my life https://www.pythonforengineers.com/introduction-to-pandas/
# use this to check the available styles: plt.style.available
data = pd.read_csv(r'C:\Users\rmond\Downloads\PS_csvFile.csv', encoding="ISO-8859-1", skiprows=6) #skiprows allows you to skip the comments on top... & ecoding allows pandas to work on this CSV
sns.set(style="whitegrid")
ax = sns.countplot(x='U.S. OSHA Recordable?', data=data)
plt.show() # shows the plot to the user
Interestingly enough, I found out about "seaborn", pip installed it, and gave it a shot. It is supposed to pull data from a URL, but after viewing a few other pages on Stack Overflow I found a great suggestion. Anyway, this works great and it does everything for me. I am so happy with this solution. Now on to the next problem, lol. I hope this helps someone else in the future.
My graph looks exactly like the one posted by SH-SF btw. Works great
I am not a statistician or anything like that. I am working on a project where I got an Excel file and I need to replicate the calculations made in the file in an HTML table.
I got most of the file right, but I am stuck on a function called FDIST, which, as far as I understand, is the F probability distribution. I looked for something that does the same thing in Python (because I am using Django on the server side) and came across the scipy library, which helped a lot with the other calculations I needed, but I still can't find something that does what FDIST does in Excel. I found the function f.pdf, but it turns out it is not the same.
Can someone suggest a way to get the same result?
Thanks.
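You can read this to know more about the F distribution in general.
If you use the parameters x = 2.510, dfn = 3, dfd = 48 in Excel, FDIST returns the right-tail probability, about 0.0698.
Note that FDIST is kept for compatibility with Excel 2007 and earlier. Its replacements are F.DIST.RT (the same right-tail probability) and F.DIST with Cumulative = TRUE (the left-tail probability, i.e. 1 - FDIST).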
Using scipy.stats you get the same results:
>>> from scipy.stats import f
>>> x = 2.510
>>> dfn = 3
>>> dfd = 48
>>> f.cdf(x, dfn, dfd)
0.930177201089
>>> 1 - f.cdf(x, dfn, dfd)
0.0698227989112
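If this is needed in several places, it could be wrapped in a small helper. The name fdist below is just an illustrative choice; it uses the survival function f.sf, which is scipy's direct way of computing 1 - cdf:

```python
from scipy.stats import f


def fdist(x, dfn, dfd):
    """Right-tail F probability, the quantity Excel's FDIST reports."""
    # sf is the survival function, i.e. 1 - cdf, computed directly.
    return f.sf(x, dfn, dfd)


p_value = fdist(2.510, 3, 48)
print(p_value)
```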
Hope this helps.
I must start by saying that I am very new to Python and still very bad at it, but I believe it will be worth learning eventually.
My problem is that I have a device that writes its values to a .txt file, separated by tabs instead of commas. Ex: 50\t50\t66\t0\t4...
What I want is just to plot a simple histogram of that data.
I realise it should be the simplest thing, but I can't find a solution in my beginner Python lectures, nor can I word the problem well enough to find anything in an online search.
import matplotlib.pyplot as plt
#import numpy as np
d = open('.txt', 'r')
d.read()
plt.hist(d)
plt.show()
PS: numpy is just a leftover from one of my previous exercises
No worries, everyone must start somewhere. You are on the right track, and you are correct that Python is a great language to learn. There are many ways this can be accomplished, but here is one. The way this example is written, it will generate one histogram per line in the file. You can change that behavior if needed.
Please note that the csv module will take care of converting the data in the file to floats if you pass quoting=csv.QUOTE_NONNUMERIC to the reader constructor. This is probably the preferred way of handling number conversion in a CSV / TSV file.
import csv
import matplotlib.pyplot as plt
data_file = open('testme.txt')
tsv_reader = csv.reader(data_file, delimiter='\t',
                        quoting=csv.QUOTE_NONNUMERIC)
for row in tsv_reader:
    plt.hist(row)
    plt.show()
I've left out some things, such as proper exception handling and using a context manager to open the file, which is best practice and is demonstrated in the csv module documentation.
Once you learn more about the language, I'd suggest digging into those subjects further.
Assign the string result of read() to a variable s:
s = d.read()
split will break your string s into a list of strings:
s = s.split("\t")
map will apply a function to every element of a list; in Python 3 it returns an iterator, so wrap it in list() before passing the result to plt.hist:
s = list(map(float, s))
If you study csv, you can handle the file with delimiter='\t' as one of the options. This changes the expected delimiter from ',' to '\t' (tab). All the examples you study that use ',' will work in the same way.
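A minimal sketch of that csv approach, using an in-memory stand-in for the device's file (the sample numbers are made up) and collecting every value into one list:

```python
import csv
import io

# Stand-in for the device's file: tab-separated numbers (made-up sample).
sample = io.StringIO("50\t50\t66\t0\t4\n12\t7\t50\t3\t9\n")

values = []
reader = csv.reader(sample, delimiter="\t", quoting=csv.QUOTE_NONNUMERIC)
for row in reader:
    # QUOTE_NONNUMERIC has already converted each unquoted field to float.
    values.extend(row)

print(values)
```

With a real file, sample would be replaced by a file opened in a with-block, and values could then be passed to plt.hist to draw a single histogram of all the data.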