Pretty-print a DataFrame in Jupyter Notebook - python

I would like to beautify the printed output of a DataFrame.
import pandas as pd
d = {'Carid': [1, 2, 3], 'Carname': ['Mercedes-Benz', 'Audi', 'BMW'], 'model': ['S-Klasse AMG 63s', 'S6', 'X6 M-Power']}
df = pd.DataFrame(data=d)
print(df.head())
df.head()
As you can see, the output of print is not nicely formatted, while the output of the last statement, df.head(), is.
Is there an option to get the same result from the print statement in Jupyter Notebook?

Use display instead of print.
display(df.head())

Sure there is. You'll want to use Jupyter's built-in display, which you'll need to import first:
from IPython.display import display
Then, instead of print, use display:
display(df.head())
Source: https://tdhopper.com/blog/printing-pandas-data-frames-as-html-in-jupyter-notebooks
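
To see why the two outputs differ, here is a small sketch that also runs outside a notebook. It relies on print using the plain-text representation of the DataFrame, while Jupyter's display looks for the HTML representation that pandas exposes through _repr_html_. The variable names text_repr and html_repr are illustrative only.

```python
import pandas as pd

d = {'Carid': [1, 2, 3],
     'Carname': ['Mercedes-Benz', 'Audi', 'BMW'],
     'model': ['S-Klasse AMG 63s', 'S6', 'X6 M-Power']}
df = pd.DataFrame(data=d)

# print() falls back to the plain-text representation of the frame.
text_repr = str(df.head())

# display() in a notebook picks up the HTML representation instead,
# which pandas provides through the _repr_html_ method.
html_repr = df.head()._repr_html_()

print(text_repr)
print(html_repr[:80])  # beginning of the HTML markup
```

In a notebook cell, display(df.head()) renders that HTML as a table, which is why it looks the same as a bare df.head() on the last line.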

Related

qgrid not showing output Python

I am trying to run the code below. It executes successfully but does not show any output, and I have no idea why.
Why is this happening?
import pandas as pd
import qgrid
df = pd.DataFrame({'A': [1.2, 'foo', 4], 'B': [3, 4, 5]})
df = df.set_index(pd.Index(['bar', 7, 3.2]))
view = qgrid.show_grid(df, grid_options={'fullWidthRows': True}, show_toolbar=True)
view
See also my attached screenshot.

Python Pandas .str.extract method fails when indexing

I'd like to set values on a slice of a DataFrame with .loc, using the pandas .str.extract() method. However, it fails with indexing errors. The same code works perfectly if I swap extract for contains.
Here is a sample frame:
import pandas as pd
df = pd.DataFrame(
{
'name': [
'JUNK-0003426', 'TEST-0003435', 'JUNK-0003432', 'TEST-0003433', 'TEST-0003436',
],
'value': [
'Junk', 'None', 'Junk', 'None', 'None',
]
}
)
Here is my code:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)")
How can I set the None values to the extracted regex string?
The problem seems to be that .str.extract returns a pd.DataFrame by default. You can call .squeeze() on it to turn it into a Series, and then it seems to work fine:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)").squeeze()
Index alignment takes care of the rest.
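Instead of trying to get the group, you can replace the rest with the empty string (regex=True is needed because recent pandas versions treat the pattern as a literal string by default):
df.loc[df['value']=='None', 'value'] = df.loc[df['value']=='None', 'name'].str.replace(r'TEST-\d{3}', '', regex=True)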
Here is a way to do it:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)").loc[:,0]
Output:
           name value
0  JUNK-0003426  Junk
1  TEST-0003435  3435
2  JUNK-0003432  Junk
3  TEST-0003433  3433
4  TEST-0003436  3436
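
Another variant, offered as a sketch rather than as any answerer's method: passing expand=False makes .str.extract return a Series when the pattern has a single capture group, so neither .squeeze() nor a column selection is needed. The names mask and extracted are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['JUNK-0003426', 'TEST-0003435', 'JUNK-0003432',
             'TEST-0003433', 'TEST-0003436'],
    'value': ['Junk', 'None', 'Junk', 'None', 'None'],
})

# Rows whose name starts with "TEST" are the ones to overwrite.
mask = df['name'].str.startswith('TEST')

# With one capture group and expand=False the result is a Series.
extracted = df['name'].str.extract(r'TEST-\d{3}(\d+)', expand=False)

# Assignment aligns on the index, so only the masked rows receive a value.
df.loc[mask, 'value'] = extracted
print(df)
```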

How to convert datatype of the columns?

I picked up part of the code from here and expanded it a bit. However, I am not able to convert the data types of the Basket & Count columns for further processing.
For example, the Basket and Count columns are int64, and I would like to change them to float64.
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output
# creating a DataFrame
df = pd.DataFrame({'Basket': [1, 2, 3],
                   'Name': ['Apple', 'Orange', 'Count'],
                   'id': [111, 222, 333]})
vardict = df.columns
# dropdown listing the column names
select_variable = widgets.Dropdown(
    options=vardict,
    value=vardict[0],
    description='Select variable:',
    disabled=False,
    button_style=''
)
def get_and_plot(b):
    # clear the previous output before printing the dtype of the selected column
    clear_output()
    s = select_variable.value
    col_dtype = df[s].dtypes
    print(col_dtype)
display(select_variable)
select_variable.observe(get_and_plot, names='value')
Thanks in advance.
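
One possible way to do the conversion, as a sketch: DataFrame.astype accepts a mapping from column name to target dtype. The example frame has no Count column, so the integer columns Basket and id are converted here; that choice is an assumption about what was intended.

```python
import pandas as pd

df = pd.DataFrame({'Basket': [1, 2, 3],
                   'Name': ['Apple', 'Orange', 'Count'],
                   'id': [111, 222, 333]})

# Assumption: the integer columns of interest are Basket and id.
# astype returns a new DataFrame, so the result is assigned back.
df = df.astype({'Basket': 'float64', 'id': 'float64'})
print(df.dtypes)
```

Columns not named in the mapping, such as Name, keep their original dtype.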

Extract the country zip code based on the full country code - DataFrame Python

My data contains place information as a full post code, for example CZ25145. I would like to create a new column from it containing the value CZ. How can I do this?
I have this:
import pandas as pd
df = pd.DataFrame({
'CODE_LOAD_PLACE' : ['PL43100', 'CZ25905', 'DE29333', 'DE29384', 'SK92832']
},)
I would like to get it like below:
df = pd.DataFrame({
'CODE_LOAD_PLACE' : ['PL43100', 'CZ25905', 'DE29333', 'DE29384', 'SK92832'],
'COUNTRY_LOAD_PLACE' : ['PL', 'CZ', 'DE', 'DE', 'SK']
},)
I tried using .factorize and .groupby, but without success.
Use .str and select the first 2 characters:
df["COUNTRY_LOAD_PLACE"] = df["CODE_LOAD_PLACE"].str[:2]
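
If the country prefix were not always exactly two characters, a regular expression could pick up the leading letters instead. This is a sketch under that assumption; the data shown in the question only needs the slice above.

```python
import pandas as pd

df = pd.DataFrame({
    'CODE_LOAD_PLACE': ['PL43100', 'CZ25905', 'DE29333', 'DE29384', 'SK92832']
})

# Capture the run of letters at the start of each code.
# expand=False returns a Series because there is a single capture group.
df['COUNTRY_LOAD_PLACE'] = df['CODE_LOAD_PLACE'].str.extract(
    r'^([A-Za-z]+)', expand=False
)
print(df)
```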

Extract data from specific format in Pandas DF

I have raw data in CSV format that looks like this:
product-name brand-name rating
["Whole Wheat"] ["bb Royal"] ["4.1"]
Expected output:
product-name brand-name rating
Whole Wheat bb Royal 4.1
I want this applied to every entry in my dataset, which has 10,000 rows. How can I do this using pandas?
Can this be done with regular expressions? I am not sure how.
Thank you.
Edit 1:
My data looks something like this:
df = {
'product-name': [
[""'Whole Wheat'""], [""'Milk'""] ],
'brand-name': [
[""'bb Royal'""], [""'XYZ'""] ],
'rating': [
[""'4.1'""], [""'4.0'""] ]
}
df_p = pd.DataFrame(data=df)
It outputs like this: ["bb Royal"]
PS: Apologies for my programming. I am quite new to programming and also to this community. I really appreciate your help here :)
IIUC, select the first value of each list:
df = df.apply(lambda x: x.str[0])
Or if values are strings:
df = df.replace(r'[\[\]]', '', regex=True)
You can use the explode function:
df = df.apply(pd.Series.explode)
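
If the CSV is read directly with pandas, the cells are likely to be plain strings such as '["Whole Wheat"]' rather than lists. Under that assumption, here is a sketch that strips the brackets and quotes from every cell and then converts rating to a number. The names raw and cleaned are illustrative.

```python
import pandas as pd

# Assumption: each cell is a string that still contains brackets and quotes.
raw = pd.DataFrame({
    'product-name': ['["Whole Wheat"]', '["Milk"]'],
    'brand-name': ['["bb Royal"]', '["XYZ"]'],
    'rating': ['["4.1"]', '["4.0"]'],
})

# Remove leading/trailing brackets and quote characters from every cell.
cleaned = raw.apply(lambda col: col.str.strip('[]"\''))

# rating is still text after stripping, so convert it to a numeric dtype.
cleaned['rating'] = pd.to_numeric(cleaned['rating'])
print(cleaned)
```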
