Is there a way to put text in the top left box of a dataframe display? Does that field have a name? See below:
import pandas as pd
raw_data = {'Regiment': ['Nighthawks', 'Raptors'],
'Company': ['1st', '2nd'],
'preTestScore': [4, 24],
'postTestScore': [25, 94]}
pd.DataFrame(raw_data, columns = ['Regiment', 'Company', 'preTestScore', 'postTestScore']).set_index('Regiment')
Yes. That space is used for the name of the columns. It can be filled in by doing
df.columns.name = 'your name'
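As a minimal sketch of the above, reusing the question's data: the label 'Scores' is only a placeholder, and the index name ('Regiment') is a separate attribute, df.index.name.

```python
import pandas as pd

raw_data = {'Regiment': ['Nighthawks', 'Raptors'],
            'Company': ['1st', '2nd'],
            'preTestScore': [4, 24],
            'postTestScore': [25, 94]}
df = pd.DataFrame(raw_data).set_index('Regiment')

# 'Scores' is a placeholder label for the top-left box
df.columns.name = 'Scores'

print(df.columns.name)  # name of the columns axis
print(df.index.name)    # name of the index, set by set_index
```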
I've got a small issue while coding a script that takes a CSV string and is supposed to select a column name and value based on the input. The CSV string contains the names of NBA players, their universities, etc. When the input is "name" and "Andre Brown", it should search for those values in the given CSV string. I have rough code laid out, but I am unsure how to implement the where method. Any ideas?
import csv
import pandas as pd
import io
class MySelectQuery:
    def __init__(self, table, columns, where):
        self.table = table
        self.columns = columns
        self.where = where
    def __str__(self):
        return f"SELECT {self.columns} FROM {self.table} WHERE {self.where}"
csvString = "name,year_start,year_end,position,height,weight,birth_date,college\nAlaa Abdelnaby,1991,1995,F-C,6-10,240,'June 24, 1968',Duke University\nZaid Abdul-Aziz,1969,1978,C-F,6-9,235,'April 7, 1946',Iowa State University\nKareem Abdul-Jabbar,1970,1989,C,7-2,225,'April 16, 1947','University of California, Los Angeles\nMahmoud Abdul-Rauf,1991,2001,G,6-1,162,'March 9, 1969',Louisiana State University\n"
df = pd.read_csv(io.StringIO(csvString), error_bad_lines=False)
where = "name = 'Alaa Abdelnaby' AND year_start = 1991"
df = df.query(where)
print(df)
The CSV string is transformed into a pandas DataFrame, which should then find the values based on the input. However, I get the error "name 'where' not defined". I believe everything up to the df = etc. part is correct; now I need help implementing the where method. (I've seen one other solution on SO but wasn't able to understand it.)
# importing pandas
import pandas as pd
record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78]}
# create a dataframe
dataframe = pd.DataFrame(record, columns = ['Name', 'Age', 'Stream', 'Percentage'])
print("Given Dataframe :\n", dataframe)
options = ['Math', 'Science']
# selecting rows based on condition
rslt_df = dataframe[(dataframe['Age'] == 21) &
dataframe['Stream'].isin(options)]
print('\nResult dataframe :\n', rslt_df)
Source: https://www.geeksforgeeks.org/selecting-rows-in-pandas-dataframe-based-on-conditions/
Sometimes Googling does the trick ;)
You need the double = there. Also, query expects Python's lowercase and (or &) rather than SQL's AND. So it should be:
where = "name == 'Alaa Abdelnaby' and year_start == 1991"
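A small sketch of DataFrame.query on a cut-down version of the data. query parses a Python-like expression, so equality is == and the conjunction is lowercase and (or &). The shortened CSV below is an assumption for illustration; fields that contain commas are wrapped in double quotes so that read_csv parses them cleanly.

```python
import io
import pandas as pd

# Well-formed sample: fields containing commas are wrapped in double quotes
csv_string = (
    'name,year_start,college\n'
    'Alaa Abdelnaby,1991,Duke University\n'
    'Kareem Abdul-Jabbar,1970,"University of California, Los Angeles"\n'
)
df = pd.read_csv(io.StringIO(csv_string))

# Python-style operators: == for equality, lowercase "and" to combine
where = "name == 'Alaa Abdelnaby' and year_start == 1991"
result = df.query(where)
print(result)
```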
I want to print a data frame as a PNG image, and used the following approach.
import pandas as pd
import dataframe_image as dfi
data = {'Type': ['Type 1', 'Type 2', 'Type 3', 'Total'], 'Value': [20, 21, 19, 60]}
df = pd.DataFrame(data)
dfi.export(df, 'table.png')
However, I also want to print a date stamp above the table on the image, with the intention of creating a series of images on consecutive days. If possible, I would also like to format the table with a horizontal line above the final 'Total' row, which holds the sum of the values.
Is this possible with the above package? Or is there a better approach to do this?
You can add the line df.index.name = pd.Timestamp('now').replace(microsecond=0) so that the timestamp appears at the top of the table.
To add the horizontal line above the 'Total' row, you can use .style.set_table_styles:
data = {'Type': ['Type 1', 'Type 2', 'Type 3'], 'Value': [20, 21, 19]}
df = pd.DataFrame(data)
df.index.name = pd.Timestamp('now').replace(microsecond=0)
df.loc[len(df)] = ['Total',df['Value'].sum()]
test = df.style.set_table_styles([{'selector' : '.row3','props' : [('border-top','3px solid black')]}])
dfi.export(test, 'table.png')
I want to convert this dict into a pandas dataframe where each key becomes a column and values in the list become the rows:
my_dict:
{'Last updated': ['2021-05-18T15:24:19.000Z', '2021-05-18T15:24:19.000Z'],
'Symbol': ['BTC', 'BNB', 'XRP', 'ADA', 'BUSD'],
'Name': ['Bitcoin', 'Binance Coin', 'XRP', 'Cardano', 'Binance USD'],
'Rank': [1, 3, 7, 4, 25],
}
The lists in my_dict can also have some missing values, which should appear as NaNs in dataframe.
This is how I'm currently trying to append it into my dataframe:
df = pd.DataFrame(columns = ['Last updated',
                             'Symbol',
                             'Name',
                             'Rank',])
df = df.append(my_dict, ignore_index=True)
#print(df)
df.to_excel(r'\walletframe.xlsx', index = False, header = True)
But my output only has a single row containing all the values.
The answer was pretty simple. Instead of using
df = df.append(my_dict)
I used
df = pd.DataFrame.from_dict(my_dict, orient='index').T
With orient='index', each key becomes a row and the shorter lists are padded with missing values; .T then transposes the result so that each key is a column again.
Credits to @Ank who helped me find the solution!
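A minimal sketch of that approach on the dict from the question, assuming the orient='index' form of from_dict (which tolerates lists of unequal length). The second variant, built from a dict of pd.Series, is an alternative not mentioned in the thread; it pads the same way but keeps a numeric dtype for columns that have no gaps.

```python
import pandas as pd

my_dict = {
    'Last updated': ['2021-05-18T15:24:19.000Z', '2021-05-18T15:24:19.000Z'],
    'Symbol': ['BTC', 'BNB', 'XRP', 'ADA', 'BUSD'],
    'Name': ['Bitcoin', 'Binance Coin', 'XRP', 'Cardano', 'Binance USD'],
    'Rank': [1, 3, 7, 4, 25],
}

# orient='index' makes one row per key and pads the shorter lists;
# .T flips the result so each key becomes a column
df = pd.DataFrame.from_dict(my_dict, orient='index').T

# Alternative: wrap each list in a Series; pandas aligns them on the index
alt = pd.DataFrame({key: pd.Series(values) for key, values in my_dict.items()})

print(df.shape)
print(df['Last updated'].isna().sum())
```

After the transpose every column has object dtype, so a follow-up such as df['Rank'].astype(int) may be needed before doing arithmetic.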
I am a beginner at programming, so I would appreciate your help and support.
Here is a DataFrame in which one column holds JSON-like data.
ID, Name, Information
1234, xxxx, '{'age': 25, 'gender': 'male'}'
2234, yyyy, '{'age': 34, 'gender': 'female'}'
3234, zzzz, '{'age': 55, 'gender': 'male'}'
I would like to convert this DataFrame as shown below.
ID, Name, age, gender
1234, xxxx, 25, male
2234, yyyy, 34, female
3234, zzzz, 55, male
I found that ast.literal_eval() can convert a str to a dict, but I have no idea how to write the code for this.
Could you please give an example of code that solves this issue?
Given test.csv
ID,Name,Information
1234,xxxx,"{'age': 25, 'gender': 'male'}"
2234,yyyy,"{'age': 34, 'gender': 'female'}"
3234,zzzz,"{'age': 55, 'gender': 'male'}"
Read the file in with pd.read_csv and use the converters parameter with ast.literal_eval, which will convert the data in the Information column from a str type to dict type.
Use pd.json_normalize to unpack the dict with keys as column headers and values in the rows
.join the normalized columns with df
.drop the Information column
import pandas as pd
from ast import literal_eval
df = pd.read_csv('test.csv', converters={'Information': literal_eval})
df = df.join(pd.json_normalize(df.Information))
df.drop(columns=['Information'], inplace=True)
# display(df)
ID Name age gender
0 1234 xxxx 25 male
1 2234 yyyy 34 female
2 3234 zzzz 55 male
If the data is not from a CSV file:
import pandas as pd
from ast import literal_eval
data = {'ID': [1234, 2234, 3234],
'Name': ['xxxx', 'yyyy', 'zzzz'],
'Information': ["{'age': 25, 'gender': 'male'}", "{'age': 34, 'gender': 'female'}", "{'age': 55, 'gender': 'male'}"]}
df = pd.DataFrame(data)
# apply literal_eval to Information
df.Information = df.Information.apply(literal_eval)
# normalize the Information column and join to df
df = df.join(pd.json_normalize(df.Information))
# drop the Information column
df.drop(columns=['Information'], inplace=True)
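If the third column is a JSON string, ' is not a valid quote character; JSON requires ", so the data needs to be fixed first.
If the third column is the string representation of a Python dict, you can use eval to convert it (ast.literal_eval is the safer choice, since it only evaluates literals).
Sample code that converts the third column to dicts, splits it into columns, and merges them into the original DataFrame: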
import json
import pandas as pd
data = [
[1234, 'xxxx', "{'age': 25, 'gender': 'male'}"],
[2234, 'yyyy', "{'age': 34, 'gender': 'female'}"],
[3234, 'zzzz', "{'age': 55, 'gender': 'male'}"],
]
df = pd.DataFrame(data)
df[2] = df[2].apply(lambda x: json.loads(x.replace("'", '"'))) # swap the quotes to get valid JSON, then convert to dict
merged = pd.concat([df[[0, 1]], df[2].apply(pd.Series)], axis=1)
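One caveat with swapping quotes is that it breaks as soon as a value itself contains an apostrophe. The sketch below uses a made-up row (the 'nickname' field is hypothetical, not from the question) to show the difference between the quote replacement and ast.literal_eval.

```python
import json
from ast import literal_eval

import pandas as pd

# Hypothetical row whose value contains an apostrophe
raw = "{'age': 25, 'nickname': \"O'Neil\"}"

# Swapping every ' for " corrupts the value and yields invalid JSON
try:
    json.loads(raw.replace("'", '"'))
    replace_ok = True
except json.JSONDecodeError:
    replace_ok = False

# literal_eval parses Python literal syntax, so the apostrophe survives
df = pd.DataFrame({'ID': [1234], 'Information': [raw]})
info = pd.json_normalize(df['Information'].apply(literal_eval).tolist())
merged = pd.concat([df[['ID']], info], axis=1)
print(replace_ok)
print(merged)
```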
I have a data frame with multiple columns. I want to select a subset of the columns and remove the duplicate values from them.
I do not want to remove rows; I only want to remove the duplicate values in particular columns.
My data frame looks like:
I want to remove duplicates from these columns: ["PLACEMENT # NAME", "IMPRESSIONS", "ENGAGEMENTS", "DPEENGAGEMENTS"], so my output will look like:
Here's some of your data
import pandas as pd
df = pd.DataFrame({'PLACEMENT # NAME': ['Blend of Vdx Display', 'Blend of Vdx Display',
'Blend of Vdx Display', 'Blend of Vdx Display'],
'PRODUCT': ['Display', 'Display', 'Mobile', 'Mobile'],
'VIDEONAME': ['Features', 'TVC', 'video1', 'video2'],
'COST_TYPE': ['CPE', 'CPE', 'CPE', 'CPE'],
'Views': [1255, 10479, 156, 20],
'50_pc_video': [388, 2402, 38, 10],
'75_pc_cideo_10': ['', '', '', ''],
'IMPRESSIONS': [778732,778732,778732,778732],
'ENGAGEMENTS': [13373, 13373, 13373, 13373],
'DPEENGAGEMENTS': [7142, 7142, 7142, 7142]})
You can accomplish what you want with .loc + .duplicated()
dup_cols = ['PLACEMENT # NAME', 'IMPRESSIONS', 'ENGAGEMENTS', 'DPEENGAGEMENTS']
df.loc[df.duplicated(dup_cols), dup_cols] = ''
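A trimmed-down sketch of the same idea with a check of the result. The cast to object is an addition that is not in the answer above: recent pandas versions warn (and may refuse) when a string is written into an integer column, so the affected columns are converted first.

```python
import pandas as pd

df = pd.DataFrame({
    'PLACEMENT # NAME': ['Blend of Vdx Display'] * 4,
    'VIDEONAME': ['Features', 'TVC', 'video1', 'video2'],
    'IMPRESSIONS': [778732] * 4,
    'ENGAGEMENTS': [13373] * 4,
})
dup_cols = ['PLACEMENT # NAME', 'IMPRESSIONS', 'ENGAGEMENTS']

# Cast to object first so the numeric columns can hold the empty string
df = df.astype({col: object for col in dup_cols})

# duplicated() flags every repeat after the first occurrence
mask = df.duplicated(dup_cols)
df.loc[mask, dup_cols] = ''
print(df)
```

Only the first row keeps its values in dup_cols; the other columns, such as VIDEONAME, are left untouched.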