Let's say I have the following data of a match in a CSV file:
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4
I'm writing a Python program. Somewhere in my program I have the scores collected for a match stored in a list, say x = [1,0,4]. I have found where in the data these scores exist using pandas, and I can print "found" or "not found". However, I want my code to print out which name these scores correspond to. In this case the program should output "Charlie", since Charlie has all these values [1,0,4]. How can I do that?
I will have a large set of data, so I must be able to tell which name corresponds to the numbers I pass to the program.
Yes, here's how to compare entire rows in a dataframe:
df[(df == x).all(axis=1)].index # where x is the pd.Series we're comparing to
Also, it makes life easiest if you directly set name as the index column when you read in the CSV.
import pandas as pd
from io import StringIO
df = """\
name,match1,match2,match3
Alice,2,4,3
Bob,2,3,4
Charlie,1,0,4"""
df = pd.read_csv(StringIO(csv_data), index_col='name')
x = pd.Series({'match1': 1, 'match2': 0, 'match3': 4})
Now you can see that doing df == x, or equivalently df.eq(x), is not quite what you want on its own, because it compares element-wise and returns a whole DataFrame of True/False values. So you need to aggregate each row with .all(axis=1), which keeps only the rows where every comparison was True...
df.eq(x).all(axis=1)
df[ (df == x).all(axis=1) ]
# match1 match2 match3
# name
# Charlie 1 0 4
...and then finally since you only want the name of such rows:
df[ (df == x).all(axis=1) ].index
# Index(['Charlie'], dtype='object', name='name')
df[ (df == x).all(axis=1) ].index.tolist()
# ['Charlie']
which is what you wanted. (I only added the spaces inside the expression for clarity).
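As a small aside (my addition, not part of the original answer): the plain Python list from the question also works here without building a Series, since pandas aligns a 3-element list against the three columns by position:
x = [1, 0, 4]
df[(df == x).all(axis=1)].index.tolist()
# ['Charlie']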
You need to use DataFrame.loc which would work like this:
print(df.loc[(df.match1 == 1) & (df.match2 == 0) & (df.match3 == 4), 'name'])
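That one-liner assumes name is still a regular column (i.e. the CSV was read without index_col). As a rough, self-contained sketch using the list x from the question, with 'scores.csv' as a placeholder file name, it might look like this:
import pandas as pd

df = pd.read_csv('scores.csv')  # placeholder name for the CSV from the question
x = [1, 0, 4]
print(df.loc[(df.match1 == x[0]) & (df.match2 == x[1]) & (df.match3 == x[2]), 'name'].tolist())
# ['Charlie']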
Maybe try something like this:
import pandas as pd
import numpy as np
# Makes sample data
match1 = np.array([2,2,1])
match2 = np.array([4,4,0])
match3 = np.array([3,3,4])
name = np.array(['Alice','Bob','Charlie'])
df = pd.DataFrame({'name': name, 'match1': match1, 'match2': match2, 'match3': match3})
df
# example of the list you want to get the data from
x=[1,0,4]
#x=[2,4,3]
# should return the name Charlie as well as the index (based on the values in the list x)
df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] ==x[2])]
# Makes a new dataframe out of the above
mydf = pd.DataFrame(df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] ==x[2])])
# Loop that prints out the name based on the index of mydf
# Assuming there is more than one matching name, it will print all of them; if there is only one, it will print just that one
for i in range(0, len(mydf)):
    print(mydf['name'].iloc[i])
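As a side note (not part of the original answer), the intermediate DataFrame and the index-based loop can be skipped by iterating over the filtered Series directly:
matching_names = df['name'].loc[(df['match1'] == x[0]) & (df['match2'] == x[1]) & (df['match3'] == x[2])]
for matched_name in matching_names:
    print(matched_name)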
You can use this. Here data is your DataFrame; change the name to match your own DataFrame. Assuming the values in [1,0,4] are int type:
data = data[(data['match1'] == 1) & (data['match2'] == 0) & (data['match3'] == 4)].index
print(data[0])
If the columns are object (string) type, then use this:
data = data[(data['match1'] == "1") & (data['match2'] == "0") & (data['match3'] == "4")].index
print(data[0])
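One caveat worth adding: if nothing matches, data[0] raises an IndexError, so it is safer to check whether the filtered index is empty first, e.g.:
matches = data[(data['match1'] == 1) & (data['match2'] == 0) & (data['match3'] == 4)].index
if len(matches) > 0:
    print(matches[0])
else:
    print("not found")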
I am a beginner in programming and trying to learn to code, so please bear with my bad coding. I am using pandas to find a string from a column (the combinations column in the code below) in the data frame and print the entire row containing that string. Basically I need to find all the instances where the string occurs and print the entire row each time. Find my code below. I am not able to figure out how to find that particular instance of the column and print it.
import pandas as pd
data = pd.read_csv("signallervalues.csv",index_col=False)
data.head()
data['col1'] = data['col1'].astype(str)
data['col2'] = data['col2'].astype(str)
data['col3'] = data['col3'].astype(str)
data['col4'] = data['col4'].astype(str)
data['col5']= data['col5'].astype(str)
data.head()
combinations = data['col1'] + data['col2'] + data['col3'] + data['col4'] + data['col5']
data['combinations']= combinations
print(data.head())
list_of_combinations = data['combinations'].to_list()
print(list_of_combinations)
for i in list_of_combinations:
    if data['combinations'].str.contains(i).any():
        print(i + ' data occurs in row')
        # I need to print the row containing the string here
    else:
        print(i + ' is occurring only once')
my data frame looks like this
import pandas as pd
data=pd.DataFrame()
# recreating your data (more or less)
data['signaller']= pd.Series(['ciao', 'ciao', 'ciao'])
data['col6']= pd.Series(['-1-11-11', '11', '-1-11-11'])
list_of_combinations=['11', '-1-11-11']
data.reset_index(inplace=True)
# group by the values of column 6 and counting how many times they occur
g = data.groupby('col6')['index']
count = pd.DataFrame(g.count())
count = count.rename(columns={'index': 'occurences'})
count.reset_index(inplace=True)
# create a df that keeps only the rows in the list 'list_of_combinations'
count[count['col6'].isin(list_of_combinations)]
My result
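If you also want to print the full rows that contain one of the repeated combinations (which is what the original question asks for), a sketch building on the count frame above could be:
# values of col6 that occur more than once
repeated = count.loc[count['occurences'] > 1, 'col6']
# print every original row whose col6 value is one of those
print(data[data['col6'].isin(repeated)])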
I have a dataframe like this.
import pandas as pd
#create dataframe
df= pd.DataFrame({"Date":range(0,22),
"Country":["USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA",],
"Number":[0,0,0,0,0,1,1,3,5,6,4,6,7,8,7,10,25,50,75,60,45,100]
"Number is Corrected":[0,0,0,0,0,1,1,3,5,6,6,6,7,7,7,10,25,50,50,60,60,100]})
But this dataframe has a problem. Some numbers are wrong.
The previous number always has to be smaller than or equal to the next number (see 6,4,6,7,8,7...50,75,60,45,100).
I don't use df.sort because it's not about sorting, it's about correction.
Edit: I added the corrected numbers in the "Number is Corrected" column.
Guessing from your 'Number is Corrected' column, you could probably use this:
import pandas as pd
#create dataframe
df= pd.DataFrame({"Date":range(0,22),
"Country":["USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA","USA",],
"Number":[0,0,0,0,0,1,1,3,5,6,4,6,7,8,7,10,25,50,75,60,45,100]})
# "Number is Corrected":[0,0,0,0,0,1,1,3,5,6,6,6,7,7,7,10,25,50,50,60,60,100]})
def correction():
    df['Number is Corrected'] = df['Number']
    cache = 0
    for num in range(len(df)):
        if df.loc[num, 'Number is Corrected'] < cache:
            df.loc[num, 'Number is Corrected'] = cache
        else:
            cache = df.loc[num, 'Number is Corrected']
    print(df)

if __name__ == "__main__":
    correction()
But there is some inconsistency, as in your conversation with jezrael. You may possibly need to update the logic of the code once it becomes clearer what output you want. Good luck.
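For what it's worth, the loop above is effectively a running maximum, which pandas can express in one line with cummax. This reproduces the loop's output, not necessarily the hand-corrected column from the question:
df['Number is Corrected'] = df['Number'].cummax()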
I am new to Python so please bear with me.
I am trying to convert what I think may be a nested dictionary into a csv that I can export. Below is my code:
import pandas as pd
import os
from fbprophet import Prophet
# Read in File
df1 = pd.read_csv('File_Path.csv')
#Create Loop to Forecast Multiple SKUs
def get_prediction(df):
    prediction = {}
    df1 = df.rename(columns={'Date': 'ds', 'qty_ordered': 'y', 'item_no': 'item'})
    list_items = df1.item.unique()
    for item in list_items:
        item_df = df1.loc[df1['item'] == item]
        # set the uncertainty interval to 95% (the Prophet default is 80%)
        my_model = Prophet(yearly_seasonality=True, seasonality_prior_scale=1.0)
        my_model.fit(item_df)
        future_dates = my_model.make_future_dataframe(periods=12, freq='M')
        forecast = my_model.predict(future_dates)
        prediction[item] = forecast
    return prediction
# Save predictions to dictionary
df2 = get_prediction(df1)
# Convert dictionary
df3 = pd.DataFrame.from_dict(df3, index='columns')
So the last part of the code is where I am struggling. I need to convert the df2 dictionary to a dataframe (df3) so I can export it to a csv. But it looks as if it is a nested dictionary? Not sure if I need to update my function or not.
This is what a snippet of the dictionary looks like
I need to export it so it will look like this
Any help would be greatly appreciated!
The following code should help flatten df2 (a dictionary of dataframes, if I understand correctly).
def flatten(dict_of_df):
    # insert column 'item'
    for key, value in dict_of_df.items():
        value['item'] = key
    # return vertically concatenated dataframe with all the items
    return pd.concat(dict_of_df.values())
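Used on the prediction dictionary, that might look roughly like this (the output file name is just a placeholder):
df3 = flatten(df2)
df3.to_csv('forecasts.csv', index=False)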
I have a pandas file with 3 different columns that I turn into a dictionary with to_dict, the result is a list of dictionaries:
df = [
{'HEADER1': 'col1-row1', 'HEADER2': 'col2-row1', 'HEADER3': 'col3-row1'},
{'HEADER1': 'col1-row2', 'HEADER2': 'col2-row2', 'HEADER3': 'col3-row2'}
]
Now my problem is that I need the value of 'col2-rowX' and 'col3-rowX' to build a URL and use requests and bs4 to scrape the websites.
I need my result to be something like the following:
requests.get("'http://www.website.com/' + row1-col2 + 'another-string' + row1-col3 + 'another-string'")
And I need to do that for every dictionary in the list.
I have tried iterating over the dictionaries using for-loops.
something like:
import pandas as pd
import os
os.chdir('C://Users/myuser/Desktop')
df = pd.DataFrame.from_csv('C://Users/myuser/Downloads/export.csv')
#Remove 'Code' column
df = df.drop('Code', axis=1)
#Remove 'Code2' as index
df = df.reset_index()
#Rename columns for easier manipulation
df.columns = ['CB', 'FC', 'PO']
#Convert to dictionary for easy URL iteration and creation
df = df.to_dict('records')
for row in df:
    for key in row:
        print(key)
You only ever iterate twice, and you short-circuit out of the nested for loop every time it is executed by having a return statement there. Looking up the necessary information from the dictionaries will allow you to build up your URLs. One possible example:
def get_urls(l_d):
    l = []
    for d in l_d:
        l.append('http://www.website.com/' + d['HEADER2'] + 'another-string' + d['HEADER3'] + 'another-string')
    return l
df = [{'HEADER1': 'col1-row1', 'HEADER2': 'col2-row1', 'HEADER3': 'col3-row1'},{'HEADER1': 'col1-row2', 'HEADER2': 'col2-row2', 'HEADER3': 'col3-row2'}]
print(get_urls(df))
>>> ['http://www.website.com/col2-row1another-stringcol3-row1another-string', 'http://www.website.com/col2-row2another-stringcol3-row2another-string']
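From there, actually fetching and parsing each page could look roughly like this sketch (the URL pattern is still the placeholder one from the question):
import requests
from bs4 import BeautifulSoup

for url in get_urls(df):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # scrape whatever is needed from soup here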