Unable to get value against a key in the dictionary - Python

I have two dataframes A & B that have one column in common, of type string (str), i.e. country_name.
I converted dataframe B into a dictionary object. B has only two columns, both of string type, so both the key and the value columns are strings.
My task is to find the value based on the key from dataframe B. The key comes from dataframe A, since the two frames share that common column.
Here is my code...
I have tried multiple options, but nothing works for me, for example:
name = count_list.get(key, "")
name = count_list['Value'][key]
import pandas as pd

A = pd.DataFrame(columns=['index', 'name', 'score'])
B = pd.DataFrame(columns=['name', 'Value'])
B = B.to_dict()

score1 = []
country_list = []
for index, row in A.iterrows():
    try:
        key = str(row['name']).lower()
        name = B.get(key, "")
        score1.append(row['score'])
        country_list.append(name)
    except TypeError:
        print(row)
    except IndexError:
        print(row)
I want to get the exact value for a given key from dataframe B. Both the key and Value columns are of string type.
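One likely cause of the empty lookups: DataFrame.to_dict() returns a nested dict keyed by column name ({'name': {...}, 'Value': {...}}), not a flat name-to-Value mapping, so B.get(key, "") never finds the key. A minimal sketch of building a flat lookup instead (the column names 'name' and 'Value' come from the question; the sample data and the lowercasing of both sides are assumptions):
import pandas as pd

# assumed sample data for illustration
A = pd.DataFrame({'name': ['India', 'France'], 'score': [10, 20]})
B = pd.DataFrame({'name': ['india', 'france'], 'Value': ['IN', 'FR']})

# flat {name -> Value} mapping, keys lowercased so they match A's names
lookup = dict(zip(B['name'].str.lower(), B['Value']))

country_list = [lookup.get(str(n).lower(), "") for n in A['name']]
# or, vectorised, without an explicit loop:
A['country'] = A['name'].str.lower().map(lookup).fillna("")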

Related

How to check pandas column names and then append to row data efficiently?

I have a dataframe with several columns, some of which have names that match the keys in a dictionary. I want to append the value of the items in the dictionary to the non-null values of the column whose name matches the key in said dictionary. Hopefully that isn't too confusing.
example:
import pandas as pd

realms = {}
realms['email'] = '<email>'
realms['android'] = '<androidID>'

df = pd.DataFrame()
df['email'] = ['foo#gmail.com', '', 'foo#yahoo.com']
df['android'] = [1234567, None, 55533321]  # None stands in for the missing value
How could I append '<email>' to 'foo#gmail.com' and 'foo#yahoo.com'
without also appending to the empty string or null value?
I'm trying to do this without using iteritems(), as I have about 200,000 records to apply this logic to.
The expected output would be like 'foo#gmail.com<email>', , 'foo#yahoo.com<email>'.
for column in df.columns:
    df[column] = df[column].astype(str) + realms[column]

>>> df
                  email               android
0  foo#gmail.com<email>    1234567<androidID>
1  foo#yahoo.com<email>   55533321<androidID>
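The loop above appends the suffix to every cell, including the empty ones; since the question asks to leave empty strings and nulls untouched, a hedged variation under the same setup (df and realms as defined in the question) could mask those cells first:
for column in df.columns:
    # only touch cells that are neither null nor empty strings
    mask = df[column].notna() & (df[column].astype(str) != '')
    df.loc[mask, column] = df.loc[mask, column].astype(str) + realms[column]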

Split a column of a dataframe into two separate columns

I'd like to split a column of a dataframe into two separate columns. Here is what my dataframe looks like (only the first 3 rows):
I'd like to split the column referenced_tweets into two columns: type and id in a way that for example, for the first row, the value of the type column would be replied_to and the value of id would be 1253050942716551168.
Here is what I've tried:
df[['type', 'id']] = df['referenced_tweets'].str.split(',', n=1, expand=True)
but I get the error:
ValueError: Columns must be the same length as key
(I think I get this error because the type in the referenced_tweets column is NOT always replied_to; e.g., it can be retweeted, and therefore the lengths would be different.)
Why not get the values from the dict and add them as two new columns?
def unpack_column(df_series, key):
    """Function that unpacks the key's value from your column and skips NaN values."""
    return [None if pd.isna(value) else value[0][key] for value in df_series]

df['type'] = unpack_column(df['referenced_tweets'], 'type')
df['id'] = unpack_column(df['referenced_tweets'], 'id')
or in a one-liner:
df[['type', 'id']] = df['referenced_tweets'].apply(lambda x: (x[0]['type'], x[0]['id'])).tolist()
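Note that the one-liner assumes every row actually holds a referenced_tweets list; if some rows are NaN (which the unpack_column helper above anticipates), a hedged sketch that skips them could look like this (the column name and the replied_to value come from the question; the toy data and NaN handling are assumptions):
import pandas as pd

# assumed toy data: each cell is a list with one dict, or missing
df = pd.DataFrame({'referenced_tweets': [
    [{'type': 'replied_to', 'id': '1253050942716551168'}],
    None,
]})

pairs = df['referenced_tweets'].apply(
    lambda x: (None, None) if not isinstance(x, list) else (x[0]['type'], x[0]['id'])
)
df[['type', 'id']] = pairs.tolist()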

Store Value From df to Variable

I am trying to extract a value out of a dataframe and put it into a variable. Then later I will record that value into an Excel workbook.
First I run a SQL query and store into a df:
df = pd.read_sql(strSQL, conn)
I am looping through another list of items and looking them up in the df. They are connected by MMString in the df and MMConcat from the list of items I'm looping through.
dftemp = df.loc[df['MMString'] == MMConcat]
Category = dftemp['CategoryName'].item()
I get the following error at the last line of code above. ValueError: can only convert an array of size 1 to a Python scalar
In the debug console, when I run that last line of code without storing it to a variable, I get what looks like a string value, for example 'Pickup Truck'.
How can I simply store the value that I'm looking up in the df to a variable?
Index by row and column with loc to return a series, then extract the first value via iat:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
Alternatively, get the first value from the NumPy array representation:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].values[0]
The docs aren't helpful, but pd.Series.item just calls np.ndarray.item and only works for a series with one value:
pd.Series([1]).item() # 1
pd.Series([1, 2]).item() # ValueError: can only convert an array of size 1
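The size-1 restriction of item() is also why the original line fails when the filter matches zero or more than one row; a small guard before grabbing the value, assuming the same df, MMString column and MMConcat variable as in the question:
matches = df.loc[df['MMString'] == MMConcat, 'CategoryName']
if matches.empty:
    Category = None            # no row matched MMConcat
else:
    Category = matches.iat[0]  # first (or only) match, e.g. 'Pickup Truck'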

Selecting Pandas dataframe column

I am trying to use a pandas data-frame as a parameter table which is loaded at the beginning of my application run.
The structure of the csv that is being loaded into the data-frame is as below:
param_name,param_value
source_dir,C:\Users\atiwari\Desktop\EDIFACT\source_dir
So the column names would be param_name and param_values.
How do I go about selecting the value from param_value where param_name == 'source_dir'?
I tried the below, but it returns a result with an index, not a string value:
param_df.loc[param_df['param_name']=='source_dir']['param_value']
It returns a Series:
s = param_df.loc[param_df['param_name']=='source_dir', 'param_value']
But if need DataFrame:
df = param_df.loc[param_df['param_name']=='source_dir', ['param_value']]
For a scalar you need to extract the value from the Series: select the first value with .values[0], or use .iat[0].
Series.item() needs a Series holding exactly one value, otherwise you get an error (for example with an empty Series):
val = s.values[0]
val = s.iat[0]
val = s.item()
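Putting it together with the parameter table from the question, a minimal runnable sketch (the param_name/param_value columns and the source_dir row come from the question; inlining the CSV via StringIO is just for illustration):
from io import StringIO
import pandas as pd

txt = r"""param_name,param_value
source_dir,C:\Users\atiwari\Desktop\EDIFACT\source_dir"""

param_df = pd.read_csv(StringIO(txt))
s = param_df.loc[param_df['param_name'] == 'source_dir', 'param_value']
print(s.iat[0])   # C:\Users\atiwari\Desktop\EDIFACT\source_dir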

selecting a specific value from a data frame

I am trying to select a value from a dataframe, but the problem is that the output comes with the data type and column name.
Here is my data frame, which I am reading from a csv file:
Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2
And here is my testing code:
import pandas as pd

dataFrame = pd.read_csv("test.csv")
value = dataFrame.loc[dataFrame['Name'] == 'rasberry']['Code']
print(value)
strvalue = str(value)
if strvalue == "1":
    print("got it")
The expected output of value would be 1, but instead it is
2    1
Name: Code, dtype: int64
and that's why the if condition is not working. How can I get the specific value?
I am using pandas
The value you get is a Series object. You can use .iloc to extract the value from it:
value.iloc[0]
# 1
Or you can use .values to extract the underlying NumPy array and then index into it to extract the value:
value.values[0]
# 1
Break It Down
dataFrame['Name'] returns a pd.Series
dataFrame['Name'] == 'rasberry' returns a pd.Series with dtype bool
dataFrame.loc[dataFrame['Name'] == 'rasberry'] uses the boolean pd.Series to slice dataFrame returning a pd.DataFrame that is a subset of dataFrame
dataFrame.loc[dataFrame['Name'] == 'rasberry']['Code'] is a pd.Series that is the column named 'Code' in the sliced dataframe from the previous step.
If you expect the elements in the 'Name' column to be unique, then this will be a one row pd.Series.
You want the element inside that Series; at this point it comes down to the difference between asking for 'value' and ['value'].
Setup
from io import StringIO
import pandas as pd
txt = """Name,Code
blackberry,1
wineberry,2
rasberry,1
blueberry,1
mulberry,2"""
Solution(s)
use iloc to grab first value
dataFrame=pd.read_csv(StringIO(txt))
value = dataFrame.query('Name == "rasberry"').Code.iloc[0]
print(value)
use iat to grab first value
dataFrame=pd.read_csv(StringIO(txt))
value = dataFrame.query('Name == "rasberry"').Code.iat[0]
print(value)
specify index column when reading in csv and use loc
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.loc['rasberry', 'Code']
print(value)
specify index column when reading in csv and use at
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.at['rasberry', 'Code']
print(value)
specify index column when reading in csv and use get_value (note that DataFrame.get_value was deprecated and removed in pandas 1.0; on current versions prefer .at as above)
dataFrame=pd.read_csv(StringIO(txt), index_col='Name')
value = dataFrame.get_value('rasberry', 'Code')
print(value)
specify the index column when reading the csv and squeeze into a Series if only one non-index column exists (the squeeze parameter of read_csv was later deprecated and removed; see the sketch after this list for a current equivalent)
series=pd.read_csv(StringIO(txt), index_col='Name', squeeze=True)
value = series.rasberry
print(value)
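On current pandas releases, where the squeeze parameter of read_csv and DataFrame.get_value are gone, a hedged equivalent of the last two approaches (same txt as in the Setup) is:
series = pd.read_csv(StringIO(txt), index_col='Name').squeeze('columns')
value = series['rasberry']   # plain label lookup on the squeezed Series
# and .at covers the get_value case:
value = pd.read_csv(StringIO(txt), index_col='Name').at['rasberry', 'Code']
print(value)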
