pandas how to replace only empty list data? - python

My dataframe looks like this:
variations_list1 variations_list2
["yellow","ornage"] []
["xl","xxl"] []
["Burger","pizza"] ["$25","$30"]
expected dataframe:
variations_list1 variations_list2
["yellow","ornage"] ["yellow","ornage"] #filling emty list with current row data
["xl","xxl"] ["xl","xxl"]
["Burger","pizza"] ["$25","$30"]

You can just do
df.loc[~df['variations_list2'].astype(bool),'variations_list2'] = df['variations_list1']
If, as in your earlier question, the "lists" are actually strings rather than list objects, compare against the string '[]' instead:
df.loc[df['variations_list2']=='[]','variations_list2'] = df['variations_list1']
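A minimal, self-contained sketch of the first approach, with toy values standing in for the question's data (real list objects in the column, an empty list is falsy so astype(bool) marks it):

```python
import pandas as pd

df = pd.DataFrame({
    "variations_list1": [["yellow", "orange"], ["xl", "xxl"], ["Burger", "pizza"]],
    "variations_list2": [[], [], ["$25", "$30"]],
})

# Rows where variations_list2 is an empty list get variations_list1's value
df.loc[~df["variations_list2"].astype(bool), "variations_list2"] = df["variations_list1"]

print(df["variations_list2"].tolist())
```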


pandas: while loop to simultaneously advance through multiple lists and call functions

I want my code to:
read data from a CSV and make a dataframe: "source_df"
see if the dataframe contains any columns specified in a list:
"possible_columns"
call a unique function to replace the values in each column whose header is found in the "possible_columns" list, then insert the modified values in a new dataframe: "destination_df"
Here it is:
import pandas as pd
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)
#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']
#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
'''use the counter to call a unique function from the function list to replace the values in each column whose header is found in the "possible_columns" list, insert the modified values in "destination_df", then advance the counter'''
counter = 0
while counter < len(possible_columns):
    if possible_columns[counter] in columns:
        destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
        fix_functions_list[counter]
    counter = counter + 1
#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function that is stored in a list as a string.
fix_functions_list[counter]
This will not actually run the function; it just accesses the string value.
I would find another way to run these functions, for example a dict of function objects:
def yes_no_fix():
    destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
    destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
fix_functions_list = {0:yes_no_fix,1:true_false_fix}
and change the function call to the following:
fix_functions_list[counter]()
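The difference between storing a function's name as a string and storing the function object itself can be seen in a small runnable sketch. Note this toy version replaces values in place rather than using the question's 'yes/no fixed' columns, which is an assumption for the demo:

```python
import pandas as pd

destination_df = pd.DataFrame({"yes/no": ["Yes", "No"], "true/false": ["True", "False"]})

def yes_no_fix():
    destination_df["yes/no"] = destination_df["yes/no"].replace("No", "0").replace("Yes", "1")

def true_false_fix():
    destination_df["true/false"] = destination_df["true/false"].replace("False", "1").replace("True", "0")

# Store the function objects themselves, not strings of their names
fix_functions_list = {0: yes_no_fix, 1: true_false_fix}

for counter in range(2):
    fix_functions_list[counter]()  # the trailing () actually calls the function

print(destination_df)
```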
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
possible_columns = ['yes/no','true/false']
mapping_dict = {'yes/no': {"No": "0", "Yes": "1"}, 'true/false': {'False': '1', 'True': '0'}}
old_columns = [column for column in source_df.columns if column not in possible_columns]
existed_columns = [column for column in source_df.columns if column in possible_columns]
new_df = source_df[existed_columns]
for column in new_df.columns:
    new_df[column] = new_df[column].map(mapping_dict[column])
new_df[old_columns] = source_df[old_columns]

How to create a dataframe in a for loop?

I want to create a dataframe that consists of values obtained inside the for loop.
columns = ['BIN','Date_of_registration', 'Tax','TaxName','KBK',
'KBKName','Paynum','Paytype', 'EntryType','Writeoffdate', 'Summa']
df = pd.DataFrame(columns=columns)
I have this for loop:
for elements in tree.findall('{http://xmlns.kztc-cits/sign}payment'):
    print("hello")
    tax = elements.find('{http://xmlns.kztc-cits/sign}TaxOrgCode').text
    tax_name_ru = elements.find('{http://xmlns.kztc-cits/sign}NameTaxRu').text
    kbk = elements.find('{http://xmlns.kztc-cits/sign}KBK').text
    kbk_name_ru = elements.find('{http://xmlns.kztc-cits/sign}KBKNameRu').text
    paynum = elements.find('{http://xmlns.kztc-cits/sign}PayNum').text
    paytype = elements.find('{http://xmlns.kztc-cits/sign}PayType').text
    entry_type = elements.find('{http://xmlns.kztc-cits/sign}EntryType').text
    writeoffdate = elements.find('{http://xmlns.kztc-cits/sign}WriteOffDate').text
    summa = elements.find('{http://xmlns.kztc-cits/sign}Summa').text
    print(tax, tax_name_ru, kbk, kbk_name_ru, paynum, paytype, entry_type, writeoffdate, summa)
How can I append acquired values to the initially created(outside for loop) dataframe?
A simple way if you only need the dataframe after the loop is completed is to append the data to a list of lists and then convert to a dataframe. Caveat: Responsibility is on you to make sure the list ordering matches the columns, so if you change your columns in the future you have to reposition the list.
list_of_rows = []
for elements in tree.findall('{http://xmlns.kztc-cits/sign}payment'):
    # ... same .find(...).text extractions as in the question ...
    list_of_rows.append([tax, tax_name_ru, kbk, kbk_name_ru, paynum, paytype, entry_type, writeoffdate, summa])
df = pd.DataFrame(columns=columns, data=list_of_rows)
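With toy records standing in for the XML lookups (column names and values here are illustrative, not from the question's data), the pattern looks like this:

```python
import pandas as pd

columns = ["Tax", "Paynum", "Summa"]

# Stand-in records; in the real code each value comes from elements.find(...).text
payments = [("101", "1", "2500"), ("102", "2", "3000")]

list_of_rows = []
for tax, paynum, summa in payments:
    list_of_rows.append([tax, paynum, summa])  # order must match `columns`

df = pd.DataFrame(columns=columns, data=list_of_rows)
print(df)
```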

How to grab all items from all DataFrames in Pandas

I have an issue where every row in my DataFrame contains more than one item. I would like to iterate through the whole DataFrame and append each row's items to a new list, but I'm unsure how to do this.
IPs
0 [172.16.254.1, 192.168.1.15, 255.255.255.0]
1 [192.0.2.1, 255.255.255.0, 192.0.2.1]
2 [172.16.254.1]
3 [0.0.0.0]
This is my current output, and I would like to take each item per row in the DataFrame and append it to a list.
curled_ips_list = []
ip_addresses_found = []
ip_address_format = (r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')
with open(website_file_path, 'r', encoding='utf-8-sig') as curled_ips_file:
    found_ips_reader = pd.read_csv(curled_ips_file, names=['IPs'], delimiter='\n', quoting=csv.QUOTE_NONE, engine='c')
    found_ips_reader = pd.Series(found_ips_reader['IPs'])
    curled_ips_list = found_ips_reader[found_ips_reader.str.contains(ip_address_format)]
    curled_ips_list = curled_ips_list.str.findall(ip_address_format)
    curled_ips_list = pd.DataFrame(curled_ips_list)
I'm not receiving any error messages as of yet, but I'm unsure how to go about it.
Since you have not mentioned the output you need, I am assuming it is the following.
#Load your IPs in a dataframe as the one that you have mentioned above.
iplist = df['IPs']
[ip for sublist in iplist for ip in sublist]
['172.16.254.1',
'192.168.1.15',
'255.255.255.0',
'192.0.2.1',
'255.255.255.0',
'192.0.2.1',
'172.16.254.1',
'0.0.0.0']
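On pandas 0.25+, Series.explode gives the same flat list without a nested comprehension. A sketch, assuming the IPs column holds real list objects:

```python
import pandas as pd

df = pd.DataFrame({"IPs": [["172.16.254.1", "192.168.1.15"], ["192.0.2.1"], ["0.0.0.0"]]})

# explode turns each list element into its own row, preserving order
flat = df["IPs"].explode().tolist()
print(flat)
```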

dataframe from dict resulting in empty dataframe

Hi, I wrote some code that builds a dictionary:
def makedata(filename):
    with open(filename, "r") as file:
        for x in features:
            previous = []
            count = 0
            for line in file:
                var_name = x
                regexp = re.compile(var_name + r'.*?([0-9.-]+)')
                match = regexp.search(line)
                if match and (match.group(1)) != previous:
                    previous = match.group(1)
                    count += 1
                    if count > wlength:
                        count = 1
                    target = str(str(count) + x)
                    dict.setdefault(target, []).append(match.group(1))
            file.seek(0)
    df = pd.DataFrame.from_dict(dict)
The dictionary looks good, but when I try to convert it to a dataframe, the result is empty. I can't figure it out.
dict:
{'1meanSignalLenght': ['0.5305184', '0.48961428', '0.47203177', '0.5177274'], '1amplCor': ['0.8780955002105448', '0.8634431017504487', '0.9381169983046714', '0.9407036427333355'], '1metr10.angle1': ['0.6439386643584522', '0.6555194964997434', '0.9512436169922103', '0.23789348400794422'], '1syncVar': ['0.1344131181025432', '0.08194580887223515', '0.15922251165913678', '0.28795644612520327'], '1linVelMagn': ['0.07062673289287498', '0.08792496681784517', '0.12603999663935528', '0.14791253129369603'], '1metr6.velSum': ['0.17850601560734558', '0.15855169971072014', '0.21396496345720045', '0.2739525279330513']}
df:
Empty DataFrame
Columns: []
Index: []
{}
I think part of your issue is that you are using the built-in name 'dict' as if it were a variable.
Make a dictionary in your function and call it something other than 'dict'. Have your function return that dictionary, then build the dataframe from the return value. Right now, you are creating a dataframe from an empty dictionary object.
df = pd.DataFrame(dict)
This should make a dataframe from the dictionary.
You can either pass a list of dicts simply using pd.DataFrame(list_of_dicts) (use pd.DataFrame([dict]) if your variable is not a list) or a dict of lists using pd.DataFrame.from_dict(dict). In this last case, dict should be something like dict = {"a": [1, 2, 3], "b": ["a", "b", "c"], "c": ...}.
see: Pandas Dataframe from dict with empty list value
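Both constructors side by side, with a variable named `d` instead of the shadowed built-in (toy values, purely illustrative):

```python
import pandas as pd

# dict of lists: keys become columns, list entries become rows
d = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
df1 = pd.DataFrame.from_dict(d)

# list of dicts: each dict becomes a row, keys become columns
rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
df2 = pd.DataFrame(rows)

print(df1.shape, df2.shape)
```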

Why do square brackets appear in a Pandas DataFrame only when I append to a list?

I have a loop that extracts some info from several files, each containing a list of actions taken in a game. If I run the following code, I get a dataframe with the correct output (the GameId without brackets).
GameId = ['23456']
dfWant = pd.DataFrame({'GameId': GameId})
print(dfWant)
But if I loop through the files and try to add each item to a list, the resulting dataframe contains square brackets. Why does this output differ? And how can I make the code below output the GameId without square brackets in the DataFrame?
GameId = ['12345']
GameIdList = []
GameIdList.append(GameId)
dfHave = pd.DataFrame({'GameId': GameIdList})
print(dfHave)
This works great on Python 3.7. The following solved my problem:
GameId = ['12345']
GameIdList = []
GameIdList.extend(GameId)
a = {'GameId': GameIdList}
dfHave = pd.DataFrame.from_dict(a, orient = 'index')
dfHave = dfHave.transpose()
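The brackets come from append nesting the whole list as a single element, whereas extend adds its items individually. A minimal comparison:

```python
import pandas as pd

GameId = ["12345"]

appended, extended = [], []
appended.append(GameId)  # [['12345']] -> a column of lists, printed with brackets
extended.extend(GameId)  # ['12345']   -> a column of plain strings

dfHave = pd.DataFrame({"GameId": extended})
print(dfHave)
```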
