I have a list of string dataframes that I want to turn into a list of dataframes.
temp_df = {}
temp_df['mdf1'] = df1[df1['b']<=0.4]
temp_df['mdf2'] = df1[df1['a']<=0.58]
def get_list(temp_df):
return [*temp_df]
temp_list = get_list(temp_df)
temp_list
Doing this I get the stringed list:
output: ['mdf1', 'mdf2']
However, I also want a list of the two dataframes itself.
For the desirable output of:
output: [mdf1, mdf2]
I've tried this but it doesn't give me what I want:
temp_df.keys()
output: dict_keys(['mdf1', 'mdf2'])
Check with locals then
variables = locals()
variables["mdf1"] = df1[df1['b']<=0.4]
variables["mdf2"] = df1[df1['a']<=0.58]
Related
my dataframe look like this:
variations_list1 variations_list2
["yellow","ornage"] []
["xl","xxl"] []
["Burger","pizza"] ["$25","$30"]
expected dataframe:
variations_list1 variations_list2
["yellow","ornage"] ["yellow","ornage"] #filling emty list with current row data
["xl","xxl"] ["xl","xxl"]
["Burger","pizza"] ["$25","$30"]
You can just do
df.loc[~df['variations_list2'].astype(bool),'variations_list2'] = df['variations_list1']
You have the same issue like before, list is not list
df.loc[df['variations_list2']=='[]','variations_list2'] = df['variations_list1']
I want my code to:
read data from a CSV and make a dataframe: "source_df"
see if the dataframe contains any columns specified in a list:
"possible_columns"
call a unique function to replace the values in each column whose header is found in the "possible_columns" the list, then insert the modified values in a new dataframe: "destination_df"
Here it is:
import pandas as pd
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
#creates destination_df
blanklist = []
destination_df = pd.DataFrame(blanklist)
#create the column header lists for comparison in the while loop
columns = source_df.head(0)
possible_columns = ['yes/no','true/false']
#establish the functions list and define the functions to replace column values
fix_functions_list = ['yes_no_fix()','true_false_fix()']
def yes_no_fix():
destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
'''use the counter to call a unique function from the function list to replace the values in each column whose header is found in the "possible_columns" the list, insert the modified values in "destination_df, then advance the counter'''
counter = 0
while counter < len(possible_columns):
if possible_columns[counter] in columns:
destination_df.insert(counter, possible_columns[counter], source_df[possible_columns[counter]])
fix_functions_list[counter]
counter = counter + 1
#see if it works
print(destination_df.head(10))
When I print(destination_df), I see the unmodified column values from source_df. When I call the functions independently they work, which makes me think something is going wrong in my while loop.
Your issue is that you are trying to call a function that is stored in a list as a string.
fix_functions_list[cnt]
This will not actually run the function just access the string value.
I would try and find another way to run these functions.
def yes_no_fix():
destination_df['yes/no'] = destination_df['yes/no fixed'].replace("No","0").replace("Yes","1")
def true_false_fix():
destination_df['true/false'] = destination_df['true/false fixed'].replace('False', '1').replace('True', '0')
fix_functions_list = {0:yes_no_fix,1:true_false_fix}
and change the function calling to like below
fix_functions_list[counter]()
#creates source_df
file = "yes-no-true-false.csv"
data = pd.read_csv(file)
source_df = pd.DataFrame(data)
possible_columns = ['yes/no','true/false']
mapping_dict={'yes/no':{"No":"0","Yes":"1"} ,'true/false': {'False':'1','True': '0'}
old_columns=[if column not in possible_columns for column in source_df.columns]
existed_columns=[if column in possible_columns for column in source_df.columns]
new_df=source_df[existed_columns]
for column in new_df.columns:
new_df[column].map(mapping_dict[column])
new_df[old_columns]=source_df[old_columns]
I want to create a dataframe that consists of values obtained inside the for loop.
columns = ['BIN','Date_of_registration', 'Tax','TaxName','KBK',
'KBKName','Paynum','Paytype', 'EntryType','Writeoffdate', 'Summa']
df = pd.DataFrame(columns=columns)
I have this for loop:
for elements in tree.findall('{http://xmlns.kztc-cits/sign}payment'):
print("hello")
tax = elements.find('{http://xmlns.kztc-cits/sign}TaxOrgCode').text
tax_name_ru = elements.find('{http://xmlns.kztc-cits/sign}NameTaxRu').text
kbk = elements.find('{http://xmlns.kztc-cits/sign}KBK').text
kbk_name_ru = elements.find('{http://xmlns.kztc-cits/sign}KBKNameRu').text
paynum = elements.find('{http://xmlns.kztc-cits/sign}PayNum').text
paytype = elements.find('{http://xmlns.kztc-cits/sign}PayType').text
entry_type = elements.find('{http://xmlns.kztc-cits/sign}EntryType').text
writeoffdate = elements.find('{http://xmlns.kztc-cits/sign}WriteOffDate').text
summa = elements.find('{http://xmlns.kztc-cits/sign}Summa').text
print(tax, tax_name_ru, kbk, kbk_name_ru, paynum, paytype, entry_type, writeoffdate, summa)
How can I append acquired values to the initially created(outside for loop) dataframe?
A simple way if you only need the dataframe after the loop is completed is to append the data to a list of lists and then convert to a dataframe. Caveat: Responsibility is on you to make sure the list ordering matches the columns, so if you change your columns in the future you have to reposition the list.
list_of_rows = []
for elements in tree.findall('{http://xmlns.kztc-cits/sign}payment'):
list_of_rows.append([
tax, tax_name_ru, kbk, kbk_name_ru, paynum, paytype,entry_type, writeoffdate, summa])
df = pd.DataFrame(columns=columns, data=list_of_rows)
I have defined 10 different DataFrames A06_df, A07_df , etc, which picks up six different data point inputs in a daily time series for a number of years. To be able to work with them I need to do some formatting operations such as
A07_df=A07_df.fillna(0)
A07_df[A07_df < 0] = 0
A07_df.columns = col # col is defined
A07_df['oil']=A07_df['oil']*24
A07_df['water']=A07_df['water']*24
A07_df['gas']=A07_df['gas']*24
A07_df['water_inj']=0
A07_df['gas_inj']=0
A07_df=A07_df[['oil', 'water', 'gas','gaslift', 'water_inj', 'gas_inj', 'bhp', 'whp']]
etc for a few more formatting operations
Is there a nice way to have a for loop or something so I don’t have to write each operation for each dataframe A06_df, A07_df, A08.... etc?
As an example, I have tried
list=[A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
for i in list:
i=i.fillna(0)
But this does not do the trick.
Any help is appreciated
As i.fillna() returns a new object (an updated copy of your original dataframe), i=i.fillna(0) will update the content of ibut not of the list content A06_df, A07_df,....
I suggest you copy the updated content in a new list like this:
list_raw = [A06_df, A07_df, A08_df, A10_df, A11_df, A12_df, A13_df, A15_df, A18_df, A19_df]
list_updated = []
for i in list_raw:
i=i.fillna(0)
# More code here
list_updated.append(i)
To simplify your future processes I would recommend to use a dictionary of dataframes instead of a list of named variables.
dfs = {}
dfs['A0'] = ...
dfs['A1'] = ...
dfs_updated = {}
for k,i in dfs.items():
i=i.fillna(0)
# More code here
dfs_updated[k] = i
Hi I wrote some code that builds a default dictionary
def makedata(filename):
with open(filename, "r") as file:
for x in features:
previous = []
count = 0
for line in file:
var_name = x
regexp = re.compile(var_name + r'.*?([0-9.-]+)')
match = regexp.search(line)
if match and (match.group(1)) != previous:
previous = match.group(1)
count += 1
if count > wlength:
count = 1
target = str(str(count) + x)
dict.setdefault(target, []).append(match.group(1))
file.seek(0)
df = pd.DataFrame.from_dict(dict)
The dictionary looks good but when I try to convert to dataframe it is empty. I can't figure it out
dict:
{'1meanSignalLenght': ['0.5305184', '0.48961428', '0.47203177', '0.5177274'], '1amplCor': ['0.8780955002105448', '0.8634431017504487', '0.9381169983046714', '0.9407036427333355'], '1metr10.angle1': ['0.6439386643584522', '0.6555194964997434', '0.9512436169922103', '0.23789348400794422'], '1syncVar': ['0.1344131181025432', '0.08194580887223515', '0.15922251165913678', '0.28795644612520327'], '1linVelMagn': ['0.07062673289287498', '0.08792496681784517', '0.12603999663935528', '0.14791253129369603'], '1metr6.velSum': ['0.17850601560734558', '0.15855169971072014', '0.21396496345720045', '0.2739525279330513']}
df:
Empty DataFrame
Columns: []
Index: []
{}
I think part of your issue is that you are using the keyword 'dict', assuming it is a variable
make a dictionary in your function, call it something other than 'dict'. Have your function return that dictionary. Then when you make a dataframe use that return value. Right now, you are creating a data frame from an empty dictionary object.
df = pd.DataFrame(dict)
This should make a dataframe from the dictionary.
You can either pass a list of dicts simply using pd.DataFrame(list_of_dicts) (use pd.DataFrame([dict]) if your variable is not a list) or a dict of list using pd.DataFrame.from_dict(dict). In this last case dict should be something like dict = {a:[1,2,3], "b": ["a", "b", "c"], "c":...}.
see: Pandas Dataframe from dict with empty list value