Creating a sequence of dataframes - python

A quick question: I want to know if there is a way to create a sequence of data frames by setting a variable inside the name of a data frame. For example:
df_0 = pd.read_csv(file1, sep=',')
b = 0
x = 1
while (b == 0):
    df_+str(x) = pd.merge(df_+str(x-1), Source, left_on='R_Key', right_on='S_Key', how='inner')
    if Final_+str(x).empty != 'True':
        x = x + 1
    else:
        b = b + 1
Now when executed, this returns "can't assign to operator" for df_+str(x). Any idea how to fix this?

This is the right time to use a list (a sequence type in Python), so you can refer to exactly as many data frames as you need.
dfs = []
dfs.append(pd.read_csv(file1, sep=','))  # It is now dfs[0]
b = 0
x = 1
while (b == 0):
    dfs.append(pd.merge(dfs[x-1], Source, left_on='R_Key',
                        right_on='S_Key', how='inner'))
    if Final[x].empty != 'True':
        x = x + 1
    else:
        b = b + 1
Now, you never define Final. You'll need to use the same trick there.
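If the stop condition is really just "the latest merge produced no rows", here is a minimal sketch of the whole loop (assuming you only need the merged frames; note that .empty is already a boolean, so there is no need to compare it to the string 'True'):

dfs = [pd.read_csv(file1, sep=',')]
while True:
    merged = pd.merge(dfs[-1], Source, left_on='R_Key', right_on='S_Key', how='inner')
    if merged.empty:        # stop as soon as a merge returns no rows
        break
    dfs.append(merged)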

Not sure why you want to do this, but I think a clearer and more logical way is just to create a dictionary with dataframe name strings as keys and your generated dataframes as values.
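A minimal sketch of that idea (the 'df_0', 'df_1', ... key format is only an assumption for illustration):

frames = {'df_0': pd.read_csv(file1, sep=',')}
x = 1
while True:
    frames['df_' + str(x)] = pd.merge(frames['df_' + str(x - 1)], Source,
                                      left_on='R_Key', right_on='S_Key', how='inner')
    if frames['df_' + str(x)].empty:
        break
    x += 1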

Apply custom function to entire dataframe

I have a function which calls another one.
The objective is to call the function get_substr to extract a substring based on the position of the nth occurrence of a character:
def find_nth(string, char, n):
    start = string.find(char)
    while start >= 0 and n > 1:
        start = string.find(char, start + len(char))
        n -= 1
    return start

def get_substr(string, char, n):
    if n == 1:
        return string[0:find_nth(string, char, n)]
    else:
        return string[find_nth(string, char, n-1) + len(char):find_nth(string, char, n)]
The function works.
Now I want to apply it to a dataframe by doing this:
df_g['F'] = df_g.apply(lambda x: get_substr(x['EQ'],'-',1))
I get an error:
KeyError: 'EQ'
I don't understand it, as df_g['EQ'] exists.
Can you help me?
Thanks
You forgot about axis=1; without it, the function is applied to each column rather than to each row. Consider this simple example:
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df['Z'] = df.apply(lambda x: x['A'] * 100, axis=1)
print(df)

Output:

   A  B    Z
0  1  3  100
1  2  4  200
As a side note, if you are working with values from a single column, you might use pandas.Series.apply rather than pandas.DataFrame.apply; in the above example that would mean
df['Z'] = df['A'].apply(lambda x: x * 100)
in place of
df['Z'] = df.apply(lambda x: x['A'] * 100, axis=1)
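Applied to the call from the question (a sketch, assuming df_g and get_substr are defined as above), that would be:

df_g['F'] = df_g.apply(lambda x: get_substr(x['EQ'], '-', 1), axis=1)
# or, using the single column directly:
df_g['F'] = df_g['EQ'].apply(lambda s: get_substr(s, '-', 1))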

How to count the number of entries in a particular column

I have an excel file with thousands of entries. I want to count the number of entries in the first column.
import csv

with open('data.csv') as f:
    reader = csv.reader(f)
    annotated_data = [r for r in reader]
So now I want to count the entries. I tried doing:
a = 0
b = 0
c = 0
d = 0
e = 0
for i in annotated_data:
    if annotated_data[0][i] == A:
        a = a + 1
    if annotated_data[0][i] == B:
        b = b + 1
    if annotated_data[0][i] == C:
        # continue until E
print("Total number of A:" + a)  # continue until E
But it told me "list indices must be integers or slices, not list". So I tried doing
    for i in range(annotated_data)
and it told me "'list' object cannot be interpreted as an integer".
I'm not sure what else to do; any help is appreciated.
Iterating through a list gives you items in the list, not their indices.
So, do this:
for row in annotated_data:
    first_cell = row[0]
If you really wanted to have the indices, you would have to pass a number to range, rather than the list, i.e.:
range(len(annotated_data))
But I would not recommend doing that. It only makes things slower, less readable, and not compatible with all container types.
If you really needed both indices and items, you could do this:
for row_number, row in enumerate(annotated_data):
    first_cell = row[0]
As a quick fix, you may want to try
    if i[0] == A:
        a += 1
etc. Or, if you are looking for the literal string 'A', then:
    if i[0] == 'A':
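Putting that together, a sketch of the full counting loop (assuming annotated_data was read as in your code and the labels really are the literal strings 'A' to 'E'):

from collections import Counter

counts = Counter(row[0] for row in annotated_data)
for label in ['A', 'B', 'C', 'D', 'E']:
    print("Total number of " + label + ": " + str(counts[label]))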
Install pandas using pip install pandas. Then you could do something like this:
import pandas as pd

df = pd.read_csv('path to file.csv')
print(len(df) + 1)  # the +1 only makes sense if read_csv consumed the first data row as a header
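If what you actually want is the count of each distinct value in the first column rather than the total number of rows, pandas can do that directly; a sketch (the file path is a placeholder):

import pandas as pd

df = pd.read_csv('path to file.csv')
print(len(df))                        # total number of data rows
print(df.iloc[:, 0].value_counts())   # count of each distinct value in the first column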

Fill pandas data frame using .append()

I have a dataframe with a column containing comma-separated strings. What I want to do is separate them by comma, count them, and append the counted number to a new data frame. If the column contains a list with only one element, I want to differentiate whether it is a string or an integer. If it is an integer, I want to append the value 0 in that row to the new df.
My code looks as follows:
def decide(dataframe):
    df = pd.DataFrame()
    for liste in DataFrameX['Column']:
        x = liste.split(',')
        if len(x) > 1:
            df.append(pd.Series([len(x)]), ignore_index=True)
        else:
            # check if element in list is int
            for i in x:
                try:
                    int(i)
                    print i
                    x = []
                    df.append(pd.Series([int(len(x))]), ignore_index=True)
                except:
                    print i
                    x = [1]
                    df.append(pd.Series([len(x)]), ignore_index=True)
    return df
The input data look like this:
       C1
0   a,b,c
1       0
2       a
3  ab,x,j
If I now run the function with my original dataframe as input, it returns an empty dataframe. Through the print statements in the try/except blocks I could see that everything works; the problem is appending the resulting values to the new dataframe. What do I have to change in my code? If possible, please do not give an entirely different solution, but tell me what I am doing wrong in my code so I can learn.
UPDATE:
I edited the code so that it can be called as a lambda function. It looks like this now:
def decide(x):
    for liste in DataFrameX['Column']:
        x = liste.split(',')
        if len(x) > 1:
            x = len(x)
            print x
        else:
            # check if element in list is int
            for i in x:
                try:
                    int(i)
                    x = []
                    x = len(x)
                    print x
                except:
                    x = [1]
                    x = len(x)
                    print x
And I call it like this:
df['Count']=df['C1'].apply(lambda x: decide(x))
It prints the right values, but the new column only contains None.
Any ideas why?
This is a good start; it could be simplified, but I think the version below works as expected.
# I have a dataframe with a column containing comma-separated strings.
import pandas as pd

df = pd.DataFrame({'data': ['apple, peach', 'banana, peach, peach, cherry', 'peach', '0']})

# What I want to do is separate them by comma, count them and append the counted number to a new data frame.
df['data'] = df['data'].str.split(',')
df['count'] = df['data'].apply(lambda row: len(row))

# If the column contains a list with only one element,
df['first'] = df['data'].apply(lambda row: row[0])

# I want to differentiate whether it is a string or an integer.
df['first'] = pd.to_numeric(df['first'], errors='coerce')

# If the element is an integer, the count should be set to zero.
df.loc[pd.notnull(df['first']), 'count'] = 0

# Drop the temporary column.
df.drop('first', axis=1, inplace=True)
df
                             data  count
0                  [apple, peach]      2
1  [banana, peach, peach, cherry]      4
2                         [peach]      1
3                             [0]      0
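As for what goes wrong in the original code: DataFrame.append does not modify df in place, it returns a new frame that is then thrown away, and the lambda version of decide never returns anything, which is why the new column ends up as None. A minimal sketch that keeps the original structure but collects the counts in a plain Python list and builds the column once at the end (assuming the input column looks like C1 above):

def decide(series):
    counts = []
    for liste in series:
        x = str(liste).split(',')
        if len(x) > 1:
            counts.append(len(x))      # several comma-separated items
        else:
            try:
                int(x[0])              # a single integer-like value counts as 0
                counts.append(0)
            except ValueError:
                counts.append(1)       # a single string counts as 1
    return pd.Series(counts, index=series.index)

df['Count'] = decide(df['C1'])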

In Python, how would I go about linking user input to an index in a list?

I would like to create a binary puzzle with Python.
At the moment I have already made 6x6, 8x8 and 10x10 layouts, which are shown based on the difficulty the player wishes to play. The puzzle is comparable to sudoku: the player inputs either 0 or 1 at a given location. Below you will find what I currently have for the layout.
if graad == 1:
    easy = [['A', 'B', 'C', 'D', 'E'],
            ['_', '_', '_', '_', '_', '_', '_'],
            [0, 1, 0, 1, 0, 1, ' |1'],
            [1, 0, 1, 0, 1, 0, ' |2'],
            [0, 1, 0, 1, 0, 1, ' |3'],
            [1, 0, 1, 0, 1, 0, ' |4'],
            [0, 1, 0, 1, 0, 1, ' |5'],
            [1, 0, 1, 0, 1, 0, ' |6']]
    i = 0
    while i < len(easy):
        j = 0
        s = ""
        while j < len(easy[i]):
            s = s + str(easy[i][j]) + " "
            j = j + 1
        print(s)
        i = i + 1
Now the problem I am facing is: how can I let Python know which cell is meant when a player fills in, for example, column C and row 5 with a 0?
I was thinking of an if statement that checks the input for A, B, C, D, E... and rows 1, 2, 3, 4, 5..., but that is going to be a lot of if statements.
Edit 1: OK, so to clarify (I wanted to post a picture, but I need more posts for that).
For example, I have a game board of 6x6 cells. Some of them are filled with a 1, some with a 0, and most of them are empty, because the goal is to make the board look like the layout in the Python code above in the end (that's the solution). So you want the user to fill in those empty cells.
Now, let's say that the player wants to fill in A-1 with a 1. How will Python know that input A-1 is linked to index [0][0] in the list?
A simple way to convert your letter indices to numbers is to use the ord() function, which returns the numerical code of a single character. Since you are using upper-case letters, with 'A' being the label for the column with index 0, you can do
column = ord(letter) - ord('A')
That will convert 'A' to 0, 'B' to 1, etc.
Here's a short example program vaguely based on the code in your question.
It accepts moves in the form A10 to set location A1 to '1', or B30 to set location B3 to '0'. It accepts lower-case letters too, so 'd11' is the same as 'D11'. Hit Ctrl-C to exit.
Tested on Python 2.6.6, but it should work correctly on Python 3. (To run it on Python 2, change input() to raw_input()).
#! /usr/bin/env python

def print_grid(g):
    gsize = len(g)
    base = ord('A')
    print(' '.join([chr(base + i) for i in range(gsize)]))
    print((gsize * 2) * '-')
    for i, row in enumerate(g, 1):
        print(' '.join(row) + ' | ' + str(i))
    print('\n')

def main():
    gsize = 9
    rowstr = gsize * '_'
    grid = [list(rowstr) for i in range(gsize)]
    print_grid(grid)
    while True:
        move = input('Enter move: ')
        letter, number, bit = move.strip()
        col = ord(letter.upper()) - ord('A')
        row = int(number) - 1
        grid[row][col] = bit
        print_grid(grid)

if __name__ == "__main__":
    main()
If you use a pandas DataFrame to hold the correct answer of the game, you can easily check things. The pandas package has good documentation (and a lot of Q&A here on Stack Overflow).
The setup of your correct answer:
import pandas as pd

data = [[0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0],
        [0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0]]
easy = pd.DataFrame(data)
easy.columns = ['A', 'B', 'C', 'D', 'E', 'F']
print easy
The item at position 'A', 0 (Python starts numbering from 0) is given by easy['A'][0]. For more information about indexing a pandas DataFrame object, visit the documentation.
Another useful thing: a DataFrame object is printable, making it unnecessary to write a print loop yourself.
If using DataFrames is overkill for you, another option is to work with a 'translation' dictionary. This dictionary uses the letters as keys and the corresponding column numbers as values.
>>> column = {'A':0, 'B':1, 'C':2, 'D':3, 'E':4, 'F':5}
>>> print column['A']
0
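A sketch of using that dictionary to turn a move such as 'C5' into indices (the 'C5' input format and the grid variable for a plain list of lists are assumptions for illustration):

column = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5}

move = 'C5'                      # column letter plus 1-based row number
letter = move[0].upper()
row = int(move[1:]) - 1          # rows are numbered from 0 internally
easy.loc[row, letter] = 0        # update the DataFrame from the answer above
# with a plain list of lists you would use the translation dictionary instead:
# grid[row][column[letter]] = 0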

Pandas append filtered row to another DataFrame

I have 2 pandas data frames, df and df_min. I apply some filters to df, which results in a single row of data, and I'd like to append that row to df_min. I tried using a loop to traverse df and used loc to append the row to df_min, but I keep getting an "Incompatible indexer with DataFrame" ValueError for the line where I use loc. I guess I am not using loc correctly. What would be the best way to accomplish what I am trying to do?
i = 0
for elem in vehicles:
    for state in limit_states:
        a = df[(df.VEHICLE == elem) & (df.LIMIT_STATE == state)]
        df_min.loc[i] = a[(a.RF == np.min(a.RF))].head(1)  # results in a single row
        i = i + 1
EDIT: I also tried the following instead of loc, but got the same error:
df_min.ix[i] = a[(a.RF == np.min(a.RF))].head(1)
EDIT 2: Tried the following, got a "first argument must be a list-like of pandas objects, you passed an object of type "DataFrame"" error this time.
for elem in vehicles:
    for state in limit_states:
        a = df[(df.VEHICLE == elem) & (df.LIMIT_STATE == state)]
        df_min = pd.concat(a[(a.RF == np.min(a.RF))].head(1))
Probably something like this would be helpful:
df_min = pd.concat([df[(df.VEHICLE == elem) & (df.LIMIT_STATE == state)]
                    for elem in vehicles for state in limit_states])
Edit:
xs = [df[(df.VEHICLE == elem) & (df.LIMIT_STATE == state)]
      for elem in vehicles for state in limit_states]
df_min = pd.concat([a[(a.RF == np.min(a.RF))].head(1) for a in xs])
Depending on the lists vehicles and limit_states, you can probably also achieve what you are trying to do using groupby; something like:
fn = lambda a: a[(a.RF == np.min(a.RF))].head(1)
df.groupby(['VEHICLE', 'LIMIT_STATE']).apply(fn)
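For completeness, a sketch of an idxmin variant that avoids building the intermediate list entirely (assuming RF has no missing values in the filtered groups; note this keeps one row for every (VEHICLE, LIMIT_STATE) pair present in df, which may differ from looping over the vehicles and limit_states lists):

# keep, for every (VEHICLE, LIMIT_STATE) pair, the row with the smallest RF
df_min = df.loc[df.groupby(['VEHICLE', 'LIMIT_STATE'])['RF'].idxmin()]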
