SQLite Python printing in rows? - python

Afternoon, I am trying to retrieve seat numbers from a database using the following code:
cur.execute("SELECT * FROM seats")
while True:
    row = cur.fetchone()
    if row is None:
        break
    print row[0]
But when I do so, it prints each record on its own line, like so:
A1
A2
B3
etc. What I want is for all seats with the same letter to print on one line, such as:
A1 A2 A3 A4 A5 A6 A7 A8
B1 B2 B3 B4 B5 B6 B7
But I can't seem to get it like that. How would I go about doing this?

Use the itertools.groupby() tool:
from itertools import groupby

for letter, rows in groupby(cur, key=lambda r: r[0][0]):
    print ' '.join([r[0] for r in rows])
The groupby() function loops over each row in cur, takes the first letter of the first column, and gives you (letter, rows) tuples. The rows value is another iterable; you can loop over it (with a for loop, or a list comprehension as above) to list all rows that share that first letter.
This does rely on the rows being sorted already. If your rows alternate between first letters:
A1
A2
B1
B2
A3
A4
it'll print those as separate groups:
A1 A2
B1 B2
A3 A4
You may want to add an ORDER BY firstcolumnname clause to your query to ensure correct grouping.
This is what I see when I create a test db:
>>> cur.execute("SELECT * FROM seats ORDER BY code")
<sqlite3.Cursor object at 0x10b1a8730>
>>> for letter, rows in groupby(cur, key=lambda r: r[0][0]):
...     print ' '.join([r[0] for r in rows])
...
A1 A2 A3 A4 A5 A6 A7 A8
B1 B2 B3 B4 B5 B6 B7 B8
C1 C2 C3 C4 C5 C6 C7 C8
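The whole flow can be reproduced end to end with a small in-memory database; a minimal sketch in Python 3 syntax, where the table and column names are assumptions mirroring the question:

```python
import sqlite3
from itertools import groupby

# Throwaway in-memory database standing in for the real seats table.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE seats (code TEXT)")
cur.executemany("INSERT INTO seats VALUES (?)",
                [("A1",), ("B1",), ("A2",), ("B2",), ("A3",)])

# ORDER BY makes rows with the same first letter adjacent,
# which groupby() requires.
cur.execute("SELECT * FROM seats ORDER BY code")
lines = [" ".join(r[0] for r in rows)
         for letter, rows in groupby(cur, key=lambda r: r[0][0])]
print("\n".join(lines))
```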

Related

Check if value of one column exists in another column, put a value in another column in pandas

Say I have a data frame like the following:
A B C D E
a1 b1 c1 d1 e1
a2 a1 c2 d2 e2
a3 a1 a2 d3 e3
a4 a1 a2 a3 e4
I want to create a new column with predefined values if a value is found in the other columns.
Something like this:
A B C D E F
a1 b1 c1 d1 e1 NA
a2 a1 c2 d2 e2 in_B
a3 a1 a2 d3 e3 in_B, in_C
a4 a1 a2 a3 e4 in_B, in_C, in_D
The in_B, in_C could be any other strings of choice. If a value is present in multiple columns, then F holds multiple entries - see rows 3 and 4 of column F (row 3 has two entries and row 4 has three). So far, I have tried the below:
DF.F = np.where(DF.A.isin(DF.B), DF.A, 'in_B')
But it does not give the expected result. Any help would be appreciated.
STEPS:
Stack the dataframe.
Check for duplicate values.
Unstack to get the same structure back.
Use dot to get the required result.
df['new_col'] = df.stack().duplicated().unstack().dot(
    'In ' + df.columns + ',').str.strip(',')
OUTPUT:
A B C D E new_col
0 a1 b1 c1 d1 e1
1 a2 a1 c2 d2 e2 In B
2 a3 a1 a2 d3 e3 In B,In C
3 a4 a1 a2 a3 e4 In B,In C,In D
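A self-contained sketch of those steps, with the question's data inlined. The dot trick works because multiplying a boolean by a string repeats it 0 or 1 times, and the row-wise sum then concatenates the surviving pieces:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "a1", "a1", "a1"],
    "C": ["c1", "c2", "a2", "a2"],
    "D": ["d1", "d2", "d3", "a3"],
    "E": ["e1", "e2", "e3", "e4"],
})

# Stack to one long Series, flag every repeat of an earlier value,
# unstack back to the original shape, then dot the boolean frame
# with the column labels to build the summary string per row.
flags = df.stack().duplicated().unstack()
df["new_col"] = flags.dot("In " + df.columns + ",").str.strip(",")
print(df["new_col"].tolist())
```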

Dataframe slicing with string values

I have a string dataframe that I would like to modify. I need to cut off each row of the dataframe at a given value, say A4, and replace the values after A4 with -- (or remove them). I would like to create a new dataframe that has values only up to the string "A4". How would I do this?
import pandas as pd

columns = ['c1','c2','c3','c4','c5','c6']
values = [['A1','A2','A3','A4','A5','A6'],['A1','A3','A2','A5','A4','A6'],['A1','A2','A4','A3','A6','A5'],['A2','A1','A3','A4','A5','A6'],['A2','A1','A3','A4','A6','A5'],['A1','A2','A4','A3','A5','A6']]
input = pd.DataFrame(values, index=columns)
values = [['A1','A2','A3','A4','--','--'],['A1','A3','A2','A5','A4','--'],['A1','A2','A4','--','--','--'],['A2','A1','A3','A4','--','--'],['A2','A1','A3','A4','--','--'],['A1','A2','A4','--','--','--']]
output = pd.DataFrame(values, index=columns)
You can make a small function, that will take an array, and modify the values after your desired value:
def myfunc(x, val):
for i in range(len(x)):
if x[i] == val:
break
x[(i+1):] = '--'
return x
Then you need to apply the function to the dataframe in a rowwise (axis = 1) manner:
input.apply(lambda x: myfunc(x, 'A4'), axis = 1)
    0   1   2   3   4   5
c1  A1  A2  A3  A4  --  --
c2  A1  A3  A2  A5  A4  --
c3  A1  A2  A4  --  --  --
c4  A2  A1  A3  A4  --  --
c5  A2  A1  A3  A4  --  --
c6  A1  A2  A4  --  --  --
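The same row-wise cutoff can be written as a self-contained sketch that avoids mutating the row in place (data trimmed to two rows from the question; `cut_after` is a name of my choosing):

```python
import pandas as pd

def cut_after(row, val):
    # Keep everything up to the first occurrence of `val`,
    # replace the rest of the row with '--'.
    vals = list(row)
    if val in vals:
        i = vals.index(val)
        vals[i + 1:] = ["--"] * (len(vals) - i - 1)
    return pd.Series(vals, index=row.index)

df = pd.DataFrame(
    [["A1", "A2", "A3", "A4", "A5", "A6"],
     ["A1", "A3", "A2", "A5", "A4", "A6"]],
    index=["c1", "c2"])

result = df.apply(cut_after, axis=1, val="A4")
print(result)
```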
This assumes the values only go up to A9, and replaces any value greater than A4 wherever it appears. Note this differs from the positional cutoff: an A5 that comes before the A4 is replaced too, as in row c2:
input.replace('A([5-9])', '--', regex=True)
    0   1   2   3   4   5
c1  A1  A2  A3  A4  --  --
c2  A1  A3  A2  --  A4  --
c3  A1  A2  A4  A3  --  --
c4  A2  A1  A3  A4  --  --
c5  A2  A1  A3  A4  --  --
c6  A1  A2  A4  A3  --  --
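A minimal sketch of that difference on the c2 row alone: the regex version replaces by value, so the A5 sitting before A4 is lost as well:

```python
import pandas as pd

df = pd.DataFrame([["A1", "A3", "A2", "A5", "A4", "A6"]], index=["c2"])

# Replace any of A5..A9 wherever they appear, regardless of position.
out = df.replace("A([5-9])", "--", regex=True)
print(out.loc["c2"].tolist())
```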

How to get the difference between two csv by Index using Pandas

I need to get the difference between 2 csv files, killing duplicates and NaN fields.
I am trying this one, but it adds them together instead of subtracting:
df1 = pd.concat([df,cite_id]).drop_duplicates(keep=False)[['id','website']]
df is the main dataframe; cite_id is the dataframe that has to be subtracted.
You can do this efficiently using isin. Note that dropna() and drop_duplicates() return new frames rather than modifying in place, so assign the results back:
df = df.dropna().drop_duplicates()
cite_id = cite_id.dropna().drop_duplicates()
df[~df.id.isin(cite_id.id.values)]
Or you can do an outer merge and keep only the rows that came from df alone; the indicator flag of merge marks each row's origin:
merged = df.merge(cite_id, how='outer', indicator=True)
merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
import pandas as pd
df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df1 = df1.dropna().drop_duplicates()
df2 = df2.dropna().drop_duplicates()
df = df2.loc[~df2.id.isin(df1.id)]
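The isin approach can be sketched with inline frames standing in for the two CSV files (the data here is made up for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
df = pd.DataFrame({"id": [1, 2, 2, 3, None],
                   "website": ["a.com", "b.com", "b.com", "c.com", "d.com"]})
cite_id = pd.DataFrame({"id": [2, 3],
                        "website": ["b.com", "c.com"]})

# Clean both frames, then keep only the ids absent from cite_id.
df = df.dropna().drop_duplicates()
cite_id = cite_id.dropna().drop_duplicates()
result = df[~df.id.isin(cite_id.id)]
print(result)
```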
You can concatenate the two dataframes into one, after which you can remove all duplicates:
df1
ID B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
cite_id
ID B C D
4 A2 B4 C4 D4
5 A3 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
pd.concat([df1,cite_id]).drop_duplicates(subset=['ID'], keep=False)
Out:
ID B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
6 A6 B6 C6 D6
7 A7 B7 C7 D7
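A runnable sketch of the concat approach with two columns of the data above. Note that keep=False drops every copy of a duplicated ID, so the result is the rows unique to either frame, not just df1 minus cite_id:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["A0", "A1", "A2", "A3"],
                    "B": ["B0", "B1", "B2", "B3"]})
cite_id = pd.DataFrame({"ID": ["A2", "A3", "A6", "A7"],
                        "B": ["B4", "B5", "B6", "B7"]})

# Any ID appearing in both frames is removed entirely.
diff = pd.concat([df1, cite_id]).drop_duplicates(subset=["ID"], keep=False)
print(diff["ID"].tolist())
```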

Groupby and Sample pandas

I am trying to sample the resulting data after doing a groupby on multiple columns. If the respective group has more than 2 elements, I want to sample 2 records; otherwise, take all the records.
df:
col1 col2 col3 col4
A1 A2 A3 A4
A1 A2 A3 A5
A1 A2 A3 A6
B1 B2 B3 B4
B1 B2 B3 B5
C1 C2 C3 C4
target df:
col1 col2 col3 col4
A1 A2 A3 A4 or A5 or A6
A1 A2 A3 A4 or A5 or A6
B1 B2 B3 B4
B1 B2 B3 B5
C1 C2 C3 C4
I have mentioned A4 or A5 or A6 because, when we take a sample, any of the three might be returned.
This is what I have tried so far:
trial = pd.DataFrame(df.groupby(['col1', 'col2','col3'])['col4'].apply(lambda x: x if (len(x) <=2) else x.sample(2)))
However, with this I do not get col1, col2 and col3 as columns.
I think you need a double reset_index - first to remove the 3rd level of the MultiIndex and second to convert the MultiIndex to columns:
trial = (df.groupby(['col1', 'col2','col3'])['col4']
           .apply(lambda x: x if (len(x) <=2) else x.sample(2))
           .reset_index(level=3, drop=True)
           .reset_index())
Or reset_index with drop to remove the column level_3:
trial = (df.groupby(['col1', 'col2','col3'])['col4']
           .apply(lambda x: x if (len(x) <=2) else x.sample(2))
           .reset_index()
           .drop('level_3', axis=1))
print (trial)
col1 col2 col3 col4
0 A1 A2 A3 A4
1 A1 A2 A3 A6
2 B1 B2 B3 B4
3 B1 B2 B3 B5
4 C1 C2 C3 C4
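A self-contained version of the first variant, with the question's data inlined. Sampling is random, so the col4 contents vary between runs, but the shape and group sizes are stable:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A1", "A1", "A1", "B1", "B1", "C1"],
    "col2": ["A2", "A2", "A2", "B2", "B2", "C2"],
    "col3": ["A3", "A3", "A3", "B3", "B3", "C3"],
    "col4": ["A4", "A5", "A6", "B4", "B5", "C4"],
})

# Sample 2 rows from groups bigger than 2, keep small groups whole,
# then drop the leftover original index level and turn the group
# keys back into columns.
trial = (df.groupby(["col1", "col2", "col3"])["col4"]
           .apply(lambda x: x if len(x) <= 2 else x.sample(2))
           .reset_index(level=3, drop=True)
           .reset_index())
print(trial)
```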
There is no need to wrap this in pd.DataFrame(); the groupby result is already a pandas object:
trial = df.groupby(['col1', 'col2','col3'])['col4'].apply(lambda x: x if (len(x) <=2) else x.sample(2))
And this should add col1, col2 and col3 back:
trial.reset_index(inplace=True, drop=False)

Taking last characters of a column of objects and making it the column on a dataframe - pandas python

I have a dataframe like the following:
df =
A B D
a1 b1 9052091001A
a2 b2 95993854906
a3 b3 93492480190
a4 b4 93240941993
What I want:
df_resp =
A B D
a1 b1 001A
a2 b2 4906
a3 b3 0190
a4 b4 1993
What I tried:
for i in range(0, len(df['D'])):
    df['D'][i] = df['D'][i][-4:]
Error I got:
KeyError: 4906
Also, it takes a really long time and I think there should be a quicker way with pandas.
Use pd.Series.str string accessor for vectorized string operations. These are preferred over using apply.
If D elements are already strings
df.assign(D=df.D.str[-4:])
A B D
0 a1 b1 001A
1 a2 b2 4906
2 a3 b3 0190
3 a4 b4 1993
If not
df.assign(D=df.D.astype(str).str[-4:])
A B D
0 a1 b1 001A
1 a2 b2 4906
2 a3 b3 0190
3 a4 b4 1993
You can change in place with
df['D'] = df.D.str[-4:]
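A quick end-to-end sketch of the accessor approach, including the astype(str) guard for a column that mixes strings and integers:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a1", "a2"],
                   "D": ["9052091001A", 95993854906]})

# Cast everything to str first so the .str accessor works on the
# numeric cells too, then slice off the last four characters.
df["D"] = df["D"].astype(str).str[-4:]
print(df["D"].tolist())
```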
Use the apply() method of pandas.Series; it will be much faster than iterating with a for loop.
This should work (provided the column contains only strings):
df_resp = df.copy()
df_resp['D'] = df_resp['D'].apply(lambda x: x[-4:])
As for the KeyError, it probably comes from your DataFrame's index: calling df['D'][i] looks up i as an index label, not a position. It would (probably) work if you replaced it with df['D'].iloc[i], which refers to the value at position i.
I hope this helps!
