I used regex to extract patterns from a CSV document; the pattern is (qty)x(volume in L), e.g. 2x2L or 3x4L. (Note that one cell can have more than one pattern, e.g. "I want 2x4L and 3x1L".)
0 []
1 [(2, x1L), (2, x4L)]
2 [(1, x1L), (1, x4L)]
3 [(2, x4L)]
4 [(1, x4L), (1, x1L)]
...
95 [(1, x2L)]
96 [(1, x1L), (1, x4L)]
97 [(2, x1L)]
98 [(6, x1L)]
99 [(6, x1L), (4, x2L), (4, x4L)]
Name: cards__name, Length: 100, dtype: object
I want to create 3 columns called "1L", "2L" and "4L", and then for every item take the quantity and add it to the relevant row under the relevant column, like this:
1L 2L 4L
2 0 2
1 0 1
0 0 2
1 0 1
However, I am not able to index into the tuples to extract the quantity and the volume size for every item.
Any ideas?
Before you can use pivot you have to normalize your columns, e.g. like this:
df['multiplier_1'] = df['order_1'].apply(lambda r: r[0])
df['base_volume_1'] = df['order_1'].apply(lambda r: r[1])
That way you will be able to ungroup the orders and eventually split them into multiple base volumes.
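For the exact output in the question, here is a minimal sketch that sums the quantities per volume size with apply; the column name cards__name and the 'x1L'-style labels are taken from the question, the rest is assumed:

import pandas as pd

# Minimal sketch: assuming each cell holds a list of (qty, size) tuples
# such as [(2, 'x1L'), (2, 'x4L')], in a column named 'cards__name'.
df = pd.DataFrame({
    'cards__name': [[], [(2, 'x1L'), (2, 'x4L')], [(1, 'x1L'), (1, 'x4L')], [(2, 'x4L')]]
})

def to_counts(items):
    # Sum the quantities per volume size, mapping 'x1L' -> '1L' and so on.
    counts = {'1L': 0, '2L': 0, '4L': 0}
    for qty, size in items:
        counts[size.lstrip('x')] += qty
    return pd.Series(counts)

result = df['cards__name'].apply(to_counts)
print(result)
#    1L  2L  4L
# 0   0   0   0
# 1   2   0   2
# 2   1   0   1
# 3   2   0   2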
I want to make a dataframe containing 3 columns. I have three different lists with the values that need to be in the dataframe in a certain order, so I want to loop over the lists to combine them and create the dataframe.
List F, contains 9 values
List P, contains 3 values
List A, contains 3 values
The final dataframe will be exported in Excel and should look like this:
|F |P |A |
|----|----|----|
|F(0)|P(0)|A(0)|
|F(1)|P(0)|A(1)|
|F(2)|P(0)|A(2)|
|F(3)|P(1)|A(0)|
|F(4)|P(1)|A(1)|
|F(5)|P(1)|A(2)|
|F(6)|P(2)|A(0)|
|F(7)|P(2)|A(1)|
|F(8)|P(2)|A(2)|
To achieve this, I wanted to first create a list with these values and split that into a dataframe.
I tried this to obtain the list:
df_test3 = []
for f in F:
    df_test3.append(f)
    for p in P:
        for a in A:
            df_test3.append(p)
            df_test3.append(a)
Lists P and A are in the correct order, but I can't match them with the outer loop over F. I know I have to do something with break to return to the outer loop, but I can't see how.
It returns this now:
list = [F0, P0, A0, P0, A1, P0, A2, P1, A0, etc.]
and continues to the next value of F after the inner loops are completed. How can I get all the values in the right order in the list? Or am I handling this the wrong way and should I create the dataframe right away?
Try this...
import pandas as pd

F = [1,2,3,4,5,6,7,8,9]
P = [11,22,33]
A = [111,222,333]

P1 = []
num1 = len(F) // len(P)
for p in P:
    P1 = P1 + [p] * num1   # repeat each P value num1 times

num2 = len(F) // len(A)
A1 = A * num2              # cycle all of A num2 times

df_result = pd.DataFrame({"F": F, "P": P1, "A": A1})
# Output of df_result...
F P A
0 1 11 111
1 2 11 222
2 3 11 333
3 4 22 111
4 5 22 222
5 6 22 333
6 7 33 111
7 8 33 222
8 9 33 333
Hope this helps...
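If numpy is available, the same repetition can be written without the explicit loop; a small equivalent sketch using np.repeat and np.tile:

import numpy as np
import pandas as pd

F = [1,2,3,4,5,6,7,8,9]
P = [11,22,33]
A = [111,222,333]

# np.repeat stretches each P value, np.tile cycles all of A;
# this reproduces the same table as the loop above.
df_result = pd.DataFrame({"F": F,
                          "P": np.repeat(P, len(F) // len(P)),
                          "A": np.tile(A, len(F) // len(A))})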
You can use cycle from itertools.
For example:
from itertools import cycle

F = [0,1,2,3,4,5,6,7,8]
P = [0,1,2]
A = [0,1,2]

zip_list = zip(F, cycle(P), cycle(A))
print(list(zip_list))
The result is: [(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 0, 0), (4, 1, 1), (5, 2, 2), (6, 0, 0), (7, 1, 1), (8, 2, 2)]
You can build on this; maybe it helps you get to a solution.
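Note that cycle advances P on every row, while the table in the question repeats each P value len(A) times before moving on. A sketch of that exact ordering, reusing the question's list names:

from itertools import chain, repeat
import pandas as pd

F = [0,1,2,3,4,5,6,7,8]
P = [0,1,2]
A = [0,1,2]

# Repeat each P value len(A) times so P changes every len(A) rows,
# and tile A so it cycles; this matches the question's table.
p_col = list(chain.from_iterable(repeat(p, len(A)) for p in P))
a_col = A * len(P)
df_test3 = pd.DataFrame({"F": F, "P": p_col, "A": a_col})
print(df_test3)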
There is a pandas DataFrame with a column named Exceptions.
Each row represents an entry, and the Exceptions column stores tuples.
I need to do a conditional selection of rows (there are other conditions which need to be &-ed in for further selection).
>>>print(dataframe.Exceptions)
0
1
2 (sfm, sfmp)
4
3
Name: Exceptions, dtype: object
>>> 'sfm' not in dataframe.Exceptions
True
How do I do this conditional selection with the tuples unpacked?
I appreciate your suggestions.
Here's an example showing how to get tuples that have 1 in the second position.
import pandas as pd
df = pd.DataFrame({
'tups': [(0, 0), (0, 1), (0, 2), (1, 1)]
})
filtered = df[df['tups'].apply(lambda tup: tup[1] == 1)]
print(filtered)
Output:
tups
1 (0, 1)
3 (1, 1)
Is this what you're looking for?
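For the membership test in the question itself, the same apply pattern works; a minimal sketch, assuming the empty cells hold empty tuples:

import pandas as pd

df = pd.DataFrame({'Exceptions': [(), (), ('sfm', 'sfmp'), (), ()]})

# Keep rows whose tuple contains 'sfm'; the mask can be &-ed with
# any other boolean conditions before indexing.
mask = df['Exceptions'].apply(lambda tup: 'sfm' in tup)
print(df[mask])
#     Exceptions
# 2  (sfm, sfmp)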
I have a dataframe like the following
df
entry
0 (5, 4)
1 (4, 2, 1)
2 (0, 1)
3 (2, 7)
4 (9, 4, 3)
I would like to keep only the entries that contain two values
df
entry
0 (5, 4)
1 (0, 1)
2 (2, 7)
If the values are tuples, use Series.str.len to get their lengths, compare with Series.le (<=), and filter with boolean indexing:
df1 = df[df['entry'].str.len().le(2)]
print (df1)
entry
0 (5, 4)
2 (0, 1)
3 (2, 7)
If the values are strings, count the number of , characters with Series.str.count and compare with Series.lt (<):
df2 = df[df['entry'].str.count(',').lt(2)]
print (df2)
entry
0 (5,4)
2 (0,1)
3 (2,7)
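If the goal is exactly two values rather than at most two, an equality check on the length reads more directly; a small variant of the same idea:

# Keep only rows whose tuple has exactly two elements.
df1 = df[df['entry'].str.len().eq(2)]
print(df1)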
Given DataFrame df:
1 1.1 2 2.1 ... 1600 1600.1
0 45.1024 7.2365 45.8769 7.1937 34.1072 8.4643
1 43.1024 8.9645 32.5798 7.7500 33.1072 9.3564
2 42.1024 6.7498 25.1027 7.3496 26.1072 6.3665
I did the following operation: I chose the first couple (1 and 1.1) and created an array, then did the same with the following couple (2 and 2.1).
x = df['1']
y = df['1.1']
P = np.array([x, y])
and
q = df['2']
w = df['2.1']
Q = np.array([q, w])
Final operation was:
Q_final = list(zip(Q[0], Q[1]))
P_final = list(zip(P[0], P[1]))
Now I want to do it for the whole dataset, but doing it by hand will take a lot of time. Any idea how to iterate this in a short way?
EDIT
In the end I'm doing:
df = similaritymeasures.frechet_dist(P_final, Q_final)
So I want to get a new dataset (maybe) with all column combinations.
A simple way is to use agg across axis=1:
def f(s):
    # zip an iterator with itself to pair consecutive values:
    # ((col 1, col 1.1), (col 2, col 2.1), ...)
    s = iter(s)
    return list(zip(s, s))

agg = df.agg(f, axis=1)
Then to retrieve, use .str. For example:
agg.str[0]  # P_final
agg.str[1]  # Q_final
...
You can also groupby across axis=1, assuming you want every couple of columns:
df.groupby(np.arange(len(df.columns))//2, axis=1).apply(lambda s: s.agg(list,1))
You probably don't want to create 1600 individual variables. Store this in a container, like a dict, where the keys reference the original column handles:
{idx: list(zip(gp.iloc[:, 0], gp.iloc[:, 1]))
for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}
# or
{idx: [*map(tuple, gp.to_numpy())]
for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}
Sample
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame((np.random.randint(1,10,(5,6))))
df.columns = ['1', '1.1', '2', '2.1', '3', '3.1']
# 1 1.1 2 2.1 3 3.1
#0 7 4 8 5 7 3
#1 7 8 5 4 8 8
#2 3 6 5 2 8 6
#3 2 5 1 6 9 1
#4 3 7 4 9 3 5
{idx: list(zip(gp.iloc[:, 0], gp.iloc[:, 1]))
for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}
#{'1': [(7, 4), (7, 8), (3, 6), (2, 5), (3, 7)],
# '2': [(8, 5), (5, 4), (5, 2), (1, 6), (4, 9)],
# '3': [(7, 3), (8, 8), (8, 6), (9, 1), (3, 5)]}
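For the edited question, the dict above combines naturally with itertools.combinations to score every pair of columns; a sketch assuming the similaritymeasures package mentioned in the question, with the dict bound to a name:

from itertools import combinations
import numpy as np
import similaritymeasures  # package used in the question

curves = {idx: list(zip(gp.iloc[:, 0], gp.iloc[:, 1]))
          for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}

# Fréchet distance for every pair of column couples.
distances = {(a, b): similaritymeasures.frechet_dist(np.array(curves[a]),
                                                     np.array(curves[b]))
             for a, b in combinations(curves, 2)}
print(distances)  # {('1', '2'): ..., ('1', '3'): ..., ('2', '3'): ...}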
I have a list in Python
numbers_list = [(2,5), (3,4), (2,6), (3,5)...]
I want to copy the list to an Excel CSV called NumberPairings, but I want each combination in a different row and each number of the pair in a different column.
So I want the excel file to look like this:
Num1 Num2
2 5
3 4
2 6
3 5
I think I should use a for loop that begins with
for item in numbers_list:
But I need help with using Pandas to write to the file in the way I want it. If you think there is an easier way than Pandas, I'm open to it as well.
You can separate the tuples into individual columns like this:
import pandas as pd

df = pd.DataFrame(data={'tuples': numbers_list})
df
tuples
0 (2, 5)
1 (3, 4)
2 (2, 6)
3 (3, 5)
df['Num1'] = df['tuples'].str[0]
df['Num2'] = df['tuples'].str[1]
df
tuples Num1 Num2
0 (2, 5) 2 5
1 (3, 4) 3 4
2 (2, 6) 2 6
3 (3, 5) 3 5
# optional create csv
df.drop(['tuples'], axis=1).to_csv(path)
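A shorter route is also possible, since pandas expands a list of tuples into rows directly; index=False keeps the index out of the file, matching the desired output:

import pandas as pd

numbers_list = [(2, 5), (3, 4), (2, 6), (3, 5)]

# Each tuple becomes a row; the column names come from `columns`.
df = pd.DataFrame(numbers_list, columns=['Num1', 'Num2'])
df.to_csv('NumberPairings.csv', index=False)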