How to reshape a list of tuples in Python? - python

I have a list of tuples that looks like this: B=[('dict1', 'dict2'), (1, 5), (2, 6), (3, 7), (4, 8)]. Here dict1 and dict2 refer to two dictionaries whose values are shown in the table-like view below.
I want to reshape it into a table-like view like this, so that I can later write it to a CSV file:
dict1 dict2
1 5
2 6
3 7
4 8
I have tried data=B.reshape(2,5), but to no avail, since a list has no reshape method.
How could this be done in a Pythonic way? Thanks!

Try
In [22]: import pandas as pd
In [23]: B=[('dict1', 'dict2'), (1, 5), (2, 6), (3, 7),]
In [24]: pd.DataFrame(B).to_csv("my_file.csv", header=False, index=False, sep="\t")
Result:
$ cat my_file.csv
dict1 dict2
1 5
2 6
3 7
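A variation that assumes the first tuple always holds the column names. Passing that tuple as columns= keeps the numbers numeric, and pandas then writes the header row itself. Here io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

B = [('dict1', 'dict2'), (1, 5), (2, 6), (3, 7), (4, 8)]

# First tuple -> column labels, remaining tuples -> data rows.
# The number columns get an integer dtype instead of object.
df = pd.DataFrame(B[1:], columns=B[0])

buf = io.StringIO()  # stand-in for "my_file.csv"
df.to_csv(buf, index=False, sep="\t")
text = buf.getvalue()
print(text)
```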

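If you want to write it to a CSV file with the csv module, try:
import csv
# Python 3: open in text mode with newline='' ('wb' only works in Python 2)
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t', quoting=csv.QUOTE_NONE)
    writer.writerows(B)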
Result:
dict1 dict2
1 5
2 6
3 7
4 8
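If the data actually starts out in the two dictionaries, you can build the rows directly with zip. This is a minimal sketch. The keys a to d are hypothetical, and only the values come from the question. It also relies on dicts keeping insertion order (Python 3.7+).

```python
import csv
import io

# Hypothetical dictionaries: only the values (1-4 and 5-8) come from the question.
dict1 = {"a": 1, "b": 2, "c": 3, "d": 4}
dict2 = {"a": 5, "b": 6, "c": 7, "d": 8}

# Header row first, then one row per pair of values, in insertion order.
rows = [("dict1", "dict2"), *zip(dict1.values(), dict2.values())]

# With a real file: open("file.csv", "w", newline="")
buf = io.StringIO(newline="")
writer = csv.writer(buf, delimiter="\t")
writer.writerows(rows)
print(buf.getvalue())
```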

Related

Repeating the same process for the whole dataset

Given DataFrame df:
1 1.1 2 2.1 ... 1600 1600.1
0 45.1024 7.2365 45.8769 7.1937 34.1072 8.4643
1 43.1024 8.9645 32.5798 7.7500 33.1072 9.3564
2 42.1024 6.7498 25.1027 7.3496 26.1072 6.3665
I did the following: I took the first pair of columns (1 and 1.1) and created an array, then did the same with the next pair (2 and 2.1).
x = df['1']
y = df['1.1']
P = np.array([x, y])
and
q = df['2']
w = df['2.1']
Q = np.array([q, w])
Final operation was:
Q_final = list(zip(Q[0], Q[1]))
P_final = list(zip(P[0], P[1]))
Now I want to do this for the whole dataset, but doing it pair by pair by hand will take a lot of time. Any idea how to iterate over all the pairs concisely?
EDIT
Afterwards I'm doing
df = similaritymeasures.frechet_dist(P_final, Q_final)
So I'd like to get a new dataset (maybe) with all column-pair combinations.
A simple way is to use agg across axis 1:
def f(s):
    # pair up consecutive values in each row: (v0, v1), (v2, v3), ...
    s = iter(s)
    return list(zip(s, s))
agg = df.agg(f, axis=1)
Then, to retrieve the pairs, use .str. For example:
agg.str[0]  # P_final
agg.str[1]  # Q_final
and so on.
You can also groupby across axis=1, assuming you want every pair of columns:
df.groupby(np.arange(len(df.columns))//2, axis=1).apply(lambda s: s.agg(list,1))
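Recent pandas releases (2.1 and later) deprecate the axis=1 argument of groupby. The sketch below avoids it by stepping through the columns by position. It assumes the columns always come in adjacent pairs, such as '1' and '1.1'.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(1, 10, (5, 6)),
                  columns=['1', '1.1', '2', '2.1', '3', '3.1'])

# Walk the columns two at a time by position: (0, 1), (2, 3), (4, 5), ...
# Assumes each x column is immediately followed by its y column.
pairs = {
    df.columns[i]: list(zip(df.iloc[:, i], df.iloc[:, i + 1]))
    for i in range(0, df.shape[1], 2)
}
print(pairs['1'])
```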
You probably don't want to create 1600 individual variables. Store this in a container, like a dict, where the keys reference the original column labels:
{idx: list(zip(gp.iloc[:, 0], gp.iloc[:, 1]))
for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}
# or
{idx: [*map(tuple, gp.to_numpy())]
for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}
Sample
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame((np.random.randint(1,10,(5,6))))
df.columns = ['1', '1.1', '2', '2.1', '3', '3.1']
# 1 1.1 2 2.1 3 3.1
#0 7 4 8 5 7 3
#1 7 8 5 4 8 8
#2 3 6 5 2 8 6
#3 2 5 1 6 9 1
#4 3 7 4 9 3 5
{idx: list(zip(gp.iloc[:, 0], gp.iloc[:, 1]))
for idx, gp in df.groupby(df.columns.str.split('.').str[0], axis=1)}
#{'1': [(7, 4), (7, 8), (3, 6), (2, 5), (3, 7)],
# '2': [(8, 5), (5, 4), (5, 2), (1, 6), (4, 9)],
# '3': [(7, 3), (8, 8), (8, 6), (9, 1), (3, 5)]}
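The EDIT asks for all column-pair combinations. itertools.combinations can generate every unordered pair of curves from a dict like the one above. The metric here is a hypothetical stand-in (the largest distance between matching points), not the Frechet distance. The asker's similaritymeasures.frechet_dist call could take its place.

```python
import math
from itertools import combinations

# Toy curves in the shape produced above: {label: [(x, y), ...]}.
curves = {
    '1': [(7, 4), (7, 8), (3, 6)],
    '2': [(8, 5), (5, 4), (5, 2)],
    '3': [(7, 3), (8, 8), (8, 6)],
}

def max_pointwise_distance(p, q):
    """Hypothetical stand-in metric, NOT the Frechet distance:
    the largest Euclidean distance between points at the same position."""
    return max(math.dist(a, b) for a, b in zip(p, q))

# One result per unordered pair of curves: ('1', '2'), ('1', '3'), ('2', '3').
results = {
    (a, b): max_pointwise_distance(curves[a], curves[b])
    for a, b in combinations(curves, 2)
}
print(results)
```

With 1600 curves, this produces 1600 * 1599 / 2 pairs, so the cost of the distance function dominates.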

matching two different arrays and making a new array in python

I have two two-dimensional arrays of different sizes, and I need to create a new array by keeping the rows of the 2nd array whose first-column value also appears in the first column of the 1st array.
Basically, the idea is as follows:
file A
#x y
1 2
3 4
2 2
5 4
6 4
7 4
file B
#x1 y1
0 1
1 1
11 1
5 1
7 1
My expected output 2D array should look like
#newx newy
1 1
5 1
7 1
I tried it the following way:
match = []
for i in range(len(x)):
    if x[i] == x1[i]:
        new_array = x1[i]
        match.append(new_array)
print(match)
This does not seem to work. Please suggest a way to create the new 2D array.
Try np.isin.
arr1 = np.array([[1,3,2,5,6,7], [2,4,2,4,4,4]])
arr2 = np.array([[0,1,11,5,7], [1,1,1,1,1]])
arr2[:,np.isin(arr2[0], arr1[0])]
array([[1, 5, 7],
[1, 1, 1]])
np.isin(arr2[0], arr1[0]) checks whether each element of arr2[0] is in arr1[0]. Then, we use the result as the boolean index array to select elements in arr2.
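If the files are loaded row by row, each row is an (x, y) pair. The same np.isin idea then works on the first column. This sketch reads the two example files with np.loadtxt, using io.StringIO as a stand-in for the real files. By default, loadtxt skips lines that start with '#', so the header lines are ignored.

```python
import io

import numpy as np

file_a = io.StringIO("#x y\n1 2\n3 4\n2 2\n5 4\n6 4\n7 4\n")
file_b = io.StringIO("#x1 y1\n0 1\n1 1\n11 1\n5 1\n7 1\n")

# Lines starting with '#' are treated as comments and skipped.
A = np.loadtxt(file_a, dtype=int)
B = np.loadtxt(file_b, dtype=int)

# Keep the rows of B whose first column also appears in A's first column.
new = B[np.isin(B[:, 0], A[:, 0])]
print(new)
```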
If you build a set of the first elements of A, it is fairly easy to find the elements of B to keep:
Code:
a = ((1, 2), (3, 4), (2, 2), (5, 4), (6, 4), (7, 4))
b = ((0, 1), (1, 1), (11, 1), (5, 1), (7, 1))
in_a = {i[0] for i in a}
new_b = [i for i in b if i[0] in in_a]
print(new_b)
Results:
[(1, 1), (5, 1), (7, 1)]
To write the results to a file:
with open('output.txt', 'w') as f:
    for value in new_b:
        # one pair per line, values separated by a space
        f.write(' '.join(str(v) for v in value) + '\n')
#!/usr/bin/env python3
from io import StringIO
import pandas as pd
fileA = """x y
1 2
3 4
2 2
5 4
6 4
7 4
"""
fileB = """x1 y1
0 1
1 1
11 1
5 1
7 1
"""
# sep=r"\s+" splits on runs of whitespace (same effect as delim_whitespace=True)
df1 = pd.read_csv(StringIO(fileA), sep=r"\s+", index_col="x")
df2 = pd.read_csv(StringIO(fileB), sep=r"\s+", index_col="x1")
df = pd.merge(df1, df2, left_index=True, right_index=True)
print(df["y1"])
# 1 1
# 5 1
# 7 1
https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
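If you don't need the merged index, a boolean mask built with Series.isin does the same filtering and keeps B's rows in their original order. This sketch inlines the two files as strings and uses sep=r"\s+" to split on whitespace.

```python
import io

import pandas as pd

file_a = "x y\n1 2\n3 4\n2 2\n5 4\n6 4\n7 4\n"
file_b = "x1 y1\n0 1\n1 1\n11 1\n5 1\n7 1\n"

df1 = pd.read_csv(io.StringIO(file_a), sep=r"\s+")
df2 = pd.read_csv(io.StringIO(file_b), sep=r"\s+")

# True where df2's x1 value occurs anywhere in df1's x column.
mask = df2["x1"].isin(df1["x"])
matched = df2[mask].rename(columns={"x1": "newx", "y1": "newy"})
print(matched.to_string(index=False))
```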
If you use pandas:
import pandas as pd
A = pd.DataFrame({'x': pd.Series([1,3,2,5,6,7]), 'y': pd.Series([2,4,2,4,4,4])})
B = pd.DataFrame({'x1': pd.Series([0,1,11,5,7]), 'y1': 1})
C = A.join(B.set_index('x1'), on='x')
Then, if you want to drop the unneeded rows and columns and rename the columns:
C = A.join(B.set_index('x1'), on='x')
C = C.dropna()  # drop rows of A whose x has no match in B (their y1 is NaN)
C = C.drop(['y'], axis=1)
C.columns = ['newx', 'newy']
which gives you:
>>> C
newx newy
0 1 1.0
3 5 1.0
5 7 1.0
If you are going to work with arrays, dataframes, etc., pandas is definitely worth a look: https://pandas.pydata.org/pandas-docs/stable/10min.html
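Another option is an inner join, sketched below. It keeps only the keys present in both frames, so no NaN rows appear and newy stays an integer column instead of becoming 1.0.

```python
import pandas as pd

A = pd.DataFrame({'x': [1, 3, 2, 5, 6, 7], 'y': [2, 4, 2, 4, 4, 4]})
B = pd.DataFrame({'x1': [0, 1, 11, 5, 7], 'y1': 1})

# how='inner' drops rows of A whose x is not in B's index,
# preserving A's row order for the rows that remain.
C = A.join(B.set_index('x1'), on='x', how='inner')[['x', 'y1']]
C.columns = ['newx', 'newy']
print(C)
```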
Assuming that you have (x, y) pairs in your 2-D arrays, a simple loop may work:
arr1 = [[1, 2], [3, 4], [2, 2]]
arr2 = [[0, 1], [1, 1], [11, 1]]
result = []
for pair1 in arr1:
    for pair2 in arr2:
        if pair1[0] == pair2[0]:
            result.append(pair2)
print(result)
Not the best solution for small arrays, but it works fast for really large ones:
import numpy as np
import pandas as pd
n1 = np.transpose(np.array([[1,3,2,5,6,7], [2,4,2,4,4,4]]))
n2 = np.transpose(np.array([[0,1,11,5, 7], [1,1,1,1,1]]))
np.array(pd.DataFrame(n1).merge(pd.DataFrame(n2), on=0, how='inner').drop('1_x', axis=1))

Writing to Excel CSV with a for loop using Pandas

I have a list in Python
numbers_list = [(2,5), (3,4), (2,6), (3,5)...]
I want to write the list to a CSV file called NumberPairings that I can open in Excel, with each pair on its own row and each number in its own column.
So I want the excel file to look like this:
Num1 Num2
2 5
3 4
2 6
3 5
I think I should use a for loop that begins with
for item in numbers_list:
But I need help with using Pandas to write to the file in the way I want it. If you think there is an easier way than Pandas, I'm open to it as well.
You can separate the tuples into individual columns like this:
df = pd.DataFrame(data={'tuples': numbers_list})
df
tuples
0 (2, 5)
1 (3, 4)
2 (2, 6)
3 (3, 5)
df['Num1'] = df['tuples'].str[0]
df['Num2'] = df['tuples'].str[1]
df
tuples Num1 Num2
0 (2, 5) 2 5
1 (3, 4) 3 4
2 (2, 6) 2 6
3 (3, 5) 3 5
# optional: write the CSV (index=False leaves out the row-number column)
df.drop(columns=['tuples']).to_csv('NumberPairings.csv', index=False)
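Since the question is open to alternatives to pandas, here is a sketch with the standard csv module. It needs no for loop of your own, because writerows writes one row per tuple. Here io.StringIO stands in for the real file.

```python
import csv
import io

numbers_list = [(2, 5), (3, 4), (2, 6), (3, 5)]

# With a real file: open("NumberPairings.csv", "w", newline="")
buf = io.StringIO(newline="")
writer = csv.writer(buf)
writer.writerow(["Num1", "Num2"])  # header row
writer.writerows(numbers_list)     # one row per tuple, one column per number
print(buf.getvalue())
```

With pandas, you don't need the intermediate tuples column either: pd.DataFrame(numbers_list, columns=['Num1', 'Num2']) builds both columns directly.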

How to read a text file and group them in tuple?

I am new to Python and am trying to do the following in Python 3.
I have a text file like this:
1 2 3
4 5 6
7 8 9
.
.
I want to convert this into a tuple of tuples like this:
((1,2,3),(4,5,6),(7,8,9),...)
I have tried using
f = open('text.txt', 'r')
f.readlines()
but this gives me a list of individual words.
Could anyone help me with this?
A method using the csv module:
>>> import csv
>>> f = open('a.txt','r')
>>> c = csv.reader(f,delimiter='\t') #Use the delimiter from the file , if a single space, use a single space, etc.
>>> l = []
>>> for row in c:
... l.append(tuple(map(int, row)))
...
>>> l = tuple(l)
>>> l
((1, 2, 3), (4, 5, 6), (7, 8, 9))
If you don't really need tuples, don't use them; it may be better to just leave them as lists.
Both row and l in the code above start out as lists.
You may try this,
>>> s = '''1 2 3
4 5 6
7 8 9'''.splitlines()
>>> tuple(tuple(int(j) for j in i.split()) for i in s)
((1, 2, 3), (4, 5, 6), (7, 8, 9))
For your case,
tuple(tuple(int(j) for j in i.split()) for i in f.readlines())
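A file object can also be iterated directly, so you don't need readlines(). This sketch uses io.StringIO in place of open('text.txt') and skips blank lines, so a trailing empty line does not add an empty tuple.

```python
import io

# Stand-in for open('text.txt'); note the extra blank line at the end.
f = io.StringIO("1 2 3\n4 5 6\n7 8 9\n\n")

with f:
    # Each non-blank line becomes a tuple of ints.
    result = tuple(tuple(map(int, line.split())) for line in f if line.strip())
print(result)
```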

Loading a table in numpy with row- and column-indices, like in R?

I would like to load a table in numpy, so that the first row and first column would be considered text labels. Something equivalent to this R code:
read.table("filename.txt", header=TRUE)
Where the file is a delimited text file like this:
A B C D
X 5 4 3 2
Y 1 0 9 9
Z 8 7 6 5
so that, once it is read in, I have an array:
[[5,4,3,2],
[1,0,9,9],
[8,7,6,5]]
along with something like:
rownames ["X","Y","Z"]
colnames ["A","B","C","D"]
Is there such a class / mechanism?
NumPy arrays aren't well suited to table-like structures, but pandas DataFrames are.
For what you want, use pandas.
For your example, you'd do
data = pandas.read_csv('filename.txt', sep=r'\s+', index_col=0)
As a more complete example (using StringIO to simulate your file):
from io import StringIO
import pandas as pd
f = StringIO("""A B C D
X 5 4 3 2
Y 1 0 9 9
Z 8 7 6 5""")
# sep=r'\s+' splits on whitespace; index_col=0 makes X, Y, Z the row labels
x = pd.read_csv(f, sep=r'\s+', index_col=0)
print('The DataFrame:')
print(x)
print('Selecting a column')
print(x['D'])  # or "x.D" if there aren't spaces in the name
print('Selecting a row')
print(x.loc['Y'])
This yields:
The DataFrame:
A B C D
X 5 4 3 2
Y 1 0 9 9
Z 8 7 6 5
Selecting a column
X 2
Y 9
Z 5
Name: D, dtype: int64
Selecting a row
A 1
B 0
C 9
D 9
Name: Y, dtype: int64
Also, as @DSM pointed out, it's very useful to know about things like DataFrame.values or DataFrame.to_records() if you do need a "raw" numpy array. (pandas is built on top of numpy. In a simple, non-strict sense, each column of a DataFrame is stored as a 1D numpy array.)
For example:
In [2]: x.values
Out[2]:
array([[5, 4, 3, 2],
[1, 0, 9, 9],
[8, 7, 6, 5]])
In [3]: x.to_records()
Out[3]:
rec.array([('X', 5, 4, 3, 2), ('Y', 1, 0, 9, 9), ('Z', 8, 7, 6, 5)],
dtype=[('index', 'O'), ('A', '<i8'), ('B', '<i8'), ('C', '<i8'), ('D', '<i8')])
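If pandas is not an option, one NumPy-only approach is to read the labels and the numbers in separate passes with np.loadtxt. The sketch below assumes the header line has exactly one fewer field than the data lines, as in the example. StringIO again stands in for the file.

```python
import io

import numpy as np

text = "A B C D\nX 5 4 3 2\nY 1 0 9 9\nZ 8 7 6 5\n"

# Column labels: the fields of the first line.
colnames = text.splitlines()[0].split()
# Row labels: the first field of every remaining line, kept as strings.
rownames = np.loadtxt(io.StringIO(text), dtype=str, skiprows=1, usecols=0)
# Numeric body: skip the header line and the label column.
data = np.loadtxt(io.StringIO(text), dtype=int, skiprows=1,
                  usecols=tuple(range(1, len(colnames) + 1)))
print(colnames, rownames, data, sep="\n")
```

Unlike a DataFrame, nothing here ties the labels to the array, so you have to keep the three objects in sync yourself.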
