Random order of one pandas.DataFrame with respect of other - python

I have the following structure:
data_Cnx = pd.read_csv(path_Connection,sep='\t',header=None)
data_Cnx.columns = ["ConnectionID"]
data_Srv = pd.read_csv(path_Service,sep='\t',header=None)
data_Srv.columns = ["ServiceID"]
that can be visualized as the following:
print(data_Cnx)
ConnectionID
0 CN0120
1 CN0121
2 CN0122
3 CN0123
4 CN0124
... ...
20 CN0166
21 CN0167
22 CN0168
23 CN0171
24 CN0172
[25 rows x 1 columns]
print(data_Srv)
ServiceID
0 ST030
1 ST030
2 ST030
3 ST030
4 ST030
... ...
20 ST040
21 ST040
22 ST040
23 ST050
24 ST050
[25 rows x 1 columns]
Literally, each element from data_Cnx corresponds to a parallel element in data_Srv, respecting the order. For instance:
CN0120 corresponds to ST030
CN0121 corresponds to ST030
....
CN0166 corresponds to ST040
CN0167 corresponds to ST040
...
CN0171 corresponds to ST050
...
I would like to have another structure or different data_Cnx and data_Srv in which the order of data_Cnx can be randomized, but always in respect of what corresponds in data_Srv. For instance:
The data_Cnx and data_Srv can be visualized as the following:
print(data_Cnx)
ConnectionID
0 CN0120
1 CN0168
2 CN0156
3 CN0133
4 CN0161
... ...
20 CN0121
21 CN0143
22 CN0127
23 CN0151
24 CN0132
print(data_Srv)
[25 rows x 1 columns] ServiceID
0 ST030
1 ST040
2 ST070
3 ST010
4 ST040
... ...
20 ST030
21 ST050
22 ST030
23 ST070
24 ST010
I was thinking of using randn, but obviously it uses integers as parameters. Do you have an easier idea of how can this be implemented?

I simply found the following works:
bigdata = pd.concat([data_Srv,data_Cnx], axis=1)
bigdata.sample(n = 20)
If someone suggested a better idea, I would be opened to try it :)

Related

How to add values in a data frame with specific conditions

I have a code where it outputs the amount of times a product is bought in a specific month in all stores; however, I was wondering how I would be able to have the sum of 3 conditions, where python would add the products from a specific month and a specific store.
This is my code so far:
df = df.groupby(['Month_Bought'])['Amount_Bought'].sum()
print(df)
Output:
01-2020 27
02-2020 26
03-2020 24
04-2020 23
05-2020 31
06-2020 33
07-2020 26
08-2020 30
09-2020 33
10-2020 26
11-2020 30
12-2020 30
Need to separate the data to make the dataframe look like this:
Store1 Store2
01-2020 3 24
02-2020 4 22
03-2020 8 16
04-2020 4 19
05-2020 10 21
06-2020 11 21
07-2020 12 14
08-2020 10 20
09-2020 3 30
10-2020 14 12
11-2020 21 9
12-2020 9 21
Assuming your data is long (a column contains values for which store a product was purchased), you could group by store and month:
import pandas as pd
records = [
{'Month_Bought':'01-2020', 'Amount_Bought':1, 'Store': 'Store1'},
{'Month_Bought':'01-2020', 'Amount_Bought':2, 'Store': 'Store2'},
{'Month_Bought':'02-2020', 'Amount_Bought':2, 'Store': 'Store1'},
{'Month_Bought':'02-2020', 'Amount_Bought':4, 'Store': 'Store2'}
]
df = pd.DataFrame.from_records(records)
# Initial dataframe
Month_Bought Amount_Bought Store
0 01-2020 1 Store1
1 01-2020 2 Store2
2 02-2020 2 Store1
3 02-2020 4 Store2
# Now groupby store and month
df_agg = df.groupby(['Store', 'Month_Bought'], as_index=False)['Amount_Bought'].sum()
# Convert from long to wide:
df_agg_pivot = df_agg.pivot(index='Month_Bought', columns='Store', values='Amount_Bought')
# Reformat
df_agg_pivot = df_agg_pivot.reset_index()
df_agg_pivot.columns.name = None
# Final result:
Month_Bought Store1 Store2
0 01-2020 1 2
1 02-2020 2 4

Euclidean Distance over 2 dataframes

I have 2 Dataframes
DF1-
Name X Y
0 Astonished 0.430 0.890
1 Excited 0.700 0.720
2 Expectant 0.320 0.067
3 Passionate 0.333 0.127
[47 rows * 3 columns]
DF2-
Id X Y
0 1 -0.288453 0.076105
1 4 -0.563453 -0.498895
2 5 -0.788453 -0.673895
3 6 -0.063453 -0.373895
4 7 0.311547 0.376105
[767 rows * 3 columns]
Now what I want to achieve is -
Take the X,Y from first entry from DF2, iterate it over DF1, calculate Euclidean Distance between each value of X,Y in DF2.
Find the minimum of all the Euclidean Distance obtained between the two points, save the minimum result somewhere along with the corresponding entry under the name column.
Example-
Say for any tuple of X,Y in DF2, the minimum Euclidean distance is corresponding to the X,Y value in the row 0 of DF1, then the result should be, the distance and name Astonished.
My Attempt-
import pandas as pd
import numpy as np
import csv
mood = pd.read_csv("C:/Users/Desktop/DF1.csv")
song_value = pd.read_csv("C:/Users/Desktop/DF2.csv")
df_temp = mood.loc[:, ['Arousal','Valence']]
df_temp1 = song_value.loc[:, ['Arousal','Valence']]
import scipy
from scipy import spatial
ary = scipy.spatial.distance.cdist(mood.loc[:, ['Arousal','Valence']], song_value.loc[:, ['Arousal','Valence']], metric='euclidean')
print (ary)
Result Obtained -
[[1.08563344 1.70762362 1.98252253 ... 0.64569366 0.47426051 0.83656989]
[1.17967807 1.75556794 2.03922435 ... 0.59326275 0.2469077 0.79334076]
[0.60852124 1.04915517 1.33326431 ... 0.1848471 0.53293637 0.08394834]
...
[1.26151359 1.5500629 1.81168766 ... 0.74070027 0.70209658 0.75277205]
[0.69085994 1.03764923 1.31608627 ... 0.33265268 0.61928227 0.21397822]
[0.84484398 1.11428893 1.38222899 ... 0.48330291 0.69288125 0.3886008 ]]
I have no clue how I should proceed now.
Please suggest something.
EDIT - 1
I converted the array in another data frame using
new_series = pd.DataFrame(ary)
print (new_series)
Result -
0 1 2 ... 764 765 766
0 1.085633 1.707624 1.982523 ... 0.645694 0.474261 0.836570
1 1.179678 1.755568 2.039224 ... 0.593263 0.246908 0.793341
2 0.608521 1.049155 1.333264 ... 0.184847 0.532936 0.083948
3 0.623534 1.093331 1.378075 ... 0.124156 0.479393 0.109057
4 0.791926 1.352785 1.636748 ... 0.197403 0.245908 0.398619
5 0.740038 1.260768 1.545785 ... 0.092072 0.304926 0.281791
6 0.923284 1.523395 1.803676 ... 0.415540 0.293217 0.611312
7 1.202447 1.679660 1.962823 ... 0.554256 0.247391 0.703298
8 0.824898 1.343684 1.628727 ... 0.177560 0.222666 0.360980
9 1.191411 1.604942 1.883150 ... 0.570771 0.395957 0.668736
10 0.822236 1.456863 1.708469 ... 0.706252 0.787271 0.823542
11 0.741683 1.371996 1.618916 ... 0.704496 0.835235 0.798964
12 0.346244 0.967891 1.240839 ... 0.376504 0.715617 0.359700
13 0.526096 1.163209 1.421820 ... 0.520190 0.748265 0.579333
14 0.435992 0.890291 1.083229 ... 0.937048 1.254437 0.884499
15 0.600338 1.162469 1.375755 ... 0.876228 1.116301 0.891714
16 0.634254 1.059083 1.226407 ... 1.088393 1.373536 1.058550
17 0.712227 1.284502 1.498187 ... 0.917272 1.117806 0.956957
18 0.194387 0.799728 1.045745 ... 0.666713 1.013563 0.597524
19 0.456000 0.708741 0.865870 ... 1.068296 1.420654 0.973234
20 0.633776 0.632060 0.709202 ... 1.277083 1.645173 1.157765
21 0.192291 0.597749 0.826602 ... 0.831713 1.204117 0.716746
22 0.522033 0.526969 0.645998 ... 1.170316 1.546040 1.041762
23 0.668148 0.504480 0.547920 ... 1.316602 1.698041 1.176933
24 0.718440 0.285718 0.280984 ... 1.334008 1.727796 1.166364
25 0.759187 0.265412 0.217165 ... 1.362786 1.757580 1.190132
26 0.598326 0.113459 0.380513 ... 1.087573 1.479296 0.896239
27 0.676841 0.263613 0.474246 ... 1.074911 1.456515 0.875707
28 0.865641 0.365394 0.462742 ... 1.239941 1.612779 1.038790
29 0.463623 0.511737 0.786284 ... 0.719525 1.099122 0.519226
30 0.780386 0.550483 0.750532 ... 0.987863 1.336760 0.788449
31 1.077559 0.711697 0.814205 ... 1.274933 1.602953 1.079529
32 1.020408 0.497152 0.522999 ... 1.372444 1.736938 1.170889
33 0.963911 0.367018 0.336035 ... 1.398444 1.778496 1.198905
34 1.092763 0.759612 0.873457 ... 1.256086 1.574565 1.063570
35 0.903631 0.810449 1.018501 ... 0.921287 1.219046 0.740134
36 0.728728 0.795942 1.045868 ... 0.695317 1.009043 0.512147
37 0.738314 0.600405 0.822742 ... 0.895225 1.239125 0.697393
38 1.206901 1.151385 1.343654 ... 1.064721 1.273002 0.922962
39 1.248530 1.293525 1.508517 ... 0.988508 1.137608 0.880669
40 0.988777 1.205968 1.463036 ... 0.622495 0.776919 0.541414
41 0.941001 1.043940 1.285215 ... 0.732293 0.960420 0.595174
42 1.242508 1.321327 1.544222 ... 0.947970 1.080069 0.851396
43 1.262534 1.399453 1.633948 ... 0.900340 0.989603 0.830024
44 1.261514 1.550063 1.811688 ... 0.740700 0.702097 0.752772
45 0.690860 1.037649 1.316086 ... 0.332653 0.619282 0.213978
46 0.844844 1.114289 1.382229 ... 0.483303 0.692881 0.388601
[47 rows x 767 columns]
Moreover, is this the best approach? Sorry, but am not sure, that's why am putting this up.
Say df_1 and df_2 are your dataframes, first extract your pairs as shown below:
pairs_1 = list(tuple(zip(df_1.X, df_1.Y)))
pairs_2 = list(tuple(zip(df_2.X, df_2.Y)))
Then iterate over pairs as per your use case and get the index of minimum distance for the iterated points:
from scipy import spatial
min_distances = []
closest_pairs = []
names = []
for i in pairs_2:
min_dist = scipy.spatial.distance.cdist([i], pairs_1, metric='euclidean').min()
index_min = scipy.spatial.distance.cdist([i], pairs_1, metric='euclidean').argmin()
min_distances.append(min_dist)
closest_pairs.append(df_1.loc[index_min, ['X', 'Y']])
names.append(df_1.loc[index_min, 'Name'])
Insert results to df_2:
df_2['min_distance'] = min_distances
df_2['closest_pairs'] = [tuple(i.values) for i in closest_pairs]
df_2['name'] = names
df_2
Output:
Id X Y min_distance closest_pairs name
0 1 -0.288453 0.076105 0.608521 (0.32, 0.067) Expectant
1 4 -0.563453 -0.498895 1.049155 (0.32, 0.067) Expectant
2 5 -0.788453 -0.673895 1.333264 (0.32, 0.067) Expectant
3 6 -0.063453 -0.373895 0.584316 (0.32, 0.067) Expectant
4 7 0.311547 0.376105 0.250027 (0.33, 0.127) Passionate
I have added min_distance and closest_pairs as well, you can exclude these columns if you want to.

Python Pandas: Shape of passed values is (126, 5), indices imply (84, 5) [duplicate]

This question already has answers here:
Pandas concat: ValueError: Shape of passed values is blah, indices imply blah2
(7 answers)
Closed 2 years ago.
I have 2 dataframes with 84 rows, clearly the same lengths, but when i want to concat them to 1 df (concat by column - to have the name, Edge and Offset to the right of Latitude and Longitude), i get this error,.
what is going on?
Latitude Longitude
0 45.403538 -75.735729
1 45.403506 -75.735699
2 45.409095 -75.722588
3 45.409069 -75.722552
4 45.413496 -75.714184
.. ... ...
79 45.415609 -75.644769
80 45.416073 -75.645726
81 45.416193 -75.638802
82 45.416172 -75.638223
[84 rows x 2 columns]
name Edge Offset
0 TUN-W 1 3000
1 TUN-E 2 3000
2 BAY-W 5 102510
3 BAY-E 6 102579
4 PIM-W 5 186035
.. ... ... ...
37 PTSTTW 33 52710
38 PTSTTE 34 18997
39 PAG11 40 24362
40 PAG14 50 9927
41 PHND15 177 11662
[84 rows x 3 columns]
This is my code liner
output_df = pd.concat( [output_df, input_df], axis=1)
I got it:
name Edge Offset
0 TUN-W 1 3000
1 TUN-E 2 3000
2 BAY-W 5 102510
3 BAY-E 6 102579
4 PIM-W 5 186035
.. ... ... ...
37 PTSTTW 33 52710
38 PTSTTE 34 18997
39 PAG11 40 24362
40 PAG14 50 9927
41 PHND15 177 11662
The indexing was messed up. Somewhere in the middle, the index resets to 0 instead of counting up to 84.
So i did a segments_df = segments_df_start.append(segments_df_end).reset_index() (was in earlier part of my code) to fix the indexing for that dataframe before I pass it over.
So ALWAYS remember to check your indexes and .reset_index() , when troubleshooting!!

pd.concat keys to separate column

I have several tables imported from an Excel file:
df = pd.read_excel(ffile, 'Constraints', header = None, names = range(13))
table_names = ['A', ...., 'W']
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}
This is the first time I've tried to read multiple tables from a single sheet, so I'm not sure if this is the best manner. If printed like this:
for k,v in tables.items():
print("table:", k)
print(v)
print()
The output is:
table: A
0 1 2 ... 10 11 12
2 Sxxxxxx Dxxx 21 20 ... 22 19 22
3 Rxxx Sxxxx / Lxxx Cxxxxxxxxxxx 7 7 ... 7 7 7
4 AVG Sxxxx per xxx # xx% Pxxxxxxxxxxxx 5 X 5.95 5.95 ... 5.95 5.95 5.95
...
...
...
table: W
0 1 2 ... 10 11 12
6 Sxxxxxx Dxxx 21 20 ... 22 19 22
7 Rxxx Sxxxx / Lxxx Cxxxxxxxxxxx 30 30 ... 30 30 30
8 AVG Sxxxx per xxx # xx% Pxxxxxxxxxxxx 5 x 28.5 28.5 ... 28.5 28.5 28.5
I tried to combine them all into one DataFrame using dfa = pd.DataFrame(tables['A'])
for each table, and then using fdf = pd.concat([dfa,...,dwf], keys =['A', ... 'W']).
The keys are hierarchically placed, but the autonumbered index column inserts itself after the keys and before the first column:
0 1 2 ... 10 11 12
A 2 Sxxxxxx Dxxx 21 20 ... 22 19 22
3 Rxxx Sxxxx / Lxxx Cxxxxxxxxxxx 7 7 ... 7 7 7
4 AVG Sxxxx per xxx # xx% Pxxxxxxxxxxxx 5 X 5.95 5.95 ... 5.95 5.95 5.95
I would like to convert the keys to an actual column and switch places with the pandas numbered index, but I'm not sure how to do that. I've tried pd.reset_index() in various configurations, but am wondering if I maybe constructed the tables wrong in the first place?
If any of this information is not necessary, please let me know and I will remove it. I'm trying to follow the MCV guidelines and am not sure how much people need to know.
After you get the your tables, Just do
pd.concat(tables)

Pandas - formatting a NxN matrix

I have to deal with a square matrix (N x N) (N will change depending on the system, but the matrix will always be a square matrix).
Here is an example:
0 1 2 3 4
0 5.1677124550E-001 5.4962112499E-005 3.2484393256E-002 -1.8901697652E-001 -6.7156804753E-003
1 5.5380106796E-005 5.6159927753E-001 -1.9000545049E-003 -1.4737748792E-002 -7.2598453774E-002
2 3.2486915835E-002 -1.8996351539E-003 5.6791783316E-001 7.2316374186E-002 1.5013066446E-003
3 -1.8901411495E-001 -1.4737367075E-002 7.2315825338E-002 6.2721160365E-001 3.1553528602E-002
4 -6.7136454124E-003 -7.2597907350E-002 1.5007743348E-003 3.1554372311E-002 2.7318109331E-001
5 6.6738948243E-002 1.4102132238E-003 -1.2689244944E-001 4.7666038803E-002 1.8559074897E-002
6 -2.5293332676E-002 3.7536452002E-002 -1.3453018251E-002 -1.3177136905E-001 6.8262612506E-002
7 5.0951492945E-003 2.1082303893E-005 2.2599127408E-004 1.0287898189E-001 -1.1117916184E-001
8 1.0818230191E-003 -1.2435319909E-002 8.1008075834E-003 -4.2864102001E-002 4.2865913226E-002
9 -1.8399671295E-002 -2.1579653853E-002 -8.3073582356E-003 -2.1848513510E-001 -7.3408914625E-002
10 3.4566032399E-003 -4.0687639382E-003 1.3769999130E-003 -1.1873434189E-001 -3.3274201039E-002
11 6.6093238125E-003 1.7153435473E-002 4.9392012712E-003 -8.4590814134E-002 -4.3601041176E-002
12 -1.1418316960E-001 -1.1241625427E-001 -3.2263873516E-002 -1.9323129435E-002 -2.6233049625E-002
13 -1.1352899039E-001 -2.2898299860E-001 -5.3035072561E-002 7.4480651562E-004 6.3778892206E-004
14 -3.2197359289E-002 -5.3404040557E-002 -6.2530142462E-002 9.6648204015E-003 1.5382174347E-002
15 -1.2210509335E-001 1.1380412205E-001 -3.8374895516E-002 -1.2823165326E-002 2.3865200517E-002
16 1.1478157080E-001 -2.1487971631E-001 5.9955334103E-002 -1.2803721235E-003 -2.2477259002E-004
17 -3.9162044498E-002 6.0167325377E-002 -6.7692892326E-002 6.3814569032E-003 -1.3309923267E-002
18 -5.1386866211E-002 -1.1483215267E-003 -3.8482481829E-002 2.2227734790E-003 2.4860195290E-004
19 -1.8287048910E-003 -4.5442287955E-002 -7.6787332291E-003 7.6970470456E-004 -1.8456603178E-003
20 -3.4812676792E-002 -7.8376169613E-003 -3.1205975353E-001 -2.8005140005E-003 3.9792109835E-004
21 2.6908361866E-003 3.7102890907E-004 2.8494060446E-002 -4.8904422930E-002 -5.8840348122E-004
22 -1.6354677061E-003 2.2592828188E-003 1.6591434361E-004 -4.9992263663E-003 -4.3243295112E-002
23 -1.4297833794E-003 -1.7830154308E-003 -1.1426700328E-002 1.7125095395E-003 -1.2016863398E-002
24 1.6271802154E-003 1.6383303957E-003 -7.8049656555E-004 3.7177399735E-003 -1.0472268655E-002
25 -4.1949740427E-004 1.5301971185E-004 -9.8681335931E-004 -2.2257204483E-004 -5.1722898203E-003
26 1.0290471110E-003 9.3255502541E-004 7.7166886713E-004 4.5630851485E-003 -4.3761358485E-003
27 -7.0031784470E-004 -3.5205332654E-003 -1.6311730073E-003 -1.2805479632E-002 -6.5565487971E-003
28 7.4046927792E-004 1.9332629981E-003 3.7374682636E-004 3.9965654817E-003 -6.2275912806E-003
29 -3.4680278867E-004 -2.3027344089E-003 -1.1338817043E-003 -1.2023581780E-002 -5.4242202971E-003
5 6 7 8 9
0 6.6743285428E-002 -2.5292337123E-002 5.0949675928E-003 1.0817408844E-003 -1.8399704662E-002
1 1.4100215877E-003 3.7536256943E-002 2.1212526899E-005 -1.2435482773E-002 -2.1579384876E-002
2 -1.2689432485E-001 -1.3453164785E-002 2.2618690004E-004 8.1008703937E-003 -8.3084039605E-003
3 4.7663851818E-002 -1.3181118094E-001 1.0290976691E-001 -4.2887391630E-002 -2.1847562123E-001
4 1.8558453001E-002 6.8311145594E-002 -1.1122358467E-001 4.2891711956E-002 -7.3413776745E-002
5 6.5246209445E-001 -3.7960754525E-002 5.8439215647E-002 -9.0620367134E-002 -8.4164313206E-002
6 -3.7935271881E-002 1.9415336793E-001 -6.8115262349E-002 5.0899890760E-002 -3.3687874555E-002
7 5.8422477033E-002 -6.8128901087E-002 3.9950499633E-001 -4.4336879147E-002 -4.0665928103E-002
8 -9.0612201567E-002 5.0902528870E-002 -4.4330072001E-002 1.2680415316E-001 1.7096405711E-002
9 -8.4167028549E-002 -3.3690056890E-002 -4.0677875424E-002 1.7097273427E-002 5.2579065978E-001
10 -6.4841142152E-002 -5.4453858464E-003 -2.4697277476E-001 8.5069643903E-005 1.8744016178E-001
11 -1.0367060076E-001 1.5864203200E-002 -1.6074822795E-002 -5.5265410413E-002 -7.3152548403E-002
12 -9.0665723957E-003 3.3027526012E-003 1.8484849938E-003 -7.5841163489E-004 -3.3700244298E-003
13 4.7717318460E-004 -1.8118719766E-003 1.6014630540E-003 -2.3830908057E-004 2.1049292570E-003
14 4.3836856576E-003 -1.7242302777E-003 -1.2023546553E-003 4.0533783460E-004 1.4850814596E-003
15 -1.2402059167E-002 -7.4793143461E-003 -3.8769252328E-004 3.9551076185E-003 1.0737706641E-003
16 -9.3076805579E-005 -1.6074185601E-003 1.7551579833E-003 -5.1663470094E-004 1.1072804383E-003
17 4.6817349747E-003 3.6900011954E-003 -8.6155331565E-004 -9.1007768778E-005 -7.3899260162E-004
18 3.2959550689E-002 3.0400921147E-003 3.9724187499E-004 -1.9220339108E-003 1.8075790317E-003
19 7.0905456379E-004 -5.0949208181E-004 -4.6021500516E-004 -7.9847500945E-004 1.4079850530E-004
20 -1.8687467448E-002 -6.3913023759E-004 -7.3566296037E-004 2.3726543730E-003 -1.0663719038E-003
21 3.6598966411E-003 -8.2335128379E-003 7.5645765132E-004 -2.1824880567E-002 -3.5125687811E-003
22 -1.6198130808E-002 8.4576317115E-003 -6.2045498682E-004 3.3460766491E-002 3.2638760335E-003
23 -3.2057393808E-001 -1.1315081941E-002 3.4822885510E-003 -5.8263446092E-003 2.9508421818E-004
24 -2.6366856593E-002 -5.8331954255E-004 1.1995976399E-003 3.4813904521E-003 -5.0942740761E-002
25 6.5474742063E-003 -5.7681583908E-003 -2.2680039574E-002 -3.3264360995E-002 4.8925407218E-003
26 -1.1288074542E-002 -4.5938216710E-003 -1.9339903561E-003 1.0812058656E-002 2.3005958417E-002
27 1.8937006089E-002 6.5590668002E-003 -2.9973042787E-003 -9.1466195902E-003 -2.0027029530E-001
28 -5.0006834397E-003 -3.1011487603E-002 -2.1071980031E-002 1.5171078954E-002 -6.3286786806E-002
29 1.0199591553E-002 -7.9372677248E-004 3.0157129340E-003 3.3043947441E-003 1.2554933598E-001
10 11 12 13 14
0 3.4566170422E-003 6.6091516193E-003 -1.1418209846E-001 -1.1352717720E-001 -3.2196213169E-002
1 -4.0687114857E-003 1.7153538295E-002 -1.1241515840E-001 -2.2897846552E-001 -5.3401852861E-002
2 1.3767476381E-003 4.9395834885E-003 -3.2262805417E-002 -5.3032729716E-002 -6.2527093260E-002
3 -1.1874067860E-001 -8.4586993618E-002 -1.9322697616E-002 7.4504831410E-004 9.6646936748E-003
4 -3.3280804952E-002 -4.3604931512E-002 -2.6232842935E-002 6.3789697287E-004 1.5382093474E-002
5 -6.4845769217E-002 -1.0366990398E-001 -9.0664935892E-003 4.7719667654E-004 4.3835884630E-003
6 -5.4306282394E-003 1.5863464756E-002 3.3027917727E-003 -1.8118646089E-003 -1.7242102753E-003
7 -2.4687457565E-001 -1.6075394559E-002 1.8484728466E-003 1.6014634135E-003 -1.2023496466E-003
8 8.5962912652E-005 -5.5265657567E-002 -7.5843145596E-004 -2.3831274033E-004 4.0533385644E-004
9 1.8744386918E-001 -7.3152643002E-002 -3.3700964189E-003 2.1048865009E-003 1.4850822567E-003
10 4.2975054072E-001 1.0364270794E-001 -1.5875283846E-003 6.7147216913E-004 1.2875627684E-004
11 1.0364402707E-001 6.0712435750E-001 5.1492123223E-003 8.2705404716E-004 -1.8653698814E-003
12 -1.5875318643E-003 5.1492269487E-003 1.2662026379E-001 1.2488481495E-001 3.3008712754E-002
13 6.7147489686E-004 8.2705994225E-004 1.2488477299E-001 2.4603749137E-001 5.7666439818E-002
14 1.2875157882E-004 -1.8653719810E-003 3.3008614344E-002 5.7666322609E-002 6.3196096154E-002
15 1.1375173141E-003 -1.2188735107E-003 9.5708352328E-003 -1.3282223067E-002 5.3571128896E-003
16 2.1319373893E-004 -2.6367828437E-004 1.4833724552E-002 -2.0115235494E-002 7.8461850894E-003
17 2.3051283757E-004 3.4044831571E-004 4.9262824289E-003 -6.6151918659E-003 1.1684894610E-003
18 -5.6658408835E-004 1.5710333316E-003 -2.6543076573E-003 1.0490950154E-003 -1.5676208892E-002
19 1.0005496308E-003 1.0400419914E-003 -2.7122935995E-003 -5.3716049248E-005 -2.6747366947E-002
20 3.1068907684E-004 5.3348953665E-004 -4.7934824223E-004 4.4853558686E-004 -6.0300656596E-003
21 2.7080517882E-003 -1.9033626829E-002 8.8615570289E-004 -3.7735646663E-004 -7.4101143501E-004
22 -2.9622921796E-003 -2.4159082408E-002 6.6943323966E-004 1.1154593780E-004 1.5914682394E-004
23 3.2842560830E-003 -6.2612752482E-003 1.5738434272E-004 4.6284599959E-004 4.0588132107E-004
24 1.6971737369E-003 2.4217812563E-002 4.3246402884E-004 9.5059931011E-005 3.5484698283E-004
25 -7.4868993750E-002 -8.7332668698E-002 -6.0147742690E-005 -4.8099146029E-005 1.1509155506E-004
26 -9.3177706949E-002 -2.9315061874E-001 2.1287190612E-004 5.0813661565E-005 2.6955715462E-004
27 -7.0097859908E-002 1.2458191360E-001 -1.2846480258E-003 1.2192486380E-004 4.6853704861E-004
28 -6.9485493530E-002 4.8763866344E-002 7.7223643475E-004 1.3853535883E-004 5.4636752811E-005
29 4.8961381968E-002 -1.5272337445E-001 -8.8648769643E-004 -4.4975303480E-005 5.9586006091E-004
15 16 17 18 19
0 -1.2210501176E-001 1.1478027359E-001 -3.9162145749E-002 -5.1389252158E-002 -1.8288904037E-003
1 1.1380272374E-001 -2.1487588526E-001 6.0165774430E-002 -1.1487007778E-003 -4.5441546655E-002
2 -3.8374694597E-002 5.9953296524E-002 -6.7691825286E-002 -3.8484030260E-002 -7.6800715249E-003
3 -1.2822729286E-002 -1.2805898275E-003 6.3813065178E-003 2.2220841872E-003 7.6991955181E-004
4 2.3864994996E-002 -2.2470892452E-004 -1.3309838494E-002 2.4851560674E-004 -1.8460620529E-003
5 -1.2402212045E-002 -9.2994801153E-005 4.6817064931E-003 3.2958166488E-002 7.0866732024E-004
6 -7.4793278406E-003 -1.6074103229E-003 3.6899979002E-003 3.0392561951E-003 -5.0946020505E-004
7 -3.8770026733E-004 1.7551659565E-003 -8.6155605026E-004 3.9692465089E-004 -4.6038088334E-004
8 3.9551171890E-003 -5.1663991899E-004 -9.1008948343E-005 -1.9220277566E-003 -7.9837924658E-004
9 1.0738350084E-003 1.1072790098E-003 -7.3897453645E-004 1.8057852560E-003 1.4013275714E-004
10 1.1375075076E-003 2.1317640112E-004 2.3050639764E-004 -5.6673414945E-004 1.0005316579E-003
11 -1.2189105982E-003 -2.6367792495E-004 3.4043235164E-004 1.5732522246E-003 1.0407973658E-003
12 9.5708232459E-003 1.4833737759E-002 4.9262816092E-003 -2.6542614308E-003 -2.7122986789E-003
13 -1.3282260152E-002 -2.0115238348E-002 -6.6152067653E-003 1.0491248568E-003 -5.3705750675E-005
14 5.3571028398E-003 7.8462085672E-003 1.1684872139E-003 -1.5676176683E-002 -2.6747374282E-002
15 1.3378635756E-001 -1.2613361119E-001 4.2401828623E-002 -2.6595403473E-003 1.9873360401E-003
16 -1.2613349126E-001 2.3154756121E-001 -6.5778628114E-002 -2.2828335280E-003 1.4601821131E-003
17 4.2401749392E-002 -6.5778591727E-002 6.8187241643E-002 -1.6653902450E-002 2.5505038138E-002
18 -2.6595920073E-003 -2.2828074980E-003 -1.6653942562E-002 5.4855247002E-002 2.4729783529E-003
19 1.9873415121E-003 1.4601899329E-003 2.5505058190E-002 2.4729967206E-003 4.4724663284E-002
20 -3.8366743828E-004 -8.8746730931E-004 -6.4420927497E-003 3.6656962180E-002 8.1224860664E-003
21 9.2845385141E-004 3.6802433505E-004 -9.5040708316E-004 -5.1941208846E-003 -1.2444625713E-004
22 -5.0318487549E-004 1.4342911215E-004 2.8985859503E-004 2.0416113478E-004 9.1951318240E-004
23 7.4036073171E-004 -3.4730013615E-004 -1.3351566400E-004 2.3474188588E-003 1.3102362758E-005
24 -2.7749145090E-004 4.7724454321E-005 5.5527644806E-005 -1.7302886151E-004 -1.7726879169E-004
25 -2.5090250470E-004 2.1741519930E-005 2.7208805916E-004 -2.5982303487E-004 -1.9668228900E-004
26 -1.4489113997E-004 -3.0397727583E-005 2.7239543481E-005 -6.0050637375E-004 -2.9892198193E-005
27 -1.6519482597E-005 1.6435294198E-004 5.0961893634E-005 1.4077278097E-004 -1.9027010603E-005
28 -2.3547595249E-004 7.6124571826E-005 1.0117983985E-004 -1.1534040559E-004 -1.0579685787E-004
29 7.0507166233E-005 1.1552377841E-004 -4.5931305760E-005 -2.0007797315E-004 -1.3505340062E-004
20 21 22 23 24
0 -3.4812101478E-002 2.6911592086E-003 -1.6354152863E-003 -1.4301333227E-003 1.6249964844E-003
1 -7.8382610347E-003 3.7103408229E-004 2.2593110441E-003 -1.7829862164E-003 1.6374435740E-003
2 -3.1205423941E-001 2.8493671639E-002 1.6587990556E-004 -1.1426237591E-002 -7.8189111866E-004
3 -2.8004725758E-003 -4.8903739721E-002 -4.9988134121E-003 1.7100983514E-003 3.7179545055E-003
4 3.9806443322E-004 -5.8790208912E-004 -4.3242458298E-002 -1.2016207108E-002 -1.0472139534E-002
5 -1.8686790048E-002 3.6592865292E-003 -1.6198931842E-002 -3.2057224847E-001 -2.6367531700E-002
6 -6.3919412091E-004 -8.2335246704E-003 8.4576155591E-003 -1.1315054733E-002 -5.8369163532E-004
7 -7.3581915791E-004 7.5646519519E-004 -6.2047477465E-004 3.4823216513E-003 1.1991380964E-003
8 2.3726528036E-003 -2.1824763131E-002 3.3460717579E-002 -5.8262172949E-003 3.4812921433E-003
9 -1.0665296285E-003 -3.5124206435E-003 3.2639684654E-003 2.9530797749E-004 -5.0943824872E-002
10 3.1067613876E-004 2.7079189356E-003 -2.9623459983E-003 3.2841200274E-003 1.6984442797E-003
11 5.3351732140E-004 -1.9033427571E-002 -2.4158940046E-002 -6.2609613281E-003 2.4221378111E-002
12 -4.7937892256E-004 8.8611314755E-004 6.6939922854E-004 1.5740024716E-004 4.3249394082E-004
13 4.4851926804E-004 -3.7736678097E-004 1.1153694999E-004 4.6284806253E-004 9.5077824774E-005
14 -6.0300787410E-003 -7.4096053004E-004 1.5918637627E-004 4.0586523098E-004 3.5485782222E-004
15 -3.8368712363E-004 9.2843754228E-004 -5.0316845184E-004 7.4036906127E-004 -2.7745851356E-004
16 -8.8745240886E-004 3.6801936222E-004 1.4342995270E-004 -3.4729860789E-004 4.7711904531E-005
17 -6.4420819427E-003 -9.5038506002E-004 2.8983698019E-004 -1.3352326563E-004 5.5544671478E-005
18 3.6656852373E-002 -5.1941195232E-003 2.0415783452E-004 2.3474119607E-003 -1.7153048632E-004
19 8.1224361521E-003 -1.2444681834E-004 9.1951236579E-004 1.3097434442E-005 -1.7668019335E-004
20 3.3911554853E-001 2.8652507893E-003 -6.8339696880E-005 3.7476484447E-004 8.3606654277E-004
21 2.8652527558E-003 6.1967615286E-002 -3.2455918220E-003 7.8074203872E-003 -1.5351890960E-003
22 -6.8340068690E-005 -3.2455946984E-003 4.1826230856E-002 6.5337193429E-003 -3.1932674182E-003
23 3.7476336333E-004 7.8073802579E-003 6.5336763366E-003 3.4246747567E-001 -2.2590437719E-005
24 8.3515185725E-004 -1.5351889308E-003 -3.1932682244E-003 -2.2585651674E-005 4.7006835231E-002
25 5.3158843621E-007 1.0652535047E-003 1.4954902777E-003 2.4073368793E-004 1.1954474977E-003
26 5.5963948637E-004 -4.4872582333E-004 -1.4772351943E-003 6.3199701928E-004 -2.1389718034E-002
27 -1.7619372799E-004 9.0741766644E-004 9.8175835796E-004 -2.9459682310E-004 7.2835611826E-004
28 2.5127782091E-004 -9.3298199434E-004 6.8787235133E-005 1.2732690365E-004 7.9688727422E-003
29 2.6201943695E-004 1.7128017387E-004 1.2934748675E-003 3.4008367645E-004 1.9615268308E-002
25 26 27 28 29
0 -4.2035299977E-004 1.0294528397E-003 -7.0032537135E-004 7.4047266192E-004 -3.4678947810E-004
1 1.5264932827E-004 9.3263518942E-004 -3.5205362458E-003 1.9332600101E-003 -2.3027335108E-003
2 -9.8735571502E-004 7.7177183895E-004 -1.6311830663E-003 3.7374078263E-004 -1.1338849320E-003
3 -2.2267753982E-004 4.5631164845E-003 -1.2805227755E-002 3.9967067646E-003 -1.2023590679E-002
4 -5.1722782688E-003 -4.3757731112E-003 -6.5561880794E-003 -6.2274289617E-003 -5.4242286711E-003
5 6.5472637324E-003 -1.1287788747E-002 1.8937046693E-002 -5.0006811267E-003 1.0199602824E-002
6 -5.7685226078E-003 -4.5935456207E-003 6.5591405092E-003 -3.1011377655E-002 -7.9382348181E-004
7 -2.2680665405E-002 -1.9338350120E-003 -2.9972765688E-003 -2.1071947728E-002 3.0156847654E-003
8 -3.3264515239E-002 1.0812126530E-002 -9.1466888768E-003 1.5170890552E-002 3.3044094214E-003
9 4.8928775025E-003 2.3007654009E-002 -2.0026482543E-001 -6.3285758846E-002 1.2554808336E-001
10 -7.4869041758E-002 -9.3178724533E-002 -7.0098856149E-002 -6.9485640501E-002 4.8962839723E-002
11 -8.7330564494E-002 -2.9314613543E-001 1.2458021507E-001 4.8763534298E-002 -1.5272144228E-001
12 -6.0132426168E-005 2.1286995818E-004 -1.2846479090E-003 7.7223667108E-004 -8.8648784383E-004
13 -4.8090893023E-005 5.0813447259E-005 1.2192474211E-004 1.3853537972E-004 -4.4975512069E-005
14 1.1509828375E-004 2.6955725919E-004 4.6853708025E-004 5.4636589826E-005 5.9585997916E-004
15 -2.5088560837E-004 -1.4490239429E-004 -1.6517113547E-005 -2.3547725232E-004 7.0506301073E-005
16 2.1741623849E-005 -3.0396484786E-005 1.6435437640E-004 7.6123660238E-005 1.1552303684E-004
17 2.7209709129E-004 2.7234932342E-005 5.0963084246E-005 1.0117936124E-004 -4.5931984725E-005
18 -2.5882735848E-004 -6.0031848430E-004 1.4070861538E-004 -1.1535910049E-004 -2.0001808065E-004
19 -1.9638025822E-004 -2.9919459983E-005 -1.9047914816E-005 -1.0580143635E-004 -1.3503643634E-004
20 8.4829116415E-007 5.5948891149E-004 -1.7619563318E-004 2.5127749619E-004 2.6202088722E-004
21 1.0652521780E-003 -4.4872868033E-004 9.0739586785E-004 -9.3299673048E-004 1.7126146660E-004
22 1.4954902653E-003 -1.4772362211E-003 9.8175151528E-004 6.8801505444E-005 1.2934673074E-003
23 2.4072903510E-004 6.3199689136E-004 -2.9460500091E-004 1.2731327319E-004 3.4007600115E-004
24 1.1952923145E-003 -2.1389995888E-002 7.2832026293E-004 7.9688600183E-003 1.9615297182E-002
25 9.4289717269E-002 1.0562741426E-001 -1.7552990896E-004 7.0060843371E-003 8.7782610441E-003
26 1.0562750999E-001 3.0308674016E-001 -1.6382699707E-003 -5.5832273099E-003 -1.1726448645E-002
27 -1.7551353029E-004 -1.6382784849E-003 2.0673701256E-001 8.2101212014E-002 -1.3115219203E-001
28 7.0060896795E-003 -5.5832572276E-003 8.2101377926E-002 8.7668224780E-002 -5.4259499038E-002
29 8.7782416309E-003 -1.1726450275E-002 -1.3115216547E-001 -5.4259354736E-002 1.5092602943E-001
This should be a 30x30 matrix and I'm trying:
data = pd.read_fwf('C:/Users/henri/Documents/Projects/Python-Lessons/ORCA/orca.hess',
widths=[9, 19, 19, 19, 19, 19])
But it reads as 185x6. I'd like to ignore the first column (numbering the lines) from 0-29 and I'm not using the columns indexes (from 0-29 too) to perform any mathematical operation. Also, Pandas is rounding my numbers and I'd like to keep the original format.
Here is a snip of my output:
Unnamed: 0 0 1 2 3 4
0 0.0 5.167712e-01 0.000055 0.032484 -0.189017 -0.006716
1 1.0 5.538011e-05 0.561599 -0.001900 -0.014738 -0.072598
2 2.0 3.248692e-02 -0.001900 0.567918 0.072316 0.001501
Any help is much appreciated, guys.
import pandas as pd
filename = 'data'
df = pd.read_fwf(filename, widths=[9, 19, 19, 19, 19, 19])
df = df.rename(columns={'Unnamed: 0':'row'})
df = df.dropna(subset=['row'], how='any')
df['col'] = df.groupby('row').cumcount()
df = df.pivot(index='row', columns='col')
df = df.dropna(how='any', axis=1)
df.columns = range(len(df.columns))
print(df.head())
yields
0 1 2 3 4 5 6 \
row
0.0 0.516771 0.066743 0.003457 -0.122105 -0.034812 -0.000420 0.000055
1.0 0.000055 0.001410 -0.004069 0.113803 -0.007838 0.000153 0.561599
2.0 0.032487 -0.126894 0.001377 -0.038375 -0.312054 -0.000987 -0.001900
3.0 -0.189014 0.047664 -0.118741 -0.012823 -0.002800 -0.000223 -0.014737
4.0 -0.006714 0.018558 -0.033281 0.023865 0.000398 -0.005172 -0.072598
7 8 9 ... 20 21 22 \
row ...
0.0 -0.025292 0.006609 0.114780 ... -0.113527 -0.051389 -0.001430
1.0 0.037536 0.017154 -0.214876 ... -0.228978 -0.001149 -0.001783
2.0 -0.013453 0.004940 0.059953 ... -0.053033 -0.038484 -0.011426
3.0 -0.131811 -0.084587 -0.001281 ... 0.000745 0.002222 0.001710
4.0 0.068311 -0.043605 -0.000225 ... 0.000638 0.000249 -0.012016
23 24 25 26 27 28 29
row
0.0 0.000740 -0.006716 -0.018400 -0.032196 -0.001829 0.001625 -0.000347
1.0 0.001933 -0.072598 -0.021579 -0.053402 -0.045442 0.001637 -0.002303
2.0 0.000374 0.001501 -0.008308 -0.062527 -0.007680 -0.000782 -0.001134
3.0 0.003997 0.031554 -0.218476 0.009665 0.000770 0.003718 -0.012024
4.0 -0.006227 0.273181 -0.073414 0.015382 -0.001846 -0.010472 -0.005424
[5 rows x 30 columns]
After parsing the file with
df = pd.read_fwf(filename, widths=[9, 19, 19, 19, 19, 19])
df = df.rename(columns={'Unnamed: 0':'row'})
the column headers can be identified by have a df['row'] value of NaN.
So they can be removed with
df = df.dropna(subset=['row'], how='any')
Now the row numbers keep repeating from 0 to 29. If we group by the row
value, then we can assign an intra-group "cumulative count" to the rows within
each group. That is, the first row of the group gets assigned the value 0, the
next row 1, etc. -- within that group -- and the process is repeated for each
group.
df['col'] = df.groupby('row').cumcount()
# row 0 1 2 3 4 col
# 0 0.0 5.167712e-01 0.000055 0.032484 -0.189017 -0.006716 0
# 1 1.0 5.538011e-05 0.561599 -0.001900 -0.014738 -0.072598 0
# 2 2.0 3.248692e-02 -0.001900 0.567918 0.072316 0.001501 0
# ...
# 182 27.0 -1.755135e-04 -0.001638 0.206737 0.082101 -0.131152 5
# 183 28.0 7.006090e-03 -0.005583 0.082101 0.087668 -0.054259 5
# 184 29.0 8.778242e-03 -0.011726 -0.131152 -0.054259 0.150926 5
Now the desired DataFrame can be obtained by pivoting:
df = df.pivot(index='row', columns='col')
and relabeling the columns:
df.columns = range(len(df.columns))
A more NumPy-based approach might look like this:
import numpy as np
import pandas as pd
filename = 'data'
df = pd.read_csv(filename, delim_whitespace=True)
arr = df.values
N = df.index.max()+1
arr = np.delete(arr, np.arange(N, len(arr), N+1), axis=0)
chunks = np.split(arr, np.arange(N, len(arr), N))
result = pd.DataFrame(np.hstack(chunks)).dropna(axis=1)
print(result)
This will also work for any sized matrix.

Categories

Resources