My source data is appended to a list as follows:
x, y, w, h = track_window
listing.append([x, y])
where
x = [300, 300, 300, 296, 291, 294, 299, 284, 303, 323, 334, 343, 354, 362, 362, 362, 360, 361, 351]
and
y = [214, 216, 214, 214, 216, 216, 215, 219, 217, 220, 218, 218, 222, 222, 222, 223, 225, 224, 222]
The x values should be written to a text file first, followed by the y values, without commas or brackets. Numbers are separated by a single space, with 8 numbers per row, like this:
x:
300 300 300 296 291 294 299 284
303 323 334 343 354 362 362 362
360 361 351
y:
214 216 214 214 216 216 215 219
217 220 218 218 222 222 222 223
225 224 222
How can I achieve it?
What I did:
import pickle

# The with block closes the file automatically, so no f.close() is needed
with open('text.txt', 'rb') as f:
    val = pickle.load(f)  # list of [x, y] pairs

# Split the pairs into separate x and y lists
a = [item[0] for item in val]
b = [item[1] for item in val]
print(a)
print(b)
Thank you
It would be good to see what you have tried. Nonetheless, this will work for you.
def write_in_chunks(f, lst, n):
    for i in range(0, len(lst), n):
        chunk = lst[i : i+n]
        f.write(" ".join(str(val) for val in chunk) + "\n")
x = [300, 300, 300, 296, 291, 294, 299, 284, 303, 323, 334, 343, 354, 362, 362, 362, 360, 361, 351]
y = [214, 216, 214, 214, 216, 216, 215, 219, 217, 220, 218, 218, 222, 222, 222, 223, 225, 224, 222]
with open("output.txt", "w") as f:
    write_in_chunks(f, x, 8)
    write_in_chunks(f, y, 8)
Creates output.txt containing
300 300 300 296 291 294 299 284
303 323 334 343 354 362 362 362
360 361 351
214 216 214 214 216 216 215 219
217 220 218 218 222 222 222 223
225 224 222
Adding extra blank lines in the output is left as an exercise for the reader... (hint: see where existing newlines are written).
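One possible way to get the labelled layout from the question, starting from the list of [x, y] pairs, is sketched below. The helper name write_labelled and the use of io.StringIO for the demonstration are assumptions made for illustration; with a real file, pass the handle returned by open(...) instead. The sketch also assumes the list of pairs is not empty.

```python
import io

def write_in_chunks(f, lst, n):
    # Write lst to f with n space-separated values per line.
    for i in range(0, len(lst), n):
        f.write(" ".join(str(val) for val in lst[i:i + n]) + "\n")

def write_labelled(f, pairs, n=8):
    # Hypothetical helper: split [x, y] pairs into two columns, then
    # write each column under its own "x:" / "y:" heading.
    xs, ys = zip(*pairs)
    for label, values in (("x", xs), ("y", ys)):
        f.write(label + ":\n")
        write_in_chunks(f, list(values), n)

# Small demonstration using an in-memory buffer instead of a file.
pairs = [[300, 214], [300, 216], [296, 214]]
buf = io.StringIO()
write_labelled(buf, pairs, n=2)
result = buf.getvalue()
```

Here zip(*pairs) does the same job as the two list comprehensions in the question, turning a list of pairs into one sequence of x values and one of y values.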
Here's a sample data frame:
import pandas as pd
sample_dframe = pd.DataFrame.from_dict(
{
"id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
"V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
"V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
"V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
"V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
}
)
The sample has shape (22, 5), with columns id and V1..V4. I need to convert it into a multi-index data frame (as a time series): for each id, I need to group 5 values (time steps) from each of V1..V4.
That is, the result should have shape (2, 4, 5), since there are 2 unique id values.
IIUC, you might just want:
sample_dframe.set_index('id').stack()
NB. The output is a Series; for a DataFrame, add .to_frame(name='col_name').
Output:
id
123 V1 2552
V2 3434
V3 382
V4 32628
V1 813
...
456 V4 4088920
V1 7279544
V2 4453
V3 58830
V4 6323521
Length: 88, dtype: int64
Or, maybe:
(sample_dframe
    .assign(time=lambda d: d.groupby('id').cumcount())
    .set_index(['id', 'time']).stack()
    .swaplevel('time', -1)
)
Output:
id time
123 V1 0 2552
V2 0 3434
V3 0 382
V4 0 32628
V1 1 813
...
456 V4 10 4088920
V1 11 7279544
V2 11 4453
V3 11 58830
V4 11 6323521
Length: 88, dtype: int64
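Neither result above is a 3-D block. If a NumPy array of shape (ids, variables, time steps) is what is needed, a minimal sketch follows. It assumes every id has at least `steps` rows and keeps only the first `steps` of them (in the sample, id 123 has 10 rows and id 456 has 12, so some truncation or windowing rule is needed). The function name to_3d and the small demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def to_3d(frame, id_col="id", steps=5):
    # Hypothetical helper: one (n_variables, steps) block per id,
    # stacked along a new leading axis.
    ids, blocks = [], []
    for key, grp in frame.groupby(id_col, sort=False):
        # Rows are time steps, columns are variables: (steps, n_variables)
        vals = grp.drop(columns=id_col).to_numpy()[:steps]
        blocks.append(vals.T)  # transpose to (n_variables, steps)
        ids.append(key)
    return ids, np.stack(blocks)

# Small demo frame (not the data from the question).
demo = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2, 2, 2],
        "V1": [10, 11, 12, 20, 21, 22, 23],
        "V2": [30, 31, 32, 40, 41, 42, 43],
    }
)
ids, arr = to_3d(demo, steps=3)
```

With the frame from the question, to_3d(sample_dframe, steps=5) should give an array of shape (2, 4, 5). np.stack raises an error if any id has fewer than `steps` rows, which is why that assumption matters.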
import pandas as pd
df= pd.DataFrame.from_dict(
{
"id": [123, 123, 123, 123, 123, 123, 123, 123, 123, 123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456],
"V1": [2552, 813, 496, 401, 4078, 952, 7279, 544, 450,548, 433,4696, 244,9735, 4263,642, 255,2813, 496,401, 4078952, 7279544],
"V2": [3434, 133, 424, 491, 8217, 915, 7179, 5414, 450, 548, 433, 4696, 244, 9735, 4263, 642, 255, 2813, 496, 401, 4952, 4453],
"V3": [382,161, 7237, 7503, 561, 6801, 1072, 9660, 62107, 6233, 5403, 3745, 8613, 6302, 557, 4256, 9874, 3013, 9352, 4522, 3232, 58830],
"V4": [32628, 4471, 4781, 1497, 45104, 8657, 81074, 1091, 370835, 2058, 4447, 7376, 302237, 6833, 48348, 3545, 4263,642, 255,2813, 4088920, 6323521]
}
)
print(df)
"""
id V1 V2 V3 V4
0 123 2552 3434 382 32628
1 123 813 133 161 4471
2 123 496 424 7237 4781
3 123 401 491 7503 1497
4 123 4078 8217 561 45104
5 123 952 915 6801 8657
6 123 7279 7179 1072 81074
7 123 544 5414 9660 1091
8 123 450 450 62107 370835
9 123 548 548 6233 2058
10 456 433 433 5403 4447
11 456 4696 4696 3745 7376
12 456 244 244 8613 302237
13 456 9735 9735 6302 6833
14 456 4263 4263 557 48348
15 456 642 642 4256 3545
16 456 255 255 9874 4263
17 456 2813 2813 3013 642
18 456 496 496 9352 255
19 456 401 401 4522 2813
20 456 4078952 4952 3232 4088920
21 456 7279544 4453 58830 6323521
"""
df = df.set_index('id').stack().reset_index().drop(columns = 'level_1').rename(columns = {0:'V1_new'})
print(df)
"""
id V1_new
0 123 2552
1 123 3434
2 123 382
3 123 32628
4 123 813
.. ... ...
83 456 4088920
84 456 7279544
85 456 4453
86 456 58830
87 456 6323521
"""
I have a dataframe as follows. Each cell in the "Values" column holds a list of elements. I want to visualize the distribution of these values with histograms, either stacked in rows or separated by colour (by Area_code).
How can I extract the values and build the histograms in plotly? Any other ideas are also welcome. Thank you.
Area_code Values
0 New_York [999, 54, 231, 43, 177, 313, 212, 279, 199, 267]
1 Dallas [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316]
2 XXX [560]
3 YYY [884, 13]
4 ZZZ [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]
If you reshape your data, this would be a perfect case for px.histogram. From there you can choose between several aggregations, such as sum, average, or count, through the histfunc argument:
fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
fig.show()
You haven't specified what kind of output you're aiming for, but I'll leave it up to you to change the argument for histfunc and see which option suits your needs best.
I'm often inclined to urge users to rethink their entire data process, but I'm just going to assume that there are good reasons why you're stuck with what seems like a pretty weird setup in your dataframe. The snippet below contains a complete data munging process to reshape your data from your setup to a so-called long format:
Area_code Values
0 New_York 999
1 New_York 54
2 New_York 231
3 New_York 43
4 New_York 177
5 New_York 313
6 New_York 212
7 New_York 279
8 New_York 199
9 New_York 267
10 Dallas 915
11 Dallas 183
12 Dallas 2326
13 Dallas 316
14 Dallas 206
15 Dallas 31
16 Dallas 317
17 Dallas 26
18 Dallas 31
19 Dallas 56
20 Dallas 316
21 XXX 560
22 YYY 884
23 YYY 13
24 ZZZ 203
And this is a perfect format for many of the great functionalities of plotly.express.
Complete code:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
# data input
df = pd.DataFrame({'Area_code': {0: 'New_York', 1: 'Dallas', 2: 'XXX', 3: 'YYY', 4: 'ZZZ'},
'Values': {0: [999, 54, 231, 43, 177, 313, 212, 279, 199, 267],
1: [915, 183, 2326, 316, 206, 31, 317, 26, 31, 56, 316],
2: [560],
3: [884, 13],
4: [203, 1066, 453, 266, 160, 109, 45, 627, 83, 685, 120, 410, 151, 33, 618, 164, 496]}})
# data munging
areas = []
value = []
for i, row in df.iterrows():
    # print(row['Values'])
    for j, val in enumerate(row['Values']):
        areas.append(row['Area_code'])
        value.append(val)
df = pd.DataFrame({'Area_code': areas,
                   'Values': value})
# plotly
fig = px.histogram(df, x = 'Area_code', y = 'Values', histfunc='sum')
fig.show()
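As an alternative to the explicit loops, pandas can do the same reshaping with DataFrame.explode. The sketch below is an assumption about how you might prefer to write it, not a change to the result. It uses a shortened demo frame, and the name long_df is chosen here for illustration.

```python
import pandas as pd

# Shortened demo data in the same shape as the question's frame.
df = pd.DataFrame(
    {
        "Area_code": ["XXX", "YYY"],
        "Values": [[560], [884, 13]],
    }
)

# explode() turns each list element into its own row and repeats the
# Area_code next to it; the resulting column has object dtype, so cast it.
long_df = df.explode("Values").reset_index(drop=True)
long_df["Values"] = long_df["Values"].astype(int)
```

With the data in this long format, px.histogram(long_df, x='Values', color='Area_code') should give one distribution per area separated by colour, and adding facet_row='Area_code' should stack them in rows, which is closer to what the question describes.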
I have searched around but have not found an exact way to do this. I have a data frame of several baseball teams and their statistics, like the following:
RK  TEAM          GP   AB    R    H     2B   3B  HR   TB    RBI  AVG   OBP   SLG   OPS
1   Milwaukee     163  5542  754  1398  252  24  218  2352  711  .252  .323  .424  .747
2   Chicago Cubs  163  5624  761  1453  286  34  167  2308  722  .258  .333  .410  .744
3   LA Dodgers    163  5572  804  1394  296  33  235  2461  756  .250  .333  .442  .774
4   Colorado      163  5541  780  1418  280  42  210  2412  748  .256  .322  .435  .757
5   Baltimore     162  5507  622  1317  242  15  188  2153  593  .239  .298  .391  .689
I want to be able to plot two teams on the X-axis and then perhaps 3 metrics (ex: R, H, TB) on the Y-axis with the two teams side by side in bar chart format. I haven't been able to figure out how to do this. Any ideas?
Thank you.
My approach was to create a new dataframe which only contains the columns you are interested in plotting:
import pandas as pd
import matplotlib.pyplot as plt
data = [[1, 'Milwaukee', 163, 5542, 754, 1398, 252, 24, 218, 2352, 711, .252, .323, .424, .747],
[2, 'Chicago Cubs', 163, 5624, 761, 1453, 286, 34, 167, 2308, 722, .258, .333, .410, .744],
[3, 'LA Dodgers', 163, 5572, 804, 1394, 296, 33, 235, 2461, 756, .250, .333, .442, .774],
[4, 'Colorado', 163, 5541, 780, 1418, 280, 42, 210, 2412, 748, .256, .322, .435, .757],
[5, 'Baltimore', 162, 5507, 622, 1317, 242, 15, 188, 2153, 593, .239, .298, .391, .689]]
df = pd.DataFrame(data, columns=['RK', 'TEAM', 'GP', 'AB', 'R', 'H', '2B', '3B', 'HR', 'TB', 'RBI', 'AVG', 'OBP', 'SLG', 'OPS'])
dfplot = df[['TEAM', 'R', 'H', 'TB']].copy()
fig = plt.figure()
ax = fig.add_subplot(111)
width = 0.4
dfplot.plot(kind='bar', x='TEAM', ax=ax, width=width, position=1)
plt.show()
This creates a grouped bar chart in which each team on the x-axis gets three adjacent bars (R, H and TB).
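The question asks for only two teams, so the frame can be narrowed before plotting. Below is a sketch of that selection step. The choice of Milwaukee and LA Dodgers is an assumption for illustration, and the demo frame holds only the columns needed here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "TEAM": ["Milwaukee", "Chicago Cubs", "LA Dodgers"],
        "R": [754, 761, 804],
        "H": [1398, 1453, 1394],
        "TB": [2352, 2308, 2461],
    }
)

teams = ["Milwaukee", "LA Dodgers"]  # assumed choice of the two teams
metrics = ["R", "H", "TB"]

# Keep only the chosen teams and metrics, indexed by team name.
dfplot = df[df["TEAM"].isin(teams)].set_index("TEAM")[metrics]

# Transposed view: one row per metric, one column per team.
by_metric = dfplot.T
```

Calling dfplot.plot(kind='bar') draws one group of three bars per team, while by_metric.plot(kind='bar') puts the metrics on the x-axis with the two teams side by side in each group.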
I have 3 strings
a=38 186 298 345 0.93345
27 198 277 389 0.86006
33 127 293 354 0.89782
Type(a)
len(a) shows 22, including the spaces between the numbers.
I want to convert them to a list.
I need the result as below:
b=[[38 186 298 345 0.93345][27 198 277 389 0.86006][33 127 293 354 0.89782]]
Is this what you are aiming for?
a = '''38 186 298 345 0.93345
27 198 277 389 0.86006
33 127 293 354 0.89782'''
b = [line.split() for line in a.split('\n')]
b
#[['38', '186', '298', '345', '0.93345'],
# ['27', '198', '277', '389', '0.86006'],
# ['33', '127', '293', '354', '0.89782']]
Split them by newlines and spaces using Python's built-in string method:
string_name.split(sep=None, maxsplit=-1)
For more info: https://www.tutorialspoint.com/python/string_split.htm
This is one solution specific to your data.
Note that your inputs are not valid Python; I have resolved that below.
a1 = '38 186 298 345 0.93345'
a2 = '27 198 277 389 0.86006'
a3 = '33 127 293 354 0.89782'
res = [[float(j) if float(j) < 1 else int(j) for j in i.split()]
       for i in [a1, a2, a3]]
# [[38, 186, 298, 345, 0.93345],
# [27, 198, 277, 389, 0.86006],
# [33, 127, 293, 354, 0.89782]]
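The `< 1` test above relies on the only fractional values in this data being below 1. A slightly more general sketch, under the assumption that every token is either an integer or a decimal number, is to try int first and fall back to float. The function names parse_number and parse_lines are made up for this example.

```python
def parse_number(token):
    # Try the stricter conversion first; int("0.93345") raises ValueError,
    # in which case the token is treated as a float.
    try:
        return int(token)
    except ValueError:
        return float(token)

def parse_lines(text):
    # One inner list per non-empty line, split on any whitespace.
    return [
        [parse_number(tok) for tok in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]

a = """38 186 298 345 0.93345
27 198 277 389 0.86006
33 127 293 354 0.89782"""
res = parse_lines(a)
```

A token that is neither an int nor a float still raises ValueError from the float call, which is probably the behaviour you want for malformed input.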
My original dataset is 7049 images (96x96) in the following format:
train_x.shape= (7049,)
train_x[:3]
0 238 236 237 238 240 240 239 241 241 243 240 23...
1 219 215 204 196 204 211 212 200 180 168 178 19...
2 144 142 159 180 188 188 184 180 167 132 84 59 ...
Name: Image, dtype: object
I want to split each image string into 96x96 values and get a (7049, 96, 96) array.
I tried this method:
def split_reshape(row):
    return np.array(row.split(' ')).reshape(96,96)
result = train_x.apply(split_reshape)
But I still got result.shape = (7049,).
How can I reshape it to (7049, 96, 96)?
Demo:
Source Series:
In [129]: train_X
Out[129]:
0 238 236 237 238 240 240 239 241 241
1 219 215 204 196 204 211 212 200 180
2 144 142 159 180 188 188 184 180 167
Name: 1, dtype: object
In [130]: type(train_X)
Out[130]: pandas.core.series.Series
In [131]: train_X.shape
Out[131]: (3,)
Solution:
In [132]: X = train_X.str \
.split(expand=True) \
.astype(np.int16) \
.values.reshape(len(train_X), 3, 3)
In [133]: X
Out[133]:
array([[[238, 236, 237],
[238, 240, 240],
[239, 241, 241]],
[[219, 215, 204],
[196, 204, 211],
[212, 200, 180]],
[[144, 142, 159],
[180, 188, 188],
[184, 180, 167]]], dtype=int16)
In [134]: X.shape
Out[134]: (3, 3, 3)
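The reason result.shape stays (7049,) in the question is that Series.apply returns a Series whose elements are arrays, so the Series itself remains one-dimensional. A sketch of an alternative that builds the 3-D array with np.stack is given below. It assumes every string holds exactly side * side numbers; strings_to_images and the 2x2 demo data are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

def strings_to_images(series, side):
    # Hypothetical helper: convert each space-separated string to a
    # 1-D integer array, stack them into (n, side*side), then reshape
    # to (n, side, side).
    rows = [np.array(s.split()).astype(np.int16) for s in series]
    return np.stack(rows).reshape(len(series), side, side)

# 2x2 demo images instead of the 96x96 ones from the question.
train_demo = pd.Series(["1 2 3 4", "5 6 7 8"], name="Image")
X_demo = strings_to_images(train_demo, side=2)
```

For the original data the call would be strings_to_images(train_x, side=96), which should give shape (7049, 96, 96) provided each row really contains 9216 values.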