I am changing an old question of mine.
I have a file with this format; 4 values per line:
2623 831 6892 0
2353 1803 3425 0
1910 1823 3810 0
1637 1287 2811 0
2803 546 6609 0
1591 2157 2367 0
2167 1906 2665 0
3192 2168 8362 0
3903 1465 2011 0
2355 1801 2004 0
2390 796 5055 0
1703 1044 3441 0
1886 1328 2731 0
1496 1277 3074 0
1827 460 5992 0
1945 1785 2065 0
1983 1963 2818 0
1532 2229 6936 0
2449 5972 1918 0
2699 2007 1581 0
and I want to get this one; 10 values per line:
2623 831 6892 0 2353 1803 3425 0 1910 1823
3810 0 1637 1287 2811 0 2803 546 6609 0
1591 2157 2367 0 2167 1906 2665 0 3192 2168
8362 0 3903 1465 2011 0 2355 1801 2004 0
2390 796 5055 0 1703 1044 3441 0 1886 1328
2731 0 1496 1277 3074 0 1827 460 5992 0
1945 1785 2065 0 1983 1963 2818 0 1532 2229
6936 0 2449 5972 1918 0 2699 2007 1581 0
Here is what I have tried so far:
import itertools

with open("Read_file") as f1:
    with open("Write_file", "w") as f2:
        f2.writelines(itertools.islice(f1, 4, None))
Any tip is appreciated.
Try this:
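with open('data.txt') as fp, open('output.txt', 'w') as fw:
    # split() with no arguments splits on any whitespace, newlines included
    data = fp.read().split()
    # step through the values 10 at a time; a short final group is still written
    for i in range(0, len(data), 10):
        fw.write(' '.join(data[i:i + 10]) + '\n')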
Output:
2623 831 6892 0 2353 1803 3425 0 1910 1823
3810 0 1637 1287 2811 0 2803 546 6609 0
1591 2157 2367 0 2167 1906 2665 0 3192 2168
8362 0 3903 1465 2011 0 2355 1801 2004 0
2390 796 5055 0 1703 1044 3441 0 1886 1328
2731 0 1496 1277 3074 0 1827 460 5992 0
1945 1785 2065 0 1983 1963 2818 0 1532 2229
6936 0 2449 5972 1918 0 2699 2007 1581 0
A version that does not rely on reading the whole file into memory:
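from itertools import islice

def get_words(f):
    for line in f:
        for word in line.split():
            yield word

def chunk_values(iterator, num):
    while True:
        # islice ends quietly at EOF; a bare next() would raise StopIteration,
        # which becomes a RuntimeError inside a generator on Python 3.7+
        chunk = list(islice(iterator, num))
        if not chunk:
            return
        yield chunk

with open('input.txt') as fin, open('output.txt', 'w') as fout:
    for chunk in chunk_values(get_words(fin), 10):
        fout.write(' '.join(chunk) + '\n')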
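To check the streaming approach without creating files, here is a minimal sketch that feeds it in-memory text. The helper name rewrap and the use of io.StringIO as stand-ins for real files are my own assumptions for illustration, and the sample holds only the first three input lines:

```python
import io
from itertools import islice


def rewrap(fin, fout, width=10):
    """Copy whitespace-separated values from fin to fout, `width` per line."""
    # Lazily yield one value at a time, so only one input line is held in memory
    words = (word for line in fin for word in line.split())
    while True:
        chunk = list(islice(words, width))
        if not chunk:
            break
        fout.write(" ".join(chunk) + "\n")


# io.StringIO objects behave like open text files (assumption: real files work the same way)
source = io.StringIO("2623 831 6892 0\n2353 1803 3425 0\n1910 1823 3810 0\n")
target = io.StringIO()
rewrap(source, target)
result = target.getvalue()
print(result, end="")
```

With 12 input values, the first output line carries 10 of them and the second carries the remaining 2, so a trailing partial group is not lost.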
I have similar data as below in my pandas dataframe.
Date        A      B     C     D
01-01-2022  10000  1700  1457  327
02-01-2022  17000  3000  1245  526
03-01-2022  16000  2624  1478  632
04-01-2022  10138  1745  1325  800
05-01-2022  4761   1789  1475  952
06-01-2022  5000   1874  1423  1105
07-01-2022  3000   1965  1421  895
08-01-2022  4000   1847  1420  1410
09-01-2022  3001   1654  1418  564
10-01-2022  3002   1754  1417  1715
11-01-2022  3003   1598  1415  564
12-01-2022  3004   1515  1414  2020
13-01-2022  3005   1433  1412  564
14-01-2022  3006   1350  1411  2325
15-01-2022  3007   1268  1409  456
How can I get separate plots side by side (Date vs A, Date vs B, Date vs C and so on) using Python?
I am still learning and am new to both Python and data visualization.
Try this: use the pandas plot method with subplots=True, and pass layout as a (rows, columns) tuple:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
df.set_index('Date').plot(subplots=True, layout=(1,4), figsize=(15,7))
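Output: [figure not preserved: four line plots in a single row, one per column A to D, each plotted against Date]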
I need to calculate the conversion rate. I receive two reports, format them, and build two pivot tables from them. When I divide one by the other:
dfTasaReal['TC'] = dfTasaRealVentas['VolumenTotal'] / dfTasaRealPaso['Trafico']
I get the following error: ValueError: cannot join with no overlapping index names
I don't know what causes this; it is the first time I have developed in Python. I made sure that both columns are of the same type and that they contain no NaN or inf values.
def tcReal():
    dfTasaRealPaso = formatTrafico()
    dfTasaRealVentas = formatVentas()
    dfTasaReal = pd.DataFrame()
    dfTasaRealPaso.dropna(inplace=True)
    dfTasaRealVentas.dropna(inplace=True)
    dfTasaRealPaso.Trafico = dfTasaRealPaso.Trafico.astype('float64')
    print(type(dfTasaRealPaso['Trafico'][0]))
    print(type(dfTasaRealVentas['VolumenTotal'][0]))
    dfTasaRealPaso = pd.pivot_table(dfTasaRealPaso, index=["ID", "month", "day", "dia_semana"], columns=["Bloque"], values=["Trafico"], fill_value=0)
    dfTasaRealVentas = pd.pivot_table(dfTasaRealVentas, index=["CodigoEstacion", "Mes", "Dia", "Day_Week"], columns=["Hora"], values=["VolumenTotal"], fill_value=0)
    print(dfTasaRealPaso.head())
    print(dfTasaRealVentas.head())
    dfTasaReal['TC'] = dfTasaRealVentas['VolumenTotal'] / dfTasaRealPaso['Trafico']
    print(pd.pivot_table(dfTasaReal, index=["ID", "month", "day", "dia_semana"], columns=["Bloque"], values=["TC"], fill_value=0))
    return dfTasaReal
<class 'numpy.float64'>
<class 'numpy.float64'>
Trafico
Bloque 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
ID month day dia_semana
60008 5 1 Saturday 1812 1919 1768 1725 2009 1871 1883 1974 2441 2999 3840 4563 4634 4552 4421 3983 3337 3338 3271 3155 2299 2368 2032 1965
2 Sunday 2040 1892 1753 1652 1910 1720 1839 2024 2425 3295 4349 4292 5552 6447 6554 5618 4716 5302 4033 2177 1598 1544 1228 1043
3 Monday 915 1075 1078 1557 1598 1958 2188 2689 2408 2781 2939 3152 2618 2603 1920 1550 2074 2107 2036 2735 2016 1869 1452 1774
4 Tuesday 1784 1711 1715 1448 1488 1564 1911 2623 2377 3061 3181 3613 3400 3477 3039 2949 2812 4505 4387 3498 2433 1861 1635 1669
5 Wednesday 1721 1492 1278 1180 1265 1224 1622 1974 2258 2920 3254 3279 3260 3307 3503 3833 4150 4003 3968 3073 2518 2112 2234 2016
VolumenTotal ...
Hora 0 1 2 3 4 5 6 7 8 ... 15 16 17 18 19 20 21 22 23
CodigoEstacion Mes Dia Day_Week ...
60008 5 1 Saturday 45.403 0.000 0.0 0.000 0.000 0.000 121.599 302.913 990.649 ... 1335.588 1070.339 948.034 1222.074 1046.493 820.716 424.954 96.385 117.888
2 Sunday 5.464 60.062 0.0 8.255 43.535 19.926 34.935 274.671 662.555 ... 1086.487 1354.115 1601.889 1689.020 1792.355 1414.050 384.982 97.451 46.306
3 Monday 0.000 0.000 0.0 0.000 17.689 80.715 456.376 1213.109 1230.580 ... 1989.980 2290.871 3180.712 1946.372 1745.390 1480.999 692.089 230.541 22.371
4 Tuesday 0.000 0.000 0.0 0.000 0.000 207.012 795.940 1034.076 1398.334 ... 1356.560 1746.760 1501.602 1504.128 1415.234 1294.313 705.502 389.561 66.634
5 Wednesday 7.075 15.330 0.0 0.000 0.000 20.753 303.574 813.140 1714.702 ... 956.816 2039.040 1753.388 1838.117 1708.605 1616.878 822.072 279.777 250.937
[5 rows x 24 columns]
Error:
Traceback (most recent call last):
File "\Automatizacion\automatization.py", line 107, in <module>
main()
File "\Automatizacion\automatization.py", line 12, in main
tcReal()
File "\Automatizacion\automatization.py", line 89, in tcReal
packages\pandas\core\indexes\base.py", line 3739, in _join_multi
raise ValueError("cannot join with no overlapping index names")
ValueError: cannot join with no overlapping index names
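A likely cause, judging by the two printed pivot tables: their row indexes carry different level names (ID, month, day, dia_semana versus CodigoEstacion, Mes, Dia, Day_Week), so pandas finds no shared names to align on when dividing. The sketch below is a toy under that assumption, not the original data: it uses two small Series with two index levels each and renames one index to match the other before dividing.

```python
import pandas as pd

# Toy stand-ins for the two pivot tables (a few values borrowed from the printed heads)
traffic = pd.Series(
    [1812.0, 2040.0, 915.0],
    index=pd.MultiIndex.from_tuples(
        [(60008, 1), (60008, 2), (60008, 3)], names=["ID", "day"]
    ),
)
sales = pd.Series(
    [45.403, 5.464],
    index=pd.MultiIndex.from_tuples(
        [(60008, 1), (60008, 2)], names=["CodigoEstacion", "Dia"]
    ),
)

# Give both indexes the same level names so pandas can align the rows
sales.index = sales.index.set_names(list(traffic.index.names))

# Rows present on only one side come out as NaN after alignment
tc = sales / traffic
print(tc)
```

If this is indeed the problem, the same renaming applied to the row index of dfTasaRealVentas before the division should let the two tables line up.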
I want to add a new column that holds a conditional sum, but I don't know how to build it for a DataFrame like this:
index  Date        Open  High  Low   Close  Volume   Change     target
0      2020-12-14  2205  2205  2150  2185   180466   -0.011312  0
1      2020-12-15  2195  2195  2155  2185   139561   0.000000   0
2      2020-12-16  2195  2430  2180  2370   2909662  0.084668   1
3      2020-12-17  2425  2425  2290  2330   587914   -0.016878  0
4      2020-12-18  2335  2355  2225  2295   374375   -0.015021  0
5      2020-12-21  2250  2350  2250  2305   264192   0.004357   0
6      2020-12-22  2330  2345  2245  2255   327715   -0.021692  0
7      2020-12-23  2260  2300  2185  2220   194277   -0.015521  0
8      2020-12-24  2220  2235  2155  2165   208300   -0.024775  0
9      2020-12-28  2170  2180  2110  2120   201740   -0.020785  0
last = len(df)
df['Gap_up_cnt_3d'] = [df.loc[last-j-3:last-j-1,"Close"].sum() if t< 19 else t in list(df.index.values)> ]
What I want:
index(9)['Gap_up_cnt_3d'] = sum(index(9).Close, index(8).Close, index(7).Close)
index(8)['Gap_up_cnt_3d'] = sum(index(8).Close, index(7).Close, index(6).Close)
NB: This is not an exact expression, just a way to convey the meaning.
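One way to express this, assuming the goal really is the sum of each row's Close and the two rows before it, is a rolling window. This is only a sketch on the Close values from the table above; the first two rows have fewer than three values available and come out as NaN:

```python
import pandas as pd

# Close column copied from the table in the question (index 0 to 9)
df = pd.DataFrame(
    {"Close": [2185, 2185, 2370, 2330, 2295, 2305, 2255, 2220, 2165, 2120]}
)

# Each row gets the sum of its own Close and the two previous rows
df["Gap_up_cnt_3d"] = df["Close"].rolling(window=3).sum()

print(df)
```

For index 9 this gives 2220 + 2165 + 2120 = 6505, and for index 8 it gives 2255 + 2220 + 2165 = 6640, matching the two lines of the wish list.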
I have the following DataFrame:
Channel Column 1 Column 2 Column 3
Date
12/30/2018 638 4472 487
12/31/2018 868 6985 540
1/1/2019 755 4401 829
1/2/2019 1655 9484 1145
1/3/2019 2002 14212 1158
1/4/2019 1633 9575 1098
1/5/2019 1026 5575 941
1/6/2019 1025 4963 1007
1/7/2019 1944 10685 1246
1/8/2019 2140 9932 1151
1/9/2019 2067 1031 1087
1/10/2019 2168 1005 1074
1/11/2019 2052 9371 909
1/12/2019 1223 5953 895
1/13/2019 1268 4809 827
I would like to return the following result if possible [essentially reduce values between certain dates in a specific column to zero]
Channel Column 1 Column 2 Column 3
Date
12/30/2018 638 4472 487
12/31/2018 868 6985 540
1/1/2019 755 4401 829
1/2/2019 1655 9484 1145
1/3/2019 2002 14212 1158
1/4/2019 1633 9575 1098
1/5/2019 1026 5575 941
1/6/2019 0 4963 1007
1/7/2019 0 10685 1246
1/8/2019 0 9932 1151
1/9/2019 0 1031 1087
1/10/2019 2168 1005 1074
1/11/2019 2052 9371 909
1/12/2019 1223 5953 895
1/13/2019 1268 4809 827
I am trying to filter by a specific column at specific dates, but I can't get it to work properly.
I have tried the following approaches, but I haven't had much luck
df[df['Channel'] == 'Branded Paid Search'].loc['1/6/2019':'1/9/2019']['Sessions'].apply(lambda x: 0 if x < 4000 else 0).to_frame()
This works, but not sure how to get the values back into the original dataframe.
I tried this:
def zero(df):
    if df[df['Column 1'] > 0].loc['1/6/2019':'1/9/2019']:
        return 0
    else:
        return 1

df.apply(zero, axis=1)
ValueError: ('The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().')
I tried this:
sessions_df[sessions_df['Column 1'] > 0].loc['1/6/2019':'1/9/2019'] = 0
Nothing changes.
Any help would be appreciated
First convert the index to a DatetimeIndex with to_datetime, then set the values with DataFrame.loc:
df.index = pd.to_datetime(df.index)
df.loc['1/6/2019':'1/9/2019', 'Column 1'] = 0
print (df)
Column 1 Column 2 Column 3
Channel
2018-12-30 638 4472 487
2018-12-31 868 6985 540
2019-01-01 755 4401 829
2019-01-02 1655 9484 1145
2019-01-03 2002 14212 1158
2019-01-04 1633 9575 1098
2019-01-05 1026 5575 941
2019-01-06 0 4963 1007
2019-01-07 0 10685 1246
2019-01-08 0 9932 1151
2019-01-09 0 1031 1087
2019-01-10 2168 1005 1074
2019-01-11 2052 9371 909
2019-01-12 1223 5953 895
2019-01-13 1268 4809 827
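As for why the attempts in the question change nothing: filtering with a boolean mask produces a new DataFrame, so an assignment made on that result never reaches the original. A small sketch under that assumption, using a cut-down frame and a made-up variable name (filtered is hypothetical), with an explicit copy() to make the separation visible:

```python
import pandas as pd

df = pd.DataFrame(
    {"Column 1": [1026, 1025, 1944, 2140, 2067, 2168]},
    index=["1/5/2019", "1/6/2019", "1/7/2019", "1/8/2019", "1/9/2019", "1/10/2019"],
)
df.index = pd.to_datetime(df.index, format="%m/%d/%Y")

# Boolean filtering hands back a separate object; copy() makes that explicit
filtered = df[df["Column 1"] > 0].copy()
filtered.loc["2019-01-06":"2019-01-09"] = 0
unchanged = df["Column 1"].tolist()  # the original frame still holds its old values

# A single .loc call on the original frame, naming rows and column together, does write through
df.loc["2019-01-06":"2019-01-09", "Column 1"] = 0
updated = df["Column 1"].tolist()

print(unchanged)
print(updated)
```

The date slice is inclusive on both ends, which is why 1/9/2019 is zeroed as well.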
When I run the following code I get garbage as output. It is just a simple program to find prime numbers. It works when the first for loop's range only goes up to 1000, but once the range becomes large the program fails to output meaningful data.
output = open("output.dat", 'w')
for i in range(2, 10000):
prime = 1
for j in range(2, i-1):
if i%j == 0:
prime = 0
j = i-1
if prime == 1:
output.write(str(i) + " " )
output.close()
print "writing finished"
This is a known Notepad bug. Check out
http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx
The classic way to trigger this bug is to put "Bush hid the facts" in a file, save it, reopen it, and scream about conspiracy theories, but I guess "2 3 5 7 11 13 17" works too, except that you don't get to scream about conspiracy theories.
You're setting a single variable named prime to 1 roughly ten thousand times (9998, to be exact), then, in a separate loop that only sees the last value of i, possibly setting it to 0, and finally (if it has not been set to 0) writing one incomplete line (no line end). I suspect that's not what you want to do! Maybe something like:
output = open("output.dat", 'w')
for i in range(2, 10000):
    prime = 1
    for j in range(2, i-1):
        if i%j == 0:
            prime = 0
            break
    if prime == 1:
        output.write(str(i) + " " )
output.close()
print "writing finished"
Note the very different indentation from what you had posted. I also used break to break out of an inner loop, which I think was what you meant where you wrote j = i - 1 (which would in fact have absolutely no effect since j would just be set to its next natural value in the very next leg of that inner loop, which would still run to the end).
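For anyone on Python 3, the same nested-loop-with-break idea might look like the sketch below. The function name primes_below is my own invention, and stopping the inner loop at the square root of i is an extra shortcut I added; it is not part of the code above.

```python
def primes_below(limit):
    """Return all primes p with 2 <= p < limit, found by trial division."""
    primes = []
    for i in range(2, limit):
        is_prime = True
        j = 2
        # A composite i always has a divisor no larger than sqrt(i)
        while j * j <= i:
            if i % j == 0:
                is_prime = False
                break  # leave the inner loop as soon as a divisor is found
            j += 1
        # This check sits inside the outer loop, so it runs once per candidate
        if is_prime:
            primes.append(i)
    return primes


found = primes_below(10000)
print(len(found), "primes found; largest is", found[-1])
```

The important part is the nesting: the divisor loop and the final check both live inside the loop over i, so every candidate is tested and reported individually.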
With fixed indentation (which I'll have to assume is a bad paste job, otherwise I don't think it would run) your code produces correct output for me:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997 1009 1013 1019 1021 1031 1033 1039 1049 1051 1061 1063 1069 1087 1091 1093 1097 1103 1109 1117 1123 1129 1151 1153 1163 1171 1181 1187 1193 1201 1213 1217 1223 1229 1231 1237 1249 1259 1277 1279 1283 1289 1291 1297 1301 1303 1307 1319 1321 1327 1361 1367 1373 1381 1399 1409 1423 1427 1429 1433 1439 1447 1451 1453 1459 1471 1481 1483 1487 1489 1493 1499 1511 1523 1531 1543 1549 1553 1559 1567 1571 1579 1583 1597 1601 1607 1609 1613 1619 1621 1627 1637 1657 1663 1667 1669 1693 1697 1699 1709 1721 1723 1733 1741 1747 1753 1759 1777 1783 1787 1789 1801 1811 1823 1831 1847 1861 1867 1871 1873 1877 1879 1889 1901 1907 1913 1931 1933 1949 1951 1973 1979 1987 1993 1997 1999 2003 2011 2017 2027 2029 2039 2053 2063 2069 2081 2083 2087 2089 2099 2111 2113 2129 2131 2137 2141 2143 2153 2161 2179 2203 2207 2213 2221 2237 2239 2243 2251 2267 2269 2273 2281 2287 2293 2297 2309 2311 2333 2339 2341 2347 2351 2357 2371 2377 2381 2383 2389 2393 2399 2411 2417 2423 2437 2441 2447 2459 2467 2473 2477 2503 2521 2531 2539 2543 2549 2551 2557 2579 2591 2593 2609 2617 2621 2633 2647 2657 2659 2663 2671 2677 2683 2687 2689 2693 2699 2707 2711 2713 2719 2729 2731 2741 2749 2753 2767 2777 2789 2791 2797 2801 2803 2819 2833 2837 2843 2851 2857 2861 2879 2887 2897 2903 2909 2917 2927 2939 2953 2957 2963 2969 2971 2999 3001 3011 3019 3023 3037 3041 3049 3061 3067 
3079 3083 3089 3109 3119 3121 3137 3163 3167 3169 3181 3187 3191 3203 3209 3217 3221 3229 3251 3253 3257 3259 3271 3299 3301 3307 3313 3319 3323 3329 3331 3343 3347 3359 3361 3371 3373 3389 3391 3407 3413 3433 3449 3457 3461 3463 3467 3469 3491 3499 3511 3517 3527 3529 3533 3539 3541 3547 3557 3559 3571 3581 3583 3593 3607 3613 3617 3623 3631 3637 3643 3659 3671 3673 3677 3691 3697 3701 3709 3719 3727 3733 3739 3761 3767 3769 3779 3793 3797 3803 3821 3823 3833 3847 3851 3853 3863 3877 3881 3889 3907 3911 3917 3919 3923 3929 3931 3943 3947 3967 3989 4001 4003 4007 4013 4019 4021 4027 4049 4051 4057 4073 4079 4091 4093 4099 4111 4127 4129 4133 4139 4153 4157 4159 4177 4201 4211 4217 4219 4229 4231 4241 4243 4253 4259 4261 4271 4273 4283 4289 4297 4327 4337 4339 4349 4357 4363 4373 4391 4397 4409 4421 4423 4441 4447 4451 4457 4463 4481 4483 4493 4507 4513 4517 4519 4523 4547 4549 4561 4567 4583 4591 4597 4603 4621 4637 4639 4643 4649 4651 4657 4663 4673 4679 4691 4703 4721 4723 4729 4733 4751 4759 4783 4787 4789 4793 4799 4801 4813 4817 4831 4861 4871 4877 4889 4903 4909 4919 4931 4933 4937 4943 4951 4957 4967 4969 4973 4987 4993 4999 5003 5009 5011 5021 5023 5039 5051 5059 5077 5081 5087 5099 5101 5107 5113 5119 5147 5153 5167 5171 5179 5189 5197 5209 5227 5231 5233 5237 5261 5273 5279 5281 5297 5303 5309 5323 5333 5347 5351 5381 5387 5393 5399 5407 5413 5417 5419 5431 5437 5441 5443 5449 5471 5477 5479 5483 5501 5503 5507 5519 5521 5527 5531 5557 5563 5569 5573 5581 5591 5623 5639 5641 5647 5651 5653 5657 5659 5669 5683 5689 5693 5701 5711 5717 5737 5741 5743 5749 5779 5783 5791 5801 5807 5813 5821 5827 5839 5843 5849 5851 5857 5861 5867 5869 5879 5881 5897 5903 5923 5927 5939 5953 5981 5987 6007 6011 6029 6037 6043 6047 6053 6067 6073 6079 6089 6091 6101 6113 6121 6131 6133 6143 6151 6163 6173 6197 6199 6203 6211 6217 6221 6229 6247 6257 6263 6269 6271 6277 6287 6299 6301 6311 6317 6323 6329 6337 6343 6353 6359 6361 6367 6373 6379 6389 6397 6421 6427 6449 6451 6469 
6473 6481 6491 6521 6529 6547 6551 6553 6563 6569 6571 6577 6581 6599 6607 6619 6637 6653 6659 6661 6673 6679 6689 6691 6701 6703 6709 6719 6733 6737 6761 6763 6779 6781 6791 6793 6803 6823 6827 6829 6833 6841 6857 6863 6869 6871 6883 6899 6907 6911 6917 6947 6949 6959 6961 6967 6971 6977 6983 6991 6997 7001 7013 7019 7027 7039 7043 7057 7069 7079 7103 7109 7121 7127 7129 7151 7159 7177 7187 7193 7207 7211 7213 7219 7229 7237 7243 7247 7253 7283 7297 7307 7309 7321 7331 7333 7349 7351 7369 7393 7411 7417 7433 7451 7457 7459 7477 7481 7487 7489 7499 7507 7517 7523 7529 7537 7541 7547 7549 7559 7561 7573 7577 7583 7589 7591 7603 7607 7621 7639 7643 7649 7669 7673 7681 7687 7691 7699 7703 7717 7723 7727 7741 7753 7757 7759 7789 7793 7817 7823 7829 7841 7853 7867 7873 7877 7879 7883 7901 7907 7919 7927 7933 7937 7949 7951 7963 7993 8009 8011 8017 8039 8053 8059 8069 8081 8087 8089 8093 8101 8111 8117 8123 8147 8161 8167 8171 8179 8191 8209 8219 8221 8231 8233 8237 8243 8263 8269 8273 8287 8291 8293 8297 8311 8317 8329 8353 8363 8369 8377 8387 8389 8419 8423 8429 8431 8443 8447 8461 8467 8501 8513 8521 8527 8537 8539 8543 8563 8573 8581 8597 8599 8609 8623 8627 8629 8641 8647 8663 8669 8677 8681 8689 8693 8699 8707 8713 8719 8731 8737 8741 8747 8753 8761 8779 8783 8803 8807 8819 8821 8831 8837 8839 8849 8861 8863 8867 8887 8893 8923 8929 8933 8941 8951 8963 8969 8971 8999 9001 9007 9011 9013 9029 9041 9043 9049 9059 9067 9091 9103 9109 9127 9133 9137 9151 9157 9161 9173 9181 9187 9199 9203 9209 9221 9227 9239 9241 9257 9277 9281 9283 9293 9311 9319 9323 9337 9341 9343 9349 9371 9377 9391 9397 9403 9413 9419 9421 9431 9433 9437 9439 9461 9463 9467 9473 9479 9491 9497 9511 9521 9533 9539 9547 9551 9587 9601 9613 9619 9623 9629 9631 9643 9649 9661 9677 9679 9689 9697 9719 9721 9733 9739 9743 9749 9767 9769 9781 9787 9791 9803 9811 9817 9829 9833 9839 9851 9857 9859 9871 9883 9887 9901 9907 9923 9929 9931 9941 9949 9967 9973
EDIT: the version of the indentation I ran:
output = open("output.dat", 'w')
for i in range(2, 10000):
    prime = 1
    for j in range(2, i-1):
        if i%j == 0:
            prime = 0
            j = i-1
    if prime == 1:
        output.write(str(i) + " " )
output.close()
print "writing finished"
Your second for should be nested in the first for.
Also, this looks like a homework question. It is not clear how your output is garbage - does it not compute what you want? Or is the output scrambled? Post a copy of the output so we can see!
Don't you want your loops to be nested?
output = open("output.dat", 'w')
for i in range(2, 10000):
    prime = 1
    for j in range(2, i-1):
        if i%j == 0:
            prime = 0
            j = i-1
    if prime == 1:
        output.write(str(i) + " " )
output.close()
print "writing finished"
As posted, you set prime to 1 9998 times,
then you use the final value of i (9999) as the end value of the second loop
....
To summarize, you have serious indentation problems.