What does the Pylance error Invalid character "\u64e" in token mean? It shows a red error line under Acc in this code. How can I fix it?
err = calculateCError()
print('Error is:', err, '%')
َAcc = 100 - err
print('َAccuracy is:',َAcc , '%')
Here's how to debug something like this:
s = """err = calculateCError()
print('Error is:', err, '%')
َAcc = 100 - err
print('َAccuracy is:',َAcc , '%')"""
print([hex(ord(c)) for c in s])
['0x65', '0x72', '0x72', '0x20', '0x3d', '0x20', '0x63', '0x61', '0x6c',
'0x63', '0x75', '0x6c', '0x61', '0x74', '0x65', '0x43', '0x45', '0x72',
'0x72', '0x6f', '0x72', '0x28', '0x29', '0xa', '0x70', '0x72', '0x69',
'0x6e', '0x74', '0x28', '0x27', '0x45', '0x72', '0x72', '0x6f', '0x72',
'0x20', '0x69', '0x73', '0x3a', '0x27', '0x2c', '0x20', '0x65', '0x72',
'0x72', '0x2c', '0x20', '0x27', '0x25', '0x27', '0x29', '0xa', '0x64e',
'0x41', '0x63', '0x63', '0x20', '0x3d', '0x20', '0x31', '0x30', '0x30',
'0x20', '0x2d', '0x20', '0x65', '0x72', '0x72', '0xa', '0x70', '0x72',
'0x69', '0x6e', '0x74', '0x28', '0x27', '0x64e', '0x41', '0x63', '0x63',
'0x75', '0x72', '0x61', '0x63', '0x79', '0x20', '0x69', '0x73', '0x3a',
'0x27', '0x2c', '0x64e', '0x41', '0x63', '0x63', '0x20', '0x2c', '0x20',
'0x27', '0x25', '0x27', '0x29']
And sure enough, there are three instances of 0x64e, always appearing immediately before 0x41 (A). In fact, if you look carefully at your A characters, you will notice a faint slanted accent mark above each A. That mark is U+064E, called ARABIC FATHA in Unicode. Delete the ones in front of the name Acc and the error goes away; the one inside the string literal is harmless to Python but is probably unwanted too.
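The same check can be made a little more readable by reporting only the non-ASCII characters together with their position and Unicode name. This is a minimal sketch; the helper name find_non_ascii and the two-line sample string are made up for illustration.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line, column, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical sample: the second line starts with the stray U+064E character.
sample = "err = 1\n\u064eAcc = 100 - err\n"
for hit in find_non_ascii(sample):
    print(hit)

# Removing the character leaves plain ASCII source.
cleaned = sample.replace("\u064e", "")
print(find_non_ascii(cleaned))
```

Running it over the contents of a real file (read with encoding='utf-8') points straight at the line and column to retype.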
I have issues getting my data to save properly into my CSV file. I have several sets of arrays in x[zpeaks], but when I save the data, only the last array is saved and not all of them.
For example, say x[zpeaks] contains [1,2,1], [1,4,1], [1,3,5]. When I want to save all the arrays to the CSV file, only the last array, [1,3,5], is saved.
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
import pdb
import pandas as pd
t = []
z = []
y = []
x = []
with open("Data1r2.txt", 'r') as f:
    for line in f:
        parts = line.split(", ")
        x.append(float(parts[0][2:]))
        y.append(float(parts[1][2:]))
        z.append(float(parts[2][2:]))
        t.append(float(parts[3][2:]))
zz = np.array(z)
tt = np.array(t)
zminvalue = np.min(zz)
zzz = zz - zminvalue
zpeaks, _ = find_peaks(zzz)
for i in range(len(zpeaks)-1):
    print(z[zpeaks[i]:zpeaks[i+1]])
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    pd.concat([pd.DataFrame(a),pd.DataFrame(b), pd.DataFrame(c)], axis=1).to_csv('Diff.csv', mode='w')
My data.txt
X:-241, Y:-31, Z:17, T:73823
X:-241, Y:-31, Z:17, T:73952
X:-240, Y:-30, Z:26, T:74073
X:-240, Y:-30, Z:26, T:74191
X:-240, Y:-30, Z:26, T:74312
X:-240, Y:-32, Z:39, T:74432
X:-240, Y:-32, Z:39, T:74549
X:-240, Y:-32, Z:39, T:74668
X:-239, Y:-21, Z:12, T:74785
X:-239, Y:-21, Z:12, T:74904
X:-239, Y:-21, Z:12, T:75022
X:-246, Y:15, Z:18, T:75142
X:-246, Y:15, Z:18, T:75260
X:-246, Y:15, Z:18, T:75378
X:-250, Y:19, Z:14, T:75498
X:-250, Y:19, Z:14, T:75615
X:-250, Y:19, Z:14, T:75732
X:-239, Y:-5, Z:27, T:75854
X:-239, Y:-5, Z:27, T:75972
X:-239, Y:-5, Z:27, T:76102
X:-236, Y:-19, Z:46, T:76240
X:-236, Y:-19, Z:46, T:76369
X:-236, Y:-19, Z:46, T:76489
X:-235, Y:-14, Z:32, T:76610
X:-235, Y:-14, Z:32, T:76727
X:-235, Y:-14, Z:32, T:76845
X:-244, Y:-16, Z:22, T:76963
X:-244, Y:-16, Z:22, T:77081
X:-244, Y:-16, Z:22, T:77201
X:-220, Y:-25, Z:-3, T:77346
X:-220, Y:-25, Z:-3, T:77464
X:-220, Y:-25, Z:-3, T:77580
X:-229, Y:24, Z:2, T:77699
X:-229, Y:24, Z:2, T:77814
X:-229, Y:24, Z:2, T:77934
X:-248, Y:-20, Z:0, T:78052
X:-248, Y:-20, Z:0, T:78171
X:-248, Y:-20, Z:0, T:78288
X:-242, Y:-15, Z:-35, T:78515
X:-242, Y:-15, Z:-35, T:78630
X:-242, Y:-15, Z:-35, T:78747
X:-235, Y:-12, Z:-63, T:78865
X:-235, Y:-12, Z:-63, T:78982
X:-235, Y:-12, Z:-63, T:79102
X:-226, Y:-35, Z:-145, T:79221
X:-226, Y:-35, Z:-145, T:79340
X:-226, Y:-35, Z:-145, T:79461
X:-205, Y:-47, Z:-156, T:79582
X:-205, Y:-47, Z:-156, T:79702
X:-205, Y:-47, Z:-156, T:79821
X:-208, Y:-39, Z:-149, T:79940
X:-208, Y:-39, Z:-149, T:80061
X:-208, Y:-39, Z:-149, T:80181
X:-235, Y:-16, Z:-99, T:80304
X:-235, Y:-16, Z:-99, T:80432
X:-235, Y:-16, Z:-99, T:80657
X:-247, Y:-10, Z:12, T:80774
X:-247, Y:-10, Z:12, T:80890
X:-247, Y:-10, Z:12, T:81008
X:-242, Y:-1, Z:2, T:81127
X:-242, Y:-1, Z:2, T:81246
X:-242, Y:-1, Z:2, T:81363
X:-239, Y:-8, Z:15, T:81483
X:-239, Y:-8, Z:15, T:81600
X:-239, Y:-8, Z:15, T:81720
X:-241, Y:-13, Z:-11, T:81841
X:-241, Y:-13, Z:-11, T:81958
X:-241, Y:-13, Z:-11, T:82076
X:-242, Y:-5, Z:-37, T:82198
X:-242, Y:-5, Z:-37, T:82315
X:-242, Y:-5, Z:-37, T:82435
X:-215, Y:-43, Z:-128, T:82554
X:-215, Y:-43, Z:-128, T:82699
X:-215, Y:-43, Z:-128, T:82829
X:-207, Y:-48, Z:-153, T:82952
X:-207, Y:-48, Z:-153, T:83072
X:-207, Y:-48, Z:-153, T:83191
X:-198, Y:-37, Z:-166, T:83315
X:-198, Y:-37, Z:-166, T:83453
X:-198, Y:-37, Z:-166, T:83572
X:-218, Y:-33, Z:-134, T:83694
X:-218, Y:-33, Z:-134, T:83812
X:-218, Y:-33, Z:-134, T:83932
X:-228, Y:-15, Z:-80, T:84047
X:-228, Y:-15, Z:-80, T:84166
X:-228, Y:-15, Z:-80, T:84288
X:-243, Y:-8, Z:-4, T:84407
X:-243, Y:-8, Z:-4, T:84524
X:-243, Y:-8, Z:-4, T:84640
X:-238, Y:-4, Z:2, T:84756
X:-238, Y:-4, Z:2, T:84872
X:-238, Y:-4, Z:2, T:84994
X:-252, Y:-7, Z:-16, T:85136
X:-252, Y:-7, Z:-16, T:85265
X:-252, Y:-7, Z:-16, T:85385
X:-243, Y:-3, Z:-28, T:85504
X:-243, Y:-3, Z:-28, T:85618
X:-243, Y:-3, Z:-28, T:85739
X:-241, Y:-3, Z:-48, T:85858
X:-241, Y:-3, Z:-48, T:85975
X:-241, Y:-3, Z:-48, T:86094
X:-231, Y:-15, Z:-112, T:86216
X:-231, Y:-15, Z:-112, T:86334
X:-231, Y:-15, Z:-112, T:86453
X:-210, Y:-43, Z:-150, T:86573
X:-210, Y:-43, Z:-150, T:86691
X:-210, Y:-43, Z:-150, T:86811
X:-193, Y:-58, Z:-169, T:86933
X:-193, Y:-58, Z:-169, T:87051
X:-193, Y:-58, Z:-169, T:87171
X:-182, Y:-27, Z:-179, T:87305
X:-182, Y:-27, Z:-179, T:87435
X:-182, Y:-27, Z:-179, T:87566
X:-212, Y:-19, Z:-136, T:87686
X:-212, Y:-19, Z:-136, T:87803
X:-212, Y:-19, Z:-136, T:87920
X:-233, Y:-25, Z:-83, T:88040
X:-233, Y:-25, Z:-83, T:88160
X:-233, Y:-25, Z:-83, T:88278
X:-243, Y:-16, Z:-31, T:88396
X:-243, Y:-16, Z:-31, T:88510
X:-243, Y:-16, Z:-31, T:88625
X:-244, Y:-13, Z:-27, T:88744
X:-244, Y:-13, Z:-27, T:88860
X:-244, Y:-13, Z:-27, T:88978
X:-243, Y:-15, Z:-51, T:89099
X:-243, Y:-15, Z:-51, T:89218
X:-243, Y:-15, Z:-51, T:89338
X:-228, Y:-27, Z:-78, T:89472
X:-228, Y:-27, Z:-78, T:89601
X:-228, Y:-27, Z:-78, T:89746
X:-223, Y:-24, Z:-114, T:89876
X:-223, Y:-24, Z:-114, T:89995
X:-223, Y:-24, Z:-114, T:90115
X:-205, Y:-42, Z:-141, T:90236
X:-205, Y:-42, Z:-141, T:90354
X:-205, Y:-42, Z:-141, T:90474
X:-199, Y:-67, Z:-153, T:90595
X:-199, Y:-67, Z:-153, T:90713
X:-199, Y:-67, Z:-153, T:90833
X:-202, Y:-53, Z:-152, T:90951
X:-202, Y:-53, Z:-152, T:91069
X:-202, Y:-53, Z:-152, T:91191
X:-224, Y:-41, Z:-135, T:91312
X:-224, Y:-41, Z:-135, T:91431
X:-224, Y:-41, Z:-135, T:91549
X:-229, Y:-29, Z:-91, T:91669
X:-229, Y:-29, Z:-91, T:91789
X:-229, Y:-29, Z:-91, T:91923
X:-242, Y:-8, Z:-2, T:92066
X:-242, Y:-8, Z:-2, T:92184
X:-242, Y:-8, Z:-2, T:92302
X:-233, Y:-12, Z:-5, T:92420
X:-233, Y:-12, Z:-5, T:92534
X:-233, Y:-12, Z:-5, T:92654
X:-246, Y:-1, Z:-4, T:92773
X:-246, Y:-1, Z:-4, T:92892
X:-246, Y:-1, Z:-4, T:93010
X:-242, Y:-9, Z:-23, T:93130
X:-242, Y:-9, Z:-23, T:93251
X:-242, Y:-9, Z:-23, T:93370
X:-237, Y:-19, Z:-46, T:93491
X:-237, Y:-19, Z:-46, T:93608
X:-237, Y:-19, Z:-46, T:93727
X:-213, Y:-23, Z:-95, T:93849
X:-213, Y:-23, Z:-95, T:93966
X:-213, Y:-23, Z:-95, T:94112
X:-207, Y:-36, Z:-151, T:94241
X:-207, Y:-36, Z:-151, T:94359
X:-207, Y:-36, Z:-151, T:94480
X:-199, Y:-49, Z:-162, T:94600
X:-199, Y:-49, Z:-162, T:94721
X:-199, Y:-49, Z:-162, T:94840
X:-203, Y:-36, Z:-146, T:94961
X:-203, Y:-36, Z:-146, T:95082
X:-203, Y:-36, Z:-146, T:95202
X:-222, Y:-28, Z:-124, T:95324
X:-222, Y:-28, Z:-124, T:95439
X:-222, Y:-28, Z:-124, T:95583
X:-244, Y:2, Z:-53, T:95700
X:-244, Y:2, Z:-53, T:95817
X:-244, Y:2, Z:-53, T:95935
X:-237, Y:-5, Z:-9, T:96055
X:-237, Y:-5, Z:-9, T:96171
X:-237, Y:-5, Z:-9, T:96301
X:-239, Y:-2, Z:1, T:96439
X:-239, Y:-2, Z:1, T:96568
X:-239, Y:-2, Z:1, T:96685
X:-243, Y:-4, Z:2, T:96805
X:-243, Y:-4, Z:2, T:96919
X:-243, Y:-4, Z:2, T:97037
X:-246, Y:-3, Z:-16, T:97159
X:-246, Y:-3, Z:-16, T:97276
X:-246, Y:-3, Z:-16, T:97395
X:-239, Y:-8, Z:-42, T:97513
X:-239, Y:-8, Z:-42, T:97631
X:-239, Y:-8, Z:-42, T:97752
X:-221, Y:-10, Z:-115, T:97871
X:-221, Y:-10, Z:-115, T:97990
X:-221, Y:-10, Z:-115, T:98109
X:-219, Y:-25, Z:-145, T:98230
X:-219, Y:-25, Z:-145, T:98350
X:-219, Y:-25, Z:-145, T:98468
X:-202, Y:-31, Z:-172, T:98589
X:-202, Y:-31, Z:-172, T:98736
X:-202, Y:-31, Z:-172, T:98865
X:-214, Y:-34, Z:-144, T:98985
X:-214, Y:-34, Z:-144, T:99101
X:-214, Y:-34, Z:-144, T:99223
X:-224, Y:-24, Z:-116, T:99342
X:-224, Y:-24, Z:-116, T:99460
X:-224, Y:-24, Z:-116, T:99579
X:-232, Y:2, Z:-50, T:99699
X:-232, Y:2, Z:-50, T:99818
X:-232, Y:2, Z:-50, T:99936
X:-241, Y:-4, Z:-22, T:100056
X:-241, Y:-4, Z:-22, T:100175
X:-241, Y:-4, Z:-22, T:100293
X:-240, Y:4, Z:-2, T:100414
X:-240, Y:4, Z:-2, T:100532
X:-240, Y:4, Z:-2, T:100648
X:-241, Y:3, Z:1, T:100768
X:-241, Y:3, Z:1, T:100895
X:-241, Y:3, Z:1, T:101029
X:-243, Y:1, Z:-16, T:101160
X:-243, Y:1, Z:-16, T:101278
X:-243, Y:1, Z:-16, T:101399
X:-239, Y:-2, Z:-36, T:101518
X:-239, Y:-2, Z:-36, T:101661
X:-239, Y:-2, Z:-36, T:101780
X:-228, Y:-12, Z:-71, T:101901
X:-228, Y:-12, Z:-71, T:102019
X:-228, Y:-12, Z:-71, T:102138
X:-224, Y:-23, Z:-118, T:102260
X:-224, Y:-23, Z:-118, T:102378
X:-224, Y:-23, Z:-118, T:102498
X:-209, Y:-2, Z:-161, T:102617
X:-209, Y:-2, Z:-161, T:102735
X:-209, Y:-2, Z:-161, T:102855
X:-206, Y:-3, Z:-150, T:102974
X:-206, Y:-3, Z:-150, T:103088
X:-206, Y:-3, Z:-150, T:103216
X:-218, Y:0, Z:-142, T:103355
X:-218, Y:0, Z:-142, T:103469
X:-218, Y:0, Z:-142, T:103581
X:-226, Y:-17, Z:-118, T:103700
X:-226, Y:-17, Z:-118, T:103814
X:-226, Y:-17, Z:-118, T:103931
X:-242, Y:4, Z:-40, T:104054
X:-242, Y:4, Z:-40, T:104171
X:-242, Y:4, Z:-40, T:104292
X:-242, Y:4, Z:-22, T:104410
X:-242, Y:4, Z:-22, T:104523
X:-242, Y:4, Z:-22, T:104642
X:-240, Y:5, Z:-3, T:104762
X:-240, Y:5, Z:-3, T:104879
X:-240, Y:5, Z:-3, T:104993
X:-244, Y:-2, Z:-6, T:105111
X:-244, Y:-2, Z:-6, T:105231
X:-244, Y:-2, Z:-6, T:105361
X:-244, Y:1, Z:-10, T:105497
X:-244, Y:1, Z:-10, T:105623
X:-244, Y:1, Z:-10, T:105744
X:-244, Y:-4, Z:-34, T:105865
X:-244, Y:-4, Z:-34, T:105981
X:-244, Y:-4, Z:-34, T:106101
X:-231, Y:-1, Z:-63, T:106222
X:-231, Y:-1, Z:-63, T:106341
X:-231, Y:-1, Z:-63, T:106462
X:-222, Y:-11, Z:-116, T:106580
X:-222, Y:-11, Z:-116, T:106698
X:-222, Y:-11, Z:-116, T:106818
X:-219, Y:-15, Z:-144, T:106938
X:-219, Y:-15, Z:-144, T:107058
X:-219, Y:-15, Z:-144, T:107174
X:-204, Y:-6, Z:-150, T:107297
X:-204, Y:-6, Z:-150, T:107410
X:-204, Y:-6, Z:-150, T:107528
X:-196, Y:-5, Z:-163, T:107665
X:-196, Y:-5, Z:-163, T:107802
X:-196, Y:-5, Z:-163, T:107935
X:-214, Y:-2, Z:-153, T:108066
X:-214, Y:-2, Z:-153, T:108186
X:-214, Y:-2, Z:-153, T:108306
X:-223, Y:-12, Z:-123, T:108422
X:-223, Y:-12, Z:-123, T:108544
X:-223, Y:-12, Z:-123, T:108661
X:-230, Y:7, Z:-52, T:108783
X:-230, Y:7, Z:-52, T:108900
X:-230, Y:7, Z:-52, T:109019
X:-241, Y:9, Z:-25, T:109139
X:-241, Y:9, Z:-25, T:109258
X:-241, Y:9, Z:-25, T:109375
X:-245, Y:4, Z:-12, T:109496
X:-245, Y:4, Z:-12, T:109612
X:-245, Y:4, Z:-12, T:109732
X:-242, Y:3, Z:-6, T:109852
X:-242, Y:3, Z:-6, T:109968
X:-242, Y:3, Z:-6, T:110098
X:-239, Y:-4, Z:-35, T:110243
X:-239, Y:-4, Z:-35, T:110362
X:-239, Y:-4, Z:-35, T:110484
X:-235, Y:6, Z:-65, T:110606
X:-235, Y:6, Z:-65, T:110722
X:-235, Y:6, Z:-65, T:110840
X:-215, Y:-14, Z:-117, T:110962
X:-215, Y:-14, Z:-117, T:111081
X:-215, Y:-14, Z:-117, T:111204
X:-224, Y:7, Z:-146, T:111324
X:-224, Y:7, Z:-146, T:111441
X:-224, Y:7, Z:-146, T:111561
X:-209, Y:-6, Z:-149, T:111679
X:-209, Y:-6, Z:-149, T:111799
X:-209, Y:-6, Z:-149, T:111919
X:-219, Y:-8, Z:-140, T:112038
X:-219, Y:-8, Z:-140, T:112157
X:-219, Y:-8, Z:-140, T:112274
X:-226, Y:-3, Z:-116, T:112405
X:-226, Y:-3, Z:-116, T:112540
X:-226, Y:-3, Z:-116, T:112669
X:-233, Y:2, Z:-76, T:112792
X:-233, Y:2, Z:-76, T:112909
X:-233, Y:2, Z:-76, T:113028
X:-237, Y:7, Z:-35, T:113148
X:-237, Y:7, Z:-35, T:113266
X:-237, Y:7, Z:-35, T:113386
X:-242, Y:5, Z:-15, T:113504
X:-242, Y:5, Z:-15, T:113624
X:-242, Y:5, Z:-15, T:113764
X:-244, Y:5, Z:-3, T:113884
X:-244, Y:5, Z:-3, T:113999
X:-244, Y:5, Z:-3, T:114118
X:-242, Y:3, Z:-7, T:114239
X:-242, Y:3, Z:-7, T:114357
X:-242, Y:3, Z:-7, T:114473
X:-241, Y:0, Z:-30, T:114595
X:-241, Y:0, Z:-30, T:114720
X:-241, Y:0, Z:-30, T:114867
X:-227, Y:-13, Z:-95, T:114989
X:-227, Y:-13, Z:-95, T:115104
X:-227, Y:-13, Z:-95, T:115224
X:-212, Y:-5, Z:-114, T:115343
X:-212, Y:-5, Z:-114, T:115462
X:-212, Y:-5, Z:-114, T:115579
X:-215, Y:-6, Z:-145, T:115701
X:-215, Y:-6, Z:-145, T:115819
X:-215, Y:-6, Z:-145, T:115937
X:-210, Y:5, Z:-142, T:116059
X:-210, Y:5, Z:-142, T:116176
X:-210, Y:5, Z:-142, T:116296
X:-222, Y:-19, Z:-145, T:116415
X:-222, Y:-19, Z:-145, T:116534
X:-222, Y:-19, Z:-145, T:116655
X:-231, Y:6, Z:-119, T:116775
X:-231, Y:6, Z:-119, T:116894
X:-231, Y:6, Z:-119, T:117023
The issue is that your .to_csv call is inside the loop and uses mode='w', so 'Diff.csv' is overwritten on every iteration. What you end up seeing is only the last write.
There are a few solutions.
.to_csv(mode='a')
This uses append mode, so each call adds to the file instead of overwriting it. You will also want to specify header=False so that the header row is not written again in the middle of the file on every iteration. If you want a header, you can write it once before the loop. Keep in mind that append mode also keeps whatever an earlier run left in the file, so delete 'Diff.csv' before rerunning the script.
for i in range(len(zpeaks)-1):
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    pd.concat([pd.DataFrame(a),
               pd.DataFrame(b),
               pd.DataFrame(c)], axis=1).to_csv('Diff.csv', mode='a', header=False)
Create a list, concat after the loop
Add your DataFrames to a list within the loop, then concatenate when the loop finishes and save the full DataFrame to a file at once.
l = []
for i in range(len(zpeaks)-1):
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    l.append(pd.concat([pd.DataFrame(a),pd.DataFrame(b), pd.DataFrame(c)], axis=1))
#pd.concat(l).to_csv('Diff.csv') # No column names
pd.concat(l).rename(columns=lambda x, y=iter(['x', 'y', 'z']): next(y)).to_csv('Diff.csv')
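The common idea in both fixes is that the file must be truncated at most once. As a rough illustration of that idea without pandas (a deliberate substitution, so the example only needs the standard library), the sketch below opens the file a single time before the loop and writes every slice inside it. The function name write_segments, the extra segment column, and the toy data are assumptions made for the example, not part of the original code.

```python
import csv
import os
import tempfile


def write_segments(path, x, y, z, peaks):
    """Write every slice between consecutive peak indices to one CSV file."""
    # Open once in 'w' mode: the file is truncated a single time, before the loop.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["segment", "x", "y", "z"])
        for seg, (start, stop) in enumerate(zip(peaks[:-1], peaks[1:])):
            for row in zip(x[start:stop], y[start:stop], z[start:stop]):
                writer.writerow([seg, *row])


# Hypothetical toy data standing in for the parsed file and the find_peaks result.
x = [1, 2, 1, 1, 4, 1, 1, 3, 5]
y = [10, 20, 10, 10, 40, 10, 10, 30, 50]
z = [0, 5, 0, 0, 6, 0, 0, 7, 0]
peaks = [1, 4, 7]

out_path = os.path.join(tempfile.mkdtemp(), "Diff.csv")
write_segments(out_path, x, y, z, peaks)

with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

The segment column records which peak-to-peak slice each row came from, which is information the side-by-side concat would otherwise lose.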
I'm not entirely sure, since I myself am still learning Python, but it looks like it may be due to your indentation at the end.
It should look like this:
for i in range(len(zpeaks)-1):
    print(z[zpeaks[i]:zpeaks[i+1]])
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    pd.concat([pd.DataFrame(a),pd.DataFrame(b), pd.DataFrame(c)], axis=1).to_csv('Diff.csv', mode='w')
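The last line should be inside the for loop so that every slice is written to the CSV. Note that with mode='w' each write still replaces the file, so mode='a' is also needed to keep the earlier slices.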
I want to find the number of unique tokens in a file. For this purpose I wrote the below code:
splittedWords = open('output.txt', encoding='windows-1252').read().lower().split()
uniqueValues = set(splittedWords)
print(uniqueValues)
The output.txt file is like this:
Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl
club+Noun toplanti+Noun+A3pl+P3sg
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc
nispi+Adj
nisbi+Adj
görece+Adj+With
izafi+Adj
obur+Adj
With this code I get unique tokens such as Türkiye+Noun and Türkiye+Noun+Gen. But I only want the part before the + sign, for example just Türkiye. In the end, Türkiye+Noun and Türkiye+Noun+Gen need to be treated as one and the same unique token. I think I need to write a regex for this purpose.
It seems the word you want is always the 1st in a list of '+'-joined words:
Split the splitted words at + and take the 0th one:
text = """Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl
club+Noun toplanti+Noun+A3pl+P3sg
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc
nispi+Adj
nisbi+Adj
görece+Adj+With
izafi+Adj
obur+Adj """
splittedWords = text.lower().replace("\n"," ").split()
uniqueValues = set( ( s.split("+")[0] for s in splittedWords))
print(uniqueValues)
Output:
{'imha', 'çaba', 'ülke', 'arzula', 'terörizm', 'olus', 'daha', 'istikrar', 'küresel',
'sagla', 'önle', 'üzere', 'nisbi', 'türkiye', 'gelis', 'bir', 'karar', 'hedef', '2',
've', 'silah', 'kur', 'alan', 'club', 'boyut', '-', 'anlasma', 'iliski',
'izafi', 'kurumsal', 'karsi', 'ankara', 'ortaklik', 'obur', 'kitle', 'güven',
'uygula', 'ol', 'düzey', 'konsey', 'teknik', 'rejim', 'komite', 'gümrük', 'samimi',
'gel', 'yay', 'toplanti', '.', 'asama', 'mahiyet', 'ab', '69', 'için',
'paylas', '6', '/', 'nispi', 'dünya', 'at', 'sayili', 'görece', 'isbirlik', 'birlik',
',', 'tüm', 'ile', 'düzen', 'uyar', 'göster', 'tehdit', 'madde'}
You might need to do some additional cleanup to remove things like
',' '6' '/'
Split, then remove anything that's just numbers or punctuation:
from string import digits, punctuation
remove = set(digits + punctuation)
splittedWords = text.lower().split()
uniqueValues = set(s.split("+")[0] for s in splittedWords)
# remove from set anything that only consists of numbers or punctuation
uniqueValues = uniqueValues - set(x for x in uniqueValues if all(c in remove for c in x))
print(uniqueValues)
to get it as:
{'teknik', 'yay', 'göster','hedef', 'terörizm', 'ortaklik','ile', 'daha', 'ol', 'istikrar',
'paylas', 'nispi', 'üzere', 'sagla', 'tüm', 'önle', 'asama', 'uygula', 'güven', 'kur',
'türkiye', 'gel', 'dünya', 'gelis', 'sayili', 'ab', 'club', 'küresel', 'imha', 'çaba',
'olus', 'iliski', 'izafi', 'mahiyet', 've', 'düzey', 'anlasma', 'tehdit', 'bir', 'düzen',
'obur', 'samimi', 'boyut', 'ülke', 'arzula', 'rejim', 'gümrük', 'karar', 'at', 'karsi',
'nisbi', 'isbirlik', 'alan', 'toplanti', 'ankara', 'birlik', 'kurumsal', 'için', 'kitle',
'komite', 'silah', 'görece', 'uyar', 'madde', 'konsey'}
You can split all the tokens you have now on "+" and take only the first one.
uniqueValues = set(map(lambda x: x.split('+')[0], splittedWords))
Here I use map, which applies the function (the lambda part) to every value in splittedWords.
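Since the question asks for the number of unique tokens, here is a small self-contained sketch of the same split-on-'+' idea written as a set comprehension, with the count taken at the end. The function name unique_stems and the four-token sample are made up for illustration; no regex is needed.

```python
def unique_stems(text):
    """Return the set of lowercased stems: the part of each token before the first '+'."""
    # maxsplit=1 stops splitting after the first '+', since only the stem is needed.
    return {token.split("+", 1)[0] for token in text.lower().split()}


# Hypothetical short sample in the same format as output.txt.
sample = "Türkiye+Noun ,+Punc Türkiye+Noun+Gen ve+Conj"
stems = unique_stems(sample)
print(len(stems), sorted(stems))
```

Türkiye+Noun and Türkiye+Noun+Gen collapse into the single entry 'türkiye', so the sample yields three unique stems.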
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
import pdb
file = open("Data1r2.txt", 'r')
lines = file.readlines()
file.close()
t = []
z = []
y = []
x = []
with open("Data1r2.txt", 'r') as f:
    for line in f:
        parts = line.split(", ")
        x.append(float(parts[0][2:]))
        y.append(float(parts[1][2:]))
        z.append(float(parts[2][2:]))
        t.append(float(parts[3][2:]))
In this part I'm only annotating the highest peak value of the graph. How can I annotate all peak values that are a fixed distance apart, say distance = 10,000?
fig = plt.figure()
ax = fig.add_subplot(111)
line, = ax.plot(t, z)
ymax = max(z)
xpos = z.index(ymax)
xmax = t[xpos]
text= "x={:.1f}, y={:.1f}".format(xmax, ymax) #Annotation(correct)
ax.annotate(text, xy=(xmax, ymax), xytext=(xmax, ymax),
arrowprops=dict(facecolor='black', shrink=0.05),
)
plt.legend()
plt.show()
This is what I currently have; it annotates only the highest peak value.
Data: the same Data1r2.txt sample listed in full earlier on this page (rows of the form X:-241, Y:-31, Z:17, T:73823).
I don't think there's something ready in matplotlib - the task of peak detection is hard and the notion of peak can vary greatly from application to application.
Since your data is relatively simple, you can try an approach inspired by a Schmitt trigger: look for recent high values but discard small oscillations. The (pseudo) code would be:
y_max = None
for y in data:
    if y_max is None:          # start tracking
        y_max = y
    if y > y_max:              # update max value
        y_max = y
    if y < y_max * 0.9:        # signal is too different from
        add_label(y=y_max)     # the peak - save the peak and
        y_max = None           # start looking for another one
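One way the idea above might be turned into runnable code is sketched below. It is not the pseudocode verbatim: the 0.9 factor assumes positive values, while the posted Z data goes negative, so this version uses an absolute drop (min_drop) and also tracks valleys, so that a long descent is not reported as a string of peaks. The function name hysteresis_peaks, the min_drop value, and the shortened sample (one row per distinct reading from the posted data) are assumptions for the example. If the signal starts by falling, the first sample is reported as a peak.

```python
def hysteresis_peaks(values, min_drop):
    """Return indices of maxima that are followed by a fall of more than min_drop."""
    peaks = []
    hi = lo = 0        # indices of the running maximum / running minimum
    rising = True      # True while looking for a peak, False while looking for a valley
    for i, v in enumerate(values):
        if rising:
            if v > values[hi]:
                hi = i                     # new running maximum
            elif v < values[hi] - min_drop:
                peaks.append(hi)           # fell far enough: confirm the peak
                lo = i
                rising = False
        else:
            if v < values[lo]:
                lo = i                     # new running minimum
            elif v > values[lo] + min_drop:
                hi = i                     # rose far enough: look for the next peak
                rising = True
    return peaks


# Shortened sample taken from the posted data (T and Z columns).
t = [73823, 74073, 74432, 74785, 75142, 75498, 75854,
     76240, 76610, 76963, 77346, 77699, 78052, 78515]
z = [17, 26, 39, 12, 18, 14, 27, 46, 32, 22, -3, 2, 0, -35]

peak_idx = hysteresis_peaks(z, min_drop=20)
labels = [(t[i], z[i]) for i in peak_idx]
print(labels)
```

Each (t, z) pair in labels can then be passed to ax.annotate in a loop, the same way the question annotates the single maximum. Note also that the question already imports scipy.signal.find_peaks, which accepts a distance argument; that argument is measured in samples, not in the units of T.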
I have a function that replaces substring matches with the match surrounded by HTML tags. The function will mostly consume strings in English and Greek.
The function:
def highlight_text(st, kwlist, start_tag=None, end_tag=None):
    if start_tag is None:
        start_tag = '<span class="nom">'
    if end_tag is None:
        end_tag = '</span>'
    for kw in kwlist:
        st = re.sub(r'\b' + kw + r'\b', '{}{}{}'.format(start_tag, kw, end_tag), st)
    return st
The testing string is in Greek except the first sub-string [Korais]: st="Korais Ο Αδαμάντιος Κοραής (Σμύρνη, 27 Απριλίου 1748 – Παρίσι, 6 Απριλίου 1833), ήταν Έλληνας φιλόλογος με βαθιά γνώση του ελληνικού πολιτισμού. Ο Κοραής είναι ένας από τους σημαντικότερους εκπροσώπους του νεοελληνικού διαφωτισμού και μνημονεύεται, ανάμεσα σε άλλα, ως πρωτοπόρος στην έκδοση έργων αρχαίας ελληνικής γραμματείας, αλλά και για τις γλωσσικές του απόψεις στην υποστήριξη της καθαρεύουσας, σε μια μετριοπαθή όμως μορφή της με σκοπό την εκκαθάριση των πλείστων ξένων λέξεων που υπήρχαν στη γλώσσα του λαού."
The test code:
kwlist = ['ελληνικού', 'Σμύρνη', 'Αδαμάντιος', 'Korais']
d = highlight_text(st, kwlist, start_tag=None, end_tag=None)
print(d)
When I run the code (st is the string above) on Python 2.7, only the English substrings get tagged; the Greek substrings are ignored. When I use Python 3.4, all substrings get replaced.
Another issue is that when I run the function within a Flask application, it throws an error: unexpected end of regular expression.
How should I tackle these issues, without using an external library if possible?
I've been pulling my hair out for two days now.
In Python 2.7, you need to explicitly convert text to Unicode. See the fixed snippet below:
# -*- coding: utf-8 -*-
import re
def highlight_text(st, kwlist, start_tag=None, end_tag=None):
    if start_tag is None:
        start_tag = '<span class="nom">'
    if end_tag is None:
        end_tag = '</span>'
    for kw in kwlist:
        st = re.sub(ur'\b' + kw.decode('utf8') + ur'\b',
                    u'{}{}{}'.format(start_tag.decode('utf8'), kw.decode('utf8'), end_tag.decode('utf8')),
                    st.decode('utf8'), 0, re.U).encode("utf8")
    return st
st="Korais Ο Αδαμάντιος Κοραής (Σμύρνη, 27 Απριλίου 1748 – Παρίσι, 6 Απριλίου 1833), ήταν Έλληνας φιλόλογος με βαθιά γνώση του ελληνικού πολιτισμού. Ο Κοραής είναι ένας από τους σημαντικότερους εκπροσώπους του νεοελληνικού διαφωτισμού και μνημονεύεται, ανάμεσα σε άλλα, ως πρωτοπόρος στην έκδοση έργων αρχαίας ελληνικής γραμματείας, αλλά και για τις γλωσσικές του απόψεις στην υποστήριξη της καθαρεύουσας, σε μια μετριοπαθή όμως μορφή της με σκοπό την εκκαθάριση των πλείστων ξένων λέξεων που υπήρχαν στη γλώσσα του λαού."
kwlist = ['ελληνικού', 'Σμύρνη', 'Αδαμάντιος', 'Korais']
d = highlight_text(st, kwlist, start_tag=None, end_tag=None)
print(d)
Note that all literals are declared with the u prefix, all variables are decoded, and the re.sub result is encoded back to UTF-8.
English get tagged. Greek substr are ignored.
Where does your st come from? Please notice that in Python 2.x, 'μορφή' != u'μορφή'. Maybe you are comparing str with unicode.
Suggestion: use unicode everywhere you can, e.g.:
kwlist = [u'ελληνικού', u'Σμύρνη', u'Αδαμάντιος', u'Korais']
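For comparison, here is a minimal Python 3 sketch of the same function, where str is already Unicode and \b recognises Greek letters without extra flags. Two things in it go beyond the answers above and are assumptions on my part: each keyword is passed through re.escape, which is one plausible cause of the "unexpected end of regular expression" error if a keyword ever contains a character such as '(' (the question does not confirm that this is what happened in Flask), and all keywords are combined into a single pattern so that text inside an already inserted tag is never matched again.

```python
import re


def highlight_text(text, keywords, start_tag='<span class="nom">', end_tag='</span>'):
    """Wrap every whole-word occurrence of each keyword in start_tag/end_tag."""
    if not keywords:
        return text
    # re.escape keeps characters such as '(' or '+' from being read as regex syntax.
    # Longest keywords first, so a longer keyword wins over a shorter one it starts with.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(kw) for kw in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # A function replacement avoids any special handling of backslashes in the tags.
    return pattern.sub(lambda m: start_tag + m.group(0) + end_tag, text)


sample = "Korais Ο Αδαμάντιος Κοραής (Σμύρνη, 27 Απριλίου 1748)"
result = highlight_text(sample, ["Σμύρνη", "Αδαμάντιος", "Korais"])
print(result)
```

Because \b only matches between a word character and a non-word character, a keyword that itself begins or ends with punctuation will not be highlighted by this sketch, although it no longer raises an error.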