Python unicode decode not working for outlook exported csv - python

Hi, I've exported an Outlook contacts CSV file and loaded it into a Python shell.
I have a number of European names in the list, and the following, for example,
tmp = 'Fern\xc3\x9fndez'
tmp.encode("latin-1")
results in an error
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
while
tmp.decode('latin-1')
gives me
u'Fern\xc3\x9fndez'
How do I get the text to read as Fernandez? (not too worried about the accents - but happy to have them)

You must be using Python 2.x. Here is one way to print out the character (depending on which encoding you are working with):
>>> tmp = 'Fern\xc3\x9fndez'
>>> print tmp.decode('utf-8') # print formats the string for stdout
Fernßndez
>>> print tmp.decode('latin1')
FernÃndez
Are you sure you have the right character? Is it utf-8? And another way:
>>> print unicode(tmp, 'latin1')
FernÃndez
>>> print unicode(tmp, 'utf-8')
Fernßndez
Interesting. So none of these options worked for you? Incidentally, I ran the string through a few other encodings just to see if any of them had a character more in line with what I would expect. Unfortunately, I don't see any that look quite right:
>>> for encoding in ['ascii', 'big5', 'big5hkscs', 'cp037', 'cp424', 'cp437', 'cp500', 'cp737', 'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875', 'cp932', 'cp949', 'cp950', 'cp1006', 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'euc_jp', 'euc_jis_2004', 'euc_jisx0213', 'euc_kr', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', 'iso2022_kr', 'latin_1', 'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15', 'johab', 'koi8_r', 'koi8_u', 'mac_cyrillic', 'mac_greek', 'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'ptcp154', 'shift_jis', 'shift_jis_2004', 'shift_jisx0213', 'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8']:
...     try:
...         print encoding + ': ' + tmp.decode(encoding)
...     except:
...         pass
...
cp037: ãÁÊ>C¤>ÀÁ:
cp437: Fernßndez
cp500: ãÁÊ>C¤>ÀÁ:
cp737: Fern├θndez
cp775: Fern├¤ndez
cp850: Fernßndez
cp852: Fern├čndez
cp855: Fern├Ъndez
cp857: Fern├şndez
cp860: Fern├Óndez
cp861: Fernßndez
cp862: Fernßndez
cp863: Fernßndez
cp865: Fernßndez
cp866: Fern├Яndez
cp869: Fern├ίndez
cp875: ΖΧΈ>Cμ>ΦΧ:
cp932: Fernテ殤dez
cp949: Fern횩ndez
cp1006: Fernﺣndez
cp1026: ãÁÊ>C¤>ÀÁ:
cp1140: ãÁÊ>C€>ÀÁ:
cp1250: FernĂźndez
cp1251: FernГџndez
cp1252: Fernßndez
cp1254: Fernßndez
cp1256: Fernأںndez
cp1258: FernĂŸndez
gbk: Fern脽ndez
gb18030: Fern脽ndez
latin_1: FernÃndez
iso8859_2: FernĂndez
iso8859_4: FernÃndez
iso8859_5: FernУndez
iso8859_6: Fernأndez
iso8859_7: FernΓndez
iso8859_9: FernÃndez
iso8859_10: FernÃndez
iso8859_13: FernĆndez
iso8859_14: FernÃndez
iso8859_15: FernÃndez
koi8_r: Fernц÷ndez
koi8_u: Fernц÷ndez
mac_cyrillic: Fern√Яndez
mac_greek: FernΟündez
mac_iceland: Fernßndez
mac_latin2: Fernßndez
mac_roman: Fernßndez
mac_turkish: Fernßndez
ptcp154: FernГҹndez
shift_jis: Fernテ殤dez
shift_jis_2004: Fernテ殤dez
shift_jisx0213: Fernテ殤dez
utf_16: 敆湲鿃摮穥
utf_16_be: 䙥牮쎟湤敺
utf_16_le: 敆湲鿃摮穥
utf_8: Fernßndez
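One possible reading of the table above, offered as a guess rather than a diagnosis: the byte 0xE1 is á in Latin-1/cp1252 but ß in cp437 and cp850, so the name may have been written as Latin-1, read back as a DOS code page, and then saved as UTF-8. Under that assumption the damage can be reversed. The sketch below uses Python 3 syntax (bytes literal, print function); the helper names repair and strip_accents are made up for illustration.

```python
import unicodedata

def repair(raw):
    """Undo a suspected Latin-1 -> cp437 -> UTF-8 mix-up (an assumption)."""
    text = raw.decode("utf-8")         # 'Fernßndez'
    original = text.encode("cp437")    # ß goes back to the single byte 0xE1
    return original.decode("latin-1")  # 0xE1 read as Latin-1 is 'á'

def strip_accents(text):
    """Drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

tmp = b"Fern\xc3\x9fndez"
fixed = repair(tmp)
print(fixed)                 # Fernández
print(strip_accents(fixed))  # Fernandez
```

This only works for names whose characters all exist in cp437; anything else raises UnicodeEncodeError in repair, which would be a sign that the guess about the encoding chain is wrong for that row.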

Related

Invalid character "\u64e" in token Pylance

What is the meaning of the Pylance error Invalid character "\u64e" in token? There is a red error line under Acc in this code. How can I fix it?
err = calculateCError()
print('Error is:', err, '%')
َAcc = 100 - err
print('َAccuracy is:',َAcc , '%')
Here's how to debug something like this:
s = """err = calculateCError()
print('Error is:', err, '%')
َAcc = 100 - err
print('َAccuracy is:',َAcc , '%')"""
print([hex(ord(c)) for c in s])
['0x65', '0x72', '0x72', '0x20', '0x3d', '0x20', '0x63', '0x61', '0x6c',
'0x63', '0x75', '0x6c', '0x61', '0x74', '0x65', '0x43', '0x45', '0x72',
'0x72', '0x6f', '0x72', '0x28', '0x29', '0xa', '0x70', '0x72', '0x69',
'0x6e', '0x74', '0x28', '0x27', '0x45', '0x72', '0x72', '0x6f', '0x72',
'0x20', '0x69', '0x73', '0x3a', '0x27', '0x2c', '0x20', '0x65', '0x72',
'0x72', '0x2c', '0x20', '0x27', '0x25', '0x27', '0x29', '0xa', '0x64e',
'0x41', '0x63', '0x63', '0x20', '0x3d', '0x20', '0x31', '0x30', '0x30',
'0x20', '0x2d', '0x20', '0x65', '0x72', '0x72', '0xa', '0x70', '0x72',
'0x69', '0x6e', '0x74', '0x28', '0x27', '0x64e', '0x41', '0x63', '0x63',
'0x75', '0x72', '0x61', '0x63', '0x79', '0x20', '0x69', '0x73', '0x3a',
'0x27', '0x2c', '0x64e', '0x41', '0x63', '0x63', '0x20', '0x2c', '0x20',
'0x27', '0x25', '0x27', '0x29']
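And sure enough, there are three instances of 0x64E, always appearing immediately before 0x41 (A). In fact, if you look carefully at your A characters, you will notice a faint slanted accent line above the A. This is U+064E, called ARABIC FATHA in Unicode. It is a combining mark, and a combining mark cannot start a Python identifier, so the two uses outside the string literal are flagged. Deleting those characters (or retyping the affected lines) should clear the error.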
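The same check can be automated so that only the suspicious characters are reported, together with their position and Unicode name. This is a minimal sketch; the function name find_non_ascii and the two-line snippet are made up for illustration.

```python
import unicodedata

def find_non_ascii(source):
    """Return (line, column, hex code point, Unicode name) for each non-ASCII char."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((lineno, col, hex(ord(ch)), name))
    return hits

# Hypothetical snippet with a stray U+064E in front of "Acc".
snippet = "err = 1\n\u064eAcc = 100 - err\n"
for hit in find_non_ascii(snippet):
    print(hit)  # (2, 1, '0x64e', 'ARABIC FATHA')
```

Legitimate non-ASCII text inside string literals is reported too, so the output is a list of places to look at, not a list of errors.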

Saving all arrays into csv instead of the last set of array only

I have issues getting my data to save properly into my csv. I have a few sets of arrays in x[zpeaks]; however, when I save the data, only the last array is saved and not all of them.
For example, say my x[zpeaks] contains [1,2,1], [1,4,1], [1,3,5]. When I try to save all the arrays to the csv file, only the last array, [1,3,5], is saved.
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
import pdb
import pandas as pd
t = []
z = []
y = []
x = []
with open("Data1r2.txt", 'r') as f:
    for line in f:
        parts = line.split(", ")
        x.append(float(parts[0][2:]))
        y.append(float(parts[1][2:]))
        z.append(float(parts[2][2:]))
        t.append(float(parts[3][2:]))
zz = np.array(z)
tt = np.array(t)
zminvalue = np.min(zz)
zzz = zz - zminvalue
zpeaks, _ = find_peaks(zzz)
for i in range(len(zpeaks)-1):
    print(z[zpeaks[i]:zpeaks[i+1]])
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    pd.concat([pd.DataFrame(a),pd.DataFrame(b), pd.DataFrame(c)], axis=1).to_csv('Diff.csv', mode='w')
My data.txt
X:-241, Y:-31, Z:17, T:73823
X:-241, Y:-31, Z:17, T:73952
X:-240, Y:-30, Z:26, T:74073
X:-240, Y:-30, Z:26, T:74191
X:-240, Y:-30, Z:26, T:74312
X:-240, Y:-32, Z:39, T:74432
X:-240, Y:-32, Z:39, T:74549
X:-240, Y:-32, Z:39, T:74668
X:-239, Y:-21, Z:12, T:74785
X:-239, Y:-21, Z:12, T:74904
X:-239, Y:-21, Z:12, T:75022
X:-246, Y:15, Z:18, T:75142
X:-246, Y:15, Z:18, T:75260
X:-246, Y:15, Z:18, T:75378
X:-250, Y:19, Z:14, T:75498
X:-250, Y:19, Z:14, T:75615
X:-250, Y:19, Z:14, T:75732
X:-239, Y:-5, Z:27, T:75854
X:-239, Y:-5, Z:27, T:75972
X:-239, Y:-5, Z:27, T:76102
X:-236, Y:-19, Z:46, T:76240
X:-236, Y:-19, Z:46, T:76369
X:-236, Y:-19, Z:46, T:76489
X:-235, Y:-14, Z:32, T:76610
X:-235, Y:-14, Z:32, T:76727
X:-235, Y:-14, Z:32, T:76845
X:-244, Y:-16, Z:22, T:76963
X:-244, Y:-16, Z:22, T:77081
X:-244, Y:-16, Z:22, T:77201
X:-220, Y:-25, Z:-3, T:77346
X:-220, Y:-25, Z:-3, T:77464
X:-220, Y:-25, Z:-3, T:77580
X:-229, Y:24, Z:2, T:77699
X:-229, Y:24, Z:2, T:77814
X:-229, Y:24, Z:2, T:77934
X:-248, Y:-20, Z:0, T:78052
X:-248, Y:-20, Z:0, T:78171
X:-248, Y:-20, Z:0, T:78288
X:-242, Y:-15, Z:-35, T:78515
X:-242, Y:-15, Z:-35, T:78630
X:-242, Y:-15, Z:-35, T:78747
X:-235, Y:-12, Z:-63, T:78865
X:-235, Y:-12, Z:-63, T:78982
X:-235, Y:-12, Z:-63, T:79102
X:-226, Y:-35, Z:-145, T:79221
X:-226, Y:-35, Z:-145, T:79340
X:-226, Y:-35, Z:-145, T:79461
X:-205, Y:-47, Z:-156, T:79582
X:-205, Y:-47, Z:-156, T:79702
X:-205, Y:-47, Z:-156, T:79821
X:-208, Y:-39, Z:-149, T:79940
X:-208, Y:-39, Z:-149, T:80061
X:-208, Y:-39, Z:-149, T:80181
X:-235, Y:-16, Z:-99, T:80304
X:-235, Y:-16, Z:-99, T:80432
X:-235, Y:-16, Z:-99, T:80657
X:-247, Y:-10, Z:12, T:80774
X:-247, Y:-10, Z:12, T:80890
X:-247, Y:-10, Z:12, T:81008
X:-242, Y:-1, Z:2, T:81127
X:-242, Y:-1, Z:2, T:81246
X:-242, Y:-1, Z:2, T:81363
X:-239, Y:-8, Z:15, T:81483
X:-239, Y:-8, Z:15, T:81600
X:-239, Y:-8, Z:15, T:81720
X:-241, Y:-13, Z:-11, T:81841
X:-241, Y:-13, Z:-11, T:81958
X:-241, Y:-13, Z:-11, T:82076
X:-242, Y:-5, Z:-37, T:82198
X:-242, Y:-5, Z:-37, T:82315
X:-242, Y:-5, Z:-37, T:82435
X:-215, Y:-43, Z:-128, T:82554
X:-215, Y:-43, Z:-128, T:82699
X:-215, Y:-43, Z:-128, T:82829
X:-207, Y:-48, Z:-153, T:82952
X:-207, Y:-48, Z:-153, T:83072
X:-207, Y:-48, Z:-153, T:83191
X:-198, Y:-37, Z:-166, T:83315
X:-198, Y:-37, Z:-166, T:83453
X:-198, Y:-37, Z:-166, T:83572
X:-218, Y:-33, Z:-134, T:83694
X:-218, Y:-33, Z:-134, T:83812
X:-218, Y:-33, Z:-134, T:83932
X:-228, Y:-15, Z:-80, T:84047
X:-228, Y:-15, Z:-80, T:84166
X:-228, Y:-15, Z:-80, T:84288
X:-243, Y:-8, Z:-4, T:84407
X:-243, Y:-8, Z:-4, T:84524
X:-243, Y:-8, Z:-4, T:84640
X:-238, Y:-4, Z:2, T:84756
X:-238, Y:-4, Z:2, T:84872
X:-238, Y:-4, Z:2, T:84994
X:-252, Y:-7, Z:-16, T:85136
X:-252, Y:-7, Z:-16, T:85265
X:-252, Y:-7, Z:-16, T:85385
X:-243, Y:-3, Z:-28, T:85504
X:-243, Y:-3, Z:-28, T:85618
X:-243, Y:-3, Z:-28, T:85739
X:-241, Y:-3, Z:-48, T:85858
X:-241, Y:-3, Z:-48, T:85975
X:-241, Y:-3, Z:-48, T:86094
X:-231, Y:-15, Z:-112, T:86216
X:-231, Y:-15, Z:-112, T:86334
X:-231, Y:-15, Z:-112, T:86453
X:-210, Y:-43, Z:-150, T:86573
X:-210, Y:-43, Z:-150, T:86691
X:-210, Y:-43, Z:-150, T:86811
X:-193, Y:-58, Z:-169, T:86933
X:-193, Y:-58, Z:-169, T:87051
X:-193, Y:-58, Z:-169, T:87171
X:-182, Y:-27, Z:-179, T:87305
X:-182, Y:-27, Z:-179, T:87435
X:-182, Y:-27, Z:-179, T:87566
X:-212, Y:-19, Z:-136, T:87686
X:-212, Y:-19, Z:-136, T:87803
X:-212, Y:-19, Z:-136, T:87920
X:-233, Y:-25, Z:-83, T:88040
X:-233, Y:-25, Z:-83, T:88160
X:-233, Y:-25, Z:-83, T:88278
X:-243, Y:-16, Z:-31, T:88396
X:-243, Y:-16, Z:-31, T:88510
X:-243, Y:-16, Z:-31, T:88625
X:-244, Y:-13, Z:-27, T:88744
X:-244, Y:-13, Z:-27, T:88860
X:-244, Y:-13, Z:-27, T:88978
X:-243, Y:-15, Z:-51, T:89099
X:-243, Y:-15, Z:-51, T:89218
X:-243, Y:-15, Z:-51, T:89338
X:-228, Y:-27, Z:-78, T:89472
X:-228, Y:-27, Z:-78, T:89601
X:-228, Y:-27, Z:-78, T:89746
X:-223, Y:-24, Z:-114, T:89876
X:-223, Y:-24, Z:-114, T:89995
X:-223, Y:-24, Z:-114, T:90115
X:-205, Y:-42, Z:-141, T:90236
X:-205, Y:-42, Z:-141, T:90354
X:-205, Y:-42, Z:-141, T:90474
X:-199, Y:-67, Z:-153, T:90595
X:-199, Y:-67, Z:-153, T:90713
X:-199, Y:-67, Z:-153, T:90833
X:-202, Y:-53, Z:-152, T:90951
X:-202, Y:-53, Z:-152, T:91069
X:-202, Y:-53, Z:-152, T:91191
X:-224, Y:-41, Z:-135, T:91312
X:-224, Y:-41, Z:-135, T:91431
X:-224, Y:-41, Z:-135, T:91549
X:-229, Y:-29, Z:-91, T:91669
X:-229, Y:-29, Z:-91, T:91789
X:-229, Y:-29, Z:-91, T:91923
X:-242, Y:-8, Z:-2, T:92066
X:-242, Y:-8, Z:-2, T:92184
X:-242, Y:-8, Z:-2, T:92302
X:-233, Y:-12, Z:-5, T:92420
X:-233, Y:-12, Z:-5, T:92534
X:-233, Y:-12, Z:-5, T:92654
X:-246, Y:-1, Z:-4, T:92773
X:-246, Y:-1, Z:-4, T:92892
X:-246, Y:-1, Z:-4, T:93010
X:-242, Y:-9, Z:-23, T:93130
X:-242, Y:-9, Z:-23, T:93251
X:-242, Y:-9, Z:-23, T:93370
X:-237, Y:-19, Z:-46, T:93491
X:-237, Y:-19, Z:-46, T:93608
X:-237, Y:-19, Z:-46, T:93727
X:-213, Y:-23, Z:-95, T:93849
X:-213, Y:-23, Z:-95, T:93966
X:-213, Y:-23, Z:-95, T:94112
X:-207, Y:-36, Z:-151, T:94241
X:-207, Y:-36, Z:-151, T:94359
X:-207, Y:-36, Z:-151, T:94480
X:-199, Y:-49, Z:-162, T:94600
X:-199, Y:-49, Z:-162, T:94721
X:-199, Y:-49, Z:-162, T:94840
X:-203, Y:-36, Z:-146, T:94961
X:-203, Y:-36, Z:-146, T:95082
X:-203, Y:-36, Z:-146, T:95202
X:-222, Y:-28, Z:-124, T:95324
X:-222, Y:-28, Z:-124, T:95439
X:-222, Y:-28, Z:-124, T:95583
X:-244, Y:2, Z:-53, T:95700
X:-244, Y:2, Z:-53, T:95817
X:-244, Y:2, Z:-53, T:95935
X:-237, Y:-5, Z:-9, T:96055
X:-237, Y:-5, Z:-9, T:96171
X:-237, Y:-5, Z:-9, T:96301
X:-239, Y:-2, Z:1, T:96439
X:-239, Y:-2, Z:1, T:96568
X:-239, Y:-2, Z:1, T:96685
X:-243, Y:-4, Z:2, T:96805
X:-243, Y:-4, Z:2, T:96919
X:-243, Y:-4, Z:2, T:97037
X:-246, Y:-3, Z:-16, T:97159
X:-246, Y:-3, Z:-16, T:97276
X:-246, Y:-3, Z:-16, T:97395
X:-239, Y:-8, Z:-42, T:97513
X:-239, Y:-8, Z:-42, T:97631
X:-239, Y:-8, Z:-42, T:97752
X:-221, Y:-10, Z:-115, T:97871
X:-221, Y:-10, Z:-115, T:97990
X:-221, Y:-10, Z:-115, T:98109
X:-219, Y:-25, Z:-145, T:98230
X:-219, Y:-25, Z:-145, T:98350
X:-219, Y:-25, Z:-145, T:98468
X:-202, Y:-31, Z:-172, T:98589
X:-202, Y:-31, Z:-172, T:98736
X:-202, Y:-31, Z:-172, T:98865
X:-214, Y:-34, Z:-144, T:98985
X:-214, Y:-34, Z:-144, T:99101
X:-214, Y:-34, Z:-144, T:99223
X:-224, Y:-24, Z:-116, T:99342
X:-224, Y:-24, Z:-116, T:99460
X:-224, Y:-24, Z:-116, T:99579
X:-232, Y:2, Z:-50, T:99699
X:-232, Y:2, Z:-50, T:99818
X:-232, Y:2, Z:-50, T:99936
X:-241, Y:-4, Z:-22, T:100056
X:-241, Y:-4, Z:-22, T:100175
X:-241, Y:-4, Z:-22, T:100293
X:-240, Y:4, Z:-2, T:100414
X:-240, Y:4, Z:-2, T:100532
X:-240, Y:4, Z:-2, T:100648
X:-241, Y:3, Z:1, T:100768
X:-241, Y:3, Z:1, T:100895
X:-241, Y:3, Z:1, T:101029
X:-243, Y:1, Z:-16, T:101160
X:-243, Y:1, Z:-16, T:101278
X:-243, Y:1, Z:-16, T:101399
X:-239, Y:-2, Z:-36, T:101518
X:-239, Y:-2, Z:-36, T:101661
X:-239, Y:-2, Z:-36, T:101780
X:-228, Y:-12, Z:-71, T:101901
X:-228, Y:-12, Z:-71, T:102019
X:-228, Y:-12, Z:-71, T:102138
X:-224, Y:-23, Z:-118, T:102260
X:-224, Y:-23, Z:-118, T:102378
X:-224, Y:-23, Z:-118, T:102498
X:-209, Y:-2, Z:-161, T:102617
X:-209, Y:-2, Z:-161, T:102735
X:-209, Y:-2, Z:-161, T:102855
X:-206, Y:-3, Z:-150, T:102974
X:-206, Y:-3, Z:-150, T:103088
X:-206, Y:-3, Z:-150, T:103216
X:-218, Y:0, Z:-142, T:103355
X:-218, Y:0, Z:-142, T:103469
X:-218, Y:0, Z:-142, T:103581
X:-226, Y:-17, Z:-118, T:103700
X:-226, Y:-17, Z:-118, T:103814
X:-226, Y:-17, Z:-118, T:103931
X:-242, Y:4, Z:-40, T:104054
X:-242, Y:4, Z:-40, T:104171
X:-242, Y:4, Z:-40, T:104292
X:-242, Y:4, Z:-22, T:104410
X:-242, Y:4, Z:-22, T:104523
X:-242, Y:4, Z:-22, T:104642
X:-240, Y:5, Z:-3, T:104762
X:-240, Y:5, Z:-3, T:104879
X:-240, Y:5, Z:-3, T:104993
X:-244, Y:-2, Z:-6, T:105111
X:-244, Y:-2, Z:-6, T:105231
X:-244, Y:-2, Z:-6, T:105361
X:-244, Y:1, Z:-10, T:105497
X:-244, Y:1, Z:-10, T:105623
X:-244, Y:1, Z:-10, T:105744
X:-244, Y:-4, Z:-34, T:105865
X:-244, Y:-4, Z:-34, T:105981
X:-244, Y:-4, Z:-34, T:106101
X:-231, Y:-1, Z:-63, T:106222
X:-231, Y:-1, Z:-63, T:106341
X:-231, Y:-1, Z:-63, T:106462
X:-222, Y:-11, Z:-116, T:106580
X:-222, Y:-11, Z:-116, T:106698
X:-222, Y:-11, Z:-116, T:106818
X:-219, Y:-15, Z:-144, T:106938
X:-219, Y:-15, Z:-144, T:107058
X:-219, Y:-15, Z:-144, T:107174
X:-204, Y:-6, Z:-150, T:107297
X:-204, Y:-6, Z:-150, T:107410
X:-204, Y:-6, Z:-150, T:107528
X:-196, Y:-5, Z:-163, T:107665
X:-196, Y:-5, Z:-163, T:107802
X:-196, Y:-5, Z:-163, T:107935
X:-214, Y:-2, Z:-153, T:108066
X:-214, Y:-2, Z:-153, T:108186
X:-214, Y:-2, Z:-153, T:108306
X:-223, Y:-12, Z:-123, T:108422
X:-223, Y:-12, Z:-123, T:108544
X:-223, Y:-12, Z:-123, T:108661
X:-230, Y:7, Z:-52, T:108783
X:-230, Y:7, Z:-52, T:108900
X:-230, Y:7, Z:-52, T:109019
X:-241, Y:9, Z:-25, T:109139
X:-241, Y:9, Z:-25, T:109258
X:-241, Y:9, Z:-25, T:109375
X:-245, Y:4, Z:-12, T:109496
X:-245, Y:4, Z:-12, T:109612
X:-245, Y:4, Z:-12, T:109732
X:-242, Y:3, Z:-6, T:109852
X:-242, Y:3, Z:-6, T:109968
X:-242, Y:3, Z:-6, T:110098
X:-239, Y:-4, Z:-35, T:110243
X:-239, Y:-4, Z:-35, T:110362
X:-239, Y:-4, Z:-35, T:110484
X:-235, Y:6, Z:-65, T:110606
X:-235, Y:6, Z:-65, T:110722
X:-235, Y:6, Z:-65, T:110840
X:-215, Y:-14, Z:-117, T:110962
X:-215, Y:-14, Z:-117, T:111081
X:-215, Y:-14, Z:-117, T:111204
X:-224, Y:7, Z:-146, T:111324
X:-224, Y:7, Z:-146, T:111441
X:-224, Y:7, Z:-146, T:111561
X:-209, Y:-6, Z:-149, T:111679
X:-209, Y:-6, Z:-149, T:111799
X:-209, Y:-6, Z:-149, T:111919
X:-219, Y:-8, Z:-140, T:112038
X:-219, Y:-8, Z:-140, T:112157
X:-219, Y:-8, Z:-140, T:112274
X:-226, Y:-3, Z:-116, T:112405
X:-226, Y:-3, Z:-116, T:112540
X:-226, Y:-3, Z:-116, T:112669
X:-233, Y:2, Z:-76, T:112792
X:-233, Y:2, Z:-76, T:112909
X:-233, Y:2, Z:-76, T:113028
X:-237, Y:7, Z:-35, T:113148
X:-237, Y:7, Z:-35, T:113266
X:-237, Y:7, Z:-35, T:113386
X:-242, Y:5, Z:-15, T:113504
X:-242, Y:5, Z:-15, T:113624
X:-242, Y:5, Z:-15, T:113764
X:-244, Y:5, Z:-3, T:113884
X:-244, Y:5, Z:-3, T:113999
X:-244, Y:5, Z:-3, T:114118
X:-242, Y:3, Z:-7, T:114239
X:-242, Y:3, Z:-7, T:114357
X:-242, Y:3, Z:-7, T:114473
X:-241, Y:0, Z:-30, T:114595
X:-241, Y:0, Z:-30, T:114720
X:-241, Y:0, Z:-30, T:114867
X:-227, Y:-13, Z:-95, T:114989
X:-227, Y:-13, Z:-95, T:115104
X:-227, Y:-13, Z:-95, T:115224
X:-212, Y:-5, Z:-114, T:115343
X:-212, Y:-5, Z:-114, T:115462
X:-212, Y:-5, Z:-114, T:115579
X:-215, Y:-6, Z:-145, T:115701
X:-215, Y:-6, Z:-145, T:115819
X:-215, Y:-6, Z:-145, T:115937
X:-210, Y:5, Z:-142, T:116059
X:-210, Y:5, Z:-142, T:116176
X:-210, Y:5, Z:-142, T:116296
X:-222, Y:-19, Z:-145, T:116415
X:-222, Y:-19, Z:-145, T:116534
X:-222, Y:-19, Z:-145, T:116655
X:-231, Y:6, Z:-119, T:116775
X:-231, Y:6, Z:-119, T:116894
X:-231, Y:6, Z:-119, T:117023
The issue is that since your .to_csv call is within the loop, 'Diff.csv' is being overwritten every time. Only the last time it's written is what you end up seeing.
There are a few solutions.
.to_csv(mode='a')
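This uses append mode, so each call adds to the file instead of overwriting it. You will also want to specify header=None so that the header row isn't written again in the middle of the file. If you want, you can write the header once before the loop. Note that in append mode a 'Diff.csv' left over from an earlier run is kept, so delete it before running the script again.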
for i in range(len(zpeaks)-1):
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    pd.concat([pd.DataFrame(a),
               pd.DataFrame(b),
               pd.DataFrame(c)], axis=1).to_csv('Diff.csv', mode='a', header=None)
Create a list, concat after the loop
Add your DataFrames to a list within the loop, then concatenate when the loop finishes and save the full DataFrame to a file at once.
l = []
for i in range(len(zpeaks)-1):
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    l.append(pd.concat([pd.DataFrame(a),pd.DataFrame(b), pd.DataFrame(c)], axis=1))
#pd.concat(l).to_csv('Diff.csv') # No column names
pd.concat(l).rename(columns=lambda x, y=iter(['x', 'y', 'z']): next(y)).to_csv('Diff.csv')
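The "open once, write every slice" idea does not depend on pandas. Here is a rough sketch of it using only the standard library csv module. The function name write_segments, the extra segment column, and the toy lists are all assumptions made for illustration; in the real script x, y, z come from the parsed file and peaks from find_peaks.

```python
import csv
import io

def write_segments(x, y, z, peaks, out):
    """Write every slice between consecutive peaks to `out` in a single pass."""
    writer = csv.writer(out)
    writer.writerow(["segment", "x", "y", "z"])  # header is written exactly once
    for seg, (start, stop) in enumerate(zip(peaks[:-1], peaks[1:])):
        for row in zip(x[start:stop], y[start:stop], z[start:stop]):
            writer.writerow([seg, *row])

# Hypothetical toy data standing in for the parsed file and the peak indices.
x = [1, 2, 1, 1, 4, 1, 1, 3, 5]
y = [10, 20, 10, 10, 40, 10, 10, 30, 50]
z = [0, 5, 0, 0, 6, 0, 0, 7, 0]
peaks = [1, 4, 7]

buf = io.StringIO()  # stands in for open('Diff.csv', 'w', newline='')
write_segments(x, y, z, peaks, buf)
print(buf.getvalue())
```

Because the output is opened once and the loop runs inside it, nothing gets overwritten, and the segment column records which peak-to-peak slice each row came from.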
I'm not entirely sure, since I myself am still learning Python, but it looks like it may be due to your indentation at the end.
It should look like this:
for i in range(len(zpeaks)-1):
    print(z[zpeaks[i]:zpeaks[i+1]])
    a = (x[zpeaks[i]:zpeaks[i+1]])
    b = (y[zpeaks[i]:zpeaks[i+1]])
    c = (z[zpeaks[i]:zpeaks[i+1]])
    # mode='a' appends; mode='w' here would overwrite the file on every pass
    pd.concat([pd.DataFrame(a),pd.DataFrame(b), pd.DataFrame(c)], axis=1).to_csv('Diff.csv', mode='a', header=False)
The last line should be within the for loop, and it needs append mode, so that each slice is added to the csv instead of replacing it.

Python / Get unique tokens from a file with a exception

I want to find the number of unique tokens in a file. For this purpose I wrote the below code:
splittedWords = open('output.txt', encoding='windows-1252').read().lower().split()
uniqueValues = set(splittedWords)
print(uniqueValues)
The output.txt file is like this:
Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl
club+Noun toplanti+Noun+A3pl+P3sg
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc
nispi+Adj
nisbi+Adj
görece+Adj+With
izafi+Adj
obur+Adj
With this code I can get unique tokens like Türkiye+Noun and Türkiye+Noun+Gen. But I want only the part before the first + sign, for example just Türkiye. In the end, Türkiye+Noun and Türkiye+Noun+Gen need to be treated as the same single unique token. I think I need to write a regex for this purpose.
It seems the word you want is always the 1st in a list of '+'-joined words:
Split each of the split words at + and take the 0th part:
text = """Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl
club+Noun toplanti+Noun+A3pl+P3sg
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc
nispi+Adj
nisbi+Adj
görece+Adj+With
izafi+Adj
obur+Adj """
splittedWords = text.lower().replace("\n"," ").split()
uniqueValues = set( ( s.split("+")[0] for s in splittedWords))
print(uniqueValues)
Output:
{'imha', 'çaba', 'ülke', 'arzula', 'terörizm', 'olus', 'daha', 'istikrar', 'küresel',
'sagla', 'önle', 'üzere', 'nisbi', 'türkiye', 'gelis', 'bir', 'karar', 'hedef', '2',
've', 'silah', 'kur', 'alan', 'club', 'boyut', '-', 'anlasma', 'iliski',
'izafi', 'kurumsal', 'karsi', 'ankara', 'ortaklik', 'obur', 'kitle', 'güven',
'uygula', 'ol', 'düzey', 'konsey', 'teknik', 'rejim', 'komite', 'gümrük', 'samimi',
'gel', 'yay', 'toplanti', '.', 'asama', 'mahiyet', 'ab', '69', 'için',
'paylas', '6', '/', 'nispi', 'dünya', 'at', 'sayili', 'görece', 'isbirlik', 'birlik',
',', 'tüm', 'ile', 'düzen', 'uyar', 'göster', 'tehdit', 'madde'}
You might need to do some additional cleanup to remove things like
',' '6' '/'
Split and remove anything that's just numbers or punctuation:
from string import digits, punctuation
remove=set(digits+punctuation)
splittedWords = text.lower().split()
uniqueValues = set( ( s.split("+")[0] for s in splittedWords))
# remove from set anything that only consists of numbers or punctuation
uniqueValues = uniqueValues - set ( x for x in uniqueValues if all(c in remove for c in x))
print(uniqueValues)
to get it as:
{'teknik', 'yay', 'göster','hedef', 'terörizm', 'ortaklik','ile', 'daha', 'ol', 'istikrar',
'paylas', 'nispi', 'üzere', 'sagla', 'tüm', 'önle', 'asama', 'uygula', 'güven', 'kur',
'türkiye', 'gel', 'dünya', 'gelis', 'sayili', 'ab', 'club', 'küresel', 'imha', 'çaba',
'olus', 'iliski', 'izafi', 'mahiyet', 've', 'düzey', 'anlasma', 'tehdit', 'bir', 'düzen',
'obur', 'samimi', 'boyut', 'ülke', 'arzula', 'rejim', 'gümrük', 'karar', 'at', 'karsi',
'nisbi', 'isbirlik', 'alan', 'toplanti', 'ankara', 'birlik', 'kurumsal', 'için', 'kitle',
'komite', 'silah', 'görece', 'uyar', 'madde', 'konsey'}
You can split all the tokens you have now on "+" and take only the first one.
uniqueValues = set(map(lambda x: x.split('+')[0], splittedWords))
Here I use map. Map will apply the function (the lambda part) on all values of the splittedWords.
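Since the question asks for the number of unique tokens, the approaches above can be wrapped in a small function and the resulting set counted. This is a minimal sketch; the name unique_stems and the three-token sample are made up for illustration, and str.partition is used in place of split('+')[0], which gives the same result here.

```python
def unique_stems(text):
    """Return the set of lowercase stems, i.e. the part before the first '+'."""
    return {token.partition("+")[0] for token in text.lower().split()}

# Hypothetical three-token sample in the same format as output.txt.
sample = "Türkiye+Noun ,+Punc Türkiye+Noun+Gen"
stems = unique_stems(sample)
print(len(stems), sorted(stems))  # 2 [',', 'türkiye']
```

For the real file, the text would come from open('output.txt', encoding='windows-1252').read(), as in the question. No regex is needed because the stem is always everything before the first +.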

How do i annotate all peak values

import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks
import pdb
file = open("Data1r2.txt", 'r')
lines = file.readlines()
file.close()
t = []
z = []
y = []
x = []
with open("Data1r2.txt", 'r') as f:
    for line in f:
        parts = line.split(", ")
        x.append(float(parts[0][2:]))
        y.append(float(parts[1][2:]))
        z.append(float(parts[2][2:]))
        t.append(float(parts[3][2:]))
This part mainly annotates the highest peak value of the graph, but how can I annotate all peak values at a fixed distance, say distance = 10,000?
fig = plt.figure()
ax = fig.add_subplot(111)
line, = ax.plot(t, z)
ymax = max(z)
xpos = z.index(ymax)
xmax = t[xpos]
text= "x={:.1f}, y={:.1f}".format(xmax, ymax) #Annotation(correct)
ax.annotate(text, xy=(xmax, ymax), xytext=(xmax, ymax),
            arrowprops=dict(facecolor='black', shrink=0.05),
            )
plt.legend()
plt.show()
This is what I currently have; it annotates only the peak value.
Data
X:-241, Y:-31, Z:17, T:73823
X:-241, Y:-31, Z:17, T:73952
X:-240, Y:-30, Z:26, T:74073
X:-240, Y:-30, Z:26, T:74191
X:-240, Y:-30, Z:26, T:74312
X:-240, Y:-32, Z:39, T:74432
X:-240, Y:-32, Z:39, T:74549
X:-240, Y:-32, Z:39, T:74668
X:-239, Y:-21, Z:12, T:74785
X:-239, Y:-21, Z:12, T:74904
X:-239, Y:-21, Z:12, T:75022
X:-246, Y:15, Z:18, T:75142
X:-246, Y:15, Z:18, T:75260
X:-246, Y:15, Z:18, T:75378
X:-250, Y:19, Z:14, T:75498
X:-250, Y:19, Z:14, T:75615
X:-250, Y:19, Z:14, T:75732
X:-239, Y:-5, Z:27, T:75854
X:-239, Y:-5, Z:27, T:75972
X:-239, Y:-5, Z:27, T:76102
X:-236, Y:-19, Z:46, T:76240
X:-236, Y:-19, Z:46, T:76369
X:-236, Y:-19, Z:46, T:76489
X:-235, Y:-14, Z:32, T:76610
X:-235, Y:-14, Z:32, T:76727
X:-235, Y:-14, Z:32, T:76845
X:-244, Y:-16, Z:22, T:76963
X:-244, Y:-16, Z:22, T:77081
X:-244, Y:-16, Z:22, T:77201
X:-220, Y:-25, Z:-3, T:77346
X:-220, Y:-25, Z:-3, T:77464
X:-220, Y:-25, Z:-3, T:77580
X:-229, Y:24, Z:2, T:77699
X:-229, Y:24, Z:2, T:77814
X:-229, Y:24, Z:2, T:77934
X:-248, Y:-20, Z:0, T:78052
X:-248, Y:-20, Z:0, T:78171
X:-248, Y:-20, Z:0, T:78288
X:-242, Y:-15, Z:-35, T:78515
X:-242, Y:-15, Z:-35, T:78630
X:-242, Y:-15, Z:-35, T:78747
X:-235, Y:-12, Z:-63, T:78865
X:-235, Y:-12, Z:-63, T:78982
X:-235, Y:-12, Z:-63, T:79102
X:-226, Y:-35, Z:-145, T:79221
X:-226, Y:-35, Z:-145, T:79340
X:-226, Y:-35, Z:-145, T:79461
X:-205, Y:-47, Z:-156, T:79582
X:-205, Y:-47, Z:-156, T:79702
X:-205, Y:-47, Z:-156, T:79821
X:-208, Y:-39, Z:-149, T:79940
X:-208, Y:-39, Z:-149, T:80061
X:-208, Y:-39, Z:-149, T:80181
X:-235, Y:-16, Z:-99, T:80304
X:-235, Y:-16, Z:-99, T:80432
X:-235, Y:-16, Z:-99, T:80657
X:-247, Y:-10, Z:12, T:80774
X:-247, Y:-10, Z:12, T:80890
X:-247, Y:-10, Z:12, T:81008
X:-242, Y:-1, Z:2, T:81127
X:-242, Y:-1, Z:2, T:81246
X:-242, Y:-1, Z:2, T:81363
X:-239, Y:-8, Z:15, T:81483
X:-239, Y:-8, Z:15, T:81600
X:-239, Y:-8, Z:15, T:81720
X:-241, Y:-13, Z:-11, T:81841
X:-241, Y:-13, Z:-11, T:81958
X:-241, Y:-13, Z:-11, T:82076
X:-242, Y:-5, Z:-37, T:82198
X:-242, Y:-5, Z:-37, T:82315
X:-242, Y:-5, Z:-37, T:82435
X:-215, Y:-43, Z:-128, T:82554
X:-215, Y:-43, Z:-128, T:82699
X:-215, Y:-43, Z:-128, T:82829
X:-207, Y:-48, Z:-153, T:82952
X:-207, Y:-48, Z:-153, T:83072
X:-207, Y:-48, Z:-153, T:83191
X:-198, Y:-37, Z:-166, T:83315
X:-198, Y:-37, Z:-166, T:83453
X:-198, Y:-37, Z:-166, T:83572
X:-218, Y:-33, Z:-134, T:83694
X:-218, Y:-33, Z:-134, T:83812
X:-218, Y:-33, Z:-134, T:83932
X:-228, Y:-15, Z:-80, T:84047
X:-228, Y:-15, Z:-80, T:84166
X:-228, Y:-15, Z:-80, T:84288
X:-243, Y:-8, Z:-4, T:84407
X:-243, Y:-8, Z:-4, T:84524
X:-243, Y:-8, Z:-4, T:84640
X:-238, Y:-4, Z:2, T:84756
X:-238, Y:-4, Z:2, T:84872
X:-238, Y:-4, Z:2, T:84994
X:-252, Y:-7, Z:-16, T:85136
X:-252, Y:-7, Z:-16, T:85265
X:-252, Y:-7, Z:-16, T:85385
X:-243, Y:-3, Z:-28, T:85504
X:-243, Y:-3, Z:-28, T:85618
X:-243, Y:-3, Z:-28, T:85739
X:-241, Y:-3, Z:-48, T:85858
X:-241, Y:-3, Z:-48, T:85975
X:-241, Y:-3, Z:-48, T:86094
X:-231, Y:-15, Z:-112, T:86216
X:-231, Y:-15, Z:-112, T:86334
X:-231, Y:-15, Z:-112, T:86453
X:-210, Y:-43, Z:-150, T:86573
X:-210, Y:-43, Z:-150, T:86691
X:-210, Y:-43, Z:-150, T:86811
X:-193, Y:-58, Z:-169, T:86933
X:-193, Y:-58, Z:-169, T:87051
X:-193, Y:-58, Z:-169, T:87171
X:-182, Y:-27, Z:-179, T:87305
X:-182, Y:-27, Z:-179, T:87435
X:-182, Y:-27, Z:-179, T:87566
X:-212, Y:-19, Z:-136, T:87686
X:-212, Y:-19, Z:-136, T:87803
X:-212, Y:-19, Z:-136, T:87920
X:-233, Y:-25, Z:-83, T:88040
X:-233, Y:-25, Z:-83, T:88160
X:-233, Y:-25, Z:-83, T:88278
X:-243, Y:-16, Z:-31, T:88396
X:-243, Y:-16, Z:-31, T:88510
X:-243, Y:-16, Z:-31, T:88625
X:-244, Y:-13, Z:-27, T:88744
X:-244, Y:-13, Z:-27, T:88860
X:-244, Y:-13, Z:-27, T:88978
X:-243, Y:-15, Z:-51, T:89099
X:-243, Y:-15, Z:-51, T:89218
X:-243, Y:-15, Z:-51, T:89338
X:-228, Y:-27, Z:-78, T:89472
X:-228, Y:-27, Z:-78, T:89601
X:-228, Y:-27, Z:-78, T:89746
X:-223, Y:-24, Z:-114, T:89876
X:-223, Y:-24, Z:-114, T:89995
X:-223, Y:-24, Z:-114, T:90115
X:-205, Y:-42, Z:-141, T:90236
X:-205, Y:-42, Z:-141, T:90354
X:-205, Y:-42, Z:-141, T:90474
X:-199, Y:-67, Z:-153, T:90595
X:-199, Y:-67, Z:-153, T:90713
X:-199, Y:-67, Z:-153, T:90833
X:-202, Y:-53, Z:-152, T:90951
X:-202, Y:-53, Z:-152, T:91069
X:-202, Y:-53, Z:-152, T:91191
X:-224, Y:-41, Z:-135, T:91312
X:-224, Y:-41, Z:-135, T:91431
X:-224, Y:-41, Z:-135, T:91549
X:-229, Y:-29, Z:-91, T:91669
X:-229, Y:-29, Z:-91, T:91789
X:-229, Y:-29, Z:-91, T:91923
X:-242, Y:-8, Z:-2, T:92066
X:-242, Y:-8, Z:-2, T:92184
X:-242, Y:-8, Z:-2, T:92302
X:-233, Y:-12, Z:-5, T:92420
X:-233, Y:-12, Z:-5, T:92534
X:-233, Y:-12, Z:-5, T:92654
X:-246, Y:-1, Z:-4, T:92773
X:-246, Y:-1, Z:-4, T:92892
X:-246, Y:-1, Z:-4, T:93010
X:-242, Y:-9, Z:-23, T:93130
X:-242, Y:-9, Z:-23, T:93251
X:-242, Y:-9, Z:-23, T:93370
X:-237, Y:-19, Z:-46, T:93491
X:-237, Y:-19, Z:-46, T:93608
X:-237, Y:-19, Z:-46, T:93727
X:-213, Y:-23, Z:-95, T:93849
X:-213, Y:-23, Z:-95, T:93966
X:-213, Y:-23, Z:-95, T:94112
X:-207, Y:-36, Z:-151, T:94241
X:-207, Y:-36, Z:-151, T:94359
X:-207, Y:-36, Z:-151, T:94480
X:-199, Y:-49, Z:-162, T:94600
X:-199, Y:-49, Z:-162, T:94721
X:-199, Y:-49, Z:-162, T:94840
X:-203, Y:-36, Z:-146, T:94961
X:-203, Y:-36, Z:-146, T:95082
X:-203, Y:-36, Z:-146, T:95202
X:-222, Y:-28, Z:-124, T:95324
X:-222, Y:-28, Z:-124, T:95439
X:-222, Y:-28, Z:-124, T:95583
X:-244, Y:2, Z:-53, T:95700
X:-244, Y:2, Z:-53, T:95817
X:-244, Y:2, Z:-53, T:95935
X:-237, Y:-5, Z:-9, T:96055
X:-237, Y:-5, Z:-9, T:96171
X:-237, Y:-5, Z:-9, T:96301
X:-239, Y:-2, Z:1, T:96439
X:-239, Y:-2, Z:1, T:96568
X:-239, Y:-2, Z:1, T:96685
X:-243, Y:-4, Z:2, T:96805
X:-243, Y:-4, Z:2, T:96919
X:-243, Y:-4, Z:2, T:97037
X:-246, Y:-3, Z:-16, T:97159
X:-246, Y:-3, Z:-16, T:97276
X:-246, Y:-3, Z:-16, T:97395
X:-239, Y:-8, Z:-42, T:97513
X:-239, Y:-8, Z:-42, T:97631
X:-239, Y:-8, Z:-42, T:97752
X:-221, Y:-10, Z:-115, T:97871
X:-221, Y:-10, Z:-115, T:97990
X:-221, Y:-10, Z:-115, T:98109
X:-219, Y:-25, Z:-145, T:98230
X:-219, Y:-25, Z:-145, T:98350
X:-219, Y:-25, Z:-145, T:98468
X:-202, Y:-31, Z:-172, T:98589
X:-202, Y:-31, Z:-172, T:98736
X:-202, Y:-31, Z:-172, T:98865
X:-214, Y:-34, Z:-144, T:98985
X:-214, Y:-34, Z:-144, T:99101
X:-214, Y:-34, Z:-144, T:99223
X:-224, Y:-24, Z:-116, T:99342
X:-224, Y:-24, Z:-116, T:99460
X:-224, Y:-24, Z:-116, T:99579
X:-232, Y:2, Z:-50, T:99699
X:-232, Y:2, Z:-50, T:99818
X:-232, Y:2, Z:-50, T:99936
X:-241, Y:-4, Z:-22, T:100056
X:-241, Y:-4, Z:-22, T:100175
X:-241, Y:-4, Z:-22, T:100293
X:-240, Y:4, Z:-2, T:100414
X:-240, Y:4, Z:-2, T:100532
X:-240, Y:4, Z:-2, T:100648
X:-241, Y:3, Z:1, T:100768
X:-241, Y:3, Z:1, T:100895
X:-241, Y:3, Z:1, T:101029
X:-243, Y:1, Z:-16, T:101160
X:-243, Y:1, Z:-16, T:101278
X:-243, Y:1, Z:-16, T:101399
X:-239, Y:-2, Z:-36, T:101518
X:-239, Y:-2, Z:-36, T:101661
X:-239, Y:-2, Z:-36, T:101780
X:-228, Y:-12, Z:-71, T:101901
X:-228, Y:-12, Z:-71, T:102019
X:-228, Y:-12, Z:-71, T:102138
X:-224, Y:-23, Z:-118, T:102260
X:-224, Y:-23, Z:-118, T:102378
X:-224, Y:-23, Z:-118, T:102498
X:-209, Y:-2, Z:-161, T:102617
X:-209, Y:-2, Z:-161, T:102735
X:-209, Y:-2, Z:-161, T:102855
X:-206, Y:-3, Z:-150, T:102974
X:-206, Y:-3, Z:-150, T:103088
X:-206, Y:-3, Z:-150, T:103216
X:-218, Y:0, Z:-142, T:103355
X:-218, Y:0, Z:-142, T:103469
X:-218, Y:0, Z:-142, T:103581
X:-226, Y:-17, Z:-118, T:103700
X:-226, Y:-17, Z:-118, T:103814
X:-226, Y:-17, Z:-118, T:103931
X:-242, Y:4, Z:-40, T:104054
X:-242, Y:4, Z:-40, T:104171
X:-242, Y:4, Z:-40, T:104292
X:-242, Y:4, Z:-22, T:104410
X:-242, Y:4, Z:-22, T:104523
X:-242, Y:4, Z:-22, T:104642
X:-240, Y:5, Z:-3, T:104762
X:-240, Y:5, Z:-3, T:104879
X:-240, Y:5, Z:-3, T:104993
X:-244, Y:-2, Z:-6, T:105111
X:-244, Y:-2, Z:-6, T:105231
X:-244, Y:-2, Z:-6, T:105361
X:-244, Y:1, Z:-10, T:105497
X:-244, Y:1, Z:-10, T:105623
X:-244, Y:1, Z:-10, T:105744
X:-244, Y:-4, Z:-34, T:105865
X:-244, Y:-4, Z:-34, T:105981
X:-244, Y:-4, Z:-34, T:106101
X:-231, Y:-1, Z:-63, T:106222
X:-231, Y:-1, Z:-63, T:106341
X:-231, Y:-1, Z:-63, T:106462
X:-222, Y:-11, Z:-116, T:106580
X:-222, Y:-11, Z:-116, T:106698
X:-222, Y:-11, Z:-116, T:106818
X:-219, Y:-15, Z:-144, T:106938
X:-219, Y:-15, Z:-144, T:107058
X:-219, Y:-15, Z:-144, T:107174
X:-204, Y:-6, Z:-150, T:107297
X:-204, Y:-6, Z:-150, T:107410
X:-204, Y:-6, Z:-150, T:107528
X:-196, Y:-5, Z:-163, T:107665
X:-196, Y:-5, Z:-163, T:107802
X:-196, Y:-5, Z:-163, T:107935
X:-214, Y:-2, Z:-153, T:108066
X:-214, Y:-2, Z:-153, T:108186
X:-214, Y:-2, Z:-153, T:108306
X:-223, Y:-12, Z:-123, T:108422
X:-223, Y:-12, Z:-123, T:108544
X:-223, Y:-12, Z:-123, T:108661
X:-230, Y:7, Z:-52, T:108783
X:-230, Y:7, Z:-52, T:108900
X:-230, Y:7, Z:-52, T:109019
X:-241, Y:9, Z:-25, T:109139
X:-241, Y:9, Z:-25, T:109258
X:-241, Y:9, Z:-25, T:109375
X:-245, Y:4, Z:-12, T:109496
X:-245, Y:4, Z:-12, T:109612
X:-245, Y:4, Z:-12, T:109732
X:-242, Y:3, Z:-6, T:109852
X:-242, Y:3, Z:-6, T:109968
X:-242, Y:3, Z:-6, T:110098
X:-239, Y:-4, Z:-35, T:110243
X:-239, Y:-4, Z:-35, T:110362
X:-239, Y:-4, Z:-35, T:110484
X:-235, Y:6, Z:-65, T:110606
X:-235, Y:6, Z:-65, T:110722
X:-235, Y:6, Z:-65, T:110840
X:-215, Y:-14, Z:-117, T:110962
X:-215, Y:-14, Z:-117, T:111081
X:-215, Y:-14, Z:-117, T:111204
X:-224, Y:7, Z:-146, T:111324
X:-224, Y:7, Z:-146, T:111441
X:-224, Y:7, Z:-146, T:111561
X:-209, Y:-6, Z:-149, T:111679
X:-209, Y:-6, Z:-149, T:111799
X:-209, Y:-6, Z:-149, T:111919
X:-219, Y:-8, Z:-140, T:112038
X:-219, Y:-8, Z:-140, T:112157
X:-219, Y:-8, Z:-140, T:112274
X:-226, Y:-3, Z:-116, T:112405
X:-226, Y:-3, Z:-116, T:112540
X:-226, Y:-3, Z:-116, T:112669
X:-233, Y:2, Z:-76, T:112792
X:-233, Y:2, Z:-76, T:112909
X:-233, Y:2, Z:-76, T:113028
X:-237, Y:7, Z:-35, T:113148
X:-237, Y:7, Z:-35, T:113266
X:-237, Y:7, Z:-35, T:113386
X:-242, Y:5, Z:-15, T:113504
X:-242, Y:5, Z:-15, T:113624
X:-242, Y:5, Z:-15, T:113764
X:-244, Y:5, Z:-3, T:113884
X:-244, Y:5, Z:-3, T:113999
X:-244, Y:5, Z:-3, T:114118
X:-242, Y:3, Z:-7, T:114239
X:-242, Y:3, Z:-7, T:114357
X:-242, Y:3, Z:-7, T:114473
X:-241, Y:0, Z:-30, T:114595
X:-241, Y:0, Z:-30, T:114720
X:-241, Y:0, Z:-30, T:114867
X:-227, Y:-13, Z:-95, T:114989
X:-227, Y:-13, Z:-95, T:115104
X:-227, Y:-13, Z:-95, T:115224
X:-212, Y:-5, Z:-114, T:115343
X:-212, Y:-5, Z:-114, T:115462
X:-212, Y:-5, Z:-114, T:115579
X:-215, Y:-6, Z:-145, T:115701
X:-215, Y:-6, Z:-145, T:115819
X:-215, Y:-6, Z:-145, T:115937
X:-210, Y:5, Z:-142, T:116059
X:-210, Y:5, Z:-142, T:116176
X:-210, Y:5, Z:-142, T:116296
X:-222, Y:-19, Z:-145, T:116415
X:-222, Y:-19, Z:-145, T:116534
X:-222, Y:-19, Z:-145, T:116655
X:-231, Y:6, Z:-119, T:116775
X:-231, Y:6, Z:-119, T:116894
X:-231, Y:6, Z:-119, T:117023
I don't think matplotlib has anything ready-made for this. Peak detection is a hard task, and the notion of a peak can vary greatly from application to application.
Since your data is relatively simple, you can try an approach inspired by a Schmitt trigger: track the most recent high value and ignore small oscillations around it. The (pseudo) code would be:
y_max = None
for y in data:
    if y_max is None:        # start tracking
        y_max = y
    if y > y_max:            # update max value
        y_max = y
    if y < y_max * 0.9:      # signal is too different from
        add_label(y=y_max)   # the peak - save the peak and
        y_max = None         # start looking for another one
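As a rough, runnable sketch of the same idea (not necessarily what the answer had in mind), the version below uses two-sided hysteresis with an absolute drop `delta` instead of the 0.9 factor. A multiplicative threshold assumes positive values: for a negative `y_max`, `y_max * 0.9` is larger than `y_max` itself, so the test would fire immediately on readings like the Z values in the question. The function name `find_peaks_hysteresis`, the `delta` parameter, and the short sample list (every third Z value from the start of the data shown above) are assumptions made for illustration.

```python
def find_peaks_hysteresis(values, delta):
    """Return (index, value) pairs for maxima that stand out by at least delta.

    A maximum is accepted once the signal has dropped by `delta` below it.
    The search for the next maximum starts only after the signal has risen
    by `delta` above the following minimum (the hysteresis part).
    """
    peaks = []
    looking_for_max = True
    max_val, max_idx = float("-inf"), None
    min_val = float("inf")

    for i, v in enumerate(values):
        if v > max_val:
            max_val, max_idx = v, i
        if v < min_val:
            min_val = v

        if looking_for_max:
            if v < max_val - delta:        # dropped far enough: accept the peak
                peaks.append((max_idx, max_val))
                min_val = v                # start tracking the valley
                looking_for_max = False
        else:
            if v > min_val + delta:        # rose far enough: track a new peak
                max_val, max_idx = v, i
                looking_for_max = True
    return peaks


# Hypothetical sample: one Z reading per distinct triple in the data above.
z_samples = [-146, -124, -53, -9, 1, 2, -16, -42, -115, -145, -172,
             -144, -116, -50, -22, -2, 1, -16, -36, -71, -118, -161]

peaks = find_peaks_hysteresis(z_samples, delta=50)
print(peaks)  # [(5, 2), (16, 1)]
```

The value of `delta` has to be tuned to the noise level of the signal; 50 is only a guess that happens to separate the swings in this sample.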

Python 2 re.sub issue

I have a function that replaces substring matches with the match wrapped in HTML tags. It will mostly process strings in English and Greek.
The function:
def highlight_text(st, kwlist, start_tag=None, end_tag=None):
    if start_tag is None:
        start_tag = '<span class="nom">'
    if end_tag is None:
        end_tag = '</span>'
    for kw in kwlist:
        st = re.sub(r'\b' + kw + r'\b', '{}{}{}'.format(start_tag, kw, end_tag), st)
    return st
The testing string is in Greek except the first sub-string [Korais]: st="Korais Ο Αδαμάντιος Κοραής (Σμύρνη, 27 Απριλίου 1748 – Παρίσι, 6 Απριλίου 1833), ήταν Έλληνας φιλόλογος με βαθιά γνώση του ελληνικού πολιτισμού. Ο Κοραής είναι ένας από τους σημαντικότερους εκπροσώπους του νεοελληνικού διαφωτισμού και μνημονεύεται, ανάμεσα σε άλλα, ως πρωτοπόρος στην έκδοση έργων αρχαίας ελληνικής γραμματείας, αλλά και για τις γλωσσικές του απόψεις στην υποστήριξη της καθαρεύουσας, σε μια μετριοπαθή όμως μορφή της με σκοπό την εκκαθάριση των πλείστων ξένων λέξεων που υπήρχαν στη γλώσσα του λαού."
The test code:
kwlist = ['ελληνικού', 'Σμύρνη', 'Αδαμάντιος', 'Korais']
d = highlight_text(st, kwlist, start_tag=None, end_tag=None)
print(d)
When I run the code (st is the string above), only the English substrings get tagged; the Greek substrings are ignored. Note that I run the block above on Python 2.7. With Python 3.4, all substrings get replaced.
Another issue is that when I run the function within a Flask application, it raises an error: unexpected end of regular expression.
How should I tackle this, without using an external library if possible?
I've been pulling my hair out over this for two days now.
In Python 2.7, you need to explicitly convert text to Unicode. See the fixed snippet below:
# -*- coding: utf-8 -*-
import re
def highlight_text(st, kwlist, start_tag=None, end_tag=None):
    if start_tag is None:
        start_tag = '<span class="nom">'
    if end_tag is None:
        end_tag = '</span>'
    for kw in kwlist:
        st = re.sub(ur'\b' + kw.decode('utf8') + ur'\b',
                    u'{}{}{}'.format(start_tag.decode('utf8'), kw.decode('utf8'), end_tag.decode('utf8')),
                    st.decode('utf8'), 0, re.U).encode("utf8")
    return st
st="Korais Ο Αδαμάντιος Κοραής (Σμύρνη, 27 Απριλίου 1748 – Παρίσι, 6 Απριλίου 1833), ήταν Έλληνας φιλόλογος με βαθιά γνώση του ελληνικού πολιτισμού. Ο Κοραής είναι ένας από τους σημαντικότερους εκπροσώπους του νεοελληνικού διαφωτισμού και μνημονεύεται, ανάμεσα σε άλλα, ως πρωτοπόρος στην έκδοση έργων αρχαίας ελληνικής γραμματείας, αλλά και για τις γλωσσικές του απόψεις στην υποστήριξη της καθαρεύουσας, σε μια μετριοπαθή όμως μορφή της με σκοπό την εκκαθάριση των πλείστων ξένων λέξεων που υπήρχαν στη γλώσσα του λαού."
kwlist = ['ελληνικού', 'Σμύρνη', 'Αδαμάντιος', 'Korais']
d = highlight_text(st, kwlist, start_tag=None, end_tag=None)
print(d)
Note that all literals are declared with the u prefix, all variables are decoded, and the re.sub result is encoded back to UTF-8.
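To illustrate why the decoding matters, here is a small sketch (the variable names are only illustrative). With a byte-string pattern, `\b` treats only ASCII letters, digits and the underscore as word characters, so there is no word boundary between a space and the first UTF-8 byte of a Greek letter, and the keyword never matches. With decoded text and `re.UNICODE`, Greek letters count as word characters and the match succeeds.

```python
# -*- coding: utf-8 -*-
import re

text = u"Korais Ο Αδαμάντιος Κοραής"
keyword = u"Αδαμάντιος"

# Byte pattern against UTF-8 bytes: \b sees the Greek bytes as non-word characters.
byte_matches = re.findall(br'\b' + keyword.encode('utf-8') + br'\b',
                          text.encode('utf-8'))

# Unicode pattern against decoded text: \b is aware of Greek letters.
unicode_matches = re.findall(u'\\b' + keyword + u'\\b', text, re.UNICODE)

print(len(byte_matches))     # 0
print(len(unicode_matches))  # 1
```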
English get tagged. Greek substr are ignored.
Where does your st come from? Note that in Python 2.x 'μορφή' != u'μορφή'. Maybe you are comparing str with unicode.
Suggestion: use unicode everywhere you can, e.g.:
kwlist = [u'ελληνικού', u'Σμύρνη', u'Αδαμάντιος', u'Korais']
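Following that suggestion, a minimal sketch of the function that works on unicode text throughout could look like the one below. It is a variation for illustration, not the code from either answer: the byte-decoding fallback, the use of `re.escape`, and the callable replacement are assumptions added here. `re.escape` guards against keywords that contain regex metacharacters, which is one possible (unconfirmed) source of errors such as "unexpected end of regular expression".

```python
# -*- coding: utf-8 -*-
import re


def highlight_text(text, keywords,
                   start_tag=u'<span class="nom">', end_tag=u'</span>'):
    """Wrap whole-word occurrences of each keyword in start_tag/end_tag."""
    if isinstance(text, bytes):            # accept UTF-8 bytes as a fallback
        text = text.decode('utf-8')
    for kw in keywords:
        if isinstance(kw, bytes):
            kw = kw.decode('utf-8')
        # Escape the keyword so it is matched literally, between word boundaries.
        pattern = re.compile(u'\\b' + re.escape(kw) + u'\\b', re.UNICODE)
        # A callable replacement avoids backslash processing in the tags.
        text = pattern.sub(lambda m: start_tag + m.group(0) + end_tag, text)
    return text


sample = u"Korais Ο Αδαμάντιος Κοραής (Σμύρνη)"
print(highlight_text(sample, [u'Σμύρνη', u'Korais']))
```

Because of the `\b` anchors, a keyword such as ελληνικού is tagged only as a whole word and not inside νεοελληνικού.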
