Regex for data preparation and further processing in Python
I have quite a big data file that is not in a good state for further processing, so I want to extract the best out of it with regex and then process the data in pandas for further analysis.
The Data-Information segment repeats itself within the file and contains the necessary information.
My regex approach so far extracts some of the header information. What I'm still missing are the three sections of data points: I only need everything from the Points header to the last data point of each section. How could I capture these sections in one or several groups?
^(?:Data-Information.*)
(?:\nName:\t+)(?P<Name>.+)
(?:\nSample:\t+)(?P<Sample>.+)
((?:\r?\n.+)+)
(?:\nSystem:\t+)(?P<System>.+)
(?:\r?\n(?!Data-Information).*)*
Sample file
Data-Information
Name: Polymer A
Sample: Sunday till Monday
User: SUD
Count Segments: 5
Application: RHEOSTAR
Tool: CP
Date/Time: 24.10.2021; 13:37
System: CP25
Constants:
- Csr [min/s]: 2,5421
- Css [Pa/mNm]: 2,54679
Section: 1
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 30 s
Measurement profile:
Temperature T[-1] = 25 °C
Section: 2
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 62 10,93 100 1.090 4,45 TGC,Dy_
2 64 11,05 100 1.100 4,5 TGC,Dy_
3 66 11,07 100 1.110 4,51 TGC,Dy_
4 68 11,05 100 1.100 4,5 TGC,Dy_
5 70 10,99 100 1.100 4,47 TGC,Dy_
6 72 10,92 100 1.090 4,44 TGC,Dy_
Section: 3
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 60 s
Section: 4
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
*** 1 *** 242 -6,334E+6 -0,0000115 72,7 0,296 TGC,Dy_
2 244 63,94 10,3 661 2,69 TGC,Dy_
3 246 35,56 20,7 736 2,99 TGC,Dy_
4 248 25,25 31 784 3,19 TGC,Dy_
5 250 19,82 41,4 820 3,34 TGC,Dy_
Section: 5
Number measuring points: 300
Time limit: 300 measuring points
Duration 1 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 301 4,142 300 1.240 5,06 TGC,Dy_
2 302 4,139 300 1.240 5,05 TGC,Dy_
3 303 4,138 300 1.240 5,05 TGC,Dy_
4 304 4,141 300 1.240 5,06 TGC,Dy_
5 305 4,156 300 1.250 5,07 TGC,Dy_
6 306 4,153 300 1.250 5,07 TGC,Dy_
Data-Information
Name: Polymer B
Sample: Monday till Tuesday
User: SUD
Count Segments: 5
Application: RHEOSTAR
Tool: CP
Date/Time: 24.10.2021; 13:37
System: CP25
Constants:
- Csr [min/s]: 2,5421
- Css [Pa/mNm]: 2,54679
Section: 1
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 30 s
Measurement profile:
Temperature T[-1] = 25 °C
Section: 2
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 62 10,93 100 1.090 4,45 TGC,Dy_
2 64 11,05 100 1.100 4,5 TGC,Dy_
3 66 11,07 100 1.110 4,51 TGC,Dy_
4 68 11,05 100 1.100 4,5 TGC,Dy_
5 70 10,99 100 1.100 4,47 TGC,Dy_
6 72 10,92 100 1.090 4,44 TGC,Dy_
Section: 3
Number measuring points: 0
Time limit: 2 measuring points, drop
Duration 60 s
Section: 4
Number measuring points: 30
Time limit: 30 measuring points
Duration 2 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
*** 1 *** 242 -6,334E+6 -0,0000115 72,7 0,296 TGC,Dy_
2 244 63,94 10,3 661 2,69 TGC,Dy_
3 246 35,56 20,7 736 2,99 TGC,Dy_
4 248 25,25 31 784 3,19 TGC,Dy_
5 250 19,82 41,4 820 3,34 TGC,Dy_
Section: 5
Number measuring points: 300
Time limit: 300 measuring points
Duration 1 s
Points Time Viscosity Shear rate Shear stress Momentum Status
[s] [Pa·s] [1/s] [Pa] [mNm] []
1 301 4,142 300 1.240 5,06 TGC,Dy_
2 302 4,139 300 1.240 5,05 TGC,Dy_
3 303 4,138 300 1.240 5,05 TGC,Dy_
4 304 4,141 300 1.240 5,06 TGC,Dy_
5 305 4,156 300 1.250 5,07 TGC,Dy_
6 306 4,153 300 1.250 5,07 TGC,Dy_
One option is to do it in two steps.
First, get all the Data-Information parts using a pattern that starts with Data-Information and matches all following lines that do not start with Data-Information:
^Data-Information(?:\n(?!Data-Information$).*)*
Regex demo for Data-Information
Then, for every part, you can match the line that starts with Points, followed by all lines that contain at least one character (no empty lines):
^Points\b.*(?:\n.+)+
Regex demo for Points
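Combining both steps in Python could look roughly like this. This is only a sketch against a trimmed, hard-coded sample; for the real file you would read the text from disk, and I'm assuming ',' is the decimal separator and '.' the thousands separator, as in the sample data:

```python
import io
import re
import pandas as pd

# Trimmed stand-in for the real file contents
text = """Data-Information
Name: Polymer A
Points Time Viscosity
[s] [Pa.s]
1 62 10,93
2 64 11,05
Data-Information
Name: Polymer B
Points Time Viscosity
[s] [Pa.s]
1 301 4,142
"""

# Step 1: one match per Data-Information block
blocks = re.findall(r"^Data-Information(?:\n(?!Data-Information$).*)*",
                    text, flags=re.M)

# Step 2: within each block, match the Points header plus all following
# non-empty lines, then hand the data rows to pandas
tables = []
for block in blocks:
    name = re.search(r"^Name:\s*(.+)$", block, flags=re.M).group(1)
    for m in re.finditer(r"^Points\b.*(?:\n.+)+", block, flags=re.M):
        rows = m.group(0).splitlines()[2:]      # drop header + units lines
        df = pd.read_csv(io.StringIO("\n".join(rows)), sep=r"\s+",
                         header=None, decimal=",", thousands=".")
        df["name"] = name
        tables.append(df)

result = pd.concat(tables, ignore_index=True)
print(result)
```

The "*** 1 ***" rows in the real file have extra tokens, so those lines would need cleaning (or skipping) before read_csv sees them.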
Related
How to calculate the distance between pedestrians and vehicles in each frame? [closed]
I have data from a drone. The first table has the pedestrians' data in each frame: pedestrian id, frame, x_est, y_est, v_abs. The second table has the vehicles' data: vehicle id, frame, x_est, y_est, vel_est. For example, in frame number 1 I have 39 pedestrians and two vehicles. I want to create a new table in which the first column is the distance between each pedestrian and every vehicle in each frame. For example, with 39 pedestrians and 2 vehicles:

d1 = sqrt((ped1_x - veh1_x)^2 + (ped1_y - veh1_y)^2)
d2 = sqrt((ped1_x - veh2_x)^2 + (ped1_y - veh2_y)^2)
d3 = sqrt((ped2_x - veh1_x)^2 + (ped2_y - veh1_y)^2)
d4 = sqrt((ped2_x - veh2_x)^2 + (ped2_y - veh2_y)^2)

and so on. The second and third columns are the associated speeds of the pedestrian and the vehicle: if I get d1, I include the speed of ped1 and the speed of veh1; if I get d2, the speed of ped1 and the speed of veh2, and so on. I have 116 frames. I want to write code in Python or MATLAB to do these tasks. I tried Python with the following code, but it didn't work because the vehicles' data has 182 rows and the pedestrians' data has 3950:

if peds["frame"] == vehs["frame"]:
    distance = math.sqrt((peds.x_est - vehs.x_est)**2 + (peds.y_est - vehs.y_est)**2)

I'm thinking of adding a for loop over each frame. How do I modify the code to loop over each frame and calculate the distances between these objects? This is an example of the data.
Pedestrians data: id 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 frame 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 x_est 15.31251243 14.62957291 14.81940554 16.261254 14.25065235 13.71913744 12.77037156 11.82149333 11.02452266 10.4550769 9.962500442 10.56845947 11.17672903 11.70758495 14.28816743 13.41605196 11.6316746 11.4797624 13.3395301 10.98651306 11.89763531 12.46714593 13.90898049 11.17611058 9.20275126 11.10086732 11.66968305 12.88437259 13.37708455 14.17485782 14.81943565 15.65388549 17.7046208 18.76703333 19.22188971 19.03219028 19.63930168 19.26095788 20.54963754 21.87812196 y_est 24.04146967 23.85122822 22.59973819 21.46111998 22.25845431 22.52372129 22.37086056 22.82695733 22.7892778 22.78941678 21.42243054 21.49884121 21.46045752 21.53683577 19.86642817 19.75351002 19.29791125 17.24875682 16.56578255 14.97104209 10.11358571 9.733765266 9.165258769 8.102246321 8.216836276 6.659910277 6.774266283 2.865368769 3.05553263 4.459266668 4.193704362 4.420605884 4.26981189 3.547987191 4.042100815 4.876447865 5.294238544 6.090777216 3.966160063 4.762697865 v_abs 2.459007157 2.654334571 3.315403455 3.389573803 3.378566929 3.045539512 3.23011785 2.925099475 2.584998721 2.901642056 2.811892448 2.151019342 1.96347414 1.500567927 3.540985451 2.709992115 3.267972565 3.395063149 2.721779676 4.012212099 0.880854234 0.813933137 0.704372621 0.912089788 0.549592663 3.007799428 1.978963898 3.44757396 3.15737162 3.529382782 3.5556166 3.409764593 1.170765247 1.580709745 1.085228781 0.922279132 0.802698916 1.875894301 0.804975425 1.205954878 Vehicles data: id 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 frame 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20 x_est 13.12181538 20.79676544 13.01830532 20.73447182 
12.92056089 20.65301595 12.77984823 20.52422202 12.67147093 20.45207468 12.56504216 20.34050999 12.45704283 20.23392863 12.35122477 20.13000728 12.24825411 20.03070371 12.15111691 19.9359101 12.04685347 19.8497257 11.93870377 19.7593045 11.82507185 19.66695714 11.70785362 19.57813133 11.60090539 19.48509113 11.48174963 19.39305883 11.36790853 19.29876475 11.26066404 19.21938896 11.14538066 19.12658484 11.02149472 19.02832259 y_est 12.48631346 12.04945225 12.50757522 12.09243053 12.53150136 12.14119099 12.56295128 12.20178773 12.59034906 12.22451609 12.60945482 12.2800529 12.61885415 12.31947563 12.62842376 12.35799492 12.64016339 12.40403046 12.66696466 12.44343686 12.67398026 12.4828893 12.69217314 12.52761456 12.70404418 12.56747969 12.71201387 12.60893079 12.7352435 12.65130145 12.75948092 12.68935761 12.77039761 12.72362537 12.78182668 12.76864139 12.80477996 12.80719663 12.82141401 12.8529026 vel_est 2.607251494 2.480379041 2.607160714 2.479543211 2.60660445 2.478939563 2.60984436 2.482517229 2.610124888 2.479412431 2.610034708 2.481993315 2.609925021 2.483303075 2.609504867 2.484305008 2.608686875 2.485197239 2.607293512 2.485028991 2.606477634 2.483509058 2.60665034 2.483007857 2.607657856 2.482442343 2.609280114 2.481348408 2.609374438 2.481088229 2.612094232 2.480242567 2.613246915 2.479486909 2.612946522 2.476581726 2.614972124 2.475903808 2.618889831 2.47716006 Thanks in advance
Just think through the problem in words. For each pedestrian entry, for each car entry, if the frame number matches, compute the distance between them, and add a new row. Nothing to it. ped_id = "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39" ped_frame = "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1" ped_x_est = "15.31251243 14.62957291 14.81940554 16.261254 14.25065235 13.71913744 12.77037156 11.82149333 11.02452266 10.4550769 9.962500442 10.56845947 11.17672903 11.70758495 14.28816743 13.41605196 11.6316746 11.4797624 13.3395301 10.98651306 11.89763531 12.46714593 13.90898049 11.17611058 9.20275126 11.10086732 11.66968305 12.88437259 13.37708455 14.17485782 14.81943565 15.65388549 17.7046208 18.76703333 19.22188971 19.03219028 19.63930168 19.26095788 20.54963754 21.87812196" ped_y_est = "24.04146967 23.85122822 22.59973819 21.46111998 22.25845431 22.52372129 22.37086056 22.82695733 22.7892778 22.78941678 21.42243054 21.49884121 21.46045752 21.53683577 19.86642817 19.75351002 19.29791125 17.24875682 16.56578255 14.97104209 10.11358571 9.733765266 9.165258769 8.102246321 8.216836276 6.659910277 6.774266283 2.865368769 3.05553263 4.459266668 4.193704362 4.420605884 4.26981189 3.547987191 4.042100815 4.876447865 5.294238544 6.090777216 3.966160063 4.762697865" ped_v_abs = "2.459007157 2.654334571 3.315403455 3.389573803 3.378566929 3.045539512 3.23011785 2.925099475 2.584998721 2.901642056 2.811892448 2.151019342 1.96347414 1.500567927 3.540985451 2.709992115 3.267972565 3.395063149 2.721779676 4.012212099 0.880854234 0.813933137 0.704372621 0.912089788 0.549592663 3.007799428 1.978963898 3.44757396 3.15737162 3.529382782 3.5556166 3.409764593 1.170765247 1.580709745 1.085228781 0.922279132 0.802698916 1.875894301 0.804975425 1.205954878" veh_id = "900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 901 900 
901 900 901 900 901 900 901 900 901" veh_frame = "1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20" veh_x_est = "13.12181538 20.79676544 13.01830532 20.73447182 12.92056089 20.65301595 12.77984823 20.52422202 12.67147093 20.45207468 12.56504216 20.34050999 12.45704283 20.23392863 12.35122477 20.13000728 12.24825411 20.03070371 12.15111691 19.9359101 12.04685347 19.8497257 11.93870377 19.7593045 11.82507185 19.66695714 11.70785362 19.57813133 11.60090539 19.48509113 11.48174963 19.39305883 11.36790853 19.29876475 11.26066404 19.21938896 11.14538066 19.12658484 11.02149472 19.02832259" veh_y_est = "12.48631346 12.04945225 12.50757522 12.09243053 12.53150136 12.14119099 12.56295128 12.20178773 12.59034906 12.22451609 12.60945482 12.2800529 12.61885415 12.31947563 12.62842376 12.35799492 12.64016339 12.40403046 12.66696466 12.44343686 12.67398026 12.4828893 12.69217314 12.52761456 12.70404418 12.56747969 12.71201387 12.60893079 12.7352435 12.65130145 12.75948092 12.68935761 12.77039761 12.72362537 12.78182668 12.76864139 12.80477996 12.80719663 12.82141401 12.8529026" veh_vel_est = "2.607251494 2.480379041 2.607160714 2.479543211 2.60660445 2.478939563 2.60984436 2.482517229 2.610124888 2.479412431 2.610034708 2.481993315 2.609925021 2.483303075 2.609504867 2.484305008 2.608686875 2.485197239 2.607293512 2.485028991 2.606477634 2.483509058 2.60665034 2.483007857 2.607657856 2.482442343 2.609280114 2.481348408 2.609374438 2.481088229 2.612094232 2.480242567 2.613246915 2.479486909 2.612946522 2.476581726 2.614972124 2.475903808 2.618889831 2.47716006" import math import pandas as pd def convert(s,cvt): return [cvt(k) for k in s.split()] peds = list(zip( convert(ped_id,int), convert(ped_frame,int), convert(ped_x_est,float), convert(ped_y_est,float), convert(ped_v_abs,float) )) cars = list(zip( convert(veh_id,int), convert(veh_frame,int), convert(veh_x_est,float), convert(veh_y_est,float), convert(veh_vel_est,float) )) 
newdata = []
for ped in peds:
    for car in cars:
        if ped[1] != car[1]:
            continue
        dist = math.sqrt((ped[2]-car[2])**2 + (ped[3]-car[3])**2)
        newdata.append((ped[0], car[0], dist, ped[4], car[4]))

df = pd.DataFrame(newdata,
                  columns=("ped id", "car id", "distance", "ped vel", "car vel"))
print(df)

Output:

    ped id  car id   distance   ped vel   car vel
0        0     900  11.760986  2.459007  2.607251
1        0     901  13.186566  2.459007  2.480379
2        1     900  11.464494  2.654335  2.607251
3        1     901  13.316012  2.654335  2.480379
4        2     900  10.254910  3.315403  2.607251
..     ...     ...        ...       ...       ...
75      37     901   6.153415  1.875894  2.480379
76      38     900  11.303343  0.804975  2.607251
77      38     901   8.087069  0.804975  2.480379
78      39     900  11.675921  1.205955  2.607251
79      39     901   7.366554  1.205955  2.480379

[80 rows x 5 columns]
Pandas GroupBy with special sum
Let's say I have data like this, and I want to group it by feature and type:

feature  type  size
Alabama     1   100
Alabama     2    50
Alabama     3    40
Wyoming     1   180
Wyoming     2   150
Wyoming     3    56

When I apply df = df.groupby(['feature','type']).sum()[['size']], I get this, as expected:

             size
(Alabama,1)   100
(Alabama,2)    50
(Alabama,3)    40
(Wyoming,1)   180
(Wyoming,2)   150
(Wyoming,3)    56

However, I want to sum sizes over the same type only, not over both type and feature, while keeping the (feature, type) tuples as the index. I mean I want to get something like this:

             size
(Alabama,1)   280
(Alabama,2)   200
(Alabama,3)    96
(Wyoming,1)   280
(Wyoming,2)   200
(Wyoming,3)    96

I am stuck trying to find a way to do this. I need some help, thanks.
Use set_index for a MultiIndex, and then transform with sum, which returns a same-length Series from the aggregate function:

df = df.set_index(['feature','type'])
df['size'] = df.groupby(['type'])['size'].transform('sum')
print (df)

              size
feature type
Alabama 1      280
        2      200
        3       96
Wyoming 1      280
        2      200
        3       96

EDIT: First aggregate both columns, and then use transform:

df = df.groupby(['feature','type']).sum()
df['size'] = df.groupby(['type'])['size'].transform('sum')
print (df)

              size
feature type
Alabama 1      280
        2      200
        3       96
Wyoming 1      280
        2      200
        3       96
Here is one way, mapping each row's type to the per-type total:

df['size_type'] = df['type'].map(df.groupby('type')['size'].sum())
df.groupby(['feature', 'type'])['size_type'].sum()
# feature  type
# Alabama  1       280
#          2       200
#          3        96
# Wyoming  1       280
#          2       200
#          3        96
# Name: size_type, dtype: int64
Graphically displaying BLAST alignments from local source
I have an issue that I am trying to work through. I have a large dataset of about 25,000 genes that seem to be the product of domain shuffling or gene fusions. I would like to view these alignments in PDF format based on BLAST outfmt 6 output. I have a BLAST output file for each of these genes, with one query sequence (the recombinogenic gene) and a varying number of subject genes, with the following columns:

qseqid sseqid evalue qstart qend qlen sstart send slen length

I was hoping to parse the files through some code to produce images like the attached file, using the following example BLAST output file:

Cluster_1___Hsap10003 Cluster_2___Hsap00200 1e-30 5 100 300 10 105 240 95
Cluster_1___Hsap10003 Cluster_2___Hsap00200 1e-10 200 230 300 205 235 30 95
Cluster_1___Hsap10003 Cluster_3___Aver00900 1e-20 5 100 300 10 105 125 100
Cluster_1___Hsap10003 Cluster_3___Atha00809 1e-20 5 110 300 5 115 120 105
Cluster_1___Hsap10003 Cluster_4___Ecol00002 1e-10 70 170 300 205 235 30 95
Cluster_1___Hsap10003 Cluster_4___Ecol00003 1e-30 75 175 300 10 105 240 95
Cluster_1___Hsap10003 Cluster_4___Sfle00009 1e-10 80 180 300 205 235 30 95
Cluster_1___Hsap10003 Cluster_5___Spom00010 1e-30 160 260 300 10 105 240 95
Cluster_1___Hsap10003 Cluster_5___Scer01566 1e-10 170 270 300 205 235 30 95
Cluster_1___Hsap10003 Cluster_5___Afla00888 1e-30 175 275 300 10 105 240 95

I am looking for the query sequence to be a thick coloured bar, and the alignment section of each subject to be a thick colourful bar, with thin black lines showing the rest of the gene length (one subject per line, showing all alignment sections against the query). Does anyone know of any software, or any GitHub code, that may do something like this? Thanks so much!
Filtering records in Pandas python - syntax error
I have a pandas data frame that looks like this:

   duration  distance  speed  hincome  fi_cost     type
0       359      1601      4        3    40.00  cycling
1       625      3440      6        3    86.00  cycling
2       827      4096      5        3   102.00  cycling
3      1144      5704      5        2   143.00  cycling

If I use the following, I export a new csv that pulls only those records with a distance less than 5000:

distance_1 = all_results[all_results.distance < 5000]
distance_1.to_csv('./distance_1.csv', ",")

Now I wish to export a csv with values from 5001 to 10000. I can't seem to get the syntax right...

distance_2 = all_results[10000 > all_results.distance < 5001]
distance_2.to_csv('./distance_2.csv', ",")
Unfortunately, because of how Python chained comparisons work, we can't use the 50 < x < 100 syntax when x is some vector-like quantity. You have several options. You could create two boolean Series and use & to combine them:

>>> all_results[(all_results.distance > 3000) & (all_results.distance < 5000)]
   duration  distance  speed  hincome  fi_cost     type
1       625      3440      6        3       86  cycling
2       827      4096      5        3      102  cycling

Use between to create a boolean Series and then use that to index (note that it's inclusive by default, though):

>>> all_results[all_results.distance.between(3000, 5000)]  # inclusive by default
   duration  distance  speed  hincome  fi_cost     type
1       625      3440      6        3       86  cycling
2       827      4096      5        3      102  cycling

Or finally you could use .query:

>>> all_results.query("3000 < distance < 5000")
   duration  distance  speed  hincome  fi_cost     type
1       625      3440      6        3       86  cycling
2       827      4096      5        3      102  cycling
all_results[all_results.distance.between(5001, 10000)]
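A runnable check of the 5001–10000 filter; the rows here are made up but shaped like the question's frame:

```python
import pandas as pd

# Illustrative sample: only the columns the filter touches
all_results = pd.DataFrame({
    "duration": [359, 625, 827, 1144, 2000],
    "distance": [1601, 3440, 4096, 5704, 9500],
})

# between() is inclusive on both ends, so this keeps 5001..10000 exactly
distance_2 = all_results[all_results.distance.between(5001, 10000)]
print(distance_2)
```

The result could then be exported with distance_2.to_csv('./distance_2.csv') as in the question.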
Python barbs wrong direction
There is probably a really simple answer to this and I'm only asking as a last resort, as I usually find my answers by searching, but I can't figure this out. Basically I'm plotting some wind barbs in Python, but they are pointing in the wrong direction and I don't know why. Data is imported from a file and put into lists. I found on another Stack Overflow post how to set the U, V for barbs using np.sin and np.cos, which gives the correct wind speed but the wrong direction. I'm basically plotting a very simple tephigram or skew-T.

# Program to read in radiosonde data from a file named "raob.dat"
# Import numpy since we are going to use numpy arrays and the loadtxt function.
import numpy as np
import matplotlib.pyplot as plt

# Open the file for reading and store the file handle as "f"
f = open('data.dat')
# Read the data from the file handle f. np.loadtxt() is useful for reading
# simply-formatted text files.
datain = np.loadtxt(f)
f.close()

# Copy the different columns into pressure, temperature and dewpoint temperature.
# Note that the colon means consider all elements in that dimension,
# and remember indices start from zero.
p = datain[:, 0]
temp = datain[:, 1]
temp_dew = datain[:, 2]
wind_dir = datain[:, 3]
wind_spd = datain[:, 4]

print('Pressure/hPa: ', p)
print('Temperature/C: ', temp)
print('Dewpoint temperature: ', temp_dew)
print('Wind Direction/Deg: ', wind_dir)
print('Wind Speed/kts: ', wind_spd)

# for the barb vectors - this is the bit I think is causing the problem
u = wind_spd*np.sin(wind_dir)
v = wind_spd*np.cos(wind_dir)

# change units
#p = p/10
#temp = temp/10
#temp_dew = temp_dew/10

# plot graphs
fig1 = plt.figure()
x1 = temp
x2 = temp_dew
y1 = p
y2 = p
x = np.linspace(50, 50, len(y1))

plt.plot(x1, y1, 'r', label='Temp')
plt.plot(x2, y2, 'g', label='Dew Point Temp')
plt.legend(loc=3, fontsize='x-small')
plt.gca().invert_yaxis()
#fig2 = plt.figure()
plt.barbs(x, y1, u, v)
plt.yticks(y1)
plt.grid(axis='y')
plt.show()

The barbs should mostly point in the same direction, as you can see from the direction column (in degrees) in the data. Any help is appreciated. Thank you. Here is the data that is used:

996 25.2 24.9 290 12
963.2 24.5 22.6 315 42
930.4 23.8 20.1 325 43
929 23.8 20 325 43
925 23.4 19.6 325 43
900 22 17 325 43
898.6 21.9 17 325 43
867.6 20.1 16.5 320 41
850 19 16.2 320 44
807.9 16.8 14 320 43
779.4 15.2 12.4 320 44
752 13.7 10.9 325 43
725.5 12.2 9.3 320 44
700 10.6 7.8 325 45
649.7 7 4.9 315 44
603.2 3.4 1.9 325 49
563 0 -0.8 325 50
559.6 -0.2 -1 325 50
500 -3.5 -4.9 335 52
499.3 -3.5 -5 330 54
491 -4.1 -5.5 332 52
480.3 -5 -6.4 335 50
427.2 -9.7 -11 330 45
413 -11.1 -12.3 335 43
400 -12.7 -14.4 340 42
363.9 -16.9 -19.2 350 37
300 -26.3 -30.2 325 40
250 -36.7 -41.2 330 35
200 -49.9 0 335 0
150 -66.6 0 0 10
100 -83.5 0 0 30

Liam
np.sin and np.cos expect radians, but your wind directions are in degrees. Instead of

u = wind_spd*np.sin(wind_dir)
v = wind_spd*np.cos(wind_dir)

try:

u = wind_spd*np.sin((np.pi/180)*wind_dir)
v = wind_spd*np.cos((np.pi/180)*wind_dir)

(http://tornado.sfsu.edu/geosciences/classes/m430/Wind/WindDirection.html)
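For reference, a minimal sketch of the conversion, with values taken from the first two rows of the question's data. One caveat: meteorological wind direction is conventionally the direction the wind blows from, so depending on your convention you may also need to negate u and v:

```python
import numpy as np

wind_dir = np.array([290.0, 315.0])  # degrees, first two data rows
wind_spd = np.array([12.0, 42.0])    # knots

# Convert degrees to radians before taking sin/cos
u = wind_spd * np.sin(np.radians(wind_dir))
v = wind_spd * np.cos(np.radians(wind_dir))

# The conversion preserves the wind speed
assert np.allclose(np.hypot(u, v), wind_spd)
```

np.radians(wind_dir) is equivalent to (np.pi/180)*wind_dir, just more readable.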