I have multiple text files with a specific filename format in a directory. I want to concatenate the content of all the files into a single .csv file and then make an interactive 3D scatter plot from specific data columns of the final CSV file. I tried to concatenate the files' data into one, but my output has around 5000 entries instead of 500 (after the first 500 entries, the values repeat). Please help me find the error.
[Interactive plot: able to zoom in, zoom out, and rotate the plot using the mouse]
import fnmatch
import os
import pandas as pd
data = pd.DataFrame()
for f_name in os.listdir(os.getcwd()):
    if fnmatch.fnmatch(f_name, 'hypoDD.reloc.*'):
        print(f_name)
        df = pd.read_csv(f_name, header=None, sep=r"\s+|\t")
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
        data = pd.concat([data, df], ignore_index=True)
        #print(data)
data.to_csv('outfile.txt', index=False)
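A minimal sketch of the same concatenation, collecting the per-file frames in a list and calling pd.concat once. The helper name concat_reloc_files and its arguments are illustrative, and the header-less, whitespace-separated layout is an assumption taken from the question. It does not by itself explain the repeated rows, but it makes the total row count easy to compare with the sum of the per-file counts.

```python
import glob
import os

import pandas as pd


def concat_reloc_files(directory=".", pattern="hypoDD.reloc.*"):
    """Read every file matching `pattern` and stack the rows into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # Assumption: whitespace-separated columns and no header row.
        df = pd.read_csv(path, header=None, sep=r"\s+")
        print(os.path.basename(path), len(df), "rows")  # per-file count for checking
        frames.append(df)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling concat_reloc_files().to_csv('outfile.txt', index=False) would then write the combined table. If the printed per-file counts already add up to about 5000, the duplication is in the input files rather than in the loop.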
OR
I want to make a single interactive 3D scatter plot using specific data columns from each file, with each file's data shown in a different scatter colour. (I have ~18 different files and I don't even know 18 different colour names!)
Finally, I was able to write the code, although the figure still needs some modifications (set axis limits, reduce the marker size, colour the points by file, and make the Z axis point downward).
Suggestions?
import os
import glob
import fnmatch
import pandas as pd
mypath = os.getcwd()
file_count = len(glob.glob(os.path.join(mypath, "hypoDD.reloc.*")))
print("Number of clusters is:", file_count)
# Read every hypoDD.reloc.* file
data = pd.DataFrame()
for f_name in os.listdir(mypath):
    if fnmatch.fnmatch(f_name, 'hypoDD.reloc.*'):
        print(f_name)
        df = pd.read_csv(f_name, header=None, sep=r"\s+|\t")
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
        data = pd.concat([data, df], ignore_index=True)
        #print(data)
data.to_csv('outfile.txt', index=False)
latitude = data.iloc[:, 1]
longitude = data.iloc[:, 2]
depth = data.iloc[:, 3]
scatter_data = pd.concat([longitude, latitude, depth], axis=1)
scatter_data.columns = ['lon', 'lat', 'depth']
#------------------------------3D scatter--------------------------------
#----setting default renderer------------
import plotly.io as pio
pio.renderers.default = "browser"
#-----------------------------------------
import plotly.express as px
fig = px.scatter_3d(scatter_data, x='lon', y='lat', z='depth')
fig.show()
fig.write_image("fig1.jpg")
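A possible sketch for the requested modifications: tag every row with the file it came from so that Plotly Express can pick the colours, shrink the markers, and reverse the depth axis. The function names load_clusters and make_figure and the "cluster" column are made up for illustration; the column positions (1 = latitude, 2 = longitude, 3 = depth) and the whitespace-separated layout are assumptions taken from the question.

```python
import glob
import os

import pandas as pd


def load_clusters(directory=".", pattern="hypoDD.reloc.*"):
    """Stack all matching files and tag each row with its source file name."""
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        df = pd.read_csv(path, header=None, sep=r"\s+")
        # Assumption from the question: columns 1, 2, 3 hold lat, lon, depth.
        frames.append(pd.DataFrame({
            "lon": df.iloc[:, 2],
            "lat": df.iloc[:, 1],
            "depth": df.iloc[:, 3],
            "cluster": os.path.basename(path),
        }))
    if not frames:
        return pd.DataFrame(columns=["lon", "lat", "depth", "cluster"])
    return pd.concat(frames, ignore_index=True)


def make_figure(scatter_data):
    """One colour per file, small markers, depth increasing downward."""
    import plotly.express as px  # imported here so load_clusters works without plotly

    fig = px.scatter_3d(scatter_data, x="lon", y="lat", z="depth", color="cluster")
    fig.update_traces(marker=dict(size=2))
    fig.update_layout(scene=dict(zaxis=dict(autorange="reversed")))
    return fig
```

With this, make_figure(load_clusters()).show() would open the plot. Because "cluster" is a string column, the colours are chosen automatically, so no colour names are needed; with about 18 files the default palette may repeat, and passing color_discrete_sequence=px.colors.qualitative.Dark24 to px.scatter_3d is one way to get more distinct colours. Axis limits on lon and lat can be set with range=[low, high] inside the xaxis and yaxis dictionaries of scene.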
Context: I have two data frames that I read with pandas from .csv files. One of them (dfevents) has latitude and longitude fields; the other (dfplacedetails) has multiple points that form a polygon. I'm using the "intersects" method to check where the first data frame intersects the polygons of the other one. That works fine, but when I try to plot both layers together it is just not possible; they plot separately.
My code is as follows:
# Libraries
from matplotlib import pyplot as plt
import geopandas as gp
import pandas as pd
# Creating data frames
dfevents = pd.read_csv (r'C:\Users\alan_\Desktop\TAT\Inputs\Get Events\Get_Events.csv')
print(dfevents)
dfplacedetails = pd.read_csv (r'C:\Users\alan_\Desktop\TAT\Inputs\Get Place Details\Get_Place_Details.csv')
print(dfplacedetails)
# Make them proper geometries
dfevents['point'] = gp.GeoSeries.from_xy(dfevents.longitude, dfevents.latitude)
dfplacedetails['polygon'] = gp.GeoSeries.from_wkt('POLYGON' + dfplacedetails.polygon)
# Make them GeoDataFrames
dfevents = gp.GeoDataFrame(dfevents, geometry='point')
dfplacedetails = gp.GeoDataFrame(dfplacedetails, geometry='polygon')
# Output (It works fine)
dfout = dfevents.intersects(dfplacedetails)
print(dfout)
# Plot
fig, ax =plt.subplots(figsize =(20,10))
dfplacedetails.plot(ax=ax, color='blue')
dfevents.plot(ax=ax, color='red',markersize=10)
ax.set_axis_on()
The result I get when I plot as described in my code is as follows:
But when I plot the two layers separately, they plot fine:
Is there any way to plot both of them in the same image?
Thanks for your help!
By the way, I'm using Visual Studio Code.
I have 5 csv files that I am trying to put into one graph in python. In the first column of each csv file, all of the numbers are the same, and I want to treat these as the x values for each csv file in the graph. However, there are two more columns in each csv file (to make 3 columns total), but I just want to graph the second column as the 'y-values' for each csv file on the same graph, and ideally get 5 different lines, one for each file. Does anyone have any ideas on how I could do this?
I have already uploaded my files to the variable file_list
Read the first file and create a list of lists in which each inner list holds the two columns (x and y) of one row of that file. Then read the other files one by one and append each y value to the inner list at the corresponding index.
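The list-of-lists idea could be sketched roughly as follows. The function names collect_rows and plot_rows are made up for illustration, and the sketch assumes comma-separated files without a header row that all have the same number of rows.

```python
import csv


def collect_rows(file_list):
    """Return rows of the form [x, y_file0, y_file1, ...]."""
    table = []
    for n, path in enumerate(file_list):
        with open(path, newline="") as handle:
            rows = [row for row in csv.reader(handle) if row]
        if n == 0:
            # The first file supplies both the shared x values and its own y values.
            table = [[float(row[0]), float(row[1])] for row in rows]
        else:
            # Later files only contribute their second column.
            for i, row in enumerate(rows):
                table[i].append(float(row[1]))
    return table


def plot_rows(table, labels):
    """Draw one line per file on a single set of axes."""
    import matplotlib.pyplot as plt  # local import: only needed for plotting

    x, *ys = zip(*table)  # transpose rows back into columns
    fig, ax = plt.subplots()
    for y, label in zip(ys, labels):
        ax.plot(x, y, label=label)
    ax.legend()
    return fig
```

For example, plot_rows(collect_rows(file_list), file_list) followed by plt.show() would draw the five lines with the file names as legend entries.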
You can simply call plot more than once on the same axes. The files can repeat the same x values or have different ones, and it will still work. Here is an example:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
files = list(Path("/path/to/folder/with/csvs").glob("*.csv"))
fig, ax = plt.subplots(figsize=(10, 10))
x_col, y_col = "x_column_name", "y_column_name"
for file in files:
    file_name = file.stem
    df = pd.read_csv(file)
    df.plot(x=x_col, y=y_col, ax=ax, label=file_name, legend=True)
fig  # If using a jupyter notebook, and you've run a cell with %matplotlib inline
Assuming your files are named File0.csv, File1.csv, File2.csv, File3.csv, File4.csv, you can loop over them, ignore the third column values and plot the x and y values. The following code works for files with 3 columns:
import numpy as np
import matplotlib.pyplot as plt
for i in range(5):
    # loadtxt splits on whitespace by default, so comma-separated files need delimiter=','
    x, y, _ = np.loadtxt('File%s.csv' % i, delimiter=',', unpack=True)
    plt.plot(x, y, label='File %s' % i)
plt.legend()
plt.show()
I want to read CSV files from a directory, plot them, and be able to click the arrow button to step from one plot to the next. I also want to specify which columns to plot and to set the title, as in the code below.
I am able to read the csv file and plot a single plot with specific columns but I am not sure how to do it with multiple. I've tried glob but it didn't work, I do not want to concatenate them to a single csv file. I have provided my code below. Any help would be appreciated. Thank you.
import pandas as pd
import matplotlib.pyplot as plt
cols_in = [1, 3]
col_name = ['Time (s)', 'Band (mb)']
df = pd.read_csv("/user/Desktop/TestNum1.csv", usecols=cols_in, names=col_name, header=None)
fig, ax = plt.subplots()
my_scatter_plot = ax.scatter(df["Time (s)"], df["Band (mb)"])
ax.set_xlabel("Time (s)")
ax.set_ylabel("Band (mb)")
ax.set_title("TestNum1")
plt.show()
You just need to add a for loop over all the files and use glob to collect them.
For example,
import glob
import os
import pandas as pd
import matplotlib.pyplot as plt
cols_in = [1, 3]
col_name = ['Time (s)', 'Band (mb)']
# Select all CSV files on Desktop
files = glob.glob("/user/Desktop/*.csv")
for file in files:
    df = pd.read_csv(file, usecols=cols_in, names=col_name, header=None)
    fig, ax = plt.subplots()
    my_scatter_plot = ax.scatter(df["Time (s)"], df["Band (mb)"])
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Band (mb)")
    # Title each plot with its file name (without the extension)
    ax.set_title(os.path.splitext(os.path.basename(file))[0])
    plt.show()
Keeping plt.show() inside the for loop will ensure each plot is plotted. It should be pretty easy to search for 'How to add a title to a plot in python' for answers to your other questions.
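If stepping back and forth in a single window is preferred, one option is to redraw the same axes from a key-press handler. This is only a sketch: it interprets the "arrow button" as the keyboard left/right arrow keys, which is an assumption, and the names step and browse are invented for illustration.

```python
import os


def step(index, count, key):
    """Return the next file index for a 'left'/'right' key press, wrapping around."""
    if count == 0:
        return 0
    if key == "right":
        return (index + 1) % count
    if key == "left":
        return (index - 1) % count
    return index


def browse(files, cols_in=(1, 3), col_name=("Time (s)", "Band (mb)")):
    """Show one scatter plot at a time; left/right arrow keys switch files."""
    import matplotlib.pyplot as plt
    import pandas as pd

    fig, ax = plt.subplots()
    state = {"index": 0}

    def draw():
        path = files[state["index"]]
        df = pd.read_csv(path, usecols=list(cols_in), names=list(col_name), header=None)
        ax.clear()
        ax.scatter(df[col_name[0]], df[col_name[1]])
        ax.set_xlabel(col_name[0])
        ax.set_ylabel(col_name[1])
        ax.set_title(os.path.splitext(os.path.basename(path))[0])
        fig.canvas.draw_idle()

    def on_key(event):
        state["index"] = step(state["index"], len(files), event.key)
        draw()

    fig.canvas.mpl_connect("key_press_event", on_key)
    draw()
    plt.show()
```

Calling browse(sorted(glob.glob("/user/Desktop/*.csv"))) (after import glob) would open one window; each file is read again whenever it is shown, which keeps the sketch short at the cost of repeated reads.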
I wrote a python script to read in a distance matrix that was provided via a CSV text file. This distance matrix shows the difference between different animal species, and I'm trying to sort them in different ways(diet, family, genus, etc.) using data from another CSV file that just has one row of ordering information. Code used is here:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as mp
dietCols = pd.read_csv("label_diet.txt", header=None)
df = pd.read_csv("distance_matrix.txt", header=None)
ax = sns.heatmap(df)
fig = ax.get_figure()
fig.savefig("fig1.png")
mp.clf()
dfDiet = pd.read_csv("distance_matrix.txt", header=None, names=dietCols)
ax2 = sns.heatmap(dfDiet, linewidths=0)
fig2 = ax2.get_figure()
fig2.savefig("fig2.png")
mp.clf()
When plotting the distance matrix, the original graph looks like this:
However, when the additional naming information is read from the text file, the graph produced only has one column and looks like this:
You can see the matrix data is being used as row labeling, and I'm not sure why that would be. Some of the rows provided have no values so they're listed as "NaN", so I'm not sure if that would be causing a problem. Is there any easy way to order this distance matrix using an exterior file? Any help would be appreciated!
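One way to attach external labels and group the matrix by them is to read the label row into a plain list, assign it to both the index and the columns, and then reorder rows and columns together. This is a sketch rather than a diagnosis of the output above: the function name label_and_sort is made up, and it assumes a square matrix with exactly one label per row.

```python
import pandas as pd


def label_and_sort(matrix, labels):
    """Attach `labels` to rows and columns, then group identical labels together."""
    if matrix.shape[0] != matrix.shape[1] or len(labels) != matrix.shape[0]:
        raise ValueError("need a square matrix with one label per row")
    labelled = matrix.copy()
    labelled.index = labels
    labelled.columns = labels
    # sorted() is stable, so the original order is kept inside each label group;
    # str() lets missing labels (NaN) sort alongside the text labels.
    order = sorted(range(len(labels)), key=lambda i: str(labels[i]))
    return labelled.iloc[order, order]
```

The labels could be obtained with pd.read_csv("label_diet.txt", header=None).iloc[0].tolist(), and the result passed to sns.heatmap in place of the unlabelled frame.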
I haven't had much training with Matplotlib at all, and this really seems like a basic plotting application, but I'm getting nothing but errors.
Using Python 3, I'm simply trying to plot historical stock price data from a CSV file, using the date as the x axis and prices as the y. The data CSV looks like this:
(only just now noticing the big gap in times, but whatever)
import glob
import pandas as pd
import matplotlib.pyplot as plt
def plot_test():
    files = glob.glob('./data/test/*.csv')
    for file in files:
        df = pd.read_csv(file, header=1, delimiter=',', index_col=1)
        df['close'].plot()
        plt.show()
plot_test()
I'm using glob for now just to identify any CSV file in that folder, but I've also tried just designating one specific CSV filename and get the same error:
KeyError: 'close'
I've also tried just designating a specific column number to only plot one particular column instead, but I don't know what's going on.
Ideally, I would like to plot it just like real stock data, where everything is on the same graph: volume at the bottom on its own axis, open, high, low and close on the y axis, and date on the x axis for every row in the file. I've tried a few different solutions but can't seem to figure it out. I know this has probably been asked before, and I've tried lots of different solutions from SO and elsewhere, but mine seems to be hanging up on me. Thanks so much for the newbie help!
In the pandas documentation you can find that the header kwarg should be 0 for your CSV, since the first row contains the column names. What is happening is that the DataFrame you are building doesn't have the column close, because it takes the headers from the second row. It will probably work fine if you remove the header kwarg or change it to header=0. The same goes for the other kwargs; there is no need to define them. A simple df = pd.read_csv(file) will do just fine.
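The effect of header=1 can be reproduced on a small in-memory sample. The sample rows below are invented for illustration; only the column names (timestamp, open, high, low, close, volume) are assumed to match the real file.

```python
import io

import pandas as pd

# Hypothetical sample with the column names in the first row.
SAMPLE = (
    "timestamp,open,high,low,close,volume\n"
    "2018-01-02 09:30,10.0,10.5,9.9,10.2,1000\n"
    "2018-01-02 09:31,10.2,10.6,10.1,10.4,1200\n"
)

default_header = pd.read_csv(io.StringIO(SAMPLE))              # header=0 is the default
second_row_header = pd.read_csv(io.StringIO(SAMPLE), header=1)

print("close" in default_header.columns)     # True: names come from the first row
print("close" in second_row_header.columns)  # False: the first data row became the header
```

With header=1 the real column names are skipped and the first data row is used as the header, which is why df['close'] raises KeyError.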
You can prettify this according to your needs
import pandas
import matplotlib.pyplot as plt
def plot_test(file):
    df = pandas.read_csv(file)
    # convert timestamp
    df['timestamp'] = pandas.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M')
    # plot prices (Axes.plot accepts datetime x values; plot_date is deprecated in recent Matplotlib)
    ax1 = plt.subplot(211)
    ax1.plot(df['timestamp'], df['open'], '-', label='open')
    ax1.plot(df['timestamp'], df['close'], '-', label='close')
    ax1.plot(df['timestamp'], df['high'], '-', label='high')
    ax1.plot(df['timestamp'], df['low'], '-', label='low')
    ax1.legend()
    # plot volume
    ax2 = plt.subplot(212)
    # issue: https://github.com/matplotlib/matplotlib/issues/9610
    df.set_index('timestamp', inplace=True)
    # pass plain Python datetimes to bar() as the workaround for the issue above
    ax2.bar(df.index.to_pydatetime(), df['volume'], width=1e-3)
    ax2.xaxis_date()
    plt.show()