I have the following string in Python (and many more like it):
date = "00:01:43"
which represents hours:minutes:seconds. It comes from reading a CSV file that contains many such times.
Now I need to collect the values I am reading from the CSV into some sort of array and then use it as the x-axis of a bar plot (matplotlib's plt.bar).
The question is how to prepare the times I am reading so they can be used in a bar plot:
tempArray = []
with open('file.csv', 'r') as file:
    for line in file:
        time = line.split(',')[0]  # read as "HH:MM:SS"
        temp = line.split(',')[1]  # read as a float stored in a string
        tempArray.append(float(temp))
QUESTION
How do I assemble the times into an array that can then be used in the following:
plt.bar(timeArray, tempArray)
where the x-axis would still show "HH:MM:SS" format.
Since it looks like you have already created a list for the temp data, you can just create another list for the time data.
Then use ax.set_xticks({temp_data_list}, {time_data_list}) to group them together.
ax.set_xticks({temp_data_list}, {time_data_list})
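A minimal sketch of this approach, assuming each CSV row is `HH:MM:SS,temperature` with no header row. The sample rows and the helpers `read_rows` and `plot_bars` are made up for illustration. The bars are placed at integer positions and the time strings are used only as tick labels, so the axis still reads "HH:MM:SS".

```python
import csv
import io

# Made-up sample standing in for file.csv (assumption: no header row).
SAMPLE = "00:01:43,21.5\n00:01:44,21.7\n00:01:45,21.6\n"


def read_rows(handle):
    """Return (time_array, temp_array) from rows of 'HH:MM:SS,float'."""
    time_array, temp_array = [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        time_array.append(row[0].strip())
        temp_array.append(float(row[1]))
    return time_array, temp_array


def plot_bars(time_array, temp_array):
    """Draw one bar per reading and label each bar with its time string."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it

    positions = list(range(len(time_array)))
    fig, ax = plt.subplots()
    ax.bar(positions, temp_array)
    ax.set_xticks(positions)
    ax.set_xticklabels(time_array, rotation=45)
    plt.show()


time_array, temp_array = read_rows(io.StringIO(SAMPLE))
```

With a real file, pass `open('file.csv', newline='')` to `read_rows` and then call `plot_bars(time_array, temp_array)`.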
I think there is something wrong with the data in my dataframe, but I am having a hard time coming to a conclusion. I think there might be some missing datetime values in the index of the dataframe. Given that there are over 1000 rows, it isn't possible for me to check each row manually. Here is a picture of my data and the corresponding line plot. Clearly this isn't a line plot!
Is there any way to fill in the possibly missing values in my dataframe?
I also did a line plot in seaborn to get another perspective, but I don't think it was immediately helpful.
You have effectively done the same as I have simulated below. You really have a multi-index of date and age_group, and plotting both together means the line jumps between the two groups. Separate them out, plot them as separate lines, and the result is as you expect.
import numpy as np
import pandas as pd

d = pd.date_range("1-jan-2020", "16-mar-2021")
df = pd.concat([pd.DataFrame({"daily_percent":np.sort(np.random.uniform(0.5,1, len(d)))}, index=d).assign(age_group="0-9 Years"),
                pd.DataFrame({"daily_percent":np.sort(np.random.uniform(0,0.5, len(d)))}, index=d).assign(age_group="20-29 Years")])
# Both age groups drawn as a single series: the line jumps between them
df.plot(kind="line", y="daily_percent", color="red")
# One column per age group: one clean line each
df.set_index("age_group", append=True).unstack(1).droplevel(0, axis=1).plot(kind="line", color=["red","blue"])
I get ValueError: microsecond must be in 0..999999 when I try to plot two series using a scatter plot.
I have two datasets (each contains posts made on a platform, with the time they were created and the number of comments each post received). The goal is to understand at what time of day a post should be created to be likely to attract a large number of comments.
hn_ask_sorted_data = hn_ask_data.sort_values(by = ['num_comments'],ascending=False)
hn_show_sorted_data = hn_show_data.sort_values(by = ['num_comments'],ascending=False)
hn_ask_sorted_data['created_at'] = pd.to_datetime(hn_ask_sorted_data['created_at'])
hn_show_sorted_data['created_at'] = pd.to_datetime(hn_show_sorted_data['created_at'])
I convert the column that contains the time into datetime objects, but I am more interested in the time-of-day component, so I keep only that part using .dt.time:
hn_ask_sorted_data['created_at'] = hn_ask_sorted_data['created_at'].dt.time
hn_show_sorted_data['created_at'] = hn_show_sorted_data['created_at'].dt.time
Then I make a scatter plot from two columns: the number of comments on each post and the time at which the post was created (i.e. the created_at column above). Instead of a plot, I get the error described above.
plt.scatter(hn_ask_sorted_data['created_at'],hn_ask_sorted_data['num_comments'])
plt.show()
plt.scatter(hn_show_sorted_data['created_at'],hn_show_sorted_data['num_comments'])
plt.show()
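One possible workaround, offered as a sketch rather than a confirmed fix: plot a plain number such as hours since midnight instead of datetime.time objects, so the scatter plot only ever receives numeric x values. `time_to_hours` is a hypothetical helper and the sample times are made up.

```python
from datetime import time


def time_to_hours(t):
    """Convert a datetime.time to fractional hours since midnight."""
    return t.hour + t.minute / 60 + t.second / 3600 + t.microsecond / 3_600_000_000


# Made-up sample values standing in for the created_at column.
sample_times = [time(0, 30), time(13, 45), time(23, 59, 59)]
hours = [time_to_hours(t) for t in sample_times]
```

With pandas, the same helper could be applied as hn_ask_sorted_data['created_at'].map(time_to_hours) before calling plt.scatter.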
I want to make a loop to create a plot for each corresponding column in 2 different csv files such that column 1 in csv A and column 1 in csv B are plotted together with the same timestamp (pulled from another csv). I do not think I will have trouble when I modify my code to create the loop, but I have to get matplotlib to work for the first column before trying to construct a loop.
I have already tried checking to make sure that the correct data is being passed into the function and that is in the correct order. For example, I printed the zipped array as a list (t_array, b_array) and checked my csv files to verify that the data was in the correct order. I have also tried modifying the axes, ticks, and zoom to no avail. I have tried checking the helper functions which I lifted from my other projects and they all work as expected.
def double_plot():
    before = read_file(r_before)
    after = read_file(r_after)
    time = read_file(timestamp)
    if len(before) == len(after):
        b_array = np.asarray(before[1])
        a_array = np.asarray(after[1])
        t_array = np.asarray(time[1])
        plt.plot(t_array, b_array)
        plt.plot(t_array, a_array)
        plt.show()
    else:
        print(len(before))
        print(len(after))
        print("dimension failure")
read_file() is a helper function that reads a CSV file and saves its columns to a dictionary, with the first column stored under key 1 and so on across the columns. I know I should probably change it to start indexing at 0, but that is a problem for later...
Images showing what I want the code to do and what it is doing
What I would like
What my code is actually doing
Thank you for your time. This is my first time posting so I apologize if something I did was incorrect. I did attempt to find the answer before posting.
Edits: data sample; read_file()
screenshot of excel
def read_file(read_file):
    data = {}
    with open(read_file, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            col_num = 0
            for col in row:
                col_num += 1
                if col_num in data:
                    data[col_num].append(col)
                else:
                    ls = col
                    ls = [ls]
                    data[col_num] = ls
    return data
edit again: ^ it's much better to use pandas, but I am leaving this here because it's funny after seeing it done with dataframes
The arrays I was using with the plot function contained strings rather than floats.
These links explain the problem along with multiple ways to fix it:
Matplotlib y axis values are not ordered
In Python, how do I convert all of the items in a list to floats?
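As a small illustration of the fix, assuming every cell in the column is a numeric string (the sample values and the `to_floats` helper are made up): converting with float() before plotting lets matplotlib treat the values as numbers rather than as category labels placed in order of appearance.

```python
def to_floats(column):
    """Convert a list of numeric strings (as read by csv.reader) to floats."""
    return [float(value) for value in column]


# Made-up column as read_file() would return it: strings, not numbers.
before_column = ["10.5", "9.25", "100", "3e-2"]
b_values = to_floats(before_column)
```

In double_plot() this would mean building the arrays from to_floats(before[1]) and to_floats(after[1]) instead of from the raw string lists.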
I am trying to extract multiple columns of data from two different text files. I am going to loop through those columns of data with additional code. How do I extract the data and format it correctly so I can use it? There are probably 20 columns in one text file and 15 columns in the other.
I have tried extracting the data using np.genfromtxt, but I get a weird format and mapping it doesn't help. I also can't use the extracted data in any additional loops or functions.
This is the code I was trying to use:
data = np.genfromtxt("Basecol_Basic_New_1.txt", unpack=True)
J_i2 = data[0]
J_f2 = data[1]
kH2 = data[5:, :]
data = np.genfromtxt("Lamda_HeHCL.txt", unpack=True)
J_i1 = data[1]
J_f1 = data[2]
kHe = data[7:, :]
I also tried using this to format correctly, but it kept resulting in errors.
kHe = map(float, kHe)
kH2 = map(float, kH2)
kHe = np.array(kHe)
kH2 = np.array(kH2)
g = len(kH2)
However, once I have the columns of data, they are formatted differently than I am used to. They seem to be unusable.
I expect that the data will come out as multiple arrays [1,2,3], [4,5,6]. What I am currently getting is [[5.678e-8 ....] [7.893e-10 ...]].
It isn't in the right format and all my attempts to put it in the right format result in a size-1 error or similar.
From your code I'm assuming the data is separated by spaces. In that case you can just read the file and convert the values yourself instead of using np.genfromtxt.
Edited to map the values to float and to select columns 5 to 10 inclusive (6 columns).
rows = []
with open("Basecol_Basic_New_1.txt", 'r') as data:
    for line in data:
        # columns 5 to 10 (1-based) of each line, converted to a list of floats
        rows.append([float(x) for x in line.split()[4:10]])
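To get one sequence per column rather than one per row, the parsed rows can be transposed with zip. This is a sketch that assumes whitespace-separated numeric columns; the sample text and the `read_columns` helper are invented for illustration.

```python
import io

# Made-up stand-in for a whitespace-separated data file.
SAMPLE = "1 2 5.678e-8 7.0\n3 4 7.893e-10 8.0\n"


def read_columns(handle, start=0, stop=None):
    """Parse whitespace-separated floats and return a list of columns."""
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        rows.append([float(x) for x in fields[start:stop]])
    # zip(*rows) turns a list of rows into a sequence of columns
    return [list(col) for col in zip(*rows)]


columns = read_columns(io.StringIO(SAMPLE))
```

Each entry of `columns` is then an ordinary list of floats that can be looped over or passed to np.array.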
I have a dataframe with one column showing the price, and its index is a datetime.
2018-09-18T02:29:56.5 524.6
2018-09-18T02:29:57.0 524.6
2018-09-18T02:29:57.5 524.8
2018-09-18T02:29:59.0 525.1
2018-09-18T02:29:59.5 525.1
2018-09-18T02:30:00.0 524.8
2018-09-19T21:00:00.5 527.1
2018-09-19T21:00:01.0 527.1
2018-09-19T21:00:01.5 527.3
2018-09-19T21:00:02.0 527.7
2018-09-19T21:00:02.5 527.5
2018-09-19T21:00:03.0 527.6
2018-09-19T21:00:03.5 527.4
I'm trying to plot the time series with matplotlib.pyplot.plot(df).
It gives a plot, but with a long straight line connecting the points across the gap in the data (the last data point on 2018-09-18T02:30:00.0 and the first data point on 2018-09-19T21:00:00.5). Is there a way to remove the connecting line across the gap?
Sorry... I think I figured out how to do it. Just use
df.plot(x=df.index.astype(str))
Basically, convert the index from datetime to string and use the strings as the x-axis.
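An alternative sketch, in case the real time spacing should be kept on the x-axis: split the series wherever consecutive timestamps are further apart than some threshold and plot each piece separately. The helper `split_on_gaps` and the 60-second threshold are assumptions for illustration, not part of the original post.

```python
from datetime import datetime, timedelta


def split_on_gaps(times, values, max_gap=timedelta(seconds=60)):
    """Split parallel lists into segments wherever the time step exceeds max_gap."""
    segments = []
    current_t, current_v = [], []
    for t, v in zip(times, values):
        if current_t and t - current_t[-1] > max_gap:
            segments.append((current_t, current_v))
            current_t, current_v = [], []
        current_t.append(t)
        current_v.append(v)
    if current_t:
        segments.append((current_t, current_v))
    return segments


# A few rows taken from the data shown in the question.
raw = [
    ("2018-09-18T02:29:59.5", 525.1),
    ("2018-09-18T02:30:00.0", 524.8),
    ("2018-09-19T21:00:00.5", 527.1),
    ("2018-09-19T21:00:01.0", 527.1),
]
times = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f") for s, _ in raw]
values = [v for _, v in raw]
segments = split_on_gaps(times, values)
```

Each (times, values) pair in `segments` can then be drawn with its own plt.plot call using the same color, so no line is drawn across the gap.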