- 01/25/23 edit: FB engineers integrated the solution in this post into the package in version release 1.1.2. Due to some implementation differences it is still slightly slower than the solution in the end of this post, but not significantly.
- If you’re not interested in learning about vectorization and Prophet’s inner workings, and you just want to run Prophet faster, you can skip to the tl;dr at the end.
Facebook Prophet
Facebook’s Prophet package for time series forecasting was released in 2017 and has since become one of the most, if not the most, popular forecasting algorithms: According to PyPy, Prophet was downloaded ~20 million times. The stream of downloads continues, despite some criticism of the model’s accuracy and the general trend towards neural networks for many machine learning applications. Prophet’s popularity probably stems from its simple out-of-the-box use, its confidence intervals, clear visualizations, and the fact that it often outperforms classic statistical algorithms (ARIMA, ETS).
However popular, Prophet has one huge disadvantage: it is painfully slow and does not scale to hundreds of thousands of items.
Let’s try to run the following code, creating a random time series:
import pandas as pd
import numpy as np
import datetime
from fbprophet import Prophet
n = 100
some_data = pd.DataFrame({'ds':pd.date_range(datetime.datetime(2020,1,1,), freq='W', periods=n), 'y': np.random.rand(n)})
now let’s test the running time:
%%timeit
prophet = Prophet(interval_width=0.8)
prophet.fit(some_data)
output:
# 91.5 ms ± 1.66 ms per loop
About 0.1 seconds to fit the data. But the real pain comes in the "predict" stage:
%%timeit
prophet.predict(some_data)
output:
1.15 s ± 55.9 ms per loop
It takes more than a full second to get the prediction! This is surprising, since in most ML models, training is expensive, and prediction is cheap.
In this post, we will see why Prophet’s predict function is relatively slow, and how to get it to run 100X faster (note that you will still have to fit the model first, so fit+predict will only see a 5-20X improvement). Along the way, we will learn something about modeling time series uncertainty, statistical distributions, and vectorization.
Let’s first explain Prophet’s modeling of uncertainty, and some of its code, and then show how we can easily vectorize the function.
Finding the time-consuming function
If you profile the "predict" code, you can see where the fault lies.

About 98% of the time is spent on "predict_uncertainty". This function creates "yhat_upper" and "yhat_lower" in the result DataFrame, which are the edges of an 80% (or any other value given in interval_width) confidence interval – where the actual points are likely to lie.
What does Prophet do when it models uncertainty, that takes so much time?
Prophet’s uncertainty
(If you’re not familiar with Prophet’s additive model, you should probably start with a quick overview)
The Prophet modeling assumes that there are two sources of uncertainty in the data:
- Gaussian residuals around the trend line
- Changing slope values
When fitting, Prophet finds the best points in the training data for trend-changes to best fit the data – it is assumed that the trend is linear at any given point, but the slope of the trend can change. Naturally, the more trend-changes found, and the greater the delta between adjacent slopes, the greater the uncertainty in the future values, as can be seen in these examples.


If the trend changes significantly and often in the training data, the range of possible values for the future is large. In contrast, if the trend is relatively constant in the train data – we have more certainty it will continue in a straight path in the future, in which case the only remaining source of error is the residuals that we have already seen in the historic data (which does not grow with time).
We will not discuss how Prophet finds the change-points and their delta, that’s all done in the fit phase.
Predicting uncertainty in a fitted model
The following is a "paraphrase-code" (compressed, with some non-crucial parts removed for clarity) of Prophet’s predict_uncertainty (you can skip to the literal description):
Let’s summarize the process in words:
Here’s what these sampled trends look like:

The train data had two slope changes. Five future trend lines are presented, note that the number of change-points, the time they occur, and the delta between slopes are different for each line.
Prophet adds Gaussian noise to these trend lines, giving us:

Finally, it finds 10% and 90% quantiles for the values of each time step, giving us our confidence interval.
Optimizing runtime
Our goal is to make this process more efficient by vectorizing it. A detailed explanation of vectorization is beyond the scope of this post but suffice to say that if we remove for-loops and perform all the operations on a NumPy array, the process will run much faster.
Before we vectorize the code, let’s note that one part of this process is redundant: The size of the confidence interval around the mean only depends on the likelihood and size of slope-changes and the past residuals; we, therefore, don’t need to concatenate the slope, intercept, and deltas of the training data and assess their values. The mean of this interval depends on the final trend line, the yhat, which depends on the training data intercept and slopes, but which is computed before predict_uncertainty starts.
We can assume slope=0 and intercept=0, use the likelihood and size of slope changes find the 10%-90% interval for all time steps (e.g. it is +/-8). We then add this value to the yhat (e.g. 17) to get the confidence interval (9-25).



Note that as time increases the width of the confidence interval increases, (which intuitively makes sense). This is because the trend-changes could have a greater impact the more we forecast into the future.
Now let’s vectorize
Remember our goal is to create a matrix where each row is a sampled trend, and each column is some date in the future. E.g.

But with 1000+ rows.
First, we need to sample the number and location of slope-changes. This is where Prophet samples from a Poisson distribution for each row, and then randomly assigns the number of changepoints. We’re going to get the same outcome without Poisson. The likelihood of a changepoint occurring at any time step is determined by the number of changepoint observed in the training data, divided by the length of the train data. Assuming we have a trained prophet_obj and a forecast_df with the future dates to forecast, the likelihood is:
prophet_obj.changepoints_t is a list of times where a changepoint occurred in the training data. What is single_diff? It is a consequence of how Prophet models time progression.
Prophet’s time progression
In both the train and predict functions, Prophet converts the time ("ds") column in the Pandas DataFrame to an array called t which signifies time progression. For all kinds of reasons, it is convenient to have t in the range of 0-1 during training. So, in training, t is created by setting the interval between any two elements as 1/training_length. E.g., if there are 50 training data points, t is equal to [0, 0.02, 0.04, …, 0.98, 1.]. For the forecast, it will continue with the same intervals [1.02, 1.04 …] up to the length of forecast_df.
Since the t array represents the progression of time, it can be used to calculate progress over time. For example, some_coeft creates a linear trend, where some_coef represents the slope. Thus, a slope of 4 means that in each time step the series adds 4single_diff (4*0.02 in this example).

Back to vectorizing slope changes
We now have the likelihood of a slope-change at any given point. Let’s create a matrix of slope-changes:
where k is the number of samples (rows). The result is a Boolean matrix:

And this is where the magic happens. Take the sum of each row, plot the distribution, and you will get:

*A Poisson distribution with a mean of likelihood len(future_t_time)**! Exactly what Prophet samples from to set the n_changes for each row!
This derives straight from the definition of Poisson – if you have q independent draws where each has a likelihood of l being positive, you get a Poisson distribution with a mean of q*l.
We replaced Prophet’s for-loop sampling of the number of changepoints in a row – and then sampling their placement in the time-steps – and replaced it with a single line for an entire matrix!
But we don’t need a Boolean value of whether a change occurs – we need the delta of the change. This part is simple:
We create a new matrix of samples, this one from a Laplace distribution with a scale of the mean absolute training data deltas. Why Laplace? I don’t know if there’s a theoretical justification, but that’s what Prophet does, so we follow it (except that Prophet performs the sampling separately for each row, and we sample a matrix). Take the product of the two matrices to get:

The change in slope for each sampled trend, in each time step (remember row=sampled trend, column=time step).
We are almost done! Now we need to transition from slope changes to actual forecast values.
Slope-change to actual values
Imagine we have the following time series of deltas of slope (the change in slope values between adjacent steps): [0,1,0,0,-4,0,2]. Assuming we start with an actual slope of 0, the slope at each step will be [0,1,1,1,-3,-3,-1], as for each time step we take the sum of deltas up to it, or in other words: "cumsum".

But this only gives us the slope at each point (instead of the slope-delta). We want the actual values! A linear slope means we add the slope value at each iteration. If we have a slope of 2 for 3 time-steps, the actual values will be [2,4,6]; it should be obvious that this is another cumsum. Hence, to transition from a matrix of slope-deltas to the actual values of the time series:
Remember that a slope of 4 means that 4*single_diff is the actual value, so we take the product of this new matrix with single_diff.
Adding Gaussian residuals
Now we have a matrix of sample trends, we only need to add the Gaussian noise:
Where sigma is the standard deviation around the trend in the training data.
We finished vectorizing the creation of a matrix of samples. If we want the confidence interval between 10%-90%, we set
Simply adding the lower and upper edges of the interval to the yhat forecast. All done! Except for a few small issues (feel free to skip the next 3 sections).
Mid-step trend change
There’s a tiny modeling difference between the vectorized code and Prophet. Prophet allows the slope-changes to occur between time-steps. E.g if t = [1.02, 1.04, 1.06…], a slope change can be set at 1.028. Or in other words: if your data represents weekly values on Mondays, Prophet allows for slope changes Saturdays or any other day.
I’m not sure this modeling always makes sense for all datasets. Also, from my experience, the difference is negligible.
However, for the sake of completion: Let’s consider a trend change occurring before the first actual step (in 1.0–1.02). If it is close to 1.0 (Previous Monday), by the time-step 1.02 we will get a full step-worth of change (the 4*single_diff). But if it is close to 1.02 (Sunday), the slope change will not make a big difference for that time step. The changes can occur anywhere on that spectrum, making the best correction the mean of the two extremes: each time step with the one before it (with 0 for the first one).
Another approach is to create a matrix with k times more columns, and a chengepoint likelihood of 1/k of the previous likelihood. This matrix will represent smaller intervals than the ones in the time series. After the cumsum.cumsum we could extract every k-th column. But again – the difference is probably insignificant.
Logistic growth
By default, Prophet assumes the growth is linear, but you can set it to logistic, in which case the creation of the slope-changes remains identical, but the transition from the matrix to the sampled trends is completely different (non-linear). Prophet’s code for that transition is monstrous, but it is trivial to convert the code to a vectorized version. I will not explain it in this post, but the vectorized code is given below for your use.
Training data
One final issue – all the above applies to forecasting into the future, where the trend values are not known and therefore need to be sampled. However, sometimes we call the predict function on the training data. The uncertainty there only depends on the Gaussian noise around the trend, so we can set the sample_trends to be zeros.
Time and accuracy comparison
How much faster is the vectorized version?
%%timeit
add_prophet_uncertainty(p, forecast_df)
output:
2.13 ms ± 139 µs per loop
Compared with 1s+ for the original, non-vectorized, version of Prophet. This is a 500x improvement.
And does it return the same results as Prophet? Pretty close.





Why are the results not identical? Remember, this is a random process, the 1000 samples created by Prophet and by the vectorized code won’t have the exact same values. If the standard deviation is large – 1000 samples are not enough for the law of large numbers to take effect. Run it again and you will get a slightly different result. In fact, since the vectorized version is so much faster, I set k samples to be 10000, which gives a more consistent result, this is why it is smoother than Prophet’s forecast in the examples above. Even with 10000 samples, it still only takes ~13ms.
tl;dr
Start by defining these functions (If you never use logistic growth – you can remove prophet_logistic_uncertainty and its uses, which is more than half of the code):
Given some training_df, if you require the uncertainty interval for both train and forecast, run the following code:





