Better forecasting with NeuralProphet
Knowing what’s coming is critical in many business cases, and plenty of tools claim to be your crystal ball into the future. One that caught my eye back in 2018 was Facebook’s Prophet. Business forecasts, traffic projections and even predicting how hot tomorrow will be are all things you can do with a basic laptop and a can-do attitude.
Well, it’s been a while since ‘the before times’, and as comforting and (relatively) simple as it is to run Facebook Prophet, I wanted to know if there are better options. If you’ve used Prophet before, you’ll know there are occasions when the numbers don’t look quite right, or it’s picked up on what looks like a seasonal pattern and completely gone off the rails. As far as crystal balls go, mine could definitely do with an upgrade.
Enter NeuralProphet: a forecasting tool based on neural networks, inspired by Facebook Prophet. It does effectively the same job as regular Prophet, but adds an auto-regressive neural network. This certainly sounds fancy, and in this article I’m going to describe how I’ve taken hundreds of websites and predicted their Organic traffic to see which of these tools wins overall.
What’s time-series forecasting?
Before jumping straight into the tests, here’s a quick explanation of what we mean when discussing time-series forecasting from a digital marketing perspective. We take lots of historic data points and make predictions based on trends to anticipate future events. In essence, we apply the seasonal patterns and overall trends to the future. Here’s a walkthrough of how to see seasonal trends and patterns, using Wikipedia’s traffic as an example.
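To make that idea concrete, here’s a minimal sketch of pulling a weekly seasonal pattern out of a daily traffic series. It uses synthetic data and plain pandas rather than Prophet itself (a centred rolling mean for the trend, weekday averages for the seasonality), so the numbers and column names are illustrative only:

```python
import numpy as np
import pandas as pd

# Synthetic daily "traffic": an upward trend plus a weekly pattern and noise.
dates = pd.date_range("2021-06-01", periods=730, freq="D")
trend = np.linspace(1000.0, 1500.0, len(dates))
weekly = 100.0 * np.sin(2 * np.pi * dates.dayofweek / 7)
noise = np.random.default_rng(0).normal(0, 20, len(dates))
df = pd.DataFrame({"ds": dates, "y": trend + weekly + noise})

# Estimate the trend with a centred 7-day rolling mean, then recover the
# weekly pattern as the average detrended value per weekday.
df["trend"] = df["y"].rolling(7, center=True).mean()
df["detrended"] = df["y"] - df["trend"]
weekly_pattern = df.groupby(df["ds"].dt.dayofweek)["detrended"].mean()
```

Prophet-family tools do essentially this decomposition (trend + seasonalities) in a far more principled way, then extrapolate the components forward.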
Why is it useful?
From a Search perspective, time-series predictions add insights to otherwise trivial data. Being able to predict allows you to:
- Measuring uplift: Knowing what traffic was likely to do helps measure incremental growth over the course of a campaign.
- Detecting anomalies: Similar to the above, it becomes easy to spot any clear deviations from the projected path. This is helpful for diagnosing tracking issues or the effects of a site migration.
- Anticipating seasonal trends: Great for content planning, especially around highly seasonal products that may only get a few weeks of attention each year.
- Planning budgets: Projections can be useful in helping budget allocation for paid search campaigns.
- Setting expectations: Being able to predict and aggregate search volumes for target search terms helps analysts steer clear of the overly ambitious ‘hockey stick’ forecast.
What gives NeuralProphet an edge?
There are many differences between Facebook Prophet and NeuralProphet; we won’t go into all the subtleties here. Generally speaking, NeuralProphet can put together more advanced models using neural networks. It has auto-regressive components that allow it to adjust for more than trends and seasonality, making it more capable of adapting to sudden changes or complex patterns.
While Facebook Prophet offers different modes of growth (linear, logistic or flat), NeuralProphet can model more complex non-linear relationships, largely due to its underlying neural network architecture. The diagram below shows an example of the network used in one of our tests:
Another major difference between the two versions is that NeuralProphet can run on your GPU. Because NeuralProphet is built on PyTorch, you can run it either on your CPU or on any attached GPU, which is great news if you have a decent graphics card and a lot of data to deal with.
Which Prophet is better? Prove it!
In order to know which version is better for forecasting Organic data, we’d need to test out both versions on the same datasets and see which Prophet is more accurate. We’d give each model enough data to capture seasonality patterns and then get it to project one month beyond the source data. We can then compare the predicted 30 days to the actual 30 days and see which version came closest to reality.
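The holdout procedure itself is straightforward. Here’s a sketch of that split using synthetic data, with a naive "repeat the last week" forecast standing in for the models (in the real tests, Prophet and NeuralProphet each produce the 30-day forecast that gets scored):

```python
import numpy as np
import pandas as pd

# Two years of synthetic daily sessions, standing in for a GA export.
dates = pd.date_range("2021-06-01", "2023-06-30", freq="D")
rng = np.random.default_rng(1)
y = 1000 + 100 * np.sin(2 * np.pi * dates.dayofweek / 7) + rng.normal(0, 20, len(dates))
df = pd.DataFrame({"ds": dates, "y": y})

# Hold out the final 30 days; each model only ever sees the rest.
train, holdout = df.iloc[:-30], df.iloc[-30:]

# Stand-in "model": repeat the last observed week across the holdout
# (weekdays line up because the holdout starts 7 days after this window).
last_week = train["y"].iloc[-7:].to_numpy()
naive_forecast = np.tile(last_week, 5)[:30]

# Score the forecast against what actually happened.
mae = np.abs(naive_forecast - holdout["y"].to_numpy()).mean()
```

A naive baseline like this is also a useful sanity check in practice: if a fancy model can’t beat "last week again", something is off.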
Of course, carrying out just a handful of tests isn’t likely to yield very significant results, so we’d have to run this test on a lot of time series, which is exactly what we did. Starting with a candidate set of 1,000 Google Analytics profiles, we needed sites that had been active for at least a two-year period. And given the recent switchover to GA4, we wanted our chosen metric (organic sessions) to be measured consistently, which meant pulling from Universal Analytics profiles only.
After screening out sites that weren’t consistently active between June 2021 and June 2023, we’re left with a solid data set of around 400 websites of varying traffic levels and website types. This should give us plenty of opportunities to see how each version of Prophet handles the quirks of real website traffic.
NeuralProphet is significantly better at forecasting
After running all the tests and collating the results, it was time to pick a metric to work out how “accurate” our forecasts turned out to be. There are many ways of measuring the gap between our projected numbers and our recorded traffic, but the one we’ve gone for is sMAPE, or Symmetric Mean Absolute Percentage Error.
This gives us a percentage error rate we can use to compare small sites with only a few visitors each day to very large sites which might have hundreds of thousands. The ‘symmetric’ part of this metric ensures that we treat under-forecasting and over-forecasting as equal – we just want to focus on how close to reality we got. In a perfect scenario, the sMAPE would be 0%. Whichever version of Prophet comes closest to 0% on average is our winner.
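The metric can be written down directly. Here’s a minimal NumPy implementation of the standard sMAPE formula (my own sketch, not the exact code used in the tests):

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent (0% = perfect)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Averaging |actual| and |forecast| in the denominator is what makes
    # the metric symmetric between over- and under-forecasting.
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return 100 * np.mean(np.abs(forecast - actual) / denom)

# A forecast that runs consistently 10% high scores about 9.5%:
print(round(smape([100, 200, 300], [110, 220, 330]), 2))  # 9.52
```

Note that this formulation is bounded at 200% (forecasting zero against any non-zero actual hits the cap), which is why the worst forecasts in our results top out at exactly that figure.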
Here are the sMAPE scores summarised:
The figures show that the best result Facebook Prophet managed was 3.98%, while NeuralProphet improved on that with 2.67%. These are cherry-picked best cases, but it’s interesting to see that both versions of Prophet are capable of coming very close to perfect for a monthly traffic forecast.
On the flip side, we can see that both versions also have a maximum of 200%, meaning both versions also have the ability to come up with scenarios that are fantastically wrong.
The important figure to focus on here is the median sMAPE for each Prophet. Facebook Prophet had a median sMAPE of 25.26%, while NeuralProphet had a much improved median of 19.65%. In terms of relative improvement on the original, NeuralProphet improved forecasting accuracy with our data sets by about 22.2%*.
That’s a bold claim to make from one little table, so to make this easier to visualise, the chart below shows the spread of how close tests were for each version of Prophet. We can see that significantly more NeuralProphet forecasts were closer to the perfect 0% sMAPE than Facebook Prophet, though both versions follow a similar distribution overall.
While we’ve answered our “Which is better?” question, we should explore some examples of where each version performed better and worse. This should give us some understanding of scenarios where Prophet is strongest and weakest. To start with, let’s see an example of what good looks like.
The chart below shows an example of where both versions of Prophet came very close to the recorded session figures in June 2023. Both models have registered the clear weekly seasonality – it’s interesting to see that both versions anticipated that July onward would have a downward trend overall. This can be understood better when we zoom out.
This next chart includes all the historical data, aggregated by week to make it easier to visualise. (The end of June was a short week which is why it drops). For this website, the run from June into July has typically had a downward trend in previous years, and both versions of Prophet were able to build this yearly seasonality into their models.
What about an example where both versions of Prophet severely missed the mark? The next example shows both versions of Prophet shooting drastically higher than recorded sessions in June 2023, and there are also times where both models predict negative sessions – though Facebook Prophet’s model leans into that more heavily. What could cause such a wild deviation from what turned out to be a rather flat and unremarkable June?
The answer lies in the historic data that the model was built around. This website is unusual, in that while it was technically active in 2023 (one of the parameters for being accepted into the test set) there appears to be a long period of disuse from late 2022 to early 2023. What’s more, a huge spike in sessions in June 2022 has likely tricked our models into projecting for what turned out to be a one-off event.
While something of an outlier in our experiment, a result like this demonstrates the need for analyst input. Had this been a real forecast it’s likely that the analyst making it would mitigate the unseasonable spike, and potentially find an additional data source to account for what looks like a period of tracking failure.
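What might that mitigation look like? One simple, illustrative option (all values hypothetical) is to cap extreme observations before fitting, so a one-off event can’t masquerade as seasonality. Prophet’s own documentation also suggests blanking out known-bad periods entirely; the crude quantile cap below is just one sketch:

```python
import numpy as np
import pandas as pd

# A flat series with a short, enormous one-off spike (e.g. a viral event).
dates = pd.date_range("2022-01-01", periods=365, freq="D")
rng = np.random.default_rng(2)
y = 500 + rng.normal(0, 30, len(dates))
y[150:155] = 8000
df = pd.DataFrame({"ds": dates, "y": y})

# Cap anything beyond an upper quantile before handing data to the model.
cap = df["y"].quantile(0.95)
df["y_capped"] = df["y"].clip(upper=cap)
```

An analyst would pair this with judgement: a genuine seasonal peak should be kept, while a tracking glitch or one-off event should be capped or removed.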
Continuing to sift through the hundreds of results, we also have an example where NeuralProphet diverged wildly from reality while Facebook Prophet loosely stuck to a sensible forecast. Here we have a NeuralProphet model that adheres well to reality, then goes wild a few days into our testing period. It’s utterly bizarre how this model manages to be so right and so wrong in the same chart.
When we zoom out to see all the historic training data, we can see why. This traffic pattern starts at zero in mid-2021 and grows in significant fits and starts over the two-year period. What may have happened is that Facebook Prophet detected a ‘changepoint’ in its model and adapted to a new trend that had set in by April 2023.
NeuralProphet, on the other hand, could have picked up on the somewhat unseasonal spikes and the growth from zero, inferring another impending spike in sessions. It’s hard to say for certain, and digging deeper is beyond the scope of an experiment that blind-tests hundreds of data sets.
In the spirit of fairness, let’s pick an occasion where the reverse is true: NeuralProphet comes out on top, and Facebook Prophet plunges into the depths of negative sessions.
This is an interesting example to review since the recorded sessions are somewhat erratic. Both versions of Prophet have picked up and repeated a weekly pattern, and while NeuralProphet certainly does a better job of predicting June 2023, it’s by no means perfect.
When we zoom out to see what these models were trained on, we can see where the struggle comes from:
Whichever site this is, it’s had clear and erratic highs and lows from mid 2021 to an abrupt change in June 2022, where traffic more than halves in a few weeks. This may be a familiar pattern for anyone looking at ecommerce data during COVID periods. Perhaps this sharp drop in 2022 was simply the “new normal” for this site.
Looking back at the Facebook Prophet model we can see the end of year wiggle observed at the end of 2021 modelled in 2022. It would follow then that the mid-year drop was also perceived as a seasonal event, causing Facebook Prophet to project another catastrophic loss in June of 2023.
Pros and cons of both versions of Prophet
The purpose of this experiment was to establish which tool would be better for use in a web analytics setting, and we’ve established that NeuralProphet is significantly better for this purpose. However, it’s also a much more involved piece of software to use and might be too complicated for users who are just after a generalised projection. There’s definitely room for both versions of Prophet in your analytical toolbox; here are some of their pros and cons:
Facebook Prophet pros:
- Ease of Use: Prophet is known for its simplicity and is accessible even for those who have limited experience in time series forecasting.
- Interpretable Results: Offers easily interpretable components like trend, weekly seasonality, and yearly seasonality.
- Handling Missing Data: Capable of handling missing data and outliers without requiring preprocessing.
- Scalability: Well-suited for a broad range of data sizes, from small datasets to large-scale business problems.
- Open Source: Being open-source, it has a strong community and a wealth of online resources.
- Availability: Unlike NeuralProphet, you can use this version in both R and Python, making it more accessible.
Facebook Prophet cons:
- Limited Flexibility: While it does well for standard use-cases, it’s not as flexible for capturing complex non-linear relationships.
- Computation Time: For very large datasets, Prophet can be slower and more resource-intensive.
- No GPU Support: Limited to CPU-based computation, which can be a bottleneck for large-scale problems.
NeuralProphet pros:
- Advanced Modelling: Built on PyTorch, it allows for more complex, non-linear modelling.
- GPU Support: Can be run on a GPU, which is beneficial for large datasets and speeds up computation.
- Regularisation Techniques: Offers various ways to prevent overfitting, making it more robust.
- Hyperparameter Tuning: Built-in support for hyperparameter optimisation, enhancing performance.
- Auto-Regressive Components: Can learn from past observations, improving forecast accuracy for certain types of data.
NeuralProphet cons:
- Complexity: Might be overkill for simple time series data or for users looking for quick, interpretable results.
- Learning Curve: While it offers more advanced features, it also requires a deeper understanding of time series forecasting and neural networks.
- Resource Requirements: Due to its complex nature, it can be more resource-intensive, especially if not optimised properly.
Pitting two forecasting tools against each other is a fun experiment, but we should take a minute to appreciate the fundamental importance of forecasting.
Achieving accurate forecasts is not merely a matter of mathematics or picking the right tool; it’s also a matter of discipline and ethical responsibility. While having precise data, logical assumptions, and a sound methodology is crucial, these elements can only take you so far. The human element, often overlooked, can dramatically influence the outcome—for better or worse.
The tendency to adjust or ignore forecasts based on personal sentiment or convenience can be risky, and sometimes perilous. Sometimes, people tweak the numbers to make a graph look more appealing in a business pitch. Other times, it’s a tactic to avoid confronting uncomfortable truths with stakeholders. Altering forecasts based on subjective feelings and presenting them as truth can lead to disastrous consequences.
Consider the infamous example from August 2019, when a doctored forecast map of Hurricane Dorian was presented by the US President to the media. A neat-looking NOAA projection showed the storm’s path across the Bahamas with an apparent trajectory towards the coasts of Georgia, South Carolina and North Carolina. Next to this projection was a crude, hand-drawn loop extending the hurricane’s projected path to also include Alabama.
Such (illegal) alterations can sow confusion and lead to poor decision-making, putting lives and resources at risk. Though it was never officially confirmed who altered the map or why, the incident serves as a cautionary tale.
Here are some principles of best practice in forecasting:
- Consistency: Employ a repeatable, evidence-based methodology.
- Validation: Continually test your model against real-world data to ensure accuracy.
- Transparency: Clearly state your assumptions and limitations, so those who rely on your forecasts understand their scope and reliability.
When it comes to forecasting, it’s better to face an inconvenient truth than to grapple with the consequences of a convenient lie.
Documentation and guidance on how to use Facebook Prophet can be found on the original GitHub page. You can use this version in both R and Python.
Documentation on how to use NeuralProphet lives at neuralprophet.com, and realistically it can only be used with Python (>=3.7,<3.11). If you want to use GPU acceleration, I’d recommend an environment with Python 3.9, torch 1.13.1, torchvision 0.14.1, torchaudio 0.13.1 and CUDA 11.7.
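An environment matching those pins can be created with something like the following (package names as published on PyPI; the cu117 extra index is PyTorch’s official wheel repository for CUDA 11.7 builds):

```shell
# Hypothetical setup for GPU-accelerated NeuralProphet on Python 3.9,
# assuming an NVIDIA card with CUDA 11.7 drivers installed:
pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 \
    --extra-index-url https://download.pytorch.org/whl/cu117
pip install neuralprophet
```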
*On calculating relative improvement between the versions of Prophet: (25.26 − 19.65) ÷ 25.26 ≈ 22.2%.