Black Friday is one of the busiest shopping days of the year, and as such, it's a great opportunity to evaluate the performance and reliability of your website. By analyzing how your website performed on Black Friday, you can identify areas for improvement and take steps to make your site more reliable for the next time traffic reaches these heights.
If your website traffic is not impacted by Black Friday, you can still use these tips to analyze your site's performance and reliability during any traffic spike.
This analysis should include data from the week before and after Black Friday. This will give you a more complete picture of how your site performed during the traffic spike period.
Run a wide scan of your performance: Check your observability platform, e.g. Datadog or Grafana, to review your site's performance on Black Friday using your existing dashboards. Look for any critical monitors that fired during this period. It can also be useful to check for recently filed user complaints that mention poor performance or increased errors.
This scan is intended to help you identify any obvious problems, such as slow loading times or errors, that may have been affecting your site's performance. The objective is to find hotspots that you can treat as entry points for deeper analysis. You will likely also find blind spots in your observability that you can address in the future.
Analyze your website's traffic patterns: Analyzing your website's request volume on Black Friday can give you valuable insight into how well your site handles high levels of traffic. Look for patterns and trends that help you identify potential bottlenecks. Important metrics to gauge include request count, concurrent user count, and the ramp-up period for those concurrent users. These metrics will give you a better understanding of your scaling needs.
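To make those three metrics concrete, here is a minimal sketch that derives them from a request log. It assumes each log entry is a (timestamp, user ID) pair, approximates concurrency as the number of distinct users seen per minute, and defines ramp-up as the time from the first minute at half of peak to the peak minute. The function names, the sample data, and the 50% threshold are all illustrative assumptions, not a standard definition.

```python
from collections import Counter
from datetime import datetime, timedelta


def to_minute(timestamp):
    """Truncate a timestamp to the start of its minute."""
    return timestamp.replace(second=0, microsecond=0)


def requests_per_minute(events):
    """Count requests in each one-minute bucket."""
    return dict(Counter(to_minute(ts) for ts, _user in events))


def active_users_per_minute(events):
    """Approximate concurrency as distinct users seen in each minute."""
    users = {}
    for ts, user in events:
        users.setdefault(to_minute(ts), set()).add(user)
    return {minute: len(seen) for minute, seen in users.items()}


def ramp_up_period(per_minute, fraction=0.5):
    """Time from the first minute at `fraction` of peak to the peak minute."""
    peak_minute = max(per_minute, key=per_minute.get)
    threshold = per_minute[peak_minute] * fraction
    ramp_start = min(m for m, count in per_minute.items() if count >= threshold)
    return peak_minute - ramp_start


# Hypothetical sample log; real data would come from access logs or an export
# from your observability platform.
start = datetime(2022, 11, 25, 9, 0)
events = [
    (start, "a"),
    (start + timedelta(minutes=1), "a"),
    (start + timedelta(minutes=1, seconds=20), "b"),
    (start + timedelta(minutes=2), "a"),
    (start + timedelta(minutes=2, seconds=5), "b"),
    (start + timedelta(minutes=2, seconds=30), "c"),
    (start + timedelta(minutes=2, seconds=45), "c"),
]

print(requests_per_minute(events))
print(ramp_up_period(active_users_per_minute(events)))
```

A short ramp-up with a high peak usually puts more pressure on autoscaling than the same peak reached gradually, so it is worth recording both numbers together.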
Optimize your website's loading times: Slow loading times can be a major issue for websites, especially during peak traffic periods like Black Friday. A good starting point is the set of endpoints your application spends the most total time processing. One way to rank them is to sum the response times of all requests to each endpoint.
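As a rough illustration of that ranking, the sketch below sums response time per endpoint. It assumes you can export requests as (endpoint, duration in milliseconds) pairs; the endpoint names and numbers are made up.

```python
from collections import defaultdict


def total_time_by_endpoint(requests):
    """Sum response time per endpoint, largest total first."""
    totals = defaultdict(float)
    for endpoint, duration_ms in requests:
        totals[endpoint] += duration_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample: a fast but frequent endpoint and a slow but rare one.
sample = [("/products", 120.0)] * 20 + [("/checkout", 1000.0)] * 2

for endpoint, total_ms in total_time_by_endpoint(sample):
    print(endpoint, total_ms)
```

In this sample the fast but frequently called endpoint ranks above the slow one, which is the point of summing rather than averaging: it surfaces where the application actually spends its time.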
I recommend comparing each slow endpoint's performance on Black Friday with its performance during normal traffic periods. Some endpoints may have scaling issues where performance degrades with volume, while others may be consistently slow but only cause site issues at higher volume. Depending on the situation, you can improve these endpoints by optimizing their code or increasing their resources.
Optimizations on the frontend can include things like optimizing images, minifying CSS and JavaScript, and reducing the number of HTTP requests your site makes. Backend optimizations include introducing caches, restructuring data query patterns, adding indices, lowering the cost of data serialization, and tuning service deployment configuration.
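One way to separate those two cases is to compare a high percentile of response time between a normal period and the peak. The sketch below uses a nearest-rank 95th percentile; the 500 ms "slow" cutoff and the 1.5x degradation ratio are arbitrary assumptions you would tune for your own application.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify_endpoint(baseline_ms, peak_ms, slow_ms=500.0, degraded_ratio=1.5):
    """Label an endpoint by comparing p95 latency at baseline and at peak."""
    baseline_p95 = percentile(baseline_ms, 95)
    peak_p95 = percentile(peak_ms, 95)
    if peak_p95 >= baseline_p95 * degraded_ratio:
        return "degrades with volume"
    if baseline_p95 >= slow_ms:
        return "consistently slow"
    return "ok"


# Hypothetical latency samples in milliseconds.
print(classify_endpoint([100] * 20, [100] * 10 + [400] * 10))
print(classify_endpoint([800] * 20, [850] * 20))
```

An endpoint in the first group is a candidate for scaling work, while one in the second group is more likely to benefit from optimizing the code path itself.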
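To illustrate the caching idea, here is a minimal in-process cache with time-based expiry. It is a sketch under simplifying assumptions: a single process, no size limit, and no thread safety. The `TTLCache` class and the `load_product` function are hypothetical names; a production setup would more likely use a shared cache such as Redis or Memcached.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds before reloading them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive call
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def load_product(product_id):
    """Stand-in for an expensive database query (hypothetical)."""
    return {"id": product_id, "name": f"Product {product_id}"}


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_load(42, load_product)   # calls load_product
second = cache.get_or_load(42, load_product)  # served from the cache
```

The expiry time is the main design choice here: a longer TTL absorbs more traffic but serves staler data, which matters for values like stock levels during a sale.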
Increase your website's capacity: If your website is struggling to handle the high levels of traffic on Black Friday, you may need to increase its capacity. This can include things like upgrading your application deployment strategy to add more resources, introducing cache instances, adding database replicas, and resizing servers. The strategy here varies heavily depending on your application's architecture and the resources it uses.
Be careful with defaulting to this approach, since it can be expensive and may not be necessary. It is important to understand the root cause of the performance issues before making any changes. Adding resources can be a good way to mitigate performance issues, but it may not be a long-term solution.
Test your website's reliability: Testing your website's reliability is a great way to find issues before they surface during high-traffic periods. A good strategy is to run performance tests such as load tests and stress tests. These tests can expose problems in your application's architecture, such as bottlenecks, and in its code, such as memory leaks. JMeter, Gatling, Locust, and k6 are some of the most popular tools for these types of tests.
Based on your performance analysis, you should always ask yourself whether there are any tests you could implement that would have caught issues before they became a problem. If so, you should consider implementing these tests in the future. There may also be constraints preventing you from running these tests effectively, such as testing environments with limited resources. This is a great time to raise these issues with your team and discuss how to address them.
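To show the basic shape of a load test without depending on any of those tools, here is a small sketch that uses only the Python standard library. It runs a fixed number of simulated users in threads, each calling a function you supply, and records latencies and errors. The `run_load_test` and `fake_request` names are hypothetical, and a dedicated tool will handle ramp-up, pacing, and reporting far better than this.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, concurrent_users, requests_per_user):
    """Call send_request from several threads; return (latencies, error_count)."""

    def user_session(_user_index):
        latencies, errors = [], 0
        for _ in range(requests_per_user):
            started = time.perf_counter()
            try:
                send_request()
            except Exception:
                errors += 1  # count the failure but keep the session running
            latencies.append(time.perf_counter() - started)
        return latencies, errors

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(user_session, range(concurrent_users)))

    all_latencies = [value for latencies, _ in results for value in latencies]
    total_errors = sum(errors for _, errors in results)
    return all_latencies, total_errors


def fake_request():
    """Stand-in for an HTTP call to the system under test (hypothetical)."""
    time.sleep(0.001)


latencies, errors = run_load_test(fake_request, concurrent_users=3, requests_per_user=5)
print(len(latencies), "requests,", errors, "errors, slowest:", max(latencies))
```

Raising `concurrent_users` step by step until latency or the error count climbs sharply turns the same harness into a crude stress test.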
Remove observability blind spots: During these exercises, it's common to find new metrics that indicate areas of concern or improvement. It is important to update your observability tools - dashboards and monitors - to include these new metrics. This will help you to better monitor your site's performance in the future.
Closing Thoughts
By taking these steps, you can use Black Friday website performance to identify areas for improvement and make your site more reliable. This will help ensure that your website can handle the demands of peak traffic periods, and provide a better experience for your customers. Remember, Boxing Day is just around the corner!
I'm building Latency Lingo to help teams better leverage performance testing, so I'd encourage you to sign up and try it out. Feedback is always appreciated!