Retry Pattern in .NET Core

🔁 Retry pattern

The Retry pattern is a design pattern used to handle transient faults that occur when an application interacts with a remote service or resource. A transient fault is a temporary, self-correcting error, such as a brief network glitch, a database connection being temporarily dropped, or an external service being momentarily busy.

Instead of immediately failing and returning an error, a retry pattern automatically re-attempts the failed operation a specified number of times with a defined delay between attempts. This enhances the application's resilience and improves the user experience by transparently recovering from minor issues.
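
Conceptually, the pattern is just a loop around the failing call. The sketch below is a minimal, hand-rolled illustration (the retry count, fixed delay, and method name are assumptions for the example; in practice a library such as Polly is preferred):

//csharp
// Minimal hand-rolled retry sketch (illustrative only).
async Task<string> GetWithRetryAsync(HttpClient client, string url)
{
    const int maxAttempts = 3;            // assumed retry limit
    var delay = TimeSpan.FromSeconds(2);  // assumed fixed delay between attempts

    for (var attempt = 1; ; attempt++)
    {
        try
        {
            var response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (HttpRequestException) when (attempt < maxAttempts)
        {
            // Transient failure: wait, then try again.
            await Task.Delay(delay);
        }
    }
}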

In .NET Core, the Polly library is the standard for implementing various resilience patterns, including Retry. It provides a fluent, thread-safe way to define these policies.

⚙️ How Retry Works

  • 🔄 Automatically re-attempts failed operations
  • ⏱️ Can include delays between retries (fixed or exponential backoff)
  • 🎯 Targets transient errors, not permanent failures

🧩 Retry pattern example with Polly

The Polly library is the most popular way to implement retry patterns in .NET Core, and the most robust approach in ASP.NET Core is to integrate it with IHttpClientFactory, which has been available since .NET Core 2.1.

1️⃣ Install NuGet packages

Add the necessary Polly packages to your project. The Microsoft.Extensions.Http.Polly package pulls in the Polly.Extensions.Http helpers (such as HandleTransientHttpError) used in the policy below.

//shell
dotnet add package Polly
dotnet add package Microsoft.Extensions.Http.Polly

2️⃣ Configure the retry policy in Program.cs

You can define a retry policy and attach it to your HttpClient using IHttpClientFactory in your application's startup. This keeps the retry logic centralized and separate from your business logic.

In this example, the policy retries up to three times with an exponential backoff.

//csharp
using Polly;
using Polly.Extensions.Http;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHttpClient("resilientClient")
    .AddPolicyHandler(GetRetryPolicy());

var app = builder.Build();

// Other middleware and endpoints...

app.Run();

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    // Define a retry policy that handles transient HTTP errors
    // (network issues, 5xx status codes, and 408 Request Timeout).
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .OrResult(msg => msg.StatusCode == System.Net.HttpStatusCode.NotFound) // Example: also treat 404 as retryable (usually not recommended; see the "When not to use" section)
        .WaitAndRetryAsync(3, // Retry 3 times
            retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)), // Exponential backoff
            onRetry: (outcome, timespan, retryCount, context) =>
            {
                Console.WriteLine($"Request failed with {outcome.Exception?.Message ?? outcome.Result?.StatusCode.ToString()}. Waiting {timespan.TotalSeconds}s before retry attempt {retryCount}.");
            });
}

3️⃣ Use the resilient HttpClient

Inject IHttpClientFactory into your service or controller. When you create a client with the factory, the configured retry policy is automatically applied.

//csharp
public class MyApiService
{
    private readonly HttpClient _httpClient;

    public MyApiService(IHttpClientFactory clientFactory)
    {
        _httpClient = clientFactory.CreateClient("resilientClient");
    }

    public async Task<string> GetDataAsync()
    {
        try
        {
            var response = await _httpClient.GetAsync("https://some-external-api/data");
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (Exception ex)
        {
            // The policy failed after all retries.
            Console.WriteLine($"Failed to retrieve data after multiple retries: {ex.Message}");
            throw;
        }
    }
}
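
The service also needs to be registered with the dependency injection container so it can be resolved where needed; the lifetime below is an assumption for illustration.

//csharp
// In Program.cs, alongside the AddHttpClient("resilientClient") registration shown earlier.
builder.Services.AddScoped<MyApiService>();   // assumed lifetime; adjust to fit your application

Alternatively, a typed client can be registered with builder.Services.AddHttpClient<MyApiService>() and the same retry policy attached to it.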

🔄 Different retry patterns

  • ⏳ Fixed interval: Retries are attempted with a constant delay between each attempt. This is simple but can overwhelm a service if many clients retry simultaneously.
  • 📈 Exponential backoff: The delay between each retry increases exponentially. This gives the service more time to recover from a load spike and is generally the recommended approach.
  • 🎲 Exponential backoff with jitter: A randomized component (jitter) is added to the exponential backoff delay. This prevents a "retry storm," where a large number of clients retry at the exact same moment, potentially causing another failure (a short sketch follows this list).
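
As a rough sketch, jitter can be added to the exponential-backoff policy from Program.cs simply by mixing a random component into each computed delay (the one-second jitter range below is an illustrative assumption; the Polly.Contrib.WaitAndRetry package also provides ready-made jittered backoff strategies):

//csharp
// Exponential backoff (2^n seconds) plus up to one second of random jitter per attempt.
static IAsyncPolicy<HttpResponseMessage> GetJitteredRetryPolicy() =>
    HttpPolicyExtensions
        .HandleTransientHttpError()
        .WaitAndRetryAsync(3,
            retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))
                            + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 1000)));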

✅ Advantages

  • 💪 Increased resilience: The application can gracefully handle temporary service interruptions without user intervention.
  • 😀 Improved user experience: Users are less likely to hit a transient error message and more likely to see a successful result instead of an immediate failure.
  • 🛡 Fault tolerance: It makes a microservices architecture more tolerant of network latency and service fluctuations, reducing the impact of transient errors.
  • 🧠 Reduced manual effort: Automated retries reduce the need for manual monitoring and restarts.
  • 🔧 Easy to configure and combine with other resilience patterns.

⚠️ Disadvantages

  • 🐒 Increased latency: Each retry adds a delay, which can noticeably increase the total time to complete a request if retries are excessive.
  • 🌩 Risk of retry storms: Without proper backoff, jitter, and throttling, a surge of retries can overwhelm an already struggling service and make the outage worse.
  • 🔥 Resource consumption: Retries consume additional CPU, network, and connection resources, which can degrade performance if not managed properly.
  • 🚫 Not for all failures: Retries are futile for permanent errors such as authentication failures or invalid requests.
  • 🧩 Requires careful tuning of retry counts, delays, and handled error types to avoid retry storms.

💡 Best practices and tips

  • 🎯 Use exponential backoff with jitter: This is the most effective strategy for easing load on a failing service and avoiding synchronized retries.
  • 🔒 Set finite retry limits: Retrying indefinitely is an anti-pattern that can lead to resource exhaustion.
  • 📝 Log retry events: Record when retries occur and which exceptions or status codes caused them; this is valuable for diagnostics and monitoring.
  • 🚧 Combine with a Circuit Breaker: For persistent failures, a circuit breaker prevents repeated attempts against a consistently failing service, giving it time to recover and avoiding a self-inflicted denial of service (a combined sketch follows this list).
  • ♻ Ensure idempotency: If an operation is not idempotent, a retry could cause unintended side effects, such as duplicate database records.
  • ⏱ Respect consumer timeouts: For user-facing operations, the retry logic should respect the client's timeout; there is no point retrying after the caller has given up.
  • 🔗 Integrate with IHttpClientFactory for centralized configuration.
  • 🧱 Apply retries selectively, only where transient faults are expected.
  • 🧰 Combine with fallback strategies for graceful degradation.
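
For illustration, the retry policy defined earlier can be chained with a circuit breaker on the same named client. This is a sketch; the thresholds (5 handled failures, a 30-second break) are assumptions, not recommendations:

//csharp
// Handlers run in registration order, so the retry policy wraps the circuit breaker:
// once the breaker opens, calls fail fast instead of hammering the dependency.
builder.Services.AddHttpClient("resilientClient")
    .AddPolicyHandler(GetRetryPolicy())
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(
            handledEventsAllowedBeforeBreaking: 5,     // assumed threshold
            durationOfBreak: TimeSpan.FromSeconds(30)  // assumed break duration
        ));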

📌 When to use

  • ☁ When interacting with cloud services (e.g., Azure SQL, Cosmos DB, or AWS services) that exhibit inherent transient faults.
  • 🔗 For HTTP calls to external APIs or other microservices in a distributed system with network dependencies.
  • ⚙ In background processes or message queue consumers, where a longer, more patient retry strategy can be effective.
  • 📡 For operations that occasionally fail due to timeouts or brief connectivity problems.

🚫 When not to use

  • ❌ For persistent, non-transient failures (e.g., a 401 Unauthorized or 404 Not Found response); retrying these won't help.
  • 💥 For long-lasting, unrecoverable errors; a circuit breaker is a better pattern in this case.
  • 👎 For synchronous, user-facing operations where any significant delay is unacceptable.
  • ⚠ For non-idempotent operations that modify state, unless you can guarantee that duplicate attempts are safe; otherwise retrying could cause data duplication or other side effects.
  • 💾 For local operations that fail because of logic errors or invalid input, since those failures are permanent.

🛡️ Precautions

  • 📈 Monitor retry metrics to detect patterns and tune settings
  • 🔄 Combine with Circuit Breaker to avoid retrying during outages
  • 🧪 Test under load to validate retry behavior

⚖️ Circuit Breaker vs Retry Pattern

While both patterns are used to improve application resilience, they serve different purposes and operate differently.

🎯 Primary Goal
  • 🔌 Circuit Breaker: Prevents repeated calls to a failing service, avoiding cascading failures and resource exhaustion.
  • 🔁 Retry Pattern: Handles transient faults by retrying failed operations after a delay.

⚙️ Mechanism
  • 🔌 Circuit Breaker: Tracks failure rates and opens the circuit to block calls when a threshold is exceeded; allows recovery via a half-open state.
  • 🔁 Retry Pattern: Retries operations a set number of times, with optional delay strategies such as exponential backoff.

📨 Response to Failure
  • 🔌 Circuit Breaker: Blocks calls immediately while in the open state, returning a fast failure.
  • 🔁 Retry Pattern: Re-attempts the operation after a delay until it succeeds or the retry limit is reached.

🧩 Use Case
  • 🔌 Circuit Breaker: Best for persistent failures or unstable services that need time to recover.
  • 🔁 Retry Pattern: Best for temporary issues like network glitches or timeouts.

🔗 Integration
  • 🔌 Circuit Breaker: Often used with IHttpClientFactory and Polly for HTTP calls.
  • 🔁 Retry Pattern: Also implemented with Polly and IHttpClientFactory.

⚠️ Risk
  • 🔌 Circuit Breaker: May block healthy requests if thresholds are too sensitive.
  • 🔁 Retry Pattern: May cause retry storms or overload services if not throttled.

💡 Best Practice
  • 🔌 Circuit Breaker: Combine with fallback and logging; use separate breakers for different dependencies.
  • 🔁 Retry Pattern: Use exponential backoff with jitter; apply selectively to transient faults.
