How to Stop Fixing Distributed Systems and Start Living with Restate
Imagine this: you're writing a microservice for payment processing. Mid-process, the network flickers, the database times out, or the container simply restarts. What happens to the transaction? In a classic scenario, you need to manually implement retry logic, track idempotency, configure message queues, and most likely introduce some complex saga. It's painful, time-consuming, and turns your code into a maze of error handlers.
I recently discovered Restate, a project that tackles this problem at the infrastructure level. The developers call the approach "Durable Execution": fault-tolerant code execution. The idea is that your code becomes effectively immortal: if a process crashes, Restate resumes it where it left off by restoring the recorded results of completed steps and the saved state.
What even is this
Restate is a Rust binary that works as a proxy server and coordinator for your services. It takes on all the dirty work of state management and calls. You write regular code in TypeScript, Python, Go, or another supported language, use the SDK, and your functions turn into reliable workflows.
The main difference from heavyweight solutions like Temporal is that Restate is much easier to get started with. You don't need a massive setup of databases and complex workers. You can just run a single binary or Docker container and start working.
How Restate helps in practice
The project includes several cool concepts that genuinely simplify your life.
Guaranteed execution
If you call a function through Restate, it runs to completion. Period. If the server running your code crashes, Restate retries the invocation once the service is back and continues from the last recorded step. Steps that already completed are not executed again, so you spend far less effort worrying about whether the email was sent twice or the money was charged twice.
Smart timers and promises
Usually, implementing a delay in a distributed system is a quest. You need to put a message in a delay queue or set up a cron job. In Restate, you just write ctx.sleep(duration). The thread doesn't block uselessly: the service can shut down entirely, and three days later Restate will "wake" it up and continue execution.
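To make the idea concrete, here is a toy model of a durable timer in plain TypeScript. It is only a sketch of the concept, not Restate's implementation: `DurableTimerStore`, `schedule`, and `due` are hypothetical names, and time is passed in as a number instead of being read from a clock. The point is that a sleeping invocation is just a stored wake-up time, so nothing has to stay in memory while it waits.

```typescript
// Toy model of a durable timer. Hypothetical names, not Restate's implementation.
interface PendingTimer {
  invocationId: string;
  wakeAt: number; // absolute time in milliseconds
}

class DurableTimerStore {
  private pending: PendingTimer[] = [];

  // Persist the wake-up time instead of blocking a thread.
  schedule(invocationId: string, now: number, delayMs: number): void {
    this.pending.push({ invocationId, wakeAt: now + delayMs });
  }

  // Return (and remove) every invocation whose timer is due at `now`.
  due(now: number): string[] {
    const ready = this.pending.filter((t) => t.wakeAt <= now);
    this.pending = this.pending.filter((t) => t.wakeAt > now);
    return ready.map((t) => t.invocationId);
  }
}

const HOUR_MS = 60 * 60 * 1000;
const timerStore = new DurableTimerStore();

// "Remind about an abandoned cart in 2 hours."
timerStore.schedule("cart-reminder-42", 0, 2 * HOUR_MS);

// The service could be shut down entirely here: only the stored record matters.
const dueAfterOneHour = timerStore.due(1 * HOUR_MS);
const dueAfterThreeHours = timerStore.due(3 * HOUR_MS);

console.log(dueAfterOneHour.length, dueAfterThreeHours);
```

In this model the first check finds nothing to wake, and the second returns the reminder's invocation ID, at which point a runtime would start the handler again.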
State right in the code
Restate lets you store K/V state bound to a specific entity (for example, a user ID); in the SDK this is done through Virtual Objects. It looks like working with a regular object, but under the hood Restate keeps the data consistent and delivers it together with the request. This is especially convenient for serverless architectures, where functions typically keep no memory between calls.
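A rough way to picture state that travels with the request is the toy model below. It is plain TypeScript and makes no claim about how Restate stores data: `KeyedStateStore`, `invoke`, and `addVisit` are hypothetical names. Each handler call receives a copy of the state for its key, and the changes are saved only after the handler returns.

```typescript
// Toy model of per-key state. Hypothetical names, not Restate's implementation.
type EntityState = Record<string, unknown>;

class KeyedStateStore {
  private byKey: Record<string, EntityState> = {};

  // Hand the entity's state to the handler, then persist what the handler changed.
  invoke<T>(key: string, handler: (state: EntityState) => T): T {
    const state: EntityState = { ...(this.byKey[key] ?? {}) };
    const result = handler(state);
    this.byKey[key] = state; // reached only if the handler did not throw
    return result;
  }
}

// A handler that counts visits for whichever entity it is invoked on.
const addVisit = (state: EntityState): number => {
  const visits = ((state["visits"] as number | undefined) ?? 0) + 1;
  state["visits"] = visits;
  return visits;
};

const objectStore = new KeyedStateStore();
objectStore.invoke("user-1", addVisit);
const visitsUser1 = objectStore.invoke("user-1", addVisit);
const visitsUser2 = objectStore.invoke("user-2", addVisit);

console.log(visitsUser1, visitsUser2);
```

The handler itself holds no memory between calls, yet each user ID sees its own counter, which is the property that makes this style attractive for stateless functions.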
What it looks like in code
Let's say we need to implement a user registration process with email confirmation. In TypeScript using the Restate SDK, it would look something like this:
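import * as restate from "@restatedev/restate-sdk";

// Hypothetical helper, not part of the SDK: emails a link that carries the awakeable ID.
declare function sendWelcomeEmail(email: string, awakeableId: string): Promise<void>;

// K/V state requires a Virtual Object (keyed here by user ID), not a plain service.
const userObject = restate.object({
  name: "users",
  handlers: {
    register: async (ctx: restate.ObjectContext, user: { id: string; email: string }) => {
      // Save the state
      ctx.set("status", "pending");
      // Create an awakeable: a durable promise that an external caller resolves by ID
      const confirmation = ctx.awakeable<boolean>();
      // Send the email; the result is journaled, so a completed send is not repeated on replay
      await ctx.run("send welcome email", () => sendWelcomeEmail(user.email, confirmation.id));
      // Wait for confirmation (the 24-hour timeout is not implemented in this sketch)
      const confirmed = await confirmation.promise;
      if (confirmed) {
        ctx.set("status", "active");
      }
    },
  },
});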
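Here ctx.run records the result of the side effect (sending the email) in Restate's journal once it completes. If the function crashes after that point, Restate skips the step on retry and reuses the recorded result. If the crash happens before the result is recorded, the step runs again, so the side effect itself should still be safe to repeat.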
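The skip-on-retry behavior can be sketched with a toy journal in plain TypeScript. This is an illustration under simplifying assumptions, not Restate's implementation: `ToyDurableContext`, `registerUser`, and `invokeWithRetry` are hypothetical names, steps are synchronous, and the "log" is just an array that outlives each attempt. Results are matched to steps by position, so a retried handler receives recorded results instead of repeating side effects.

```typescript
// Toy model of journal-based replay. Hypothetical names, not Restate's implementation.
class ToyDurableContext {
  private position = 0;
  private journal: unknown[];

  constructor(journal: unknown[]) {
    this.journal = journal;
  }

  // Execute `action` only if this step has no recorded result yet.
  run<T>(action: () => T): T {
    const index = this.position;
    this.position += 1;
    if (index < this.journal.length) {
      return this.journal[index] as T; // replay: reuse the recorded result
    }
    const result = action();
    this.journal.push(result); // record the result before moving on
    return result;
  }
}

let emailsSent = 0;
let crashOnce = true;

function registerUser(ctx: ToyDurableContext, email: string): string {
  ctx.run(() => {
    emailsSent += 1;
    return `sent to ${email}`;
  });
  if (crashOnce) {
    crashOnce = false;
    throw new Error("simulated crash after the email step");
  }
  return ctx.run(() => "active");
}

// The journal lives outside the handler, like a log kept by the runtime.
const replayJournal: unknown[] = [];

function invokeWithRetry(email: string): string {
  for (let attempt = 0; attempt < 3; attempt += 1) {
    try {
      return registerUser(new ToyDurableContext(replayJournal), email);
    } catch {
      // A real runtime would back off before retrying; the toy retries immediately.
    }
  }
  throw new Error("gave up after 3 attempts");
}

const finalStatus = invokeWithRetry("user@example.com");
console.log(finalStatus, emailsSent);
```

The first attempt sends the email and then crashes. The second attempt reads the email step from the journal, so the counter stays at one while the handler still runs to the end.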
The technical side
The project is written in Rust, which gives excellent performance. Architecturally, Restate acts as an invoker. It receives incoming requests over HTTP/gRPC, writes them to its log, and calls your handlers.
Interestingly, Restate can "suspend" execution. If your code is waiting for a response from an external API or a timer, Restate frees up resources. When the event occurs, it restores the execution context. This allows running thousands of long-running processes on modest hardware.
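Suspension can be pictured with another toy model in plain TypeScript. Again, this is a sketch with hypothetical names (`SuspendSignal`, `awaitEvent`, `runAttempt`), not Restate's implementation. When the handler asks for a result that is not in the log yet, the attempt ends and nothing stays in memory. When the event arrives, it is appended to the log and the handler is simply run again.

```typescript
// Toy model of suspension. Hypothetical names, not Restate's implementation.
class SuspendSignal {
  readonly reason = "waiting for an external event";
}

// Read a completed result from the log, or suspend if it has not arrived yet.
function awaitEvent<T>(eventLog: unknown[], index: number): T {
  if (index >= eventLog.length) {
    throw new SuspendSignal();
  }
  return eventLog[index] as T;
}

function confirmEmailHandler(eventLog: unknown[]): string {
  const confirmed = awaitEvent<boolean>(eventLog, 0);
  return confirmed ? "active" : "pending";
}

// Run one attempt; `undefined` means the handler suspended and holds no resources.
function runAttempt(eventLog: unknown[]): string | undefined {
  try {
    return confirmEmailHandler(eventLog);
  } catch (err) {
    if (err instanceof SuspendSignal) {
      return undefined;
    }
    throw err;
  }
}

const eventLog: unknown[] = [];
const beforeEvent = runAttempt(eventLog); // suspends: no confirmation yet
eventLog.push(true);                      // the external event arrives and is logged
const afterEvent = runAttempt(eventLog);  // the handler is re-run and completes

console.log(beforeEvent, afterEvent);
```

Because a waiting invocation is reduced to a log on disk, the number of pending processes is bounded by storage and not by threads or memory, which fits the claim about modest hardware.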
Who should try it
Restate will perfectly fill the gaps in projects where:
- There are many call chains between microservices.
- You need to build complex chains of actions (sagas, workflows).
- You're using AI agents that need to wait a long time for LLM responses and preserve conversation context.
- There are delayed execution tasks (for example, a reminder about an abandoned cart after 2 hours).
The project is actively developing, with nearly 4,000 stars on GitHub. SDKs are available for TypeScript/JavaScript, Java/Kotlin, Python, Go, and Rust.
Of course, you shouldn't rush to pull a new infrastructure component into a large bank's production tomorrow — first, you need to try it locally. But for startups or new features in existing projects, it could save weeks of development.
You can try it in literally a couple of minutes:
brew install restatedev/tap/restate-server
restate-server
And that's it, you have a local environment for running fault-tolerant applications. Perhaps this is the lowest barrier to entry into the world of Durable Execution at the moment.