Your nodes can now talk to each other, sync files, and send commands — but honestly, that’s not quite useful enough yet.

Last week I was thinking: wouldn’t it be great if I could just throw tasks at any idle worker in the cluster, and have the results automatically come back? Like ordering food on a delivery app — the system automatically assigns the nearest rider, and when it’s delivered, you get an automatic confirmation.

So this time, let’s build something truly practical.

What Are We Building?

Simply put, I want these capabilities:

Build a distributed task queue - Like a food delivery dispatch system, automatically assigns work to idle workers

Route tasks to other nodes - Whether the node is in Shanghai or Beijing, one command sends it there

Results find their way back - After task completion, the result knows who sent it and returns automatically

Lay the foundation for MapReduce - Basically the “break big tasks into small ones, do them separately, then aggregate” pattern

Sounds cool, right? Actually, it’s not that complex to implement.

Step 1: Define Jobs and Results

First, we need to agree on what “tasks” and “results” look like. Like food delivery orders, you need order numbers, contents, delivery addresses:

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
pub struct Job {
    pub id: String,      // Task ID, like an order number
    pub data: String,    // Task content, e.g., "process user data"
}

#[derive(Debug, Serialize, Deserialize)]
pub struct JobResult {
    pub id: String,           // Task ID, corresponds to the Job above
    pub result: String,       // Processing result
    pub worker_node: String,  // Which node did the work
}

These two structs are our “delivery order” and “delivery confirmation”. Simple and clear, with all the necessary info.

Step 2: Create a Worker Actor

Now we need an actual worker to do the job. In my design, each node can run several Worker Actors — they’re like delivery riders: accept orders, do work, report back:

// A Worker holds a handle to the message router so it can send results back.
// (The concrete Router type is the routing layer we built earlier in this series.)
struct Worker {
    router: Router,
}

#[async_trait::async_trait]
impl Actor for Worker {
    type Message = Job;

    async fn handle(&mut self, msg: Job) {
        println!("Received job: {:?}", msg);

        // Do the work (this is just an example; real work might be heavy computation)
        let result = JobResult {
            id: msg.id.clone(),
            result: format!("Processed: {}", msg.data),
            worker_node: "node-b".into(), // this node's name, hardcoded for the example
        };

        // Send the result back (hardcoded address here; Step 5 shows how to use the return address instead)
        let json = serde_json::to_string(&result).unwrap();
        self.router.send("coordinator@node-a", &json).await;
    }
}

See? The worker receives a task, processes it, and sends the result straight back to whoever assigned it. Just like a rider marking “delivered” in the app after dropping off food.

The key is this self.router.send() call — it automatically routes the message to the correct node and Actor; we don’t have to worry about network transmission details.
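That said, send() isn’t magic. Under the hood, a router like this roughly wraps your payload in an envelope recording who it’s for and who sent it, then writes it to the open connection for that node. Here is a loose sketch of the idea (the Router fields and the one-JSON-message-per-line framing are assumptions for illustration, not the series’ actual code; the NetworkMessage envelope itself shows up in Step 5):

use std::collections::HashMap;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

pub struct Router {
    pub local_addr: String,                      // e.g. "coordinator@node-a", used as the return address
    pub connections: HashMap<String, TcpStream>, // node name -> open connection
}

impl Router {
    pub async fn send(&mut self, to: &str, payload: &str) {
        // "worker@node-b" -> the actor "worker" on the node "node-b"
        let node = to.split('@').nth(1).expect("address must look like actor@node");

        // Wrap the payload in the envelope from Step 5 so the receiver knows who sent it
        let msg = NetworkMessage {
            to: to.to_string(),
            from: self.local_addr.clone(),
            payload: payload.to_string(),
        };
        let json = serde_json::to_string(&msg).unwrap();

        // One JSON message per line keeps the framing dead simple
        if let Some(conn) = self.connections.get_mut(node) {
            let _ = conn.write_all(format!("{}\n", json).as_bytes()).await;
        }
    }
}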

Step 3: Build a Coordinator

Now that we have workers, we also need someone to assign orders and collect results. In distributed systems, this role is called a Coordinator:

struct Coordinator;

#[async_trait::async_trait]
impl Actor for Coordinator {
    type Message = JobResult;

    async fn handle(&mut self, msg: JobResult) {
        println!("Result received from {}: {}", msg.worker_node, msg.result);
        // Here you can aggregate results, update database, etc.
    }
}

The coordinator’s job is simple: receive results, record them, maybe do some aggregation and statistics. Like the backend of a delivery platform, recording the completion status of each order.
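If you want it to actually aggregate, give the coordinator some state. Here is a small variant of the actor above (the expected count and the HashMap are illustrative choices, not the series’ code):

use std::collections::HashMap;

struct StatefulCoordinator {
    results: HashMap<String, JobResult>, // Job.id -> result
    expected: usize,                     // how many jobs were dispatched
}

#[async_trait::async_trait]
impl Actor for StatefulCoordinator {
    type Message = JobResult;

    async fn handle(&mut self, msg: JobResult) {
        println!("Result received from {}: {}", msg.worker_node, msg.result);
        self.results.insert(msg.id.clone(), msg);

        // Once every dispatched job has reported back, do the final aggregation
        if self.results.len() == self.expected {
            println!("All {} jobs done", self.expected);
        }
    }
}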

Step 4: Cross-Node Task Assignment

Now for the interesting part — how do we send tasks to other nodes?

Assume node-a runs the Coordinator and node-b runs the Worker. On node-a, we send a task like this:

let job = Job {
    id: "job-123".to_string(),
    data: "Organize user data".to_string(),
};

let payload = serde_json::to_string(&job).unwrap();
router.send("worker@node-b", &payload).await;

It’s that simple! You just need to know the other party’s “address” (worker@node-b), and the message automatically gets delivered.

But here’s a key question: how does the result know where to go back?

Step 5: The Secret of Return Addresses

This brings us to the design of our message format:

#[derive(Debug, Serialize, Deserialize)]
pub struct NetworkMessage {
    pub to: String,      // Recipient: "worker@node-b"
    pub from: String,    // Sender: "coordinator@node-a"
    pub payload: String, // Actual content (Job or JobResult)
}

See the from field? That’s the return address.

When the Worker receives a task, it can tell from the from field who sent it, and send results directly back there after processing. Like the sender’s address on a parcel — use it directly for returns.

This design makes bidirectional communication super simple: Workers don’t need to hardcode the Coordinator’s address.
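Concretely, a Worker that is handed the whole envelope can reply to msg.from instead of the hardcoded "coordinator@node-a" from Step 2. A rough variant of that handler (the node_name field is a hypothetical addition holding this node’s own name):

async fn handle(&mut self, msg: NetworkMessage) {
    // Unpack the actual Job from the envelope
    let job: Job = serde_json::from_str(&msg.payload).unwrap();

    let result = JobResult {
        id: job.id.clone(),
        result: format!("Processed: {}", job.data),
        worker_node: self.node_name.clone(), // hypothetical field with this node's name
    };

    // Reply to the return address on the envelope, no hardcoding needed
    let json = serde_json::to_string(&result).unwrap();
    self.router.send(&msg.from, &json).await;
}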

Step 6: Smart Task Assignment

Now let’s upgrade: what if I don’t want to specify which node, but rather “just find any idle person to do the work”?

use rand::seq::SliceRandom;

let peers = cluster.get_all_peers();
// Pick a random peer (the simplest possible load balancing)
let target_node = peers.choose(&mut rand::thread_rng()).expect("no peers in the cluster");
let addr = format!("worker@{}", target_node);
router.send(&addr, &payload).await;

That’s the most basic load balancing. You can extend this further:

  • Choose the node with lowest CPU usage
  • Choose the node with lowest network latency
  • Round-robin assignment to each node (a sketch follows below)

Just like food delivery dispatch algorithms, intelligently assign based on rider distance, order volume, ratings, etc.
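For instance, the round-robin option above can be as small as an atomic counter. A minimal sketch, assuming get_all_peers() returns a Vec<String> of node names:

use std::sync::atomic::{AtomicUsize, Ordering};

struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    // Returns the next peer in rotation; panics if the peer list is empty
    fn pick<'a>(&self, peers: &'a [String]) -> &'a str {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        &peers[i % peers.len()]
    }
}

Swap the random pick in the snippet above for rr.pick(&peers) and the rest of the dispatch code stays the same.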

What Can We Do?

At this point, we have a pretty powerful distributed task system. Let’s summarize the capabilities:

Feature              Description
Job / JobResult      Standard format for tasks and results
Coordinator          Dispatcher: sends tasks, collects results
Worker               Does the work, returns results
from field           Lets results automatically return to the sender
send(to, payload)    Seamless cross-node message passing

This mechanism can support many real-world scenarios:

  • Video transcoding - Split videos and send chunks to different nodes
  • Data analytics - MapReduce-style big data computation
  • Web crawling - Distribute URLs to multiple crawler nodes
  • Image processing - Batch compression, watermarking, etc.
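All of these share the same fan-out step: cut the big job into chunks and spread them across the cluster. A rough sketch, where big_task and split_into_chunks() are hypothetical stand-ins for your own splitting logic (video segments, URL batches, image sets):

// chunks come from your domain-specific splitter; peers and router are the same as before
for (i, chunk) in split_into_chunks(&big_task).into_iter().enumerate() {
    let job = Job {
        id: format!("job-{}", i),
        data: chunk,
    };
    let payload = serde_json::to_string(&job).unwrap();

    // Spread the chunks across peers, round-robin style
    let addr = format!("worker@{}", peers[i % peers.len()]);
    router.send(&addr, &payload).await;
}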

What Black Magic Can We Add?

Basic functionality is there, but production environments need to consider many edge cases:

Auto retry - Failed tasks automatically resend, max 3 attempts

Timeout mechanism - Track progress with Job.id, consider a task failed after 30 seconds with no response (sketched below)

Performance monitoring - Record average task processing time per node

Task categorization - Route by type, e.g., “image processing” always goes to GPU nodes

Map-Shuffle-Reduce - Break big tasks into small ones, aggregate results

These optimizations transform the system from “works” to “production-ready”.
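As a taste of the timeout idea from the list above, the coordinator could keep a map of pending job IDs and when each was dispatched, then sweep it periodically. A minimal sketch, not wired into the Coordinator actor above:

use std::collections::HashMap;
use std::time::{Duration, Instant};

struct PendingJobs {
    dispatched: HashMap<String, Instant>, // Job.id -> when it was sent out
}

impl PendingJobs {
    // Record a job when the coordinator dispatches it
    fn track(&mut self, job_id: String) {
        self.dispatched.insert(job_id, Instant::now());
    }

    // Clear a job when its JobResult comes back
    fn complete(&mut self, job_id: &str) {
        self.dispatched.remove(job_id);
    }

    // Run periodically: anything waiting longer than 30 seconds is a candidate for retry
    fn timed_out(&self) -> Vec<String> {
        self.dispatched
            .iter()
            .filter(|(_, sent)| sent.elapsed() > Duration::from_secs(30))
            .map(|(id, _)| id.clone())
            .collect()
    }
}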

Next Up: Something Even Better

Now that you have a task queue, the next article will talk about:

Fault Tolerance and Self-Healing - What if nodes crash? Tasks get lost?

Specifically:

  • Build a retry queue
  • Handle nodes suddenly going offline
  • Implement self-healing loops
  • Design supervisor patterns to rescue failed tasks

These are the real hard problems in distributed systems. But don’t worry, I’ll explain it all in the same simple way.

If You Want to Learn Rust In-Depth

This article is part of the Distributed Actor System series. If you’re having trouble keeping up, check out the basics:

Rust Basics Series:

  • Rust #1: Ownership — Explained for People Who’ve Used Variables
  • Rust #2: Functions, Structs & Traits — Cooler Than OOP
  • Rust #3: Enums & Pattern Matching — The Power You Didn’t Know You Needed
  • Rust #4: Error Handling — unwrap() Is Not a Strategy

Rust Async Series:

  • Async Programming Basics - What async/await Really Does
  • How Async Works Under the Hood - Future, poll() and State Machines
  • Building Real Concurrent Apps with tokio
  • Building Real Web Servers with axum

Rust Actor Series:

  • What Is the Actor Model? Build One From Scratch
  • Advanced Actor Messaging - Timeouts, Replies, Broadcasts
  • Actor Lifecycle & State Isolation
  • Supervisors & Restart Mechanisms

Distributed Actor Series (current):

The next article will be the series finale: Fault Tolerance, Retry Queues & Self-Healing Clusters.

Don’t Miss Updates

Follow the “Dream Beast Programming” WeChat official account to get notified when new articles are published.

We also have a Rust tech discussion group with many developers actually building projects with Rust, sharing experiences and discussing problems.

Code speaks, theory is just the map. Try it yourself, and you’ll find Rust distributed development isn’t that scary.

See you next time!