Harsh Pandhe

Building a Custom Autonomous Drone Stack - Part 2: Defeating the Safety Engine

Stop Using time.sleep(). Start Polling the EKF3 State Machine.

Writing a Python script to tell a drone to "Take Off" is easy.

Getting the drone to actually listen to you is the hard part.

Flight controllers are designed to be extremely paranoid. When you tell ArduPilot to arm the motors in an autonomous mode like GUIDED, it runs an extensive internal safety checklist (the pre-arm checks). If it isn't completely confident in its sensors, it rejects your command, and a script that isn't listening for the reply will never know.


The "Dumb Timer" Trap

When developers first start automating drones, their scripts usually look something like this:

# Tell the drone to arm
master.mav.command_long_send(
    ..., MAV_CMD_COMPONENT_ARM_DISARM, 1
)

time.sleep(3)  # Hope it armed...

# Tell the drone to take off
master.mav.command_long_send(
    ..., MAV_CMD_NAV_TAKEOFF, ...
)

This is the Dumb Timer Trap.

If the Pixhawk rejected the arming command because the optical flow camera couldn't see the floor properly (PreArm: Need Position Estimate), your Python script has no idea.

It waits three seconds, assumes everything worked, and blindly sends a takeoff command while the drone sits motionless on the table.


Active Polling: Reading the Pixhawk's Mind

To build a reliable autonomous stack, your code must actively interrogate the flight controller.

Instead of guessing, we continuously listen for the formal COMMAND_ACK response.

Here's how to properly arm a drone:

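import sys
import time

from pymavlink import mavutil

# `master` is an already-open mavutil.mavlink_connection(...)
master.mav.command_long_send(
    master.target_system,
    master.target_component,
    mavutil.mavlink.MAV_CMD_COMPONENT_ARM_DISARM,
    0,                 # confirmation
    1,                 # param1: 1 = arm, 0 = disarm
    0, 0, 0, 0, 0, 0   # param2-7: left at 0
)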

armed_successfully = False
start_time = time.time()

# Listen for up to 5 seconds
while time.time() - start_time < 5.0:
    # Block briefly instead of spinning the CPU in a busy loop
    msg = master.recv_match(
        type=['STATUSTEXT', 'COMMAND_ACK'],
        blocking=True,
        timeout=0.5
    )

    if msg:

        if msg.get_type() == 'STATUSTEXT':
            print(f"[PIXHAWK STATUS]: {msg.text}")

        elif (
            msg.get_type() == 'COMMAND_ACK'
            and msg.command ==
            mavutil.mavlink.MAV_CMD_COMPONENT_ARM_DISARM
        ):

            if msg.result == mavutil.mavlink.MAV_RESULT_ACCEPTED:
                print("--> Pixhawk accepted arming!")
                armed_successfully = True
                break

            else:
                print(
                    f"--> Pixhawk rejected arming! "
                    f"(Error Code: {msg.result})"
                )
                break

if not armed_successfully:
    print("\n[!!!] Flight aborted.")
    sys.exit(1)

Let the Flight Controller Tell You What's Wrong

By listening to the STATUSTEXT packets, your terminal will display the exact reason why the Pixhawk refuses to fly.

Examples include:

  • PreArm: Need Position Estimate
  • PreArm: Gyros inconsistent
  • PreArm: GPS 1: Bad fix
  • PreArm: Battery below minimum voltage

This one loop can save you hundreds of hours of hardware debugging.
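
The arming loop above can be packaged as a reusable helper that returns both the verdict and every STATUSTEXT line seen while waiting. This is a minimal sketch, not the author's code: `wait_for_ack`, `MAV_RESULT_NAMES`, and the `conn` parameter are hypothetical names, and `conn` is assumed to be any object exposing pymavlink's `recv_match(type=..., blocking=..., timeout=...)`, such as the `master` connection used earlier. The numeric codes mirror the MAV_RESULT enum from the MAVLink common message set.

```python
import time

# MAV_RESULT values from the MAVLink common message set.
MAV_RESULT_NAMES = {
    0: "ACCEPTED",
    1: "TEMPORARILY_REJECTED",
    2: "DENIED",
    3: "UNSUPPORTED",
    4: "FAILED",
    5: "IN_PROGRESS",
    6: "CANCELLED",
}


def wait_for_ack(conn, command_id, timeout_s=5.0):
    """Wait for the COMMAND_ACK matching command_id (hypothetical helper).

    Returns (accepted, result_name, status_lines). result_name is
    "TIMEOUT" if no matching ACK arrives within timeout_s.
    """
    status_lines = []
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        msg = conn.recv_match(
            type=["STATUSTEXT", "COMMAND_ACK"],
            blocking=True,
            timeout=0.2,
        )
        if msg is None:
            continue  # nothing arrived in this slice, keep waiting
        if msg.get_type() == "STATUSTEXT":
            status_lines.append(msg.text)
        elif msg.command == command_id:
            name = MAV_RESULT_NAMES.get(msg.result, f"UNKNOWN({msg.result})")
            return msg.result == 0, name, status_lines
    return False, "TIMEOUT", status_lines
```

With the connection from earlier, a call might look like `accepted, result, reasons = wait_for_ack(master, mavutil.mavlink.MAV_CMD_COMPONENT_ARM_DISARM)`. ACKs for other commands are ignored, and a timeout is reported as its own outcome rather than being lumped in with a rejection.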


The 32-Bit Overflow Bug

Once we finally got the drone armed, we started sending raw MAVLink thrust commands.

The instant the thrust command executed, Python crashed with:

struct.error: 'I' format requires 0 <= number <= 4294967295

At first glance, this error makes absolutely no sense.

What happened?


Understanding time_boot_ms

Many MAVLink messages include a field called:

time_boot_ms

In the MAVLink message definitions, this field is an unsigned 32-bit integer (uint32_t) holding milliseconds since system boot.

Its maximum value is:

4,294,967,295

Our code looked like this:

int(time.time() * 1000)

Unfortunately, time.time() returns seconds since the Unix epoch (January 1, 1970), so multiplying by 1000 gives an epoch timestamp in milliseconds.

Today, that's roughly:

1,700,000,000,000 milliseconds

Python then tried to cram a 1.7 trillion millisecond timestamp into a field that can only hold 4.29 billion.

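Nothing actually overflowed: pymavlink serializes the field with Python's struct module, which refuses out-of-range values instead of wrapping them, so it raised struct.error and the script crashed.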
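
The failure is easy to reproduce without a drone. The sketch below uses only Python's `struct` module with a little-endian `<I` format; it illustrates the range check and is not pymavlink's actual packing code. The timestamp is the rough figure quoted above.

```python
import struct

UINT32_MAX = 2**32 - 1          # 4,294,967,295
epoch_ms = 1_700_000_000_000    # rough Unix time in milliseconds

print(epoch_ms > UINT32_MAX)    # the epoch value is far out of range

try:
    struct.pack("<I", epoch_ms)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print(f"struct.error: {exc}")

# A small elapsed-time value packs into exactly 4 bytes.
packed = struct.pack("<I", 1500)
```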


The Fix

The flight controller doesn't care about global Unix time.

This field carries milliseconds since boot, so time elapsed since your script started is a workable stand-in.

Anchor a start time:

SCRIPT_START_TIME = time.time()

Then calculate the delta:

def get_time_boot_ms():
    return int(
        (time.time() - SCRIPT_START_TIME) * 1000
    )

This keeps the value inside a 32-bit integer for about 49.7 days of continuous runtime (2^32 milliseconds), far longer than any flight script should run.
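
Two edge cases remain with the helper above: `time.time()` follows the wall clock, so a clock adjustment mid-flight (an NTP sync, for example) could make the delta negative, and the counter eventually outgrows 32 bits on a very long-running process. A hedged variant, assuming nothing beyond the standard library, uses `time.monotonic()` and masks the result to 32 bits. `elapsed_to_boot_ms`, `UINT32_MASK`, and `_START` are illustrative names, not part of pymavlink.

```python
import time

UINT32_MASK = 0xFFFFFFFF
_START = time.monotonic()  # monotonic clock never jumps backwards


def elapsed_to_boot_ms(elapsed_s):
    """Convert elapsed seconds to a millisecond count that wraps at 2**32."""
    return int(elapsed_s * 1000) & UINT32_MASK


def get_time_boot_ms():
    """Milliseconds since this script started, always within uint32 range."""
    return elapsed_to_boot_ms(time.monotonic() - _START)
```

Splitting out `elapsed_to_boot_ms` keeps the wrap-around logic testable without waiting weeks for a real overflow.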


Final Thoughts

The biggest lesson we learned was this:

Never assume the drone did what you asked.

Always wait for confirmation.

Always inspect COMMAND_ACK.

Always monitor STATUSTEXT.

And never use time.sleep() as a substitute for state feedback.

In robotics, assumptions become bugs.

Bugs become crashes.

Crashes become broken propellers.


Up Next

In Part 3, we'll tackle one of the most frustrating problems in indoor flight:

The Zero-Throttle Drop

We'll explore:

  • Why drones suddenly fall after takeoff
  • Why thrust values don't behave linearly
  • How to achieve a stable indoor hover without GPS
  • Raw MAVLink thrust control
  • Tuning for smooth altitude hold

Stay tuned.
