Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 9 additions & 5 deletions dependencies.rosinstall
Original file line number Diff line number Diff line change
Expand Up @@ -2,22 +2,26 @@
local-name: ros_industrial_cmake_boilerplate
uri: https://github.com/ros-industrial/ros_industrial_cmake_boilerplate.git
version: 0.7.1
- git:
local-name: boost_plugin_loader
uri: https://github.com/tesseract-robotics/boost_plugin_loader.git
version: 0.4.3
- git:
local-name: tesseract
uri: https://github.com/tesseract-robotics/tesseract.git
version: 0.28.0
version: 0.33.0
- git:
local-name: tesseract_planning
uri: https://github.com/tesseract-robotics/tesseract_planning.git
version: 0.28.1
version: 0.33.1
- git:
local-name: trajopt
uri: https://github.com/tesseract-robotics/trajopt.git
version: 0.28.0
version: 0.33.0
- git:
local-name: descartes_light
uri: https://github.com/swri-robotics/descartes_light.git
version: 0.4.5
version: 0.4.9
- git:
local-name: opw_kinematics
uri: https://github.com/Jmeyer1292/opw_kinematics.git
Expand All @@ -29,4 +33,4 @@
- git:
local-name: ruckig
uri: https://github.com/pantor/ruckig.git
version: v0.9.2
version: v0.9.2
150 changes: 150 additions & 0 deletions docs/KINEMATICS_PLUGIN_FACTORY_ISSUE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
# KinematicsPluginFactory Global State Issue

## Summary

Creating multiple `KinematicsPluginFactory` instances in the same process causes segmentation faults when the factories load different plugin types (e.g., KDL vs OPW). This issue affects Python bindings but originates in the C++ tesseract library.

## Reproduction

```python
# Test 1: Create KDL-based kinematics (IIWA)
factory1 = tesseract_kinematics.KinematicsPluginFactory(iiwa_plugins_yaml, locator1)
inv_kin1 = factory1.createInvKin("manipulator", "KDLInvKinChainLMA", scene_graph1, state1)
# Works fine

# Test 2: Create OPW-based kinematics (ABB)
factory2 = tesseract_kinematics.KinematicsPluginFactory(abb_plugins_yaml, locator2)
inv_kin2 = factory2.createInvKin("manipulator", "OPWInvKin", scene_graph2, state2)
# SEGFAULT here
```

## Error Messages

```
Warning: Failed to load symbol 'KDLFwdKinChainFactory'
at line 299 in kinematics_plugin_factory.cpp
```

or

```
Warning: Failed to create plugin instance 'OPWInvKinFactory' of type 'tesseract_kinematics::InvKinFactory'
Search Paths (including system folders)
- /path/to/install/lib
Search Libraries:
- libtesseract_kinematics_core_factories.dylib
- libtesseract_kinematics_kdl_factories.dylib
- libtesseract_kinematics_opw_factory.dylib
...
at line 356 in kinematics_plugin_factory.cpp
```

## Root Cause Analysis

The `KinematicsPluginFactory` uses the `tesseract_common` plugin loading infrastructure which appears to use global state:

1. First factory loads `tesseract_kinematics_kdl_factories` library
2. Plugin symbols are registered in a global registry
3. First factory is destroyed (Python GC, test teardown)
4. Second factory tries to load `tesseract_kinematics_opw_factories`
5. Global registry is in inconsistent state
6. Symbol lookup fails or accesses freed memory → segfault

The issue is in `tesseract_kinematics/core/src/kinematics_plugin_factory.cpp` around line 299-356.

## Evidence

- Tests pass when run **individually** in separate processes
- Tests fail when run **together** in the same Python process
- Using pytest `--forked` option causes `dlopen` handle issues (plugin libraries not found in forked child)
- The error occurs at the boundary between different plugin library loads

## Affected Code

- `tesseract_kinematics/core/src/kinematics_plugin_factory.cpp`
- `tesseract_common` plugin loading infrastructure
- Any code that creates multiple `KinematicsPluginFactory` instances with different plugin configs

## Potential Solutions

### Option 1: Per-Factory Plugin State (Recommended)

Make the plugin registry per-factory instead of global. Each factory maintains its own map of loaded plugins and dlopen handles.

```cpp
class KinematicsPluginFactory {
private:
std::map<std::string, PluginHandle> loaded_plugins_; // Per-instance
// Instead of global:
// static std::map<std::string, PluginHandle> loaded_plugins_;
};
```

### Option 2: Reference-Counted Plugin Handles

Keep track of how many factories reference each loaded plugin library. Only unload when reference count drops to zero.

### Option 3: Never Unload Plugin Libraries

Once loaded, keep plugin libraries loaded for process lifetime. Simplest fix but uses more memory.

### Option 4: Clear Plugin Registry API

Expose an API to explicitly clear/reset the plugin registry:

```cpp
// New API
void KinematicsPluginFactory::clearGlobalRegistry();
```

Python users would call this between tests or when switching between different robot configurations.

## Current Workaround

Run tests in separate process invocations:

```bash
# Instead of:
pytest tests/

# Use:
pytest tests/test_kdl.py
pytest tests/test_opw.py # Separate process
```

Or use pytest's `subprocess` isolation for affected tests.

## Impact

This issue affects:
- Python bindings (segfaults)
- C++ applications creating multiple factories
- Unit testing any code using KinematicsPluginFactory
- Applications switching between robot configurations at runtime

## Related Files

- `tesseract_kinematics/core/src/kinematics_plugin_factory.cpp`
- `tesseract_common/include/tesseract_common/plugin_loader.h`
- `tesseract_common/src/plugin_loader.cpp`

## Test Case for C++

```cpp
#include <tesseract_kinematics/core/kinematics_plugin_factory.h>

TEST(KinematicsPluginFactory, MultipleFactories) {
// Create first factory with KDL plugins
auto factory1 = std::make_shared<KinematicsPluginFactory>(iiwa_config, locator);
auto kin1 = factory1->createInvKin(...);
ASSERT_NE(kin1, nullptr);

// Destroy first factory
factory1.reset();

// Create second factory with OPW plugins
auto factory2 = std::make_shared<KinematicsPluginFactory>(abb_config, locator);
auto kin2 = factory2->createInvKin(...); // Should not crash
ASSERT_NE(kin2, nullptr);
}
```
180 changes: 180 additions & 0 deletions docs/NANOBIND_ISSUES.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
# nanobind Binding Issues for tesseract_python

Report on issues encountered porting tesseract_pybind to nanobind.

## 1. Reference Leaks in Poly Types

**Severity**: High - Causes warnings at interpreter exit, potential memory corruption

**Symptoms**:
```
nanobind: leaked 48 types!
- leaked type "WaypointPoly"
- leaked type "InstructionPoly"
...
nanobind: leaked 156 functions!
- leaked function "setManipulatorInfo"
- leaked function "getWaypoint"
...
```

**Root Cause**: Type-erased `*Poly` wrappers store function pointers in global registries. These functions implicitly depend on module namespace, creating reference cycles that Python's GC cannot break.

**Affected Types**:
- `WaypointPoly`, `CartesianWaypointPoly`, `JointWaypointPoly`, `StateWaypointPoly`
- `InstructionPoly`, `MoveInstructionPoly`
- `CompositeInstruction` (iterates over InstructionPoly)

**Fix Required**: Implement `tp_traverse` and `tp_clear` for cyclic GC support:

```cpp
// Example for WaypointPoly
int waypointpoly_traverse(PyObject* self, visitproc visit, void* arg) {
// Visit all Python object references held by this type
return 0;
}

int waypointpoly_clear(PyObject* self) {
// Clear member variables that form cycles
return 0;
}

nb::class_<tp::WaypointPoly>(m, "WaypointPoly",
nb::type_slots({
{Py_tp_traverse, (void*)waypointpoly_traverse},
{Py_tp_clear, (void*)waypointpoly_clear}
}))
...
```

**Reference**: https://nanobind.readthedocs.io/en/latest/refleaks.html

---

## 2. TaskComposer Async Execution Crash

**Severity**: Critical - Segfaults during planning

**Symptoms**: Crash at `future.wait()` after `executor.run(task, task_data)`

**Context**:
- `TaskComposerExecutor.run()` returns `std::unique_ptr<TaskComposerFuture>` (abstract base)
- Actual returned type is `TaskflowTaskComposerFuture` (derived class)
- nanobind needs both classes bound with proper inheritance for polymorphism

**Current Workaround**: Tests marked `@pytest.mark.skip`

**Attempted Fixes**:
1. Added `TaskflowTaskComposerFuture` binding with inheritance
2. Added `gil_scoped_release` during `wait()`
3. Both base and derived classes re-expose methods

**pybind11 Solution** (from tesseract_pybind):
- Uses trampoline class `PyTaskComposerFuture` for abstract base
- Explicit `std::unique_ptr<T>` holder type in class binding
- Both base and derived need the unique_ptr holder

**nanobind Equivalent Required**:
```cpp
struct PyTaskComposerFuture : tp::TaskComposerFuture {
NB_TRAMPOLINE(tp::TaskComposerFuture, 7);
void clear() override { NB_OVERRIDE_PURE(clear); }
bool valid() const override { NB_OVERRIDE_PURE(valid); }
bool ready() const override { NB_OVERRIDE_PURE(ready); }
void wait() const override { NB_OVERRIDE_PURE(wait); }
...
};

nb::class_<tp::TaskComposerFuture, PyTaskComposerFuture>(m, "TaskComposerFuture")
...
```

---

## 3. OPW Kinematics Plugin Loading Under pytest

**Severity**: Medium - Tests fail when using OPW-based robots

**Symptoms**:
```
Failed to create plugin instance 'OPWInvKinFactory'
Warning: This process is multi-threaded, use of fork() may lead to deadlocks
```

**Context**:
- OPW plugin loads fine in standalone Python scripts
- Fails when run under pytest due to fork/threading issues
- `boost::plugin_loader` uses thread-local caches that corrupt during fork

**Workaround Applied**: Use IIWA robot (KDL kinematics) instead of ABB IRB2400 (OPW) in tests

**Root Cause**: pytest's forking combined with boost::dll plugin loader's global state causes corruption. The plugin factory maintains thread-local caches that don't survive fork correctly.

**Long-term Fix Options**:
1. Run tests with `--forked` to isolate each test
2. Use threading-safe initialization in plugin loaders
3. Disable fork in pytest configuration

---

## 4. FilesystemPath Type Mismatch

**Severity**: Low - Fixed

**Symptoms**:
```
TypeError: __init__(): incompatible function arguments
```

**Root Cause**: SWIG compatibility wrapper defined FilesystemPath as Python `str` subclass. nanobind's strict type checking rejected this.

**Fix Applied**: Removed str subclass wrapper, use C++ binding directly:
```python
# Before (SWIG compatibility)
class FilesystemPath(str):
def __new__(cls, s): return str.__new__(cls, str(s))

# After (nanobind)
from tesseract_robotics.tesseract_common._tesseract_common import FilesystemPath
```

---

## 5. Cross-Module Type Inheritance

**Severity**: Low - Workaround applied

**Issue**: Types bound in different modules cannot directly inherit. For example, `TaskComposerPluginFactory` constructor takes `ResourceLocator` from tesseract_common module.

**Workaround**: Manual type checking with `nb::handle`:
```cpp
m.def("__init__", [](Factory* self, const std::string& config, nb::handle locator_handle) {
auto common_module = nb::module_::import_("tesseract_robotics.tesseract_common._tesseract_common");
auto grl_type = common_module.attr("GeneralResourceLocator");
if (!nb::isinstance(locator_handle, grl_type)) {
throw nb::type_error("locator must be a GeneralResourceLocator");
}
auto* locator = nb::cast<tc::GeneralResourceLocator*>(locator_handle);
new (self) Factory(config, *locator);
});
```

---

## Test Status Summary

| Module | Tests | Status |
|--------|-------|--------|
| tesseract_common | 20+ | PASS |
| tesseract_collision | 10+ | PASS |
| tesseract_geometry | 10+ | PASS |
| tesseract_environment | 15+ | PASS |
| tesseract_kinematics | 10+ | PASS (KDL only) |
| tesseract_command_language | 20+ | PASS |
| tesseract_task_composer | 2 | SKIP (async crash) |
| planning API | 35 | PASS (1 skip) |

**Known Limitations**:
- OPW kinematics tests require `--forked` or manual execution
- TaskComposer pipeline tests crash (need trampoline fix)
- Leaked types/functions warnings at exit (cosmetic, not functional)
Loading
Loading