
Issue while running the training pipeline for CamelTrack #28

@yashb042

Hi,

I am trying to run the final training step to generate camel_train.pklz (the final weights), which should work on any video.
I am using this command, as per the docs:
uv run tracklab -cn cameltrack_train dataset=dancetrack

But I am getting the following error:

Error executing job with overrides: ['dataset=dancetrack']
Traceback (most recent call last):
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 48, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 599, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 988, in _run
    self.strategy.setup(self)
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/strategies/strategy.py", line 159, in setup
    self.setup_optimizers(trainer)
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/strategies/strategy.py", line 139, in setup_optimizers
    self.optimizers, self.lr_scheduler_configs = _init_optimizers_and_lr_schedulers(self.lightning_module)
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/core/optimizer.py", line 180, in _init_optimizers_and_lr_schedulers
    optim_conf = call._call_lightning_module_hook(model.trainer, "configure_optimizers", pl_module=model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 176, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/cameltrack/camel.py", line 356, in configure_optimizers
    num_warmup_steps=self.trainer.estimated_stepping_batches // 20,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1707, in estimated_stepping_batches
    self.fit_loop.setup_data()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py", line 275, in setup_data
    iter(self._data_fetcher)  # creates the iterator inside the fetcher
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fetchers.py", line 105, in __iter__
    super().__iter__()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fetchers.py", line 52, in __iter__
    self.iterator = iter(self.combined_loader)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/utilities/combined_loader.py", line 351, in __iter__
    iter(iterator)
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/utilities/combined_loader.py", line 92, in __iter__
    super().__iter__()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/utilities/combined_loader.py", line 43, in __iter__
    self.iterators = [iter(iterable) for iterable in self.iterables]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/utilities/combined_loader.py", line 43, in <listcomp>
    self.iterators = [iter(iterable) for iterable in self.iterables]
                      ^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 491, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 422, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1199, in __init__
    self._reset(loader, first_iter=True)
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1236, in _reset
    self._try_put_index()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 1486, in _try_put_index
    index = self._next_index()
            ^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 698, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/cameltrack/train/sampler.py", line 53, in __iter__
    yield from batched(self.sample_generator(), self.batch_size)
  File "/efs/notebook/yash/CAMELTrack/cameltrack/train/sampler.py", line 311, in batched
    batch = tuple(islice(it, n))
            ^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/cameltrack/train/sampler.py", line 35, in sample_generator
    key = self.rng.choice(range(len(samplers)), p=probs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "numpy/random/_generator.pyx", line 824, in numpy.random._generator.Generator.choice
ValueError: probabilities contain NaN

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/efs/notebook/yash/CAMELTrack/.venv/bin/tracklab", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/tracklab/main.py", line 47, in main
    module.train(tracking_dataset, pipeline, evaluator, OmegaConf.to_container(cfg.dataset, resolve=True))
  File "/efs/notebook/yash/CAMELTrack/cameltrack/cameltrack.py", line 330, in train
    trainer.fit(self.CAMEL, self.datamodule, ckpt_path=ckpt_path)
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 561, in fit
    call._call_and_handle_interrupt(
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/call.py", line 69, in _call_and_handle_interrupt
    trainer._teardown()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/trainer/trainer.py", line 1039, in _teardown
    loop.teardown()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fit_loop.py", line 502, in teardown
    self._data_fetcher.teardown()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fetchers.py", line 80, in teardown
    self.reset()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fetchers.py", line 142, in reset
    super().reset()
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/loops/fetchers.py", line 76, in reset
    self.length = sized_len(self.combined_loader)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/lightning_fabric/utilities/data.py", line 52, in sized_len
    length = len(dataloader)  # type: ignore [arg-type]
             ^^^^^^^^^^^^^^^
  File "/efs/notebook/yash/CAMELTrack/.venv/lib/python3.11/site-packages/pytorch_lightning/utilities/combined_loader.py", line 358, in __len__
    raise RuntimeError("Please call `iter(combined_loader)` first.")
RuntimeError: Please call `iter(combined_loader)` first.
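The second RuntimeError is just teardown noise; the real failure is the first one, raised in cameltrack/train/sampler.py. As a hypothetical minimal sketch (the variable names below are my own, not CAMELTrack's), numpy's Generator.choice raises exactly this ValueError when the probability vector is built by normalising weights that all turn out to be zero, e.g. if every per-split sampler ends up with zero training tracklets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sampler weights: if no training tracklets were generated,
# every weight is zero, so normalising divides 0 by 0.
weights = np.array([0.0, 0.0, 0.0])
probs = weights / weights.sum()  # 0/0 -> NaN for every entry

try:
    rng.choice(range(len(weights)), p=probs)
except ValueError as e:
    print(e)  # numpy rejects a probability vector containing NaN
```

If this is what is happening, it would suggest the training-tracklet generation step produced empty pickle files, so it may be worth checking the contents of ${dataset.dataset_path}/states/camel_training before the fit starts.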

I have done the setup correctly: all three dataset splits (train, val, and test) are downloaded, and all three checkpoint files are available in the states directory.

Here are my directory structure and my cameltrack_train.yaml:

(screenshot of the directory structure)

cameltrack_train.yaml:

defaults:
  - cameltrack
  - override dataset: dancetrack
  - _self_

pipeline:
  - track

use_wandb: false
wandb:
  mode: disabled

state:
  load_file: "${dataset.dataset_path}/states/dancetrack-${dataset.eval_set}.pklz"
  save_file: null
  load_from_public_dets: true

modules:
  track:
    training_enabled: true
    use_wandb: false
    wandb:
      mode: disabled
    # Generate the training dataset setup: compile tracklets in a pickle file for each split
    datamodule_cfg:
      name: "camel"  # Name for the pickle files containing the training tracklets (will be appended with the split name)
      path: "${dataset.dataset_path}/states/camel_training"  # Where to store those training states
      tracker_states:
        train: "${dataset.dataset_path}/states/dancetrack-train.pklz"  # Update this path to your states
        val: "${dataset.dataset_path}/states/dancetrack-val.pklz"  # Update this path to your states

I am quite close to running my first-ever training pipeline; any help is appreciated.
Thanks!
