Learner often suggests action -1 during training. Fails completely when actionsAtState is provided.

The learner often suggests the invalid action `-1`. This happens while all Q-values in a state still have the default value.

This issue even occurs in the SARSA learner example, in the `main` method provided there.

The issue seems to stem from incorrect handling of default values in the `Vec.indexWithMaxValue(Set<Integer> indices)` method.
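To make the suspected failure mode concrete, here is a minimal sketch. It is not the library's code: the class `ArgMaxSketch`, the map-backed sparse row, the `DEFAULT_VALUE` constant, and both method names are hypothetical and only illustrate how an arg-max that skips unset entries can end up returning `-1`, and how treating unset entries as the default value avoids that.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a sparse vector of Q-values; not the library's Vec class.
class ArgMaxSketch {
    // Assumption: actions without a stored entry implicitly have this value.
    static final double DEFAULT_VALUE = 0.0;

    // Fragile variant: only looks at explicitly stored entries,
    // so it returns -1 when nothing has been stored yet.
    static int fragileArgMax(Map<Integer, Double> stored, Set<Integer> indices) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, Double> entry : stored.entrySet()) {
            if (indices.contains(entry.getKey()) && entry.getValue() > bestValue) {
                best = entry.getKey();
                bestValue = entry.getValue();
            }
        }
        return best;
    }

    // Safer variant: iterates over the candidate indices and reads unset
    // entries as DEFAULT_VALUE, so it returns -1 only for an empty candidate set.
    static int safeArgMax(Map<Integer, Double> stored, Set<Integer> indices) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int index : indices) {
            double value = stored.getOrDefault(index, DEFAULT_VALUE);
            if (best == -1 || value > bestValue) {
                best = index;
                bestValue = value;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Integer, Double> qRow = new HashMap<>();  // no Q-value stored yet
        Set<Integer> actions = new LinkedHashSet<>(Arrays.asList(0, 1, 2));
        System.out.println(fragileArgMax(qRow, actions)); // prints -1
        System.out.println(safeArgMax(qRow, actions));    // prints 0
    }
}
```

In this sketch the fragile variant is also wrong once a single negative value has been stored: it picks that action even though the untouched actions still hold the larger default value.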

If the `*Learner.update` function is given the optional `actionsAtState` argument, it updates the Q-value to either `NaN` or `-Infinity`, again because the `indexWithMaxValue` method handles default values inconsistently.
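One plausible way to arrive at exactly these two values is sketched below. This is an assumption about the mechanism, not a trace of the library: `QUpdateSketch` and `qUpdate` are hypothetical names, and a textbook Q-learning style update is used as a stand-in for whatever the learners actually compute. The sketch assumes the maximum over the allowed next actions comes back as `-Infinity` when no candidate is found.

```java
// Hypothetical illustration; the update rule is a generic Q-learning style
// update, not necessarily the one the library uses.
class QUpdateSketch {
    // Q(s,a) <- Q(s,a) + alpha * (reward + gamma * maxNextQ - Q(s,a))
    static double qUpdate(double oldQ, double reward, double maxNextQ,
                          double alpha, double gamma) {
        return oldQ + alpha * (reward + gamma * maxNextQ - oldQ);
    }

    public static void main(String[] args) {
        double alpha = 0.1;
        double gamma = 0.9;

        // Assumption: the max over the restricted action set was never
        // replaced by a real Q-value and is still the search sentinel.
        double missingMax = Double.NEGATIVE_INFINITY;

        // First update: a finite Q-value is dragged to -Infinity.
        double first = qUpdate(0.0, 1.0, missingMax, alpha, gamma);
        System.out.println(first);  // prints -Infinity

        // Second update: -Infinity minus -Infinity is undefined, giving NaN.
        double second = qUpdate(first, 1.0, missingMax, alpha, gamma);
        System.out.println(second); // prints NaN
    }
}
```

If this is what happens, guarding the update against a non-finite maximum (or fixing the arg-max so it never reports a missing candidate for a non-empty action set) would keep the Q-values finite.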


Issue #8
