Order of variables when DF is fed to fit with CFA: either document or actually use the column names? #146

@petervanwylen

As currently implemented, the fit function of confirmatory_factor_analyzer appears to accept either an array-like object or a dataframe. But when given a dataframe, it doesn't use the column names; instead it assumes (with no documentation?) that the columns are in the same order as model_spec.variable_names. Am I correct in this?

This causes significant problems, because the approach shown in the documentation will not work in general if you assume, as I did, that confirmatory_factor_analyzer.fit() actually uses the column names when fed a dataframe. The current example only works because the model spec expects the columns in the order V1, V2, ..., V8 and the dataframe happens to present them in that order.
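To illustrate the concern, here is a minimal sketch using only pandas and NumPy (not factor_analyzer itself). It assumes, as this issue suspects, that fit reduces a dataframe to a bare array via X.values; the toy column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: two columns with clearly different values.
df = pd.DataFrame({"V1": [1.0, 2.0, 3.0], "V2": [10.0, 20.0, 30.0]})

# The same data, with the columns presented in a different order.
shuffled = df[["V2", "V1"]]

# Label-based access is unaffected by column order.
same_by_name = df["V1"].equals(shuffled["V1"])

# Once converted to a bare array, only position is left, so the two differ.
same_by_position = np.array_equal(df.values, shuffled.values)

print(same_by_name, same_by_position)
```

If the fit step only ever sees the array, two dataframes holding identical data would be treated differently depending on how their columns happen to be ordered.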

But you cannot easily guess what order parse_model_specification_from_dict will choose for variable_names. So, to make it work, I currently have to do the following:

from factor_analyzer import ConfirmatoryFactorAnalyzer, ModelSpecificationParser

# df is the input dataframe; model_dict maps each factor name to its variables
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(df, model_dict)

# This is the key line that seems to be required but isn't documented anywhere:
# it reorders df to match the order of variable_names chosen by the model parser.
df = df[model_spec.variable_names]

cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa.fit(df)

Thank you to anyone who can help confirm my thinking. If I'm correct here, then please either document this behavior clearly or make the whole CFA functionality column-name aware, instead of naively converting whatever it is fed into an array with X.values.
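As a rough sketch of what a column-name-aware step could look like, the helper below selects and orders dataframe columns by name before the data is converted to an array. The function name reorder_for_model is hypothetical and not part of factor_analyzer, and the variable_names list here is a hand-written stand-in for model_spec.variable_names.

```python
import pandas as pd


def reorder_for_model(df, variable_names):
    """Return df with its columns selected and ordered as in variable_names.

    Hypothetical helper, not part of factor_analyzer.
    """
    missing = [name for name in variable_names if name not in df.columns]
    if missing:
        raise KeyError(f"Columns missing from dataframe: {missing}")
    return df[list(variable_names)]


# Stand-in for model_spec.variable_names (assumed order, for illustration only).
variable_names = ["V1", "V2", "V3"]

# A dataframe whose columns arrive in a different order.
df = pd.DataFrame({"V3": [7, 8], "V1": [1, 2], "V2": [4, 5]})

reordered = reorder_for_model(df, variable_names)
print(list(reordered.columns))
```

Raising on missing columns, rather than silently proceeding, is a design choice in this sketch: it surfaces a mismatch between the model spec and the data before any fitting happens.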
