Issue migrating directly from Hive Metastore to Glue Data Catalog

I am trying to migrate my Hive Metastore (RDS) to my Glue Data Catalog.

I configured the job to run as a **Spark job** and tried every matching combination of:
- Spark 2.4/3.1
- Python 2/3
- Glue version 3.0/2.0/1.0/0.9

I followed the README to migrate directly from the Hive Metastore to the AWS Glue Data Catalog, but I get "'str' object has no attribute '_jdf'" when I run the Glue ETL job. The full error message is below:

2022-01-27 16:53:53,940 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
  File "/tmp/import_into_datacatalog.py", line 130, in <module>
    main()
  File "/tmp/import_into_datacatalog.py", line 126, in main
    region=options.get('region') or 'us-east-1'
  File "/tmp/import_into_datacatalog.py", line 51, in metastore_full_migration
    sc, sql_context, db_prefix, table_prefix).transform(hive_metastore)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 753, in transform
    ms_database_params=hive_metastore.ms_database_params)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 734, in transform_databases
    dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 336, in join_with_params
    df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 314, in transform_params
    return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 326, in kv_pair_to_map
    id_type = df.get_schema_type(id_col)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 199, in get_schema_type
    return df.select(column_name).schema.fields[0].dataType
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1671, in select
    jdf = self._jdf.select(self._jcols(*cols))
AttributeError: 'str' object has no attribute '_jdf'

I don't know how to resolve this error. Could you give me some help or suggestions?
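
For anyone trying to narrow this down: the last frame of the traceback shows that, inside `select`, `self` was a `str` rather than a DataFrame. The snippet below is only a minimal sketch of how that exact message can arise in Python 3 and how a guard would surface it earlier. `FakeDataFrame` and `reproduce` are hypothetical stand-ins made up for illustration (not pyspark classes), and the guarded `get_schema_type` is an assumed variant, not the code in `hive_metastore_migration.py`. It does not identify which call in the migration script passes the string.

```python
class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame; not the real class."""

    def __init__(self, columns):
        # The real class keeps a JVM handle in _jdf; a list is enough here.
        self._jdf = list(columns)

    def select(self, *cols):
        # Like the failing pyspark line, this reads self._jdf first.
        return FakeDataFrame([c for c in self._jdf if c in cols])


def get_schema_type(df, column_name):
    """Illustrative guarded helper: fail early if df is not DataFrame-like."""
    if not hasattr(df, "_jdf"):
        raise TypeError(
            "expected a DataFrame, got %s: %r" % (type(df).__name__, df)
        )
    return df.select(column_name)


def reproduce():
    """Invoke select with a str in the receiver position (Python 3)."""
    try:
        # Calling through the class binds the string to `self`.
        FakeDataFrame.select("DB_ID")
    except AttributeError as exc:
        return str(exc)
    return None
```

With a guard like this, the failure would name the unexpected argument type and value at the helper itself, which may make it easier to see whether a column name is ending up where the DataFrame should be.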

Issue migrating directly from Hive Metastore to Glue Data Catalog #112

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue migrating directly from Hive Metastore to Glue Data Catalog #112

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions