Description
I am trying to migrate my Hive Metastore (on RDS) to my AWS Glue Data Catalog.
I configured the job to run as a Spark job and tried every combination of:
- Spark 2.4 / 3.1
- Python 2 / 3
- Glue version 3.0 / 2.0 / 1.0 / 0.9

I followed the README to migrate directly from the Hive Metastore to the AWS Glue Data Catalog, but I get "'str' object has no attribute '_jdf'" when I run the Glue ETL job. See the full error message below:
2022-01-27 16:53:53,940 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:
Traceback (most recent call last):
  File "/tmp/import_into_datacatalog.py", line 130, in <module>
    main()
  File "/tmp/import_into_datacatalog.py", line 126, in main
    region=options.get('region') or 'us-east-1'
  File "/tmp/import_into_datacatalog.py", line 51, in metastore_full_migration
    sc, sql_context, db_prefix, table_prefix).transform(hive_metastore)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 753, in transform
    ms_database_params=hive_metastore.ms_database_params)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 734, in transform_databases
    dbs_with_params = self.join_with_params(df=ms_dbs, df_params=ms_database_params, id_col='DB_ID')
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 336, in join_with_params
    df_params_map = self.transform_params(params_df=df_params, id_col=id_col)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 314, in transform_params
    return self.kv_pair_to_map(params_df, id_col, key, value, 'parameters')
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 326, in kv_pair_to_map
    id_type = df.get_schema_type(id_col)
  File "/tmp/localPyFiles-0b1af0c4-b70f-4147-a11b-965a99faeb92/hive_metastore_migration.py", line 199, in get_schema_type
    return df.select(column_name).schema.fields[0].dataType
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 1671, in select
    jdf = self._jdf.select(self._jcols(*cols))
AttributeError: 'str' object has no attribute '_jdf'
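For context, here is a minimal sketch (no PySpark required) of one way Python can produce this exact message: PySpark's `DataFrame.select` reads `self._jdf`, so if a plain string ends up where a `DataFrame` is expected, attribute access fails with `'str' object has no attribute '_jdf'`. The stub class below is illustrative, not the real PySpark implementation, and the table name is hypothetical.

```python
# Illustrative stand-in for pyspark.sql.DataFrame: only the attribute
# relevant to the traceback (_jdf) is modeled.
class DataFrame:
    def __init__(self, jdf):
        self._jdf = jdf  # handle to the underlying JVM DataFrame

    def select(self, *cols):
        # Mirrors the failing line in the traceback: select() delegates
        # to the JVM object through self._jdf.
        return self._jdf.select(cols)

# If a string (e.g. a table name) reaches code that expects a DataFrame,
# the same AttributeError appears:
try:
    DataFrame.select("MS_DATABASE_PARAMS", "DB_ID")  # hypothetical value
except AttributeError as e:
    print(e)  # 'str' object has no attribute '_jdf'
```

This suggests that somewhere in the migration, `ms_database_params` (or a related input) is a string rather than a Spark DataFrame, though I cannot tell from the traceback where that substitution happens.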
I am not sure how to resolve this error. Could you give me some help or suggestions?