Wrong version of aws-java-sdk-bundle in sagemaker-spark 1.4.5 #149

@jobvisser03

Description

System Information

  • Spark or PySpark: PySpark (see repro commands below)
  • SDK Version: 1.4.5
  • Spark Version: 3.3.0

Describe the problem

I just spent three days trying to fix this, but to no avail. My setup on an AWS notebook instance:
jars:
aws-java-sdk-bundle-1.11.901.jar
aws-java-sdk-core-1.12.262.jar
aws-java-sdk-kms-1.12.262.jar
aws-java-sdk-s3-1.12.262.jar
aws-java-sdk-sagemaker-1.12.262.jar
aws-java-sdk-sagemakerruntime-1.12.262.jar
aws-java-sdk-sts-1.12.262.jar
hadoop-aws-3.3.1.jar
sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar
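For reference, a minimal sketch of how these jars could be put on the Spark classpath in the notebook; the jar directory path is hypothetical, and spark.jars is a standard Spark option:

from pyspark.sql import SparkSession

# Hypothetical staging directory for the jars listed above.
jar_dir = "/home/ec2-user/jars"
jar_names = [
    "aws-java-sdk-bundle-1.11.901.jar",
    "hadoop-aws-3.3.1.jar",
    "sagemaker-spark_2.12-spark_3.3.0-1.4.5.jar",
    # ...plus the remaining aws-java-sdk-* jars from the list above
]

spark = (
    SparkSession.builder
    .appName("sagemaker-spark-setup")
    # spark.jars distributes these to the driver and executor classpaths
    .config("spark.jars", ",".join(f"{jar_dir}/{n}" for n in jar_names))
    .getOrCreate()
)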

Problem:

Based on the suggested workarounds in the article above, I tried four things:

  1. Upgrade aws-java-sdk-bundle to version 1.12.262 to match the other jars → didn't work
  2. Downgrade httpclient to version 4.5.10 → didn't work
  3. Disable SSL certificate checking in the aws-java-sdk via "-Dcom.amazonaws.sdk.disableCertChecking=true" (see SSLPeerUnverifiedException on S3 actions aws-sdk-java-v2#1786, and the sketch after this list) → didn't work
  4. Read from a bucket that doesn't contain dots (.) → works
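To be explicit about attempt 3: the issue doesn't record how the flag was passed, so this is a minimal sketch assuming it goes through Spark's standard extraJavaOptions settings, which is how a JVM system property normally reaches both the driver and the executors:

from pyspark.sql import SparkSession

# -Dcom.amazonaws.sdk.disableCertChecking=true must be set on every JVM that
# talks to S3, hence both driver and executor options.
opt = "-Dcom.amazonaws.sdk.disableCertChecking=true"
spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions", opt)
    .config("spark.executor.extraJavaOptions", opt)
    .getOrCreate()
)

Note that spark.driver.extraJavaOptions may need to be supplied at launch time rather than from an already-running session, since the driver JVM has started by then.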

Minimal repro / logs

22/08/30 11:00:22 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://comp.data.sci.data.tst/some/folder/export_date=20220822.
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://comp.data.sci.data.tst/some/folder/export_date=20220822: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: Unable to execute HTTP request: Certificate for <comp.data.sci.data.tst.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)

  • Exact command to reproduce:
    Works:
    df = spark.read.parquet("s3a://aws-bucket-with-dashes/file_0_1_0.snappy.parquet")
    Doesn't work:
    df = spark.read.parquet("s3a://aws.bucket.with.dots/file_0_1_0.snappy.parquet")

It's not possible to rename these buckets because many data consumers depend on them.
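For context on why only the dotted bucket fails: s3a defaults to virtual-hosted-style addressing, which puts the bucket name into the hostname (comp.data.sci.data.tst.s3.amazonaws.com), and the wildcard certificate *.s3.amazonaws.com only matches a single label, so any dot in the bucket name breaks TLS verification. One workaround that may help without renaming the bucket (untested in this exact setup) is forcing path-style access via the standard hadoop-aws option fs.s3a.path.style.access:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # spark.hadoop.* settings are forwarded into the Hadoop configuration;
    # with path-style access the request URL becomes
    # s3.amazonaws.com/aws.bucket.with.dots/... so the bucket name never
    # has to match the certificate's subject alternative names.
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

df = spark.read.parquet("s3a://aws.bucket.with.dots/file_0_1_0.snappy.parquet")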
