Open
Labels
bug (Something isn't working), enhancement (New feature or request)
Description
The function below returns only a count, and later we run a for loop over this count variable to fetch the backup files.
def s3_count_partitions(s3_client,bucket,topic):
    """It will return number of objects in a given s3 bucket and s3 bucket path."""
    try:
        return s3_client.list_objects_v2(
            Bucket=bucket,
            Prefix=topic + "/",
            Delimiter='/'
        )['KeyCount']
    except NoCredentialsError as e:
        logging.error(e)
        exit(1)

Edge Case:
Initial Partition Count: 10
Backup script copies only partitions: 0,1,4,6,7,8
Restore script (the function above) returns Count: 6
The for loop below, for p in range(_pc), then only considers partitions 0,1,2,3,4,5. Partitions 2, 3 and 5 don't exist in S3, so the restore keeps failing for them, and partitions 6, 7 and 8 are never downloaded.
def s3_download(bucket,topic,tmp_dir,retry_download_seconds=60):
    s3_client = boto3.client('s3')
    while True:
        _pc = Download.s3_count_partitions(s3_client,bucket,topic)
        # create temp. topic directory
        for p in range(_pc):
            os.makedirs(os.path.join(tmp_dir,topic,str(p)),exist_ok=True)
        for _pt in range(_pc):
            _ck = checkpoint.read_checkpoint_partition(tmp_dir,topic,str(_pt))
            _partition_path = os.path.join(topic,str(_pt))
            _s3_partition_files = Download.s3_list_files(s3_client,bucket,_partition_path)

Error
davinderpal@DESKTOP-07TAJVL:~/projects/apache-kafka-backup-and-restore$ python3 restore.py example-jsons/restore-s3.json
{ "@timestamp": "2022-10-17 23:30:34,750","level": "INFO","thread": "Kafka Restore Thread","name": "root","message": "retry for more files in /tmp/davinder.test after 100" }
{ "@timestamp": "2022-10-17 23:30:34,853","level": "INFO","thread": "MainThread","name": "root","message": "Test messeage" }
{ "@timestamp": "2022-10-17 23:30:34,861","level": "INFO","thread": "MainThread","name": "botocore.credentials","message": "Found credentials in environment variables." }
{ "@timestamp": "2022-10-17 23:30:34,909","level": "WARNING","thread": "MainThread","name": "root","message": "[Errno 2] No such file or directory: '/tmp/davinder.test/0/checkpoint'" }
{ "@timestamp": "2022-10-17 23:30:34,931","level": "WARNING","thread": "MainThread","name": "root","message": "[Errno 2] No such file or directory: '/tmp/davinder.test/2/checkpoint'" }
{ "@timestamp": "2022-10-17 23:30:34,938","level": "WARNING","thread": "MainThread","name": "root","message": "[Errno 2] No such file or directory: '/tmp/davinder.test/3/checkpoint'" }
{ "@timestamp": "2022-10-17 23:30:34,946","level": "WARNING","thread": "MainThread","name": "root","message": "[Errno 2] No such file or directory: '/tmp/davinder.test/4/checkpoint'" }
{ "@timestamp": "2022-10-17 23:30:34,985","level": "INFO","thread": "MainThread","name": "root","message": "download success for /tmp/davinder.test/4/20221017-230228.tar.gz and its sha256 file " }
{ "@timestamp": "2022-10-17 23:30:34,985","level": "WARNING","thread": "MainThread","name": "root","message": "[Errno 2] No such file or directory: '/tmp/davinder.test/5/checkpoint'" }
{ "@timestamp": "2022-10-17 23:30:34,993","level": "INFO","thread": "MainThread","name": "root","message": "retry for new file after 100s in s3://kafka-backup/davinder.test" }

Potential Solution:
Instead of returning the count of partitions, we can return the actual partition numbers, extracted with a regex or a split from the response of the list_objects_v2 call.
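
A minimal sketch of that idea, not a tested patch for this repository. It assumes backups are stored as `<topic>/<partition>/...` (as the `_partition_path` in the snippet above suggests) and reads the partition numbers from the `CommonPrefixes` entries that `list_objects_v2` returns when a delimiter is set. The function name `s3_list_partitions` and the regex are hypothetical.

```python
import re

# Hypothetical pattern: picks the trailing "<number>/" out of a prefix
# such as "davinder.test/4/".
_PARTITION_RE = re.compile(r"/(\d+)/$")


def s3_list_partitions(s3_client, bucket, topic):
    """Return the sorted partition numbers that actually exist under topic/."""
    partitions = set()
    # Paginate so that topics with many partitions are fully listed.
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=topic + "/", Delimiter="/")
    for page in pages:
        # With Delimiter="/", each partition "directory" is reported as a
        # CommonPrefixes entry; the key is absent when a page has none.
        for entry in page.get("CommonPrefixes", []):
            match = _PARTITION_RE.search(entry["Prefix"])
            if match:  # skip prefixes that are not numeric partitions
                partitions.add(int(match.group(1)))
    return sorted(partitions)
```

With something like this, the loops in s3_download could iterate over `for _pt in s3_list_partitions(s3_client, bucket, topic)` instead of `range(_pc)`, so the edge case above would yield 0,1,4,6,7,8 rather than 0 to 5.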