
Multiple batches via Census API #192

@Chris-Larkin


Hi! Just stumbled on your package and love it! Really really cool stuff. I've got a few thoughts on possible improvements (see below), and apologies in advance for not being able to write PRs for these myself. I'm afraid my R dev skills are nowhere near the required level...

Breaking up large address lists into smaller batches

The Census batch geocoding API currently has a limit of 10,000 addresses per batch. For larger address lists, this means the user has to split the data up and feed each segment of fewer than 10,000 addresses into the geocoder in some kind of loop or iterative function.

It would be great if tidygeocoder handled this on the fly. censusxy has a fix for this, see lines 177-182 of this script for their solution in a parallelised implementation, and line 201 for a non-parallelised implementation, which should be portable to tidygeocoder.

Also, my understanding from reading censusxy's documentation is that while the Census batch limit is 10,000, running smaller batches is actually quicker. So even if a user passes in 10,000 addresses, it would still be optimal to split them into ~10 batches of ~1,000 or fewer.
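The batching above could be sketched roughly like this (a minimal sketch only; `geocode_batch()` is a hypothetical stand-in for a single Census batch API call, not an existing tidygeocoder function):

```r
# Sketch: split a large address data frame into batches of <= 1,000 rows,
# geocode each, and recombine. `geocode_batch()` is a hypothetical stand-in
# for one Census batch API call.
batch_geocode <- function(addresses, batch_size = 1000) {
  batch_id <- ceiling(seq_len(nrow(addresses)) / batch_size)
  batches  <- split(addresses, batch_id)      # list of <= batch_size chunks
  results  <- lapply(batches, geocode_batch)  # one API call per chunk
  do.call(rbind, results)                     # reassemble the results
}
```

(For simplicity this splits on a numeric id; a real implementation would want to preserve input row order explicitly once there are 10+ batches.)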

Progress bar for multi-batch implementation

I see you have a progress bar for single-address geocoding. A progress bar is actually most useful when geocoding will take many hours, i.e. in a multi-batch implementation (as described above). I'm currently using censusxy, which sadly has no progress bar of any kind; if you were to implement multi-batch processing, an indicator of how many batches have been created, and how many have been geocoded, would be incredibly helpful for the user. At the moment, I've been running my programme for ~18 hours and have no idea whether that's 10% done or 99% done.
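Something as simple as a per-batch message would do; a minimal sketch (again assuming a hypothetical `geocode_batch()` single-batch call and a pre-split list of batches):

```r
# Sketch: report progress after each batch so long runs are not a black box.
# `batches` is a list of address data frames; `geocode_batch()` is a
# hypothetical single-batch call.
geocode_with_progress <- function(batches) {
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- geocode_batch(batches[[i]])
    message(sprintf("Geocoded batch %d of %d (%.0f%% done)",
                    i, length(batches), 100 * i / length(batches)))
  }
  do.call(rbind, results)
}
```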

Parallelized implementation

Consider letting users implement parallelized geocoding.
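For example, batches could be handed to base R's parallel package (a sketch only; `geocode_batch()` is a hypothetical single-batch call):

```r
# Sketch: geocode batches in parallel with base R's parallel package.
# `geocode_batch()` is a hypothetical single-batch call. mclapply() forks
# the R process, so on Windows it only works with mc.cores = 1.
library(parallel)
geocode_parallel <- function(batches, cores = 2L) {
  results <- mclapply(batches, geocode_batch, mc.cores = cores)
  do.call(rbind, results)
}
```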

Caching

I agree with other issues around caching. I would have this as an argument, e.g. cache = FALSE, with the output written to a local .csv file of geocoded addresses. With large batches (or a large number of batches, if you decide to implement multi-batch processing), a user who loses their internet connection after X hours of geocoding currently has to start from scratch.
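One way this could work is writing each batch to the cache file as it completes and skipping already-cached batches on re-run (a sketch under those assumptions; `geocode_batch()` and the `batch` bookkeeping column are hypothetical):

```r
# Sketch: append each geocoded batch to a local CSV as it completes, and
# skip batches already present when re-run, so an interrupted job resumes
# where it left off. `geocode_batch()` is a hypothetical single-batch call.
geocode_cached <- function(batches, cache_file = "geocoded_cache.csv") {
  done <- if (file.exists(cache_file)) {
    unique(read.csv(cache_file)$batch)  # batch ids already geocoded
  } else integer(0)
  for (i in setdiff(seq_along(batches), done)) {
    res <- geocode_batch(batches[[i]])
    res$batch <- i                      # record which batch each row came from
    write.table(res, cache_file, sep = ",", row.names = FALSE,
                append = file.exists(cache_file),
                col.names = !file.exists(cache_file))
  }
  read.csv(cache_file)                  # full set of cached results
}
```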

Let me know if you like any of these ideas or want more info on use cases etc. Thanks so much for writing, developing, and maintaining tidygeocoder! It fills a big gap.

Labels: enhancement (New feature or request)
