
File crawling performance #2

@ehmicky

Description


Crawling the publish directory might be slow for some big sites. There might be a few opportunities to optimize it:

  • Each readdir already performs a stat syscall, so doing it again in
    const { mtime } = await stat(file);
    might be redundant
  • If no exclude input is specified, there is no need to perform a test() on the filename. Even though the default regular expression a^ should be fast and never match, it might become more expensive when performed thousands of times.
  • Directories that are part of exclude might not need to be crawled at all (a rough sketch combining these points follows this list)
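
For illustration, here is a minimal sketch of those ideas combined, assuming the crawler only needs paths and entry types, and that exclude is passed as an array of regular expressions (the actual plugin code may differ):

    // Hedged sketch, not the plugin's actual implementation
    const { readdir } = require('fs/promises')
    const { join } = require('path')

    const crawl = async function (dir, exclude = [], files = []) {
      // `withFileTypes` returns Dirent objects, so the type of each entry is
      // known without an extra stat() call per entry
      const entries = await readdir(dir, { withFileTypes: true })

      for (const entry of entries) {
        const path = join(dir, entry.name)

        // Only run the regular expressions when an exclude input was given
        if (exclude.length !== 0 && exclude.some((regExp) => regExp.test(path))) {
          continue
        }

        if (entry.isDirectory()) {
          // Excluded directories were skipped above, so their contents are
          // never crawled
          await crawl(path, exclude, files)
        } else if (entry.isFile()) {
          files.push(path)
        }
      }

      return files
    }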

There might also be some potential bugs in the directory crawling. For example, if a file were a symlink to one of its parent directories, would the crawling keep running until memory is exhausted?
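
One possible guard, sketched below, is to track the real path of every directory already visited; the crawlSafe name, the seen set, and the recursion shape are hypothetical, not the plugin's current code:

    const { readdir, realpath, stat } = require('fs/promises')
    const { join } = require('path')

    const crawlSafe = async function (dir, seen = new Set(), files = []) {
      // Resolve symlinks first: a link pointing back at a parent directory
      // resolves to a real path that was already visited, which stops the
      // recursion instead of looping until memory is exhausted
      const realDir = await realpath(dir)
      if (seen.has(realDir)) {
        return files
      }
      seen.add(realDir)

      const entries = await readdir(dir, { withFileTypes: true })
      for (const entry of entries) {
        const path = join(dir, entry.name)
        const stats = await stat(path) // stat() follows symlinks

        if (stats.isDirectory()) {
          await crawlSafe(path, seen, files)
        } else if (stats.isFile()) {
          files.push(path)
        }
      }
      return files
    }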

I am wondering whether using a tried-and-tested library like readdirp might fix all of this and also simplify the code. What are your thoughts?
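
For reference, a rough sketch of what the crawl could look like with readdirp v3's documented options (directoryFilter, alwaysStat); the excludeGlobs name and the glob patterns are assumptions, not the plugin's existing inputs:

    const readdirp = require('readdirp')

    // `excludeGlobs` is a hypothetical array of negated glob patterns,
    // e.g. ['!node_modules', '!.git'], standing in for the exclude input
    const crawl = async function (dir, excludeGlobs) {
      // `alwaysStat: true` attaches a `stats` object to each entry, so `mtime`
      // is available without a separate stat() call in the crawler itself
      const options = { alwaysStat: true }

      // Only prune directories when an exclude input was actually given
      if (excludeGlobs !== undefined && excludeGlobs.length !== 0) {
        options.directoryFilter = excludeGlobs
      }

      const files = []
      for await (const entry of readdirp(dir, options)) {
        files.push({ path: entry.fullPath, mtime: entry.stats.mtime })
      }
      return files
    }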
