It seems that helper.unpackAnnotations(trainCats, ...) only returns annotations from images that contain all of the provided training categories. In the example of persons and cars, this method would only use annotations from images that contain both persons and cars. Thus, we might lose relevant training data from person-only or car-only images.
Digging into the CocoAPI, this behavior arises from `getImgIds` taking the intersection of the per-category image IDs in CocoApi.m:

```matlab
for i=1:length(t), ids=intersect(ids,t{i}); end
```

which is reached because line 11 of unpackAnnotations.m calls

```matlab
imgIds = coco.getImgIds('catIds',catIds);
```
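
If the goal is to keep single-category images as well, one possible workaround (a minimal, untested sketch, assuming `coco` and `catIds` are set up as in unpackAnnotations.m) would be to query each category separately and take the union of the returned image IDs before unpacking the annotations:

```matlab
% Sketch: build the union of image IDs across categories instead of
% the intersection that getImgIds computes when given several catIds.
imgIds = [];
for k = 1:numel(catIds)
    % Query one category at a time; with a single category ID,
    % getImgIds returns all images containing that category,
    % so no intersection is applied.
    imgIds = union(imgIds, coco.getImgIds('catIds', catIds(k)));
end
```

With a union like this, person-only and car-only images would be included alongside images containing both categories.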
I just stumbled over this behavior while adapting the Mask R-CNN example to my own training set. But since I'm not that familiar with the codebase, this may well be intended behavior.