XEN-3141: Support Elastic #73

AFaust · 2025-11-05T21:09:27Z

This PR provides changes to support ElasticSearch with regards to index health check and fix functionalities. This includes handling to determine whether SOLR or ElasticSearch plugins are actually relevant based on checks on the Search subsystem activation state (checking the sourceBeanName property dynamically).

The ElasticSearch support includes the following capabilities:

checking existence of node index entries
- filtering based on limitations for event generation (stores which do not support behaviours at all - which are necessary for events to work - and node types filtered out by ACS event filtering)
- doing checks for all nodes, including those deleted and not expected in the index (filtered by SOLR plugin setup), since deletion events may have been lost/missed
- checking internal ALIVE state which seems to only be partially used by ACS (i.e. Admin Console only counts ALIVE index entries when reporting indexed nodes)
- checking PATH field if path indexing is expected (issues during indexing based on timing / missed events may cause index entries not to have a PATH field)
purging node index entries that should not exist via an explicit direct delete request to ElasticSearch
triggering a re-index operation via the live indexer app by issueing a "fake" node creation event
- if any out-of-process extension is relying on events, such events may interfere with that extension unless the extension checks and takes into account the actual creation date of the node or does secondary check that it has already processed a previous creation event for the same node
- if issueing "fake" node creation events interferes with out-of-process extensions, it may be possible to set up an ACS instance running health processor which processes events via a separate ActiveMQ instance than is used for "real" events
- additional effort to reverse-engineer the internal logic + events used between the live indexer mediator and other constitutent index components of the indexer app may in the future enhance the functionality of this PR to directly issue indexing events instead of using "fake" node creation events

The following issues have been encountered and identified with the ElasticSearch-based search subsystem in general:

it is not supported to perform any kind of indexing before having an ACS instance with ElasticSearch subsystem enabled perform the initial index setup including model mapping
- indexer apps are technically able to perform index updates, but when ACS is later switch to use the prepared ElasticSearch instance, conflicts in the model mappings will be reported that indicate search behaviour will not be as expected
- this is the reason the integration test setup in this PR for the Elastic plugins need to explicitly delete the index + mappings created during Docker bootstrap to ensure a cleanly initialised empty index
when activating the ElasticSearch search subsystem in ACS takes about 10-20 seconds to perform asynchronous index initialisation / mapping updates
- operations relying on index-based searches (e.g. integration tests) may need to include some additional delay / wait time before they can reliably perform any kind of lookups
- there is no way to check / identify whether the initialisation has completed apart from checking the log file
PATH indexing is extremely brittle, can cause a lot of error messages in the indexer app log, introduce significant delays, and cause nodes not to be found when the ACS Admin Console actually shows all nodes to be indexed
- path information is not part of node events, only parent/ancestor node IDs are included (primaryHierarchy and secondaryParents)
- the path indexing sub-component of the indexer app uses parent node IDs to lookup their index entry to lookup PATH values from there, and use these as prefixes to build the PATH values for the node to be indexed
- if an index entry for any parent cannot be found or does not include a PATH field, an error is thrown and event processing is retries with a bit of a processing delay
- path indexing is processed via a single consumer, so any error + retry with one node holds up indexing other nodes as well
- when bulk index operations are necessary, the order of events may significantly influence the stability of path indexing, and maybe even its correctness
  - ACS sends node events for a transaction in the order the first updates are applied on nodes, so if node A and node B are initially not in a hierarchy, but then B is moved below A, and then A is renamed (all in the same txn), the event for node B is sent before the event for node A ==> it is unclear, but potentially possible, that this may result in an incorrect PATH for node B because the rename of node A may not be reflected in its PATH index values yet when they are used as prefixes to build the new PATH values of node B
- the default ACS outbound event filtering based on node types blocks events from being sent for some types of nodes that may be relevant for the PATH of others
  - the default filter includes the wildcard sys:* type filter - this prevents indexing of any sys:container or sys:store_root nodes, which would prevent the Elastic node index fixer plugin of this PR from fixing PATH issues for any nodes immediately below such nodes and transitively for potentially more (the sys:store_root might actually block ANY PATH indexing unless the official re-indexer app was used to ensure the sys:store_root has been indexed)
  - it is highly advisable to adapt the event filter for node types on any ACS instances to avoid missing PATH index values and errors in the indexer app
  - the official re-indexer app indexes more nodes than would be indexed by only reacting to node events, and with the minimal suggested change to the event node type filter, the Elastic index fixer plugin will cause more nodes to be indexed than are estimated to be indexable in the ACS Admin Console Search Services tool
  - the suggested type filter for events is: sys:deleted,sys:descriptor,sys:reference,sys:lost_found,sys:archiveUser, rule:*, act:*, fm:*, cm:thumbnail, cm:failedThumbnail, cm:rating, rma:rmsite include_subtypes, usr:user
  - the default type filter for events is: sys:*, fm:*, cm:thumbnail, cm:failedThumbnail, cm:rating, rma:rmsite include_subtypes, usr:user
  - the suggested filter adds exclusions for nodes related to folder rules and persisted actions, which will always cause errors during PATH indexing because the node grouping all rules for a folder is of type cm:systemFolder, which is hard-coded to be filtered from event sending (as are all sub-types of the system folder type) - it seems questionable any out-of-process extension would ever be interested to react to any events on rules/actions
  - the suggested filter expands the sys:* wildcard to filter out any types of the system model except for the sys:container and sys:store_root - the former is relevant for PATH indexing of person or group authority nodes, the latter is relevant if the Elastic index fixer plugin of this PR is used to trigger an index build from a completely empty ElasticSearch index
- NOTE: There seem to be functionally breaking changes between ACS 25.1 and 25.2 - in ACS 25.2 the node type event filter seems to be effectively ignored, which is due to the "Sync Service" overriding the default EnterpriseEventGenerator implementation
  - the "Sync Service" implementation is supposed to be virtually identical with additions only where necessary for its functionality, but apparently cause the type of a node to not be loaded yet when the event filter is applied, thus bypassing this particular filter
  - this proved to be a signficant source of confusion when testing this Health Processor PR on both versions of ACS (25.1 in integration tests, 25.2 in a Docker Compose setup external to the project source structure), as the breaking change actually meant ACS 25.2 was successfully running the index fixer plugins even without adapting the node type event filter configuration

… logic - other packages also include plugins, so 'plugins' was too ambiguous a name

…co version configurability - integration test AMP failed to install due to description in module.properties

… (2nd part)

…refactorings

AFaust · 2025-11-05T21:12:07Z

@pvriel Would it be possible to add build secrets to the project so that the integration tests using Enterprise ACS 25.1 may run successfully? Unfortunately, since ACS Community does not support ElasticSearch, it was neccessary to add a sub-project for Enterprise to test this particular PR's feature.

pvriel · 2025-11-18T10:53:11Z

...essor-api/src/main/java/eu/xenit/alfresco/healthprocessor/reporter/api/NodeHealthReport.java

 @AllArgsConstructor(access = AccessLevel.PRIVATE)
 public class NodeHealthReport implements Serializable {

+    private static final long serialVersionUID = -939101152283421058L;


While I don't see a problem with doing this at the moment, I don't understand why we are adding this now?

Not blocking for this PR, though.

This was just fixing a compiler warning. Any serialisable class should have a serial version UID to ensure Java can detect when attempting to de-serialise an object from binary with an incompatible state. I believe when not explicitly set, JVM may dynamically generate a UID. It could be left out, much like other compiler warnings I tried to address so I wasn't cluttered with warnings to a degree I was missing any I might introduce.

pvriel · 2025-11-18T14:25:34Z

...a/eu/xenit/alfresco/healthprocessor/checker/ElasticIndexValidationHealthProcessorPlugin.java

+            };
+            try
+            {
+                ElasticResult searchResult = elasticRequestExecutor.checkNodeIndexed(searchEndpoint, expectedNodeRefStatuses);


Is checkNodeIndexed an expensive operation? I'm wondering whether it makes sense to use a parallel stream (or thread executor) here instead of just using a for-loop.

checkNodeIndexed is quite inexpensive. It is a multi-retrieval-by-ID kind of operation, and the post-processing (matching nodes into state buckets) is done based on node state already loaded into txn cache before checkNodeIndexed is even called (during the endpoint selection).

pvriel · 2025-11-18T14:31:09Z

...a/eu/xenit/alfresco/healthprocessor/checker/ElasticIndexValidationHealthProcessorPlugin.java

+                    entry.getKey().getNodeRef(), messages);
+            healthReport.data(NodeIndexHealthReport.class).addAll(entry.getValue());
+            return healthReport;
+        }).filter(Objects::nonNull).forEach(healthReports::add);


I'm missing something here: how can this stream produce null values?

It no longer can. At some point in the development it might have, and I just forgot to remove the now redundant nonNull.

pvriel · 2025-11-18T14:34:41Z

...java/eu/xenit/alfresco/healthprocessor/checker/SolrIndexValidationHealthProcessorPlugin.java

+    @Override
+    protected Set<NodeHealthReport> doProcess(Set<NodeRef> nodeRefs)
+    {
+        Set<NodeHealthReport> healthReports = new HashSet<>(nodeRefs.size());


The process of finding a search endpoint seems to be similar to the one from ElasticIndexValidationHealthProcessorPlugin. Do you think it's worth exploring some abstraction possibilities here?

An abstraction for the for-loop would be possible and trivial. It felt wrong to move it into the abstract class I had already introduced for the subsystem-specific enablement logic, and I did not feel like it was worth to introduce yet another intermediate class at the time.

pvriel · 2025-11-18T15:32:45Z

...enit/alfresco/healthprocessor/endpoint/elastic/ElasticEndpointSelectorBeanPostProcessor.java

+ */
+@Slf4j
+@AllArgsConstructor
+public class ElasticEndpointSelectorBeanPostProcessor implements BeanDefinitionRegistryPostProcessor


While this is not that difficult to understand, I'm wondering if we can have a more clean solution here instead of dynamically adjusting constructor arguments.
Would it be possible to:

use the endpoint properties as a @ConfigurationProperties bean;

use it as a @ConstructorBinding for the aggregate selector.

Unless I'm misunderstanding this code.

Personally, I would not use constructor arguments at all. I was mostly trying to follow the existing Lombok-heavy patterns in the health processor module. @ConfigurationProperties or @ConstructorBinding are Spring Boot annotations that should not be available in ACS. My trust in Spring annotations beyond trivial @Value is very low, especially in non-trivial systems like ACS. My personal idea of a cleaner solution would be to turn most (or everything) into regular Spring XML property instead of constructor arguments.

pvriel · 2025-11-18T15:34:43Z

...a/eu/xenit/alfresco/healthprocessor/endpoint/solr/SolrEndpointSelectorBeanPostProcessor.java

            return;
        }
+
+        BeanDefinition aggregateSelector = registry.getBeanDefinition("eu.xenit.alfresco.healthprocessor.endpoint.solr." + AggregateSearchEndpointSelector.class.getSimpleName());


Cf. ElasticEndpointSelectorBeanPostProcessor

pvriel · 2025-11-18T15:41:56Z

...rm/src/main/java/eu/xenit/alfresco/healthprocessor/executors/ElasticRequestExecutorImpl.java

+        {
+            if (!first)
+            {
+                sb.append(',');


Just to double-check: why only the first one? Or are you only expecting two Statuses?

Other way around: For all BUT the first one we are adding a comma, so that the JSON is structurally valid. This needs to support an arbitrary number of entries depending on the configured batch size in the processor / indexing strategy.

sonarqubecloud · 2025-11-19T12:49:02Z

Quality Gate failed

Failed conditions
38.4% Coverage on New Code (required ≥ 80%)
D Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

AFaust added 8 commits October 15, 2025 10:13

XEN-3141: Rename plugins package to denote it contains actual 'check'…

66c28ec

… logic - other packages also include plugins, so 'plugins' was too ambiguous a name

XEN-3141: Update versions, fix local integration test, prepare Alfres…

6eddff3

…co version configurability - integration test AMP failed to install due to description in module.properties

XEN-3141: Fix minor compiler warnings

5e0b57a

XEN-3141: Restructure + abstract/generalise code; prepare for Elastic

76da460

XEN-3141: Restructure + abstract/generalise code; prepare for Elastic…

78f5422

… (2nd part)

XEN-3141: Fix minor compiler warnings

8c16fce

XEN-3141: Add Elastic support and finally adapt all configuration to …

904c1ce

…refactorings

XEN-3141: Adapt integration test

b4d1669

AFaust requested a review from pvriel November 5, 2025 21:10

XEN-3141: Add some trace logging for other index states

5b92a4e

pvriel reviewed Nov 18, 2025

View reviewed changes

[XEN-3141] fix the missing secrets for CI/CD

7f88cef

pvriel force-pushed the XEN-3141-Support-Elastic branch from 2bbd8b8 to 7f88cef Compare November 19, 2025 10:01

[XEN-3141] fix broken Sonarcloud setup

162fcdc

pvriel force-pushed the XEN-3141-Support-Elastic branch from 98b9131 to 162fcdc Compare November 19, 2025 10:49

pvriel added 5 commits November 19, 2025 12:04

[XEN-3141] improve readability

c2a15a1

[XEN-3141] fix CI/CD

51bf183

update workflow

c6e00f5

Update ci.yml

894a86f

Update ci.yml

0e43f07

XEN-3141: Support Elastic #73

Are you sure you want to change the base?

XEN-3141: Support Elastic #73

Uh oh!

Conversation

AFaust commented Nov 5, 2025

Uh oh!

AFaust commented Nov 5, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Nov 19, 2025

Quality Gate failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants