Duplicate entries for security_groups_spaces are possible in CCDB #4602

@klapkov

Description

In the Cloud Controller database (CCDB), two tables store the mappings of running and staging security groups to spaces. Their schemas are:

                             Table "public.security_groups_spaces"
          Column           |  Type   | Collation | Nullable |             Default
---------------------------+---------+-----------+----------+----------------------------------
 security_group_id         | integer |           | not null |
 space_id                  | integer |           | not null |
 security_groups_spaces_pk | integer |           | not null | generated by default as identity
Indexes:
    "security_groups_spaces_pkey" PRIMARY KEY, btree (security_groups_spaces_pk)
    "security_groups_spaces_space_id_index" btree (space_id)
    "sgs_spaces_ids" btree (security_group_id, space_id)
Foreign-key constraints:
    "fk_security_group_id" FOREIGN KEY (security_group_id) REFERENCES security_groups(id)
    "fk_space_id" FOREIGN KEY (space_id) REFERENCES spaces(id)
                                                       Table "public.staging_security_groups_spaces"
              Column               |  Type   | Collation | Nullable |             Default              | Storage | Compression | Stats target | Description
-----------------------------------+---------+-----------+----------+----------------------------------+---------+-------------+--------------+-------------
 staging_security_group_id         | integer |           | not null |                                  | plain   |             |              |
 staging_space_id                  | integer |           | not null |                                  | plain   |             |              |
 staging_security_groups_spaces_pk | integer |           | not null | generated by default as identity | plain   |             |              |
Indexes:
    "staging_security_groups_spaces_pkey" PRIMARY KEY, btree (staging_security_groups_spaces_pk)
    "staging_security_groups_spaces_ids" UNIQUE, btree (staging_security_group_id, staging_space_id)
Foreign-key constraints:
    "fk_staging_security_group_id" FOREIGN KEY (staging_security_group_id) REFERENCES security_groups(id)
    "fk_staging_space_id" FOREIGN KEY (staging_space_id) REFERENCES spaces(id)
Access method: heap

While the staging_security_groups_spaces table has a unique index on its pair of IDs:

"staging_security_groups_spaces_ids" UNIQUE, btree (staging_security_group_id, staging_space_id)

the security_groups_spaces table does not. Its "sgs_spaces_ids" index covers the same pair (security_group_id, space_id) but is not UNIQUE, so the table accepts multiple rows with the same security_group_id and space_id.
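To check whether a deployment is affected, duplicate pairs can be listed with a GROUP BY query. The sketch below is a minimal, self-contained reproduction that assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL and leaves out the foreign keys; the sample rows are hypothetical. The SELECT itself is plain SQL, so it should also run against the real CCDB.

```python
import sqlite3

# SQLite stand-in for CCDB's security_groups_spaces table. As in the real
# schema, (security_group_id, space_id) has only a NON-unique index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_groups_spaces (
        security_groups_spaces_pk INTEGER PRIMARY KEY,
        security_group_id INTEGER NOT NULL,
        space_id INTEGER NOT NULL
    );
    CREATE INDEX sgs_spaces_ids
        ON security_groups_spaces (security_group_id, space_id);
""")

# Hypothetical sample data: the (1, 10) binding is stored twice.
conn.executemany(
    "INSERT INTO security_groups_spaces (security_group_id, space_id) VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10)],
)

# List every (security_group_id, space_id) pair that occurs more than once.
DUPLICATES_QUERY = """
    SELECT security_group_id, space_id, COUNT(*) AS copies
    FROM security_groups_spaces
    GROUP BY security_group_id, space_id
    HAVING COUNT(*) > 1
"""
duplicates = conn.execute(DUPLICATES_QUERY).fetchall()
print(duplicates)  # [(1, 10, 2)]
```

Every row returned by this query is a pair that a table keyed on (space_guid, security_group_guid) would refuse to store twice.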

This did not cause problems until CF-Networking-Release v3.79.0. That release brings several optimizations to the vxlan-policy-agent and to security groups in general, but the change relevant to this issue is this PR. It migrates the running_spaces and staging_spaces data from the security_groups table in the networkpolicyDB into two separate tables:

                                           Table "public.running_security_groups_spaces"
       Column        |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 space_guid          | character varying(36) |           | not null |         | extended |             |              |
 security_group_guid | character varying(36) |           | not null |         | extended |             |              |
Indexes:
    "running_sg_spaces_pk" PRIMARY KEY, btree (space_guid, security_group_guid)
Foreign-key constraints:
    "running_sg_spaces_fk" FOREIGN KEY (security_group_guid) REFERENCES security_groups(guid) ON DELETE CASCADE
Access method: heap
                                           Table "public.staging_security_groups_spaces"
       Column        |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 space_guid          | character varying(36) |           | not null |         | extended |             |              |
 security_group_guid | character varying(36) |           | not null |         | extended |             |              |
Indexes:
    "staging_sg_spaces_pk" PRIMARY KEY, btree (space_guid, security_group_guid)
Foreign-key constraints:
    "staging_sg_spaces_fk" FOREIGN KEY (security_group_guid) REFERENCES security_groups(guid) ON DELETE CASCADE
Access method: heap

Before this change, any duplicates in the CCDB were synced to the networkpolicyDB without problems. The new tables, however, use (space_guid, security_group_guid) as their primary key and therefore reject duplicates. As a result, the asg-syncer crashes with:

{"level":"error","source":"cfnetworking.policy-server-asg-syncer","message":"cfnetworking.policy-server-asg-syncer.exited-with-failure","data":{"error":"Exit trace for group:\nasg-syncer exited with error: replacing running space associations: creating new associations: executing batched statement: pq: duplicate key value violates unique constraint \"running_sg_spaces_pk\"\nasg-lock exited with nil\n"}}
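The failure can be reproduced in isolation. This sketch again uses SQLite as a stand-in for the networkpolicyDB and mirrors only the primary key of running_security_groups_spaces; the GUIDs are made up, and the batched insert is an illustration, not the asg-syncer's actual code. SQLite reports the violation as sqlite3.IntegrityError instead of the pq error shown above.

```python
import sqlite3

# SQLite stand-in for the networkpolicyDB running_security_groups_spaces
# table, with the same composite primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE running_security_groups_spaces (
        space_guid VARCHAR(36) NOT NULL,
        security_group_guid VARCHAR(36) NOT NULL,
        CONSTRAINT running_sg_spaces_pk PRIMARY KEY (space_guid, security_group_guid)
    )
""")

# Hypothetical batch read from a CCDB that contains one duplicate binding.
batch = [("space-a", "sg-1"), ("space-a", "sg-1"), ("space-b", "sg-1")]

error = None
try:
    with conn:  # commit on success, roll back the whole batch on any error
        conn.executemany(
            "INSERT INTO running_security_groups_spaces (space_guid, security_group_guid) "
            "VALUES (?, ?)",
            batch,
        )
except sqlite3.IntegrityError as exc:
    error = str(exc)

rows = conn.execute("SELECT COUNT(*) FROM running_security_groups_spaces").fetchone()[0]
print(error is not None)  # True
print(rows)               # 0: nothing from the batch was stored
```

A single duplicate pair is enough to make the entire batch fail, which matches the behavior described above: no further updates reach the table until the duplicate is gone.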

Once this happens, the networkpolicyDB receives no further security group updates, because the crashing syncer cannot update the rules. The fix is to delete the duplicate entry from the CCDB, after which the asg-syncer can resume and update the rules in the networkpolicyDB.
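One way to remove the duplicates is to keep the row with the lowest security_groups_spaces_pk for each pair and delete the others. The sketch below is an assumption about how such a cleanup could look, shown against SQLite with hypothetical data. The DELETE statement is plain SQL and should also work on PostgreSQL, but it is worth reviewing the affected rows and taking a backup before running anything like it against a production CCDB.

```python
import sqlite3

# SQLite stand-in for security_groups_spaces with hypothetical duplicates:
# (1, 10) is stored three times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_groups_spaces (
        security_groups_spaces_pk INTEGER PRIMARY KEY,
        security_group_id INTEGER NOT NULL,
        space_id INTEGER NOT NULL
    );
    INSERT INTO security_groups_spaces (security_group_id, space_id)
    VALUES (1, 10), (1, 10), (1, 10), (2, 10);
""")

# Keep the oldest row (lowest primary key) of each pair, delete the rest.
DEDUP_STATEMENT = """
    DELETE FROM security_groups_spaces
    WHERE security_groups_spaces_pk NOT IN (
        SELECT MIN(security_groups_spaces_pk)
        FROM security_groups_spaces
        GROUP BY security_group_id, space_id
    )
"""
with conn:  # run the cleanup in a single transaction
    deleted = conn.execute(DEDUP_STATEMENT).rowcount

remaining = conn.execute(
    "SELECT security_group_id, space_id FROM security_groups_spaces "
    "ORDER BY security_groups_spaces_pk"
).fetchall()
print(deleted)    # 2
print(remaining)  # [(1, 10), (2, 10)]
```

Because every pair keeps exactly one row, the set of security group bindings is unchanged; only the redundant copies disappear.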

We therefore propose adding the same UNIQUE constraint to the security_groups_spaces table in the CCDB, so that such duplicates cannot be created in the first place.
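A minimal sketch of the proposal, again using SQLite as a stand-in: a unique index on (security_group_id, space_id), analogous to staging_security_groups_spaces_ids. The index name security_groups_spaces_ids and the bind helper are hypothetical. In the CCDB this would presumably be delivered as a Cloud Controller database migration, and creating the index fails while duplicates still exist, so those have to be removed first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_groups_spaces (
        security_groups_spaces_pk INTEGER PRIMARY KEY,
        security_group_id INTEGER NOT NULL,
        space_id INTEGER NOT NULL
    );
    INSERT INTO security_groups_spaces (security_group_id, space_id) VALUES (1, 10);

    -- Proposed constraint (index name is hypothetical). This statement
    -- fails if duplicate pairs are already present in the table.
    CREATE UNIQUE INDEX security_groups_spaces_ids
        ON security_groups_spaces (security_group_id, space_id);
""")


def bind(security_group_id: int, space_id: int) -> bool:
    """Hypothetical helper: bind a security group to a space.

    Returns False when the unique index rejects a duplicate pair.
    """
    try:
        with conn:
            conn.execute(
                "INSERT INTO security_groups_spaces (security_group_id, space_id) "
                "VALUES (?, ?)",
                (security_group_id, space_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(bind(2, 10))  # True: new pair
print(bind(1, 10))  # False: duplicate pair rejected
```

With such an index in place, the existing non-unique sgs_spaces_ids index on the same two columns would probably become redundant, although dropping it is a separate decision.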
