Skip to content

TESTING only#8596

Draft
justin-stephenson wants to merge 83 commits intoSSSD:masterfrom
justin-stephenson:failover_justin_port_ldap_and_ad
Draft

TESTING only#8596
justin-stephenson wants to merge 83 commits intoSSSD:masterfrom
justin-stephenson:failover_justin_port_ldap_and_ad

Conversation

@justin-stephenson
Copy link
Copy Markdown
Contributor

No description provided.

justin-stephenson and others added 10 commits April 13, 2026 13:54
…pec file

Add the sssd-minimal provider package to the spec file following the
same pattern as other providers (ldap, ipa, ad, etc.). This packages
the libsss_minimal.so library that was added in recent commits.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
And also disable codeql for the minimal provider. The
provider is for testing only, it does not make sense to
fix any issue there.
This crafts and implements the new failover interface,
it does not provide complete implementation of the failover
mechanism yet. It brings the code to a state were the public
and private interfaces are stable, working and testable so
the following tasks can be split and work on in parallel.

What is missing at this state:
- server configuration and discovery
  (failover_server_group/batch/vtable_op)
- server selection mechanism (sss_failover_vtable_op_server_next)
- kerberos authentication
- sharing servers between IPA/AD LDAP and KDC
- online/offline callbacks (resolve callback should not be needed)

But especially it is possible to start refactoring SSSD code to start
using the new failover implementation.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new failover mechanism for SSSD, replacing the previous connection management and retry logic. It includes the addition of a new failover framework, updates to the AD and LDAP providers to utilize this framework, and significant refactoring of the data provider interface to simplify error handling. A critical bug was identified in the AD provider initialization where the Global Catalog failover context was not being correctly validated.

Comment thread src/providers/ad/ad_init.c Outdated
init_ctx->gc_fctx = sssm_ad_init_failover(init_ctx, be_ctx,
init_ctx->id_ctx->sdap_id_ctx->opts,
"AD_GC", 3268);
if (init_ctx->fctx == NULL) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The condition if (init_ctx->fctx == NULL) is checked twice for the same variable init_ctx->fctx instead of checking init_ctx->gc_fctx for the second initialization. This is a critical bug as it prevents proper error handling for the Global Catalog failover context initialization.

Comment thread src/providers/ad/ad_id.c Fixed
Comment thread src/providers/ad/ad_id.c Fixed
Comment thread src/providers/ad/ad_subdomains.c Fixed
Comment thread src/providers/ldap/sdap_async_initgroups.c Fixed
@justin-stephenson justin-stephenson force-pushed the failover_justin_port_ldap_and_ad branch 8 times, most recently from 1bdc462 to d5a524a Compare April 20, 2026 14:51
@justin-stephenson justin-stephenson force-pushed the failover_justin_port_ldap_and_ad branch 4 times, most recently from 969e9e4 to ccef551 Compare April 24, 2026 14:54
Comment thread src/providers/ad/ad_id.c Fixed
Comment thread src/providers/ad/ad_id.c Fixed
@justin-stephenson justin-stephenson force-pushed the failover_justin_port_ldap_and_ad branch 5 times, most recently from ff6f8cd to a5eaab0 Compare April 30, 2026 13:14
pbrezina and others added 14 commits May 4, 2026 09:53
Reviewed-by: Alexey Tikhonov <atikhono@redhat.com>
Add @pytest.mark.flaky(reruns=3)
using the existing flaky marker

Signed-off-by: Madhuri Upadhye <mupadhye@redhat.com>
Reviewed-by: Alexey Tikhonov <atikhono@redhat.com>
Reviewed-by: Jakub Vávra <jvavra@redhat.com>
Ported following test case:

- kpasswd: BZ 847039: login works when krb5_kpasswd is
  unresolvable (kpasswd not needed for auth).
- high UID: BZ 798655: auth and logs stay clean with a
  setuid(-1) helper process running.
- password change: GH 677: SSH passwd with
  chpass_provider=krb5 logs initial auth in krb5_child.log.

Signed-off-by: Madhuri Upadhye <mupadhye@redhat.com>
Reviewed-by: Alejandro López <allopez@redhat.com>
Reviewed-by: Scott Poore <spoore@redhat.com>
test_kcm__tgt_renewal_updates_ticket_as_configured
no longer uses a fixed sleep(5) and flaky retries.
The test now polls klist every 0.5s and asserts renewal
by checking that TGT start or end time advances,
with a detailed failure message that prints both
initial and last timestamps.

To match KCM renewal behavior (renewal only starts
after roughly half the ticket lifetime),
the test uses a short renewable ticket (-r 5s -l 5s) and
a bounded polling window (9s) so it stays fast while still
waiting long enough for renewal to be attempted.
Also removed the temporary CI comment and the flaky marker from this test.

Assited by: Cursor(Claude Opus 4.6)

Signed-off-by: Madhuri Upadhye <mupadhye@redhat.com>
Reviewed-by: Alexey Tikhonov <atikhono@redhat.com>
Reviewed-by: Scott Poore <spoore@redhat.com>
Reviewed-by: Scott Poore <spoore@redhat.com>
Reviewed-by: Sumit Bose <sbose@redhat.com>
This crafts and implements the new failover interface,
it does not provide complete implementation of the failover
mechanism yet. It brings the code to a state were the public
and private interfaces are stable, working and testable so
the following tasks can be split and work on in parallel.

What is missing at this state:
- server configuration and discovery
  (failover_server_group/batch/vtable_op)
- server selection mechanism (sss_failover_vtable_op_server_next)
- kerberos authentication
- sharing servers between IPA/AD LDAP and KDC
- online/offline callbacks (resolve callback should not be needed)

But especially it is possible to start refactoring SSSD code to start
using the new failover implementation.
Assisted by: Claude code (Sonnet 4.6)
In low level ldap search functions
The data provider handler methods now return a single output
argument. Remove 'dp_error/dp_err' and 'error_message' usage
across provider code.

The getAccountDomain method still needs to return 'domain_name' string.

Assisted by: Claude code (Sonnet 4.6)
This will be done differently inside the failover code
@justin-stephenson justin-stephenson force-pushed the failover_justin_port_ldap_and_ad branch from 4a1f8b6 to f974135 Compare May 4, 2026 13:58
To allow system tests to run in upstream PRCI
** Temporary commit as each provider is ported to new failover **

These provider tests will often fail because they have not yet been ported
to the new failover code, and they call into ldap functions which
are utilizing the new failover. For example crash in :

 #0  0x00007fdb8d72cb41 in sss_failover_transaction_ex_send () from /usr/lib64/sssd/libsss_ldap_common.so
 #1  0x00007fdb8d72ccbd in sss_failover_transaction_send () from /usr/lib64/sssd/libsss_ldap_common.so
 #2  0x00007fdb8d6e367b in groups_by_user_send () from /usr/lib64/sssd/libsss_ldap_common.so
 #3  0x00007fdb8d6e6a88 in sdap_handle_acct_req_send () from /usr/lib64/sssd/libsss_ldap_common.so
 #4  0x00007fdb8d799076 in ipa_id_get_account_info_get_original_step () from /usr/lib64/sssd/libsss_ipa.so
 #5  0x00007fdb8d7a038b in ipa_id_get_account_info_send () from /usr/lib64/sssd/libsss_ipa.so
 #6  0x00007fdb8d7a2560 in ipa_account_info_send () from /usr/lib64/sssd/libsss_ipa.so
 #7  0x00007fdb8d7a2877 in ipa_account_info_handler_send () from /usr/lib64/sssd/libsss_ipa.so
* test_failover tests are expected to fail due to failover interface changes.

* test_logging 'offline' tests and test_autofs__propagate_offline_status_for_multiple_domains
fail because they are asserting offline related log messages which are not in the new
failover code.

* test_ad__user_authentication_when_provider_is_set_to_ldap_with_gss_spnego(samba) test fails because
failover service finds hard-coded LDAP servers when trying to lookup LDAP service, it should re-use
dc.samba.test

* test_multithreaded_pac_client - see https://redhat.atlassian.net/browse/IDM-6014
After switching to new failover code, processing offline gpos function
was no longer reached if backend is offline during AD access checks.
@justin-stephenson justin-stephenson force-pushed the failover_justin_port_ldap_and_ad branch from f974135 to cb6d470 Compare May 4, 2026 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.