AGTMETRICS-485 Fix logic races in the forwarder.#46860
Draft
AGTMETRICS-485 Fix logic races in the forwarder.#46860
Conversation
sendHTTPTransactions reads domainForwarders field, which is replaced with an empty map at the end of Stop. Atomic read of internalState is not sufficent to prevent that, Stop may occur between the check and the read from domainForwarders. Proceeding to send transaction then results in a nil panic.
f.m is now locked for the duration of sendHTTPTransactions and will prevent concurrent access to queueDurationCapacity field.
Forwarder state transitions are now fully synchronized by f.m, which makes the use of atomic for internalState field unnecessary.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
sendHTTPTransactions was insufficiently synchronized and could panic if transactions were submitted while the forwarder was beging stopped. It could access domainForwarders after it has been cleared, causing a nil-dereference panic, or send transaction to a domain forwarder whose channel was closed, causing a 'send on closed channel' panic.
This PR makes sendHTTPTransactions acquire the same mutex used in
StartandStop, preventing it from accessing stale data. This is not expected to increase contention, sincesendHTTPTransactionsalso acquired another exclusive mutex few lines later, which is now redundant and was removed. Internal state was downgraded from atomic to a regular field, since all accesses are now synchronized.Motivation
Fix a rare crash on shutdown in Otel agent.
Describe how you validated your changes
Unit and e2e tests.
Additional Notes