pgwire: replace deadline and connection polling with separate goroutine #25585
Comments
I think this was done @andreimatei, right?
No, this was not. We still poll.
Is this one still relevant?
@Zyqsempai thanks for your interest! Yes, this issue is still available. The current landscape is that we create a wrapped connection (cockroach/pkg/sql/pgwire/conn.go, line 237 at 1c7351e) in order to have connections listen for context cancellation when the server is about to shut down (canceled in cockroach/pkg/sql/pgwire/server.go, line 332 at 1c7351e).
When `Read`s block, we periodically wake the connection up and have it check its context, which results in higher-than-necessary CPU usage. The proposal is to call `CloseRead` on this connection (as a TCP connection) in `drainImpl` so that the blocked `Read` unblocks when necessary. The reason a full `Close` cannot work is that the server must still send a message to the client explaining why the connection was closed (see `AdminShutdownErr`). Note that `CloseRead` only exists on the `TCPConn` implementation of `net.Conn`, so we'll have to cast `net.Conn`s to `TCPConn`s (they can also sometimes be `tls.Conn`s, in which case we'll have to get the underlying conn). I think doing this without introducing new error cases will probably be the hardest part.
@asubiotto Hi, thanks for the explanation. In cockroach/pkg/sql/pgwire/conn.go (line 1879 at 1c7351e) we have that `Read` function, which checks the exit conditions. Does it make sense to rename the type and change the behavior of that function?
@asubiotto Thanks for the pointers. I have a strong desire to start contributing to cockroach, so no worries ;)
…sors Previously, we were treating request/response processors as forwarder methods, and the original intention was to perform a connection migration inline with the forwarding. However, this proved to cause confusion and complications. The new connection migration design takes a different approach by performing the connection migration out-of-band with the forwarding processors. For this to work, we need to be able to suspend and resume those processors. This commit implements support for that. Additionally, we no longer wrap clientConn with a custom readTimeoutConn wrapper, as that seems to be problematic with idle connections. There's a similar discussion here: cockroachdb#25585. Due to that, context cancellations from the parent no longer close the connection if we are blocked on IO. We could theoretically spin up a new goroutine in the forwarder to check for this, and close the connections accordingly, but this case is rare, so we'll let the caller handle it. Release justification: sqlproxy-only change, and only used within CockroachCloud. Release note: None
Right now, the early exit condition is:
That means if we switch to using a separate goroutine, then we need to check …
This issue is pretty old, so I believe there are a few out-of-date comments in the initial thread.
I don't think this is the case anymore. The only time AdminShutdownError is sent is when draining starts. If the context is canceled, the current code will just give up on reading and start returning errors (cockroach/pkg/sql/pgwire/server.go, lines 865 to 870 at db5d44d).
Even so, I guess using CloseRead won't hurt.
I don't think it will be possible to do this in drainImpl, since that function doesn't have a reference to the connection. (Nor should it, IMO, since that would mean it needs to hold a reference to all the connections.) So instead, we should do this in a separate goroutine right before calling …
124373: pgwire: remove readTimeoutConn in favor of a channel r=yuzefovich a=rafiss Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received. I wrote a simple benchmark of the idle connection loop to generate CPU profiles. With the old readTimeoutConn: [CPU profile screenshot] With the new goroutine approach: [CPU profile screenshot] There's definitely less noise and overhead. fixes #25585 Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
From #25328 (comment):
The reason we use a deadline approach in the first place is because:
However, this can be worked around by half-closing the connection. That is, we can close the connection for reading but still write to it. See TCPConn.CloseRead.
Jira issue: CRDB-5704