pgwire: remove readTimeoutConn in favor of a channel #124373
Conversation
Force-pushed from ce2e425 to 17e388a
Thanks for looking into this! The change overall seems good to me.
I'm not too familiar with this code, so I want to double-check my understanding. IIUC, currently for each connection we have two goroutines:
- the "reader" goroutine that continuously loops in serveImpl and sees what the client is requesting us to do
- the "processor" goroutine for the connExecutor that executes commands added into stmtBuf by the "reader" goroutine.
This PR introduces a third "watchdog" goroutine that monitors the state for cancellation and draining. In other words, we make a trade-off between 1) periodically forcing net.Conn to wake up in the "reader" due to a read deadline to see whether the "reader" should quit, and 2) a separate goroutine that only monitors things, with no read deadline set for the "reader". In practice, this "watchdog" goroutine should be idle most of the time, so it should remove some system calls. Does this sound right?
pkg/sql/pgwire/server.go
Outdated
@@ -945,6 +942,35 @@ func (s *Server) serveImpl(
	authOpt authOptions,
	sessionID clusterunique.ID,
) {
nit: the last part of the comment for serveImpl needs an update.
done
pkg/sql/pgwire/server.go
Outdated
	tcpConn, _ = underConn.(*net.TCPConn)
}
if tcpConn == nil {
	_ = c.conn.Close()
We didn't close the connection in both directions before this patch. Based on the comment on serveImpl, it seems that it's the connExecutor's job to do so. Why add this here?
Hm, good point. It seems safer to only close the read side. This matches the previous behavior, where if the context was cancelled, then the readTimeoutConn would start returning errors when Read is called:
cockroach/pkg/sql/pgwire/server.go, line 1039 in ceb19c5:
typ, n, err := c.readBuf.ReadTypedMsg(&c.rd)
that should cause the main loop of the reader goroutine to exit here:
cockroach/pkg/sql/pgwire/server.go, lines 1216 to 1225 in ceb19c5:
// If we can't read data because of any one of the following conditions,
// then we should break:
// 1. the connection was closed.
// 2. the context was canceled (e.g. during authentication).
// 3. we reached an arbitrary threshold of repeated errors.
if netutil.IsClosedConnection(err) ||
	errors.Is(err, context.Canceled) ||
	repeatedErrorCount > maxRepeatedErrorCount {
	break
}
Although, thinking about it more, the risk is that if the type switch no longer works as expected (for example, if there's a new implementation of net.Conn to deal with later), then not having a default case would make the connection hang forever. Maybe it's safer to keep the Close as a fallback? WDYT?
// DrainRequest. This will make the processor quit whenever it finds a
// good time.
_ /* err */ = c.stmtBuf.Push(ctx, sql.DrainRequest{})
<-ctx.Done()
nit: a comment (that eventually the drain request will be processed by the server and all connections will be closed, which is achieved by cancelling the context) would be helpful.
done
This PR introduces a third "watchdog" goroutine that monitors the state for cancellation and draining. In other words, we make a trade-off between 1) periodically forcing net.Conn to wake up in the "reader" due to a read deadline to see whether the "reader" should quit and 2) a separate goroutine that only monitors things with no read deadline set for the "reader". In practice, this "watchdog" goroutine should be idle most of the time, so it should remove some system calls. Does this sound right?
Yes, that sounds right. It's worth noting that a goroutine that does nothing apart from waiting on a channel is very efficient in Go; the scheduler should never try to run it.
Reviewed 4 of 4 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/pgwire/server.go
line 968 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
although, thinking about it more, the risk is that if the type switch no longer works as expected (like for example, there's a new implementation of net.Conn to deal with later), then not having a default case would make the connection hang forever.
maybe it's safer to keep the Close as a fallback? WDYT?
Sounds reasonable to me.
I traced where Close is called currently, and I think it's a few layers up, in the defer in the async task in netutil.TCPServer.ServeWith, so the comment (that "serveImpl always closes the network connection before returning") is misleading before your patch but will actually become closer to reality after the patch. It might be worth adjusting the comment.
Looks like the updates haven't been pushed yet?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/pgwire/server.go
line 968 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Sounds reasonable to me.
I traced where Close is called currently, and I think it's a few layers up, in the defer in the async task in netutil.TCPServer.ServeWith, so the comment (that "serveImpl always closes the network connection before returning") is misleading before your patch but will actually become closer to reality after the patch. It might be worth adjusting the comment.
I was about to push the change, then realized that another fallback option is to call conn.SetReadDeadline(now()), which might be less disruptive. I'll do that instead.
Force-pushed from 17e388a to d7150f9
Reviewed 2 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/pgwire/server.go
line 968 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
I was about to push the change, then realized that another fallback option is to call conn.SetReadDeadline(now()), which might be less disruptive. I'll do that instead.
Nice, I like it.
Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received. Release note: None
This benchmark was only useful for capturing a CPU profile; it doesn't measure anything else. Release note: None
Force-pushed from d7150f9 to 1351385
tftr! bors r=yuzefovich
124373: pgwire: remove readTimeoutConn in favor of a channel r=yuzefovich a=rafiss
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Build failed:
bors r=yuzefovich
Rather than using a connection that polls for the context being done every second, we now spin up an additional goroutine that blocks until the connection context is done, or the drain signal was received.
I wrote a simple benchmark of the idle connection loop to generate CPU profiles.
With the old readTimeoutConn: [CPU profile screenshot]
With the new goroutine approach: [CPU profile screenshot]
There's definitely less noise and overhead.
fixes #25585
Release note: None