Expose metaflow logger and monitor via singleton #1794

talsperre · 2024-04-08T01:08:48Z

Expose logger and monitor via a `system_current` singleton

Currently, users don't have the ability to use their implementation of logger and monitor sidecars in their own code. Additionally, if platform developers want to instrument their Metaflow extensions, they have to pass in the logger/monitor constructs to each one of their plugins, leading to code duplication.

This PR exposes two new singleton objects called _system_logger and _system_monitor that can be used to access the monitor and the logger anywhere.

The monitor/logger can be accessed both within and outside a Metaflow flow. This allows us to instrument Metaflow plugins like metaflow.S3, which is often used outside of a flow as well.

Usage

The monitor/logger sidecar can be used in the following manner:

with _system_monitor.count("<your_metric_name>"):
    # your code
    pass

_system_logger.log(payload)

romain-intel

Some initial comments.

romain-intel · 2024-04-11T16:49:36Z

metaflow/task.py

+                pass
+            system_current.logger.log(
+                {
+                    "log_type": "ERROR",


could we have constants for these?

romain-intel · 2024-04-11T16:51:32Z

metaflow/metaflow_metrics_manager.py

+
+    @contextmanager
+    def measure(self, metric_name, qualifer_name=None):
+        timer, counter = self.monitor.get_measure_metrics(metric_name)


could we ideally make this look symmetric for both?

we can also just return a single "payload". Internally, the send_metric would know to stop the timer for example.
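
One way to read this suggestion, as a rough sketch only: `measure` and `count` share one code path that yields a single payload object, and `send_metric` stops the timer when the payload carries one. `MetricPayload`, `SketchMetricsManager`, and `_track` are hypothetical names chosen for this example.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Optional


class MetricPayload:
    """Hypothetical single payload: a name, a count, and an optional timer."""

    def __init__(self, name: str, timed: bool) -> None:
        self.name = name
        self.count = 1
        self.start: Optional[float] = time.monotonic() if timed else None
        self.duration: Optional[float] = None


class SketchMetricsManager:
    def __init__(self) -> None:
        self.sent: List[MetricPayload] = []

    def send_metric(self, payload: MetricPayload) -> None:
        # The sender stops the timer itself if the payload has one running.
        if payload.start is not None:
            payload.duration = time.monotonic() - payload.start
        self.sent.append(payload)

    @contextmanager
    def _track(self, name: str, timed: bool) -> Iterator[MetricPayload]:
        payload = MetricPayload(name, timed)
        try:
            yield payload
        finally:
            # Sent on both normal exit and exceptions.
            self.send_metric(payload)

    def measure(self, name: str):
        return self._track(name, timed=True)

    def count(self, name: str):
        return self._track(name, timed=False)
```

With this shape the two public methods differ only in one flag, which is what makes them look symmetric to the caller.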

romain-intel

A few initial comments but I think this goes in the direction we agreed to so we shouldn't be too far.

romain-intel · 2024-04-30T06:45:40Z

metaflow/__init__.py

@@ -100,6 +100,12 @@ class and related decorators.
 # current runtime singleton
 from .metaflow_current import current

+# system monitor runtime singleton


I would add a comment like (for internal metaflow use only). It's kind of explicit from the name but we can be extra explicit :)

romain-intel · 2024-04-30T06:46:59Z

metaflow/cli.py

@@ -1066,6 +1068,10 @@ def start(
    if decospecs:
        decorators._attach_decorators(ctx.obj.flow, decospecs)

+    # We create an instance of SystemMonitor and SystemLogger respectively


nit, I would move this up to where ctx.obj.monitor and event_logger are created.

romain-intel · 2024-04-30T06:50:14Z

metaflow/metaflow_system_logger.py

+
+    @property
+    def flow_name(self):
+        return self._flow_name


assumes that flow_name will not be called prior to flow which may not be correct. In general, I would have one init method that inits all 4 (flow, flow_name, environment and logger) and controlled by one single flag.

This comment is still not addressed I think.
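
A minimal sketch of the single-init idea, assuming one method sets all four attributes and one flag guards every accessor. The class and method names are hypothetical; the `"not_a_real_flow"` placeholder is borrowed from a later revision of this PR and is used here only as an assumed default.

```python
from typing import Any, Optional


class SketchSystemLogger:
    """Hypothetical sketch: flow, flow_name, environment and logger share one init path."""

    def __init__(self) -> None:
        self._initialized = False
        self._flow: Optional[Any] = None
        self._flow_name: Optional[str] = None
        self._environment: Optional[Any] = None
        self._logger: Optional[Any] = None

    def init_system_logger(
        self, flow: Any, flow_name: str, environment: Any, logger: Any
    ) -> None:
        self._flow, self._flow_name = flow, flow_name
        self._environment, self._logger = environment, logger
        self._initialized = True

    def _ensure_initialized(self) -> None:
        if not self._initialized:
            # Assumed placeholder defaults for use outside a flow.
            self.init_system_logger(None, "not_a_real_flow", None, None)

    @property
    def flow_name(self) -> Optional[str]:
        # Safe to call in any order: initialization happens on first access.
        self._ensure_initialized()
        return self._flow_name
```

Since every property goes through `_ensure_initialized`, no accessor depends on another having been called first.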

romain-intel · 2024-04-30T06:51:09Z

metaflow/metaflow_system_logger.py

+        return self._logger
+
+    @logger.setter
+    def logger(self, logger):


we don't really want this to be settable independently, do we? Can we just keep set_logger and have things set directly on the _ names? Or is there a use case you are thinking of for setting things independently?

romain-intel · 2024-04-30T06:51:29Z

metaflow/metaflow_system_logger.py

+from typing import Dict, Any
+
+
+class SystemLogger(object):


nit: please add type hints where appropriate.

romain-intel · 2024-04-30T06:52:20Z

metaflow/metaflow_system_monitor.py

+from contextlib import contextmanager
+
+
+class SystemMonitor(object):


same comments as for system_logger

romain-intel · 2024-04-30T06:55:08Z

metaflow/task.py

-                "ts": round(time.time()),
-            }
-            logger.log(msg)
+            with _system_monitor.count("metaflow.task.start"):


I would keep things the same here -- ie: use self.event_logger and self.logger

romain-intel · 2024-04-30T06:55:17Z

metaflow/task.py

-                "flow_name": self.flow.name,
-            }
-            logger.log(tsk_msg)
+            print("I am here in metaflow")


nit: remove.

romain-intel · 2024-04-30T06:56:00Z

metaflow/task.py

-                "runtime": round(end),
-            }
-            logger.log(msg)
+            _system_monitor.gauge("metaflow.task.duration", duration)


A measure would be more appropriate here than a gauge.

Question is should we be measuring the task duration only until the task "ends" or consider the portion dealing with metadata register as part of the task duration as well?

If it's the former then it would be difficult to implement using the existing context manager construct - hence why we use the gauge metric.

That's a good point, so we may need to add non-context-specific ones for this. The issue is that a gauge isn't the same as a measure. Gauges are supposed to measure levels of things (like # of machines in a pool, etc.), not disparate point measurements like this.
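
As a sketch of what a non-context measure could look like, the timer below is started and stopped explicitly, so the stop point can sit wherever the task is considered to have ended. `SketchTimer` is a hypothetical name and not part of the monitor API discussed here.

```python
import time
from typing import Optional


class SketchTimer:
    """Hypothetical explicit timer for spans a `with` block cannot cover."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._start: Optional[float] = None
        self.duration: Optional[float] = None

    def start(self) -> "SketchTimer":
        self._start = time.monotonic()
        return self

    def stop(self) -> float:
        if self._start is None:
            raise RuntimeError("stop() called before start()")
        self.duration = time.monotonic() - self._start
        return self.duration


timer = SketchTimer("metaflow.task.duration").start()
# ... the task body would run here ...
elapsed = timer.stop()  # stop when the task "ends"
# ... metadata registration would happen here and is excluded from `elapsed` ...
```

The recorded value is still a duration measurement rather than a level, so it would be reported as a measure and not as a gauge.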

savingoyal · 2024-05-09T15:52:50Z

metaflow/metaflow_system_logger.py

+from typing import Dict, Any, Optional, Union
+
+
+class SystemLogger(object):


Consider moving this and monitor under metaflow/system so that we can have a rather clean import - from metaflow.system import monitor rather than from .metaflow_system_monitor import _system_monitor or from metaflow import _system_monitor

savingoyal · 2024-05-09T15:54:40Z

metaflow/metaflow_system_logger.py

+
+class SystemLogger(object):
+    def __init__(self):
+        self._flow = None


why both _flow and _flow_name?

savingoyal · 2024-05-09T15:55:37Z

metaflow/metaflow_system_logger.py

+from typing import Dict, Any, Optional, Union
+
+
+class SystemLogger(object):


It seems that the implementation of this file depends on its usage outside of the flow context. Can you add comments in the code with the use cases so that it is easier to review and maintain going forward?

Do we want it as part of the code or in a separate README?

savingoyal · 2024-05-09T15:59:50Z

metaflow/metaflow_system_logger.py

+            self._logger = LOGGING_SIDECARS[DEFAULT_EVENT_LOGGER](
+                flow=self.flow, env=self.environment
+            )
+            self._debug("Started logger outside of a flow")


Would it be cleaner if self._logger is started when logger is constructed?

This would be the case when we access logger outside of a flow. Otherwise it is started in cli.py itself.

savingoyal · 2024-05-09T16:08:15Z

metaflow/task.py

+            "project_flow_name": current.get("project_flow_name"),
+            "trace_id": trace_id or None,
+        }
+        self.event_logger.send(


why use send instead of log? iirc send was a workaround for a very specific use case (I need to dig through the slack conversations) and we would want to avoid introducing it too much in the code base.

send is used to update the event_logger context so that we have all these values available as additional tags in the event stream. We thus log this event with message type set to MUST_SEND instead of BEST_EFFORT that is normally used in log.

Yes, the intent is to set a "common set of tags" (in the lingo of the other measure thing (datadog)). I think it may make sense to actually have a specific method self.event_logger.set_common_tags() or something like that. Backends would be free to do whatever they want with that. Internally, we could then use send to send a MUST_SEND message or just keep it around (the MUST_SEND was to avoid overloading the pipe to the sidecar with all the context all the time).

I wonder if you did a simple self.event_logger.log(payload) where the payload had the message and the context (common tags, etc.), and then in your internal implementation of the logger, you could make multiple calls with MUST_SEND or BEST_EFFORT as intended. Currently, it seems that this bit of logic in task.py is leaking implementation detail on how the logger backend works.

Maybe an in-person conversation might be quicker here - it is likely that we can preserve the same signature for self.event_logger.log(msg) as before and move the complexity of handling what needs to be sent as MUST_SEND or BEST_EFFORT to your internal sidecar implementation. That way, these changes in task.py are entirely decoupled from any changes in the sidecar implementation.

Yes, let's discuss this in the call. What you said should be possible: depending on the presence of certain key/value pairs in the payload, we can use MUST_SEND or BEST_EFFORT appropriately. My only concern is that someone reading the code in cli.py would be confused as to why we use different message types for different payloads.
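
To make the proposal concrete, here is a rough sketch in which callers always use `log(payload)` and the logger picks the message type from the payload contents. `SketchMessageTypes` is a local stand-in for the sidecar message types named above, and the `"common_tags"` marker key is an assumption made for this example; the thread does not specify the real payload layout.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple


class SketchMessageTypes(Enum):
    # Local stand-in for the sidecar message types named in the discussion.
    BEST_EFFORT = 1
    MUST_SEND = 2


class SketchEventLogger:
    # Hypothetical marker key identifying a context update.
    CONTEXT_KEY = "common_tags"

    def __init__(self) -> None:
        self.sent: List[Tuple[SketchMessageTypes, Dict[str, Any]]] = []

    def log(self, payload: Dict[str, Any]) -> None:
        # Context updates must arrive; ordinary events are best effort.
        if self.CONTEXT_KEY in payload:
            msg_type = SketchMessageTypes.MUST_SEND
        else:
            msg_type = SketchMessageTypes.BEST_EFFORT
        self.sent.append((msg_type, payload))
```

With this arrangement the calling code in task.py never mentions message types, which keeps the delivery policy inside the logger implementation.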

savingoyal · 2024-05-09T16:10:33Z

metaflow/task.py

                    # initialize parameters (if they exist)
                    # We take Parameter values from the first input,
                    # which is always safe since parameters are read-only
                    current._update_env(
                        {
                            "parameter_names": self._init_parameters(
-                                inputs[0], passdown=False
+                                inputs[0], passdown=True


can you highlight the rationale for this change?

Seems to be an issue with the diff. I didn't change it and it's still False as we can see on line number 601.

savingoyal · 2024-05-09T16:12:34Z

metaflow/task.py

+                            "traceback": traceback.format_exc(),
+                        }
+                    )
+                    pass


why is this needed?

Replace it with something like event_logger.log(msg="", type="").

savingoyal · 2024-05-09T16:14:22Z

metaflow/task.py

+                            "event_value": 1,
+                        }
+                    )
+                    pass


same here - why is a pass needed? is it to signify end of an indent block or something more? if the former, we can avoid introducing pass to maintain consistency of style in the code base

Will remove in updated PR

savingoyal · 2024-05-09T16:15:06Z

metaflow/task.py

+        with self.monitor.measure("metaflow.task.duration"):
+            try:
+                with self.monitor.count("metaflow.task.start"):
+                    self.event_logger.log(


why is this log necessary?

We want the event to be available in the log stream in addition to our metrics.

romain-intel

Some comments. Let's discuss tomorrow and finalize.

romain-intel · 2024-05-22T00:20:38Z

metaflow/event_logger.py

+
+        Parameters
+        ----------
+        msg : str


nit: Optional if they can be None (same for all others).

Also add type hints in the signature.

romain-intel · 2024-06-06T07:05:28Z

metaflow/system/metaflow_system_logger.py

+        if self._flow_name == "not_a_real_flow":
+            self.logger.terminate()
+
+    def init_environment_outside_flow(


missing return type?

romain-intel · 2024-06-06T07:07:33Z

metaflow/system/metaflow_system_logger.py

+        -------
+        None
+        """
+        print("system logger: %s" % msg, file=sys.stderr)


nit: maybe squelch unless debug is turned on?

romain-intel · 2024-06-06T07:07:54Z

metaflow/system/metaflow_system_logger.py

+
+        Parameters
+        ----------
+        msg : str, optional default None


nit: optional, default None

romain-intel · 2024-06-06T07:09:14Z

metaflow/system/metaflow_system_monitor.py

+        if self._flow_name == "not_a_real_flow":
+            self.monitor.terminate()
+
+    def init_environment_outside_flow(


this method does not depend on self. I would move it to a util or something since it is common across both logger and monitor.

romain-intel · 2024-06-06T07:10:03Z

metaflow/system/metaflow_system_monitor.py

+        self.monitor.gauge(gauge)
+
+
+_system_monitor = SystemMonitor()


Remove line -- no longer used since it is in __init__.py

romain-intel · 2024-06-06T07:17:45Z

metaflow/task.py

+            task_id,
+        )
+        with _system_monitor.count("metaflow.task.clone"):
+            self.event_logger.log_event(


nit: _system_logger to be consistent.

romain-intel · 2024-06-06T07:20:00Z

metaflow/system/metaflow_system_logger.py

+        Parameters
+        ----------
+        msg : str, optional default None
+            Message to log.


What does it mean to log a "none" message, just empty? Maybe add a comment to that effect.

romain-intel · 2024-06-06T07:28:10Z

metaflow/task.py

+            "project_flow_name": current.get("project_flow_name"),
+            "trace_id": trace_id or None,
+        }
+        self.event_logger.send(


Yes, the intent is to set a "common set of tags" (in the lingo of the other measure thing (datadog)). I think it may make sense to actually have a specific method self.event_logger.set_common_tags() or something like that. Backends would be free to do whatever they want with that. Internally, we could then use send to send a MUST_SEND message or just keep it around (the MUST_SEND was to avoid overloading the pipe to the sidecar with all the context all the time).

savingoyal · 2024-06-06T15:24:47Z

metaflow/event_logger.py

@@ -24,6 +24,31 @@ def log(self, payload):
            msg = Message(MessageTypes.BEST_EFFORT, payload)
            self._sidecar.send(msg)

+    def log_event(self, msg=None, event_name=None, log_stream=None, other_context=None):


do we need this method? can we not reuse def log(self, payload) where the payload contains the msg, event_name, log_stream and other_context?

We can reuse the method log(self, payload). But we had discussed previously that it would be better to expose a method like log_event with explicit parameters so that it is clear in any logger implementation that it needs to use/ignore the fields msg, event_name, and log_stream.

savingoyal · 2024-06-06T15:27:42Z

metaflow/system/metaflow_system_logger.py

+        if self._flow_name == "not_a_real_flow":
+            self.logger.terminate()
+
+    def init_environment_outside_flow(


can you add comments here for future readers as to why/how the logger needs to be/is constructed in this manner?

Will address

savingoyal · 2024-06-06T15:30:25Z

metaflow/system/metaflow_system_logger.py

+        self._flow_name = flow_name
+        self._logger = logger
+
+    def init_logger_outside_flow(self):


do we expect this method to be used outside of this class? if not, consider adding a _ prefix.

Agreed - will address

savingoyal · 2024-06-06T15:32:41Z

metaflow/system/metaflow_system_logger.py

+            Additional context to log with the event. The additional context will have to be handled by
+            the event logger implementation.
+        """
+        self.logger.log_event(msg, event_name, log_stream, other_context)


maybe

Suggested change

-        self.logger.log_event(msg, event_name, log_stream, other_context)
+        self.logger.log(
+            {
+                "msg": msg,
+                "event_name": event_name,
+                "log_stream": log_stream,
+                "other_context": other_context or {},
+            }
+        )

savingoyal · 2024-06-06T15:41:02Z

metaflow/system/metaflow_system_logger.py

+        if self._flow_name == "not_a_real_flow":
+            self.logger.terminate()
+
+    def init_environment_outside_flow(


it seems that this method exists to ensure the appropriate MetaflowEnvironment is picked for constructing the logger (and monitor) object. Currently, the interface doesn't strictly expect an environment to be passed to the constructor. When using the system logger and monitor outside of a flow, the utility of recording the environment is limited - can you consider making the environment optional in your internal implementation of logger and monitor and using a dummy value as a default instead (nullEnvironment)? It would clean up this implementation significantly.

I think there is a reasonable benefit to passing MetaflowEnvironment when used outside of a flow as well. Primarily, the environment provides us with additional tags/context like the version of Metaflow used, the platform, the user name, etc. With that in mind, I just followed the way the logger/monitor are instantiated in cli.py here.
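
For reference, the optional-environment alternative raised in the review could look roughly like the sketch below. `NullEnvironment`, its `tags()` method, and `SketchSidecarLogger` are hypothetical names for illustration and do not reflect the MetaflowEnvironment interface.

```python
from typing import Any, Dict, Optional


class NullEnvironment:
    """Hypothetical placeholder environment that contributes no extra context."""

    def tags(self) -> Dict[str, Any]:
        return {}


class SketchSidecarLogger:
    def __init__(self, flow: Optional[Any] = None, env: Optional[Any] = None) -> None:
        self.flow = flow
        # Fall back to the placeholder when no environment is supplied.
        self.env = env if env is not None else NullEnvironment()

    def log(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Environment tags are merged first so payload keys take precedence.
        return {**self.env.tags(), **payload}
```

The trade-off described in the reply still applies: with the placeholder, events logged outside a flow carry no version, platform, or user tags.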

savingoyal · 2024-06-06T15:44:30Z

metaflow/task.py

+        with _system_monitor.count("metaflow.task.clone"):
+            self.event_logger.log_event(
+                event_name="metaflow.task.clone",
+                msg=msg,


is there a reason to drop task_id, step_name, run_id, flow_name and ts from the payload here?

The task_id, step_name etc are not part of the payload of an event usually. They are added once at the beginning as additional tags and then are automatically added to all events. But yes, I should add another call to add these additional tags/context like we did in the run_step function.
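
The behavior described in this reply, tags set once and then attached to every event, can be sketched as follows. The method name `set_common_tags` follows the proposal made elsewhere in this review; the class and its in-memory event list are hypothetical.

```python
from typing import Any, Dict, List, Optional


class SketchTaggedLogger:
    """Hypothetical logger that merges tags set once into every event."""

    def __init__(self) -> None:
        self._common_tags: Dict[str, Any] = {}
        self.events: List[Dict[str, Any]] = []

    def set_common_tags(self, tags: Dict[str, Any]) -> None:
        self._common_tags.update(tags)

    def log_event(self, event_name: str, msg: Optional[str] = None) -> None:
        # Event fields are applied last so they win over a tag with the same key.
        self.events.append(
            {**self._common_tags, "event_name": event_name, "msg": msg}
        )


logger = SketchTaggedLogger()
logger.set_common_tags({"flow_name": "MyFlow", "run_id": "42", "task_id": "7"})
logger.log_event("metaflow.task.clone", msg="cloning task")
```

The event call itself carries only the event name and message, while the identifiers arrive through the tags set at the start.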

savingoyal · 2024-06-06T15:48:24Z

metaflow/task.py

+            "project_flow_name": current.get("project_flow_name"),
+            "trace_id": trace_id or None,
+        }
+        self.event_logger.send(


I wonder if you did a simple self.event_logger.log(payload) where the payload had the message and the context (common tags, etc.), and then in your internal implementation of the logger, you could make multiple calls with MUST_SEND or BEST_EFFORT as intended. Currently, it seems that this bit of logic in task.py is leaking implementation detail on how the logger backend works.

savingoyal · 2024-06-06T15:54:21Z

metaflow/task.py

+            "project_flow_name": current.get("project_flow_name"),
+            "trace_id": trace_id or None,
+        }
+        self.event_logger.send(


Maybe an in-person conversation might be quicker here - it is likely that we can preserve the same signature for self.event_logger.log(msg) as before and move the complexity of handling what needs to be sent as MUST_SEND or BEST_EFFORT to your internal sidecar implementation. That way, these changes in task.py are entirely decoupled from any changes in the sidecar implementation.

talsperre requested review from romain-intel and savingoyal April 8, 2024 17:13

romain-intel reviewed Apr 11, 2024

talsperre force-pushed the dev/sidecar-update branch from e35dd85 to a5984a2 Compare April 29, 2024 23:23

romain-intel reviewed Apr 30, 2024

savingoyal requested changes May 9, 2024

romain-intel reviewed Jun 6, 2024

talsperre added 9 commits June 6, 2024 07:27

Expose metaflow logger and monitor via singleton

5f8540a

Update task metrics, add gauge functionality to metrics manager

7831f35

add stub methods to null event logger

0c288bd

Fix bug when monitor msg is none

025c6e2

Replace system current singleton with logger and monitor singletons

78a0367

Address comments

21d3287

Move monitor/logger to metaflow.system, address comments

5a51ddf

Revert passdown change

2828bc0

Remove extra new line

9f779ae

talsperre force-pushed the dev/sidecar-update branch from 78950dc to 9f779ae Compare June 6, 2024 14:28

savingoyal reviewed Jun 6, 2024

savingoyal requested changes Jun 6, 2024
