I have tried adding the line to the conf but it doesn't seem to work either... are you able to run it with proper logging?
The log is missing, but the Kedro logger is printing to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said, if an agent is running the entire pipeline, then everything is logged from the outside, so whatever is written to stdout/stderr is captured.
I tried stepping through with the debugger, but I can't quite see the clearml handlers in logging._handlers; the dict is empty. Where is the clearml handler stored? AgitatedDove14
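(For reference, a quick sketch of where to look for handlers at runtime; note that logging._handlers only holds handlers that were given a name, e.g. by dictConfig, so a handler installed programmatically by a library will usually show up on the logger objects or in logging._handlerList instead:)
` import logging

root = logging.getLogger()
print(root.handlers)            # handlers attached to the root logger
print(dict(logging._handlers))  # only handlers that were given a *name* (e.g. via dictConfig)
print(logging._handlerList)     # weak references to every live handler `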
https://github.com/allegroai/clearml/commit/164fa357ed01704b11db67b8a7ac19791fbc49d1
This works. So it is still in master and should be included in 1.0.5?
Is this a logging issue, or a clearml issue?
EnviousStarfish54 good news, this is fully reproducible
(BTW: for some reason this call will pop the logger handler clearml installs, hence the lost console output)
Thanks EnviousStarfish54
Let me check if I can reproduce it
Are there any commits/PRs I can reference for the fix? Thanks.
FYR, I have shared the information with the Kedro team.
https://github.com/quantumblacklabs/kedro/issues/792
It seems that they have their own justification too; maybe there is a better config that makes both work, I don't know yet.
Sorry for the late reply AgitatedDove14
The code that inits the Task is inside the first node. https://github.com/noklam/allegro_test/blob/6be26323c7d4f3d7e510e19601b34cde220beb90/src/allegro_test/pipelines/data_engineering/nodes.py#L51-L52
repo: https://github.com/noklam/allegro_test
commit: https://github.com/noklam/allegro_test/commit/6be26323c7d4f3d7e510e19601b34cde220beb90
entrypoint: kedro run
Hi EnviousStarfish54
You mean the console output? If that's the case, the Task.init call will monkey patch sys.stdout/sys.stderr to report to clearml as well as to the console.
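(For illustration only, this is not ClearML's actual implementation; the general monkey patching idea is simply to wrap sys.stdout so every write goes both to the real console and to a capture sink:)
` import sys

class TeeStdout:
    # wrap a stream: writes go to the original stream and to a capture callback
    def __init__(self, wrapped, capture):
        self._wrapped = wrapped
        self._capture = capture

    def write(self, text):
        self._capture(text)               # forward to whatever does the reporting
        return self._wrapped.write(text)  # still show it on the real console

    def flush(self):
        self._wrapped.flush()

captured = []
sys.stdout = TeeStdout(sys.stdout, captured.append)
print("hello")                    # visible in the terminal *and* appended to captured
sys.stdout = sys.stdout._wrapped  # restore the original stream `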
but the logger info is missing.
What do you mean? Can I reproduce it?
BTW: The code sample you shared is very similar to how you create pipelines in ClearML, no?
(also, could you expand on how you create the Kedro node? From the face of it, it looks like another function in the repo, but I have a feeling I'm missing something)
AgitatedDove14 Yes, I found that as Kedro's pipeline starts running, the log is no longer sent to the UI Console. I tried Task.init both before and after the start of the kedro pipeline and the result is the same. The log is missing, but the Kedro logger is printing to sys.stdout in my local terminal.
I think it's related to the fix that uses "incremental": True; this seems to fix one problem, but at the same time it will ignore all other handlers.
AgitatedDove14
The core of Kedro is the pipeline (multi-node), where you can stitch different pipelines together. For the data part, they use something called the DataCatalog, a YAML file that defines how your files are saved/loaded and where they live. Kedro also resolves the DAG of your pipeline, so you don't actually define the order of execution (it's defined by the input/output dependencies). The default is a SequentialRunner; optionally, you can use a ParallelRunner where Kedro will try to run nodes without dependencies in parallel. Kedro comes with a pre-defined template for data science projects, i.e. catalog.yaml for the data catalog and parameters.yaml for your pipeline parameters.
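(For anyone reading along, a rough sketch of what declaring a pipeline looks like; the exact kedro.pipeline API may differ between Kedro versions, and the dataset names here are made up:)
` from kedro.pipeline import Pipeline, node

def clean(raw_df):
    # a node is just a plain Python function
    return raw_df.dropna()

def train(clean_df, parameters):
    ...  # fit and return a model

pipeline = Pipeline([
    node(clean, inputs="raw_data", outputs="clean_data"),
    node(train, inputs=["clean_data", "parameters"], outputs="model"),
])
# "raw_data", "clean_data" and "model" are resolved through the DataCatalog (catalog.yaml);
# the execution order comes from these input/output names, not from the list order `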
This works.
great!
So it is still in master and should be included in 1.0.5?
correct, RC will be released soon with this fix included
Could you give me some pointers on where ClearML auto-captures logs/stdout? I suspect that since Kedro has its own logging configuration, ClearML somehow fails to catch it.
Will the new fix avoid this issue, and does it still require the incremental flag?
It will avoid the issue, meaning it will work even when incremental is not specified.
That said, any other logger will still be cleared by that call, so passing it is just good practice ...
From the logging documentation ...
Hmmm, so I guess Kedro should not use dictConfig?! I'm not sure about the exact use case, but just clearing all loggers seems like a harsh approach.
I don't think it is running in a subprocess; stdout/stderr shows up in the terminal. If I use print() it actually gets logged, but the logger info is missing.
AgitatedDove14 Let me share the exact code, commit, and entry point with you later. Thanks!
EnviousStarfish54 quick update, regardless of the logging.config.dictConfig issue, I will make sure that even when the logger is removed, the clearml logging will continue to function 🙂
The commit will be synced after the weekend
Will the new fix avoid this issue, and does it still require the incremental flag?
From the logging documentation: "Thus, when the incremental key of a configuration dict is present and is True, the system will completely ignore any formatters and filters entries, and process only the level settings in the handlers entries, and the level and propagate settings in the loggers and root entries."
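(In other words, an incremental config is really only meant for level tweaks on loggers/handlers that already exist; a minimal sketch, with made-up logger names:)
` import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "incremental": True,              # don't replace the existing setup...
    "loggers": {
        "kedro": {"level": "INFO"},   # ...only level (and propagate) settings are applied
    },
    "root": {"level": "WARNING"},
})
# formatter/filter entries are ignored in incremental mode, and handler entries may
# only reference handlers that already exist, so existing handlers stay attached `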
EnviousStarfish54
Can you check with the latest clearml from github? pip install git+
I can confirm this seems to fix the issue, and I have reported it to the kedro team to see what their view is on this. So it seems like it did remove the TaskHandler from the _handler_lists.
Thank you EnviousStarfish54!
This is very helpful!
I'm looking at Kedro and the project you shared, and a few thoughts came to mind:
I very much like the idea of using functions as "nodes" (and, to extend, using notebook cells with tags as nodes). This got me thinking, I'm pretty sure we could have a similar implementation with ClearML. My thinking is using inspect or dill to convert the functions/cells into plain text code, automatically analyze the runtime requirements, and create a "single-script" Task from it. I'm also trying to figure out, and I might be totally wrong here, how you are supposed to use Kedro. Or in other words, is it a framework that basically does Process(target=node_function) for you? Is there anything I'm missing (well, maybe you might have to add a Queue to pass some data)?
Regarding the actual console logging issue.
Try adding the following line to your clearml.conf; one of the by-products is that any os.fork will pass through ClearML and the subprocess will be logged as well. This behavior should happen by default, but in some cases the patching is a must.
sdk.development.report_use_subprocess = false
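(For reference, a minimal sketch of where that line sits in clearml.conf, assuming the usual HOCON-style nesting; the comment is my reading of the flag, not official docs:)
` sdk {
    development {
        # assumption: with this off, reporting happens in the main process
        # rather than in a forked reporting subprocess
        report_use_subprocess: false
    }
} `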
Cool! Will have a look at the fix when it is done. Thanks a lot AgitatedDove14
EnviousStarfish54 following up on this issue, the root cause is that dictConfig will clear all handlers if it is not passed "incremental": True, i.e. conf_logging = { "incremental": True, ... }
Since you pointed out that Kedro is internally calling logging.config.dictConfig(conf_logging), this seems like an issue with Kedro, as this call will remove all logging handlers, which seems problematic. wdyt?
Hi, sorry for the really late update.
I finally got some more information about the logging issue. After diving into the Kedro source code, I found a line that is linked to the log issue:
logging.config.dictConfig(conf_logging)
This line causes ClearML to fail to pick up the log, any idea? AgitatedDove14 SuccessfulKoala55
A snippet to test is below; try commenting out the logging.config.dictConfig(conf_logging) line and the log will pop up. When that line is enabled, even print() does not show up.
` import logging
import logging.config  # needed so logging.config.dictConfig is available
from clearml import Task

conf_logging = {"version": 1, "formatters": {"simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}}}
t = Task.init(project_name="test")
logging.config.dictConfig(conf_logging)  # commenting this line out brings the console logs back
logging.info("INFO!")
logging.debug("DEBUG!")
logging.warning("WARN!")
print("PRINT!") `
The "incremental" config seems does not work well if I add handlers in the config. This snippets will fail with the incremental
flag.
` import logging
import logging.config  # needed so logging.config.dictConfig is available
from clearml import Task

conf_logging = {
    "version": 1,
    "incremental": True,
    "formatters": {
        "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"}
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
            "stream": "ext://sys.stdout",  # standard dictConfig reference to sys.stdout
        }
    },
}
t = Task.init(project_name="test")
logging.config.dictConfig(conf_logging)  # fails here: an incremental config can't add a new handler, only tweak existing ones
logging.info("INFO!")
logging.debug("DEBUG!")
logging.warning("WARN!")
print("PRINT!") `