
Calling the script without the PipelineDecorator.run_locally(), i.e. running the pipeline remotely, still gives the ModuleNotFoundError: No module named
Do you have the needed module listed on the pipeline controller Task? (Press the details link, then go to the Execution tab / "Installed Packages".)
Do note that the needed module is just a local folder with scripts.
Oh that is the issue, is it in the git repo ?
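A minimal sketch of what the decorator-based pipeline might look like, assuming the helper code lives in a local package (here called my_utils, a hypothetical name) that must also be committed to the repository so the remote run can import it:

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def step_load():
    # Imports inside the component also run on the agent, so the module
    # must be importable there, i.e. part of the committed repository.
    from my_utils.loading import load_data  # hypothetical local module
    return load_data()

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def pipeline_logic():
    data = step_load()
    print("loaded", len(data), "samples")

if __name__ == "__main__":
    # PipelineDecorator.run_locally()  # uncomment to debug all steps locally
    pipeline_logic()
```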
Thank you! 😊
report_text does not, this is very weird
Okay this seems to be the issue.
Just making sure: the Task status is "running" and task.get_logger().report_text("something") does not report a thing?
Do you see it on your screen?
Can you test without the "Task.debug_simulate_remote_task / init" ?
So it is the automagic that is not working.
Can you print the following before calling both Task.debug_simulate_remote_task and Task.init (notice you still have to call Task.init): print(os.environ)
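Putting the suggestions above together, a small sanity-check script could look roughly like this (the task id is a placeholder):

```python
import os
from clearml import Task

# Print the environment before anything else, to see which
# CLEARML_* / TRAINS_* variables are actually set.
print(os.environ)

# Simulate being executed by the agent, against an existing task (placeholder id).
Task.debug_simulate_remote_task(task_id="aabbccddeeff00112233445566778899")
task = Task.init(project_name="examples", task_name="report-text-check")

# If the automagic works, this should appear in the task's console log in the UI.
task.get_logger().report_text("something")
```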
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good idea.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, not that it will just sleep for 99999 sec 🙂
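A minimal sketch of the timeout approach, where download() is a placeholder for the long blocking step that reports no iterations:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="long-download")

# Give the resource monitor up to 30 minutes from start before it falls back
# to time-based reporting; it returns to normal at the first reported iteration.
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)

download()  # placeholder for the long, non-iterating operation
```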
This week I have met at least two people combining ClearML with other tools (one with Kedro and the other with Luigi).
I would love to hear how/what is the use case 🙂
If I run the pipeline twice, changing only parameters or code of taskB, ...
I'll start at the end, yes you can clone a pipeline in the UI (or from code) and instruct it to reuse previous runs.
Let's take our A+B example: let's say I have a pipeline P, and it executed A and then B (which relies on A's output...
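For reference, cloning and re-enqueuing a previous pipeline run from code could look roughly like this (project, task and queue names are illustrative):

```python
from clearml import Task

# Grab the previous pipeline controller task and clone it.
original = Task.get_task(project_name="examples", task_name="pipeline P")
cloned = Task.clone(source_task=original, name="pipeline P (rerun)")

# Change only what is different this time (e.g. a parameter for step B),
# then push the clone into an execution queue.
cloned.set_parameters({"General/taskB_param": 42})
Task.enqueue(cloned, queue_name="services")
```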
Hi ShinyWhale52
Luigi's approach is basically an extension of a functional DAG, where each node is a single function. Let's think of Kedro as an extension of this approach.
With both, the assumption is that a node is a single function (sometimes it really is) and we just want to create a meta execution path (i.e. the execution DAG, quite similar to TF v1).
ClearML pipelines are a different story (in a way).
The main difference is that with ClearML each node is a Task, not a function. That means...
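To make the "each node is a Task" point concrete, here is a rough sketch that builds a pipeline out of two existing Tasks (the project, task and queue names are assumptions):

```python
from clearml import PipelineController

pipe = PipelineController(name="A plus B", project="examples", version="0.1")

# Each step is a full Task (its own repo, environment and artifacts),
# not just a Python function.
pipe.add_step(name="stage_A", base_task_project="examples", base_task_name="taskA")
pipe.add_step(
    name="stage_B",
    parents=["stage_A"],  # B depends on A's output
    base_task_project="examples",
    base_task_name="taskB",
)

pipe.start(queue="services")
```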
Hi MagnificentSeaurchin79
Unfortunately there is currently no way to reorder the plots, but you have a valid point. May I suggest opening a GitHub UX issue?
Regarding the debug samples, the difference is that the confusion matrix report is actually metadata; you can get these numbers via the API or the download, but the debug samples are static images ...
BTW: you can try to produce an interactive side by side confusion matrix with plotly, and use report_plotly_figure
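A rough sketch of that suggestion, using Logger.report_plotly to send an interactive figure (the matrices here are dummy data):

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from clearml import Task

task = Task.init(project_name="examples", task_name="confusion-matrices")

cm_a = [[50, 2], [3, 45]]  # dummy confusion matrix, model A
cm_b = [[48, 4], [1, 47]]  # dummy confusion matrix, model B

fig = make_subplots(rows=1, cols=2, subplot_titles=("model A", "model B"))
fig.add_trace(go.Heatmap(z=cm_a, coloraxis="coloraxis"), row=1, col=1)
fig.add_trace(go.Heatmap(z=cm_b, coloraxis="coloraxis"), row=1, col=2)

# The interactive figure shows both matrices side by side in the UI.
task.get_logger().report_plotly(
    title="confusion matrices", series="side by side", iteration=0, figure=fig
)
```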
Hi CooperativeFox72
Sure 🙂 task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
Hi TightElk12
would it raise an error if the env where execution happens is not configured to track things on our custom server, to prevent logging to the public demo server?
What do you mean by that? catching the default server instead of the configured one ?
I see TightElk12
You can always set the OS environment variables CLEARML_API_HOST, CLEARML_WEB_HOST and CLEARML_FILES_HOST with the correct configuration. Or you can simply set CLEARML_NO_DEFAULT_SERVER=1, which will prevent any usage of the default demo server. wdyt?
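For example, the variables can be set in code before Task.init (the server addresses are placeholders):

```python
import os

# Point the SDK at your own server (placeholder addresses).
os.environ["CLEARML_API_HOST"] = "http://my-clearml-server:8008"
os.environ["CLEARML_WEB_HOST"] = "http://my-clearml-server:8080"
os.environ["CLEARML_FILES_HOST"] = "http://my-clearml-server:8081"

# Or simply refuse to ever fall back to the public demo server:
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"

from clearml import Task
task = Task.init(project_name="examples", task_name="env-configured")
```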
Hi @<1523701066867150848:profile|JitteryCoyote63>
Hi, how does agent.enable_git_ask_pass work?
Basically it pushes the password through stdin to git when it asks (it is a git feature).
one of the two experiments for the worker that is running both experiments
So this is the actual bug ? I need some more info on that, what exactly are you seeing?
and what are their names ?
worker:0 worker:1 etc ?
JitteryCoyote63
Picks a new experiment on top of the long one running
This is very very strange. Is the long running experiment being logged (i.e. do you still see console output in the UI)?
(just using local server not connected to Internet), am I right?
You can if you host your own git server. Or, if your code is a single file / jupyter notebook, the entire code is stored on the Task.
btw: what is the exact setup, how come there is no git repo?
It's stored on the Task; you can see it under the Execution tab in the UI.
The agent works when I am running it from a virtual environment, but gets stuck in the same place every time when I use Docker
Can you please provide a log? I'm not sure what "stuck" means here.
Hi PanickyFish98
It verifies it has access to it when actually creating the Task, maybe it should be a warning?!
fyi: you can also change the value from the UI (under Execution output) or have a default one set in the clearml.conf used by the agent
Hi RobustHippopotamus53
The way "latest from branch" works:
On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix). The agent then pulls the latest commit from that branch and updates the Task back with the current commit ID (the latest on the branch at the time of execution). This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed. Could it be that you "force-pushed" a commit/squash, hence the "origina...
Verified, you are correct: "." in label enumeration will break the clone.
I'll make sure this bug is passed to backend guys to fix. Thanks TenseOstrich47 !
meanwhile maybe "_" instead ? 😁
Weird issue, I'll make sure we fix compatibility with python 3.9
Hi SteadyFox10, the way it works is that Trains limits the debug image history by reusing the same file names, so the UI will only present the iterations for which the debug images are relevant. With your sample code it looks like it exposes a bug: the generated link should contain the iteration number, but it does not, so it overwrites the debug images every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
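For context, debug images are reported per iteration and the history length can be controlled explicitly; a minimal sketch with dummy image data:

```python
import numpy as np
from clearml import Task

task = Task.init(project_name="Test", task_name="test_images")
logger = task.get_logger()

for iteration in range(10):
    img = (np.random.rand(64, 64, 3) * 255).astype("uint8")  # dummy image
    # max_image_history keeps several historical copies instead of
    # overwriting the same file name every iteration.
    logger.report_image(
        title="Test", series="debug sample", iteration=iteration,
        image=img, max_image_history=5,
    )
```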
I think this is due to the label map including some keys with a "." in them.
Hi TenseOstrich47, what do you mean by "label"?
I will create a minimal example.
Many thanks ReassuredTiger98 !
One option is definitely having a base image that has the things needed. Anything else? Thanks!
This is a bit complicated: to get the cache to kick in you have to mount an NFS volume into the pod as the cache (to create a persistent cache).
Basically, spin up an NFS pod to store the cache, and change the glue job template yaml to mount it into the pod (see the default cache folders: /root/.cache/pip and /root/.clearml/pip-download-cache).
Make sense?
but this is not different from not using clearml-data,
ReassuredTiger98, just making sure we are on the same page: with clearml-data, immutability is fixed, i.e. the user cannot change the content of the dataset (it is actually compressed and uploaded). If you want to change it, you create a new child version.
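A rough sketch of creating a child version instead of mutating the parent (the names and parent dataset id are placeholders):

```python
from clearml import Dataset

# Create a new version whose parent is the existing, immutable dataset.
child = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    parent_datasets=["<parent_dataset_id>"],  # placeholder id
)

child.add_files(path="./new_or_changed_files")
child.upload()    # content is compressed and uploaded, as described above
child.finalize()  # once finalized, the child version is immutable as well
```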