i.e. change them per experiment ?
I think this is the only mount you need:
Data persisted by ClearML in any Kubernetes volume will be accessible in the /tmp/clearml-kind folder on the host.
SuccessfulKoala55 is this correct ?
I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R
I think the main issue is running with python -m module.name --args
Which is a bit different when trying to "understand" what the actual repository is.
Can you try to run it from the repository folder (same command, just to see if it has any effect on the detected packages)?
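Something along these lines (the repository path here is just a placeholder):
cd /path/to/your/repo
python -m module.name --args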
ok, but this happens on my local machine, not in the agent
resource monitoring is always running in the background, even on local machines. (of course you can turn it off)
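For example, a minimal sketch of turning it off via Task.init (the project/task names are placeholders):

from clearml import Task

# auto_resource_monitoring=False disables the background CPU/GPU/memory monitor
task = Task.init(project_name='examples', task_name='no monitoring',
                 auto_resource_monitoring=False)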
error in my-package setup command:
Okay this seems like an error in the setup.py you have in the "mypackage" folder
WackyRabbit7
regular trains-agent modus operandi is one job at a time (i.e. until the Task is done, no other Tasks will be pulled from the queue).
When adding --services-mode, it is Not 1-1 but 1-N, meaning a single trains-agent will launch as many Tasks as it can.
The trains-agent pulls a job from the queue and spins up a docker container (only docker is supported for the time being) and lets the job run in the background (the job itself will be registered as another "worker" in the system). Then the...
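For reference, launching it looks roughly like this (the queue name is just an example):
trains-agent daemon --queue services --services-mode --docker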
DeliciousBluewhale87 basically any solution that is compliant with the S3 protocol will work. An example (host/port are placeholders):
output_uri="s3://HOST:PORT/bucket/folder"
Are you sure Nexus supports this protocol ?
I "think" nexus sits on top of a storage solution (like am object storage), meaning we can use the same storage solution Nexus is using.
Just to clarify we do not support the artifactory protocol Nexus provides for storing models/artifacts. But we do support it as a source for python packages used by the a...
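A minimal sketch of what that would look like in code (endpoint/bucket are placeholders, and the credentials still have to be configured in clearml.conf):

from clearml import Task

# any S3-protocol-compliant store can serve as the output destination
task = Task.init(project_name='examples', task_name='s3 output',
                 output_uri='s3://my-store-host:9000/bucket/folder')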
Hi GiganticTurtle0
You can have ClearML keep following the dictionary and auto-updating the UI:
args = task.connect(args)
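In full, a minimal sketch (the dictionary contents are just an example):

from clearml import Task

task = Task.init(project_name='examples', task_name='connect args')
args = {'lr': 0.001, 'batch_size': 32}
# connect() keeps tracking the dictionary, so later changes show up in the UI
args = task.connect(args)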
they are efs mounts that already exist
Hmm, that might be more complicated to restore, right ?
Yes! That's exactly what I meant. As you can see, the Triton backend was not able to load your model. I'm assuming that's because it was not converted to TorchScript, like we do in the original example:
https://github.com/allegroai/clearml-serving/blob/6c4bece6638a7341388507a77d6993f447e8c088/examples/pytorch/train_pytorch_mnist.py#L136
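The conversion step is roughly this (the model here is a stand-in for your trained network):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 2))  # stand-in for your trained model
scripted = torch.jit.script(model)      # convert to TorchScript
scripted.save('serving_model.pt')       # a file the Triton pytorch backend can load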
GrievingTurkey78
Both are now supported, they basically act the same way 🙂
and log the overrides + the final OmegaConf
For classification it's the F1 score, but for other tasks it may be something else, and I don't think that's a problem. We just have to log it, right?
Correct 🙂
Give me a few days, I will work on your suggestions and then let you know if I am not able to do this
Sounds good!
BTW:
previous_tasks = Task.get_tasks(task_filter={'tags': 'best'})
local_model_file = previous_tasks[0].artifacts['my_model'].get_local_copy()
Thanks SmallDeer34 !
Would you like us to? How about a footnote/acknowledgement?
How about a reference / footnote ?
@misc{clearml,
  title = {ClearML - Your entire MLOps stack in one open-source tool},
  year = {2019},
  note = {Software available from http://github.com/allegroai/clearml},
  url = {https://clear.ml},
  author = {allegro.ai},
}
Hi JitteryCoyote63
Signal 9 is the kill signal (SIGKILL), could it be that someone killed the process? Do you have other logs to share? Is this reproducible?
this topic is about the issue with reporting a configuration with a string inside a tuple that has a backslash
So the encoding itself is done YAML style, and based on your example \b has to be encoded as \\b, because this is string encoding, just like \n will become a "new line"
Make sense ?
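To illustrate in python (just a toy example):

# '\\b' in source code encodes the two literal characters backslash + b,
# while '\n' decodes to an actual new line
s = 'path\\b'
print(s)          # -> path\b
print(len('\n'))  # -> 1, i.e. a single newline character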
Is it possible to substitute these steps using containers instead?
I'm not sure I follow, could you expand ?
Verified, and already fixed with 1.0.6rc2
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? This seems out of scope for this channel
Can you expand?
for example, one notebook will be dedicated to exploring columns, spotting outliers and creating transformations for specific column values.
This actually implies each notebook is a standalone "process", which makes a ton of sense. But this is where notebooks and proper SW design break: in traditional SW the notebooks would be python files, and then of course you can import one from another; unfortunately this does not work in notebooks...
If you are really keen on using notebooks I wou...
Hi JitteryCoyote63 could you provide a bit more details, is this a repo for a python module (i.e. in the installed packages) or a submodule ?
Hi @<1619867994005966848:profile|HungryTurtle13>
I'm using Python's joblib library and the Parallel class to run an experiment in multiple parallel threads.
I believe joblib creates subprocesses, not threads, but yes, you are correct.
Basically once Task.init is called, every forked/spawned process will be automatically logged to the main process Task (you can, and probably should, call either Task.init or Task.current_task() from the forked processes, but this is just a detail)
The mai...
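A minimal sketch of what I mean (the worker function and names are placeholders):

from clearml import Task
from joblib import Parallel, delayed

def worker(i):
    # re-attach to the main process Task from inside the subprocess
    task = Task.current_task()
    task.get_logger().report_scalar('workers', 'value', value=i, iteration=i)
    return i * i

if __name__ == '__main__':
    task = Task.init(project_name='examples', task_name='joblib parallel')
    results = Parallel(n_jobs=2)(delayed(worker)(i) for i in range(4))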
This, however, requires that I slightly modify the clearml helm chart with the aws-autoscaler deployment, right?
Correct 🙂
Nice! TrickySheep9 any chance you can share them ?
Well (yes, I think); the environment section is used mostly for logging. The next version of the clearml-agent (due next week) will fully support it, and the next release of clearml-server will add bash-script support.
So is there any tutorial on this topic
Dude, we just invented it 🙂
Any chance you feel like writing something in a github issue, so other users know how to do this ?
Guess I’ll need to implement job scheduling myself
You have a scheduler: it will pull jobs from the queue in order, then run them one after the other (one at a time)
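i.e. something along these lines (the queue name is just an example):
trains-agent daemon --queue my_queue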