Hi PompousBeetle71, Trains will log all the torch.save calls; I'm assuming it is not actually used for the rest of the files in that folder.
If you'd like to share a code snippet, we could see if we can auto-magically log it. Alternatively, you could use artifacts and store the entire folder: it will be zipped and uploaded, and then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
` task.upload_artifact('transformer', './my_...
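For reference, a minimal sketch of the full flow, assuming the standard `upload_artifact` behavior (project, task, and folder names here are illustrative):

```python
from clearml import Task  # `from trains import Task` on older installs

task = Task.init(project_name='examples', task_name='artifact demo')

# Passing a folder: it is zipped and uploaded as a single artifact
task.upload_artifact('transformer', artifact_object='./my_model_dir')

# Reusing it from another experiment: fetch a local (cached) copy of the zip
source = Task.get_task(project_name='examples', task_name='artifact demo')
local_zip = source.artifacts['transformer'].get_local_copy()
```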
Yes, I think you are correct, verified on Firefox & Chrome. I'll make sure to pass it along.
Thanks SteadyFox10 !
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
You mean like `Task.get_task("task_id_here").get_base_docker()`?
...a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually at the container level, meaning it is measured from inside the container, so it should be what you are looking for.
Hi RoughHedgehog31
I'm assuming your git diff is just too big to be stored as is (probably some binary files)
It should not really have any effect on the execution; it just means the clearml-agent will not be able to reproduce the uncommitted changes.
Make sense?
`load_model` will get a link to a previously registered URL (i.e. it searches for a model pointing to that specific URL; if it finds one, it returns the Model object).
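As an illustration of that lookup-by-URL behavior, a sketch using `InputModel.import_model`, which (to the best of my understanding) returns the existing model entry when one is already registered for that URL; the URL below is hypothetical:

```python
from clearml import InputModel

# Searches for a model registered with this URL; if one is found, the
# existing Model entry is returned instead of a new one being created
model = InputModel.import_model(weights_url='s3://my-bucket/models/model.pt')
weights_path = model.get_weights()  # local (cached) copy of the weights file
```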
No -- that section is blank.
This is the main issue: it should be filled with the auto-detected requirements.
The entire script was executed from within VSCode, and the Task was created but it was not prefilled with anything?
Just making sure, you called Task.init inside your code?
@<1541954607595393024:profile|BattyCrocodile47> first let me say I ❤ the dark theme you have going on there, we should definitely add that 🙂
When I run `python set_triggers.py; python basic_task.py`, they seem to execute, b...
Seems like you forgot to start the trigger, i.e. None (this will cause the entire script of the trigger inc...
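For context, a minimal sketch of a trigger script that does call start, assuming the `TriggerScheduler` API (the trigger arguments are illustrative):

```python
from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=1)
scheduler.add_task_trigger(
    name='retrain-on-publish',           # illustrative trigger definition
    schedule_task_id='<task-id-to-launch>',
    schedule_queue='default',
    trigger_project='examples',
    trigger_on_status=['published'],
)
# Without this call the script just exits and the trigger is never evaluated.
# start() blocks and polls; use start_remotely() to run it on an agent instead.
scheduler.start()
```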
BTW: there is a full Pipeline class that does everything for you, example here:
https://github.com/allegroai/clearml/tree/master/examples/pipeline
EnviousStarfish54 data versioning in the open source version leverages the artifact, storage, and caching capabilities of Trains.
A simple workflow (a rough sketch follows the links):
- Upload data: https://github.com/allegroai/events/blob/master/odsc20-east/generic/dataset_artifact.py
- Preprocess data: https://github.com/allegroai/events/blob/master/odsc20-east/generic/process_dataset.py
- Use data: https://github.com/allegroai/events/blob/master/odsc20-east/scikit-learn/sklearn_jupyter.ipynb
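A rough sketch of that artifact-based flow, assuming the standard Task/artifact API (project, task, and artifact names are illustrative; each function stands for a separate script/experiment):

```python
from clearml import Task

def upload_data():
    # Step 1: store the raw folder as an artifact (it is zipped on upload)
    task = Task.init(project_name='datasets', task_name='raw data')
    task.upload_artifact('data', artifact_object='./raw_folder')

def preprocess_data():
    # Step 2: fetch a local (cached) copy, transform it, store the result
    task = Task.init(project_name='datasets', task_name='preprocess')
    source = Task.get_task(project_name='datasets', task_name='raw data')
    local_raw = source.artifacts['data'].get_local_copy()
    # ... transform local_raw into ./processed_folder ...
    task.upload_artifact('processed', artifact_object='./processed_folder')
```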
Just to make sure, the first two steps are working?
Maybe it has to do with the fact that the "training" step specifies a docker image; could you try removing it and checking?
BTW, a few pointers (sketched below):
- The `return_values` argument is used to specify multiple returned objects stored individually, not the type of the object. If there is a single object, there is no need to specify it.
- The `parents` argument is optional; the pipeline components optimize execution based on inputs. For example, in your code, all pipeline comp...
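A minimal sketch of both points, assuming the `PipelineDecorator` API (names are illustrative):

```python
from clearml import PipelineDecorator

# return_values names two separately-stored outputs (not their types)
@PipelineDecorator.component(return_values=['x_train', 'x_test'])
def split_data(ratio=0.8):
    data = list(range(100))
    cut = int(len(data) * ratio)
    return data[:cut], data[cut:]

# no parents= needed: the dependency on split_data is inferred from the input
@PipelineDecorator.component()
def train(x_train):
    return sum(x_train) / len(x_train)

@PipelineDecorator.pipeline(name='demo', project='examples', version='1.0')
def pipeline_logic():
    x_train, x_test = split_data()
    train(x_train)

if __name__ == '__main__':
    PipelineDecorator.run_locally()  # debug locally; drop this to run on agents
    pipeline_logic()
```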
I see the problem now: conda is failing to install the package from the git, then it reverts to pip install, and pip just fails... " //github.com/ajliu/pytorch_baselines "
OH I see. I think you should use the environment variable to override it:
None
So add to the docker args something like:
`-e CLEARML_AGENT__AGENT__PACKAGE_MANAGER__POETRY_INSTALL_EXTRA_ARGS=`
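If the agents run in docker mode, one place to put it is the agent's clearml.conf; a sketch, assuming the `extra_docker_arguments` setting is available in your agent version:

```
# clearml.conf on the agent machine
agent {
    extra_docker_arguments: ["-e", "CLEARML_AGENT__AGENT__PACKAGE_MANAGER__POETRY_INSTALL_EXTRA_ARGS="]
}
```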
Are you saying you have a single line in the console output of the component Task?
in order to work with ssh cloning, it looks like one has to manually install openssh-client into the docker image
Correct, you have to have SSH inside the container so that git can use it.
You can always install it with the following setting inside your agent's clearml.conf: `extra_docker_shell_script: ["apt-get install -y openssh-client", ]`
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L145
Yes, the same will work with artifacts: just pass the full URL as the artifact_object and it will be registered as is.
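For example, a sketch based on that behavior (the URL is hypothetical):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='register remote artifact')
# A full remote URL is registered as-is; nothing is re-uploaded
task.upload_artifact('pretrained_weights', artifact_object='s3://my-bucket/weights.zip')
```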
Okay, I have an idea: it could be a lock that another agent/user is holding on the cache folder, or something similar.
Let me check something
Hi JitteryCoyote63
The easiest way is to inherit from the ResourceMonitor class and change the default logging rate (you could also disable some of the metrics).
https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/task.py#L565
Then pass the new class to Task.init as `auto_resource_monitoring`.
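A minimal sketch of that, assuming the `ResourceMonitor` constructor accepts the reporting-frequency keyword shown (verify against your clearml version):

```python
from clearml import Task
from clearml.utilities.resource_monitor import ResourceMonitor

class SlowResourceMonitor(ResourceMonitor):
    def __init__(self, task, **kwargs):
        # report machine metrics every 5 minutes instead of the default
        kwargs['report_frequency_sec'] = 300  # assumed keyword argument
        super().__init__(task, **kwargs)

task = Task.init(
    project_name='examples',
    task_name='slow monitoring',
    auto_resource_monitoring=SlowResourceMonitor,  # pass the class itself
)
```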
Hi @<1661542579272945664:profile|SaltySpider22>
question 1: are parallel writes to a dataset with the same version possible?
When you say parallel, what do you mean? From multiple machines?
What's the recommended way to append to the dataset in a future version?
Once a dataset is finalized, the only way to add files is to create a new version that inherits from the previous one (i.e. the finalized version becomes the parent of the new version).
If you are worried about multip...
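A minimal sketch of that parent/child versioning, assuming the clearml Dataset API (project and dataset names are illustrative):

```python
from clearml import Dataset

# the latest finalized version becomes the parent of the new version
parent = Dataset.get(dataset_project='examples', dataset_name='my_dataset')
child = Dataset.create(
    dataset_project='examples',
    dataset_name='my_dataset',
    parent_datasets=[parent.id],
)
child.add_files('./new_files')  # only the added/changed files are stored
child.upload()
child.finalize()
```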
I'm saying that because in the task, under "INSTALLED PACKAGES", this is what appears
This is exactly what I was looking for. Thanks!
Yes that makes sense, I think this bug was fixed a long time ago, and this is why I could not reproduce it.
I also think you can use a later version of clearml 🙂
Thank you @<1523701949617147904:profile|PricklyRaven28> !!!
Let me see if we can reproduce and how to solve it
Hi @<1715175986749771776:profile|FuzzySeaanemone21>
and then run "clearml-agent daemon --gpus 0 --queue gcp-l4" to start the worker.
I'm assuming the docker service cannot spin up a container with GPU access; usually this means you are missing the nvidia docker runtime component.
Thanks MagnificentSeaurchin79 !
Let me check what's the status with this one, could it be the same as this one?
https://github.com/allegroai/clearml/issues/322
Can my request be made into a new feature, so that we can tag the same type of graphs under one main tag?
Sure, open a Git Issue :)
Just call Task.init before you create the subprocess, that's it 🙂 they will all automatically log to the same Task. You can also call Task.init again from within the subprocess; it will not create a new experiment, but will reuse the main process experiment.
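A minimal sketch of that pattern (project/task names are illustrative):

```python
from multiprocessing import Process
from clearml import Task

def worker(idx):
    # inside the subprocess this returns the main process's Task,
    # not a new experiment
    task = Task.init(project_name='examples', task_name='multi-process')
    task.get_logger().report_scalar('worker', str(idx), value=idx, iteration=0)

if __name__ == '__main__':
    task = Task.init(project_name='examples', task_name='multi-process')
    procs = [Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```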
Ohh, try to add `--full-monitoring` to the `clearml-agent execute` command:
None
MagnificentSeaurchin79 no need for the detection API (yes, definitely a mess to set up); it would be more helpful to get a toy example.
IrritableJellyfish76 point taken; suggestions on improving the interface?
Hi @<1523715429694967808:profile|ThickCrow29> , thank you for pinging!
We fixed the issue (hopefully); can you verify with the latest RC, 1.14.0rc0?