So the original looks good, could it be you tried to clone a Task that was executed with an agent with pip, and then pushed into an agent running conda?
JitteryCoyote63 the agent.cuda_version (or the CUDA_VERSION env var) tells the agent which pytorch wheel to download. The cuDNN library can be bundled inside any wheel, and it will work as long as the CUDA / cudart libraries exist on the system; for example, pytorch wheels include the cuDNN version they use. agent.cudnn_version should actually be deprecated, as it is not actually used.
For future reference, the dependency order is:
Nvidia drivers
CUDA library and CUDA-runtime libraries (libcuda.so / libcudart.so)
CUDN...
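To sanity-check that chain on a machine, here is a small standard-library-only Python sketch (not part of clearml) that asks the dynamic loader which layers it can see. Note that libraries bundled inside a wheel are loaded via rpath and may not show up here, so "missing" is only a hint:

```python
import ctypes.util

def cuda_stack_report():
    """Report which layers of the CUDA dependency chain the dynamic
    loader can find, in the order they must be satisfied:
    driver -> CUDA runtime -> cuDNN."""
    layers = {
        "driver (libcuda.so)": "cuda",       # ships with the NVIDIA driver
        "runtime (libcudart.so)": "cudart",  # ships with the CUDA toolkit (or a wheel)
        "cudnn (libcudnn.so)": "cudnn",      # often bundled inside the pytorch wheel
    }
    return {name: ctypes.util.find_library(lib) is not None
            for name, lib in layers.items()}

if __name__ == "__main__":
    for name, found in cuda_stack_report().items():
        print(f"{name}: {'found' if found else 'missing'}")
```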
Hi @<1523701797800120320:profile|SteadySeagull18>
...the job -> requeue it from the GUI, then a different environment is installed
The way that it works is: in the "originating" (i.e. first manual) execution, only the directly imported packages are listed (not the derivative packages that are required by the original packages)
But when the agent reproduces the job, it creates a whole clean venv for the experiment, installs the required packages, then pip resolves the derivatives, and ...
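The derivative packages come from each distribution's declared requirements. As an illustration (using only the standard library, with `pip` as an arbitrary example of an installed distribution), this is the metadata pip walks recursively when it resolves dependencies in that clean venv:

```python
from importlib import metadata

def declared_requirements(dist_name):
    """Return the requirements an installed distribution declares.
    pip walks this same metadata, recursively, when resolving the
    derivative packages for the agent's clean venv."""
    return metadata.requires(dist_name) or []

if __name__ == "__main__":
    # Any installed distribution name works here; "pip" is just an example
    print(declared_requirements("pip"))
```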
There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)
Yes, that makes sense, especially from an IT failure perspective
Hi RipeGoose2, all PRs are welcome, feel free to submit :)
Hi ThankfulOwl72, check out the TrainsJob object. It should essentially do what you need:
https://github.com/allegroai/trains/blob/master/trains/automation/job.py#L14
You can always access the entire experiment data from python:
`Task.get_task(task_id).data`
It should all be there.
What's the exact use case you had in mind?
In the case of scalars it is easy to see (the maximum number of iterations is a good starting point)
Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached shows the preprocessing was executed and the GPU backend returned an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly
Hi TrickyRaccoon92
If you are reporting to TensorBoard, then "iteration" equals step. Is this the case?
For example, for some of our models we create pdf reports, that we save in a folder in the NFS disk
Oh, why not store them as artifacts? At least you will be able to access them from the web UI, and avoid NFS credential hell 🙂
Regarding clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
I'm sorry, my bad, this is use_current_task:
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
Hi PunyGoose16,
I think the website is probably the easiest 🙂
https://clear.ml/contact-us/
I think they get back to you quite quickly
BTW: what happens if you pass the same s3://bucket to Task.init output_uri? I assume you are getting the same access issue?
TrickyRaccoon92 actually Click is on the to-do list as well ...
Hi WorriedParrot51, what do you mean by "call get_parameters_as_dict() from agent"?
Do you mean like change the trains-agent to run the task differently?
Or inside your code while the trains agent runs it?
From the code itself (regardless of how you run it) you can always call it and get the current state's parameters (i.e. from the backend if running with trains-agent, or copied from the code, if running manually):
task.get_parameters_as_dict()
Yey! Okay, let me make sure we add this feature to the Task.init arguments so one can control it from code 🙂
I thought the dataset was only linked to the fileserver and not to the specific URL used to upload it.
ShinyRabbit94 yep, exactly! The idea is that you can actually use any storage solution (S3/GS etc.); the file server is just the default one 🙂
Yes, in tandem with the experiments (because they constantly log to the server).
That said, with 0.16 we added offline mode, so you can run in offline mode, then import the experiment into the system.
Ohh! I see now
@<1526371965655322624:profile|NuttyCamel41> the "backend": "pytorch" setting is not really supported because it does not use the optimized Triton engine (which is the reason to run the Triton server)
In order to use pytorch you need to convert it to TorchScript and then deploy it; see example here:
https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/examples/pytor...
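A minimal sketch of that conversion, with a toy module standing in for the real trained network (the shapes and file name are made up):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Toy model standing in for the trained network."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet().eval()
example_input = torch.randn(1, 4)

# Trace the model into TorchScript; the saved file is what gets
# deployed so Triton can serve it with its optimized engine.
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")
```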
Hi @<1715900760333488128:profile|ScaryShrimp33>
hi everyone! I'm trying to save my model's weights to storage. And I can't do it.
See example here: None
or
task.update_output_model(model_path="/path/to/model.pt")
C will be submitted to a different queue and I don't care as much
Is there a way to define "task affinity" in this way?
Hi RoughTiger69,
When you say Task affinity, do you mean you want C to be executed next to A/B? Affinity as a concept doesn't really exist; it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one of the queues (in theory you might be able to programmatically control the queue of C), wdyt?
OutrageousSheep60 before I can answer, maybe you can explain why "zipping" them does not fit your workflow?
Hi ConvolutedBee40
If we deploy a task to clearml-server, will it automatically scale?
The way it works is with agents and the agent k8s glue, basically using k8s as a resource allocator and the clearml-agent as orchestrator. Did that answer the question?
If I directly edit the OmegaConf in the UI then the port changes correctly
This will only work if you change the Hydra/allow_omegaconf_edit to True in the UI. Did you?
I meant even just a link to a blank comparison and one can then add the experiments from that view
Just making sure you are aware, once you are in comparison you can always add Tasks (any Task):
Notice you can press "Add experiments", then select any experiment (including from all projects, via the filters)
Notice you need to remove all filters (the red X on the right side of the filter icon)
@<1527459125401751552:profile|CloudyArcticwolf80> what are you seeing in the Args section?
What exactly is not working?
What's the trains-server version?
PompousParrot44 with pleasure. If during your search for a solution you come across something that solves it and might integrate with the agent, do not hesitate to suggest it :)