Do you think the local agent will be supported someday in the future?
We can take this code sample and extend it, can't see any harm in that.
It will make it very easy to run "sweeps" without any "real agent" installed.
I'm thinking of rolling out multiple experiments at once
You mean as multiple subprocesses? Sure, if you have the memory for it
Hi BurlyRaccoon64
What do you mean by "custom_build_script"? Not sure I found it in "clearml.conf"
https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf
But once I see it in the UI it means it was already launched somewhere, so I didn't quite get you.
The idea is you run it locally once (think debugging your code, or testing it)
While running the code the Task is automatically created, then once in the system you can clone / launch it.
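A minimal sketch of that flow (project, task, and queue names here are placeholders):
```python
from clearml import Task

# running this locally once auto-registers the Task in the system
task = Task.init(project_name="examples", task_name="my experiment")

# later, clone the registered Task and enqueue the clone
# so an agent picks it up for remote execution
cloned = Task.clone(source_task=task, name="my experiment (clone)")
Task.enqueue(cloned, queue_name="default")
```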
Also, I want to launch my experiments on a Kubernetes cluster and I don't actually have any docs on how to do that, so an example would be helpful here.
We are working on documenting the full process, ...
Failed to initialize NVML: Unknown Error
Yeah, this is a driver issue. I think you need to check that the VM image drivers match the GPU on that machine
Hi CleanPigeon16
can I make the steps in the pipeline use the latest commit in the branch?
Yes:
Manually: clone the step's Task (in the UI), then in the Execution section change to "last commit on branch" and specify the branch name. Programmatically: same as the above (clone + edit).
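For the programmatic route, a sketch (assuming the SDK's Task.set_script; the task ID and branch name are placeholders):
```python
from clearml import Task

# clone the step's template Task
step = Task.get_task(task_id="<step_task_id>")
cloned = Task.clone(source_task=step, name="step (latest commit)")

# clearing the pinned commit makes the agent check out
# the latest commit on the given branch
cloned.set_script(branch="my-branch", commit="")
```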
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found
Seems like the "run_experiment" step is not defined. Could that be ...
I see TrickyFox41, try the following:
--args overrides="param=value"
Notice this will change the Args/overrides argument that will be parsed by Hydra to override its params
task._wait_for_repo_detection()
You can use the above to wait until repository & packages are detected
(If this is something users need, we should probably make it a "public function" )
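For context, a minimal usage sketch (note the method is private, as mentioned above; project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="repo detection")
# repository & package detection runs in the background;
# this call blocks until it completes
task._wait_for_repo_detection()
# at this point the git repo and detected packages are stored on the Task
```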
JitteryCoyote63 hacky but sure :)
```
from trains.config import config_obj
print(config_obj)
```
CleanWhale17 per your request :)
- An automated ML Pipeline :)
- Automated Data Source Integration :)
- Data Pooling and Web Interface for Manual Annotation of Images (Seg. / Classif.) [Allegro Enterprise] or users integrate with open-source
- Storage of Annotation output files (versioned JSON) :)
- Online-Training Support (for Dataset Shifts) [Not sure what you mean]
- Data Pre-processing (filter/augment) [Allegro Enterprise] or users integrate with open-source
- Data-set visualization (stats...
you are correct, I was referring to the template experiment
I think what you need is to create an OutputModel, then call update_weights when you have a better model; this will also allow you to tag the model object. Would that help? Or would it make sense to use Task.models and count on the auto-logging?
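A minimal sketch of the OutputModel route (file name and tag are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model logging")
model = OutputModel(task=task, name="my-model")

# whenever a better checkpoint is produced, register its weights
model.update_weights(weights_filename="best_model.pt")
# and tag the model object
model.tags = ["best"]
```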
Then the dynamic GPU allocation is exactly what you need; I suggest talking to the sales ppl, I'm sure they can help. https://clear.ml/contact-us/
Hi UnsightlySeagull42
How can I reproduce this behavior ?
Are you getting all the console logs ?
Is it only the Tensorboard that is missing ?
Hi SubstantialElk6
UnicodeEncodeError: 'ascii' codec can't encode characters in position 296-297: ordinal not in range(128)
I'm assuming this is the usual UTF8 missing from the container.
Can you try to launch it with PYTHONIOENCODING=utf-8 ?
actually no
hmm, are those packages correct ?
Could it be the code is not in a git repository? ClearML supports either a single script or a git repository, but not a collection of standalone files. wdyt?
VexedCat68 are you manually creating the OutputModel object?
I'm sorry wrong line reference:
I'm assuming the error is due to ulimit missing:
try adding 16777216 to both soft/hard ulimit
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L58
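Something along these lines in the docker-compose.yml (a sketch only; the service name and which ulimit to raise are assumptions, check the linked line for your version):
```yaml
services:
  elasticsearch:
    ulimits:
      memlock:
        soft: 16777216
        hard: 16777216
```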
@<1523712386849050624:profile|NastyFox63>
is there a limit to the search depth for this?
Yes, the Task.init auto package listing covers only the first depth (i.e. directly imported packages);
the reason is that the derivative packages should be resolved by pip when the agent remotely executes that Task.
When the agent installs the Task, the entire python environment is stored, so that it is always fully reproducible.
Make sense ?
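If a package is missed by the depth-1 detection, you can add it explicitly before Task.init; a sketch (package name/version are placeholders):
```python
from clearml import Task

# must be called before Task.init(); adds a package the auto
# detection would otherwise miss (it only lists direct imports)
Task.add_requirements("some_package", "1.2.3")

task = Task.init(project_name="examples", task_name="manual requirements")
```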
And is the step actually "queued", or is it "queued" in the pipeline state (i.e. the visualization did not update)?
Hi DilapidatedDucks58
trains-agent tries to resolve the torch package based on the specific cuda version inside the docker (or on the host machine if used in virtual-env mode). It seems to fail finding the specific version "torch==1.6.0.dev20200421+cu101"
I assume this version was automatically detected by trains when running manually. If this version came from a private artifactory you can add it to the trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L...
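A sketch of what that might look like in trains.conf (assuming the package_manager.extra_index_url option; the artifactory URL is a placeholder):
```
agent {
    package_manager {
        # extra PyPI index hosting the private torch build
        extra_index_url: ["https://my.artifactory.example.com/api/pypi/simple"]
    }
}
```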
Thanks!
Hmm from here : None
Could it be you do not have privileges to the resource, or that you did not provide credentials ?
Did that autoscaler work before ?
Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached means the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly
AttributeError: 'NoneType' object has no attribute 'base_url'
can you print the model object ?
(I think the error is a bit cryptic, but generally it might be that the model is missing an actual URL link?)
print(model.id, model.name, model.url)
TroubledHedgehog16 generally speaking you can expect about 10 API calls per minute if you have many reports, and about 3 per minute with low reporting. We just optimized the SDK so that in cases where there are lots of consecutive reports they are better batched; I would recommend the latest RC
Hi RoundMosquito25
This is a bit old but probably a good start:
https://clear.ml/blog/stacking-up-against-the-competition/
tl;dr
ClearML advantages (at least a few I can think of)
- Scales way better
- Enables out-of-the-box experiment orchestration (i.e. remote execution etc.)
- Data management
- Nicer UI
- Full RestAPI
- Full MLOps platform
- Model serving
- Query-able model repository
Probably more :)
this is very odd, can you post the log?