- Be able to trigger the "pure" function (e.g. train()) locally, without any clearml code running, while driving it from a configuration, e.g. a path to the data.
When you say "without any clearml code" do you mean without the agent, or without using the Clearml.Dataset?
Be able to trigger the "decorator" (e.g. train_clearml()) while driving it from configuration, e.g. dataset_id
Hmm I can think of:
```
def train_clearml(local_folder=None, dataset_id=None):
    ...
```
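A minimal sketch of how that signature could resolve the data source (assuming the Dataset SDK is the intended entry point; the body below is illustrative, not from the original message):

```python
from clearml import Dataset

def train_clearml(local_folder=None, dataset_id=None):
    # If a dataset_id is given, pull a local copy of the ClearML Dataset;
    # otherwise use the plain local folder as-is.
    if dataset_id:
        local_folder = Dataset.get(dataset_id=dataset_id).get_local_copy()
    train(local_folder)  # the "pure" train() function from the original question
```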
JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically you should add torch to the requirements.txt file and provide it when creating the Task, or use Task.force_requirements_env_freeze
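Both options in a minimal sketch (project/task names are placeholders):

```python
from clearml import Task

# Option 1: explicitly add torch so the agent resolves the matching CUDA build
Task.add_requirements("torch")  # a version can be pinned, e.g. ("torch", "1.13.1")

# Option 2: freeze the environment from a requirements file instead
# Task.force_requirements_env_freeze(requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="torch-requirements")
```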
Hi RoughTiger69
How about using the pipeline decorator as a way to run this logic?
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
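Loosely following the linked example, the shape would be something like this (the names and the dataset logic are placeholders):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data_path"])
def prepare_data(dataset_id):
    # each component runs as its own Task when executed remotely
    from clearml import Dataset
    return Dataset.get(dataset_id=dataset_id).get_local_copy()

@PipelineDecorator.pipeline(name="train pipeline", project="examples", version="0.1")
def run_pipeline(dataset_id):
    data_path = prepare_data(dataset_id)
    print("training data at", data_path)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug the whole logic in the local process
    run_pipeline(dataset_id="<dataset_id>")
```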
I think I'm missing the context of where the code is executed....
btw: you can now set the configuration_objects directly when calling add_step 🙂
https://clearml.slack.com/archives/CTK20V944/p1633355990256600?thread_ts=1633344527.224300&cid=CTK20V944
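Mapping that to the PipelineController API, it would look roughly like this (assuming configuration_overrides is the argument in question; names and values are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="pipe", project="examples", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="base train task",
    # set a configuration object directly on the step's cloned Task
    configuration_overrides={"General": "lr=0.1\nbatch=32"},
)
```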
Hmm yes, we should probably provide metrics:
```
client.workers.get_stats(..., items=[dict(key='cpu_usage'), dict(key='gpu_usage')])
```
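For reference, a full call through the APIClient might look like this (a sketch; the time window and interval are arbitrary):

```python
import time
from clearml.backend_api.session.client import APIClient

client = APIClient()
now = time.time()
stats = client.workers.get_stats(
    from_date=now - 3600,  # last hour
    to_date=now,
    interval=60,  # seconds per sample bucket
    items=[dict(key="cpu_usage"), dict(key="gpu_usage")],
)
print(stats)
```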
Hi SubstantialElk6
you can do:
```
from clearml.config import config_obj
config_obj.get('sdk')
```
You will get the entire configuration tree of the SDK section (if you need sub-sections, you can access them with '.' notation, e.g. sdk.storage)
Right, so this "vault" design is built into the paid tiers of ClearML to achieve exactly that. Long story short, users can put their credentials/configs on the clearml-server and the agent (or the clients) will pull and merge them into the execution.
It's very cool and works really nice, but not part of the open source (or the SaaS tier).
What you could do is store these configurations on the Task itself (one way or another). Maybe for example have an empty definitions.py
file part of ...
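One way to attach such a configuration to the Task (a sketch; the dict contents and names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="vault-workaround")

# stored on the Task; when an agent re-executes the Task,
# connect_configuration returns the stored (possibly edited) copy
config = task.connect_configuration(
    {"endpoint": "https://api.example.com", "token": "changeme"},
    name="definitions",
)
```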
why doesn't this happen on my other experiments?
same 100+ reports ?
(My new theory is that calling Task.reload() will fix it, and it might be called internally for the other experiments, like when reporting models/artifacts)
Could that be the case ?
How about this one:
None
so would that be "tags" "parents" ?
Hi LazyLeopard18 ,
So long story short, yes it does.
Longer version: to really accomplish full federated learning with control over data at the "compute points", you need some data abstraction layer. Without a data abstraction layer, federated learning is just averaging derivatives from different locations; this can be easily done with any distributed learning framework, such as Horovod, PyTorch Distributed, or TF Distributed.
If what you are after is, can I launch multiple experiments with the sam...
LazyLeopard18 could you explain some more on the specific use case you have in mind?
TroubledHedgehog16 generally speaking you can expect about 10 API calls per minute if you have many reports, and about 3 per minute at a low report rate. We just optimized the SDK so that in cases where there are lots of consecutive reports they are batched better; I would recommend the latest RC
Hi @<1523701111020589056:profile|DefiantSpider5>
So there are two answers here, I'll start with the open-source version of both
Is there a way in ClearML to interactively view subsets of images based on a lasso selection on embedding plots
The ClearML Datasets have no "query" capabilities of the data inside the dataset. That means you can see preview images, statistics and download the datasets, but no query capabilities. On the other hand, there is no limitation on the type and format of me...
Hi FierceHamster54
Do I need to instantiate a task inside my component ? Seems a bit redundant....
Yes, so the idea is that the Task (along with the code) will be automatically linked with the output model, for better traceability.
That said, you can "import" a model into the system (i.e. it was created somewhere else and you want to register it) with InputModel.import_model
https://clear.ml/docs/latest/docs/clearml_sdk/model_sdk#importing-models
I guess "Input" from that perspecti...
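For reference, importing an external model file might look like this (a sketch; the weights location and name are placeholders):

```python
from clearml import InputModel

# register an externally created model file with the clearml-server
model = InputModel.import_model(
    weights_url="s3://my-bucket/models/model.pt",
    name="externally-trained-model",
)
```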
Yeah I think using voxel for forensics makes sense. What's your use case ?
I'm hoping i can find an end to end solution that also includes experiment management
Well of course biased here, but ClearML with the hyperdatasets is probably the most complete one.
Specifically for model performance analysis I would add the voxel open-source tool to dissect specific results, but the combination of the abstraction and query capabilities of hyperdatasets, orchestration, and experiment management is really unmatched.
(and again of course I'm biased, but really there is n...
do I still need to specify a OutputModel
No need, only if you want to upload a local model file (but I assume in this case, no new model is created)
Hi @<1533982060639686656:profile|AdorableSeaurchin58>
Notice the scalars and console are stored on the elasticsearch DB, this is usually under /opt/clearml/data/elastic_7
Hi @<1523704157695905792:profile|VivaciousBadger56>
You should replace
task.mark_completed()
with:
task.close()
To your point
parameters = task.connect(parameters)
will be retrieved with:
task.get_parameters()
fyi:
connect_configuration -> get_configuration_objects
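Putting the round-trip together (a sketch; project/task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params-roundtrip")

parameters = {"lr": 0.01, "epochs": 10}
parameters = task.connect(parameters)  # register the dict as hyperparameters

print(task.get_parameters())  # e.g. {'General/lr': '0.01', 'General/epochs': '10'}

task.close()  # instead of task.mark_completed()
```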
What happened in the server configuration that all of a sudden you have zero ports open?
In the documentation it warns about
.close()
"Only call Task.close if you are certain the Task is not needed."
Maybe this is not clear enough: it means you no longer need to automatically add/log/track things into the Task from the current process.
This does Not mean you cannot access the Task or its artifacts.
Mark closed means to externally (i.e. not from the process that created the Task, maybe even from a different machine) close and mark the task as completed (this...
Yeah, the "-e ." option seems to fit this problem best 🙂
It seems like whatever I add to
docker_bash_setup_script
is having no effect.
If this is running with the k8s glue, the console output of the docker_bash_setup_script is currently Not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export in the docker_bash_setup_script:
```
export MY...
```
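For reference, the setup script can also be set from code (a sketch; the image and exported variable are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="bash-setup-test")
task.set_base_docker(
    "nvidia/cuda:11.8.0-runtime-ubuntu22.04",  # container image
    docker_setup_bash_script=[
        "export MY_TEST_VAR=1",          # simple export test
        "echo MY_TEST_VAR=$MY_TEST_VAR",
    ],
)
```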
Hi BoredHedgehog47
Just make sure it is installed as part of the "installed packages" 🙂
You should end up with something like git+ https://github.com/user/repo.git in the installed packages.
You can actually add it from your code:
```
Task.add_requirements("git+https://github.com/user/repo.git")
task = Task.init(...)
```
Notice you can also add a specific commit or branch:
```
git+https://github.com/user/repo.git@<commit_id_here_if_needed>
```
Is this what you are looking for?
EDIT:
you can also do "-e ." that should also work:
```
Task.add_requirements("-e .")
task = Ta...
```
StickyBlackbird93 the agent is supposed to solve for the correct version of pytorch based on the CUDA in the container. Sounds like for some reason it fails? Can you provide the log of the Task that failed? Are you running the agent in docker-mode, or inside a docker container?
Pretty confusing that neither
services
StickyLizard47 basically this is how a services-queue agent should be spun up:
https://github.com/allegroai/clearml-server/blob/9b108740da21f25407bd2c59583ca1c86f8e1faa/docker/docker-compose.yml#L123
When spinning on a k8s cluster, this is a bit more complicated, as it needs to work with the clearml-k8s-glue.
See here how to spin it on k8s
https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue
The way I understand it is that K8s glue agent is enabled by default (and I do see a Deployment for
clearml-k8sagent
SarcasticSquirrel56
Good start. When you say you see the Task in the "k8s_scheduler" queue, did you originally enqueue it to "default"?
StickyLizard47 apologies for https://github.com/allegroai/clearml-server/issues/140 not being followed up on (it probably slipped through the cracks for the backend guys; I can see the 1.5 release happened in parallel). Let me make sure it is followed.
SarcasticSquirrel56 specifically, did you also spin a clearml-k8s glue? or are the agents statically allocated on the helm chart?
This is good news, that means the k8s glue created a k8s job and pushed the Task into the "k8s_scheduler" queue, for visibility (i.e. it is now the k8s job to launch the pod).
Can you check on the Task Info tab what is the status/message ? (it should reflect the k8s pod status)
is no agent listening to the "k8s_scheduler"
There should not be one; this queue is purely "virtual", so users understand the k8s cluster is spinning up their pod (sometimes it takes time, imagine EKS etc., it is just for visibility)
unfortunately I can't get info from the cluster
You should be able to see the pod in the cluster, no?!
What's the Task Info panel say, can you share a screen shot ?