Long story short, work in progress.
BTW: are you referring to manual execution or trains-agent?
This only covers bug reports and enhancement suggestions
I'll make sure this is fixed 🙂
why is pushing into the services queue required ...
The services queue is usually connected to an agent running in "services mode", which means the agent executes multiple Tasks in parallel (as opposed to a regular agent that only launches one Task at a time). The assumption is that "service" Tasks are usually not heavy on CPU/RAM, so running multiple instances makes sense.
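For reference, an agent is usually launched in this mode with something like clearml-agent daemon --queue services --services-mode (the queue name here is just an example).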
Hmm, you can delete the artifact with:
```
task._delete_artifacts(artifact_names=['my_artifact'])
```
However this will not delete the file itself.
To delete the file as well I would do:
```
from clearml.storage.helper import StorageHelper

remote_file = task.artifacts['delete_me'].url
h = StorageHelper.get(remote_file)
h.delete(remote_file)
task._delete_artifacts(artifact_names=['delete_me'])
```
Maybe we should have a proper interface for that? wdyt? what's the actual use case?
Hi RipeGoose2
Any logs on the console ?
Could you test with a dummy example on the demoserver ?
We abuse the object description here to store the desired file path.
LOL, yep that would work. I'm assuming you have some infrastructure library that does this hack for you, but really cool way around it 🙂
And last but not least, for a dictionary for example, it would be really cool if one could do:
Hmm, what you will end up with now is the following behaviour: my_other_config['bar'] will hold a copy of my_config, so if you clone the Task and change "my_config" it will hav...
I think I found something,
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/storage/helper.py#L1442
What's the boto version you have installed?
Hi ProudChicken98
How about saving it as a local YAML and upload the file itself as an artifact?
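Something along these lines, as a minimal sketch (the file name and artifact name are arbitrary):
```python
import yaml

from clearml import Task

task = Task.init(project_name="examples", task_name="yaml artifact")

config = {"lr": 0.001, "epochs": 10}  # whatever you need to store

# save locally, then upload the file itself as an artifact
with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f)
task.upload_artifact(name="config_yaml", artifact_object="config.yaml")
```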
FrothyShark37 any chance you can share a snippet to reproduce?
JuicyFox94 maybe you can help here?
can we somehow choose the pool of ports that clearml-session uses?
Yes, I think you can.
How do you spin the worker nodes? Is it Kubernetes ?
Hi @<1549202366266347520:profile|GorgeousMonkey78>
how do I integrate SageMaker with clearml?
you mean to launch an experiment, or just to log it?
I guess I need to do something like the following after the task was created:
...
Yes!
Why use the "post" callback and not the "pre" callback?
The post gets back the Model object. The pre allows you to decide whether you actually want to log it in the first place (come to think about it, maybe you want that as well 🙂)
OK - the issue was the firewall rules that we had.
Nice!
But now there is an issue with the
Setting up connection to remote session
OutrageousSheep60 this is just a warning, basically saying we are using the default signed SSH server key (has nothing to do with the random password, just the identifying key being used for the remote ssh session)
Bottom line, I think you have everything working 🙂
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
That's the only explanation ...
But the weird thing is, it did not work on my linux box?!
Sounds good, let's work on it after the weekend 🙂
Sorry @<1689446563463565312:profile|SmallTurkey79> just noticed your reply
Hmm so I know the enterprise version has a built-in support for slurm, which would remove the need to deploy agents on the slurm cluster.
What you can do is on the SLURM login server (i.e. a machine that can run sbatch), write a simple script that pulls the Task ID from the queue and calls sbatch with clearml-agent execute --id <task_id_here>
Would this be a good solution?
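Roughly like this, as a sketch only (the queue name, the sbatch wrapper script, and the use of APIClient's queues.get_next_task are assumptions to illustrate the idea, not a tested recipe):
```python
import subprocess
import time

from clearml.backend_api.session.client import APIClient

client = APIClient()
queue_id = client.queues.get_all(name="slurm")[0].id  # hypothetical queue name

while True:
    # pull the next pending Task from the queue (if any)
    response = client.queues.get_next_task(queue=queue_id)
    entry = getattr(response, "entry", None)
    if entry:
        # hand the Task over to SLURM; the wrapper script just runs:
        #   clearml-agent execute --id <task_id>
        subprocess.check_call(["sbatch", "run_clearml_task.sh", entry.task])
    time.sleep(30)
```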
You mean to design the entire pipeline from YAML?
(this assumes your Tasks know how to process links to artifacts)
Is this what you are after?
(BTW: any reason for working with YAML files instead of coding it?)
The function issues a delete request with a raise_on_errors=False flag.
Are you saying we should expose raise_on_errors on the _delete_artifacts() function itself?
If so, sure seems logic to me, any chance you want to PR it? (please just make sure the default value is still False so we keep backwards compatibility)
wdyt?
Hi SpicyOtter88
```
plt.plot([0, 1], [0, 1], 'r--', label='')
```
it cannot have a legend without a label, so it gives it an "anonymous" label; I think it should just get "unlabeled 0", wdyt?
Correct 🙂
btw: my_dict_with_conf_for_data can be any object, not just a dict. It will list all the properties of the object (as long as they do not start with _)
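For instance, a minimal sketch (the DataConfig class and its fields are made up for illustration):
```python
from clearml import Task


class DataConfig:
    # public attributes get logged as parameters
    batch_size = 32
    shuffle = True
    _cache_dir = "/tmp/data"  # skipped, starts with an underscore


task = Task.init(project_name="examples", task_name="connect object")
config = DataConfig()
task.connect(config)  # batch_size / shuffle show up as editable parameters
```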
or at least stick to the requirements.txt file rather than the actual environment
You can also force it to log the requirements.txt with:
```
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
```
GreasyPenguin14 let me check with the guys when the next version is due.
Are you using the self-hosted server or the community server?
CheerfulGorilla72 sounds like a great idea, I'll pass it along to the documentation ppl 🙂
I guess the thing that's missing from offline execution is being able to load an offline task without uploading it to the backend.
UnevenDolphin73 you mean, as in, getting the Task object from it?
(This might be doable, the main issue would be the metrics / logs loading)
What would be the use case for the testing ?
Hi BoredPigeon26
what do you mean by "reuse the task" ? is this manual execution (i.e. from code)?
How about archiving the old version?
You can also force Task.init to always create a new Task (which preserves the previous run alongside the execution tab)
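A minimal sketch of that (reuse_last_task_id is the relevant argument):
```python
from clearml import Task

# always create a new Task instead of overwriting the previous run
task = Task.init(
    project_name="examples",
    task_name="my experiment",
    reuse_last_task_id=False,
)
```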
Basically what's the specific use case ?
Hi SteepCockroach81
CLEARML_CONFIG_FILE points to the configuration file being used
See here:
https://clear.ml/docs/latest/docs/configs/env_vars#server-connection
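For example (the path is just a placeholder), setting it before clearml is loaded:
```python
import os

# must be set before importing/initializing clearml
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/clearml.conf"

from clearml import Task

task = Task.init(project_name="examples", task_name="custom config")
```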
DistressedGoat23 check this example:
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
```
aSearchStrategy = RandomSearch
```
It will collect everything on the main Task
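In rough strokes, a sketch of the setup (the base task ID, parameter names, and metric names below are placeholders, not from the example above):
```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    RandomSearch,
)

task = Task.init(project_name="examples", task_name="HPO controller")

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # the experiment to clone and mutate
    hyper_parameters=[
        DiscreteParameterRange("General/lr", values=[0.001, 0.01, 0.1]),
    ],
    objective_metric_title="validation",  # placeholder metric title
    objective_metric_series="accuracy",   # placeholder metric series
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```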
This is a crucial point for using clearml HPO, since comparing dozens of experiments in the UI and searching for the best is just not manageable.
You can of course do that (notice you can actually order them by scalars they report, and even do ...
Well done man!
Just dropping this here but I've had some funky compressions with very small datasets!
Odd deflate behavior ...?!
Basically it solves the remote-execution problem, so you can scale to multiple machines relatively easily :)