
JitteryCoyote63 no, you should not (unless you already have the Task.init call in your code). clearml-data adds the Task.init call at the beginning of the code in the entry point.
This means you should be able to call Task.current_task() and get back the Task object.
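A minimal sketch of what that looks like (project and task names are placeholders):

    from clearml import Task

    # entry point: create (or attach to) the Task once
    task = Task.init(project_name="examples", task_name="entry-point-demo")

    # anywhere later in the code, the same Task object is returned
    assert Task.current_task() is task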
What do you have under the "uncommitted changes" on the Task that was created?
UnevenDolphin73 clearml.config.get_remote_task_id() will return the Task ID, not the Task object. In order to get the automagic to work, one h...
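If you do need the full Task object from that ID, a short sketch (assuming the call returns a valid ID):

    from clearml import Task
    from clearml.config import get_remote_task_id

    task_id = get_remote_task_id()         # just the ID string
    task = Task.get_task(task_id=task_id)  # fetch the full Task object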
You can try just pulling the "metric" section of the Task, but I cannot imagine the network bandwidth is the issue?
Could it be load on the clearml-server (i.e. it needs to handle lots of requests)?
Are there any references (vlog/blog) on deploying a real-time model and doing continuous training pipelines in ClearML?
Something along the lines of this one?
https://clear.ml/blog/creating-a-fully-automatic-retraining-loop-using-clearml-data/
Or this one?
https://www.youtube.com/watch?v=uNB6FKIi8Wg
Hi SillySealion58
"keep N best checkpoints" logic in my training loop.
If this is the use case, may I suggest overwriting them locally? (The same will happen on the remote storage.) This is exactly how the Lightning / Ignite feature is implemented.
Check the links that are generated in the UI when you upload an artifact or model.
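A rough sketch of the idea, assuming a PyTorch model and a hypothetical output bucket; saving to the same filename overwrites both the local file and its uploaded counterpart:

    import torch
    from clearml import Task

    # output_uri tells ClearML where to upload saved models (bucket is hypothetical)
    task = Task.init(project_name="examples", task_name="keep-n-best",
                     output_uri="s3://my-bucket/models")

    model = torch.nn.Linear(4, 2)
    # reuse a fixed filename per "best" slot; each save overwrites the previous one
    torch.save(model.state_dict(), "best_0.pt")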
Hm, ReassuredTiger98, can you send the full log? I think it should have worked (but as you mentioned, it might be a conda/pip mix?!)
Ohh, if this is the case then it kind of makes sense to store it on the Task itself. Which means the Task object will have to store it, and then the UI will display it :(
I think the actual solution is a vault, per user, which would allow users to keep their credentials on the server and the agent to pass those to the Task when it spins it up, based on the user. Unfortunately the vault feature is only available in the paid/enterprise version (with RBAC etc.).
Does that make sense?
GrotesqueDog77 this should just work. Decorate the functions with @PipelineDecorator.component and call them one after the other:
paths = step_one()
step_two(paths)
ClearML will make sure it serializes the strings and passes them to step two (of course step two should actually run on a machine with access to the same folder, but this is another issue :) )
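Roughly like this (function names and the pipeline wrapper are illustrative):

    from clearml import PipelineDecorator

    @PipelineDecorator.component(return_values=["paths"])
    def step_one():
        return ["/data/a", "/data/b"]

    @PipelineDecorator.component()
    def step_two(paths):
        print(paths)

    @PipelineDecorator.pipeline(name="demo-pipeline", project="examples", version="1.0")
    def pipeline_logic():
        paths = step_one()
        step_two(paths)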
I double checked the code, it's always being passed :)
Only the dictionary keys are returned as the raw nested dictionary, but the values remain cast.
Using which function? task.get_parameters_as_dict does not cast the values (the values themselves are stored as strings on the backend); only task.connect will cast the values automatically.
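To make the difference concrete (a sketch; the exact section nesting in the returned dict may differ):

    from clearml import Task

    task = Task.init(project_name="examples", task_name="params-demo")

    params = {"lr": 0.01, "epochs": 10}
    task.connect(params)  # when executed by an agent, values are cast back to float/int

    raw = task.get_parameters_as_dict()
    # values come back as strings, e.g. {"General": {"lr": "0.01", "epochs": "10"}}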
Hi @<1610083503607648256:profile|DiminutiveToad80>
This sounds like the wrong container? I think we need some more context here.
if I use automatic code analysis it will not find all packages because of importlib.
But you can manually add them with Task.add_requirements, no?
Yep, and this is the root cause of the issue (but easily fixable) :)
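For reference, a short sketch of the manual route (package names are placeholders); call it before Task.init:

    from clearml import Task

    # packages loaded via importlib are invisible to the static analysis,
    # so declare them explicitly
    Task.add_requirements("pandas")
    Task.add_requirements("torch", "1.13.1")  # optionally pin a version

    task = Task.init(project_name="examples", task_name="importlib-demo")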
Now, in case I needed to do it, can I add new parameters to a cloned experiment, or will these get deleted?
Adding new parameters is supported :)
MysteriousBee56 what do you mean "save Scalars on the machine"? All metrics are sent to the trains server. You can later retrieve them from code, if you need.
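For example, a sketch of pulling metrics back from code (the task ID is a placeholder):

    from clearml import Task

    task = Task.get_task(task_id="<your-task-id>")
    scalars = task.get_reported_scalars()
    # nested dict of {title: {series: {"x": [...], "y": [...]}}}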
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password. (Note the ":-", docker-compose's syntax for a fallback default value.)
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
Hi UpsetBlackbird87
This is an Optuna decision on how many concurrent tests to run simultaneously.
You limited it to 100, but remember Optuna runs a Bayesian optimization process, where it decides on the next set of arguments based on the performance of the previous sets; this means it will first try X trials, then decide on the next batch.
That said, you can add a pruner to Optuna, specifying how it should start:
https://optuna.readthedocs.io/en/v1.4.0/reference/pruners.html#optuna.pruners.Median...
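For example, a minimal sketch with the MedianPruner (parameter values are illustrative):

    import optuna

    # prune unpromising trials, after 5 "startup" trials and 10 warmup steps per trial
    pruner = optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=10)
    study = optuna.create_study(direction="minimize", pruner=pruner)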
Did you run clearml-init after the pip install?
which to my understanding has to be given before a call to an argparser,
SmarmySeaurchin8 you can call argparse before Task.init. No worries, it will catch the arguments, and trains-agent will be able to override them :)
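For instance (the argument name is a placeholder):

    import argparse
    from clearml import Task

    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    args = parser.parse_args()  # called before Task.init

    # Task.init still picks up the parsed arguments, and an agent
    # can override --lr when the task is executed remotely
    task = Task.init(project_name="examples", task_name="argparse-demo")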
/home/npuser/.clearml/venvs-builds/3.7/task_repository/commons-imagery-models-py
Yep, I see it now. Could you simulate it locally (i.e. have the other folders in the path as well)?
Could it be you also have a file somewhere that is called sfi or imagery or models or chip_classifier that it accidentally tries to import first?
Hi AgitatedTurtle16
My question is how to use it to manage my experiments (docker containers). Simply put, let's say:
So basically once you see an experiment in the UI, it means you can launch it on an agent.
There is no need to containerize your experiment (actually that's kind of the idea: removing the need to always containerize everything).
The agent will clone the code, apply uncommitted changes & install the packages in the base-container-image at runtime.
This allows you to u...
Hi CleanPigeon16
I was wondering how (or if) you handle interruptions.
Good question. Basically (and I might be missing a few details, but I think that's the general gist):
A new instance will be spun up (spot/regular, based on your "compute budget") as long as there is a job in the "monitored" queue. That means that if a worker was kicked by Amazon (i.e. it is a spot instance), another one will be spun up instead, as long as there is a job in the queue. That means that what is probably missing in you...
FYI matplotlib imshow will create a debug image, and on complex plots the plot might get converted to an image (but shown under the plots section). All in all, you might not be aware of it, but you are uploading images to your files server.
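For instance, even this tiny snippet ends up as an uploaded debug image:

    import numpy as np
    import matplotlib.pyplot as plt
    from clearml import Task

    task = Task.init(project_name="examples", task_name="imshow-demo")

    plt.imshow(np.random.rand(32, 32))
    plt.title("random noise")
    plt.show()  # auto-captured by ClearML and uploaded to the files server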
Hi @<1716987924207112192:profile|CostlyOctopus40>
Is OpenSearch supported in ClearML instead of Elasticsearch? Please shed some light on that.
Long story short: maybe?! But this is not officially supported.
We only support Elasticsearch; the OpenSearch fork is not officially supported, and since we continue to use more advanced features of Elastic, the API might not be compatible in the future.
Out of curiosity, why are you using OpenSearch?
can you get the agent to execute the task in the current conda env, without setting up a new environment?
Wouldn't that break easily? Is this a way to avoid Docker, or a specific use case?
is there any other way to get a task from the queue running locally in the current conda env?
You mean including cloning the code etc., but not installing any Python packages?