why is pushing into the services queue required ...
The services queue is usually connected to an agent running in "services mode", which means the agent executes multiple Tasks in parallel (as opposed to a regular agent that only launches one Task at a time). The assumption is that "service" Tasks are usually not heavy on CPU/RAM, so running multiple instances on the same machine makes sense.
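For reference, a minimal sketch of spinning up an agent in services mode (the queue name "services" and the use of --docker are assumptions, adjust to your setup):
clearml-agent daemon --queue services --services-mode --docker --detached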
Notice that the actual configuration that is used is the one here: https://github.com/allegroai/clearml/blob/b21e93272682af99fffc861224f38d65b42c2354/clearml/backend_config/bucket_config.py#L23
But it is created here:
https://github.com/allegroai/clearml/blob/b21e93272682af99fffc861224f38d65b42c2354/clearml/backend_config/bucket_config.py#L199
Hmmm, can you view the settings? That's the only thing I can think of at the moment that would be different between your setup and the working one...
Also, is there a way for you to put the trains-server behind https (on your GCP)?
Hi @<1566596960691949568:profile|UpsetWalrus59>
Could it be the two experiments have the exact same name?
(It sounds like a bug in the UI, but I'm trying to make sure, and also to understand how to reproduce it)
What's your clearml-server version ?
Hi AmiableFish73
Hi all - is there an easier way to track the set of datasets used by a particular task?
I think the easiest is to give the Dataset an alias; it will automatically appear in the Configuration section:
Dataset.get(..., alias="train dataset")
wdyt?
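For example, a minimal sketch (the project and dataset names here are made up):
from clearml import Dataset

# The alias makes the dataset's ID show up under the task's Configuration section
ds = Dataset.get(
    dataset_project="my_project",
    dataset_name="my_dataset",
    alias="train dataset",
)
local_path = ds.get_local_copy()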
Then running by using the ..., am I right?
yep
I have set the --save-period flag while running YOLOv5, but ClearML does not save the weights for each epoch I have trained. Why is this happening?
But do you still see it in the ClearML UI? Do you see the models logged in the ClearML UI?
That is correct.
Obviously once it is in the system, you can just clone/edit/enqueue it.
Running it once is a means to populate the trains-server.
Make sense?
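Once it is in the system, cloning and enqueuing can also be done from code; a minimal sketch (the project, task, and queue names are made up):
from clearml import Task

# Find the template task that was run once to populate the server
template = Task.get_task(project_name="examples", task_name="my template task")
# Clone it and send the clone to an execution queue
cloned = Task.clone(source_task=template, name="my cloned task")
Task.enqueue(cloned, queue_name="default")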
@<1687643893996195840:profile|RoundCat60> can you access the web UI over https?
Hmm, it is not returned, it is inside the function....
yes I'm with you, we need to fix it asap
RipeGoose2 you mean having the preview html on S3 work as expected (i.e. click on it, add credentials, open it in a new tab)?
there is probably some way to make an S3 path open up in the browser by default
You should have a pop-up asking for credentials ...
Could you check that if you add the credentials in the profile page it works?
Fixed in `pip install clearml==1.8.1rc0`
🙂
Wait, that makes no sense to me. The API from python and the API from the UI are getting the same data from the backend ...
What are you getting with:
from clearml import Task
task = Task.get_task(task_id=<put task id here>)
print(task.models)
FiercePenguin76 the git repo should detect only clearml as a required python package
Basically the steps are:
Decide if the initial python entry script is a standalone script (i.e. no local imports) in the git repo (in your example "task_with_deps.py").
If this is a "standalone script", only look for imports inside the calling python script, and list those packages under "installed packages".
If this is not a standalone script, go over all the python files inside the repository, look for "i...
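To illustrate the "look for imports" step, here is a rough, hypothetical sketch (this is not clearml's actual implementation):
import ast

def top_level_imports(path):
    # Collect the top-level package names imported by a single python file
    with open(path) as f:
        tree = ast.parse(f.read())
    pkgs = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            pkgs.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            pkgs.add(node.module.split(".")[0])
    return pkgs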
That's the right place, but use it like you would use a hydra --override; in your case I think it should be "accelerator.gpu".
You can also change `allow_omegaconf_edit` in the UI to True, and then you could just edit the OmegaConf in the UI (if you do not change `allow_omegaconf_edit`, the edit in the UI is ignored).
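For a cloned task you could also flip the flag from code; a hypothetical sketch (the exact parameter name is an assumption here, check the task's hyperparameters in the UI for the real one):
from clearml import Task

task = Task.get_task(task_id="<your task id>")
# Assumption: the Hydra binding stores the flag under the "Hydra" section
task.set_parameter("Hydra/_allow_omegaconf_edit_", True)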
Hi ResponsiveCamel97
What's the clearml-server version? How do you spin the server on your k8s cluster, Helm?
Now I suspect what happened is that it stayed on another node, and your k8s never took care of that.
Only as "default docker + argument" , if you need the "extra_docker_arguments" (which I think a mount point is a good example for), then you have to add it in the conf file
doing some extra "services"
what do you mean by "services"? (from the system's perspective, any Task that is executed by an agent running in "services-mode" is a service; there is no actual limitation on what it can do 🙂)
Hi SquareFish25
Sure, here are a few:
HPO:
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Pipeline:
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
Automation:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
How come the second one is one line?
Can you please elaborate on the latter point? My jupyterhub's fully containerized and allows users to select their own containers (from a list I built) at launch, and launch multiple containers at the same time. Not sure I follow how toes are stepped on.
Definitely a great start; usually it breaks on memory / GPU-mem, where too many containers on the same machine are eating each other's GPU RAM (which cannot be virtualized).
Do you happen to know if there are any plans for an implementation with the logger variable, so that if needed it would be possible to write to different tables?
CheerfulGorilla72 what do you mean by "an implementation with the logger variable"? pytorch-lightning defaults to the TB logger, which clearml will automatically catch and log into the clearml-server. You can always add additional logs with the clearml interface: Logger.current_logger().report_???
What am I mis...
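For example, a minimal sketch of reporting directly through the clearml Logger (the project/task names and values are made up; it assumes a Task has been initialized):
import pandas as pd
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="logger demo")
logger = Logger.current_logger()
# Report a scalar series (e.g. per-epoch loss)
logger.report_scalar(title="loss", series="train", value=0.05, iteration=10)
# Report a table - one way to "write to different tables"
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
logger.report_table(title="my table", series="stats", iteration=0, table_plot=df)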
Are you running a jupyter notebook inside VS Code?
sorry, typo: client.task. should be client.tasks.
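In other words, a minimal sketch (note that the filter-free get_all call will list all tasks):
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Note the plural: client.tasks, not client.task
tasks = client.tasks.get_all()
for t in tasks[:5]:
    print(t.id, t.name)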
Hi ZealousSeal58
What's the clearml version you are using?
If there was a "debug mode" for viewing the stack trace before the crash that would've been most helpful...
import traceback
traceback.print_stack()
I'm assuming TF was not part of the original requirements, and was automatically pulled in by one of the packages, hence the latest version ...