Hi @<1810121608967229440:profile|NonchalantWhale65> , can you provide a short code snippet that reproduces the problematic behaviour?
Hi @<1648134232087728128:profile|AlertFrog99> , I don't think there is anything specifically built in for that. You can fetch a list of all children and then see the latest.
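Something along these lines might work as a rough sketch (the exact filter keys, 'parent' and 'order_by', are an assumption on my side, so please double-check against your SDK version):
```
from clearml import Task

# fetch all child tasks of a given parent, newest first, and take the latest one
parent_task_id = "<PARENT_TASK_ID>"  # placeholder
children = Task.get_tasks(
    task_filter={"parent": parent_task_id, "order_by": ["-last_update"]}
)
latest_child = children[0] if children else None
```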
Are you sure the files server is correctly configured on the pods?
Hi @<1594863230964994048:profile|DangerousBee35> , do you have some stand-alone code snippet that reproduces this behaviour?
OddShrimp85, Hi 🙂
I'm afraid that the only way to load contents of setup A into setup B is to perform a data merge.
This process basically requires merging the databases (mongodb, elasticsearch, files etc.). I think it's something that can be done in the paid version as a service but not in the open one.
Hi @<1693795218895147008:profile|FlatMole2> , is it possible that the apiserver.conf file isn't persistent and somehow changes?
The chart already passes the --create-queue command line option to the agent, which means the agent will create the queue(s) it's passed. The open source chart simply doesn't allow you to define multiple queues in detail and provide override pod templates for them; however, it does allow you to tell the agent to monitor multiple queues.
RattyLouse61, the API will retrieve the URLs for the debug samples. You can then download them manually. If you want these debug samples accessible to other tasks via the SDK, you might need to save them as artifacts; however, you won't have UI visibility for those (like playing audio in the debug samples section).
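A rough sketch of the artifact route, assuming you already have the debug-sample URL (the URL and project/task names below are just placeholders):
```
from clearml import Task, StorageManager

task = Task.init(project_name="examples", task_name="debug sample to artifact")

# download the debug sample locally from its URL (fetched via the API or copied from the UI)
sample_url = "https://files.yourserver.com/path/to/debug_sample.wav"
local_copy = StorageManager.get_local_copy(remote_url=sample_url)

# re-upload it as an artifact so other tasks can fetch it through the SDK
task.upload_artifact(name="debug_sample", artifact_object=local_copy)
```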
Does ClearML automatically capture all stdout/stderr, like TensorFlow C++ stdout? Is there an extra process for that? Where is this done and what are the assumptions?
ClearML should capture any output from python code. C++ is not supported
I would suggest the website 🙂
Hi @<1774969995759980544:profile|SmoggyGoose12> , I think that selecting GPUs works only in docker mode.
I'm accessing both using SSH tunneling & the same domain
I guess we found the culprit 🙂
I think you can configure agent.reload_config in clearml.conf and then push the change to the file programmatically somehow
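Something like this in ~/clearml.conf (a rough sketch, assuming agent.reload_config is available in your agent version):
```
agent {
    # when enabled, the agent re-reads clearml.conf so changes pushed to the file
    # can take effect without restarting the agent
    reload_config: true
}
```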
I can think of two solutions:
1. Fix local python environments and begin using virtual environments ( https://github.com/pyenv/pyenv for example)
2. Use the agent in --docker mode. You won't need to worry about python versions but you will need to install Docker on that machine.
Hi @<1594863230964994048:profile|DangerousBee35> , I don't think there is such a mechanism currently. What would the expected/optimal behaviour be in your use case?
How do you suggest the SDK knows which is the 'right' URL and which is the 'wrong' one?
GrievingTurkey78, what timeout did you set? Please note that it's in seconds, so it needs to be a fairly large number.
GrievingTurkey78, I'm not sure. Let me check.
Do you have CPU/GPU tracking through both PyTorch Lightning AND ClearML reported in your task?
GrievingTurkey78, please try Task.init(auto_resource_monitoring=False, ...)
GrievingTurkey78, can it be a heavy calculation that takes time? ClearML has a fallback to time instead of iterations if a certain timeout has passed. You can configure it with task.set_resource_monitor_iteration_timeout(seconds_from_start=<TIME_IN_SECONDS>)
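For reference, a minimal sketch combining both suggestions (project/task names and the timeout value are just placeholders):
```
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="resource monitor tweaks",
    auto_resource_monitoring=False,  # disable ClearML's CPU/GPU tracking entirely
)

# alternatively, keep monitoring enabled and give heavy iterations more time before
# the fallback from iteration-based to time-based reporting kicks in (in seconds)
# task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
```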
Hi CostlyFox64,
Can you try configuring your ~/clearml.conf with the following?
agent.package_manager.extra_index_url = ["https://<USER>:<PASSWORD>@packages.<HOSTNAME>/<REPO_PATH>"]
I understand. In that case you could implement some code to check if the same parameters were used before and then 'switch' to different parameters that haven't been checked yet. I think it's a bit 'hacky' so I would suggest waiting for a fix from Optuna
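Just to illustrate the 'hacky' idea, something along these lines (sample_fn is a hypothetical callable returning your hyperparameter dict):
```
seen_params = set()

def get_unique_params(sample_fn, max_tries=10):
    # re-sample until we get a combination that hasn't been evaluated yet
    params = sample_fn()
    for _ in range(max_tries):
        key = tuple(sorted(params.items()))
        if key not in seen_params:
            seen_params.add(key)
            return params
        params = sample_fn()
    return params  # give up and reuse the last sample
```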
Hi @<1544853721739956224:profile|QuizzicalFox36>,
You can use StorageManager.download_file() to easily fetch files.
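For example (rough sketch; the exact download_file signature may vary between SDK versions, and the URL/folder are just placeholders):
```
from clearml import StorageManager

remote_url = "s3://my-bucket/path/to/file.csv"
local_path = StorageManager.download_file(remote_url=remote_url, local_folder="/tmp/clearml_downloads")
print(local_path)
```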
Can you verify your ~/clearml.conf has the proper configuration? If you run:
from clearml import Task
t = Task.init()
Does this work?
Hi @<1576381444509405184:profile|ManiacalLizard2> , can you please elaborate more on your specific use case? And yes, ClearML supports working only with a specific user currently. What do you have in mind to expand this?
Hi @<1717350332247314432:profile|WittySeal70> , just to clarify, are you talking about the ClearML server itself or about agents?
From my understanding the AMI is simply an image with the ClearML server preloaded onto it.
Can you check the apiserver logs for any issues?