Is there a way to do this using ssh keys?
The .ssh folder of the host machine should be automatically mounted; you can force the agent to use SSH for git by setting force_git_ssh_protocol: true
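For reference, in clearml.conf I believe this setting lives under the agent section; a minimal sketch:
agent {
    force_git_ssh_protocol: true
}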
It is still not working for me. Are you using Linux, Windows or macOS?
It should work on Linux, Mac and Windows, what are you using?
Can't figure out what made it get to this point
I "think" this has something to do with loading the configuration and setting up the "StorageManager".
(in other words, setting up google.storage)... Or maybe the google storage package is missing?!
Let me check
Hi ReassuredTiger98
An agent's queue priority translates to the order in which the agent will pull jobs from its queues.
Now let's assume we have two agents with priorities A,B for one and B,A for the other. If we only push a Task to queue A, and both agents are idle (implying queue B is empty), there is no guarantee which one will pull the job.
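For context, the priority is just the order the queues are passed to the agent; something like (queue names are placeholders):
clearml-agent daemon --queue A B
This agent would poll queue A first, and only pull from queue B when A is empty.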
Does that make sense ?
What is the use-case you are trying to solve/optimize for ?
Hi NastyFox63
It seems like most of the reports were converted to PNGs (which is what the automagic does if it fails to convert the matplotlib figure into an interactive plot).
no more than 114 plots are shown in the plots tab.
Are you saying we have a 114-plot limit?
Is this true for "full screen" mode as well (i.e. not in the experiments table, but switching to the full detailed view)?
Hi VexedCat68
Can you supply more details on the issue? (Probably the best option is to open a GitHub issue and have all the details there, so we have better visibility.)
wdyt?
Hi @<1554275802437128192:profile|CumbersomeBee33>
what do you mean by "will the dependencies will be removed or not" ?
The next time the agent spins up a new Task it will create a new venv and delete the previous one.
Hmm, I still wonder what the "correct" answer is for most people. Is an empty string in argparse redundant anyhow? Will anyone ever use it?
Makes total sense!
Interesting, you are defining the sub-component inside the function. I like that, it makes the code closer to how it is actually executed!
The training loop is at line 469, I think.
I think the model state is just post training loop (not inside the loop), no?
Yes JitteryCoyote63 I think you are correct, this is currently the easiest way to do it. PompousParrot44 notice that you should have a "services" queue with a trains-agent running in "services mode" to enqueue those types of mostly-sleeping services 🙂
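Spinning the agent in services mode usually looks something like this (the queue name is just an example; on older versions replace clearml-agent with trains-agent):
clearml-agent daemon --services-mode --queue services --detached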
I was thinking we can quickly create a service that does that, maybe leverage one of these ?
https://github.com/mehrdadmhd/scheduler-py
https://github.com/dbader/schedule
WDYT?
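Just to illustrate the idea, a rough sketch using the schedule package (the task ID, queue name and interval below are placeholders, not a finished service):
import time
import schedule
from clearml import Task

def relaunch():
    # clone a template task and push the clone into the execution queue
    template = Task.get_task(task_id="<template_task_id>")  # placeholder ID
    cloned = Task.clone(source_task=template, name="scheduled run")
    Task.enqueue(cloned, queue_name="default")  # placeholder queue

schedule.every(24).hours.do(relaunch)  # placeholder interval

while True:
    schedule.run_pending()
    time.sleep(60)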
Hi JitteryRaven85
I have also deleted some hyper-params but they appear again when training starts.
Yes you cannot "delete" parameters, as any missing parameter is synced back (making sure you have a full log).
The problem is that when I clone an experiment and change the hyper params some change and some remain the same
Could you expand on which parameters stay the same ? (obviously this should not happen)
Well it seems we forgot that one 😞 I'll quickly make sure it is there.
As a quick solution (no need to upgrade):
task.models["output"]._models.keys()
It has to be alive so all the "child nodes" could report to it....
BoredHedgehog47 this is basically a wizard explaining the steps, see the 3 tabs 🙂
BTW, you can launch an experiment directly from CLI with clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task
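For example, something along these lines (the repo, script and queue are placeholders):
clearml-task --project examples --name remote-run --repo https://github.com/user/repo.git --script train.py --queue default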
GreasyPenguin14 let me check with the guys when the next version is due.
Are you using the self-hosted server or the community server?
Thanks MagnificentSeaurchin79, yes that makes it clear.
If that is the case, I think building a container is the easiest solution 🙂
(BTW: you could also build a wheel; if you have a setup.py, running bdist_wheel once will build the wheel, and then you just install that wheel)
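Roughly (the package name and version in the wheel filename are just placeholders):
python setup.py bdist_wheel
pip install dist/my_package-0.1.0-py3-none-any.whl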
But I am starting to wonder whether it would be easier just changing sys.path in the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
Actually I saw that the
RuntimeError: context has already been set
appears when the task is initialised outside
if __name__ == "__main__":
Is this when you execute the code yourself, or when the agent runs it?
Also, what's the OS of your machine / agent?
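Not sure yet what triggers it here, but the pattern that usually avoids the multiprocessing "context has already been set" clash is keeping Task.init under the main guard; a minimal sketch (project/task names are placeholders):
import multiprocessing as mp
from clearml import Task

def worker(idx):
    print(f"worker {idx} running")

if __name__ == "__main__":
    # init the task only in the main process, before spawning children
    task = Task.init(project_name="examples", task_name="mp example")  # placeholder names
    procs = [mp.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()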
Hi MelancholyChicken65
I'm not sure you can control it, the UI deduces the URLs based on the address you are browsing to: so if you go to http://app.clearml.example.com you will get the correct ones, but you have to put them on the right subdomains:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#subdomain-configuration
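For reference, the usual subdomain layout (hostnames here are just an example) is app.clearml.example.com for the web UI, api.clearml.example.com for the API server and files.clearml.example.com for the fileserver, which on the client side would look roughly like this in clearml.conf:
api {
    web_server: https://app.clearml.example.com
    api_server: https://api.clearml.example.com
    files_server: https://files.clearml.example.com
}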
Yes actually that might be it. Here is how it works,
It launches a thread in the background to do all the analysis of the repository, extracting all the packages.
If the process ends (for any reason), it will give the background thread 10 seconds to finish and then it will give up. If the repository is big, the analysis can take longer than that, and it will be cut off.
how to make sure it will traverse only the current package?
Just making sure there is no bug in the process: if you call Task.init in your entire repo (serve/train), you end up with an "installed packages" section that contains all the required packages for both use cases?
I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.
Hmm, it cannot "know" which is which, because it doesn't really trace all the import logs (this w...
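One way around it (just a sketch, the path and names are placeholders) would be to tell the SDK explicitly which requirements to log before calling Task.init, e.g. with Task.add_requirements, which I believe also accepts a requirements.txt path:
from clearml import Task

# log only the training requirements instead of the auto-detected ones
Task.add_requirements("/path/to/train/requirements.txt")  # placeholder path
task = Task.init(project_name="examples", task_name="training")  # placeholder names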
can we somehow choose the pool of ports clearml-session will use for work?
Yes, I think you can.
How do you spin the worker nodes? Is it Kubernetes ?
DepressedChimpanzee34
I am actually curious now, why is the default like this? maybe more people are facing similar bottlenecks?
On "regular" load there is no need for multiple processes, and the memory consumption might be more important than reply lag (at least before you start to scale)
DisturbedWalrus17
By spawning multiple processes for the API server, it looks like we utilise the CPU more now but the UI and API calls are still lagging a lot
Can you try with even more ...
Okay found the issue, to disable SSL verification globally add the following env variable:
CLEARML_API_HOST_VERIFY_CERT=0
(I will make sure we fix the actual issue with the config file)
Can you please tell me how to return the folder where the script should run?
Add it to the Python path:
PYTHONPATH="/src/project"
I do not think this is the upload timeout; it makes no sense to me for the GCP package (we do not pass any timeout, it's their internal default for the argument) to include a 60sec timeout for upload...
I'm also not sure where is the origin of the timeout (I'm assuming the initial GCP handshake connection could not actually timeout, as the response should be relatively quick, so 60sec is more than enough)