It is the number of calls performed, not what those calls were.
oh, yes this is just a measure of how many API calls are sent.
It does not really matter which ones
DilapidatedDucks58 how exactly are you "relaunching/continue" the execution? And what exactly are you setting?
or do you mean the agent can convert an https URL to ssh??
Yep it does that automatically if you set: `force_git_ssh_protocol: true`
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
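For reference, a minimal sketch of where that flag lives in `clearml.conf` (it sits under the `agent` section; the value shown is the only change you'd make):

```
agent {
    # clone over SSH even when the task recorded an https URL
    force_git_ssh_protocol: true
}
```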
There seems to be a problem with multiprocessing: Although I stopped the task,
You mean you "aborted the task" from the UI?
- There is a memory leak somewhere, please see the screenshot of datadog memory consumption
I'm assuming from the leftover processes?
Python 3.8/Pytorch 1.11/clearml-sdk 1.9.0/clearml-agent 1.4.1
From the log I see the agent is running in venv mode
Hmm please try with the latest clearml-agent (the others should not have any effect)
Yes, experiments are standalone, as they do not need any connecting thread between them.
When would you say a new "run" vs a new "experiment"? When you change a parameter? Change data? Change code?
If you want to "bucket them", use projects 🙂 it is probably the easiest now that we have support for nested projects.
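For example, a quick sketch of bucketing runs with nested projects (project and task names here are made up):

```python
from clearml import Task

# a "/" in project_name creates a nested project, e.g. "nlp" -> "finetune-bert"
task = Task.init(project_name="nlp/finetune-bert", task_name="run-lr-3e-5")
```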
I'm working on creating a custom config with Istio
That is awesome! Let me know if we can help 🙂
Also please consider PRing it, I'm sure other users will appreciate the option
It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:
2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads
And there's no sign of updates on the dashboard
Hmm, that could point to an issue uploading the last images (which are larger than regular scalars). Could you try flushing and waiting?
i.e. `task.flush()` followed by `sleep(45)`
So was the issue solved?
named as `venv_update` (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
This is deprecated... it was a test of a package that can update pip venvs, but it was never stable; we will remove it in the next version
Yes, I guess. Since pipelines are designed to be executed remotely, it may be pointless to enable an `output_uri` parameter in the `PipelineDecorator.component`
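For context, `output_uri` controls where artifacts and models get uploaded; a minimal standalone sketch (the bucket URL is hypothetical):

```python
from clearml import Task

# output_uri sets the default upload destination for this task's
# artifacts and models (s3://, gs://, azure://, or a file server URL)
task = Task.init(
    project_name="examples",
    task_name="output-uri-demo",
    output_uri="s3://my-bucket/artifacts",  # hypothetical bucket
)
```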
ElegantCoyote26 I don't think Keras logs it anywhere unless you have TB, so nowhere to take the data from...
In short, yes you have to have TB :)
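A minimal sketch of wiring that up, assuming TF 2.x with a toy model and random data; `Task.init` hooks TensorBoard's writers, so the TB callback's scalars get captured as ClearML scalars:

```python
import numpy as np
import tensorflow as tf
from clearml import Task

# Task.init patches TensorBoard logging, so anything the TB callback
# writes shows up as ClearML scalars automatically
task = Task.init(project_name="examples", task_name="keras-tb-demo")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x, y = np.random.rand(64, 4), np.random.rand(64, 1)
model.fit(x, y, epochs=3,
          callbacks=[tf.keras.callbacks.TensorBoard(log_dir="./tb_logs")])
```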
I see now, give me a minute I'll check
PanickyMoth78
Is it limited to ... accounts?
unfortunately, yes 🙂, but I'm sure sales will be able to hook you up ...
(torchvision vs. CUDA compatibility, will work on that),
The agent will pull the correct torch build based on the CUDA version that is available at runtime (or configured via the clearml.conf)
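If auto-detection misbehaves, the override in `clearml.conf` looks roughly like this (the version numbers are illustrative):

```
agent {
    # leave these unset to auto-detect at runtime; set them to pin
    # the CUDA/cuDNN versions the agent resolves torch against
    cuda_version: 11.7
    cudnn_version: 8
}
```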
I don't think so. It is solved by installing openssh-client in the docker image, or by adding a deploy token to the cloning URL in the web UI
You can also have the token (token==password) configured as the default user/pass in your agent's clearml.conf
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L19
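A sketch of that section of the agent's `clearml.conf` (the token name/value are placeholders):

```
agent {
    # deploy token used as the default git credentials for https cloning
    git_user: "deploy-token-user"    # hypothetical token name
    git_pass: "deploy-token-value"   # hypothetical token/password
}
```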
BTW: could it be that Task.init is not called on the "module.name" entry point, but somewhere internally?
SubstantialElk6 (2) yes, this will definitely be fixed
Regarding (1), what do you mean by "via the code"? Do you mean like as a Task docker cmd?
MysteriousBee56 there is no way to tell the trains-agent to pull from local copy of your repository...
You might be able to hack it, if you copy the entire local repo into the trains-agent version control cache. Would that help you?
I see, good point. It does look like mostly boilerplate code; I'm not sure where it actually runs the python command, but I'm sure it is there (python.ts, though I could not locate what is actually using it)
instead of terminating them once they are inactive, so that they could be available immediately when they are needed.
JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler and achieve the same behavior, no?
Then by default it is the home folder (`~/.clearml`) that is running out of free space
(since you are using venv mode, if CUDA is not detected at startup time, it will not install the GPU version, as it has no CUDA support)
PompousBeetle71 notice that starting with this version, when you set model tags they will be stored as user tags, which you can change and edit in the UI. So if you still need the system tags, you have to access them directly.
Seems correct.
I'm assuming something is wrong with the key/secret quoting?!
Could you generate another one and test it ?
(you can have multiple key/secret pairs on the same user)
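For comparison, a correctly quoted credentials section in `clearml.conf` looks like this (the key/secret values are placeholders):

```
api {
    credentials {
        # both values must be plain quoted strings; stray quotes or
        # trailing whitespace here are a common cause of auth failures
        access_key: "ABCD1234EFGH"
        secret_key: "abcdef1234567890abcdef"
    }
}
```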
None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:
```python
from time import time

task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
```
potential sources of slow down in the training code
Is there one?
So if everything works you should see the "my_package" package in the "installed packages" section
the assumption is that if you do `pip install "my_package"`, it will have "pandas" listed as one of its dependencies, and pip will automatically pull pandas as well.
That way we do not list the entire venv you are running on, just the packages/versions you are using, and we let pip sort the dependencies when installing with the agent
Make sense?
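To illustrate, a hypothetical `setup.py` for `my_package`; pandas is declared as a dependency here, which is why listing only `my_package` is enough for the agent:

```python
# hypothetical setup.py for "my_package"
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1.0",
    packages=find_packages(),
    # pip resolves and installs pandas from this declaration, so it
    # never needs to appear in "installed packages" itself
    install_requires=["pandas>=1.0"],
)
```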
help_models is a dir in the git
And the git repo is registered on the experiment correctly?