Setting max_workers to 1 prevents the error (but, I assume, it may come at the cost of slower sequential uploads).
This seems like a question for Google Cloud Storage; maybe we should open an issue there, since their backend does the rate limiting
My main concern now is that this may happen within a pipeline leading to unreliable data handling.
I'm assuming the pipeline code will have max_workers, but maybe we could have a configuration value so that we can set it across all workers, wdyt?
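For context, a minimal sketch of the kind of call being discussed (assuming the upload is Dataset.upload() and that your clearml version exposes the max_workers argument; paths and names are placeholders):
from clearml import Dataset

ds = Dataset.create(dataset_project='examples', dataset_name='my_dataset')
ds.add_files('/path/to/local/data')
# max_workers=1 forces sequential uploads, avoiding the rate-limit error
# at the cost of slower uploads
ds.upload(max_workers=1)
ds.finalize()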
If
...
but the debug samples and monitored performance metric show a different count
Hmm, could you expand on what you are getting, and what you are expecting to get?
Hi HappyLion37
It seems that you are "reusing" the Tasks, which means the second time you open them you are essentially resetting the old run and starting all over.
Try to do:
task1 = Task.init('examples', 'step one', reuse_last_task_id=False)
print('do stuff')
task1.close()
task2 = Task.init('examples', 'step two', reuse_last_task_id=False)
print('do some more stuff')
task2.close()
there is almost zero overhead if your docker container already has everything (including the agent) preinstalled and you set it with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
it then should basically just run the code.
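In docker mode, one way to make sure that variable reaches the agent running inside the container (a sketch, assuming the standard clearml.conf on the agent machine) is via the extra docker arguments:
agent {
    # inject the env var into every container the agent spins up
    extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
}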
Can clearml-serving do helm install or upgrade?
Not sure I follow, how would a helm chart install be part of the ML running? I mean clearml-serving is installed via helm chart, but this is a "one time" setup, i.e. you install clearml-serving and then you can send models to be served there via CLI / python. This is not a "deployed per model" scenario, but a single deployment for multiple models, dynamically loaded.
SmallBluewhale13 in your code, what are you getting when you print the version?
from clearml import __version__
print(__version__)
I’m not sure if https will work because I want to use ssh keys for creds.
BTW: I was not aware GitHub provides a PyPI-like artifactory, do they?
Regarding SSH keys, they are passed from the host machine (i.e. in venv mode it will use the SSH keys of the user running the agent, and in docker mode they are automatically mapped into the container)
HugeArcticwolf77 you can add --services-mode to the agent, and it will basically keep on spinning Tasks in parallel (unfortunately the open source version does not include a way to limit it to a maximum number of concurrent Tasks)
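For example, something along these lines (the queue name here is just a placeholder):
clearml-agent daemon --services-mode --queue services --docker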
Hi MysteriousBee56 ,
Yes, this is a permissions issue: the docker creates all folders as root (since it is the root user running inside the docker). Then when you execute in venv mode, you are running it from your own user, which obviously cannot change root-created folders.
and the agent's default runtime mode is docker, correct?
Actually the default is venv mode; to run in docker mode add --docker to the command line
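For example (the image name here is only an illustration, any image the agent machine can pull should work):
clearml-agent daemon --queue default --docker nvidia/cuda:11.8.0-runtime-ubuntu22.04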
So I could install all my system dependencies in my own docker image?
Correct, inside the docker it will inherit all the preinstalled packages, but it will also install any missing ones (based on the Task requirements, i.e. the "installed packages" section)
Also, what is the purpose of the aws block in the clearml.c...
Is there any documentation on versioning for Datasets?
You mean how to select the version name ?
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value in a section name called "new section"
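Putting the two together, a minimal sketch (project/task names and values here are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='config sections')
# configuration object stored under its own named configuration section
task.connect_configuration({'lr': 0.001, 'batch_size': 32}, name='my_section_name')
# flat dict connected as hyperparameters under the "new section" section
params = task.connect({'epochs': 10, 'optimizer': 'adam'}, name='new section')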
Sadly no 😞
(I mean you could quickly write a reader for TB and report it, but it is not built into the SDK)
Are you running the agent in docker mode? or venv mode ?
Can you manually ssh on port 10022 to the remote agent's machine?
ssh -p 10022 root@agent_ip_here
MinuteGiraffe30 if you are running the following command while your current directory is where you code is, what are you getting?
$ git ls-remote --get-url origin
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML doesn't pick up model checkpoints automatically.
What's the framework you are using?
BTW:
Task.add_requirements("requirements.txt")
if you want to specify just your requirements.txt, do not use add_requirements; use:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
(add_requirements with a file name does the same thing, but this is more readable)
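A minimal sketch of where that call goes (it has to run before Task.init(); the project/task names are placeholders):
from clearml import Task

# log only what is listed in requirements.txt instead of a full pip freeze
Task.force_requirements_env_freeze(requirements_file='requirements.txt')
task = Task.init(project_name='examples', task_name='pinned requirements')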
AdventurousButterfly15
Despite having manually installed this torch version, during task execution the agent still tries to install it somehow and fails:
Are you running the agent in venv mode? or docker mode?
Notice that in docker mode it inherits the python packages from the container, and adds/reinstalls missing packages. In venv mode it creates a new clean venv (there is no way to inherit a venv, a venv can only inherit from system-wide installed packages)
The idea is that you cannot e...
Yes, experiments are standalone as they do not have to have any connecting thread.
When would you say a new "run" vs a new "experiment" ? when you change a parameter ? change data ? change code ?
If you want to "bucket them" use projects 🙂 it is probably the easiest now that we have support for nested projects.
Hi @<1523701304709353472:profile|OddShrimp85>
You mean something like clearml-serving ?
None
Can you copy the "Installed Packages" here, and point to the package causing the issue?
Hmm that is a good question, are you mounting the clearml.conf somehow ?
How can I find the queue name?
You can generate as many as you like; the default one is called "default", but you can add new queues in the UI (go to the Workers & Queues page, then Queues, and click "+ New Queue")
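Once the queue exists you can point an agent at it and enqueue Tasks to it, for example (the queue name here is just a placeholder):
clearml-agent daemon --queue my_new_queue
and from code:
task.execute_remotely(queue_name='my_new_queue')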
MysteriousBee56 that is very strange, but it definitely explains it, kudos on debugging it !!!
FYI all the git pulls are cached even in docker mode so there is no "tax" to pay for pulling the sub-modules (only the first time of course)
Hi ClumsyElephant70
What's the clearml version you are using?
(The first error is a byproduct of a python process.Event being created before the forkserver is created, some internal python issue. I thought it was solved, let me take a look at the code you attached)
AstonishingSeaturtle47 yes it does. But I have to ask, how come you have submodules for which one has credentials for the master repo but not for the sub ones? Also, it sounds like a good solution would be for the trains-agent to try to pull the sub-modules, and if it cannot, just print a warning and continue. What do you think?