The base task is self-contained, i.e. it downloads the training/eval data directly and has direct access to it
I think this is the main issue, how come it does not catch it? Are you using argparse?
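For reference, a minimal sketch of how the argparse hook is expected to work (project/task names below are placeholders): Task.init() should be called before parse_args() so ClearML can patch argparse and log the parsed arguments automatically.
```
import argparse
from clearml import Task

# Task.init() must run before parse_args() so ClearML can hook
# argparse and automatically record the parsed arguments
task = Task.init(project_name="examples", task_name="argparse check")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()
```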
Meaning the node restarted (or actually moved)
Hi SubstantialElk6
I think you are absolutely correct, it seems the glue pops all the arguments, when in fact it should maybe process them and convert the --env/-e
What do you think?
Also, I assume if these are the default arguments they should actually be part of the k8s apply.yaml template, no?
Hi FiercePenguin76
https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/model_module/model_outputmodel.html
Basically:
```
from clearml import OutputModel

model = OutputModel()
model.update_weights(weights_filename='local_file_here.bin')
```
In your code, can you print the following:
```
import os
print(os.environ.keys())
```
There should be a few keys the PyCharm plugin is sending from the local machine, pointing to the git repo
When you set up the pod, make sure you mount the ClearML local cache folder to the PV
basically /root/.clearml/cache/
Or can I enable the agent in this kind of local mode?
You just built a local agent
But first I want to make sure the verify argument is actually used, hence False
Notice that the actual configuration that is used is the one defined here: https://github.com/allegroai/clearml/blob/b21e93272682af99fffc861224f38d65b42c2354/clearml/backend_config/bucket_config.py#L23
But it is created here:
https://github.com/allegroai/clearml/blob/b21e93272682af99fffc861224f38d65b42c2354/clearml/backend_config/bucket_config.py#L199
In terms of creating dynamic pipelines and cyclic graphs, the decorator approach seems the most powerful to me.
Yes, I agree, the decorator approach is the most powerful one.
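For context, a minimal sketch of the decorator approach (names/values are illustrative): regular Python control flow drives the execution graph, which is what enables dynamic pipelines.
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def load_data():
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["result"])
def process(data):
    return sum(data)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def pipeline_logic():
    # plain Python control flow here becomes the pipeline DAG,
    # so conditions and loops can shape the graph at runtime
    data = load_data()
    result = process(data)
    print(result)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline_logic()
```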
Hi @<1610083503607648256:profile|DiminutiveToad80>
do you have a full log? can you share the code you are trying to run?
With default settings, uploading 2 datasets of 120 GB and 70 GB took more than 6 hours!
SmugSnake6 in the end, is this an outcome of limited bandwidth or limited CPU?
Hi EnchantingOstrich20
You mean how does ClearML get it there?
At runtime it analyzes the code you are running, looking for imports, then checks the versions you actively used (i.e. in the active venv / python) and lists them there.
You can also override those in code, or edit them after you clone the task and before you enqueue it for remote execution
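For example, a rough sketch of overriding a detected package version in code (package/version here are placeholders); note that Task.add_requirements() has to be called before Task.init():
```
from clearml import Task

# force a specific version instead of the auto-detected one;
# must be called before Task.init()
Task.add_requirements("tensorflow", "2.4.0")
task = Task.init(project_name="examples", task_name="pinned requirements")
```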
If you spin up two agents on the same GPU, they are not aware of one another... so this is expected behavior...
Make sense ?
suppose I have an S3 bucket where my data is stored and I wish to transfer it to ClearML file server.
Then you first have to download the entire bucket locally, then register the local copy.
Basically (the bucket URI below is a placeholder for your own s3:// path):
```
from clearml import StorageManager

StorageManager.download_folder("s3://my-bucket/my-data", "/target/folder")
# now register the local "/target/folder" with Dataset.add_files
```
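Continuing from the download above, registering the local copy could look roughly like this (dataset/project names are placeholders); without an explicit output_uri, upload() should send the files to the ClearML file server:
```
from clearml import Dataset

# register the local folder as a new dataset and upload it
dataset = Dataset.create(dataset_name="my dataset", dataset_project="examples")
dataset.add_files("/target/folder")
dataset.upload()
dataset.finalize()
```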
Hi DeliciousBluewhale87
Yes, that should have worked, can you verify the task status?
```
print(Task.get_task(...).get_status())
```
I am writing quite a bit of documentation on the topic of pipelines. I am happy to share the article here, once my questions are answered and we can make a pull request for the official documentation out of it.
Amazing please share once done, I will make sure we merge it into the docs!
Does this mean that within component or add_function_step I cannot use any code of my current directories code base, only code from external packages that are imported - unless I add my code with ...
right now I can't figure out how to get the session in order to get the notebook path
you mean the code that fires "HTTPConnectionPool" ?
so I assume clearml moves them from one queue to the other?
Correct. When it creates the k8s job and launches it on the cluster it moves it into the queue.
Can you see it on your k8s cluster (meaning the job/pod)?
Last but not least - can I cancel the offline zip creation if I'm not interested in it
you can override with OS environment, would that work?
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling
task.close()
takes a long time
It actually zips the entire offline folder so you can later upload it. Maybe we can disable that part?!
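For reference, a rough sketch of the offline round trip (the zip path is a placeholder):
```
from clearml import Task

# offline mode: nothing is sent to the server, everything is written
# to a local session folder that is zipped when the task closes
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
task.close()

# later, with connectivity, import the zipped session into the server
Task.import_offline_session("/path/to/offline_session.zip")
```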
```
# generate the script section
script = (
    "fr...
```
Do we support GPUs in a) docker mode b) k8s glue?
yes on both
Is there a good reference to get started with k8s glue?
A few folks here already set it up, do you have a k8s cluster with GPU support ?
I wonder if I can extend this to reporting grad_norm per layer.
oh that makes sense, technically I assume so, is this an HF logger option? Notice ClearML is already integrated with HF on the HF side, do they report that when the TB logger is used?
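If you end up doing it manually, a rough PyTorch sketch (not an HF-specific option, just explicit reporting) could look like this:
```
import torch
from clearml import Logger

def report_grad_norms(model: torch.nn.Module, iteration: int) -> None:
    # report the L2 gradient norm of every parameter as its own series
    logger = Logger.current_logger()
    for name, param in model.named_parameters():
        if param.grad is not None:
            logger.report_scalar(
                title="grad_norm",
                series=name,
                value=param.grad.norm(2).item(),
                iteration=iteration,
            )
```
Called right after loss.backward(), this would give one scalar series per layer/parameter under a single "grad_norm" plot.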
PungentLouse55 I'm checking something here, you might have stumbled on a bug in parameter overriding. Updating here soon...
Hi @<1658281099807166464:profile|SmallCamel52>
Lack of authentication in all versions of the fileserver component
Are you leaving the fileserver open to the world ?
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short, no, unless you really want to compile the Dockers, and I can't see the real upside here. Yes, add the following volume mount, /opt/clearml.conf:/root/clearml.conf , here: https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's /opt/clearml.conf with ...
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
🎉
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar
and you have your credentials in the browser when deleting a Task ?