With default settings, uploading 2 datasets of 120 GB and 70 GB took more than 6 hours!
SmugSnake6 in the end, is this a result of limited bandwidth or limited CPU?
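A quick back-of-the-envelope check helps frame the bandwidth-vs-CPU question. This sketch assumes decimal units (1 GB = 10^9 bytes) and a duration of exactly 6 hours. Since the upload took more than that, the result is an upper bound on the average rate actually achieved.

```python
def implied_throughput(total_bytes: float, seconds: float) -> tuple:
    """Return (MB/s, Mbit/s) for moving total_bytes in the given number of seconds."""
    bytes_per_sec = total_bytes / seconds
    return bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6


total_bytes = (120 + 70) * 10**9  # the two datasets, assuming 1 GB = 10**9 bytes
duration_s = 6 * 3600             # 6 hours in seconds (a lower bound on the real duration)
mb_s, mbit_s = implied_throughput(total_bytes, duration_s)
print(f"at most {mb_s:.1f} MB/s (~{mbit_s:.0f} Mbit/s)")  # prints "at most 8.8 MB/s (~70 Mbit/s)"
```

If the machine's uplink is much faster than ~70 Mbit/s, the network alone is probably not the limit. In that case the time spent hashing and compressing files before upload (CPU and disk) is worth checking.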
Hi EnchantingOstrich20
How does ClearML get it there?
At runtime it analyzes the code you are running, looking for imports, then checks the versions you actually use (i.e. the active venv / Python) and lists them there.
You can also override those in code, or edit them after you clone the task and before you enqueue it for remote execution.
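As a rough illustration of the import analysis described above, here is a simplified sketch, not ClearML's actual implementation (which has to deal with extra cases such as module names that differ from package names). It parses source with the standard ast module, collects absolute imports, and looks up installed versions with importlib.metadata.

```python
import ast
import sys
from importlib import metadata


def top_level_imports(source: str) -> set:
    """Return the top-level module names imported anywhere in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # absolute "from x.y import z" -> "x"; relative imports are local code, skipped
            names.add(node.module.split(".")[0])
    return names


def installed_versions(modules) -> dict:
    """Look up the installed version of each module in the active environment."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())  # available on Python 3.10+
    versions = {}
    for name in sorted(modules):
        if name in stdlib:
            continue
        try:
            # assumes module name == distribution name, which is not always true
            # (e.g. "sklearn" is distributed as "scikit-learn")
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return versions


code = "import os\nimport numpy as np\nfrom sklearn.linear_model import LogisticRegression\nfrom . import utils\n"
print(installed_versions(top_level_imports(code)))
```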
If you spin up two agents on the same GPU, they are not aware of one another, so this is expected behavior.
Makes sense?
suppose I have an S3 bucket where my data is stored and I wish to transfer it to ClearML file server.
Then you first have to download the entire bucket locally, then register the local copy.
Basically:
StorageManager.download_folder("
", "/target/folder")
# now register the local "/target/folder" with Dataset.add_files
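The two steps above can be sketched end to end as follows, assuming the bucket credentials are already configured in clearml.conf. StorageManager.download_folder, Dataset.create, add_files, upload and finalize are clearml calls; the function name, the bucket URL and the project/dataset names are hypothetical. The storage and dataset classes are parameters only so the flow can be exercised without a server.

```python
def mirror_to_dataset(remote_url, local_folder, dataset_project, dataset_name,
                      storage_manager=None, dataset_cls=None):
    """Download a remote folder (e.g. an S3 prefix) and register the local copy as a ClearML Dataset.

    storage_manager / dataset_cls default to clearml's StorageManager and Dataset.
    """
    if storage_manager is None or dataset_cls is None:
        from clearml import Dataset, StorageManager  # imported lazily, only when actually used
        storage_manager = storage_manager or StorageManager
        dataset_cls = dataset_cls or Dataset

    # 1. copy the whole remote folder to local disk
    storage_manager.download_folder(remote_url, local_folder=local_folder)

    # 2. register the local copy, upload it to the default output (the file server),
    #    then close this dataset version
    dataset = dataset_cls.create(dataset_name=dataset_name, dataset_project=dataset_project)
    dataset.add_files(path=local_folder)
    dataset.upload()
    dataset.finalize()
    return dataset


# Example (hypothetical bucket and names):
# mirror_to_dataset("s3://my-bucket/data/", "/target/folder", "examples", "s3-mirror")
```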
Hi DeliciousBluewhale87
Yes, that should have worked. Can you verify the task status?
print(Task.get_task(...).get_status())
I am writing quite a bit of documentation on the topic of pipelines. I am happy to share the article here, once my questions are answered and we can make a pull request for the official documentation out of it.
Amazing, please share once done. I will make sure we merge it into the docs!
Does this mean that within component or add_function_step I cannot use any code of my current directories code base, only code from external packages that are imported - unless I add my code with ...
right now I can't figure out how to get the session in order to get the notebook path
you mean the code that fires "HTTPConnectionPool" ?
so I assume clearml moves them from one queue to the other?
Correct. When it creates the k8s job and launches it on the cluster it moves it into the queue.
Can you see it on your k8s cluster (meaning the job/pod)?
Last but not least: can I cancel the offline zip creation if I'm not interested in it?
you can override with OS environment, would that work?
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling task.close() takes a long time.
It actually zips the entire offline folder so you can later upload it. Maybe we can disable that part?!
` # generate the script section
script = (
"fr...
Do we support GPUs in a) docker mode b) k8s glue?
yes on both
Is there a good reference to get started with k8s glue?
A few folks here already set it up, do you have a k8s cluster with GPU support ?
I wonder if I can extend this to reporting grad_norm per layer.
Oh, that makes sense. Technically I assume so; is this an HF logger option? Notice that ClearML is already integrated with HF on the HF side. Do they report it when the TB logger is used?
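If the HF/TB integration does not expose it, one option is to compute the norms yourself and report them as scalars. A minimal sketch, with plain Python lists standing in for gradient tensors (in PyTorch you would more likely take p.grad.norm().item() for each entry of model.named_parameters()). The logger is assumed to expose ClearML's Logger.report_scalar(title, series, value, iteration); the "grad_norm" title and the layer names are hypothetical.

```python
import math


def per_layer_grad_norms(grads):
    """L2 norm of each layer's flattened gradient. `grads` maps layer name -> list of floats."""
    return {name: math.sqrt(sum(g * g for g in values)) for name, values in grads.items()}


def report_grad_norms(logger, grads, iteration):
    """Report one series per layer under a single "grad_norm" plot.

    `logger` is e.g. the object returned by Task.current_task().get_logger().
    """
    norms = per_layer_grad_norms(grads)
    for name, norm in norms.items():
        logger.report_scalar(title="grad_norm", series=name, value=norm, iteration=iteration)
    return norms
```

Grouping all layers under one title keeps them on a single plot in the UI, with one series per layer.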
PungentLouse55 I'm checking something here, you might have stumbled on a bug in parameter overriding. Updating here soon...
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short, no, unless you really want to compile the Docker images yourself, and I can't see a real upside to that.
Yes, add the volume mapping /opt/clearml.conf:/root/clearml.conf here: https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's "/opt/clearml.conf" with ...
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
🎉
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar
And do you have your credentials in the browser when deleting a Task?
Hi ConvolutedSealion94
Yes, this seems like the correct curl.
How did you spin the clearml-serving containers? is it with the docker-compose or with the helm chart (I remember that there are some pitfalls with the helm chart, and I would actually start with the local docker-compose to debug it)
Fixing that would make this feature great.
Hmm, I guess that is doable, and it's a good point: searching for the GUID is not always trivial (or at least we could put the project/dataset/version in the description).
Hi CrookedSeal85
I am trying to optimize storage on my ClearML file server when doing a lot of experiments.
This is not straightforward. You will need to get a list of all the events via
None
filter on image events,
and then delete the URLs you get from them via the StorageManager.
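The filter-then-delete steps above could look roughly like this. The event shape (a dict with "type" and "url" keys, with debug images typed "training_debug_image") is an assumption about what the events API returns, and delete is a hypothetical callable standing in for whatever storage deletion call you use. It defaults to a dry run so nothing is removed by accident.

```python
def debug_image_urls(events, event_type="training_debug_image"):
    """Collect the unique file URLs of debug-image events, in first-seen order.

    Each event is assumed to be a dict with at least "type" and "url" keys.
    """
    urls, seen = [], set()
    for event in events:
        url = event.get("url")
        if event.get("type") == event_type and url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def delete_debug_images(events, delete, dry_run=True):
    """Call `delete(url)` for every debug-image URL; with dry_run=True only return the list."""
    urls = debug_image_urls(events)
    if not dry_run:
        for url in urls:
            delete(url)
    return urls
```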
But to be honest, why not just direct it to S3 or something like that ?
I am thinking about just installing this manually on the worker ...
If you install them system wide (i.e. with sudo) and add agent.package_manager.system_site_packages then they will always be available for you 🙂
And then also use
priority_optional_packages: ["carla"]
This actually means that it will always try to install the package carla first, but if that fails, it will not raise an error.
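To make those semantics concrete, here is a toy model in plain Python. It is not the agent's code, and install is a hypothetical stand-in for the actual pip call that raises on failure.

```python
def install_with_optional(requirements, optional, install):
    """Simplified model of "priority optional" packages.

    Packages in `optional` are attempted first and a failure is only recorded;
    the remaining requirements are installed afterwards and any failure propagates.
    Returns the optional packages that could not be installed.
    """
    skipped = []
    for name in optional:
        try:
            install(name)
        except Exception:
            skipped.append(name)  # optional package: note the failure and keep going
    for name in requirements:
        if name not in optional:
            install(name)  # regular package: errors are not swallowed
    return skipped
```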
BTW: this would be a good use case for dockers, just saying :w...
DeterminedCrab71 that is a good point, how does plotly adjust for nans on graphs?
Let me check the API reference
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all
So not a straight query, but maybe:
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all_ex
The _all_ section there might do the trick.
SuccessfulKoala55 any chance you have an idea of what to pass there?
That really depends on how much data you have there, and on the setup. The upside of the file server is that you do not need to worry about credentials; the downside is that storage is more expensive.
Hi ApprehensiveFox95
You mean removing the argparse arguments from code?
Or post execution in the UI?
The remaining problem is that this way, they are visible in the ClearML web UI which is potentially unsafe / bad practice, see screenshot below.
Ohhh that makes sense now, thank you 🙂
Assuming these are one-time credentials for every agent, you can add these arguments to "extra_docker_arguments" in clearml.conf.
Then make sure they are also listed in hide_docker_command_env_vars, which should cover the console log as well.
https://github.com/allegroai/clearml-agent/blob/26e6...
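The idea behind hiding those variables is that the docker command still receives the real values, while the copy printed to the log has them masked. A toy Python illustration of that masking, not the agent's implementation; the variable names and mask string are hypothetical.

```python
def mask_docker_env(cmd, secret_keys, mask="********"):
    """Return a copy of a docker command (list of args) with selected -e/--env values masked."""
    masked, next_is_env = [], False
    for arg in cmd:
        if next_is_env:
            key, sep, _ = arg.partition("=")
            # only "KEY=value" pairs whose KEY is listed get their value replaced
            masked.append(f"{key}={mask}" if sep and key in secret_keys else arg)
            next_is_env = False
        else:
            masked.append(arg)
            next_is_env = arg in ("-e", "--env")
    return masked
```

The original list is left untouched, so the real command can still be executed while only the masked copy is logged.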
DefeatedOstrich93 can you verify that Lightning actually stored it only once?
Hmm, you either need to run with sudo or make sure the running user has docker run permissions.