What does the folder structure look like, and where are the "package" and the entry script?
By the way, will downloading still happen if the dataset is already available in the cache folder?
If it is cached, then there is no need to re-download.
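The sketch below shows the generic "skip the download when the file is already cached" logic. It assumes a cache keyed by a hash of the source URL. `get_cached_or_download` and `fetch` are hypothetical names, and this is not ClearML's own cache implementation.

```python
import hashlib
from pathlib import Path
from typing import Callable


def get_cached_or_download(cache_dir: Path, url: str, fetch: Callable[[str], bytes]) -> Path:
    """Return a local copy of `url`, calling `fetch` only on a cache miss (hypothetical helper)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # A stable file name derived from the URL, so the same source always maps to the same entry
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if target.exists():
        return target  # cache hit: nothing is downloaded
    data = fetch(url)  # cache miss: download once
    # Write to a temporary name first, then rename, so a crash never leaves a half-written entry
    tmp = target.with_suffix(".partial")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target
```

Calling it twice with the same URL invokes `fetch` only once. The second call just returns the existing path.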
hardware monitoring etc.
This is averaged and sent only every 30 seconds, so it is not a lot of calls.
I just saw that I went through the first 200k API calls rather fast, so that is how I rationalized it.
Yes, that kind of makes sense.
Once every 2000 steps, which is every few seconds. So in theory those ~20 scalars should be batched since they are reported more or less at the same time. It's a bit odd that the API calls added up so quickly anyway.
The default flush is ever...
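Here is a minimal sketch of the batching idea discussed above, where ~20 scalars reported at roughly the same time should cost one API call rather than twenty. It is a small buffer that flushes on a timer. `BatchingReporter`, `send` and `flush_period_sec` are hypothetical names, not the ClearML SDK API. The 30-second default simply mirrors the hardware-monitoring interval mentioned earlier.

```python
import time
from typing import Callable, List, Tuple

Scalar = Tuple[str, str, float, int]  # (title, series, value, iteration)


class BatchingReporter:
    """Buffer scalar reports and send them in one call per flush period (hypothetical sketch)."""

    def __init__(self, send: Callable[[List[Scalar]], None], flush_period_sec: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send
        self._period = flush_period_sec
        self._clock = clock
        self._buffer: List[Scalar] = []
        self._last_flush = clock()

    def report_scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        self._buffer.append((title, series, value, iteration))
        # Only hit the backend once the flush period has elapsed
        if self._clock() - self._last_flush >= self._period:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send(self._buffer)  # one API call for the whole batch
            self._buffer = []
        self._last_flush = self._clock()
```

With this design, the number of calls depends on the flush period rather than on how many scalars are reported per step.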
Hi DisgustedDove53
Unfortunately, SSO in general is not part of the open-source version (the integration is way too complex and would cause too many security issues).
On the paid tier there is full SSO integration, including SAML. I'm pretty sure it also has a permission system on top, so you can control visibility / access inside the ClearML platform.
...instance to stop
you mean spin the instance down?
If you want each "main" process to be a single experiment, just don't call Task.init in the scheduler.
Hi GrievingTurkey78
First, I would look at the CLI clearml-data
as a baseline for implementing such a tool:
Docs:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Implementation:
https://github.com/allegroai/clearml/blob/master/clearml/cli/data/main.py
Regarding your questions:
(1) No, a new dataset version will only store the diff from the parent (if files are removed, it stores metadata recording that the file was removed).
(2) Yes any get operation will downl...
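Here is a minimal sketch of the diff-based versioning described in (1). It assumes each version is described as a `{relative_path: content_hash}` mapping. `VersionDiff`, `compute_version_diff` and `materialize` are hypothetical names, and this is not ClearML's actual storage format. It only shows how a child version can keep added/modified files plus a list of removed paths.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionDiff:
    added: Dict[str, str] = field(default_factory=dict)     # path -> content hash
    modified: Dict[str, str] = field(default_factory=dict)  # path -> new content hash
    removed: List[str] = field(default_factory=list)        # metadata only, no file content


def compute_version_diff(parent: Dict[str, str], child: Dict[str, str]) -> VersionDiff:
    """Keep only what changed between the parent snapshot and the child snapshot."""
    return VersionDiff(
        added={p: h for p, h in child.items() if p not in parent},
        modified={p: h for p, h in child.items() if p in parent and parent[p] != h},
        removed=sorted(set(parent) - set(child)),
    )


def materialize(parent: Dict[str, str], diff: VersionDiff) -> Dict[str, str]:
    """Rebuild the child's full file list from its parent plus the stored diff."""
    files = dict(parent)
    files.update(diff.added)
    files.update(diff.modified)
    for path in diff.removed:
        files.pop(path, None)
    return files
```

Only the added and modified files carry content. A removed file costs a single path entry.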
/opt/clearml/data/fileserver
This is on the host machine, and it is mounted into the container at /mnt/fileserver
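To make the host/container mapping concrete, here is a small sketch that translates a host path under /opt/clearml/data/fileserver into the path the fileserver container sees. It assumes the container side is /mnt/fileserver, which is what the default ClearML server docker-compose file uses as far as I know. `host_to_container` and `MOUNTS` are hypothetical names.

```python
from pathlib import PurePosixPath
from typing import Dict

# ASSUMPTION: host directory -> container mount point (default docker-compose layout)
MOUNTS: Dict[PurePosixPath, PurePosixPath] = {
    PurePosixPath("/opt/clearml/data/fileserver"): PurePosixPath("/mnt/fileserver"),
}


def host_to_container(host_path: str, mounts: Dict[PurePosixPath, PurePosixPath] = MOUNTS) -> str:
    """Map a path on the host to the same file as seen from inside the container."""
    p = PurePosixPath(host_path)
    for host_root, container_root in mounts.items():
        try:
            rel = p.relative_to(host_root)  # raises ValueError if p is not under host_root
        except ValueError:
            continue
        return str(container_root / rel)
    raise ValueError(f"{host_path} is not under any mounted directory")
```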
feature is, however, available in the Enterprise Version as HyperDatasets. Am I correct?
Correct
BTW you could do:
    from clearml import Dataset

    # `task` is the Task object returned by Task.init()
    datasets_used = dict(dataset_id="83cfb45cfcbb4a8293ed9f14a2c562c0")
    task.connect(datasets_used, name='datasets')
    dataset_path = Dataset.get(dataset_id=datasets_used['dataset_id']).get_local_copy()
This will ensure that not only do you have a new section called "datasets" in the Task's configuration, but you will also be able to replace the datase...
It should preserve the order, since the order of the update back (i.e. when executed by the agent) is the same as the order of the keys (obviously on Python 3.7+, because it creates a plain dict, not an OrderedDict).
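The Python guarantee this relies on is easy to check. Since 3.7, a dict keeps insertion order, and updating an existing key does not move it. The snippet below only illustrates that language behaviour with made-up parameter names. It is not ClearML's update-back code.

```python
import sys

assert sys.version_info >= (3, 7), "dict insertion order is only guaranteed from Python 3.7"

params = {"lr": 0.1, "batch_size": 32, "epochs": 10}
# Values coming back in a different order (e.g. after being edited elsewhere)
overrides = {"epochs": 20, "lr": 0.01, "batch_size": 64}
params.update(overrides)  # existing keys keep their original positions

print(list(params))  # ['lr', 'batch_size', 'epochs']
print(params)        # {'lr': 0.01, 'batch_size': 64, 'epochs': 20}
```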
IrritableJellyfish76, point taken. Any suggestions on improving the interface?
For example:
examples/k8s_glue_example.py --queue k8s_gpu - --namespace pod-clearml-conf ~/trains.conf --template-yaml example/base.yml
I think it would be nicer if the CLI had a subcommand to show the content of ~/.clearml_data.json.
Actually, it only stores the last dataset ID at the moment, so not much.
But maybe we should have a command line option that just outputs the current dataset ID; that would make it easier to grab and pipe.
WDYT?
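Here is a rough sketch of what "just output the current dataset ID" could look like. It assumes ~/.clearml_data.json is a JSON object with the ID under a top-level key. The key name "id" is a guess, so check the real file. `current_dataset_id` is a hypothetical helper, not an existing clearml-data command.

```python
import json
from pathlib import Path

# ASSUMPTION: the state file is a JSON object and the last dataset ID sits under a
# top-level key; "id" is a guess, check your own ~/.clearml_data.json for the real name.
DEFAULT_STATE_FILE = Path.home() / ".clearml_data.json"


def current_dataset_id(state_file: Path = DEFAULT_STATE_FILE, key: str = "id") -> str:
    """Return the dataset ID stored in the local clearml-data state file."""
    with state_file.open("r", encoding="utf-8") as f:
        state = json.load(f)
    if key not in state:
        raise KeyError(f"{key!r} not found in {state_file}; available keys: {sorted(state)}")
    return str(state[key])
```

If a tiny script prints the return value, the output is a single line that can be captured with $(...) in a shell. That covers the "grab and pipe" use case.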
@JitteryCoyote63
I just created a new venv and ran
pip install "torch==1.11.0.*" --extra-index-url
Then started Python:
import torch
torch.cuda.is_available()
And I get True
what are you getting?
This is odd. I was running the example code from:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
It is stored inside a repo, but the steps that are created (i.e. checking the Task that is created) do not have any repo linked to them.
What's the difference ?
SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
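To illustrate the two credential sources, the hypothetical helper below embeds configured git_user/git_pass values into an HTTPS clone URL. Otherwise it leaves the URL untouched, so the host's own git setup (credential helper, SSH keys) applies. This is not clearml-agent's code, only a sketch of the idea.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def clone_url_with_credentials(repo_url: str, git_user: Optional[str], git_pass: Optional[str]) -> str:
    """Embed configured credentials into an HTTPS clone URL (hypothetical helper)."""
    parts = urlsplit(repo_url)
    if parts.scheme not in ("http", "https") or not git_user or not git_pass:
        # No configured credentials, or not an HTTP(S) URL: rely on the host's own git setup
        return repo_url
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Percent-encode so characters such as '@' or ':' in a password do not break the URL
    netloc = f"{quote(git_user, safe='')}:{quote(git_pass, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

A URL built this way contains the password in clear text, so anything that logs it should mask it first.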
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...
Most likely yes, but I don't see how ClearML would have an impact here. I am more inclined to think it is a PyTorch DataLoader issue, although I don't see why.
These are most certainly DataLoader processes. But when clearml-agent kills the process, it should also kill all its subprocesses; it might be that something is preventing it from killing the subprocesses...
Is this easily reproducible? Can you verify it is still the case with the latest RC of clearml-agent?
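If orphaned DataLoader workers are the problem, the general POSIX technique for "kill the process and all its subprocesses" is to start the job in its own process group and signal the whole group. The sketch below uses only the standard library (`subprocess.Popen(start_new_session=True)`, `os.killpg`). It is POSIX-only, the helper names are hypothetical, and it is not how clearml-agent actually implements process cleanup.

```python
import os
import signal
import subprocess


def start_in_own_group(cmd, **popen_kwargs):
    """Start `cmd` as the leader of a new process group (POSIX only)."""
    # start_new_session=True runs setsid() in the child, so its PID is also its group ID,
    # and workers it forks (e.g. DataLoader processes) inherit that group.
    return subprocess.Popen(cmd, start_new_session=True, **popen_kwargs)


def _signal_group(pgid, sig):
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # every process in the group is already gone


def kill_process_group(proc, timeout=5.0):
    """Terminate `proc` and every process in its group, escalating to SIGKILL."""
    pgid = proc.pid  # equal to the group ID because of start_new_session=True
    _signal_group(pgid, signal.SIGTERM)  # polite request to the whole group first
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass
    # Sweep up anything that ignored SIGTERM or outlived its parent
    _signal_group(pgid, signal.SIGKILL)
    proc.wait()
```

Signalling the group rather than a single PID matters because a worker whose parent dies is reparented, not killed. A worker that calls setsid() itself would still escape the group.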
I had no idea it was going to do that and sent your servers over 1.4M API hits unintentionally
Yeah, that is way too much. I think it relates to how frequently it updates the console.
@AverageMoth57 it sounds like you should use SSH authentication for the agent; just set force_git_ssh_protocol: true
And make sure you have the SSH keys on the agent's machine.
Hi @QuizzicalFox36
http:/34.67.35.46:8081/...
Notice there is a / missing in the link; how is that possible? It should be http://
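A quick way to catch this kind of malformed URL is to check that it actually parses to a host. With one slash missing, `urllib.parse.urlsplit` still finds the `http` scheme but treats everything after it as a path, so the network location comes back empty. `check_web_url` is a hypothetical helper for illustration.

```python
from urllib.parse import urlsplit


def check_web_url(url: str) -> str:
    """Raise if `url` has no host, which is what a missing '/' after 'http:' produces."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"malformed URL {url!r}: expected scheme://host[:port]/...")
    return url


print(urlsplit("http:/34.67.35.46:8081/").netloc)     # '' (no host parsed)
print(urlsplit("http://34.67.35.46:8081/").hostname)  # 34.67.35.46
```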
Do you have Python 3.7 in the Docker image?
CooperativeFox72, please see if you can send a code snippet to reproduce the issue. I'd be happy to solve it...
AstonishingSeaturtle47, makes sense?
I was thinking mainly about AWS.
Meaning S3?
EnviousPanda91, 'connect' will log the object properties; the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values; do you see these as well?
Happy New Year @ExuberantLion50
- Is this the right place to mention such bugs?
Definitely the right place to discuss them. Usually, if verified, we ask to also add them on GitHub for easier traceability / visibility.
m (i.e. there are two plots shown side-by-side, but they're actually both just the first experiment that was selected). This is happening across all experiments, all my workspaces, and all the browsers I've tried.
Can you share a screenshot? Is this r...