Notice: dataset_rgb.list_files() will list the content of the dataset, not the local files:
e.g. /folder/myfile.ext and not /home/user/cache/folder/myfile.ext
So basically I think you are just not passing actual files; you should probably do:
for local_file in Path(folder_rgb).rglob('*'): ...
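For what it's worth, a minimal sketch of that idea (the project/dataset names here are placeholders, not taken from the thread):
from clearml import Dataset
from pathlib import Path

# fetch a local (cached) copy of the dataset, then iterate the actual files on disk
dataset_rgb = Dataset.get(dataset_project='my_project', dataset_name='rgb_dataset')
folder_rgb = dataset_rgb.get_local_copy()

for local_file in Path(folder_rgb).rglob('*'):
    if local_file.is_file():
        print(local_file)  # full local path, e.g. /home/user/.clearml/cache/.../myfile.ext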
after generating a fresh set of keys
when you have a new set, copy-paste them directly into the 'clearml.conf' (should be at the top, can't miss it)
It's just the print (__repr__) not showing the data:
for w in client.workers.get_all():
    print(w.data)
Hi @<1570583227918192640:profile|FloppySwallow46>
Hey, I have a question: can you monitor the time for one pipeline?
you mean to see the start / end time of the pipeline?
Click on the details link on the right hand side and you will have all the details on the pipeline task, including running time
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good idea.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 basically means it will wait until the first reported iteration, not that it will just sleep for 99999 sec 🙂
Hi BattyLion34
script_a.py generates file test.json in project folder
So let's assume "script_a" generates something and puts it under /tmp/my_data
Then it can create a dataset from the folder /tmp/my_data, with Dataset.create() -> Dataset.sync -> Dataset.upload -> Dataset.finalize
See example: https://github.com/alguchg/clearml-demo/blob/main/process_dataset.py
Then "script_b" can get a copy of the dataset using "Dataset.get()", see examp...
It seems like the naming of Task.create causes a lot of confusion (we are always open to suggestions and improvements). ReassuredTiger98, from your suggestion it sounds like you would actually like more control in Task.init (let's leave Task.create aside, as its main function is not to log the current running code, but to create an auxiliary Task).
Did I understand you correctly ?
Local IP, like 192.168.1.123
the other repos i have are constantly worked on and changing too
Not only will it be cloned automatically, the git diff of the sub-modules is stored as well 🙂
WickedGoat98 Notice this is not the "clearml-agent-services" docker but the "clearml-agent" docker image
Also the default docker image is "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
Other than that quite similar :)
however, I don't think it's our code, since the trigger is not triggered at all, unless a new task is created :((
Yeah I think you are correct, I'm more interested in understanding how you use it ...
BTW can you test with the latest clearml python version (the trigger code is the important part)?
@<1720249421582569472:profile|NonchalantSeaanemone34>
dso = Dataset.create(
    dataset_project=project_name,
    dataset_name=dataset_name,
    parent_datasets=[parent_datasets_id],
)

dso = Dataset.get(
    dataset_project=project_name,
    dataset_name=dataset_name,
    only_completed=True,
    only_published=False,
    alias='latest',
)
why are you creating a dataset then getting a dataset on the same object?
it seems you are trying to upload...
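For reference, a rough sketch of keeping the two flows separate (ids/paths are placeholders): either create a new version on top of a parent and add files to it, or get an existing version to consume it, but not both on the same object:
from clearml import Dataset

# create a NEW version on top of an existing one, then add / upload / finalize
new_version = Dataset.create(
    dataset_project='my_project',
    dataset_name='my_dataset',
    parent_datasets=['<parent_dataset_id>'],
)
new_version.add_files('/path/to/new/files')
new_version.upload()
new_version.finalize()

# ...or GET an existing completed version when you just want to use it
existing = Dataset.get(
    dataset_project='my_project',
    dataset_name='my_dataset',
    only_completed=True,
    alias='latest',
)
print(existing.get_local_copy())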
This means that in your "Installed packages" you should see the line:
Notice that this is not a PyPI artifactory (i.e. a server to add to the extra index URL for pip); this is a direct pip install from a git repository, hence it should be listed in the "installed packages".
If this is the way the package was installed locally, you should have had this line in the installed packages.
The clearml agent should take care of the authentication for you (specifically here, it should do nothing).
If ...
K8s can schedule pods with different priorities.
I'm not sure I agree here, could you refer me to the docs on this ability in k8s ?
So maybe "no real scheduling" means there is no ClearML scheduling after applying a pod to k8s.
That is correct 🙂
Will it be implemented in the future?
Yes, this is an enterprise feature; in the community version you can specify a --max-pods limit (which will cause it never to pull a job if it hits the max-pod limit)
Correct, which makes sense if you have a stochastic process and you are looking for the best model snapshot. That said I guess the default use case would be min/max (and not the global variant)
or shall I call the Task.init even from the agent
WorriedParrot51 I think something is lost here.
Task.init() is always called, even when the agent is executing the code. The difference is in what happens inside the Task.init() call. When the codebase itself is executed by the trains-agent, it signals through OS environment variables to Task.init() that instead of creating a new task, it should use the already created one. From this point all data flows from the trains-server back into the c...
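In other words, the same script works in both modes; a minimal sketch (project/task names and the parameter dict are just illustrative):
from clearml import Task

# Locally this registers a new task; under the agent, Task.init() detects (via the
# environment set by the agent) that a task already exists and re-uses it.
task = Task.init(project_name='examples', task_name='my experiment')

params = {'batch_size': 32, 'lr': 0.001}
params = task.connect(params)  # locally: logs the defaults; under the agent: values come from the server/UI
print(params)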
Hi FloppyDeer99
What is the meaning of no real scheduling
I think the meaning is that from the moment a k8s job is created, the k8s is in charge of actually spinning the container. Since k8s has no real priority/order, the scheduling order is not guaranteed from this point.
The idea of the cleaml-k8s -glue is that the glue will launch a job on the k8s cluster only if it is sure there are enough resources to actually spin the job now (as opposed to, sometime in the future), this mea...
Yes, that makes sense. I think what happened is one of the processes completed the Task (i.e. closed it) before the others did, and so they threw an exception.
I switched to have all tasks in a separate process
I think that's probably the best (performance wise as well), nice!
Hi RattyBat71
Do you tend to create separate experiments for each fold?
If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the CV fold index as an argument) makes sense; then you can compare / sort the results based on a specific metric. That said, if speed is not important, just having a single script with multiple CVs might be easier to implement?!
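Something along these lines, assuming each fold runs as its own Task (argument names and the metric are illustrative):
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument('--fold', type=int, default=0)  # index of the CV split for this run
args = parser.parse_args()

task = Task.init(project_name='cv-example', task_name='fold_{}'.format(args.fold))

# ... train / evaluate on the selected fold, then report the metric used for sorting
val_accuracy = 0.0  # placeholder for the real result
task.get_logger().report_scalar('cv', 'val_accuracy', value=val_accuracy, iteration=args.fold)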
restart the notebook kernel ?
from clearml.backend_api.session.client import APIClient
client = APIClient()
result = client.queues.get_next_task(queue='queue_ID_here')
Seems to work for me (latest RC 1.1.5rc2)
Yes, I think the API is probably the easiest:
from clearml.backend_api.session.client import APIClient
client = APIClient()
project_list = client.projects.get_all()
print(project_list)
You can query the system and get all the experiments based on date, then grab the machine GPU metrics.
DefeatedCrab47 check the cleanup service, it queries the system with the APIClient.
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/examples/services/cleanup/cleanup_service.py#L72
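Along the same lines, a rough sketch of querying tasks by date with the APIClient (field names follow the cleanup service above; adjust as needed):
from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()
week_ago = datetime.utcnow() - timedelta(days=7)

# tasks whose status changed within the last week, newest first
tasks = client.tasks.get_all(
    order_by=['-last_update'],
    status_changed=['>{}'.format(week_ago)],
    only_fields=['id', 'name', 'last_update'],
    page_size=100,
    page=0,
)
for t in tasks:
    print(t.id, t.name)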
JitteryCoyote63
Picks a new experiment on top of the long one running
This is very very strange. Is the long running experiment being logged (i.e. do you still see console output in the UI)?
IntriguedRat44 how do I reproduce it ?
Can you confirm that commenting out the Task.init(..) call fixes it?
Hi, if you don't mind having a look too,
With pleasure :)
according to the above I was expecting the config to be auto-magically updated with the new yaml config I edited in the UI, however it seems like an additional step is required.. probably connect_dict? or am I missing something
Notice the OmegaConf section description: "Full OmegaConf YAML configuration. This is a read-only section, unless 'Hydra/_allow_omegaconf_edit_' is set to True". By default it will alw...
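For context, a minimal sketch of how that OmegaConf section gets populated (config path/name are placeholders, and it assumes a recent Hydra); the composed config is what ClearML logs when Task.init() runs:
import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path='conf', config_name='config', version_base=None)
def main(cfg: DictConfig):
    # ClearML hooks Hydra and logs the composed OmegaConf into the CONFIGURATION tab
    Task.init(project_name='examples', task_name='hydra example')
    print(OmegaConf.to_yaml(cfg))

if __name__ == '__main__':
    main()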