
I think it would make sense to have one task per run to make the comparison on hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it? I want to make sure we solve this 🙂
DepressedChimpanzee34
What's the hydra version ?
I tested with 1.1.0dev3 and it worked for me
Thanks EnviousStarfish54
Let me check if I can reproduce it
I mean just add the toy tqdm loop somewhere just before starting the lightning train function. I just want to verify that it works, or maybe there is something in the specific setup happening in real-time that changes it
One issue that I see is that the Dockerfile inside the agent container
Not sure I follow, these are settings for the default container to be used when the agent spins a Task for you.
How are you running the agent itself ?
So what will you query ?
i.e. run pip install --upgrade trains
Hi RipeGoose2
Could you expand on "inconsistency in the iteration reporting" ? Also "calling trainer.fit multiple" would you expect it to show as a single experiment or is it kind of param search ?
CooperativeSealion8 let me know if you managed to solve the issue, also feel free to send the entire trains-server log. I'm assuming one of the dockers failed to boot...
An upload of 11GB took around 20 hours which cannot be right.
That is very, very slow: about 152 KB/s (roughly 1.2 Mbit/s) ...
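As a quick sanity check of that figure, here is a small sketch of the arithmetic. It assumes decimal gigabytes (11e9 bytes), and the helper name is made up for illustration:

```python
from typing import Tuple


def average_throughput(total_bytes: float, seconds: float) -> Tuple[float, float]:
    """Return (kilobytes per second, megabits per second) for a transfer."""
    bytes_per_second = total_bytes / seconds
    kilobytes_per_second = bytes_per_second / 1e3
    megabits_per_second = bytes_per_second * 8 / 1e6
    return kilobytes_per_second, megabits_per_second


# 11 GB (assumed decimal, i.e. 11e9 bytes) uploaded in 20 hours
kb_s, mbit_s = average_throughput(11e9, 20 * 3600)
print(f"{kb_s:.0f} KB/s, {mbit_s:.2f} Mbit/s")
```

So the upload averaged roughly 150 kilobytes per second, which is a bit over one megabit per second.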
I think poetry should somehow return an error if the toml is "empty", then we can detect it...
I have to assume that I do not know the dataset ID
Sorry I mean:
from clearml import Dataset

datasets = Dataset.list_datasets(dataset_project="some_project")
for d in datasets:
    d["version"] = Dataset.get(dataset_id=d["id"]).version
wdyt?
Hi ElegantCoyote26 , yes I did 🙂
It seems cometml puts their default callback logger for you, that's it.
Weird issue, I'll make sure we fix compatibility with python 3.9
Hi SlimyElephant79
I would like to save only the last & best checkpoints and not all of them if possible.
Basically it will mimic the local file system, so if you overwrite the local files it will overwrite the remote model.
You can also disable auto logging, and manually upload the models
In Task.init, pass auto_connect_frameworks with False for the specific framework, see:
[None](https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-lo...
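A minimal sketch of what that could look like, assuming PyTorch is the framework whose automatic model logging you want to turn off. The helper names and the project/task names are hypothetical; only Task.init and its auto_connect_frameworks argument come from the SDK:

```python
from typing import Any, Dict


def build_init_kwargs(project: str, name: str, framework: str) -> Dict[str, Any]:
    """Build Task.init keyword arguments that switch off auto-logging for one framework."""
    return {
        "project_name": project,
        "task_name": name,
        # Frameworks missing from the dict keep their default (enabled) behaviour
        "auto_connect_frameworks": {framework: False},
    }


def init_task_without_autolog(project: str, name: str, framework: str = "pytorch"):
    # Imported lazily so the helper above can be used without clearml installed
    from clearml import Task

    return Task.init(**build_init_kwargs(project, name, framework))
```

With auto-logging off, only the checkpoints you register yourself show up on the task, for example via OutputModel(task=task).update_weights(weights_filename=...), so you can upload just the last and best ones.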
do I need to create a brand new dataset with a new name that inherits from the original?
Yes, you just create a new version, specify the parent one, add changes and close it.
If you later need to, you can squash a version (same idea as git squash). Make sense ?
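Roughly, the flow could look like the sketch below. The function name and the dataset_cls parameter are made up for illustration (the parameter only exists so a stand-in class can be passed instead of the real one); the create / add_files / upload / finalize calls are the Dataset SDK methods:

```python
from typing import Iterable, Optional


def create_child_version(parent_id: str, name: str, project: str,
                         paths: Iterable[str],
                         dataset_cls: Optional[type] = None) -> str:
    """Create a new dataset version on top of parent_id and return its ID."""
    if dataset_cls is None:
        # Imported lazily; by default the real clearml Dataset class is used
        from clearml import Dataset as dataset_cls

    child = dataset_cls.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_id],  # inherit the parent's content
    )
    for path in paths:
        child.add_files(path=path)  # register new or modified files
    child.upload()    # push the changed files to storage
    child.finalize()  # close the version
    return child.id
```

The new version keeps the same dataset name, so there is no need to invent a brand new one. If the chain of versions gets long, Dataset.squash can merge several of them into one.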
Hi MistakenDragonfly51
I'm trying to set default_output_uri in
This should be set either on your client side, or on the worker machine (running the clearml-agent).
Make sense ?
In Azure VMSS, there is a method called "Custom Data", which is basically a way of passing things to be executed
I know that it is on the to-do list to add "azure_autoscaler", which is basically a sibling to the aws_autoscaler.
With the same idea of the "custom data" as initial bash script:
You can check here:
https://github.com/allegroai/clearml/blob/4a2099b53c09d1feaf0e079092c9e075b43df7d2/clearml/automation/aws_auto_scaler.py#L54
Hi SteadySeahorse58
ValueError: Could not find queue named "services"
Did you set an agent / auto-scaler ? Where will the pipeline and its components be running ?
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually in runtime (i.e. when running the code), so relative to the working directory. Make sense ? (you can specify absolute path, probably something I would avoid in the code base though...)
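To illustrate what "relative to the working directory" means in practice, here is a plain-Python sketch. It is not ClearML's own lookup code, and the function name and example paths are hypothetical:

```python
import os
from pathlib import Path
from typing import Optional


def resolve_requirements(path: str, working_dir: Optional[str] = None) -> Path:
    """Resolve a requirements file path the way a relative path is read at runtime."""
    base = Path(working_dir) if working_dir is not None else Path(os.getcwd())
    candidate = Path(path)
    # Absolute paths are used as-is; relative ones are joined to the working directory
    return candidate if candidate.is_absolute() else base / candidate


# If the code runs from /repo/src, a bare "requirements.txt" means /repo/src/requirements.txt
print(resolve_requirements("requirements.txt", "/repo/src"))
```

So if the file sits at the repo base but the script runs from a sub-folder, the relative path has to account for that (e.g. "../requirements.txt").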
JitteryCoyote63 look for the latest RC (1.7.3rc1), it should have the fix (output_uri=False)
from your jupyterlab can you do: !curl
Okay found it, ElegantCoyote26 the step name is changed but the Task name remains the same ... 😞
I'll make sure we fix it on the next version
Why do you ask? is your server sluggish ?
GrievingTurkey78 I have to admit I can't see the difference, can you help me out 🙂
Oh my bad, post 0.17.5 😞
RC will be out soon, in the meantime you can install directly from GitHub: pip install git+