But this is the clearml Python package, it is not really related to the server. Could it be you also updated the clearml package?
WittyOwl57 I think this is a great idea, can you open a feature issue on GitHub so this is not forgotten ?
BTW: regardless, if you have time to upgrade to the new Azure package, it would be great 🙂 This has been on our to-do list for a while, but since not a lot of users complained it got pushed ...
BTW: if you could implement _AzureBlobServiceStorageDriver with the new Azure package, it would be great. Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
from time import sleep

import tqdm
from clearml import Task

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100), dynamic_ncols=True):
    sleep(1)
print('done')
This code snippet works as expected (the console shows the progress at the flush interval, without values in between). What's the difference?!
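For reference, here is a minimal, hypothetical sketch of the CR-based redraw that tqdm relies on: each update starts with `'\r'`, so a real terminal overwrites the previous line instead of appending a new one (`render_progress` is an illustrative name, not part of tqdm or clearml):

```python
import io

def render_progress(total):
    # Each update starts with '\r' (carriage return), so a real terminal
    # moves the cursor to the start of the line and overwrites the last update.
    buf = io.StringIO()
    for i in range(1, total + 1):
        buf.write(f"\rprogress: {i}/{total}")
    return buf.getvalue()

stream = render_progress(3)
# The raw stream still contains every update; only a CR-aware terminal
# collapses them into one visible line.
assert stream == "\rprogress: 1/3\rprogress: 2/3\rprogress: 3/3"
```

A console that does not interpret CR (like a captured log) prints every one of those updates on its own line, which is exactly the symptom being discussed.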
So assuming they are all on the same LB IP, you should do:
LB 8080 (https) -> instance 8080
LB 8008 (https) -> instance 8008
LB 8081 (https) -> instance 8081
It might also work with:
LB 443 (https) -> instance 8080
We're not using a load balancer at the moment.
The easiest way is to add an ELB and have Amazon add the HTTPS on top (basically a few clicks in their console)
@<1687643893996195840:profile|RoundCat60> can you access the web UI over https ?
Sure, run:
clearml-agent init
It is a CLI wizard to configure the initial configuration file.
If you could provide the specific task ID then it could fetch the training data and study from the previous task and continue with the specified number of trainings.
Yes exactly, and also all the definitions for the HPO process (variables space, study etc.)
The reason that being able to continue from a past study would be useful is that the study provides a base for pruning and optimization of the task. The task would be stopped by aborting when the gpu-rig that it is using is neede...
Hi UnevenBee3
the optuna study is stored on the optuna class
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/optuna/optuna.py#L186
And actually you could store and restore it
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/optuna/optuna.py#L104
I think we should improve the interface though, maybe also add get_study(), wdyt?
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer you created, and continue the study
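The save/restore pattern could look roughly like this. Note `Study` below is just a stand-in class to illustrate the pickle round-trip, not `optuna.study.Study` itself (the real study object can be persisted the same way, e.g. stored as a Task artifact and restored in the next run):

```python
import pickle

class Study:
    """Stand-in for an Optuna study, just to illustrate the pattern."""
    def __init__(self):
        self.trials = []

    def record(self, params, value):
        self.trials.append((params, value))

# first run: optimize, then persist the study (e.g. upload as a Task artifact)
study = Study()
study.record({"lr": 0.01}, 0.42)
blob = pickle.dumps(study)

# later run: restore the study and hand it back to the optimizer to continue
restored = pickle.loads(blob)
assert restored.trials == [({"lr": 0.01}, 0.42)]
```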
yes, that makes sense to me.
What is your specific use case, meaning when/how do you stop / launch the hpo?
Would it make sense to continue from a previous execution and just provide the Task ID? Wdyt?
Hi TartBear70
I'm setting up reproducibility myself but when I call Task.init() the seed is changed
Correct
. Is it possible to tell clearml not to initialize any rng? It appears that task.set_random_seed() doesn't change anything.
I think this is now fixed (meaning should be part of the post weekend release)
. Is this documented?
Hmm, I'm not sure (actually we should document it, maybe in the Task.init docstring?)
Specifically the function that is being called is:
https://gi...
That is odd, can you send the full Task log? (Maybe some oddity with conda/pip?!)
The problem is due to tight security on this k8s cluster: the k8s pod cannot reach the public file server URL that is associated with the dataset.
Understood, that makes sense. If this is the case then the path_substitution feature is exactly what you are looking for
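For reference, a sketch of what the clearml.conf entry might look like (the `registered_prefix`/`local_prefix` key names follow the open-source config layout; the URLs here are placeholders, substitute your own):

```
sdk {
    storage {
        path_substitution = [
            {
                # URL prefix recorded on the server (unreachable from the pod)
                registered_prefix: "https://files.public.example.com"
                # URL prefix the pod can actually reach
                local_prefix: "https://files.internal.cluster"
            }
        ]
    }
}
```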
So this is an additional config file with enterprise?
Extension to the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user wide using the ClearML Vault feature (again enterprise, sorry 🙂)
HappyDove3 where are you running the code?
(the upload is done in the background, but it seems the python interpreter closed?!)
You can also wait for the upload:
task.upload_artifact(name="my artifact", artifact_object=np.eye(3, 3), wait_on_upload=True)
HappyDove3
see here https://github.com/allegroai/clearml-pycharm-plugin 🙂
Hi HappyDove3
task.set_script is a great way to add the info (assuming the .git is missing)
Are you running it using PyCharm? (If so use the clearml pycharm plugin, it basically passes the info from your local git to the remote machine via OS environment)
I'm suggesting to make it public.
Actually I'm thinking of enabling users to register Drivers in runtime, expanding the capability to support any type of URL link, meaning you can register "azure://" with AzureDriver, and the StorageHelper will automatically use the driver you provide.
This will make sure any part of the system can transparently use any custom driver.
wdyt?
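A runtime registry could be as simple as a dict keyed by URL scheme. This is only an illustrative sketch (`register_driver` / `get_driver_for` are made-up names, not the clearml API):

```python
from urllib.parse import urlparse

_DRIVERS = {}  # scheme -> driver class, e.g. "azure" -> AzureDriver

def register_driver(prefix, driver_cls):
    # accept either "azure" or "azure://" as the prefix
    _DRIVERS[prefix.rstrip(":/").lower()] = driver_cls

def get_driver_for(url):
    # resolve the driver from the URL scheme, so any component can ask
    # for a driver without knowing which backends are installed
    scheme = urlparse(url).scheme.lower()
    try:
        return _DRIVERS[scheme]
    except KeyError:
        raise ValueError(f"no storage driver registered for scheme {scheme!r}")

class AzureDriver:
    """Placeholder for a real driver implementation."""

register_driver("azure://", AzureDriver)
assert get_driver_for("azure://container/blob.bin") is AzureDriver
```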
Hi WittyOwl57
That's actually how it works (the original idea/design was borrowed from libcloud): basically you need to create a Driver, then the storage manager will use it.
Abstract class here:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L51
Is this what you had in mind ?
What I'd really want is the same behaviour in the console (one smooth progress bar) and one line per epoch in the logs; high hopes, right?
I think they send some "odd" character instead of CR, otherwise I cannot explain the difference.
Can you point to a toy example demonstrating the same issue ?
Also I just tried the pytorch-lightning RichProgressBar (not yet released) instead of the default (which is unfortunately based on tqdm) and it works great.
Yey!
WittyOwl57 I can verify the issue reproduces! 🙂
And I know what happens: tqdm is sending an "up arrow" escape. If you are running inside bash, that behaves like CR (i.e. the cursor ends up back at the beginning of the line), but in other terminals (like PyCharm or the ClearML log) this escape is just a character sequence to print, it does nothing, and we end up with multiple lines.
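For the record, the bytes involved (a quick sanity check, not tied to any clearml internals):

```python
# CR vs. ANSI cursor-up: both let a program redraw a progress bar in a real
# terminal, but consoles that don't interpret ANSI escapes only honor CR.
CR = "\r"             # carriage return: cursor to the start of the current line
CURSOR_UP = "\x1b[A"  # ESC [ A: move the cursor up one line (ANSI escape)

assert CURSOR_UP.encode() == b"\x1b\x5b\x41"  # the { 1b 5b 41 } bytes from the log
assert CR.encode() == b"\x0d"
```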
Let me see if we can fix it 🙂
WittyOwl57 this is what I'm getting on my console (notice: new lines! Not a single one overwritten, as I would expect)
Training:  10%|█         | 1/10 [00:00<?, ?it/s]
Training:  20%|██        | 2/10 [00:00<00:00,  9.93it/s]
Training:  30%|███       | 3/10 [00:00<00:00,  9.89it/s]
Training:  40%|████      | 4/10 [00:00<00:00,  9.87it/s]
Training:  50%|█████     | 5/10 [00:00<00:00,  9.87it/s]
Training:  60%|██████    | 6/10 [00:00<00:00,  9.88it/s]
Training:  70%|███████   | 7/10 [00:00<00...
I might have found it: tqdm is sending { 1b 5b 41 }, i.e. ESC [ A, the ANSI cursor-up escape?
https://github.com/horovod/horovod/issues/2367
Can you reproduce this behavior outside of Lightning, or in a toy example? (Because I could not.)