We're using Ray for hyperparameter search for non-CV models successfully on ClearML
Oh okay, my initial implementation was not far off:
```python
from datetime import datetime
import os

from clearml import Task

# train_cmd and training_data_path are defined earlier in the script
task = Task.init(project_name='VINZ', task_name=f'VINZ Retraining {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}')
task.set_progress(0)
print("Training model...")
os.system(train_cmd)
print("✔️ Model trained!")
task.set_progress(75)
print("Converting model to ONNX...")
os.system(f"python export.py --weights {os.path.join(training_data_path, 'runs', 'train', 'yolov5s6_results', 'weights', 'best.pt')} --img...
```
Do I need to instantiate a task inside my component? Seems a bit redundant...
Oopsy, sorry, didn't read the entire backtrace
Well, the credentials are scoped to the entire bucket, but do I have to specify the full URI path? Is model file management not fully handled like it is for datasets?
Okay! Though I only see a param to specify a weights URL, while I'm looking to upload local weights
Well I uploaded datasets in the previous steps with the same credentials
My bad, the specified file did not exist since I forgot to raise an exception if the export command failed >< Well, I guess this is the reason, will test that on Monday
Ah no, I can't, since the pipeline is in its own dummy project and you cannot reattach pipelines to real projects, so I must instantiate a dummy task just to attach the output model to the correct project
SmugDolphin23 But training.py already has a ClearML task created under the hood thanks to its ClearML integration; besides, isn't initializing the task before executing the file, like in my snippet, sufficient?
Normally it's an environment variable set to the current user; you can check by typing `echo $USER` in the terminal
Nice it works 😍
I'll try to update the version in the image I provide to the workers of the autoscaler app (but sadly I don't control the version of the app itself since it's ClearML managed)
Hey SuccessfulKoala55, currently using the clearml package version 1.7.1, my server is a PRO SaaS deployment, and I'm running with a Python 3.10 interpreter
Note that you might need to log out and log back in, or reboot the machine, for the change to take effect
As opposed to the Controller/Task components, where `add_step()` only allows you to execute them sequentially
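For context, here is roughly what the controller-style API looks like; a minimal sketch with placeholder project and task names:
```python
from clearml.automation import PipelineController

# Controller-style pipeline: each step clones an existing task and runs in the declared order
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_step(name="train", base_task_project="examples", base_task_name="train task")
pipe.add_step(name="export", parents=["train"],
              base_task_project="examples", base_task_name="export task")
pipe.start()
```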
I got some credentials issues too in some pipeline steps, and I solved them using:
```python
task = Task.current_task()
task.setup_aws_upload(...)
```
It allows you to explicitly specify credentials
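A minimal sketch of how that call can look, assuming the `bucket`/`key`/`secret` parameter names of `setup_aws_upload` (check them against your clearml SDK version); the values below are placeholders:
```python
from clearml import Task

# Inside the pipeline step: attach explicit S3 credentials to the current task
task = Task.current_task()
task.setup_aws_upload(
    bucket="my-bucket",              # placeholder bucket name
    key="AWS_ACCESS_KEY_ID",         # placeholder access key
    secret="AWS_SECRET_ACCESS_KEY",  # placeholder secret key
)
```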
Does it happen for all your packages or for a specific one?
Does your current running environment have all the required packages? Because your pipeline controller has the `run_locally()` option, and I'm not sure the pipeline orchestrator will follow the same logic of installing all your component's imports as dependencies on remote workers if you do not execute on a remote agent but locally using that option
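For reference, a minimal sketch of what I mean, assuming the decorator-based pipeline API (names and project are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["value"])
def sample_step():
    # imports made inside a component are normally captured as its dependencies
    import random
    return random.random()

@PipelineDecorator.pipeline(name="debug-pipeline", project="examples", version="1.0.0")
def run_pipeline():
    print(sample_step())

if __name__ == "__main__":
    # run everything in the current environment instead of enqueuing steps to agents
    PipelineDecorator.run_locally()
    run_pipeline()
```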
That makes sense, since this function executes your components as classic Python functions
Hmm, must be more arcane then; I guess the official support would be able to provide an answer, they usually answer within 24 hours
Nice, that's a great feature! I'm also trying to have a component execute Giskard QA test suites on models and data. Is there a planned feature where I can suspend execution of the pipeline and show in the UI that this pipeline step requires a human confirmation to continue or stop, while displaying arbitrary text/plot information?
Would have been great if the ClearML resolver would just inline the code of locally defined vanilla functions and execute that inlined code under the import scope of the component from which it is called
But the task appeared with the correct name and outputs in the pipeline and the experiment manager
Yes, but not in the controller itself, which is also remotely executed in a Docker container
No, I was pointing out the lack of one, but it turns out that on some models the iteration is so slow, even on GPUs, when training on lots of time series, that you have to set the PyTorch Lightning trainer argument `log_every_n_steps` to 1 (default 50) to prevent the ClearML iteration logger from timing out
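For reference, the relevant bit is just the trainer flag; a sketch where `model` and `datamodule` stand in for whatever you already train with:
```python
from pytorch_lightning import Trainer

# Log on every step so ClearML receives frequent iteration updates
trainer = Trainer(max_epochs=10, log_every_n_steps=1)
trainer.fit(model, datamodule=datamodule)
```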
Make sure you have entered the command `usermod -aG docker $USER` on the VM you are running your agent on