
I wanted to know what the best way to create and register the SSL keys is.
I see, so basically you need to add nginx with SSL certificates on top of the hosted service (or configure the docker-compose nginx container to add that)
Then you need to add the self-signed SSL certificate to any host machine (I'm assuming these are not "valid" SSL certificates generated by a reputable SSL provider)
But generally speaking if you are using a self-hosted clearml-server on a local machine that n...
That is a good point. When the UI returns a pop-up error you have all the info; on "toaster" messages, I guess it does not?!
As long as it does not make the toaster message too large, it seems like a good idea to add
You can check the example here, just make sure you add the callback and you are good to go 🙂
https://github.com/allegroai/trains/blob/master/examples/frameworks/keras/keras_tensorboard.py#L107
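A rough sketch of the pattern that example shows (the toy model/data and the project/task names below are mine, not from the example): once Task.init() is called, whatever the TensorBoard callback writes is picked up automatically.
import numpy as np
from clearml import Task
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard

# Register the run with ClearML first; names are placeholders
task = Task.init(project_name="examples", task_name="keras tensorboard callback")

# Toy data/model just to make the sketch self-contained
x_train = np.random.rand(256, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))
model = keras.Sequential([keras.layers.Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# The important part: add the TensorBoard callback so its scalars/histograms are reported
board = TensorBoard(log_dir="/tmp/tb_logs", histogram_freq=1)
model.fit(x_train, y_train, epochs=3, callbacks=[board])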
Calling the script without the PipelineDecorator.run_locally() (i.e. running the pipeline remotely) still gives the ModuleNotFoundError: No module named
Do you have the needed module listed on the pipeline controller Task ? (press the details link, then go to the Execution tab / "Installed Packages")
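If the module is not listed there, a hedged sketch of one way to make sure it ends up in the step's environment is to declare it explicitly on the component (package, step, and project names below are just examples):
from clearml import PipelineDecorator

# Declaring packages on the component makes them part of the step's requirements
@PipelineDecorator.component(return_values=["data"], packages=["pandas>=1.3", "scikit-learn"])
def load_data():
    import pandas as pd  # import inside the step so the agent environment resolves it
    return pd.DataFrame({"a": [1, 2, 3]})

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def my_pipeline():
    print(load_data())

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # quick local test; drop this for remote execution
    my_pipeline()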
btw: what's the OS and python version?
GiddyTurkey39 do you have an experiment with the jupyter notebook ?
Ohh if this is the case, you might also consider using offline mode, so there is no need for a backend
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
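A minimal sketch of offline mode, following the flow from the linked doc (project/task names and the zip path are placeholders):
from clearml import Task

Task.set_offline(offline_mode=True)  # nothing is sent to a server
task = Task.init(project_name="examples", task_name="offline run")
task.upload_artifact("stats", artifact_object={"accuracy": 0.9})
task.close()  # the whole session is stored locally

# Later, from a machine that can reach the server:
# Task.import_offline_session("/path/to/offline_session.zip")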
About .get_local_copy... would that then work in the agent though?
Yes it would work both locally (i.e. without agent) and remotely
Because I understand that there might not be a local copy in the Agent?
If the file does not exist locally it will be downloaded and cached for you
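In other words, something like this should behave the same with or without an agent (project/task/artifact names are placeholders):
from clearml import Task

task = Task.get_task(project_name="examples", task_name="training run")
# If the artifact file is not already in the local cache it is downloaded first
local_path = task.artifacts["stats"].get_local_copy()
print(local_path)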
Yes, you are too quick for the resource monitoring 🙂
Hi SlimyElephant79
As you can imagine, wandb's tracking code would be present across the code modules and I was hoping for a structured approach that would help me transition to ClearML's experiment tracking.
Do you guys have a layer in between that does the reporting, or is the codebase riddled with direct reporting calls ? If the latter, then I guess search and replace ? Or maybe a module that "converts" wandb calls to clearml calls ? wdyt?
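For illustration, a very rough sketch of that "converter module" idea: a thin shim with a wandb-like surface that forwards to ClearML, so call sites barely change (the module/function names and the mapping are my own, not an official adapter):
from clearml import Task

_task = None

def init(project=None, name=None, config=None):
    # Mirrors wandb.init(): starts the ClearML task and connects the config
    global _task
    _task = Task.init(project_name=project or "default", task_name=name or "run")
    if config:
        _task.connect(config)
    return _task

def log(metrics, step=None):
    # Mirrors wandb.log(): forwards scalar metrics to the ClearML logger
    logger = _task.get_logger()
    for key, value in metrics.items():
        logger.report_scalar(title=key, series=key, value=value, iteration=step or 0)

# usage: save this as e.g. wandb_shim.py and `import wandb_shim as wandb` (hypothetical name)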
Hi DangerousDragonfly8
You mean you want to trigger something when users archive a Task ?
Hi FancyWhale93
pipe.start()
should actually stop the local pipeline logic execution and fire it on the "services queue".
The idea is that you can launch the pipeline locally, but the actual execution of the entire logic is remote.
You can have the pipeline running locally if you call pipe.start_locally
or also run the steps locally (as sub-processes) with pipe.start_locally(run_pipeline_steps_locally=True)
BTW: based on your example, a more intuitive code might be the pi...
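For reference, a rough sketch of the three launch modes described above (pipeline, step, and queue names are placeholders):
from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="0.1")
pipe.add_step(name="stage_data", base_task_project="examples", base_task_name="data task")

# 1) pipeline logic runs locally, steps are enqueued for agents:
# pipe.start_locally()

# 2) everything runs locally, steps as sub-processes:
# pipe.start_locally(run_pipeline_steps_locally=True)

# 3) launch from here, but the pipeline logic itself runs on the "services" queue:
pipe.start(queue="services")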
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer you created, and continue the study
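A hedged sketch along the lines of that example, using the Optuna-backed optimizer class (the base task id, metric names and parameter range are placeholders):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange
from clearml.automation.optuna import OptimizerOptuna

task = Task.init(project_name="examples", task_name="HPO", task_type=Task.TaskTypes.optimization)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder: the experiment to optimize
    hyper_parameters=[UniformIntegerParameterRange("General/epochs", min_value=1, max_value=10)],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()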
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I'm assuming you need multiple "file-server" instances running on the "clearml-server" with a load-balancer of a sort...
What is the specific use case, updating a file on existing dataset and creating a new version?
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
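For the "update a file and create a new version" use case, a hedged sketch using a child dataset that inherits from the current one (project/dataset names and the file path are placeholders):
from clearml import Dataset

parent = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],  # the new version builds on the existing one
)
child.add_files("path/to/updated_file.csv")
child.upload()
child.finalize()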
My bad, I worded my question wrong I see,
LOL no worries 🙂
Any chance you have some "debug" leftover in the Pipeline code:
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/examples/pipeline/pipeline_from_decorator.py#L113
Maybe we should show a warning when it is being called, or ignore it when running via an agent ...
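For clarity, a minimal sketch of the kind of leftover meant here (pipeline/step names are placeholders):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["value"])
def step_one():
    return 42

@PipelineDecorator.pipeline(name="debug example", project="examples", version="0.1")
def run_pipeline():
    print(step_one())

if __name__ == "__main__":
    # Debug leftover: with this line in place the pipeline logic always executes
    # locally, even when you meant to launch it remotely. Remove or comment it
    # before cloning/enqueueing the script.
    PipelineDecorator.run_locally()
    run_pipeline()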
is it also possible to somehow propagate ssh keys to the agent pod? Not sure how to approach that
I would use the k8s secret manager to do that (there is a way to mount secret files into a pod, and SSH is relatively standard to set up)
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
ReassuredTiger98 Can I send a specific wheel with more debug prints for you to check (basically it will print the conda env YAML it is using)?
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
When you have a bit of experience, please suggest a path forward, it will be great to integrate
When I give my Minio to the output_uri argument, it uploads at 500 KB/sec as before.
But it worked well when using StorageManager and uploading to the Minio directly, is that correct?
.. I give my Minio to output_uri argument
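To make sure we are comparing the same two paths, roughly (the Minio endpoint, bucket, and file paths are placeholders, and the Minio credentials are assumed to be configured in clearml.conf):
from clearml import Task, StorageManager

# 1) upload through the task, via output_uri (the slow path reported above):
task = Task.init(
    project_name="examples",
    task_name="minio upload test",
    output_uri="s3://my-minio-host:9000/clearml-bucket",
)
task.upload_artifact("big_file", artifact_object="path/to/big_file.bin")

# 2) upload the same file directly with StorageManager (the fast path):
StorageManager.upload_file(
    local_file="path/to/big_file.bin",
    remote_url="s3://my-minio-host:9000/clearml-bucket/big_file.bin",
)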
How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)
Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (just reset it beforehand)
Guys FYI:
params = task.get_parameters_as_dict()
Hi CourageousDove78
Not the cleanest, but you can basically pass everything here:
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html#post--tasks.get_all
Reasoning is that it is passed almost as is to the server for the actual query.
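So from the SDK you can feed most of those fields straight into the task filter, roughly like this (the field values are examples):
from clearml import Task

tasks = Task.get_tasks(
    project_name="examples",
    task_filter={
        "status": ["completed"],
        "order_by": ["-last_update"],
        "page_size": 10,
        "page": 0,
    },
)
for t in tasks:
    print(t.id, t.name)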