Hover near the edge of the plot, then you should get a "bar" you can click on to resize.
(Also can you share the clearml.conf, without actual creds 🙂 )
What is the Model url?
`print(model.url)`
Assuming TensorFlow (which would be an entire folder):
`local_folder_or_files = model.get_weights_package()`
Hi GreasyPenguin66
So the way clearml can store your notebook is by using the jupyter-notebook REST API. It assumes it can communicate with it, as the kernel is running on the same machine. What exactly is the setup? Is the jupyter-lab/notebook running inside the docker? Maybe the docker itself is running with some `--network` argument?
Maybe I can plot it using other lib.
I remember a while back there was integration with network visualization, but it was hard to support and failed too many times...
If you have a library that converts the network into html or an image, you can report it as a debug sample?
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
Yes exactly! it should be very easy
Just inherit from `RandomSearch` and override `create_job`:
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
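To illustrate the inherit-and-override pattern with stand-in classes (these are NOT the real `clearml.automation` classes — the real `RandomSearch.create_job` has a different signature and returns a job object, so treat this purely as a sketch of the pattern):

```python
import random

class RandomSearch:
    """Stand-in for clearml.automation.RandomSearch, only to show the pattern."""

    def create_job(self):
        # base behavior: sample each hyperparameter at random
        return {"lr": random.choice([0.1, 0.01, 0.001])}

class MyOptimizer(RandomSearch):
    def create_job(self):
        # reuse the base sampling, then layer custom logic on top
        job = super().create_job()
        job["batch_size"] = 32
        return job

job = MyOptimizer().create_job()
print(job)
```

The point is just that one overridden method is enough; everything else comes from the base class.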
Hi UnevenDolphin73
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old ...
NICE!
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
Yes, if the Task is using a specific python version it will first try to find this one (i.e. `which python3.7`) then use it to create the new venv.
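Roughly that lookup, as a shell sketch (not the agent's actual code): try the exact interpreter the task recorded first, then fall back to the default:

```shell
# try to find the exact python version the task recorded,
# fall back to the default python3 if it is missing
PYTHON_BIN=$(command -v python3.7 || command -v python3)
echo "using: $PYTHON_BIN"
```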
As a result -> Could the agent maybe also output the virtualenv version used ...
Thanks GreasyPenguin66
How about: `!curl`
BTW, no need to rebuild the docker, next time you can always do `!apt update && apt install -y <package here>`
🙂
SmallDeer34 in theory there is no reason it will not work with it.
If you are doing a single node (from Ray's perspective)
This should just work. The challenge might be multi-node ray+clearml, as you will have to use clearml to set the environment and ray as the messaging layer (think openmpi etc.)
What did you have in mind?
HighOtter69
By default, if you are continuing an experiment it will start from the last iteration of the previous run. You can reset it with:
`task.set_initial_iteration(0)`
was consistent, whereas for some reason this old virtualenv decided to use python2.7 otherwise
Yes,
This sounds like a virtualenv bug. I think it will not hurt to do both (obviously we have the information)
Thank you!!! 🙂
SuperiorDucks36 from code ? or UI?
(You can always clone an experiment and change the entire thing, the question is how will you get the data to fill in the experiment, i.e. repo / arguments / configuration etc)
There is a discussion here, I would love to hear another angle.
https://github.com/allegroai/trains/issues/230
HighOtter69
Could you test with the latest RC? I think this fixed it:
https://github.com/allegroai/clearml/issues/306
Hmm, can you try with an additional configuration? Next to "secure: true" in your clearml.conf, can you add "verify: false"?
Thanks!
In the conf file, I guess this will be where ppl will look for it.
hey, that worked! what library is being used that reads that configuration?
It's passed to boto3, but the python interface and aws cli use different configuration, I guess, because otherwise it should have worked...
VivaciousPenguin66 I have the feeling it is the first space in the URI that breaks the credentials lookup.
Let's test it:
```python
from clearml import StorageManager

uri = 'Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'
# original
StorageManager.get_local_copy(uri)
# quoted
StorageManager.get_local_copy(uri.replace(' ', '%20'))
```
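As a more general fix than replacing the single space, `urllib.parse.quote` percent-encodes every unsafe character. A small sketch with a made-up URI (not the actual model path above):

```python
from urllib.parse import quote

# hypothetical URI containing spaces and brackets, like the model path above
uri = "s3://bucket/TRAIN [Network: resnet34]/model.pt"

# keep scheme/path separators (and existing % escapes) intact,
# encode everything else that is unsafe in a URL
quoted = quote(uri, safe=":/%")
print(quoted)  # spaces become %20, brackets become %5B / %5D
```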
The remaining problem is that this way, they are visible in the ClearML web UI which is potentially unsafe / bad practice, see screenshot below.
Ohhh that makes sense now, thank you 🙂
Assuming these are one-time credentials for every agent, you can add these arguments in the "extra_docker_arguments" in clearml.conf.
Then make sure they are also listed in: hide_docker_command_env_vars
which should cover the console log as well
https://github.com/allegroai/clearml-agent/blob/26e6...
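A sketch of what that could look like in clearml.conf (the key names follow the agent's config; the credential values are placeholders you would fill in):

```
agent {
    # one-time credentials injected into every docker the agent starts
    extra_docker_arguments: ["-e", "AWS_ACCESS_KEY_ID=<key>", "-e", "AWS_SECRET_ACCESS_KEY=<secret>"]

    # keep those values out of the printed docker command / console log
    hide_docker_command_env_vars {
        enabled: true
        extra_keys: ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]
    }
}
```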
Hi @<1657918724084076544:profile|EnergeticCow77>
Can I launch training with HuggingFace's accelerate package using multi-gpu
Yes,
It detects torch distributed but I guess I need to setup main task?
It should 🤞
Under the execution tab, script path, you should see something like `-m torch.distributed.launch ...`
Hi @<1523711619815706624:profile|StrangePelican34>
if I am trying to deploy 100 models on a GPU that can handle 5 concurrently,
Main limitation is Triton's ability to dynamically load / unload models. We know Nvidia is adding this capability, but I think this is still not out; once they support it, it should be transparent.
OddAlligator72 okay, that is possible, how would you specify the main python script entry point? (wouldn't that make more sense rather than a function call?)
How do you determine which packages to require now?
Analysis of the actual repository (i.e. it will actually look for imports 🙂 ). This way you get the exact versions you have, but not the clutter of the entire virtual environment.
You might be able to also find out exactly what needs to be pickled using the `f_code` of the function (but that's limited to the C implementation of Python).
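For example, a function's code object exposes what a pickler would need to capture — a minimal sketch with a throwaway function:

```python
def objective(x, lr=0.1):
    return (x - 3) ** 2 * lr

code = objective.__code__   # the f_code object of the function
print(code.co_varnames)     # ('x', 'lr') -- arguments / locals
print(code.co_names)        # global names the bytecode references
```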
Nice!
OddAlligator72 quick question:
suggest that you implement a simple entry-point API
How would the system get the correct packages / git repo / arguments if you are only passing a single function entrypoint ?
OddAlligator72 what you are saying is, take the repository / packages from the runtime, aka the python code calling the "Task.create(start_task_func)" ?
Is that correct ?
BTW: notice that the execution itself will be launched on other remote machines, not on this local machine
I like the idea of using the timeit interface, and I think we could actually hack it to do most of the heavy lifting for us 🙂
SteadyFox10 `TRAINS_CONFIG_FILE` or `CLEARML_CONFIG_FILE`
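For example (the config path here is hypothetical):

```shell
# point the SDK / agent at a non-default config file
export CLEARML_CONFIG_FILE="$HOME/configs/clearml-dev.conf"
echo "$CLEARML_CONFIG_FILE"
```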