I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply with the PyPI repository API.
Make sense?
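For illustration, a minimal sketch of what that means on the pip side (the URL and package name are placeholders; the index must speak the standard PyPI "simple" API):
```
pip install --extra-index-url https://my-registry.example.com/simple my-package
```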
pytorch DDP
with what backend? gloo? nccl? openmpi?
So in theory you can clone the Task two extra times and push the clones into an execution queue, but the issue might actually be making sure the resources are available. What did you have in mind?
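A rough sketch of the clone-and-enqueue idea (the task id and queue name are placeholders):
```
from clearml import Task

# fetch the Task you want to replicate
base = Task.get_task(task_id="<task-id>")

# create two clones and push them into an execution queue;
# available agents will pick them up, resources permitting
for i in range(2):
    clone = Task.clone(source_task=base, name=f"{base.name} clone {i}")
    Task.enqueue(clone, queue_name="default")
```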
If this is the case then the easiest is:
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
res = client.events.get_task_plots(task="<task-id>")
```
We should definitely have a nice interface 🙂
So, in a simple "all-or-nothing" setup, this is actually the only solution unless preemption is supported, i.e. aborting a running Task to free up an agent...
There is no "magic" solution for complex multi-node scheduling; even SLURM will essentially do the same...
Hi @MysteriousCow84
You should put it in the dedicated section.
RoughTiger69 yes I think "Scale" tier covers it 🙂
```
task = Task.init(...)
# assume model checkpoint
if task.models['output']:
    # get the latest checkpoint
    model_file_or_path = task.models['output'][-1].get_local_copy()
    # load the model checkpoint
# run training code
```
RoughTiger69 Would the above work for you?
right click on the experiment, select Reset, now you can edit it.
Hi @LovelyChimpanzee39
anyone know what params I need to pass in order to enable it?
we feel you there 🙂 this is an actual plotly feature that we really want to add, but it is kind of out of our hands
feel free to add your support there 🤞
Is this caused by running the script with the arguments?
Yep 🙂
Hi JitteryCoyote63
Is this close?
https://github.com/allegroai/clearml/issues/283
If this is a simple two-level nesting:
You can use the section name:
```
task.connect(param['data'], name='data')
task.connect(param['model'], name='model')
```
Would that help?
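For example, a minimal sketch with an assumed two-level config (all names and values are illustrative):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="nested config")

# hypothetical two-level configuration
param = {
    "data": {"path": "/data/train", "batch_size": 32},
    "model": {"layers": 4, "dropout": 0.1},
}

# each sub-dict is logged under its own section name
task.connect(param["data"], name="data")
task.connect(param["model"], name="model")
```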
The comparison reflects the way the data is stored in the configuration context, that is, section name & key/value (which is what the code above does).
Yes EnviousStarfish54, the comparison is line by line, and only against the left experiment (like any multi-experiment comparison, you have to set the baseline, which is always the left column here; do notice you can reorder the columns and the comparison will be updated).
I guess that was never the intention of the function; it just returns the internal representation. Actually my question would be: how do you use it, and why? :)
Hmm I think this was the fix (only with TF2.4), let me check a sec
I use YAML configs for data and model. Each of them is a nested YAML (could be more than 2 layers), so it won't be a flexible solution and I need to manually flatten the dictionary.
Yes, you are correct, the recommended option would be to store it with task.connect_configuration
Its goal is to store these types of configuration files/objects.
You can also store the YAML file itself directly, just pass a Path object instead of a dict/string
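A minimal sketch, assuming a local configs/data.yaml (the path and names are illustrative):
```
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="yaml config")

# store the YAML file as-is; no need to flatten anything
task.connect_configuration(Path("configs/data.yaml"), name="data")
```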
BTW: if you make the right column the baseline (i.e. move it to the left), you will get what you probably expected
JitteryCoyote63 did you add the bash script here: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L99
diff line by line is probably not useful for my data config
You could request a better configuration diff feature 🙂 Feel free to add it on GitHub
But this also means I have to load all the configuration into a dictionary first.
Yes 🙂
Hi EnviousStarfish54
I think this is what you are after:
```
task.connect_configuration(my_dict_here, name='my_section_name')
```
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/values in a section called "new section"
Hi StormyOx60
Yes, by default it assumes any "file://" or local files are accessible (which makes sense, because if they are not, it will not be able to download them).
is there some way to force it to download the dataset to a specified location that is actually on my local machine?
You can specify a target folder; what it will do is copy the zip locally and unzip it there.
Is this what you are after?
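If so, a minimal sketch (the dataset id and target folder are placeholders); get_mutable_local_copy copies the data into the given folder instead of pointing at the original location:
```
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset-id>")
# downloads/copies the dataset content into the target folder
local_path = ds.get_mutable_local_copy(target_folder="/home/me/my_dataset_copy")
print(local_path)
```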
ComfortableShark77 are you saying you need "transformers" in the serving container?
```
CLEARML_EXTRA_PYTHON_PACKAGES: "transformers==x.y"
```
https://github.com/allegroai/clearml-serving/blob/6005e238cac6f7fa7406d7276a5662791ccc6c55/docker/docker-compose.yml#L97
we will try to use Triton, but it's a bit hard with transformer models.
Yes ...
(all extra packages we add in serving)
So it should work. You can also run your preprocess class manually from your own machine (for debugging): if you pass it a local file (basically the model file downloaded from the UI), it should work.
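A rough local-debugging sketch, assuming the usual clearml-serving convention of a Preprocess class in your preprocess.py with load()/preprocess() methods; the file name and request body are placeholders, and the exact method signatures should be checked against your clearml-serving version:
```
from preprocess import Preprocess  # your serving preprocess.py

p = Preprocess()
# "model.bin" stands in for the model file downloaded from the UI
p.load("model.bin")

body = {"text": "hello world"}  # hypothetical request payload
data = p.preprocess(body, state={}, collect_custom_statistics_fn=None)
print(data)
```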
it. But it's maybe not the best solution
Yes... it is not; separating the pre/post to a CPU instance and letting Triton do the GPU serving is a lot more efficient...
can someone show me an example of how PipelineController.create_draft is used?
I think the idea is to store a draft version of the pipeline (not the decorator type, I think, but the one launching pre-executed Tasks).
GiganticTurtle0 I'm not sure I fully understand how / why you are using it, can you expand?
EDIT:
However, my intention is ONLY to create it to be executed later on.
Hmm, so maybe like enqueue it?
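If that is the goal, a hedged sketch of drafting a pipeline built from pre-executed Tasks (the project, pipeline name, and step Task names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_step(
    name="stage_one",
    base_task_project="examples",
    base_task_name="step 1",
)
pipe.add_step(
    name="stage_two",
    parents=["stage_one"],
    base_task_project="examples",
    base_task_name="step 2",
)

# store the pipeline as a draft Task instead of starting it;
# it can be enqueued later (e.g. from the UI)
pipe.create_draft()
```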