Can you try setting the env variables to 1 instead of True? In general, those should indeed be the correct variables to set. For me it works when I start the agent with the following command:
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent daemon --queue "demo-queue"
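If you start the agent from a Python script instead of a shell, the same command could be sketched roughly like this. The helper names `agent_env` and `start_agent` are made up for illustration; only the two variables and the command itself come from the line above.

```python
import os
import subprocess
from typing import Dict, Optional

# Same command as above, split into arguments.
AGENT_COMMAND = ["clearml-agent", "daemon", "--queue", "demo-queue"]


def agent_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and set both skip flags to the string "1"."""
    env = dict(os.environ if base is None else base)
    env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"
    env["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = "1"
    return env


def start_agent() -> subprocess.CompletedProcess:
    """Hypothetical helper: run the agent with the flags set (needs clearml-agent installed)."""
    return subprocess.run(AGENT_COMMAND, env=agent_env(), check=True)
```

Environment variable values are always strings, so the flags are written as "1" here rather than as Python booleans.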
Unfortunately, ClearML HPO does not "know" what is inside the task it is optimizing. It is like that by design, so that you can run HPO with no code changes inside the experiment. That said, it also means we cannot optimize "smartly".
However, is there a way you could use caching within your code itself? Such as using functools' LRU cache? This is built into Python and caches a function's return value, so calling the function again with the same input arguments returns the cached result instead of recomputing it.
There also see...
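A minimal sketch of what the LRU cache suggestion could look like. The function `preprocess` and its arguments are placeholders, not something from your code:

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def preprocess(dataset_name: str, image_size: int) -> str:
    """Stand-in for an expensive step; all arguments must be hashable."""
    time.sleep(0.1)  # pretend this is slow
    return f"{dataset_name}@{image_size}"


preprocess("demo", 224)  # computed
preprocess("demo", 224)  # same arguments: served from the cache
preprocess("demo", 256)  # new arguments: computed again
print(preprocess.cache_info())
```

Keep in mind that the cache lives in the memory of a single Python process, so it only helps for repeated calls within one run.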
Could you tell us what browser (and version) you're using? Maybe we can recreate it ourselves 🙂
Hi Fawad, maybe this can help you get started! They're both C++ and Python examples of Triton inference. Be careful though, the pre- and postprocessing used is specific to the model (in this case YOLOv4) and you'll have to change it to your own model's needs.
Hey PanickyMoth78
Here is an easy to reproduce, working example. Mind the multi_instance_support=True parameter in the pipeline itself. This code launches 3 pipelines for me just as it should 🙂
` from clearml.automation.controller import PipelineDecorator
import time
PipelineDecorator.set_default_execution_queue("default")
@PipelineDecorator.component()
def step_one():
    time.sleep(2)
@PipelineDecorator.component()
def step_two():
    time.sleep(2)
@PipelineDecorator.pipel...
Hi VividSwallow28! I've seen your GitHub issue and will answer you there 🙂 I'll leave a link here for others facing the same issue.
The built-in HPO uses tags to group experiment runs together. It uses the original optimizer task ID as a tag, so you can quickly go back and see where the runs came from. You can find an example in the ClearML Examples project.
VivaciousBadger56 Thank you for the screenshots! I appreciate the effort. You indeed clicked on the right link, I was on mobile so had to instruct from memory 🙂
First of all: every 'object' in the ClearML ecosystem is a task. Experiments are tasks, so are dataset versions and even pipelines! Each task can be viewed using the experiment manager UI, that's just how the backend is structured. Of course we keep experiments and data separate by giving them a separate tab and different UI, but...
Pipelines! 😄
ClearML allows you to create pipelines, with each step either being created from code or from pre-existing tasks. Each task btw. can have a custom docker container assigned that it should be run inside of, so it should fit nicely with your workflow!
Youtube videos:
https://www.youtube.com/watch?v=prZ_eiv_y3c
https://www.youtube.com/watch?v=UVBk337xzZo
Relevant Documentation:
https://clear.ml/docs/latest/docs/pipelines/
Custom docker container per task:
https://...
It depends on how complex your configuration is, but if config elements are all that will change between versions (i.e. not the code itself) then you could consider using parameter overrides.
A ClearML Task can have a number of "hyperparameters" attached to it. Once that task is cloned and in draft mode, you can edit these parameters. If the task is then queued, the new parameters will be injected into the code itself.
A pipeline is no different, it can have pipeline par...
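The clone, edit, enqueue flow can also be done from code. Below is a rough sketch, not a definitive recipe: `to_clearml_params` and `rerun_with_overrides` are hypothetical helpers, the "General" section is an assumption (it depends on how the parameters were connected in the original task), and the queue name is a placeholder.

```python
from typing import Any, Dict


def to_clearml_params(overrides: Dict[str, Any], section: str = "General") -> Dict[str, Any]:
    """Prefix plain names with a section, e.g. {"lr": 0.01} -> {"General/lr": 0.01}."""
    return {f"{section}/{name}": value for name, value in overrides.items()}


def rerun_with_overrides(template_task_id: str, overrides: Dict[str, Any], queue: str = "default"):
    """Clone a task, edit the draft's parameters, then enqueue the draft."""
    # Imported here so the helper above can be used without clearml installed.
    from clearml import Task

    draft = Task.clone(source_task=template_task_id, name="rerun with new parameters")
    for name, value in to_clearml_params(overrides).items():
        draft.set_parameter(name, value)  # edit one parameter on the draft
    Task.enqueue(draft, queue_name=queue)
    return draft
```

Usage would be something like `rerun_with_overrides("<task id>", {"lr": 0.01})`, with the ID of the task you want to use as the starting point.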
That's what happens in the background when you click "new run". A pipeline is simply a task in the background. You can find the task by querying and you can clone it too! It is placed in a "hidden" folder called .pipelines, which is a subfolder of your main project. Check out the settings, where you can enable "show hidden folders".
RoundMosquito25 it is true that the TaskScheduler requires a task_id, but that does not mean you have to run the pipeline every time 🙂
When setting up, you indeed need to run the pipeline once, to get it into the system. But from that point on, you should be able to just use the task_scheduler on the pipeline ID. The scheduler should automatically clone the pipeline and enqueue it. It will basically use the 1 existing pipeline as a "template" for subsequent runs.
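Roughly, that setup could look like the sketch below. The helper names, the queue and the timing values are only illustrative assumptions, so please check the TaskScheduler reference for the exact meaning of `hour`, `minute` and `day` before relying on them.

```python
from typing import Any, Dict


def daily_schedule(pipeline_task_id: str, queue: str = "services") -> Dict[str, Any]:
    """Keyword arguments for TaskScheduler.add_task (values are illustrative)."""
    return {
        "schedule_task_id": pipeline_task_id,  # ID of the pipeline run used as the template
        "queue": queue,  # queue the cloned pipeline is enqueued on
        "hour": 8,
        "minute": 30,
        "day": 1,
    }


def start_scheduler(pipeline_task_id: str) -> None:
    """Register the pipeline with a scheduler and start it (this call blocks)."""
    # Imported here so daily_schedule can be used without clearml installed.
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    scheduler.add_task(**daily_schedule(pipeline_task_id))
    scheduler.start()
```

The pipeline ID here is the ID of the one run you did during setup.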
Damn it, you're right 😅
# Allow ClearML access to the training args and allow it to override the arguments for remote execution
args_class = type(training_args)
args, changed_keys = cast_keys_to_string(training_args.to_dict())
Task.current_task().connect(args)
training_args = args_class(**cast_keys_back(args, changed_keys)[0])
Thanks again for the extra info Jax, we'll take it back to our side and see what we can do 🙂
No worries! And thanks for putting in the time.
It's been accepted in master, but was not released yet indeed!
As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.
For now (using the patch) the only thing you need to be careful about is to not connect a dict or object with ints as keys. If you do need to (e.g. usually huggingface models need the id2label dict some...
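To illustrate the idea of working around int keys, here is a small standalone sketch that turns int keys into strings before connecting a dict and restores them afterwards. `keys_to_str` and `keys_to_int` are hypothetical helpers written for this example, not the functions from the patch.

```python
from typing import Any, Dict, Set, Tuple

Path = Tuple[str, ...]


def keys_to_str(d: Dict[Any, Any], prefix: Path = ()) -> Tuple[Dict[str, Any], Set[Path]]:
    """Return a copy of d with string keys, plus the paths of keys that were ints."""
    out: Dict[str, Any] = {}
    changed: Set[Path] = set()
    for key, value in d.items():
        path = prefix + (str(key),)
        if isinstance(key, int) and not isinstance(key, bool):
            changed.add(path)  # remember it so it can be restored later
        if isinstance(value, dict):
            value, nested = keys_to_str(value, path)
            changed |= nested
        out[str(key)] = value
    return out, changed


def keys_to_int(d: Dict[str, Any], changed: Set[Path], prefix: Path = ()) -> Dict[Any, Any]:
    """Undo keys_to_str: turn the recorded keys back into ints."""
    out: Dict[Any, Any] = {}
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            value = keys_to_int(value, changed, path)
        out[int(key) if path in changed else key] = value
    return out


config = {"lr": 0.1, "id2label": {0: "cat", 1: "dog"}}
as_strings, changed = keys_to_str(config)
restored = keys_to_int(as_strings, changed)
```

Only int keys are restored in this sketch; other non-string key types would stay as strings.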
Hi VictoriousPenguin97 ! I think you should be able to change it in the docker-compose file here: https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml
You can map the internal 8008 port to another port on your local machine. But be sure to provide the new port number to any client that tries to connect (using clearml-init).
After re-reading your question, it might be difficult to have cross-process communication though. So if you want the preprocessing to happen at the same time as the training, with the training pulling data from the preprocessing on the fly, that might be more difficult. Is this your use case?
Could you use tags for that? In that case you can easily filter on which group you're interested in, or do you have a more impactful UI change in mind to implement groups? 🙂
Also, the answer to blocking on the pipeline might be in the .wait() function: https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#wait-1
TimelyPenguin76 I can't seem to make it work though, on which object should I run the .wait() method?
Do you have a screenshot of what happens? Have you checked the browser console (open it by pressing F12)?
Hi PanickyMoth78, I have made a minimal example and indeed adding multi_instance_support=True prevents ClearML from killing the process, allowing you to launch pipelines in a loop 🙂
All right, a bit of searching later and I've found 2 things:
- You were right about the task! I've staged a fix here. It basically detects whether a task is already running (e.g. from the PipelineDecorator component) and if so, uses that task instead. We should probably do this for all of our integrations.
- But then I found another bug. Basically the pipeline decorator task wou...
Can you walk us through how you set up your jupyter instance? If we can recreate your error, we'll be able to help much faster. What's the command you're using to set it up, on which OS are you running it and so on 🙂 Also, have you checked your jupyter server is running on port 8888? Chances are something else is using 8888, so jupyter might be running on some other port like 8889 instead, so ClearML is trying to get a kernel from a completely different service.
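A quick way to check which of those ports has something listening is a small socket probe like the one below. It is just a sketch: `port_in_use` is a made-up helper, and it only tells you that some service accepts connections on the port, not that the service is Jupyter.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return probe.connect_ex((host, port)) == 0


for candidate in (8888, 8889):
    print(candidate, "in use" if port_in_use(candidate) else "free")
```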
Isitdown seems to be reporting it as up. Any issues with other websites?
Cool! 😄 Yeah, that makes sense.
So (just brainstorming here) imagine you have your dataset with all samples inside. Every time N new samples arrive they're just added to the larger dataset in an incremental way (with the 3 lines I sent earlier).
So imagine if we could query/filter that large dataset to only include a certain datetime range. That range filter is then stored as hyperparameter too, so in that case, you could easily rerun the same training task multiple times, on differe...
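Still brainstorming: in plain Python the range filter idea could look something like this. Nothing here is an existing ClearML feature; the sample layout and the parameter names `range_start` and `range_end` are invented for the example. The range is kept as plain strings so that it could be stored as hyperparameters (for example by connecting the dict to the task).

```python
from datetime import datetime
from typing import Dict, List, Tuple

Sample = Tuple[datetime, str]  # (arrival time, file path); hypothetical layout


def filter_by_range(samples: List[Sample], params: Dict[str, str]) -> List[str]:
    """Keep samples with start <= arrival time < end, both given as ISO date strings."""
    start = datetime.fromisoformat(params["range_start"])
    end = datetime.fromisoformat(params["range_end"])
    return [path for arrived, path in samples if start <= arrived < end]


samples = [
    (datetime(2023, 1, 5), "a.jpg"),
    (datetime(2023, 2, 10), "b.jpg"),
    (datetime(2023, 3, 1), "c.jpg"),
]
# The range as it might be stored alongside the other hyperparameters.
params = {"range_start": "2023-02-01", "range_end": "2023-03-01"}
selected = filter_by_range(samples, params)
```

Changing only the two range values would then select a different slice of the same large dataset.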
Wait is it possible to do what i'm doing but with just one big Dataset object or something?
Don't know if that's possible yet, but maybe something like the proposed querying could help here?