I reached over 1M API calls in about one week using clearml-serving
Oh that makes sense now 🙂
If I remember correctly, adding an additional model to a single clearml-serving instance should not actually change the number of API calls; they are mostly affected by the number of clearml-serving instances / containers and not by the number of models.
VivaciousWalrus99 any chance the original Task was executed with python2 ?
what do you have for:
ls -la /cs/usr/gal.hyams/.trains/venvs-builds/3.7/bin/
Gitlab has support for S3 based cache btw.
This might still be considered "slow" compared to a local-disk/cluster mount
Would adding support for some sort of post-task script help? Is something already there?
Interesting, can you expand on the use case? (currently there is only pre-task script, for setup)
Yes, I do have my files in the git repo. Although I have not quite understood which part it takes from the remote git repo, and which part it takes from my local system.
it will do "git pull" on the remote machine and then apply any uncommitted changes it has stored in the Task
It seems that one also needs to explicitly hand in the git repo in the pipeline and task definitions via PipelineController,
Correct, unless the pipeline logic and the steps are in the same git repo, you can...
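Something along these lines (just a sketch, assuming a recent clearml version; the repo URL and step function below are made up, and exact parameter names may differ):

from clearml import PipelineController

def my_step_function(x: int = 1):
    # placeholder step logic; in practice this would live in your git repo
    return x + 1

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")

# explicitly hand the git repo to the step, so the agent knows what to clone
pipe.add_function_step(
    name="step_one",
    function=my_step_function,
    repo="https://github.com/me/my-repo.git",  # hypothetical repo URL
    repo_branch="main",
)

pipe.start_locally(run_pipeline_steps_locally=False)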
It manages the scheduling process, so there is no need to package your code or worry about building dockers etc. It also has an AWS autoscaler that spins up EC2 instances based on the number of jobs you have in the execution queue and the limit of your budget (obviously spinning down machines that are idle)
It looks like the tag being used is hardcoded to 1.24-18. Was this issue identified and fixed in later versions?
BoredHedgehog47 what do you mean by "hardcoded 1.24-18"? A tag on what? I think I lost context here
Hi DeliciousBluewhale87
My theory is that the clearml-agent is configured correctly (which means you see it in the clearml-server). The issue (I think) is that the Task itself (running inside the docker) is missing the configuration. The way the agent passes the configuration into the docker is by mapping a temporary configuration file into the docker itself. If the agent is running bare-metal, this is quite straightforward. If the agent is running on k8s (or basically inside a docker) th...
PanickyAnt52 when the docker is loaded, it will search for the highest python version to use for the agent. Then when it is launching the Task itself, it will first try to match the python version requested by the Task. It does so by looking for "python3.7".
what are you getting when running "which python3.7" inside the docker ? Could it be you have a venv inside the docker with the diff python version ?
... Would not work for huge LLM-style models.
yes I agree... but then if the model is small enough then you can just keep it in memory ...
LOL 🙂
Make sure that when you train the model or create it manually you set the default "output_uri"
task = Task.init(..., output_uri=True)
or
task = Task.init(..., output_uri="s3://...")
Are they ephemeral or later used by other Tasks, execution etc ?
For example: configuration files are specific to an execution, and someone will edit them.
Initial weights files are something that multiple executions might need, and they will be used to restore an execution. Data, even if changing, is usually used by multiple executions/tasks etc.
It seems like you treat these files as "configurations", is that right ?
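If that's the case, something like this might fit (a rough sketch, project/file names are made up):

from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")  # hypothetical names

# per-execution configuration: stored on the Task and editable from the UI before re-running
config_path = task.connect_configuration("config.yaml", name="run config")

# shared inputs (e.g. initial weights) that multiple executions need: register them as artifacts
task.upload_artifact(name="initial_weights", artifact_object="weights.pt")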
how would I get an agent to launch on the same instance as my clearml server
Actually that is my point, you do not have to spin the agent on the clearml-server instance. We added the services agent as part of the docker-compose for easier deployment, that said you can always manually SSH to the server, or run it on any other machine, just like you would spin up any other clearml-agent.
Does that make sense ?
So it sounds as if for some reason calling Task.init inside a notebook on your jupyterhub is not detecting the notebook.
Is there anything special about the jupyterhub deployment? How is it deployed? Is it password protected? Is this reproducible?
MiniatureRobin9
, not the pipeline itself. And that's the last part I'm looking for.
Good point, any chance you want to PR this code snippet ?
def add_tags(self, tags):
    # type: (Union[Sequence[str], str]) -> None
    """
    Add Tags to this pipeline. Old tags are not deleted.
    When executing a Pipeline remotely (i.e. launching the pipeline from the UI/enqueuing it), this method has no effect.

    :param tags: A li...
Is this reproducible with the hpo example here:
https://github.com/allegroai/clearml/tree/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/examples/optimization/hyper-parameter-optimization
What's your clearml version? (And is it possible you verify with the latest version?)
Hi ElegantCoyote26 , yes I did 🙂
It seems cometml adds their default callback logger for you, that's it.
I have also tried with type hints and it still broadcasts to string. Very weird...
Type hints are ignored, it's the actual value you pass that is important:
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    ...

@PipelineDecorator.pipeline(name='custom pipeline logic', project='examples', version='0.0.5')
def executing_pipeline(pickle_url, mock_parameter='mock'):
    da...
JitteryCoyote63 to filter out 'archived tasks' (i.e. exclude archived tasks):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
Hi OutrageousGrasshopper93
which framework are you using? trains-agent will pull the correct torch based on the cuda version it detects, but there is no such thing for TF. In the default venv mode, trains-agent creates a new venv for the experiment (not conda), then everything is installed there. If you need conda you need to change the package_manager to conda (see the snippet below): https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49 The safest way to control CUDA dri...
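i.e. switching to conda would look roughly like this in the agent's trains.conf / clearml.conf (a sketch; check the linked file for the exact section):

agent {
    package_manager {
        # pip (default) / conda / poetry
        type: conda
    }
}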
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them?
This seems like an old SDK no?
Hi CleanPigeon16
You need to be able to access the machine running the agent; usually the default port will be 10022.
If you need further debug messages, add --debug at the beginning of the clearml-session command:
clearml-session --debug ...
To get all the debug prints, please upgrade to clearml-session==0.3.3
I see them run reliably (not killed), are they running in service mode?
How do you deploy agents, with the clearml k8s glue ?
so you have a repo with poetry that some users update and some do not?
All working on the same branch ?
PompousParrot44 please try to reply on the thread, so we do not create a mess in the main channel 🙂
What's the "working directory" in the execution section? Do you have package "test" in the installed packages?
I would expect that after calling Task.enqueue(exit=True), the local task is closed and no processes related to it are running
Ohh my apologies, I did not understand that.
Are you saying that locally you call task.execute_remotely(exit_process=True) and it does not exit the local process?
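For reference, this is roughly the flow I had in mind (a minimal sketch, project/queue names are made up):

from clearml import Task

task = Task.init(project_name="examples", task_name="remote launch")  # hypothetical names

# enqueue the task to an agent queue and terminate the local process immediately
task.execute_remotely(queue_name="default", exit_process=True)

# with exit_process=True nothing below this line should run locally
print("this only runs on the agent")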
Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
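i.e. something along these lines in the agent's clearml.conf (a sketch; double-check the exact key against the linked reference):

sdk {
    development {
        # assumed key name, see the linked clearml.conf reference
        default_output_uri: "s3://my-bucket/clearml"
    }
}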