In theory this would be doable, but wouldn't it be a bit confusing? Also, why not always use containers if the host supports it? There is no real downside; just set the default docker image to something that is a good starting point.
Hi FunnyTurkey96
Which pip version are you using? pip changed its dependency resolver after 20.1.
Change https://github.com/allegroai/clearml-agent/blob/aede6f4bac71c8fc56e7cf982318a48527953a3c/docs/clearml.conf#L57 to: `pip_version: "<20.2"`
See if that helps
It should be fairly easy to write such a daemon:
```python
from time import time
from datetime import datetime

from clearml.backend_api.session.client import APIClient

client = APIClient()
timestamp = time() - 60 * 60 * 2  # last 2 hours
tasks = client.tasks.get_all(
    status=["in_progress"],
    only_fields=["id"],
    order_by=["-last_update"],
    page_size=100,
    page=0,
    created=[">{}".format(datetime.utcfromtimestamp(timestamp))],
)
```
...
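To round the sketch off, one way the loop could act on those tasks, assuming the daemon's job is to abort them; `Task.get_task()` and `mark_stopped()` are standard SDK calls, but the loop itself is my assumption about the intended behavior:

```python
from clearml import Task

# a sketch: abort each matching task; mark_stopped() marks it as aborted on the server
for t in tasks:
    Task.get_task(task_id=t.id).mark_stopped()
```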
References:
https://clear.ml/...
I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...
I have to agree; we are changing this interface, I do not think it is good 🙂
That speed depends on model sizes, right?
In general, yes.
Hope that makes sense. This would not work under heavy loads, but e.g. we have models used once a week only. They would just stay unloaded until use, and could be offloaded afterwards.
But then you still might encounter a timeout the first time you access them, no?
because it should have detected it...
Did you see "Repository and package analysis timed out ..."
Maybe you should make `naming_function` a public variable in the `SearchStrategy` class, or allow changing it in the `HyperParameterOptimizer` class?
I like this idea, let's do that
Just making sure: you hit the 1024-character limit on the S3 path?
If this is the case, we should also fix the "artifact naming" to take that into account (it already does and has a limit, see here:
https://github.com/allegroai/clearml/blob/24464b7c1019f7a7b3149ecb80a379...
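In the meantime, if you want to guard against it on the user side, a minimal sketch; the cap value and artifact name here are my own placeholders, not the SDK's internal limit:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact naming")

MAX_NAME_LEN = 128  # assumed cap, just to keep the full S3 path well under 1024
long_name = "a_very_long_artifact_name_" * 20
task.upload_artifact(name=long_name[:MAX_NAME_LEN], artifact_object={"answer": 42})
```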
Hmm, I think the issue is here (the docker command mount): `'-v', '/tmp/.clearml_agent.de0n48pm.cfg:/root/clearml.conf'`
If I were to push the private package to, say, Artifactory, is it possible to use that to do the install?
Yes, that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
Can you see the repo itself? The commit ID?
I would guess that for some reason the log level is set to DEBUG; could that be the case?
Yes, I find myself trying to select "points" on the overview tab. And I find myself wanting to see more interesting info in the tooltip.
Yep that's a very good point.
The Overview panel would be extremely well suited for the task of selecting a number of projects for comparing them.
So what you are saying is that this could be a way to multi-select experiments for detailed comparison (i.e. selecting the "dots" on the overview graph). Is this what you had in mind?
It talks about referencing an issue.
Yes please, just better visibility 🙂
I remember being told that the clearml.conf on the client will not be used in a remote execution like the above, so I think this was the problem.
SubstantialElk6 the configuration should be set on the agent's machine (i.e. clearml.conf that is on the machine running the agent)
- Users have no choice of defining their own repo destination of choice.
In the UI you can specify a different destination for the models/artifacts in the "Execution" tab, under Output "destination". Is this...
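The same field can also be set from code; a minimal sketch using `output_uri` (the bucket name is a placeholder):

```python
from clearml import Task

# per-task destination for models/artifacts; the same field as the UI's
# "Execution" tab -> Output "destination"
task = Task.init(
    project_name="examples",
    task_name="custom destination",
    output_uri="s3://my-bucket/models",  # placeholder bucket
)
```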
It seems it's following the path of the script I'm using in task.create, e.g.:
The folder it should run in is the script path you are passing (i.e. `script=ep_fn`).
A wrong path would imply that it is not finding the correct repository; is that the case?
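For reference, a minimal `Task.create()` sketch showing how the repo, script path, and working directory relate; all the values below are placeholders:

```python
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="created task",
    repo="https://github.com/your-org/your-repo.git",  # placeholder repo
    script="src/ep_fn.py",       # entry-point path relative to the repo root
    working_directory="src",     # folder the agent runs the script from
)
```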
Hi SubstantialElk6
Try: `--docker "<image_name> --privileged"`
Notice the quotes.
Just run once (from your python console / pycharm etc.):
https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py
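Roughly, that example boils down to something like this (a sketch from memory, not the exact file):

```python
from clearml import Task

# a "template" experiment the automation examples later clone
task = Task.init(project_name="examples", task_name="toy base task")
params = task.connect({"Example_Param": 1})  # register a tunable hyperparameter
print("Example_Param =", params["Example_Param"])
```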
BTW: from the instance name it seems like it is a VM with preinstalled PyTorch. Why don't you add system site packages, so the venv will inherit all the preinstalled packages? It might also save some space 🙂
DeterminedToad86 see here:
https://github.com/allegroai/clearml-agent/blob/0462af6a3d3ef6f2bc54fd08f0eb88f53a70724c/docs/clearml.conf#L55
Change it in the agent's conf file to: `system_site_packages: true`
I want to schedule bulk tasks to run via agents, so I'm running `create`
I see, that makes sense.
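For illustration, bulk creation plus enqueueing might look like the following; the queue name, repo, and script are placeholders:

```python
from clearml import Task

for i in range(10):
    task = Task.create(
        project_name="examples",
        task_name="bulk task {}".format(i),
        repo="https://github.com/your-org/your-repo.git",  # placeholder
        script="train.py",
    )
    Task.enqueue(task, queue_name="default")  # agents on "default" will pick it up
```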
Especially when dealing with submodules.
BTW: the submodule diff should always get stored; can you provide some error logs for the failing cases?
Before manually modifying the diff:
If you have local commits (i.e. un-pushed), this might fail the diff apply. In that case you can set the following in your clearml.conf: `store_code_diff_from_remote: true`
https://github.com/allegroai/clear...
So you want these two on two different graphs ?
Hi @<1730033904972206080:profile|FantasticSeaurchin8>
Does this relate only to
https://github.com/coqui-ai/Trainer/issues/7
or is it a ClearML SDK issue?
SubstantialElk6 try adding `-e CLEARML_AGENT_EXTRA_PYTHON_PATH=/code/app/flair`
It should add it to the runtime PYTHONPATH
(to the BASE DOCKER IMAGE on the Task itself)
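Setting it programmatically, a minimal sketch; the image name is a placeholder:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker env")
# the string is the image followed by extra docker run arguments
task.set_base_docker("python:3.9 -e CLEARML_AGENT_EXTRA_PYTHON_PATH=/code/app/flair")
```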
Even before we had a chance to properly notify everyone 🙂
Thank you! All the details will follow in a dedicated post, for the time being, I can say that pushing a model with pre/post processing python code and full scalable inference solution has never been easier
https://github.com/allegroai/clearml-serving/tree/main/examples/sklearn
CooperativeFox72 yes, 20 experiments in parallel means that you always have at least 20 connections coming from different machines, and then you have the UI adding on top of it. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers; you just need to make sure the machine has enough memory to support it.
Hi OutrageousGrasshopper93
Which framework are you using? trains-agent will pull the correct torch based on the CUDA version it detects, but there is no such mechanism for TF. In the default venv mode, trains-agent creates a new venv for the experiment (not conda), and everything is installed there. If you need conda, you need to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49 The safest way to control CUDA dri...
I believe AnxiousSeal95 is.
ElatedFish50 any specific reason for the question?