Hi GrievingTurkey78
Can you test with the latest clearml-agent RC? (I remember a fix just for that)
pip install clearml-agent==1.2.0rc0
Can you copy the "Installed Packages" here, and point to the package causing the issue?
ShallowCat10 Thank you for the kind words 🙂
so I'll be able to compare the two experiments over time. Is this possible?
You mean like match the loss based on "images seen" ?
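If that is the goal, one way to line up two runs that use different batch sizes is to plot against samples processed instead of optimizer steps. A minimal sketch, assuming every step consumes exactly one full batch; `images_seen` and the loss values are hypothetical, made up for illustration:

```python
def images_seen(step: int, batch_size: int) -> int:
    """Images processed after `step` completed steps (assumes full batches)."""
    return step * batch_size


# Two hypothetical runs with different batch sizes: (step, loss) pairs
run_a = [(images_seen(s, 32), loss) for s, loss in [(1, 0.9), (2, 0.7), (4, 0.5)]]
run_b = [(images_seen(s, 64), loss) for s, loss in [(1, 0.8), (2, 0.55)]]

# Both runs now share the same x axis, so step 4 of run A and
# step 2 of run B land on the same point (128 images).
```

The resulting value can then be passed as the `iteration` argument when reporting the scalar, so the comparison view uses "images seen" as the x axis.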
Hi FriendlyKoala70 you can edit the installed package section and add the missing package. See more details on how trains-agent works here (although it's on conda the same rules apply for pip) https://github.com/allegroai/trains-agent/issues/8
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing; it looks like the API server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and an nginx as load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
`
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...
I assume clearml waits for some period of time and then shows this message. Am I right?
Yes you are 🙂
is this configurable?
It is 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
Hi ScaryBluewhale66
TaskScheduler I created. The status is still running. Any idea?
The TaskScheduler needs to actually run in order to trigger the jobs (think cron daemon)
Usually it will be executed on the clearml-agent services queue/machine.
Makes sense?
That really depends on how much data you have there, and on the setup. The upside of the file server is that you do not need to worry about credentials; the downside is that storage is more expensive.
...I'm not sure I follow, the clearml-task is designed to always be used so that at the end the agent will be running the Task. What am I missing?
In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future).
Hmm I see, wouldn't it make more sense to separate clients like a multi-tenant SAAS solution ?
Now I need to figure out how to export that task id
You can always look it up 🙂
How come you do not have it?
ReassuredTiger98 yes this is odd:
also:
Warning, could not locate PyTorch torch==1.12 matching CUDA version 115, best candidate 1.12.0.dev20220407
Seems like it found a matching version and did not use it...
Let me check that
Hi WackyRabbit7
I believe this is fixed in clearml-server 1.1 (this is a plotly color issue), releasing later today or tomorrow 🙂
save off the "best" model instead of the last
Should be relatively easy to update on the main Task the model with the best performance, no?
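A minimal sketch of the "keep the best, not the last" logic, in plain Python. `BestModelTracker` and the `on_improve` callback are hypothetical names; in a real setup the callback would be wherever you update the model on the main Task:

```python
from typing import Callable, Optional


class BestModelTracker:
    """Invoke `on_improve` only when the monitored metric improves."""

    def __init__(self, on_improve: Callable[[str, float], None],
                 higher_is_better: bool = True) -> None:
        self.on_improve = on_improve
        self.higher_is_better = higher_is_better
        self.best: Optional[float] = None

    def update(self, checkpoint_path: str, metric: float) -> bool:
        """Return True (and fire the callback) if `metric` beats the best so far."""
        if self.best is None:
            improved = True
        elif self.higher_is_better:
            improved = metric > self.best
        else:
            improved = metric < self.best
        if improved:
            self.best = metric
            self.on_improve(checkpoint_path, metric)
        return improved


# Hypothetical accuracy per epoch; the callback just records what it would upload
uploaded = []
tracker = BestModelTracker(on_improve=lambda path, m: uploaded.append((path, m)))
for epoch, acc in enumerate([0.71, 0.78, 0.75, 0.80]):
    tracker.update(f"model_epoch_{epoch}.pt", acc)
```

Epoch 2 is skipped here because its accuracy is below the best seen so far, so the stored model is only replaced on real improvements.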
Hi NastyFox63 could you verify the fix works?
pip install git+
Could you clarify the question for me, please?
...
Could you please point me to the piece of ClearML code related to the downloading process?
I think I mean this part:
https://github.com/allegroai/clearml/blob/e3547cd89770c6d73f92d9a05696018957c3fd62/clearml/datasets/dataset.py#L2134
Hi @<1716987933514272768:profile|SuccessfulPuppy43>
How to make remote ClearML agent do
pip install -e .
In theory there is no need to do that; clearml-agent adds the repo root folder to the Python path.
If you insist on actually installing it, try adding a "requirements.txt" compatible line to your "installed packages" section:
-e .
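To illustrate why the import works without installing anything, here is a small sketch of the underlying Python mechanism only (this is not the agent's actual code). The package name `mypkg_demo` and the temporary "repo root" are made up for the example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as repo_root:
    # Hypothetical package sitting at the repository root, never pip-installed
    pkg = Path(repo_root) / "mypkg_demo"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("ANSWER = 42\n")

    # Putting the repo root on sys.path is enough to make the package importable
    sys.path.insert(0, repo_root)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("mypkg_demo")
        answer = mod.ANSWER
    finally:
        sys.path.remove(repo_root)
```

An editable install mainly matters when the package needs its entry points or build steps, which a path entry alone does not provide.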
Hi @<1657918706052763648:profile|SillyRobin38>
You mean remove the entire serving session? Is it still running somewhere?
(for example if you take the docker-compose down it will be marked aborted automatically after 2 hours)
Can you see these metrics on TB?
Hmm SuccessfulKoala55 what do you think?
LazyLeopard18 nice. maybe we should add it in the FAQ / Install. Could you send the exact docker-compose you used and command line, I'll ask the guys to add it 🙂
Another question, do you have the argparse with type=str ?
Yes, actually ensuring pip is there cannot be skipped (I think in the past it caused too many issues, hence the version limit etc.)
Are you saying it takes a lot of time when running? How long is the actual process that the Task is running (just to normalize times here)
So are you saying why do we need to install a specific pip version ?
You can "disable it" by selecting a very high version
pip_version: "<40"
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L67
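A rough sketch of why a bound like "<40" acts as "no limit" in practice: any pip version you are likely to have satisfies it, while a tight bound does not. `satisfies_upper_bound` is a hypothetical helper that only handles plain numeric versions and the `<` operator; it is not how the agent parses the setting:

```python
def satisfies_upper_bound(version: str, constraint: str) -> bool:
    """Check a dotted numeric version against a simple '<X[.Y...]' constraint."""
    if not constraint.startswith("<") or constraint.startswith("<="):
        raise ValueError("this sketch only supports '<' constraints")

    def parse(text: str) -> tuple:
        return tuple(int(part) for part in text.strip().split("."))

    # Tuples compare element by element, which matches version ordering here
    return parse(version) < parse(constraint[1:])


loose = satisfies_upper_bound("23.3.1", "<40")    # far below the bound
tight = satisfies_upper_bound("23.3.1", "<20.2")  # a tight bound excludes it
```

With the loose bound the installed pip is accepted as is; with the tight one it would fall outside the allowed range.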
NICE! MagnificentSeaurchin79 could you PR this fix?
Hi @<1541592204353474560:profile|GhastlySeaurchin98>
During our first large hyperparameter run, we have noticed that there are some tasks that get aborted with the following console log:
This looks like the HPO algorithm doing early stopping, which algo are you using ?