And you want all of them to log into the same experiment? Or do you want an experiment per 60 sec (i.e. like the scheduler)?
Hi @<1730396272990359552:profile|CluelessMouse37>
However, the caching doesn't seem to be working correctly. Despite not changing the configuration, the first step runs every time.
How are you creating the cached component?
is this a standalone script or a git repo link?
These parameters are dictionaries of specific configurations (dict of dict) that are the same but might not be taken into account properly by the caching mechanism.
hmm for the component to be cached (or reuse...
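One way to check whether two dict-of-dict configurations are really identical between runs is to compare a stable fingerprint of each. The sketch below is only a debugging aid and is not ClearML's internal caching logic; the function name `config_fingerprint` and the sample values are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a stable hash for a (possibly nested) configuration dict."""
    # sort_keys makes the serialization independent of key insertion order;
    # default=str is a fallback for values json cannot serialize natively.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order (hypothetical values)
run_a = {"model": {"lr": 0.1, "layers": 3}, "data": {"path": "/tmp/x"}}
run_b = {"data": {"path": "/tmp/x"}, "model": {"layers": 3, "lr": 0.1}}

print(config_fingerprint(run_a) == config_fingerprint(run_b))
```

If the fingerprints differ between two runs that should be identical, something in the nested dicts (a timestamp, a random seed, an object repr) is changing and would be a reasonable suspect for the cache miss.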
using the docker-compose file for the clearml-serving pipeline, do we also have to mount it somehow?
oh yes, you are correct the values are passed using environment variables (easier when using docker compose)
You can in addition add a mount from the host machine to a conf file,
volumes:
- ${PWD}/clearml.conf:/root/clearml.conf
wdyt?
It's only on this specific local machine that we're facing this truncated download.
Yes, that's what the log says, makes sense
Seems like this still doesn't solve the problem, how can we verify this setting has been applied correctly?
hmm exec into the container? what did you put in clearml.conf?
Okay we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which was also 30 seconds and confusing us.
Nice!
btw:
in the clearml.conf we put this:
for future reference, you are missing the sdk section:
sdk.http.timeout: 300
The . notation works as well as {}
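To illustrate why the dotted key and the nested `{}` form describe the same setting, here is a small sketch that expands dotted keys into nested dicts. It is not how ClearML loads `clearml.conf` (its own config loader does the real parsing); `expand_dotted` is a hypothetical helper for illustration only.

```python
def expand_dotted(flat: dict) -> dict:
    """Expand keys like 'sdk.http.timeout' into nested dictionaries."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Walk down, creating intermediate sections as needed
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


dotted = expand_dotted({"sdk.http.timeout": 300})
braces = {"sdk": {"http": {"timeout": 300}}}

print(dotted == braces)
```

Either way, the important part from the thread is that the `sdk` section must be present; a bare `http.timeout` would land in a different place in the tree.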
I can install clearml and clearml-agent and run the worker inside a docker
oh I see, you should install it inside a docker, then mount the docker socket so it can spin sibling containers, and lastly make sure the mounts are correct with this env variable:
None
Hi @<1747428509627715584:profile|CumbersomeDuck6>
but is it possible to use ClearML in Rust, without writing a wrapper.
With the RestAPI you can...
noticed the API doesn't cover dataset operations but the CLI can.
Yes the CLI will fetch/create datasets for you,
wdyt?
What's the error you are getting ?
My question is, which version of docker compose do you need?
Ohh sorry, there is no real restriction, we just wanted easy copy-paste for the installation process.
Can you please tell me if it is possible to set up slack monitoring in clearml?
It is :)
This one?
https://clear.ml/docs/latest/docs/guides/services/slack_alerts
We actually added a specific call to stop the local execution and continue remotely , see it here: https://github.com/allegroai/trains/blob/master/trains/task.py#L2409
Hi MinuteWalrus85
This is a great question, and super important when training models. This is why we designed a whole system to manage datasets (including storage, querying, balancing data, and caching). Unfortunately this is only available in the paid tier of Allegro... You are welcome to contact the sales guys: https://allegro.ai/enterprise/
:)
is it planned to add a multicursor in the future?
CheerfulGorilla72 can you expand? what do you mean by multicursor ?
ElegantKangaroo44 good question, that depends on where we store the score of the model itself. You can obviously parse the file name task.models['output'][-1].url and retrieve the score from it. You can also store it on the model name task.models['output'][-1].name and you can put it as a general purpose blob of text on what is currently model.config_text
(for convenience you can have the model parse a JSON-like text and use model.config_dict)
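As a rough sketch of the "parse the file name" option: assuming the checkpoint URL ends in a file name that embeds the score after an `=` sign (the pattern and the example URL are hypothetical, not something the thread specifies), the score could be pulled out like this. In practice the string would come from `task.models['output'][-1].url`.

```python
import posixpath
import re
from typing import Optional
from urllib.parse import urlparse


def score_from_url(url: str) -> Optional[float]:
    """Extract a numeric score embedded in a checkpoint file name, if any."""
    # Keep only the last path component, e.g. 'best_model_val_acc=0.9312.pt'
    filename = posixpath.basename(urlparse(url).path)
    # Assumed naming convention: '<anything>=<number>.<ext>'
    match = re.search(r"=(\d+(?:\.\d+)?)", filename)
    return float(match.group(1)) if match else None


example = "s3://bucket/models/best_model_val_acc=0.9312.pt"  # hypothetical URL
print(score_from_url(example))
```

The function returns `None` when the file name carries no score, so callers can fall back to the model name or the stored config text.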
ImmensePenguin78 it might be... Let me check, worst case sync after the weekend :)
(pypi does contain 1.2.0rc4 and we are finalizing tests so that we can release a stable 1.2.0)
Hi @<1523702932069945344:profile|CheerfulGorilla72>
I think more details are needed here :)
directly from the UI from the services queue?
Spin the agent with --service-mode
it will keep pulling jobs from the queue and spinning them (BTW, it will only start the next job after the first one finished the env setup, and you must be running with --docker mode) :)
Nice! So out of curiosity why didn't it work this time and you had to do it manually?
CheerfulGorilla72 could it be the server address has changed when migrating ?
Yes it is reproducible do you want a snippet?
Already fixed :) please ping tomorrow, I think an RC should be out soon with the fix
this is very odd, can you post the log?
@<1523702932069945344:profile|CheerfulGorilla72> use the following bucket name when you are configuring your files/output uri
s3://<iphere>:<porthere>/<bucket_here>
From there everything should work as expected
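A minimal sketch of assembling that URI, assuming a self-hosted S3-compatible endpoint; the host, port, and bucket values below are placeholders, and `s3_uri` is a hypothetical helper rather than a ClearML function.

```python
from urllib.parse import urlparse


def s3_uri(host: str, port: int, bucket: str) -> str:
    """Build an s3://<ip>:<port>/<bucket> style URI for files/output_uri."""
    return f"s3://{host}:{port}/{bucket.strip('/')}"


uri = s3_uri("10.0.0.5", 9000, "clearml-artifacts")  # placeholder values
parsed = urlparse(uri)

print(uri)
print(parsed.hostname, parsed.port, parsed.path)
```

Parsing the result back is a quick sanity check that the host and port ended up in the network-location part and the bucket in the path.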
Yep it is the scale :) and yes it should appear once you upgrade
Hi @<1739818374189289472:profile|SourSpider22>
could you send the entire console log? maybe there is a hint somewhere there?
(basically what happens after that is the agent is supposed to be running from inside the container, but maybe it cannot access the clearml-server for some reason)
we run in containers without venv, in the main section, and then delete it or use it for similar experiments
If this is the case then the idea is the venv creation is actually cached, you can turn it on here (uncomment the line)
https://github.com/allegroai/clearml-agent/blob/51eb0a713cc78bd35ca15ed9440ddc92ffe7f37c/docs/clearml.conf#L116
@<1671689437261598720:profile|FranticWhale40> could you test the fix? just pull & run
allegroai/clearml-serving-triton:1.3.1
allegroai/clearml-serving-inference:1.3.1
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message on a reply to avoid polluting the channel?
Yes, that is appreciated :)
Basically logs in the thread of the initial message.
To fix this I had to spin the agent using the --cpu-only flag (--docker --cpu-only)
Yes if you do not specify --cpu-only it will default to trying to access gpus
Nice!
ElegantKangaroo44 I think TrainsCheckpoint would probably be the easiest solution. I mean it will not be a must, but another option to deepen the integration, and allow us more flexibility.
which part of the code?
the main script?!
but is not part of the package
is the repo itself a package?
But first I want to make sure the verify argument is actually used, hence False