PyCharm does get confused sometimes
Damn, JitteryCoyote63 seems like a bug in the backend, it will not allow you to change the task type to the new types 😞
JitteryCoyote63 good news
Not a trains-server error, but a trains validation error; this is easily fixed and deployed
Please do, just so it won't be forgotten (it won't, but for the sake of transparency)
Thanks GrievingTurkey78 !
It seems that under the hood they use argparse
See here:
https://github.com/google/python-fire/blob/c507c093fa6622ab5efee21709ffbf25974e4cf7/fire/parser.py
Which means it might just work?!
What do you think?
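If you want to quickly test it, here is a minimal sketch (hedged: I haven't verified that fire's internals route through argparse the way clearml expects; the project/task names and the train function are just placeholders):
from clearml import Task
import fire

def train(learning_rate=0.01, epochs=10):
    # if the argparse hook kicks in, these should show up as hyperparameters
    task = Task.init(project_name="examples", task_name="fire test")
    print(learning_rate, epochs)

if __name__ == "__main__":
    fire.Fire(train)
If the arguments appear under the Task's configuration in the UI, then it just works.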
Hi @<1547028116780617728:profile|TimelyRabbit96>
Start with the simple scikit learn example
https://github.com/allegroai/clearml-serving/tree/main/examples/sklearn
The pipeline example is more complicated, it needs the base endpoints, start simple 😃
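If I remember the sklearn example correctly, registering the endpoint looks roughly like this (the service id, names, and paths are placeholders taken from the example README, double check there):
clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --project "serving examples"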
or even different task types
Yes there are:
https://clear.ml/docs/latest/docs/fundamentals/task#task-types
https://github.com/allegroai/clearml/blob/b3176a223b192fdedb78713dbe34ea60ccbf6dfa/clearml/backend_interface/task/task.py#L81
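For example, setting one explicitly at init (project/task names here are placeholders):
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="my optimizer run",
    task_type=Task.TaskTypes.optimizer,
)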
Right now I don't see differences, is this a deliberate design?
You mean how to use them? I.e. best practice?
https://clear.ml/docs/latest/docs/fundamentals/task#task-states
Okay, so I think it doesn't find the correct Task, otherwise it wouldn't print the warning.
How do you set up the HPO class? Could you copy-paste the code?
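For reference, a minimal sketch of the setup I'd expect (the base task id, parameter name, and metric names are placeholders; note the "General/" section prefix, which is usually where argparse parameters end up):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange, RandomSearch

task = Task.init(project_name="examples", task_name="HPO controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<your base task id>",
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()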
Hi @<1664079296102141952:profile|DangerousStarfish38>
You mean spin up the agent on multiple Windows machines? Yes, that is supported. I think it is limited to venv (i.e. not docker) mode, but other than that it should work out of the box
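i.e. on each Windows machine, something like (the queue name is a placeholder):
clearml-agent daemon --queue default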
But what should I do? It does not work, it says incorrect password as you can see
How are you spinning up the agent machine?
Basically port 10022 from the host (agent machine) is routed into the container, but it still needs to be open on the host machine. Could it be it is behind a firewall? Are you (the client side running clearml-session) on the same network as the machine running the agent?
I'm sorry, I meant that if the queue name is not provided to the agent, the agent will look for the queue with the "default" tag. If you are specifying the queue name, there is no need to add the tag.
Is it working now?
Hi PanickyMoth78
So do not tell anyone, but the next version will have reports built into clearml, as well as the ability to embed graphs in 3rd party tools (think Notion, GitHub, markdown etc.)
Until then (ETA mid Dec), the easiest is to download an image or just use the URL (it encodes the full view, so when someone clicks on it they get the exact view you are seeing)
Hmm so you are saying you have to be logged out to make the link work? (I mean pressing the link will log you in and then you get access)
Okay found it, ElegantCoyote26 the step name is changed but the Task name remains the same ... 😞
I'll make sure we fix it on the next version
Should work in all cases: plotly / matplotlib / scalar report
I am thinking about just installing this manually on the worker ...
If you install them system wide (i.e. with sudo) and set agent.package_manager.system_site_packages,
then they will always be available for you 🙂
And then also use
priority_optional_packages: ["carla"]
This actually means that it will always try to install the package carla
first, but if it fails, it will not raise an error.
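Putting both together in the agent's clearml.conf, roughly (a sketch; double check the exact section nesting in your conf file):
agent.package_manager.system_site_packages: true
agent.package_manager.priority_optional_packages: ["carla"]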
BTW: this would be a good use case for Docker, just saying...
Sure 🙂
BTW: clearml-agent will mount your host .ssh into the docker to /root/.ssh by default.
So no need to do that manually
BoredHedgehog47 could it be "python" points to Python 2.7 inside your container, as opposed to Python 3 on your machine?
(this error is Python 2 trying to run Python 3 code)
https://stackoverflow.com/questions/20555517/using-multiple-versions-of-python
"Training classifier with command:
python -m sfi.imagery.models.bbox_predictorv2.train"
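A quick way to verify which interpreter the container uses (a sketch; <your-image> is a placeholder for the container image you are running):
docker run --rm <your-image> python --version
docker run --rm <your-image> python3 --version
# then compare with your machine:
python --version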
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good idea.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, not that it will just sleep for 99999 sec 🙂
Hi CooperativeFox72
Sure 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
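For context, a minimal sketch of where that call goes (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="slow warmup")
# wait up to 30 minutes for the first reported iteration before the
# resource monitor falls back to time-based reporting
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)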
Hmmm, that actually connects with something we were thinking about: introducing sections to the hyper parameters. This way we could easily differentiate between the command line arguments and other types of parameters. DilapidatedDucks58 what do you think?
Both are fully implemented in the enterprise version. I remember a few medical use cases, and I think they are working on publishing a blog post on it, not sure. Anyhow, I suggest you contact the sales people and I'm sure they will gladly set up a call/demo/PoC.
https://allegro.ai/enterprise/#contact
Hi AttractiveCockroach17
Many of these experiments appear with status "running" on clearml even though they have finished running.
Could it be their process just terminated? (i.e. not properly shut down)
How are you running these multiple experiments?
BTW: if the server does not see any change in a Task for a while (I think the default is 2 hours) it will automatically mark these Tasks as aborted
AttractiveCockroach17 could it be Hydra actually kills these processes?
(I'm trying to figure out if we can fix something with the hydra integration so that it marks them as aborted)
So as you say, it seems hydra kills these
Hmm let me check in the code, maybe we can somehow hook into it
AttractiveCockroach17 can I assume you are working with the hydra local launcher?
SubstantialElk6 This seems to be the issue:
cp: failed to access '/root/default_clearml.conf': Permission denied
clearml_agent: ERROR: Could not find task id=024a421c0e174650a1c7ff64af756c26 (for host: )
Notice it seems it just cannot read the clearml.conf, wdyt?
I'm wondering why this is the case as docker best practices does indicate we should use a non root on production images.
The docker image for the service-agent is not root?
@<1597762318140182528:profile|EnchantingPenguin77> can you provide the full log?