I can update that the fix to UniformIntegerParameterRange should be pushed with tomorrow's release 🙂
(which would in turn fix LogUniformParameterRange)
So the clearml server already contains an authentication layer (JWT token), and you do have full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication, and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately this is not part of the open source, but I know they have it in the scale/ent...
yup! That's what I was wondering if you could help me change the timing of. Is there an option I can override to make the retries more aggressive?
you mean wait for less?
add to your clearml.conf:
api.http.retries.backoff_factor = 0.1
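For reference, the same setting in the sectioned form of clearml.conf would look roughly like this (only the backoff_factor value comes from the message above; the surrounding nesting is just the standard HOCON layout):

    api {
        http {
            retries {
                # smaller backoff factor -> more aggressive retries
                backoff_factor: 0.1
            }
        }
    }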
Calling the script without the
PipelineDecorator.run_locally()
i.e. running the pipeline remotely still gives the
ModuleNotFoundError: No module named
Do you have the needed module listed on the pipeline controller Task? (press the details link, then go to the Execution tab / "Installed Packages")
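For illustration, a minimal sketch of a decorated pipeline where the requirement is listed explicitly on the component (the pandas dependency and all project/pipeline names here are placeholders, not from the thread):

    from clearml import PipelineDecorator

    # Listing the requirement on the component helps avoid ModuleNotFoundError
    # when the step is executed remotely by an agent.
    @PipelineDecorator.component(packages=["pandas"])
    def load_data():
        import pandas as pd
        return pd.DataFrame({"a": [1, 2, 3]})

    @PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.0.1")
    def run_pipeline():
        load_data()

    if __name__ == "__main__":
        # PipelineDecorator.run_locally()  # uncomment for local debugging only
        run_pipeline()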
So you are saying 156 chunks, with each chunk about ~6500 files ?
CloudyHamster42 what's the trains-server version ?
Profile page top left corner
That sounds like an issue with the "working dir"; check the "Working Directory" field under "Execution".
'.' means the root of the git repository
'subfolder' means the script runs from that subfolder, etc. Also make sure the script path is adjusted accordingly.
btw: Trains should have filled in all the correct paths... If you have time, get the latest trains (0.14.3) and run again to see if the problem persists; we should probably fix that bug 🙂
Hmm, I just tested on the community version and it seems to work there. Let me check with the frontend guys. Can you verify it works for you on https://app.community.clear.ml/ ?
this is the code for task scheduler
So it makes sense that the first "scheduled" job shows epoch time 0 (1970), because "executes_immediately" basically sets a date that has already passed, so it triggers right away. Does that make sense?
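For context, a rough sketch of what such scheduler code could look like (the task ID and queue names are placeholders; the argument names follow the clearml.automation.TaskScheduler API as I understand it):

    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    scheduler.add_task(
        schedule_task_id="<task-id>",   # placeholder: ID of the task to clone and run
        queue="default",
        hour=0, minute=0,               # run once a day at midnight
        execute_immediately=True,       # also fire right away, hence the epoch-0 "last run"
    )
    scheduler.start_remotely(queue="services")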
Hmm, any suggestion on making it more visible or on the interface ? (I mean deleting the cache file is always a solution, but it sounded quite painful to debug, hence the question)
Oh this is Only in the SaaS server ...
(I'm sorry I was not clear on that)
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL 😄
It was an internal joke; the guys decided to call it "trains" because, you know, it trains...
It was unstoppable, we should probably do a line of merchandise with AI 🚆 😉
Anyhow, this one definitely backfired...
BTW: 0.14.3 solved the issue you are referring to, so you can import trains before parsing the args without an issue. Regarding passing project/name as parameters, a few thoughts: (1) you can always rename/move projects from the UI; (2) if you are running it with trains-agent,
there is no meaning to these arguments, as by definition the Task was already created... Maybe we should give an option to exclude a few arguments from argparse, I think this topic came up a few times... What d...
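A hedged sketch of the pattern being discussed, with trains imported before the args are parsed (the project and experiment names are placeholders):

    from argparse import ArgumentParser
    from trains import Task  # "clearml" in later releases

    parser = ArgumentParser()
    parser.add_argument("--project", default="examples")
    parser.add_argument("--name", default="my experiment")
    args = parser.parse_args()

    # With trains >= 0.14.3 this works even though trains was imported first.
    # Under trains-agent these values are ignored, since the Task already exists.
    task = Task.init(project_name=args.project, task_name=args.name)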
Hi PompousBeetle71 , what exactly is the scenario / problem we are trying to solve ?
Hi CooperativeSealion8
Seems like your NoScript addon is blocking the site :)
Pretty confusing that neither
services
StickyLizard47 basically this is how a services-queue agent should be spun up:
https://github.com/allegroai/clearml-server/blob/9b108740da21f25407bd2c59583ca1c86f8e1faa/docker/docker-compose.yml#L123
When spinning it up on a k8s cluster, this is a bit more complicated, as it needs to work with the clearml-k8s-glue.
See here how to spin it up on k8s:
https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue
Maybe the only thing to worry about is making sure the IP address is stable, so if k8s replaces the node, you do not have to reconfigure the clients 🙂
Could be nice to write some automation
What I'm trying to do is have the DSes use some lightweight base class that is independent of clearml, and have a framework hold all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas.
So you want the DS to manually tell the base class what to store?
then the base class will store it for them, for example with joblib
, is this the...
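A possible sketch of that kind of lightweight base class (all names here are made up; it stores with joblib and only reports to clearml when a Task happens to exist):

    import joblib

    class ExperimentBase:
        def __init__(self):
            try:
                from clearml import Task
                self._task = Task.current_task()  # None when running outside clearml
            except ImportError:
                self._task = None

        def store(self, name, obj):
            path = f"{name}.joblib"
            joblib.dump(obj, path)  # always store locally
            if self._task is not None:
                # only touch clearml when an experiment is actually being tracked
                self._task.upload_artifact(name, artifact_object=path)
            return path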
Hi PlainSquid19
Any model stored by TF/Keras/PyTorch/Joblib will automatically appear in the artifact/models tab.
Are you asking how to add one manually?
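If adding one manually is the goal, a rough sketch could look like this (the file name and project/task names are placeholders):

    from clearml import Task, OutputModel

    task = Task.init(project_name="examples", task_name="manual model registration")
    output_model = OutputModel(task=task, framework="ScikitLearn")
    # register an existing weights file as an output model of this task
    output_model.update_weights(weights_filename="model.pkl")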
So if you set it, then all nodes will be provisioned with the same execution script.
This is okay in a way, since the actual "agent ID" is by default set based on the machine hostname, which I assume is unique ?
Hi ShinyPuppy47
getting this error pretty sporadically
What do you mean by "sporadically"? This should be consistent: either there is access to the clearml.conf file or not, no?!
What is your setup? Is this coming from the agent or manual execution ?
Yes, as long as the client is served from http://app.something.com it will look for the api server at http://api.something.com
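For reference, the matching clearml.conf entries would look roughly like this (something.com is the placeholder domain from the message above; files_server is shown for completeness):

    api {
        web_server: http://app.something.com
        api_server: http://api.something.com
        files_server: http://files.something.com
    }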