Hi TrickyRaccoon92
BTW: check out the HP optimization example, it might make things even easier 🙂 https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
how did you install trains?
pip install git+
And when running get, the files on the parent dataset will be available as links.
BTW: if you call get_mutable_copy() the files will be copied, so you can work on them directly (if you need)
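The link-vs-copy distinction can be sketched with plain filesystem operations (this is only an illustration with made-up temp folders, not the clearml Dataset API itself):

```python
import os
import shutil
import tempfile

# Illustration only: a child dataset exposing a parent file as a link,
# versus a mutable copy that can be edited without touching the parent.
parent = tempfile.mkdtemp(prefix="parent_ds_")
child = tempfile.mkdtemp(prefix="child_ds_")
mutable = tempfile.mkdtemp(prefix="mutable_ds_")

with open(os.path.join(parent, "data.txt"), "w") as f:
    f.write("parent data")

# "get": the parent's file shows up as a link inside the child folder
os.symlink(os.path.join(parent, "data.txt"), os.path.join(child, "data.txt"))

# "get_mutable_copy": a real copy you can modify without affecting the parent
shutil.copy(os.path.join(child, "data.txt"), os.path.join(mutable, "data.txt"))
with open(os.path.join(mutable, "data.txt"), "a") as f:
    f.write(" + local edits")
```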
Sorry, on the remote machine (i.e. enqueue it and let the agent run it), this will also log the print 🙂
NastyOtter17 can you provide some more info ?
yes, it worked. thank you very much.
ScantCrab97 nice!
it was indeed a matter of the subnets...
BrightRabbit75 you are awesome, thank you!
(now we probably need to add it to the faq somewhere?!)
If you are using the latest RC:
pip install clearml==0.17.5rc5
You can pass True and it will use the "files_server" as configured in your clearml.conf
I used the http link as a filler to point to the files_server.
Make sense ?
Essentially, I think the key thing here is we want to be able to build the entire Pipeline including any updates to existing pipeline steps and the addition of new steps without having to hard-code any Task IDs and to be able to get the pipeline's Task ID back at the end.
Oh, if this is the case then basically your CI/CD code will be something like:
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_...
pywin32 isn't in my requirements file,
CloudySwallow27 what's the OS/env ?
(pywin32 is not in the direct requirements of the agent)
Getting the last checkpoint can be done via:
Task.get_task(task_id='aabbcc').models['output'][-1]
SmarmySeaurchin8 what do you think?
https://github.com/allegroai/trains/issues/265#issuecomment-748543102
task.connect_configuration
What do you mean by a custom queue ?
In the queues page you have a plus button, this will just create a new queue
Sure thing, thanks FlutteringWorm14 !
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode ?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
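For example, a cron-style cleanup could look like this (the folder path and the 3-hour threshold are placeholders for your mapped folder and your "X hours"; note that find checks modification time as a proxy for creation time):

```shell
# Remove mapped files more than 3 hours (180 minutes) old.
# /tmp/mapped_demo stands in for the host folder that is mapped into the docker.
mkdir -p /tmp/mapped_demo
find /tmp/mapped_demo -type f -mmin +180 -delete
```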
wdyt?
can we also put the path to the CA?
Yes :)
Assuming from previous threads this is run on K8s, I think a configuration is missing; use system packages:
https://github.com/allegroai/clearml-agent/blob/cb6bdece39751eaef975287609b8bab603f116e5/docs/clearml.conf#L57
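For reference, the relevant fragment of clearml.conf would look something like this (a sketch of the setting pointed at by the link above; check your conf file for the exact surrounding keys):

```
agent {
    package_manager {
        # use the system-wide python packages instead of creating a fresh venv
        system_site_packages: true,
    }
}
```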
Hi @<1539055479878062080:profile|FranticLobster21>
Like this?
https://github.com/allegroai/clearml/blob/4ebe714165cfdacdcc48b8cf6cc5bddb3c15a89f[…]ation/hyper-parameter-optimization/hyper_parameter_optimizer.py
GiganticTurtle0 I think I located the issue:
it seems the change is in "config" (and for some reason it stores the entire dict) but the split values are not changed.
Is this it?
AntsySeagull45 kudos on sorting it out 🙂
quick note, trains-agent will try to run the python version specified by the original Task. i.e. if you were running python3.7 it will first try to look for python 3.7 then if it is not there it will run the default python3. This allows a system with multiple python versions to run exactly the python version you had on your original machine. The fact that it was trying to run python2 is quite odd, one explanation I can think of is if the original e...
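The lookup order described above (the task's original interpreter first, then the default python3) is roughly the following; this is a hypothetical sketch of the behavior, not the actual agent code:

```python
import shutil


def find_python(requested="3.7"):
    # Try the exact interpreter version the original task used,
    # then fall back to the default python3 on this machine.
    for candidate in (f"python{requested}", "python3"):
        path = shutil.which(candidate)
        if path:
            return path
    return None
```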
Hi @<1671689437261598720:profile|FranticWhale40>
You mean the download just fails on the remote serving node because it takes too long to download the model?
(basically not a serving issue per-se but a download issue)
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
That's the only explanation ...
But the weird thing is, it did not work on my linux box?!
Sounds good, let's work on it after the weekend 🙂
This will mount the trains-agent machine's hosts file into the docker
LOL yes 🙂
just make sure it won't be part of the uncommitted changes of the AWS autoscaler 🙂
Notice that the new pip syntax:
packagename @ <some_link_here>
is actually interpreted by pip as:
Install "packagename"; if it is not already installed, use "<some_link_here>" to install it.
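In a requirements file the direct-reference form looks like this (the package name and git URL are just example placeholders):

```
# requirements.txt
# install "clearml"; if missing, fetch it from the URL after the "@"
clearml @ git+https://github.com/allegroai/clearml.git
```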
IdealPanda97 Hmm I see...
Well, unfortunately, Trains is all about free access to all 🙂
That said, the Enterprise edition does add permissions and data management on top of Trains. You can get in touch through the https://allegro.ai/enterprise/#contact , I'm sure someone will get back to you soon.
Makes sense to add it to docker run by default if GPUs are mentioned in agent.
I think this is an arch thing, --privileged is not needed on ubuntu flavor, that said you can always have it if you add it here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
clearml-agent daemon --gpus 0 --queue default --docker
But docker still sees all GPUs.
Yes --gpus should be enough, are you sure regarding the --privileged flag ?