JitteryCoyote63 I think I failed to explain myself.
- I think the problem with the controller is that you are interacting (i.e. changing hyperparameters) with a Task created using a new SDK version, from an older SDK version. Specifically, we added section names to the hyperparameters, and only the new version of the SDK is aware of them.
Make sense? - Regarding the actual problem: it seems like this is somehow related to the first one, the task at run time is using an older SDK version, and I t...
Thanks FiercePenguin76 , I can totally understand your point on running proper tests, and the reluctance to break other things.
I suggest adding a comment with the temp fix that solved the problem for you, and we will make sure the team takes it from there. wdyt?
Hi MinuteGiraffe30
Are you saying that when you are running your code locally with a Gitea repository, ClearML incorrectly adds a link to GitLab?
How come the previous git diff passed?
AdventurousButterfly15 this one is quite self-contained:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py
So I guess pip install finished working
But the task is evidently not being executed.
This is very odd... you can run the agent with debugging using --debug --foreground to see all the outputs and logs
With pleasure, I'll make sure we officially release RC1 soon :)
PunySquid88 do you want to test a fix?
@<1523701079223570432:profile|ReassuredOwl55>
Hey, here's a quickie - is it possible to specify different "types" of input parameters ("Args/...") such that they are handled nicely on the front end?
You mean cast / checked in the UI?
Hi GrittyKangaroo27
Maybe check the TriggerScheduler , and have a function trigger something on k8s every time you "publish" a model?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
While I'll look into it, you can do:
from clearml import OutputModel
output_model = OutputModel()
output_model.update_weights("best_model.onnx")
@<1542316991337992192:profile|AverageMoth57> it sounds like you should use SSH authentication for the agent, just set force_git_ssh_protocol: true
None
And make sure you have the SSH keys on the agent's machine
UpsetBlackbird87
pipeline.start()
will launch the pipeline itself on a remote machine (a machine running the services agent).
This is why your pipeline is "stuck": it is not actually running.
When you call start_locally() the pipeline logic itself runs on your machine and the nodes run on the workers.
Makes sense?
Hi StaleHippopotamus38
I imagine I could make the changes specified in the warning to
/etc/security/limits.conf
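For reference, this is the kind of entry that file takes; the exact limits depend on the warning you saw (the values below just mirror common Elasticsearch recommendations, so treat them as a sketch):

```
# /etc/security/limits.conf sketch -- values are illustrative
*    soft    nofile     65536
*    hard    nofile     65536
*    soft    memlock    unlimited
*    hard    memlock    unlimited
```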
Yep, seems like an Elastic memory issue, but I think the Helm chart takes care of it,
You can see a reference in the docker compose:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L41
The default cleanup service should work with S3 with a correctly configured clearml service agent if I understand the workings correctly.
Yes I think you are correct
I am referring to the UI.
In that case, no. This is actually a backend server change (from the UI side it should be relatively simple). Is this somehow a showstopper?
Hi @<1541954607595393024:profile|BattyCrocodile47>
is this on your self-hosted machine?
If you edit the requirements to have
https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
It can also work by running on multiple known nodes.
Horovod sits on top of OpenMPI, which needs SSH to open multiple nodes. I'm not sure how one would connect it without passing the SSH keys from one node to the other and making sure they can communicate directly. (Not saying it is not possible, just that there are a few things to configure before it works; the enterprise edition removes the need for the direct SSH connection between the nodes.)
How would I add a glue for multinode?
Basic...
PunySquid88 RC1 is out with a fix:
pip install trains-agent==0.14.2rc1
In your trains.conf, change the value
files_server: 's3://ip:port/bucket'
It's just another flag when running the trains-agent
You can have multiple service-mode instances, there is no actual limit
Hi ReassuredTiger98
Are you referring to the UI? (As far as I understand there was an improvement, but generally speaking it still needs the users to have the S3 credentials in the UI client, not the backend.)
Or are you asking on the cleanup service ?
WickedGoat98 Same for me, let me ask the UI guys, I think this is a UI bug.
Also maybe before you post the article we could release a fix to both, what do you think?
EDIT:
Never mind, I just saw the medium link, very cool!!!
BattyLion34 is this running with an agent ?
What's the comparison with a previously working Task (in terms of python packages) ?
GiganticTurtle0 found it, fix will be pushed tomorrow
Also, don't be shy, we love questions
I "think" you are referring to the venvs cash, correct?
If so, then you have to set it in the clearml.conf running on the host (agent) machine, make sense ?
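For reference, a sketch of the relevant section of clearml.conf on the agent machine; the values below are the usual defaults as I recall them, so double-check against the reference config shipped with your installed version:

```
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (GB) to keep on the drive
        free_space_threshold_gb: 2.0
        # cache location on the agent machine
        path: ~/.clearml/venvs-cache
    }
}
```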
Are there any services OOB like this?
On the open-source side, I can't recall any, but it would probably be easy to write. The paid tier might have an offering, though, not sure