What issues are you facing?
It should. Can you provide both the task and machine logs?
Hi @<1772433273633378304:profile|VexedWoodpecker50> , these are the packages that were on the environment that ran the experiment. Please see here - None
I think this is what you're looking for then - None Task.add_requirements
Do you have a custom certificate for SSL by chance? If this is the case please see the following:
https://github.com/allegroai/clearml/issues/7
The solution would be changing the following to false:
https://github.com/allegroai/clearml/blob/9624f2c715df933ff17ed5ae9bf3c0a0b5fd5a0e/docs/clearml.conf#L16
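For reference, the relevant flag lives in the api section of clearml.conf; a minimal sketch (setting name per the linked config file):

```
api {
    # skip SSL certificate verification when using a custom / self-signed cert
    verify_certificate: false
}
```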
Hi @<1769534171857817600:profile|DepressedSeaurchin77> , can you please provide the full screenshot for context?
What about network? Does something return 400 or something of the sort?
Do you mean if they are shared between steps or if each step creates a duplicate?
Hi @<1774245260931633152:profile|GloriousGoldfish63> , what version did you deploy?
You still have the AWS/GCP autoscalers which are great 🙂
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , did you try the solution in GitHub? Did it not help?
Hi @<1749965229388730368:profile|UnevenDeer21> , an NFS is one good option. You can also point all agents on the same machine to the same cache folder as well. Or just like you suggested, point all workers to the same cache on a mounted NFS
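If you go the shared-cache route, the cache location is configurable in clearml.conf on each agent machine; a sketch, assuming the NFS is mounted at /mnt/shared (the path is illustrative):

```
sdk {
    storage {
        cache {
            # point every agent at the same cache folder on the mounted NFS
            default_base_dir: "/mnt/shared/clearml-cache"
        }
    }
}
```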
With the autoscaler it's also easier to configure a large variety of different compute resources. Although if you're only interested in p4-equivalent instances and need them available on demand quickly, I can understand the issue
UpsetTurkey67 , I'm not sure. I know that the UI uses the API, so everything shown in the UI exists in the backend. So I just played a bit in IPython with dir(task) to see what it offers 🙂
I'm thinking maybe you could do it by chaining tasks somehow, but I don't think that's the correct way (I've never tried it myself; as I said, in the end it all abstracts to a single pipeline).
Maybe check out pipelines with functions or even better decorators, this might be an easier solution for you and you can create very complex pipelines with it. Think about using loops and if statements to create steps and progress your pipeline
Hi NastySeahorse61 ,
It looks like deleting smaller tasks didn't make much of a dent. Do you have any tasks that ran for very long or were very intensive on reporting to the server?
Hi @<1800699527066292224:profile|SucculentKitten7> , you can control the docker image on the task level, either through the code ( Task.set_base_docker ) or through the web UI in the 'Execution' section of the task
This will disable storing the uncommitted changes
By default it will use the packages that were detected in the environment. You can override that default behaviour with this.
Hi @<1761199244808556544:profile|SarcasticHare65> , and if you run locally for the same amount of iterations this does not happen?
Can you check the machine status? Is the storage running low?
Hi @<1731483438642368512:profile|LoosePigeon2> , you need to set the following:
sdk: {
    development: {
        store_code_diff_from_remote: false
        store_uncommitted_code_diff: false
    }
}
On the machine you're running your pipeline from
Hi @<1715175986749771776:profile|FuzzySeaanemone21> , what if you try to register them as https?
Looks like elastic is failing to access a shard. Do you have visibility into machine utilization? How much RAM is elastic consuming?
Also, is this the entire error repeating or is there more context?
You can set the docker image you want to run with using Task.set_base_docker
None
I suggest watching the following videos to get a better understanding:
Agent - None
Autoscaler - None
Also please review agent docs - None
When a task is enqueued, when does the autoscaler kick in?
You're looking for the polling interval parameter as mentioned in the documentation - [None](https://clear.ml/docs/latest/docs/webapp/appl...
GrittyKangaroo27 , I think that ClearML will be just right up your alley then 🙂