- Yes the challenge is mostly around defining the interface. Regarding packaging, I'm thinking a similar approach to the pipeline decorator, wdyt?
- ClearML agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment. In the end it will be kubectl that users will have to call in order to spin up the containers with the agents, maybe a simple CLI to do that for you?
"sub nodes" inside pipeline, in my opinion, makes them much more useful, in sense that all the steps are visible.
Yeah I really like this idea... continuing this thread, would it also make sense to have a Task object per "sub-node" and run the sub-nodes as subprocess of the parent Node? I'm thinking this sounds like a combination of both local pipeline execution and remote pipeline execution.
wdyt?
SparklingHedgehong28 this is actually quite cool! Still not sure why not just use the built in autoscaler https://github.com/allegroai/clearml/tree/master/examples/services/aws-autoscaler , but it is a really cool usage of ASG 🤩
Only as "default docker + argument" , if you need the "extra_docker_arguments" (which I think a mount point is a good example for), then you have to add it in the conf file
Hi SmarmyDolphin68
I see this in between my training epochs, what could be causing this?
This is basically saying we are saving a second model on the same Task and even though both are logged, only the last is stored on the Task itself.
This will change as in the next version a Task will be able to hold reference to multiple models in the artifactory 🙂
Martin, thank you very much for your time and dedication, I really appreciate it
My pleasure 🙂
Yes, I have latest 1.0.5 version now and it gives same result in UI as previous version that I used
Hmm are you saying the auto hydra connection doesn't work ? is it the folder structure ?
When is Task.init called?
See example here:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
Hi @<1523701066867150848:profile|JitteryCoyote63>
Hi, how does
agent.enable_git_ask_pass
work?
basically it pushes the pass through stdin to git when it asks (it is a git feature)
Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not an external database.
Make sense?
Weird issue, I'll make sure we fix compatibility with python 3.9
Okay that kind of makes sense, now my followup question is how are you using the ASG? I mean the clearml autoscaler does not use it, so I just wonder what the big picture is, before we solve this little annoyance 🙂
CooperativeSealion8
when it first asks me to enter my full name
Where? in the Web?
Hi GiddyTurkey39
First, yes you can just edit the "installed packages" section and add any missing package (this is equal to requirements.txt)
I wonder why trains
failed detecting the "bigquery" package in the first place... Any thoughts ?
RC you can see on the main readme, (for some reason the Conda badge will show RC and the PyPi won't)
https://github.com/allegroai/clearml/
What does the folder structure look like, and where are the "package" and the entry script?
Hi @<1523702786867335168:profile|AdventurousButterfly15>
I am running cross_validation, training a bunch of models in a loop like this:
Use the wildcard or disable it altogether:
task = Task.init(..., auto_connect_frameworks={"joblib": False})
You can also do
task = Task.init(..., auto_connect_frameworks={"joblib": ["realmodelonly.pkl", ]})
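To get a feel for which files a wildcard entry would keep, here is a minimal sketch using Python's `fnmatch`. It assumes shell-style wildcard semantics. The filenames and the `final_*.pkl` pattern are hypothetical, and ClearML's internal matching may differ in the details.

```python
from fnmatch import fnmatch
from typing import Iterable

# Hypothetical filenames written by joblib during a cross-validation loop.
saved_files = ["fold_0.pkl", "fold_1.pkl", "final_model.pkl"]

# Hypothetical wildcard list, in the same shape as the "joblib" entry above.
patterns = ["final_*.pkl"]


def would_be_logged(filename: str, wildcard_list: Iterable[str]) -> bool:
    """Return True if the filename matches at least one wildcard pattern."""
    return any(fnmatch(filename, pattern) for pattern in wildcard_list)


kept = [name for name in saved_files if would_be_logged(name, patterns)]
print(kept)
```

Only the files in `kept` would be candidates for automatic logging under this assumption; the per-fold models would be skipped.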
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
I'll try to go with this option, I think it's actually perfect for my needs
Great!
Hi @<1657918706052763648:profile|SillyRobin38>
You should either disable certificate verification or add the self-signed certificate to your urllib
or set
export REQUESTS_CA_BUNDLE="/path/to/cert/file"
export SSL_CERT_FILE="/path/to/cert/file"
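If exporting the variables in the shell is not convenient, the same thing can be sketched from Python. This is only an illustration: the helper name `point_to_ca_bundle` is made up here, and it assumes the variables are set before the SDK opens its first HTTPS connection.

```python
import os


def point_to_ca_bundle(cert_path: str) -> None:
    """Set the CA bundle variables for this process and any child processes."""
    os.environ["REQUESTS_CA_BUNDLE"] = cert_path
    os.environ["SSL_CERT_FILE"] = cert_path


# Placeholder path copied from the export lines above; replace with the real file.
point_to_ca_bundle("/path/to/cert/file")
print(os.environ["REQUESTS_CA_BUNDLE"])
```

Call it at the very top of the entry script, before importing anything that talks to the server.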
Just call the Task.init before you create the subprocess, that's it 🙂 they will all automatically log to the same Task. You can also call the Task.init again from within the subprocess task, it will not create a new experiment but use the main process experiment.
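A minimal sketch of that flow, assuming the `clearml` package is installed and configured. The project name, task name, and the reported values are placeholders, not something from this thread.

```python
from multiprocessing import Process


def worker(fold: int) -> None:
    # Imported lazily; requires the clearml package and a configured server.
    from clearml import Task

    # Per the message above, this reuses the main process experiment
    # instead of creating a new one.
    task = Task.init(project_name="examples", task_name="subprocess demo")
    task.get_logger().report_scalar(
        title="score", series=f"fold_{fold}", value=0.5 + 0.1 * fold, iteration=0
    )


def main() -> None:
    from clearml import Task

    # Create the Task once, before any subprocess is started.
    Task.init(project_name="examples", task_name="subprocess demo")

    processes = [Process(target=worker, args=(fold,)) for fold in range(3)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

Call `main()` from your entry script; all three workers should then report into the same experiment.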
Local IP, like 192.168.1.123
but this is not different from not using clearml-data,
ReassuredTiger98 just making sure we are on the same page. clearml-data immutability is fixed, the user cannot change the content of the dataset (it is actually compressed and uploaded). If you want to change it, you create a new child version
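Creating a child version could look roughly like this with the Python SDK. It is a sketch under assumptions: the helper name and its arguments are hypothetical, and it needs the `clearml` package plus a configured server to actually run.

```python
def create_child_version(parent_id: str, name: str, project: str, new_files_dir: str) -> str:
    """Create a new dataset version on top of an existing (immutable) parent."""
    # Imported lazily; requires the clearml package and a configured server.
    from clearml import Dataset

    child = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_id],  # the parent itself is never modified
    )
    child.add_files(path=new_files_dir)  # stage the changed or added files
    child.upload()  # compress and upload the new content
    child.finalize()  # close the version so it becomes immutable too
    return child.id
```

The parent stays untouched, and the returned ID is the one to log or pass around from then on.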
Omg that's a lot of submodules!
It has nothing to do with what the task sees. If you are inside a git repo, you will have to clone it on the remote machine. Let me check in the code, maybe there is a workaround
Yes, I think the API is probably the easiest:
from clearml.backend_api.session.client import APIClient
client = APIClient()
project_list = client.projects.get_all()
print(project_list)
Hi HandsomeCrow5 .
Remember the debug images are events with links to the actual images, so you first have to get the events and then you can download the images with https://allegro.ai/docs/examples/examples_storagehelper/#storagemanager (which by definition has the credentials, because it was able to upload them 🙂)
To get the events:
from trains.backend_api.session.client import APIClient
client = APIClient()
client.events.debug_images(task='aabbcc')
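The download step could then be sketched like this. Two assumptions here: each debug-image event is treated as a dict with a `url` field, and `StorageManager` is imported from `clearml` (the renamed `trains` package). How to pull the event list out of the API response is left out, since its shape is not shown in this thread.

```python
from typing import Dict, Iterable, List


def image_urls(events: Iterable[Dict]) -> List[str]:
    """Collect the non-empty 'url' fields from debug-image events (assumed shape)."""
    return [event["url"] for event in events if event.get("url")]


def download_images(urls: Iterable[str]) -> List[str]:
    """Download every URL and return the local file paths."""
    # Imported lazily; requires the clearml package and storage credentials.
    from clearml import StorageManager

    return [StorageManager.get_local_copy(remote_url=url) for url in urls]


# Hypothetical events, only to show the assumed shape.
sample_events = [{"url": "s3://bucket/img_0.png"}, {"metric": "debug"}, {"url": ""}]
print(image_urls(sample_events))
```

Pass the output of `image_urls` to `download_images` once you have real events.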
Hi SubstantialElk6
you can do:
from clearml.config import config_obj
config_obj.get('sdk')
You will get the entire configuration tree of the SDK section (if you need sub sections, you can access them with '.' notation, e.g. sdk.storage)
Hi IrritableOwl63
Yes this seems like a docker setup issue 🙂
either run the agent with sudo (not really recommended 😉 ) or add to sudoers:
https://docs.docker.com/engine/install/linux-postinstall/
Hmm I just tested on the community version and it seems to work there, Let me check with frontend guys. Can you verify it works for you on https://app.community.clear.ml/ ?