You can try set_base_docker:
t = Task.init(project_name="examples", task_name="set docker parames")
t.set_base_docker(
    docker_cmd="nvidia/cuda:11.1",
    docker_arguments="-e ENV=1",
    docker_setup_bash_script=['apt update', 'apt-get install -y gcc']
)
But, if you like, you can connect a remote interpreter and debug with PyCharm, locally, without clearml-agent
Hi ItchyHippopotamus18, can you try with torch.save(model_jit, os.path.join(checkpoint_path, f'{epoch_num}_{round(acc_full, 4)}.pt'))?
Hi PanickyMoth78, thanks for the logs. I think I know the issue; I'm trying to reproduce it on my side and will keep you updated.
Thanks ImpressionableAlligator9 and MagnificentWorm7 for reporting this, I will double check it
Hi ThickDove42 ,
The SETUP SHELL SCRIPT is the bash script to run at the beginning of the docker before launching the Task itself.
You can just try editing it, for example:
apt update
apt-get install -y gcc
Hi MammothGoat53,
which clearml version are you using? I ran the same code and everything worked as expected (I changed the project_name and the task_name to be 4 characters long).
agree
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
Another process is using the lock, can you specify the ami (and region) so I can try to reproduce it?
Hi PompousHawk82. Are you running several instances of the same code in parallel on the same task?
Hi VictoriousPenguin97,
sdk.storage.direct_access is part of the extended support in the paid version.
But I think it's not required, since ClearML will simply try to access the path directly as it is, and you don't need to configure it.
Hi MysteriousBee56 ,
The https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py is an example of how you can add services to manage your experiments.
You can change the criteria for fetching the tasks in this script (in the https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py#L72 call) to something like a specific tag you can add to the experiments (a delete tag? You can add a tag to multiple tasks) and it should...
and using https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L140 for running scripts at docker startup
Hi ArrogantBlackbird16 ,
How do you generate and run your tasks? Do you use the same flow as in the https://clear.ml/docs/latest/docs/fundamentals/agents_and_queues#agent-and-queue-workflow ? Some other automation?
Hi LazyTurkey38 ,
Yes, it will create a virtual env for the task
You can add a limit to the query page size:
task_filter = {"page_size": <your-limit>, "page": 0}
what do you think?
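A minimal sketch of how the page-size limit above might be passed to a query, assuming it goes through the task_filter argument of Task.get_tasks. The helper names build_task_filter and fetch_tasks_page are hypothetical, and the clearml import is done lazily so the filter helper can be tried without a configured server:

```python
def build_task_filter(page_size, page=0):
    """Build a task_filter dict that limits how many tasks one query page returns."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive integer")
    if page < 0:
        raise ValueError("page must be zero or greater")
    return {"page_size": page_size, "page": page}


def fetch_tasks_page(project_name, page_size, page=0):
    """Fetch one page of tasks from a project (hypothetical helper).

    Assumes clearml is installed and configured; imported lazily on purpose.
    """
    from clearml import Task

    return Task.get_tasks(
        project_name=project_name,
        task_filter=build_task_filter(page_size, page),
    )
```

For example, fetch_tasks_page("examples", page_size=50) would request at most 50 tasks from the first page.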
Hi ImmensePenguin78,
You can get all the console outputs using task.get_reported_console_output(). Can this do the trick?
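A rough sketch of the suggestion above, assuming get_reported_console_output returns a list of strings, one per console report. The helper names and the number_of_reports value are illustrative only:

```python
def join_console_reports(reports):
    """Join console report chunks into one string, one chunk per line."""
    return "\n".join(chunk.rstrip("\n") for chunk in reports)


def fetch_console_text(task_id, number_of_reports=10):
    """Fetch recent console reports of a task as one string (hypothetical helper).

    Assumes clearml is installed and configured; imported lazily on purpose.
    """
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    reports = task.get_reported_console_output(number_of_reports=number_of_reports)
    return join_console_reports(reports)
```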
Hi GleamingGiraffe20 ,
Without adding Task.init, I'm getting an OSError: [Errno 9] Bad file descriptor error. Do you get those too?
Do you run your script from the CLI or from an IDE (PyCharm maybe?)?
FierceFly22 like Elior wrote, you can use Task.execute_remotely, you just need to supply the queue name 🙂
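As a sketch of the Task.execute_remotely suggestion, assuming the queue already exists and an agent is listening on it. The function name run_remotely and the project and task names are placeholders:

```python
def run_remotely(queue_name, project_name="examples", task_name="remote run"):
    """Create a task locally and enqueue it for remote execution (hypothetical helper)."""
    if not queue_name:
        raise ValueError("queue_name is required")

    # Imported lazily; assumes clearml is installed and configured.
    from clearml import Task

    task = Task.init(project_name=project_name, task_name=task_name)
    # With exit_process=True the local process stops here and the agent
    # picks the task up from the queue.
    task.execute_remotely(queue_name=queue_name, exit_process=True)
    return task
```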
Hi ShinyLobster84, where do you usually install the XXXXX package from? Some artifactory?
Hi DefiantShark80 ,
task.report_scalar() # does not always work
What do you mean? Is report_scalar not sending the info, or is it raising an error?
Hi FloppyDeer99 ,
It depends on your setup:
If you have on-prem machines, you can start more than one clearml-agent on the machine with the resources and assign, for example, each GPU on the machine to a https://clear.ml/docs/latest/docs/clearml_agent#docker-mode .
You can do the same for cloud machines, and if you are using AWS you can run the https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler/ as a service.
K8S: there is a great example for k8s glue https://github.com/...
Sure, with clearml and clearml-agent you get autoscaling for your machines (with monitoring) and automation for your tasks that will handle everything for you (docker images, managing the credentials …).
There are many more parts in the system, so maybe you can share a use case so I can help you with it?
Hi UnevenDolphin73, do you get any preview when you run this example - https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py ?
PompousParrot44 since imshow displays data as an image, we currently log it into the debug samples section, as this is the section natively used to display and interact with images.
Do you think it should be under plots?
Hey PanickyMoth78 ,
Regarding clearml.utilities.locks.exceptions.LockException: [Errno 11] Resource temporarily unavailable
I read a bit of https://bugs.python.org/issue43743 , can you try the suggested workaround (just as a check)?
Adding
import shutil
shutil._USE_CP_SENDFILE = False
on top?
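For context, a small self-contained sketch of what that workaround does: it flips a private flag in the standard library so that shutil file copies skip the os.sendfile fast path. The flag is undocumented and may change between Python versions, and the demo function is illustrative only:

```python
import os
import shutil
import tempfile

# Private, undocumented flag: disable the os.sendfile() fast path for copies.
# Set it at the very top of the script, before any copy happens.
shutil._USE_CP_SENDFILE = False


def demo_copy():
    """Copy a small file with the fast path disabled and return the copied text."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.txt")
        dst = os.path.join(tmp, "dst.txt")
        with open(src, "w") as f:
            f.write("hello")
        shutil.copyfile(src, dst)
        with open(dst) as f:
            return f.read()
```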
Hi LazyTurkey38 ,
Basically, when the agent clones the repo and switches into the dev, I want it to run a script that is in the repo that:
Installs some pip dependencies (I kind of want these regardless of any extra deps specific to the task).
What about adding all your packages to the task? Could this help? You can do it with Task.add_requirements(package_name, version)
Generate some py files/symlinks. This is needed when you use custom C extensions / ANTLR ...
So just after the clone, before creating the env?
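The Task.add_requirements suggestion above could look roughly like the sketch below. It assumes add_requirements is called before Task.init so the packages are recorded with the task. The "name==version" spec format and the helper names are illustrative choices, not part of ClearML:

```python
def parse_requirement(spec):
    """Split a 'name==version' string into (name, version); version may be None."""
    name, sep, version = spec.partition("==")
    return name.strip(), (version.strip() if sep else None)


def init_task_with_requirements(specs, project_name="examples", task_name="with requirements"):
    """Register extra packages, then create the task (hypothetical helper).

    Assumes clearml is installed and configured; imported lazily on purpose.
    """
    from clearml import Task

    for spec in specs:
        name, version = parse_requirement(spec)
        # Called before Task.init so the requirement is stored with the task.
        Task.add_requirements(name, version)
    return Task.init(project_name=project_name, task_name=task_name)
```

This covers only the pip dependencies; generating files after the clone is a separate question.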
Hi FloppyDeer99 ,
In other words, docs introduce that ClearML Open Source supports orchestration, how can I found the relating codes?
You can find many examples https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps/ , if you have a specific use case you want to check, please share and I can send an example of it.
And what the role of clearml-agent in orchestration, a combination of kube-scheduler and kubelet?
ClearML agent is an ML-Ops tool for users to r...