Hi StrongHorse8,
Yes, each ClearML agent can listen to a different queue and use a specific GPU. You can view all the use cases and examples in this link: https://clear.ml/docs/latest/docs/clearml_agent/#allocating-resources
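For example, something like this should start two agents, each bound to its own queue and GPU (the queue names and GPU indices here are just placeholders):
```
clearml-agent daemon --detached --queue single_gpu_queue --gpus 0
clearml-agent daemon --detached --queue dual_gpu_queue --gpus 1,2
```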
Hi GleamingGiraffe20 , still getting those errors?
Not the parameters, but maybe this can help: https://clear.ml/docs/latest/docs/clearml_data
Can you send me the logs with and without? (you can send the logs in DM if you prefer)
Hi DeliciousBluewhale87
So now you don’t have any failures, but a GPU usage issue? How about running the ClearML agent in docker mode? You can choose an NVIDIA docker image, and all the CUDA installation and configuration will be part of the image.
What do you think?
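For example, something in this spirit should run the agent in docker mode (the image name here is just an example, pick whatever CUDA image fits your setup):
```
clearml-agent daemon --queue default --docker nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
```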
👍 great, so if you have an image with the clearml agent installed, it should solve it 😀
Hi WickedBee96 ,
Are you running a standalone script or some code part of a git repository?
NonchalantDeer14 thanks for the logs, do you maybe have some toy example I can run to reproduce this issue on my side?
For the 'TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start' message - iteration reporting is automatically detected if you are using TensorBoard, matplotlib, or explicitly with trains.Logger.
Assuming there were no reports, the monitoring falls back to reporting every 30 seconds.
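For reference, a minimal sketch of explicit reporting (project/task names and values are placeholders) - reporting a scalar with an explicit iteration number is enough for the monitor to pick up the iterations:
```
from trains import Task

task = Task.init(project_name="examples", task_name="explicit iteration reporting")
logger = task.get_logger()
for iteration in range(100):
    # reporting with an explicit iteration lets the monitor detect iteration progress
    logger.report_scalar(title="loss", series="train", value=1.0 / (iteration + 1), iteration=iteration)
```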
Thanks for the examples, will try to reproduce it now.
Hi PanickyMoth78, thanks for the logs. I think I know the issue, I’m trying to reproduce it on my side and I’ll keep you updated about it.
can you build your own docker image with clearml-agent installed in it?
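Something like this minimal Dockerfile sketch should do (the base image is just an example, use whatever CUDA/python base you need):
```
FROM nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04
RUN apt-get update && apt-get install -y python3 python3-pip git && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --no-cache-dir clearml-agent
```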
Hi HealthyStarfish45
If you are running the task via docker, we don’t auto-detect the image and docker command, but you have more than one way to set those:
- You can set the docker manually, like you suggested.
- You can configure the docker image + commands in your ~/trains.conf (on the machine running the agent): https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L130 (see the sketch below)
- You can start the agent with the image you want to run with.
- You can change the base docker image...
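For the trains.conf option, a minimal sketch of the relevant section (the image name is just an example):
```
agent {
    default_docker {
        # default docker image to use when running in docker mode
        image: "nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04"
        # optional arguments to pass to the docker run command
        # arguments: ["--ipc=host"]
    }
}
```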
Try to clone the task (right click on the task and choose “clone”) and you will get a new task in draft mode that you can configure ( https://clear.ml/docs/latest/docs/getting_started/mlops/mlops_first_steps#clone-an-experiment )
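You can also do the same from code, roughly like this (the task ID and queue name are placeholders):
```
from clearml import Task

# clone an existing task and enqueue the draft copy for an agent to run
template = Task.get_task(task_id="<source task id>")
cloned = Task.clone(source_task=template, name="cloned experiment")
Task.enqueue(cloned, queue_name="default")
```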
I will check the AWS token. Just to verify, did you import the StorageManager after the os.environ calls?
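i.e. something in this spirit (the env var names here are just an example for AWS credentials):
```
import os

# set the credentials before importing StorageManager
os.environ["AWS_ACCESS_KEY_ID"] = "<your key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<your secret>"

from clearml import StorageManager  # imported only after the environment is set
```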
Do you inherit from SearchStrategy in your implementation (you can read about it here: https://allegro.ai/docs/automation_optimization_searchstrategy.html#automation-optimization-searchstrategy )? If not, can you share how you implemented it?
About the docstring, thanks 🙂 we will update it with the exceptions.
Hi MotionlessCoral18, can you check the configuration you added under your profile? Does the bucket entry contain the HOST (ENDPOINT) section?
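Something in this spirit (bucket name, endpoint and keys are placeholders):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"
                    host: "my-endpoint.example.com:9000"
                    key: "<access key>"
                    secret: "<secret key>"
                }
            ]
        }
    }
}
```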
When you are not using the StorageManager, you don’t get the OSError: [Errno 9] Bad file descriptor errors?
HelpfulHare30 are you running it from a repository?
How do you load the file? Can you find this file manually?
Hi LazyFish41 , You can specify the pip version in the agent’s configuration file: https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf#L57
The ClearML agent will install that pip version for you.
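e.g. in the agent section of the conf file (the version spec here is just an example):
```
agent {
    package_manager {
        # specify the pip version to use, e.g. pin below 20.2
        pip_version: "<20.2"
    }
}
```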
So something like https://github.com/allegroai/clearml/blob/master/examples/reporting/pandas_reporting.py#L28 but with a multi-index?
Was wondering if there are plans to add better support for it
Not currently. Can you open an issue at https://github.com/allegroai/clearml/issues/new so we do not forget to add it?
Hi LazyFish41 ,
You can use agent.docker_init_bash_script to execute any command at the startup of the docker container, so you can use it to install the python version you want.
You can set the python version to use when creating the virtual environment and launching the experiment with agent.python_binary (see the sketch below).
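A sketch of how these two could look in the conf file (python 3.8 here is just an example):
```
agent {
    # bash commands executed at the startup of any docker container
    docker_init_bash_script: [
        "apt-get update",
        "apt-get install -y python3.8 python3.8-distutils"
    ]
    # python binary used when creating the virtual environment
    python_binary: "/usr/bin/python3.8"
}
```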
Hi TenseOstrich47 ,
Try using AWS credentials with the region too: https://github.com/allegroai/clearml/blob/master/docs/clearml.conf#L88
credentials: [
    # specifies key/secret credentials to use when handling s3 urls (read or write)
    {
        bucket: "my-bucket-name"
        key: "my-access-key"
        secret: "my-secret-key"
        region: "my-region"
    },
]