I'll have a look at 1.1.6 then!
And that sounds great - environment variables should be supported everywhere in the config, or else the docs should probably mention where they are and are not supported 🙂
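For illustration, a sketch of the kind of thing I mean - HOCON-style substitution in clearml.conf (assuming `${VAR}` environment lookups are honored in this spot; `MY_S3_KEY` is a made-up name):
```
# clearml.conf (sketch; assumes environment substitution works here)
sdk {
    aws {
        s3 {
            key: ${MY_S3_KEY}
        }
    }
}
```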
I'll be happy to test it out if there's any commit available?
The thing I don't understand is how come this DOES work on our Linux setups 🤔
Yes -- that's what I meant by "The title is specified in the plot". I make the plots manually - title, axes labels, ticks, etc. In that sense, the figure is entirely configured. ClearML just saves it as "untitled 00/plot image".
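For context, a minimal sketch of the kind of plot I mean, plus the explicit-reporting workaround (assuming `Logger.report_matplotlib_figure`; project/task names are placeholders):
```
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="demo", task_name="plots")  # placeholder names

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_title("Fully configured title")  # the title ClearML drops
ax.set_xlabel("x")
ax.set_ylabel("y")

# Automatic capture stores this as "untitled 00/plot image"; reporting
# explicitly at least lets one pick the name (a workaround, not a fix):
task.get_logger().report_matplotlib_figure(
    title="Fully configured title", series="plot", figure=fig, iteration=0
)
```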
Or to be clear: the environment installed by the autoscaler under `/clearml_agent_venv` has poetry installed, and it uses that to set up the environment for the executed task, e.g. in `root/.clearml/venvs-builds/3.10/task_repository/.../.venv`, but the latter does not have poetry installed, and so it crashes?
Thanks SuccessfulKoala55 , I made https://github.com/allegroai/clearml-agent/issues/126 as a suggestion.
Do you have any thoughts on how to expose these... manually? It does so already for environment variables that are prefixed with `CLEARML_`, so it would be nice to have some control over that.
I can only say I've found ClearML to be very helpful, even given the documentation issue.
I think they've been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info 🙂
Can I query where the worker is running (IP)?
Generally the StorageManager seems a bit slow; even a simple `StorageManager.list(...)` on a local path seems to take a long time.
The error seems to come from this line: `self._driver = _FileStorageDriver(str(path_driver_uri.root))` (line #353 in `clearml/storage/helper.py`), where if the `path_driver` is a local path, then the `_FileStorageDriver` starts with `base_path = '/'`, and then takes an extremely long time iterating over the entire file system (e.g. in `_get_objects`, line #1931 in `helper.py`).
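To make the cost concrete, a rough sketch (my own illustration, using `os.walk` to mimic the traversal) of why a `base_path` of `'/'` is so expensive:
```
import os
import time

def walk_and_count(base_path):
    # Mimic the driver's iteration: visit every file under base_path.
    start = time.time()
    n = sum(len(files) for _, _, files in os.walk(base_path))
    print(f"{base_path}: {n} files in {time.time() - start:.2f}s")

walk_and_count("/data/clearml")  # fast: only the target folder
# walk_and_count("/")            # slow: effectively the whole file system
```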
Here's a full description of the layout:
1. Remote agent + the entire ClearML docker suite running on host A. Host A also has a `/data/clearml` folder mounted to it and to its docker containers (I've edited the `docker-compose` to add this mount point).
2. Connect to host A and use the StorageManager on the `/data/clearml` folder, hitting some early troubles (e.g. the long `.list` call).
3. Use the same connection to run a task with `execute_remotely` and `download_folder` (see the sketch below), and see it crash :disapp...
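For reference, a minimal sketch of step 3 (queue/project/task names are placeholders):
```
from clearml import StorageManager, Task

task = Task.init(project_name="demo", task_name="repro")  # placeholder names
task.execute_remotely(queue_name="default")  # hands the script to the remote agent

# Running on the agent, this is the call that crashed:
StorageManager.download_folder("/data/clearml", local_folder=".")
```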
Indeed with `~` the `.root` call ends with an empty string, so it has a bit of a different flow.
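A quick illustration of that distinction, using `pathlib` as a stand-in for the actual URI object:
```
from pathlib import PurePosixPath

print(PurePosixPath("/data/clearml").root)   # '/' -> driver walks from the filesystem root
print(PurePosixPath("~/data/clearml").root)  # ''  -> takes the other code path
```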
This also appears in the error log:
```
  StorageManager.download_folder(cache_dir.as_posix(), local_folder=".")
File "/home/idan/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/storage/manager.py", line 278, in download_folder
  for path in helper.list(prefix=remote_url):
File "/home/idan/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/storage/helper.py", line 596, in list
  res = self._driver.list_container_objects(self._container, ex_prefix=prefix)
Fi...
```
Btw TimelyPenguin76, this should also be a good starting point:
First create the target directory and add some files:
```
sudo mkdir /data/clearml
sudo chmod 777 -R /data/clearml
touch /data/clearml/foo
touch /data/clearml/bar
touch /data/clearml/baz
```
Then list the files using the StorageManager. It shouldn't take more than a few milliseconds.
```
from clearml import StorageManager

%%timeit
StorageManager.list("/data/clearml")
-> 21.2 s ± 328 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Added the following line under `volumes` for `apiserver`, `fileserver`, and `agent-services`:
```
- /data/clearml:/data/clearml
```
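Roughly, a sketch of where that line lands in `docker-compose.yml` (all other keys and existing volume entries omitted):
```
services:
  apiserver:
    volumes:
      - /data/clearml:/data/clearml
  fileserver:
    volumes:
      - /data/clearml:/data/clearml
  agent-services:
    volumes:
      - /data/clearml:/data/clearml
```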
It's also sufficient to see that `StorageManager.list("/data/clear")` takes a really long time to return no results.
Any leads TimelyPenguin76? I've also tried setting up a MinIO S3 bucket, but I'm not sure if the remote agent has copied the credentials and host 🤔
It does (root in a docker container); it shouldn't touch `/run/systemd/generator/systemd-networkd.service` anyway, though.
FYI @<1523701087100473344:profile|SuccessfulKoala55> (or I might be doing something wrong), but it seems the Python migration code comes with carriage returns, so it fails on Linux by default (one has to `tr -d '\r'` to use it, as sketched below).
EDIT: it also defaults to `/opt/allegro/data` rather than the recommended `/opt/clearml/data`, which is suggested when installing the server 🤔
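Back to the carriage-return issue - in case it's useful, the same fix done in Python (a sketch; `migrate.py` stands in for the actual script name):
```
# Strip Windows line endings so the migration script runs cleanly on Linux.
with open("migrate.py", "rb") as f:
    data = f.read()
with open("migrate.py", "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))
```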
It was really easy with the attached code, really 🙂
I would only suggest adding to the documentation that if one uses the default recommended install location, then the script can be run without any command-line arguments. I had to momentarily look at the code to see that the default paths match my own (though I could've also looked at the `--help` default values 🙂).
Perfect now 🙂 (also, nice cleanup of the `default_new_data_root` duplicate code :D)
Following up on that (I don't think the K8s Helm chart for 1.7.0 is out yet, SlimyDove85, is it?) - what's the recommended way to back up the MongoDB before upgrading on K8s?
No it doesn't; the agent has its own clearml.conf file.
I'm not too familiar with ClearML on docker, but I do remember there are config options to pass some environment variables to docker (a sketch follows below).
You can then set your environment variables in any way you'd like before the container starts.
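A minimal sketch of what I mean, assuming the `agent.extra_docker_arguments` setting in the agent's clearml.conf (extra flags forwarded to `docker run`; `MY_VAR` is a made-up name):
```
agent {
    # Forwarded verbatim to `docker run`; "-e" injects an environment
    # variable into the task container (MY_VAR is hypothetical).
    extra_docker_arguments: ["-e", "MY_VAR=my_value"]
}
```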