That still seems to crash, SuccessfulKoala55 🤔
EDIT: No, wait, the environment still needs updating. One moment still...
I'll have a look at 1.1.6 then!
And that sounds great - environment variables should be supported everywhere in the config, or else the docs should probably mention where they are and are not supported 🙂
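For instance, something like this in clearml.conf is what I'd expect to work everywhere (just a sketch; I haven't verified which keys actually resolve the substitution):
```
# clearml.conf is HOCON, which supports ${VAR} environment substitution;
# where exactly this resolves is what I'd like the docs to spell out
api {
    web_server: ${CLEARML_WEB_HOST}
    credentials {
        access_key: ${CLEARML_API_ACCESS_KEY}
        secret_key: ${CLEARML_API_SECRET_KEY}
    }
}
```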
I'll be happy to test it out if there's any commit available?
The thing I don't understand is how come this DOES work on our Linux setups 🤔
Yes -- that's what I meant by "The title is specified in the plot". I make the plots manually - title, axes labels, ticks, etc. In that sense, the figure is entirely configured. ClearML just saves it as "untitled 00/plot image".
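As a workaround sketch (assuming the automatic matplotlib capture is what drops the title; the project/task names are placeholders), reporting the figure explicitly lets me set the title and series myself:
```python
import matplotlib.pyplot as plt
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="manual plot")  # placeholder names

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
ax.set_title("My fully configured plot")
ax.set_xlabel("x")
ax.set_ylabel("y")

# report explicitly instead of relying on the automatic capture,
# so it is not stored as "untitled 00/plot image"
Logger.current_logger().report_matplotlib_figure(
    title="My fully configured plot",
    series="plot",
    figure=fig,
    iteration=0,
)
```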
Or to be clear: the environment installed by the autoscaler under /clearml_agent_venv has poetry installed, and it uses that to set up the environment for the executed task, e.g. in root/.clearml/venvs-builds/3.10/task_repository/.../.venv, but the latter does not have poetry installed, and so it crashes?
Thanks SuccessfulKoala55 , I made https://github.com/allegroai/clearml-agent/issues/126 as a suggestion.
Do you have any thoughts on how to expose these... manually?
It does so already for environment variables that are prefixed with CLEARML_, so it would be nice to have some control over that.
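e.g. something along these lines (a sketch; CLEARML_MY_FLAG is a made-up variable name):
```python
# on the enqueuing side: export CLEARML_MY_FLAG=1
# inside the remotely executed task, CLEARML_-prefixed variables are forwarded:
import os

my_flag = os.environ.get("CLEARML_MY_FLAG")  # made-up name, for illustration
print(my_flag)  # "1" if the pass-through works as described
```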
Can I query where the worker is running (IP)?
Generally the StorageManager seems a bit slow; even a simple StorageManager.list(...) on a local path seems to take a long time.
The error seems to come from this line: self._driver = _FileStorageDriver(str(path_driver_uri.root)) (line #353 in clearml/storage/helper.py). If the path_driver is a local path, then the _FileStorageDriver starts with base_path = '/', and then takes an extremely long time iterating over the entire file system (e.g. in _get_objects, line #1931 in helper.py).
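Conceptually it behaves something like this (a sketch of the problem, not the actual _FileStorageDriver code):
```python
import os

def list_objects(base_path, prefix):
    # with base_path = '/', this walks the whole file system
    # and only filters by prefix afterwards
    for root, _dirs, files in os.walk(base_path):
        for name in files:
            full = os.path.join(root, name)
            if full.startswith(prefix):
                yield full

# list(list_objects('/', '/data/clearml'))  # visits every file under '/'
```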
Here's a full description of the layout:
1. Remote agent + the entire ClearML docker suite running on host A. Host A also has a /data/clearml folder mounted to it and to its docker containers (I've edited the docker-compose to add this mount point).
2. Connect to host A, use the StorageManager on the /data/clearml folder, and run into some early troubles (e.g. the long .list call).
3. Use the same connection to run a task with execute_remotely and download_folder, and see it crash :disapp... (rough sketch right after this list)
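Roughly what step 3 looks like (a sketch; the project, task, and queue names are placeholders):
```python
from clearml import Task, StorageManager

# placeholder project/task/queue names, not the real ones
task = Task.init(project_name="debug", task_name="download_folder repro")
task.execute_remotely(queue_name="default")

# everything below runs on the remote agent; this is the call that crashes
StorageManager.download_folder("/data/clearml", local_folder=".")
```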
Indeed with ~ the .root call ends with an empty string, so it has a slightly different flow.
This also appears in the error log:
```
StorageManager.download_folder(cache_dir.as_posix(), local_folder=".")
  File "/home/idan/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/storage/manager.py", line 278, in download_folder
    for path in helper.list(prefix=remote_url):
  File "/home/idan/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/storage/helper.py", line 596, in list
    res = self._driver.list_container_objects(self._container, ex_prefix=prefix)
  Fi...
```
Btw TimelyPenguin76 this should also be a good starting point:
First create the target directory and add some files:
```
sudo mkdir /data/clearml
sudo chmod 777 -R /data/clearml
touch /data/clearml/foo
touch /data/clearml/bar
touch /data/clearml/baz
```
Then list the files using the StorageManager. It shouldn't take more than a few milliseconds:
```
from clearml import StorageManager

%%timeit
StorageManager.list("/data/clearml")
-> 21.2 s ± 328 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Added the following line under volumes for apiserver, fileserver, and agent-services: - /data/clearml:/data/clearml
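i.e. in the docker-compose.yml, roughly like this (only the last volume line is the addition; the rest is abbreviated):
```yaml
services:
  apiserver:
    volumes:
      # ...existing mounts...
      - /data/clearml:/data/clearml
  # same line added under fileserver and agent-services
```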
It's also sufficient to see that StorageManager.list("/data/clear") takes a really long time to return no results.
Any leads TimelyPenguin76 ? I've also tried setting up a MinIO S3 bucket, but I'm not sure if the remote agent has copied the credentials and host 🤔
It does (root in a docker container); it shouldn't touch /run/systemd/generator/systemd-networkd.service anyway though.
FYI @<1523701087100473344:profile|SuccessfulKoala55> (or I might be doing something wrong), but it seems the Python migration code comes with carriage returns, so it fails on Linux by default (one has to tr -d '\r' to use it)
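i.e. something like this (the file name is a placeholder for the actual migration script):
```
tr -d '\r' < migrate.py > migrate_unix.py
```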
EDIT: And it also defaults to /opt/allegro/data rather than the recommended /opt/clearml/data which is suggested when installing the server 🤔
It was really easy with the attached code, really 👍
I would only maybe suggest adding to the documentation that if one uses the default recommended install location, the script can be run without any command-line arguments.
I had to momentarily look at the code to see the default paths match my own (though I could've also looked at the --help default values 😛)
Perfect now 👌 (also a nice cleanup of the default_new_data_root duplicate code :D)
Following up on that (I don't think the K8s helm chart for 1.7.0 is out yet SlimyDove85, is it?) - but what's the recommended way to back up the MongoDB before upgrading on K8s?
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up autoscaler and more 😅
Thanks AgitatedDove14, I'll give it a try. Perhaps additional documentation is needed for that extra_layout.
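For reference, this is roughly how I understand extra_layout is meant to be used (a sketch; assuming it is forwarded to the underlying Plotly layout, and the project/task names are placeholders):
```python
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="extra_layout demo")  # placeholder names

Logger.current_logger().report_histogram(
    title="histogram",
    series="values",
    values=[1, 2, 3, 4],
    iteration=0,
    # assumption: extra_layout is passed through to the Plotly layout dict
    extra_layout={"yaxis": {"type": "log"}},
)
```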
Sorry, not necessarily RBAC (although that is tempting 😉), but for now I was just wondering if an average joe user has access to see the list of "registered users"?