Eureka! Hurrah! Added `git config --system credential.helper 'store --file /root/.git-credentials'`
to the extra_vm_bash_script
and now it works
(logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
Yes it would be 🙂
Visualization is always a difficult topic... I'm not sure about that, but a callback would be nice.
One idea that comes to mind (this is of course limited to DataFrames): think of the git diff view, where I imagine 3 independent sections:
Removed columns (+ truncated preview of removed values) (see below)
Added columns (+ truncated preview of added values)
The middle section is then a bit complicated, but I would see some kind of "shared columns" dataframe, where each ...
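The removed/added/shared split above can be sketched with plain Python. This is only a hypothetical illustration of the idea (the `diff_columns` helper is invented here, and frames are modeled as dicts of column name to values to keep it dependency-free):

```python
# Hypothetical sketch of a "git diff for DataFrames": split the comparison
# of two tables into removed / added / shared columns, with a truncated
# preview of the values for removed and added columns.

def diff_columns(old, new, preview=3):
    removed = {c: old[c][:preview] for c in old if c not in new}
    added = {c: new[c][:preview] for c in new if c not in old}
    shared = [c for c in old if c in new]
    return removed, added, shared

old = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]}
new = {"b": [5, 6, 7, 8], "c": [9, 10, 11, 12]}
removed, added, shared = diff_columns(old, new)
print(removed)  # {'a': [1, 2, 3]}
print(added)    # {'c': [9, 10, 11]}
print(shared)   # ['b']
```

For real DataFrames the "shared columns" section would additionally need a per-cell comparison, which is where it gets complicated.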
Yes, you're correct, I misread the exception.
Maybe it hasn't completed uploading? At least for Datasets one needs to explicitly wait IIRC
SuccessfulKoala55 That at least did not work, unless one has to specify wildcard patterns perhaps..?
Same result 🙂 This is frustrating, wtf happened 🤯
This is also specifically the services queue worker I'm trying to debug 🤔
That still seems to crash SuccessfulKoala55 🤔
EDIT: No, wait, the environment still needs updating. One moment still...
Oh, well, no, but for us that would be one possible solution (we didn't need to close the task before that update)
I'll have yet another look at both the latest agent RC and at the docker-compose, thanks!
There was no "default" services agent btw, just the queue, I had to launch an agent myself (not sure if it's relevant)
Any updates on this? We can't do anything with our K8s since this 404...
@<1523701070390366208:profile|CostlyOstrich36> I added None btw
I'll have a look at 1.1.6 then!
And that sounds great - environment variables should be supported everywhere in the config, or then the docs should probably mention where they are and are not supported 🙂
I'll be happy to test it out if there's any commit available?
The thing I don't understand is how come this DOES work on our linux setups 🤔
Yes -- that's what I meant by "The title is specified in the plot". I make the plots manually - title, axes labels, ticks, etc. In that sense, the figure is entirely configured. ClearML just saves it as "untitled 00/plot image".
Or to be clear: the environment installed by the autoscaler under `/clearml_agent_venv` has poetry installed, and it uses that to set up the environment for the executed task (e.g. in `root/.clearml/venvs-builds/3.10/task_repository/.../.venv`), but the latter does not have poetry installed, and so it crashes?
Thanks SuccessfulKoala55 , I made https://github.com/allegroai/clearml-agent/issues/126 as a suggestion.
Do you have any thoughts on how to expose these... manually?
It does so already for environment variables that are prefixed with CLEARML_
, so it would be nice to have some control over that.
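The prefix-based pickup described above amounts to a simple filter over the process environment. A minimal sketch, not the agent's actual code (the `collect_prefixed_env` helper and the `CLEARML_EXAMPLE_FLAG` variable are invented for illustration):

```python
import os

def collect_prefixed_env(prefix="CLEARML_"):
    """Sketch of prefix-based env var pickup: keep every environment
    variable whose name starts with the given prefix."""
    return {k: v for k, v in os.environ.items() if k.startswith(prefix)}

# Hypothetical variable, set here only so the example has something to find.
os.environ["CLEARML_EXAMPLE_FLAG"] = "1"
print(collect_prefixed_env())  # includes 'CLEARML_EXAMPLE_FLAG'
```

Having control over the prefix (or an explicit allow-list) would cover the "expose these manually" case.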
I can only say I've found ClearML to be very helpful, even given the documentation issue.
I think they've been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info 🙂
Can I query where the worker is running (IP)?
Generally the StorageManager seems a bit slow; even a simple `StorageManager.list(...)` on a local path seems to take a long time.
The error seems to come from this line:
```
self._driver = _FileStorageDriver(str(path_driver_uri.root))
```
(line #353 in clearml/storage/helper.py)
Where if the path_driver is a local path, then the `_FileStorageDriver` starts with a `base_path = '/'`, and then takes an extremely long time iterating over the entire file system (e.g. in `_get_objects`, line #1931 in helper.py)
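To illustrate why a driver rooted at `'/'` is so slow, here is a rough sketch of that kind of filesystem listing (the `list_objects` helper is hypothetical, not the actual `_FileStorageDriver` code): everything under `base_path` gets walked before the prefix filter applies, so `base_path='/'` means scanning the whole filesystem.

```python
import os

def list_objects(base_path, prefix=""):
    """Hypothetical sketch of a filesystem-driver listing: walk everything
    under base_path, then keep only paths matching the prefix. The walk
    itself is the expensive part when base_path is '/'."""
    wanted = os.path.join(base_path, prefix)
    matches = []
    for root, _dirs, files in os.walk(base_path):
        for name in files:
            full = os.path.join(root, name)
            if full.startswith(wanted):
                matches.append(full)
    return matches
```

Scoping `base_path` to the target folder instead of the filesystem root would avoid the full scan entirely.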
Here's a full description of the layout:
1. Remote agent + the entire ClearML docker suite running on host A. Host A also has a /data/clearml folder mounted to it and to its docker containers (I've edited the docker-compose to add this mount point)
2. Connect to host A, use StorageManager on the /data/clearml folder and hit some early troubles (e.g. the long `.list` call)
3. Use the same connection to run a task with `execute_remotely` and `download_folder` and see it crash :disapp...
Indeed, with `~` the `.root` call ends with an empty string, so it has a bit of a different flow.
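The same distinction can be reproduced with `pathlib` as an illustration (the actual code uses its own URI object, so this is only an analogy): an absolute path has root `'/'`, while a `~`-style path is relative until expanded, so its root is the empty string.

```python
from pathlib import PurePosixPath

# Absolute path: root is '/'.
print(PurePosixPath("/data/clearml").root)   # '/'

# Tilde path: '~' is only expanded by the shell or expanduser(),
# so as written the path is relative and root is ''.
print(PurePosixPath("~/data/clearml").root)  # ''
```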
This also appears in the error log:
```
StorageManager.download_folder(cache_dir.as_posix(), local_folder=".")
  File "/home/idan/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/storage/manager.py", line 278, in download_folder
    for path in helper.list(prefix=remote_url):
  File "/home/idan/.clearml/venvs-builds/3.7/lib/python3.7/site-packages/clearml/storage/helper.py", line 596, in list
    res = self._driver.list_container_objects(self._container, ex_prefix=prefix)
  Fi...
```
Btw TimelyPenguin76 this should also be a good starting point:
First create the target directory and add some files:
```
sudo mkdir /data/clearml
sudo chmod -R 777 /data/clearml
touch /data/clearml/foo
touch /data/clearml/bar
touch /data/clearml/baz
```
Then list the files using the StorageManager. It shouldn't take more than a few milliseconds.
```
from clearml import StorageManager

%%timeit
StorageManager.list("/data/clearml")
-> 21.2 s ± 328 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Added the following line under `volumes` for `apiserver`, `fileserver`, and `agent-services`:
```
- /data/clearml:/data/clearml
```
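For context, a hypothetical `docker-compose.yml` fragment showing where that line lands (service names are the ones mentioned above; the rest of each service definition is omitted):

```yaml
# Sketch: the same host folder mounted into each relevant service,
# so /data/clearml resolves identically inside the containers.
services:
  apiserver:
    volumes:
      - /data/clearml:/data/clearml
  fileserver:
    volumes:
      - /data/clearml:/data/clearml
  agent-services:
    volumes:
      - /data/clearml:/data/clearml
```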
It's also sufficient to see that `StorageManager.list("/data/clear")` takes a really long time to return no results.