
I've updated my feature request to describe that as well. A textual description is not necessarily a preview. For now I'll use the debug samples.
These kinds of things definitely show how ClearML was originally designed only for neural networks tbh, where images are almost always only part of the dataset. Same goes for the consistent use of iteration everywhere.
Actually TimelyPenguin76 I get only the following as a "preview" -- I thought the preview for an image would be... the image itself..?
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
Hey CostlyOstrich36, thanks for the reply!
I'm familiar with the above repo, we have the ClearML Server and such deployed on K8s.
What's lacking is documentation regarding the clearml-agent helm chart. What exactly does it offer, etc.?
We're interested in e.g. using karpenter to scale our deployments on demand, effectively replacing the AWS autoscaler.
Just because it's handy to compare differences and see how the data changed between iterations, but I guess we'll work with that.
We'll probably do something like:
When creating a new dataset with a parent (or parents), look at the immediate parents for identically-named files. If those exist, load them with a matching framework (pyarrow, pandas, etc.) and log the differences to the new dataset.
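Roughly along these lines (an untested sketch; the dataset IDs and the CSV-only filter are placeholders/assumptions on my side):

from pathlib import Path

import pandas as pd
from clearml import Dataset

# Placeholder IDs - in practice the parent ID comes from however we track lineage
new_ds = Dataset.get(dataset_id="NEW_DATASET_ID")
parent_ds = Dataset.get(dataset_id="PARENT_DATASET_ID")

new_root = Path(new_ds.get_local_copy())
parent_root = Path(parent_ds.get_local_copy())

# Compare identically-named CSV files between the new version and its parent
for rel_path in set(new_ds.list_files()) & set(parent_ds.list_files()):
    if not rel_path.endswith(".csv"):
        continue
    new_df = pd.read_csv(new_root / rel_path)
    old_df = pd.read_csv(parent_root / rel_path)
    print(f"{rel_path}: {len(old_df)} -> {len(new_df)} rows ({len(new_df) - len(old_df):+d})")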
Let me know if you do; would be nice to have control over that.
Opened this - https://github.com/allegroai/clearml/issues/530 - let me know if it's not clear enough, FrothyDog40!
I'm using some old agent I fear, since our infra person decided to use chart 3.3.0.
I'll try with the env var too. Do you personally recommend docker over the simple AMI + virtual environment?
A more complete log does not add much information:
Cloning into '/root/.clearml/venvs-builds/3.10/task_repository/xxx/xxx'...
fatal: could not read Username for '...': terminal prompts disabled
fatal: clone of '...' into submodule path '/root/.clearml/venvs-builds/3.10/task_repository/...
Nope, no .netrc defined anywhere, really (+ I've abandoned the use of docker for the autoscaler as it complicates things, at least for now)
TimelyPenguin76 that would have been nice but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifact. If I add e.g. foo/bar to my artifact name, it will be uploaded as foo/bar.
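To illustrate what I mean (a minimal sketch; the project/task names and the artifact contents are just placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-grouping")  # placeholder names

# The artifact name contains a slash; I'd expect "foo" to act as a group
# with "bar" inside it, but it currently just shows up verbatim as "foo/bar".
task.upload_artifact(name="foo/bar", artifact_object={"answer": 42})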
Here's how it failed for us:
poetry stores git-related data in poetry.lock, so when you pip list, you get an internal package we have with its version but no git reference, i.e. internal_module==1.2.3 instead of internal_module @ git+https://....@commit.
Then pip actually fails (our internal module is not on PyPI), but poetry succeeds.
The SDK is fine as it is - I'm more looking at the WebUI at this point
I wouldn't put it past ClearML automation (a lot of stuff depends on certain suffixes), but I don't think that's the case here, hmm
A different AMI image / installing older Python versions that don't enforce this...
For future reference though, the environment variable should be PIP_USE_PEP517=false
Sure! It's a bit intricate as it accommodates many of our different plotting functionalities, but this consists of the important bits (I realize we have some bad naming here, but fig[0] is actually a Figure object, and fig[1] is an Axes object):
import matplotlib.pyplot as plt
import seaborn as sns

plt.switch_backend('agg')  # render off-screen, no display needed
sns.set_theme(...)
fig = plt.subplots(...)  # fig is a (Figure, Axes) tuple, hence fig[0]/fig[1]
sns.histplot(data, ax=fig[1], ...)
fig[1].set_xlim(...)
fig[1].set_ylim(...)
fig[1].legend(loc='best')
fig[1].set_xlabel(xlabel)
fig[1].set_ylabel(ylabel)
fig[1].set_...
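And if we then want that figure reported to ClearML explicitly, something like this (a rough sketch, assuming a live task; the title/series names are placeholders):

from clearml import Task

task = Task.current_task()
task.get_logger().report_matplotlib_figure(
    title="histogram",   # placeholder
    series="train",      # placeholder
    iteration=0,
    figure=fig[0],       # the Figure object from above
)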
Because setting env vars and ensuring they exist on the remote machine during execution etc. is more complicated.
There are always ways around it, I was just wondering what the expected flow is.
AgitatedDove14 Basically the fact that this happens without user control is very frustrating - https://github.com/allegroai/clearml/blob/447714eaa4ac09b4d44a41bfa31da3b1a23c52fe/clearml/datasets/dataset.py#L191
We just do task.close() and then start a new Task.init() manually, so our "pipelines" are self-controlled
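I.e. something like this (a simplified sketch of our flow; project/task names are placeholders):

from clearml import Task

# Stage 1: run as its own task, then close it explicitly
task = Task.init(project_name="my_project", task_name="stage_1")
# ... do the work, report scalars/artifacts ...
task.close()

# Stage 2: start a fresh task in the same process
task = Task.init(project_name="my_project", task_name="stage_2")
# ... next stage ...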
Hey FrothyDog40! Thanks for clarifying - guess we'll have to wait for that as a feature.
Should I create a new issue or just add to this one? https://github.com/allegroai/clearml/issues/529
IIRC, get_local_copy() downloads a local copy and returns the path to the downloaded file. So you might be interested in e.g.
local_csv = pd.read_csv(a_task.artifacts['train_data'].get_local_copy())
With the models, you're looking for get_weights(). It acts the same as get_local_copy(), so it returns a path.
EDIT: I think also get_local_copy() for a model should work
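Putting it together, roughly (an untested sketch; the project/task/artifact names are placeholders):

import pandas as pd
from clearml import Task

a_task = Task.get_task(project_name="my_project", task_name="my_task")  # placeholders

# Artifacts: get_local_copy() downloads the file and returns its local path
local_csv = pd.read_csv(a_task.artifacts["train_data"].get_local_copy())

# Models: get_weights() (or get_local_copy()) also returns a downloaded local path
model_path = a_task.models["output"][-1].get_weights()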
Hm, that seems less than ideal. I was hoping I could pass some CSV locations. I'll try and find a workaround for that. Thanks!
So the ..data referenced in the example above is part of the git repository?
What about setting the working_directory to the user working directory using Task.init or Task.create?
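Something along these lines is what I had in mind (a sketch; whether Task.create exposes working_directory exactly like this is an assumption on my side, and the repo/script values are placeholders):

from clearml import Task

task = Task.create(
    project_name="my_project",               # placeholder
    task_name="my_task",                     # placeholder
    repo="https://github.com/org/repo.git",  # placeholder repo
    script="src/train.py",                   # placeholder entry point
    working_directory=".",                   # directory the script should run from
)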
Right, so where can one find documentation about it?
The repo just has the variables without much explanation.
But... which queue does it listen to, which type of instances will it use, etc.?
We're using karpenter (more magic keywords for me), so my understanding is that it will manage the scaling part.
Does it make sense to you to run several such glue instances, to manage multiple resource requirements?
Anything else you'd recommend paying attention to when setting up the clearml-agent helm chart?
Either one would be nice to have. I kinda like the instant search option, but could live with an ENTER to search.
I opened this meanwhile - https://github.com/allegroai/clearml-server/issues/138
Generally, it would also be good if the pop-up presented some hints about what went wrong with fetching the experiments. Here, I know the pattern is incomplete and invalid. A less advanced user might not understand what's up.