Haha, I've opened so many issues these past few days... Sure, np!
Same result 😞 This is frustrating, wtf happened 🤯
This is also specifically the services queue worker I'm trying to debug 🤔
Ah, the API server `/users.get_all`, I see!
Hi AgitatedDove14 !
Ah, thanks! I'll use the artifacts for linking.
We've forgone the "use current task" already because it indeed made things even more difficult (the task that was used is then automatically hidden by this automatic renaming of dataset tasks).
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Okay this was a deep dive into clearml-agent code 😁
Took a long time to figure out that a specific Python version had an old virtualenv (Python 3.6.9 and Python 3.8 had the latest virtualenv, but Python 3.7.5 had an old one).
Then the task requested Python 3.7, and that old virtualenv version was broken.
As a result: could the agent maybe also output the virtualenv version it uses when setting up the environment for the first time?
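Not agent-specific, but for reference, a generic way to see which virtualenv version each interpreter picks up (the interpreter names here are just examples):
```
# Generic check: print the virtualenv version each Python interpreter resolves to.
import subprocess

for python in ("python3.6", "python3.7", "python3.8"):
    try:
        version = subprocess.check_output(
            [python, "-m", "virtualenv", "--version"], text=True
        ).strip()
        print(f"{python}: {version}")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"{python}: {exc}")
```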
Why not give ClearML read-only access credentials to the repository?
Hm, I'm not sure I follow 🤔 How does the API server config relate to the file server?
SuccessfulKoala55 That string was autogenerated by pyhocon and matches their documentation too - https://github.com/lightbend/config/blob/master/HOCON.md#substitutions
The first example won't work (it will treat `${...}` as a string literal and won't replace it). The second does work, but as mentioned, these were not hand-typed; they were generated by pyhocon, so I don't think that's the issue 🤔
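For reference, a minimal sketch of the two substitution forms as pyhocon resolves them (the keys `base_dir`, `resolved`, `literal` are made up for illustration):
```
# Minimal illustration of HOCON substitutions, as resolved by pyhocon.
from pyhocon import ConfigFactory

conf = ConfigFactory.parse_string("""
base_dir = /opt/data
# unquoted ${...} is a substitution and gets resolved on parse
resolved = ${base_dir}/cache
# quoted "${...}" is kept as a literal string and is not replaced
literal = "${base_dir}/cache"
""")

print(conf["resolved"])  # -> /opt/data/cache
print(conf["literal"])   # -> ${base_dir}/cache
```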
Those are cool and very welcome additions (hopefully the additional info in the `Info` tab will be a link?) 😁
The main issue is the clutter that the forced renaming creates, as shown in the pictures I attached in the other thread.
Why does ClearML hide the dataset task from the main WebUI? Users should have some control over that. If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some hidden `.datasets` sub-project. Not...
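For context, this is roughly the usage in question (all names are placeholders); even with `dataset_project` set explicitly, the backing task ends up under the hidden `.datasets` sub-project:
```
# Placeholder names; illustrates creating a dataset under an explicit project.
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files("data/")
ds.upload()
ds.finalize()
# The backing task is still placed under the hidden "my_project/.datasets" sub-project.
```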
Will try later today TimelyPenguin76 and report back, thanks! Does this revert the behavior to the 1.3.x one?
I'd like to set up both with and without GPUs. I can use any region, preferably some EU one.
It seems that the agent uses the remote repository's lock file. We've removed and renamed the file locally (caught under local changes), but it still installs from the remote lock file 🤔
@<1539780258050347008:profile|CheerfulKoala77> you may also need to define subnet or security groups.
Personally I do not see the point in Docker over EC2 instances for CPU instances (virtualization on top of virtualization).
Finally, just to make sure, you only ever need one autoscaler. You can monitor multiple queues with multiple instance types with one autoscaler.
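Purely as an illustration of that last point (this is not the real autoscaler configuration schema, just the queue-to-resource mapping idea):
```
# Illustration only: a single autoscaler can watch several queues,
# each mapped to a different (hypothetical) instance type.
queue_to_resource = {
    "cpu_queue": {"instance_type": "m5.xlarge", "max_instances": 4},
    "gpu_queue": {"instance_type": "g4dn.xlarge", "max_instances": 2},
}

for queue, resource in queue_to_resource.items():
    print(f"{queue}: up to {resource['max_instances']} x {resource['instance_type']}")
```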
No it doesn't, the agent has its own clearml.conf file.
I'm not too familiar with clearml on docker, but I do remember there are config options to pass some environment variables to docker.
You can then set your environment variables in any way you'd like before the container starts
Heh, my bad, the term "user" is very much ingrained in our internal way of working. You can think of it as basically any technically-inclined person in your team or company.
Indeed the options in the WebUI are too limited for our use case, so we've developed "apps" that take a yaml configuration file and build a matching pipeline.
With that, our users do not need to code directly, and we can offer much more fine control over the pipeline.
As for the imports, what I meant is that I encounter...
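To give a rough idea of what such a yaml-driven "app" looks like (a simplified sketch; the YAML fields, file name, and task names are made up, not our actual schema):
```
# Simplified sketch: build a ClearML pipeline from a YAML description.
# The YAML layout and all names are illustrative only.
import yaml
from clearml import PipelineController

with open("pipeline.yaml") as f:
    spec = yaml.safe_load(f)

pipe = PipelineController(
    name=spec["name"],
    project=spec["project"],
    version=spec.get("version", "1.0.0"),
)

for step in spec["steps"]:
    pipe.add_step(
        name=step["name"],
        base_task_project=step["task_project"],
        base_task_name=step["task_name"],
        parents=step.get("parents", []),
        parameter_override=step.get("parameters", {}),
    )

pipe.start(queue=spec.get("queue", "services"))
```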
I can see the task in the UI, it is not archived, and that's pretty much the snippet, but in full I do e.g.
Thanks AgitatedDove14 , I'll give it a try. Perhaps additional documentation is needed for that extra_layout
AgitatedDove14 the issue was that we'd like the remote task to be able to spawn new tasks, which it cannot do if I use `Task.init` before `override_current_task_id(None)`.
When would this callback be called? I'm not sure I understand the usecase.
Since this is a single process, most of these are only needed once when our "initializer" task starts and loads.
TimelyPenguin76 here's the full log (took a moment to anonymize completely):
```
Using environment access key CLEARML_API_ACCESS_KEY=xxx
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.3.0, location: /tmp/.clearml_agent.zs4e7egs.cfg):
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.m...
```
I'm using some old agent I fear, since our infra person decided to use chart 3.3.0 😕
I'll try with the env var too. Do you personally recommend docker over the simple AMI + virtual environment?
More complete log does not add much information:
```
Cloning into '/root/.clearml/venvs-builds/3.10/task_repository/xxx/xxx'...
fatal: could not read Username for '': terminal prompts disabled
fatal: clone of '' into submodule path '/root/.clearml/venvs-builds/3.10/task_repository/...
```
Hurrah! Added `git config --system credential.helper 'store --file /root/.git-credentials'` to the `extra_vm_bash_script` and now it works
(writes the given git credentials to the store file, which is then used immediately for the recursive submodule clones)
That was a good idea, unfortunately did not help too much, but I think I may have a found a work around, thanks!
Basically when running remotely, the first argument to any configuration (whether object or string, or whatever) is ignored, right?
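For context, a minimal sketch of the behaviour in question (project/task names are placeholders):
```
# Placeholder names; shows how locally-passed configuration defaults behave.
from clearml import Task

task = Task.init(project_name="examples", task_name="config-demo")

# Local defaults: when the task is cloned and executed remotely by an agent,
# the values stored on the server (e.g. edited in the WebUI) override these.
params = {"learning_rate": 0.001, "batch_size": 32}
params = task.connect(params)

print(params["learning_rate"])  # remotely, this may differ from the local 0.001
```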