Is it currently broken? 🤔
I think now there's the following:
- Resource type: Queue (name) defines resource + max instances

And I'm looking for:
- Resource type: "pool" of resources (type + max instances)
- A pool can be shared among queues
Heh, my bad, the term "user" is very much ingrained in our internal way of working. You can think of it as basically any technically-inclined person in your team or company.
Indeed the options in the WebUI are too limited for our use case, so we've developed "apps" that take a yaml configuration file and build a matching pipeline.
With that, our users do not need to code directly, and we can offer much finer control over the pipeline.
As for the imports, what I meant is that I encounter...
Of course now it's not there anymore 🙂 If/when it happens again I'll ping you here 🙂
Maybe it's better to approach this the other way: if one uses `Task.force_requirements_env_freeze()`, then the locally updated packages aren't reflected in poetry 🤔
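For reference, a minimal sketch of where that freeze call sits (assuming the standard clearml API; the project/task names are placeholders):

```python
from clearml import Task

# Freeze requirements from the local environment instead of resolving
# them at remote-execution time; must be called before Task.init
Task.force_requirements_env_freeze(force=True)

task = Task.init(project_name="examples", task_name="frozen-env-run")
```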
That's fine for the current use-case I believe.
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
Another example - trying to validate dataset interactions ends with
```python
else:
    self._created_task = True
    dataset_project, parent_project = self._build_hidden_project_name(dataset_project, dataset_name)
    task = Task.create(
        project_name=dataset_project, task_name=dataset_name, task_type=Task.TaskTypes.data_processing)
    if bool(Session.check_min_api_server_version(Dataset.__min_api_version)):
        get_or_create_proje...
```
I'm guessing that's not on PyPI yet?
Hurrah! Added `git config --system credential.helper 'store --file /root/.git-credentials'` to the extra_vm_bash_script and now it works
(logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
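For anyone following along, a rough sketch of what that extra_vm_bash_script addition might look like (host, user, and token below are placeholders):

```bash
# Write the credentials in the standard "https://<user>:<token>@<host>"
# store format, then point git at the store file system-wide
echo "https://my-git-user:my-git-token@github.com" > /root/.git-credentials
git config --system credential.helper 'store --file /root/.git-credentials'
```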
That's probably in the newer ClearML server pages then, I'll have to wait still 🙂
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up autoscaler and more 🙂
Yes 🙂 I want ClearML to load and parse the config before that. But now I'm not even sure those config settings are exposed as environment variables?
I will! (once our infra guy comes back from holiday and updates the install; for some reason they set up server 1.1.1???)
Meanwhile wondering where I got a random worker from
Can I query where the worker is running (IP)?
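In case it helps, a sketch of one way to query that via the ClearML APIClient (the ip field is assumed from the REST workers.get_all endpoint):

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# workers.get_all returns the registered workers; each entry carries
# metadata such as its id and (if reported) the machine's IP
for worker in client.workers.get_all():
    print(worker.id, getattr(worker, "ip", None))
```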
Thanks AgitatedDove14, I'll first have to prove viability with the free version :)
Indeed. I'll open an issue, sure!
One more UI question TimelyPenguin76, if I may -- it seems one cannot simply report single integers. The `report_scalar` feature creates a plot of a single data point (or single iteration).
For example if I want to report a scalar "final MAE" for easier comparison, it's kinda impossible 🙂
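For what it's worth, newer clearml releases expose a dedicated call for standalone values (a sketch, assuming clearml >= 1.9; names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="report-single-value")
# Reports one standalone value instead of a one-point scalar plot
task.get_logger().report_single_value(name="final MAE", value=12.34)
```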
We're not using the docker setup though. The CLI run by the autoscaler is `python -m clearml_agent --config-file /root/clearml.conf daemon --queue aws_small`, so no docker
Yes. Though again, just highlighting the naming of foo-mod is arbitrary. The actual module simply has a folder structured with an implicit namespace:
```
foo/
    mod/
        __init__.py
        # stuff
```
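(To make the point concrete: a minimal check, assuming Python 3.3+ semantics.)

```python
# foo/ has no __init__.py, so Python >= 3.3 resolves it as an
# implicit namespace package (PEP 420) and the import still works:
import foo.mod
```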
FWIW, for the time being I'm just setting the packages to all the packages the pipeline task sees, with:
```python
packages = get_installed_pkgs_detail()
packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
packages = task.data.script.require...
```
Heh, well, John wrote that in the first reply in this thread 🙂
And the main `Task.init` documentation page (nowhere near the code) says the following -
It also happens when `use_current_task=False` though. So the current best approach would be to not combine the task and the dataset?
Basically when there are occasionally extreme values (i.e. most values fall in the [0, 50] range, and one value suddenly lands in the 50e+12 range), the plotting library (matplotlib or ClearML, unsure) hangs for a really long time
The S3 bucket credentials are defined on the agent, as the bucket is also running locally on the same machine - but I would love for the code to download and apply the file automatically!
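As a stopgap, a sketch of pulling the file down in code (the bucket path is a placeholder, and the credentials still need to resolve wherever this runs):

```python
from clearml import StorageManager

# Downloads the remote object into the local ClearML cache and
# returns the local path, which the code can then load and apply
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/config/overrides.yaml")
```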
Or if it wasn't clear, that chunk of code is from clearml's dataset.py
I've updated my feature request to describe that as well. A textual description is not necessarily a preview 🙂 For now I'll use the debug samples.
These kinds of things definitely show how ClearML was designed originally only for neural networks tbh, where images are almost always only part of the dataset. Same goes for the consistent use of iteration everywhere 🙂
The network is configured correctly 🙂 But the newly spun up instances need to be set to the same VPC/Subnet somehow
The overall flow I currently have is e.g.:
1. Start an internal task (not a ClearML Task; MLOps not initialized yet)
2. Call some pre_init function with args so I can upload the environment file via StorageManager to S3
3. Call some start_run function with the configuration dictionary loaded, so I can upload the relevant CSV files and configuration file
4. Finally initialize the MLOps (ClearML), start a task, execute remotely
I can play around with 3/4 (so e.g. upload CSVs and configuratio...
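For concreteness, a rough sketch of that flow (pre_init, start_run, the bucket URL and all names are hypothetical placeholders, not a real API):

```python
from clearml import StorageManager, Task

BUCKET = "s3://my-bucket"  # placeholder

def pre_init(env_file: str) -> None:
    # Step 2: upload the environment file before ClearML is initialized
    StorageManager.upload_file(local_file=env_file, remote_url=f"{BUCKET}/envs/{env_file}")

def start_run(config: dict) -> None:
    # Step 3: upload the relevant CSV files and the configuration file
    for csv_path in config.get("csv_files", []):
        StorageManager.upload_file(local_file=csv_path, remote_url=f"{BUCKET}/runs/{csv_path}")

# Step 4: only now initialize ClearML, then hand off to the agent
task = Task.init(project_name="my-project", task_name="my-run")
task.execute_remotely(queue_name="aws_small")
```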