Hi GrievingTurkey78
First, I would look at the clearml-data CLI as a baseline for implementing such a tool:
Docs:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Implementation:
https://github.com/allegroai/clearml/blob/master/clearml/cli/data/main.py
Regarding your questions:
(1) No, a new dataset version only stores the diff from its parent (if files are removed, it stores metadata recording that the file was removed).
(2) Yes, any get operation will downl...
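To illustrate point (1), here is a conceptual sketch in plain Python, assuming each version is described as a map of relative path to content hash. It is not ClearML's internal implementation; `compute_version_diff` and `resolve_version` are hypothetical names. The child version keeps only added/modified entries plus a list of removed paths, and the full listing is rebuilt from the parent.

```python
import hashlib
from typing import Dict


def file_hash(content: bytes) -> str:
    """Content hash used to decide whether a file changed."""
    return hashlib.sha256(content).hexdigest()


def compute_version_diff(parent: Dict[str, str], child: Dict[str, str]) -> dict:
    """Compare {relative_path: content_hash} maps of a parent and a child version.

    Only added/modified files carry data; removed files are metadata only.
    """
    added = {p: h for p, h in child.items() if p not in parent}
    modified = {p: h for p, h in child.items() if p in parent and parent[p] != h}
    removed = sorted(p for p in parent if p not in child)
    return {"added": added, "modified": modified, "removed": removed}


def resolve_version(parent: Dict[str, str], diff: dict) -> Dict[str, str]:
    """Rebuild the child's full file listing from the parent plus the stored diff."""
    files = {p: h for p, h in parent.items() if p not in diff["removed"]}
    files.update(diff["added"])
    files.update(diff["modified"])
    return files


parent = {"train/a.csv": file_hash(b"1,2"), "train/b.csv": file_hash(b"3,4")}
child = {"train/a.csv": file_hash(b"1,2,5"), "dev/c.csv": file_hash(b"6")}
diff = compute_version_diff(parent, child)
rebuilt = resolve_version(parent, diff)
```

Here `train/b.csv` is only recorded as removed, while `train/a.csv` (modified) and `dev/c.csv` (added) are the only entries that would need new data stored.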
Hi LovelyHamster1
That is a good point. I think the safest / most robust way is to configure both to use the same DNS name(s), so both (internal/external) are accessible.
Some background: the URL stored on the artifact is basically standalone. Once it is registered on the Task, the UI will not replace it but use it as is (the UI has no "understanding" of which server it is on; it will just fetch the file).
Are you also using a different port on the load balancer?
(because the easiest fix is on your external ...
Assuming you are using docker-compose, the console output is a good start.
Hi ColossalDeer61,
Xxx is the module where my main experiment script resides.
So I think there are two options:
1. Assuming you have a similar folder structure:
main_folder
--package_folder
--script_folder
---script.py
Then, if you set the "working directory" in the execution section to "." and the entry point to "script_folder/script.py", your code could do:
from package_folder import ABC
2. After cloning the original experiment, you can edit the "installed packages", and ad...
Closing the dataset doesn't work: dataset.close() raises AttributeError: 'Dataset' object has no attribute 'close'
Hi @<1523714677488488448:profile|NastyOtter17>, could you send the full exception?
SmarmySeaurchin8, I might be missing something in your description. The way the pipeline works, the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once).
The DAG nodes themselves are executed by the trains-agent, which means it reproduces the code / env for every cloned Task in the DAG (not for the original Tasks).
WDYT?
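To make the clone-versus-original distinction concrete, here is a toy model in plain Python. It is not the trains/ClearML pipeline API: Tasks are plain dicts, `run_pipeline` is a hypothetical name, and "launching" just appends to a list where the real controller would enqueue a cloned Task for the trains-agent.

```python
import copy
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical template Tasks, already registered (e.g. via execute_remotely).
templates = {
    "stage_data": {"name": "stage_data", "params": {"rows": 100}},
    "train": {"name": "train", "params": {"lr": 0.1}},
    "evaluate": {"name": "evaluate", "params": {}},
}

# Each node maps to the set of nodes it depends on.
dag = {"stage_data": set(), "train": {"stage_data"}, "evaluate": {"train"}}


def run_pipeline(templates, dag):
    """Clone each template in dependency order; the originals are never executed."""
    launched = []
    for node in TopologicalSorter(dag).static_order():
        clone = copy.deepcopy(templates[node])
        clone["cloned_from"] = node  # provenance marker, set on the clone only
        launched.append(clone)       # stand-in for "enqueue for the agent"
    return launched


launched = run_pipeline(templates, dag)
```

The templates dict is left untouched; only the clones are "run", in the order stage_data, train, evaluate.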
So you want to launch the second step before the task that is uploading artifact is completed, but after the artifacts are uploaded?
Yep I changed it
This means it will completely ignore the overrides and just use the OmegaConf; this is by design. You either use the overrides, or you edit the OmegaConf. LovelyHamster1, does that make sense?
And is this repo installed on the machine creating the pipeline?
Basically I'm asking how come it did not automatically detect it?
VictoriousPenguin97, I'm assuming it is the exact same server version?
I want to build a real time data streaming anomaly detection service with clearml-serving
Oh, so the way it currently works, clearml-serving pushes the data in real time into Prometheus (you can control the stats/input/output), and then you can build the anomaly detection in Grafana (for example, alerts on histograms over time are supported out of the box, and clearml creates the histograms over time).
Would you also need access to the stats data in Prometheus? Or are you saying you need to process it ...
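If you do end up processing the stats yourself instead of alerting in Grafana, one simple starting point is a rolling z-score over a streamed metric. This is a generic technique swapped in for illustration, not part of clearml-serving; it assumes you already pull the values (e.g. from Prometheus) yourself, and `RollingZScoreDetector` is a hypothetical name.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flag values more than `threshold` standard deviations from the recent window."""

    def __init__(self, window: int = 50, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # only the most recent `window` values
        self.threshold = threshold

    def update(self, x: float) -> bool:
        is_anomaly = False
        if len(self.values) >= 2:
            n = len(self.values)
            mean = sum(self.values) / n
            var = sum((v - mean) ** 2 for v in self.values) / (n - 1)  # sample variance
            std = math.sqrt(var)
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_anomaly = True
        self.values.append(x)  # the new value joins the window either way
        return is_anomaly


detector = RollingZScoreDetector(window=10, threshold=3.0)
stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 8.0, 1.0]
flags = [detector.update(x) for x in stream]
```

Only the spike at 8.0 is flagged. Note that the spike then enters the window and inflates the standard deviation for a while; a median/MAD variant, or not appending flagged points, is more robust.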
We should probably have a section on that (i.e. running two agents on the same GPU, then explain how to use it).
Hover over the border (I would suggest using full screen, i.e. maximize).
Ohh, so you are saying you can store it properly, but only editing in the UI is limited? (Maybe this is just a UI thing.)
For example, store inference results, explanations, etc. and then use them in a different process. I currently use a separate database for this.
You can use artifacts for complex data, then retrieve them programmatically.
Or you can manually report scalars / plots etc. with the Logger class; you can also retrieve them with task.get_last_scalar_metrics
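On the retrieval side, task.get_last_scalar_metrics() returns a dict nested by title and series. The shape used below ({title: {series: {"last", "min", "max"}}}) is assumed from its docstring, and the sample values are made up; `last_values` is a hypothetical helper that flattens it so another process can consume it like a simple lookup table.

```python
from typing import Dict, Tuple

# Assumed shape: {title: {series: {"last": ..., "min": ..., "max": ...}}}
Metrics = Dict[str, Dict[str, Dict[str, float]]]


def last_values(metrics: Metrics) -> Dict[Tuple[str, str], float]:
    """Flatten the nested metrics dict into {(title, series): last_value}."""
    return {
        (title, series): stats["last"]
        for title, series_map in metrics.items()
        for series, stats in series_map.items()
    }


# Made-up sample; in practice this would come from task.get_last_scalar_metrics()
sample = {
    "loss": {
        "train": {"last": 0.21, "min": 0.2, "max": 1.3},
        "val": {"last": 0.35, "min": 0.33, "max": 1.1},
    },
    "accuracy": {"val": {"last": 0.91, "min": 0.5, "max": 0.92}},
}
flat = last_values(sample)
```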
I see that you guys have made a lot of progress in the last two months! I'm excited to dig in
Thank you!
You can further di...
LovelyHamster1 verified: this is a UI bug, where an old limitation is still enforced.
I will make sure they know about it; it should be fixed in the upcoming release.
DefiantHippopotamus88, you are sending the curl to the wrong port; on your setup it should be 9090 (based on what I remember from the unified docker compose).
Hi @<1523706645840924672:profile|VirtuousFish83>
Hello, is it possible to disable lazy loading ?
You mean in the UI for loading the console ?
The logs can be huge, tens and hundreds of MB...
We have the same issue for hyperparameters even with only ~100 keys,
100+ parameters, that is quite a lot.
So are you saying the search in the UI only filters the lazily loaded elements and not the entire param list?
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on: in order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope, if you will).
Try to also import it in the global scope of the pipeline script (not just inside the function):
from src.testagentcomponent import step_one
Just one more question, do you have any idea about how I could change the x-axis label from "Iterations" to "Epochs"
You mean in the UI (i.e. just the title)? Or are you actually reporting iterations instead of epochs? And if so, is this automatically connected to TensorBoard, or is it reported manually?
Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip", so basically you have to download both folders anyhow, but I guess if this saves space we could add this functionality, wdyt?
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm, so maybe a "glob"-like parameter for get_local_copy(select_filter='subfolder/*')?
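select_filter is only a proposal in this thread, not an existing get_local_copy parameter. A minimal sketch of the matching it could do, using a hypothetical `select_files` helper over the dataset's relative file paths; as noted above, files sharing a zip would still be downloaded together, so this alone would not save the download.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def select_files(file_paths: Iterable[str], select_filter: str) -> List[str]:
    """Keep only dataset entries whose relative path matches the glob pattern."""
    return sorted(p for p in file_paths if fnmatch(p, select_filter))


dataset_files = ["train/img1.png", "train/img2.png", "dev/img3.png", "README.md"]
train_only = select_files(dataset_files, "train/*")
dev_only = select_files(dataset_files, "dev/*")
```

One design caveat: fnmatch's `*` also matches `/`, so "train/*" would include nested paths such as "train/sub/x.png", which is probably what you want for "this folder and everything under it".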
So would that be "tags" / "parents"?
Any recommendations or working combinations of AMIs?
I would take the Deep Learning AMIs from Nvidia on AWS; I think they work on both CPU and GPU machines.
In terms of Docker images: Python images for CPU, and the Nvidia runtime for GPU.
[https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86d[…]d2c01646d599352f6ddd9893420eb815a06c3b90619f8?context=explore](https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86db7f6b1b286d2c01646d599352f6ddd98...
Hi SmallDeer34
Hmm, I'm not sure you can; by default the code will use rglob with the last part of the path as the wildcard selection.
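A rough sketch of the behaviour described, assuming the path is split into a base folder and a final wildcard that is then searched recursively with pathlib's rglob; `expand_path` is a hypothetical name, not the clearml function. It also shows why a sub-folder cannot be selected this way: only the last component of the path ever becomes the pattern.

```python
import tempfile
from pathlib import Path
from typing import List


def expand_path(path: str) -> List[Path]:
    """Use the last path component as a wildcard and search recursively under its parent."""
    p = Path(path)
    base, wildcard = p.parent, p.name
    return sorted(f for f in base.rglob(wildcard) if f.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "a" / "x.csv").write_text("1")
    (root / "b.csv").write_text("2")
    (root / "notes.txt").write_text("3")
    # "*.csv" matches at every depth below root, not just the top level
    matches = [f.relative_to(root).as_posix() for f in expand_path(str(root / "*.csv"))]
```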
You can of course manually create a zip file...
How would you change the interface to support it?
StickyLizard47, apologies that https://github.com/allegroai/clearml-server/issues/140 was not followed up on (it probably slipped through the cracks with the backend guys; I can see the 1.5 release happened in parallel). Let me make sure it is followed up.
SarcasticSquirrel56, specifically, did you also spin up a clearml-k8s glue? Or are the agents statically allocated in the Helm chart?
Hi NastyFox63
What do you mean not all of them are shown?
Do they have different series/titles? Are they plots or scalars? How are you reporting them?