Hi SourOx12
How do you set the iteration when you continue the experiment? Is it with Task.init and continue_last_task?
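For reference, a minimal sketch of the pattern being asked about, assuming Task.init's continue_last_task flag together with set_initial_iteration (project/task names are placeholders):

```python
from clearml import Task

# Continue the previous task instead of creating a new one
task = Task.init(
    project_name="examples",
    task_name="my_experiment",
    continue_last_task=True,
)

# Optionally set the iteration the continued run should start reporting from
task.set_initial_iteration(1000)
```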
JitteryCoyote63 Should be quite safe, there is no major change that I'm aware of on the ClearML side that can affect it.
That said, wait for after the weekend, we are releasing a new ClearML package, I remember there was something with the model logging, it might not directly have something to do with ignite, but worth testing on the latest version.
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed)?
pip install clearml==0.17.5rc3
Then run again:
clearml-task ...
I assume it is reported into TB, right?
Hi @<1668427971179843584:profile|GrumpySeahorse51>
Could you provide the full stack log?
This error seems to originate from psutil (which is used), but it lacks the clearml-session context,
and I have no idea what's behind 1, 2 and 3 compared to the first execution.
This is why I would think multiple experiments, since each one stores all the arguments (and I think these arguments are somehow being lost).
wdyt?
Feel free to open an issue on GitHub making sure this is not forgotten
This means that you guys internally catch the argparser object somehow, right?
Correct 🙂 this is how you get the type-checking / casting abilities, and a few other perks
RoughTiger69 the easiest thing would be to use the override option of Hydra:
parameter_override={'Args/overrides': '[the_hydra_key={}]'.format(a_new_value)}
wdyt?
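A minimal sketch of how such an override could be applied when cloning and enqueuing a task (project, task and queue names, the_hydra_key and the value 42 are all placeholders):

```python
from clearml import Task

# Clone an existing Hydra-based task and override a single Hydra key
# through the Args/overrides parameter
template = Task.get_task(project_name="examples", task_name="hydra_experiment")
cloned = Task.clone(source_task=template)
cloned.update_parameters({"Args/overrides": "[the_hydra_key=42]"})
Task.enqueue(cloned, queue_name="default")
```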
Hi @<1523701083040387072:profile|UnevenDolphin73>
How can I ensure tasks in a pipeline have the same environment as the pipeline itself?
...
but the tasks (executed remotely) do not use that same environment?
Just verifying, we are talking about pipeline decorators?
We also wanted this; we preferred to create a docker image with everything we need, and let the pipeline steps use that docker image
You can specify the docker on the decorator itself:
[None](https://github.com/allegroai...
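A rough sketch of that idea, assuming the decorator's docker argument (the image and names are placeholders):

```python
from clearml import PipelineDecorator

# Pin this step to a specific docker image so its environment matches the pipeline's
@PipelineDecorator.component(return_values=["result"], docker="my_org/my_image:latest")
def step_one(value):
    return value * 2

@PipelineDecorator.pipeline(name="example-pipeline", project="examples", version="1.0")
def pipeline_logic():
    print(step_one(21))

if __name__ == "__main__":
    pipeline_logic()
```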
Hi @<1639799308809146368:profile|TritePigeon86>
Sounds awesome, how can we help?
Yes, I think you are correct, verified on Firefox & Chrome. I'll make sure to pass it along.
Thanks SteadyFox10!
Ohh now I get it...
Wait a couple of hours, 0.16 is out today with the trains-agent --stop flag 🙂
Hi @<1597762318140182528:profile|EnchantingPenguin77>
--ipc=host
actually means that there is no need for the --shm-size argument; it gives you access to the entire shared memory on the host machine. I'm assuming that the GPU card just does not have enough VRAM ...
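A sketch of how that flag could be attached to a task's container, assuming set_base_docker's docker_image/docker_arguments form (the image name and queue are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="shm-demo")

# --ipc=host shares the host IPC namespace, so the default 64MB /dev/shm
# limit no longer applies and --shm-size becomes unnecessary
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_arguments="--ipc=host",
)
task.execute_remotely(queue_name="default")
```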
Hi GrievingTurkey78
First, I would look at the CLI clearml-data as a baseline for implementing such a tool:
Docs:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Implementation:
https://github.com/allegroai/clearml/blob/master/clearml/cli/data/main.py
Regarding your questions:
(1) No, a new dataset version will only store the diff from the parent (if files are removed, it stores metadata that says the file was removed)
(2) Yes any get operation will downl...
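A minimal sketch of the versioning behavior described in (1), with placeholder project/file names: the child version only uploads the diff, and removals are recorded as metadata:

```python
from clearml import Dataset

# Create a child version on top of an existing dataset
parent = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
child.add_files("new_data/")        # only new/changed files are stored
child.remove_files("old_file.csv")  # removal is stored as metadata only
child.upload()
child.finalize()
```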
Hi LovelyHamster1
That is a good point, I think the safest / robust way is to configure both to use the same dns name/s so both (internal/external) are accessible.
Some background, the URL itself on the artifact is basically a standalone, once registered on the Task, the UI will not replace it but use it as is (The UI has no "understanding" on which server it is, it will just fetch the file).
Are you also using a different port on the load balancer?
(because the easiest fix is on your external ...
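For context, a sketch of how the host name gets baked into the artifact URL at upload time (the DNS name and project/task names are placeholders):

```python
from clearml import Task

# The output_uri host name is baked into the artifact URL at upload time,
# and the UI later fetches that exact URL as-is
task = Task.init(
    project_name="examples",
    task_name="artifact-demo",
    output_uri="http://clearml-files.mycompany.com:8081",
)
task.upload_artifact(name="results", artifact_object={"acc": 0.9})
```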
Assuming you are using docker-compose, the console output is a good start
Hi ColossalDeer61 ,
Xxx is the module where my main experiment script resides.
So I think there are two options,
1. Assuming you have a similar folder structure:
- main_folder
  - package_folder
  - script_folder
    - script.py
If you set the "working directory" in the execution section to "." and the entry point to "script_folder/script.py", then your code could do:
from package_folder import ABC
2. After cloning the original experiment, you can edit the "installed packages", and ad...
Closing the data doesn't work: dataset.close() AttributeError: 'Dataset' object has no attribute 'close'
Hi @<1523714677488488448:profile|NastyOtter17> could you send the full exception?
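For reference, a minimal sketch of the Dataset lifecycle with placeholder names: there is no close() method; a version is ended with upload() and finalize():

```python
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="demo")
ds.add_files("data/")
ds.upload()    # push the files to storage
ds.finalize()  # seal the version; Dataset has no close() method
```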
SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works,
the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once).
The DAG nodes themselves are executed on the trains-agent, which means they reproduce the code / env for every cloned Task in the DAG (not on the original Tasks).
WDYT?
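A rough sketch of that flow with PipelineController, where each step references a pre-executed template Task (all names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="example-pipeline", project="examples", version="1.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train_template",  # the pre-executed template Task that gets cloned
)
pipe.start(queue="services")
```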
So you want to launch the second step before the task that is uploading the artifacts is completed, but after the artifacts are uploaded?
Yep I changed it
This means it will totally ignore the overrides and just take the OmegaConf; this is by design. You either use the overrides, or you configure the OmegaConf. LovelyHamster1 Does that make sense?
And is this repo installed on the pipeline-creating machine?
Basically I'm asking how come it did not automatically detect it?
VictoriousPenguin97 I'm assuming the exact same server version?
I want to build a real time data streaming anomaly detection service with clearml-serving
Oh, so the way it currently works: clearml-serving will push the data in real time into Prometheus (you can control the stats/input/output), then you can build the anomaly detection in Grafana (for example, alerts on histograms over time are out-of-the-box, and clearml creates the histograms over time).
Would you also need access to the stats data in Prometheus? Or are you saying you need to process it ...
We should probably have a section on that (i.e. running two agents on the same GPU, then explaining how to use it)
Hover over the border (I would suggest using full screen, i.e. maximize)
Ohh, so you are saying you can store it properly, but only editing in the UI is limited? (Maybe this is just a UI thing)