DepressedChimpanzee34
I am actually curious now, why is the default like this? maybe more people are facing similar bottlenecks?
On "regular" load there is no need for multiple processes, and the memory consumption might be more important than reply lag (at least before you start to scale)
DisturbedWalrus17
By spawning multiple processes for the API server, it looks like we utilise the CPU more now but the UI and API calls are still lagging a lot
Can you try with even more ...
This is very odd, can you also put here the file names? maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0) ?
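For reference, upgrading is just a pip install (assuming a standard pip setup):
pip install clearml==1.8.0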
Let's start small. Do you have grafana enabled in your docker compose and can you login to your grafana web ui?
Notice grafana needs to access the prometheus container directly so easiest way is to have everything in the same docker compose
@<1523701868901961728:profile|ReassuredTiger98> what do you have in the clearml.conf under "conda_channels" ?
Is this it ?
None
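For context, the conda channels live under the agent's package_manager section of clearml.conf; a minimal sketch (these channel names are just the usual defaults, adjust to your setup):
agent {
    package_manager {
        # only used when the package manager type is conda
        type: conda
        conda_channels: ["pytorch", "conda-forge", "defaults"]
    }
}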
but when I run the same task again it does not map the keys..
SparklingElephant70 what do you mean by "map the keys" ?
Okay found it, ElegantCoyote26 the step name is changed but the Task name remains the same ... 🙂
I'll make sure we fix it on the next version
Because it lives behind a VPN and github workers don't have access to it
makes sense
If this is the case, I have to admit that combining offline-mode and remote execution makes sense, no?
Hi SarcasticSparrow10
You will need to have multiple trains-agents, but they will all be sharing the same queue (i.e. pulling jobs from the same queue the HPO process is pushing to)
Make sense ?
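For example, launching an agent on each machine (or one per GPU) against a shared queue would look roughly like this (the queue name is just an example):
trains-agent daemon --queue hpo_queue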
Hi UpsetTurkey67
"General/my_parameter_name" so that only this part of the configuration will be updated?
I'm assuming this is a Hyperparameter, not a configuration object (i.e. task.connect, not task.connect_configuration); if this is the case then Yes 🙂
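i.e. something along these lines (a minimal sketch, project/task names are illustrative):
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")
params = {"my_parameter_name": 0.1}
# connected under the "General" section, so it shows up as
# "General/my_parameter_name" and only this part gets updated on clone
task.connect(params, name="General")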
Could it be that clone has to be False? (I assume the reasoning is the cloning feature)
Hi @<1598487094601191424:profile|MysteriousCow84>
You should put it in the dedicated section:
None
WhimsicalLion91
What would you say is the use case for running an experiment with iterations?
That could be a loss value per iteration, or accuracy per epoch (iteration is just a name for the x-axis, in a sense; this is equivalent to a time series)
Make sense?
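For example, reporting a loss value per iteration (a minimal sketch, the loss here is a placeholder):
from clearml import Task

task = Task.init(project_name="examples", task_name="scalars demo")
logger = task.get_logger()
for i in range(100):
    loss = 1.0 / (i + 1)  # stand-in for a real training loss
    logger.report_scalar(title="loss", series="train", value=loss, iteration=i)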
After you call task.set_initial_iteration(0), what do you get with task.get_initial_iteration(), is it 0?
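i.e. (sketch, assuming task is the current Task object):
task.set_initial_iteration(0)
print(task.get_initial_iteration())  # expecting 0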
That's the right place but
like you would use hydra --override, which in your case I think it should be "accelerator.gpu" ,
You can also change allow_omegaconf_edit in the UI to True, and then you could just edit the OmegaConf in the UI (if you do not change allow_omegaconf_edit then the edit in the UI is ignored)
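For reference, a hydra command-line override looks something like this (script name and value are placeholders):
python train.py accelerator.gpu=0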
RipeGoose2 yes, the UI cannot embed the html yet, but if you click on the link itself it will open the html in a new tab.
Could you verify it works ?
PompousParrot44 What is the "working directory" on the experiment itself? and the "script path"?
Based on what you wrote above, in order for it to work you should have:
working directory: "."
script path: "-m test.scripts.script"
notice no "--args" and working directory is "." (i.e. the root of the repository)
(this only works for PyTorch because they have different wheels for different CUDA versions)
Hi @<1545216070686609408:profile|EnthusiasticCow4>
is there a way to get the date from the InputModel?
You should be able to with model._get_model_data()
But I think we should have it all exposed, wdyt?
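A minimal sketch using the internal call (note _get_model_data() is a private API so it may change; I'm assuming the returned backend object carries a created timestamp, and the model id is a placeholder):
from clearml import InputModel

model = InputModel(model_id="<your-model-id>")  # placeholder id
data = model._get_model_data()  # internal/private call, may change between versions
print(data.created)  # assumption: the backend model data exposes a created field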
Great! If this is what you do, how come you need to change the entry script in the UI?
to fix it, I excluded this var entirely from the docker-compose
Makes sense.
the path to the JSON file
Yep, that's what I did and things seem to work... Let me check again if I missed anything
It analyses the script code itself, going over all imports and adding only the directly imported packages
DepressedChimpanzee34 a backslash will almost always be converted into "\\", because otherwise it will not support \t or \n etc.
What I'm looking for here is some logic that will allow us not to break backwards compatibility on the one hand, but will still allow you to have something like a "first\second" entry.
WDYT? any ideas? (I really want to make sure we fix it as soon as possible)
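To make the problem concrete, here is what a standard serializer does with a backslash (plain Python, just to illustrate the point):
import json

s = r"some\blah"          # a single backslash in the string
print(json.dumps(s))      # prints "some\\blah" - the backslash gets escaped
print(json.loads(json.dumps(s)) == s)  # True, the round-trip is safe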
ohh AbruptHedgehog21 if this is the case, why don't you store the model with torch.jit.save and use Triton to run the model ?
See example:
https://github.com/allegroai/clearml-serving/tree/main/examples/pytorch
(BTW: if you want a full custom model serve, in this case you would need to add torch to the list of python packages)
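A minimal sketch of the torch.jit route (the model class here is a stand-in for your actual model):
import torch

class MyModel(torch.nn.Module):  # stand-in for your real model
    def forward(self, x):
        return x * 2

scripted = torch.jit.script(MyModel())
torch.jit.save(scripted, "model.pt")
# Triton can then serve the saved TorchScript file directly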
DepressedChimpanzee34 any string serialization package I tried (JSON, YAML, HOCON) will convert r"some\blah" into "some\\blah", otherwise you end up with \b as an escape character. I'm really not sure what to do here. (And reinventing the standard seems unhealthy)
No worries, I would love for us to come up with a nice solution 🙂
Calling the script without the
PipelineDecorator.run_locally()
i.e. running the pipeline remotely still gives the
ModuleNotFoundError: No module named
Do you have the needed module listed on the pipeline controller Task ? (press on the details link, then go to the Execution tab / "Installed Packages")
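If the module is missing there, one option (a sketch, assuming the decorator-based pipeline and a hypothetical pandas dependency) is to declare the packages explicitly on the component:
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(packages=["pandas>=1.0"])  # explicitly declare what the step needs
def my_step():
    import pandas as pd
    return pd.DataFrame()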
ReassuredTiger98 could you provide more information ? (versions, scenario. etc.)
Hi RoughTiger69
A. Yes, makes total sense. Basically you can use Task.export / Task.import to achieve this process (notice we assume the dataset artifact links are available on both, usually this is the case)
B. The easiest way would be to use Process: one subprocess exports from dev, with the credentials and configuration passed via os environment, and another subprocess imports it to the prod server (again with os environment pointing to the prod server). Make sense?
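A minimal sketch of that flow (server URLs and the task id are placeholders; I'm assuming the CLEARML_API_HOST / CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY environment variables select the backend for each subprocess):
import os
from multiprocessing import Process, Queue
from clearml import Task

def export_from_dev(q):
    # environment must point at the dev server before clearml connects
    os.environ["CLEARML_API_HOST"] = "https://dev-server:8008"  # placeholder
    q.put(Task.get_task(task_id="<dev-task-id>").export_task())

def import_to_prod(q):
    os.environ["CLEARML_API_HOST"] = "https://prod-server:8008"  # placeholder
    Task.import_task(q.get())

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=export_from_dev, args=(q,)); p1.start(); p1.join()
    p2 = Process(target=import_to_prod, args=(q,)); p2.start(); p2.join()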
DepressedChimpanzee34
so parsing back is done via a yaml reader:
https://github.com/allegroai/clearml/blob/49fcbd7bbf3236f4175cdff29fa951847b0923cc/clearml/backend_interface/task/args.py#L506
We could add an extra test here, checking for \ in the string; that should solve it and will be backwards compatible (I think)
https://github.com/allegroai/clearml/blob/49fcbd7bbf3236f4175cdff29fa951847b0923cc/clearml/backend_interface/task/task.py#L935