
No, I just want to register a new model in the storage.
If the model file is already uploaded, you can register it without a Task: InputModel.import_model(...)
https://github.com/allegroai/clearml/blob/b3a2b3425c5098ebfc0598c9dfb3e670d4a87706/clearml/model.py#L521
I need to create a separate task for this right?
If you want the model to be uploaded, then yes you have to create a Task.
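A rough sketch of both flows (the bucket URL, file paths and names are placeholders, and keyword arguments may differ slightly between clearml versions):
```python
from clearml import Task, InputModel, OutputModel

# Case 1: weights already uploaded somewhere reachable -
# register in the model repository without creating a Task
model = InputModel.import_model(
    weights_url="s3://my-bucket/models/model.pth",  # placeholder URL
    name="imported-model",
)

# Case 2: weights are local and still need uploading -
# this requires a Task, with an OutputModel attached to it
task = Task.init(project_name="examples", task_name="register model")
output_model = OutputModel(task=task, name="uploaded-model")
output_model.update_weights(weights_filename="/path/to/model.pth")
```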
In my understanding requests still go through clearml-server, whose configuration I left
DefiantHippopotamus88 actually this is Not correct.
clearml-server only acts as a control plane; no actual requests are routed to it. It is used to sync model state, stats, etc., and is not part of the request-processing flow itself.
curl: (56) Recv failure: Connection reset by peer
This actually indicates port 9090 is not being listened on...
What's the final docker-compose you are usi...
To summarize: The scheduler should assign tasks to the agent first, which gives a queue the highest priority.
The issue here is that you assume both are idle and you need global priority based on resource preference. I understand your scenario now, but it will only hold if the enqueuing order is "optimal". For example, if machine Y is running a Task B that is about to be completed (e.g. in a minute), then machine X will still pick the new Task B, and again we end up in the scenario where Task A i...
Hi CurvedHedgehog15
User aborted: stopping task (3)
?
This means "someone" externally aborted the Task, in your case the HPO aborted it (the sophisticated HyperBand Bayesian optimization algorithms we use, both Optuna and HpBandster) will early stop experiments based on their performance and continue if they need later
yes, that makes sense to me.
What is your specific use case, meaning when/how do you stop / launch the hpo?
Would it make sense to continue from a previous execution and just provide the Task ID? Wdyt?
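For reference, a minimal sketch of the HPO controller side (project names, parameter ranges and metric names are placeholders; it assumes a base training Task already exists in the system):
```python
from clearml import Task
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange,
)
from clearml.automation.optuna import OptimizerOptuna

task = Task.init(project_name="examples", task_name="HPO controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base training task id>",    # the experiment to clone & mutate
    hyper_parameters=[
        UniformParameterRange("Args/lr", min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange("Args/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,           # Optuna handles the early stopping
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```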
Was going crazy for a short amount of time yelling to myself: I just installed clear-agent init!
oh noooooooooooooooooo
I can relate so much, happens to me too often that copy pasting into bash just uses the unicode character instead of the regular ascii one
I'll let the front-end guys know, so we do not make ppl go crazy
Great, you can test directly from master: pip3 install -U git+
Oh, I think I understand your point now.
basically you can:
Create the initial Task; once it is in the system, clone it and adjust its parameters externally. A simple example here:
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/automation/manual_random_param_search_example.py#L41
wdyt?
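In spirit it boils down to something like this (project/task names and the parameter key are placeholders):
```python
from clearml import Task

# Assumes a template experiment already exists in the system
template = Task.get_task(project_name="examples", task_name="training template")

for lr in (0.001, 0.01, 0.1):
    # Clone the template, override one of its Args parameters, and enqueue it
    cloned = Task.clone(source_task=template, name=f"clone lr={lr}")
    cloned.set_parameters({"Args/lr": lr})
    Task.enqueue(cloned, queue_name="default")
```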
So I see this in the build, which means it works and compiles. What is missing?
Building wheels for collected packages: leap
  Building wheel for leap (setup.py) ... done
  Created wheel for leap: filename=leap-0.4.1-cp38-cp38-linux_x86_64.whl size=1052746 sha256=1dcffa8da97522b2611f7b3e18ef4847f8938610180132a75fd9369f7cbcf0b6
  Stored in directory: /root/.cache/pip/wheels/b4/0c/2c/37102da47f10c22620075914c8bb4a9a2b1f858263021...
Found the issue, fix in the next RC (soon to be out)
So I'd like to use the command line argument in the first argparse, and then hide/delete/override it before running the second argparse.
Nice hack!
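Something along these lines should do it (the argument names here are just for illustration):
```python
import argparse

# First parser consumes only the "wrapper" argument and leaves the rest alone
first = argparse.ArgumentParser(add_help=False)
first.add_argument("--config-id", default=None)
first_args, remaining = first.parse_known_args()

# The wrapper argument is effectively hidden from the second parser,
# because only the leftover argv is passed on to it
second = argparse.ArgumentParser()
second.add_argument("--lr", type=float, default=0.01)
second_args = second.parse_args(remaining)

print(first_args.config_id, second_args.lr)
```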
BTW: seems like conda doesn't support git+git:// packages
How about switching to pip? You can still run the entire thing from a conda env; it will just use pip & venv to install everything. Other than that it should work as expected.
You need to mount it to ~/clearml.conf
(i.e. /root/clearml.conf)
The new parameter abort_on_failed_steps could be a list containing the name of the
I like that, we can also have it as an argument per step (i.e. the decorator can say, abort_pipeline_on_fail or continue_pipeline_processing)
Hmm, it is not returned, it is inside the function....
Woot woot!
Thanks!
In the conf file, I guess this will be where ppl will look for it.
Thanks OutrageousGiraffe8
Any chance you can expand the example code to be a fully a reproducible toy code? (I would really like to make sure we fix it)
SmarmySeaurchin8 could you test with the latest RC: pip install clearml==0.17.5rc2
Can you post here the docker-compose.yml you are spinning? Maybe it is the wrong one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase
FYI: These days TB has become the standard even for pytorch (being a standalone package); you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
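i.e. something like this works out of the box, ClearML picks up whatever goes through the TB SummaryWriter (project/task names are placeholders):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter  # TB now ships with torch

task = Task.init(project_name="examples", task_name="pytorch tensorboard")

writer = SummaryWriter(log_dir="./runs")
for step in range(100):
    # Anything reported via TensorBoard is captured automatically by ClearML
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```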
HealthyStarfish45 did you manage to solve the report_image issue?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
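For manual reporting the gist is roughly this (title/series names are placeholders, and the HTML snippet is just an example):
```python
from io import StringIO

import numpy as np
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="manual reporting")
logger = Logger.current_logger()

# Report an image directly from a numpy array
img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
logger.report_image(title="debug", series="random", iteration=0, image=img)

# Report a standalone HTML snippet as a media sample on the task
logger.report_media(
    title="html", series="summary", iteration=0,
    stream=StringIO("<h1>Hello ClearML</h1>"), file_extension="html",
)
```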
Could it be that the Args section of the task it clones does not have the "input_train_data" argument?
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the python 3.10 version of torch, when your command line suggests it is running python 3.8)