https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L86
you can just pass the instance of the OptunaOptimizer you created, and continue the study
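For context, the standard setup from the linked example looks roughly like this (just a sketch; the base task ID, parameter range, metric names and queue are placeholders), with `optimizer_class` selecting the Optuna strategy:
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange
from clearml.automation.optuna import OptimizerOptuna

# The controller task that owns the optimization
task = Task.init(
    project_name="Hyper-Parameter Optimization",
    task_name="Automatic HPO",
    task_type=Task.TaskTypes.optimization,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder: the template experiment to clone
    hyper_parameters=[
        UniformIntegerParameterRange("General/layer_1", min_value=128, max_value=512, step_size=128),
    ],
    objective_metric_title="validation",   # placeholder metric
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerOptuna,       # Optuna-based search strategy
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=10,
)

optimizer.start()
optimizer.wait()
optimizer.stop()
```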
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I'm assuming you need multiple "file-server" instances running on the "clearml-server", with a load-balancer of some sort...
What is the specific use case, updating a file on an existing dataset and creating a new version?
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
My bad, I see I worded my question wrong,
LOL no worries 🙂
Any chance you have some "debug" leftover in the Pipeline code:
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/examples/pipeline/pipeline_from_decorator.py#L113
Maybe we should show a warning when it is being called, or ignore it when running via an agent ...
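(To be concrete, the kind of "debug" leftover meant here is along these lines; a sketch only, the exact call at the linked line may differ:)
```python
from clearml.automation.controller import PipelineDecorator

# Debug helpers like these force the pipeline (and all its steps) to run in the
# local process instead of being enqueued for agents - easy to forget to remove.
PipelineDecorator.run_locally()
# or: PipelineDecorator.debug_pipeline()
```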
is it also possible to somehow propagate ssh keys to the agent pod? Not sure how to approach that
I would use the k8s secret manager to do that (there is a way to mount secret files into a pod; mounting SSH keys that way is relatively standard)
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is the issue might be in the env file we are passing to conda; I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send you a specific wheel with more debug prints to check (basically it will print the conda env YAML it is using)?
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
When you have a bit of experience with it, please suggest a path forward; it would be great to integrate
When I give my Minio to the output_uri argument, it uploads at 500 KB/sec as before.
But it worked well when using StorageManager and uploading to the minio directly, is that correct?
.. I give my Minio to output_uri argument
How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)
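For reference, the two upload paths being compared look roughly like this (just a sketch; the MinIO endpoint, bucket, and file paths are placeholders, and the s3 credentials are still picked up from clearml.conf):
```python
from clearml import Task, StorageManager

# Path 1: let the Task upload models/artifacts through output_uri (reported as slow)
task = Task.init(
    project_name="examples",
    task_name="minio upload test",
    output_uri="s3://my-minio-host:9000/clearml-bucket",  # placeholder endpoint/bucket
)

# Path 2: upload a file directly with StorageManager (reported as fast)
StorageManager.upload_file(
    local_file="/path/to/large_artifact.bin",
    remote_url="s3://my-minio-host:9000/clearml-bucket/large_artifact.bin",
)
```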
Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (just reset it beforehand)
Guys FYI:
params = task.get_parameters_as_dict()
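i.e. something like this (assuming the usual Task.init flow; parameter values typically come back as strings):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")
task.connect({"lr": 0.001, "batch_size": 32})  # stored under the "General" section by default

params = task.get_parameters_as_dict()
# Nested dict keyed by section, e.g. {"General": {"lr": "0.001", "batch_size": "32"}}
print(params)
```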
Hi CourageousDove78
Not the cleanest, but you can basically pass everything here:
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html#post--tasks.get_all
Reasoning is that it is passed almost as is to the server for the actual query.
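So from Python, something along these lines should work (a sketch; the filter fields are taken from the tasks.get_all reference above):
```python
from clearml import Task

# task_filter is forwarded (almost) as-is to the tasks.get_all endpoint
tasks = Task.get_tasks(
    project_name="examples",
    task_filter={
        "status": ["completed"],
        "order_by": ["-last_update"],
    },
)
for t in tasks:
    print(t.id, t.name)
```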
Could you test if this is working:
https://github.com/allegroai/clearml/blob/master/examples/reporting/matplotlib_manual_reporting.py
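(The core of that example is roughly the following; a sketch, project/task names are placeholders:)
```python
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib manual reporting")

fig = plt.figure()
plt.plot([1, 2, 3], [10, 20, 15])

# Explicitly report the figure instead of relying on the automatic matplotlib binding
task.get_logger().report_matplotlib_figure(
    title="Manual Reporting", series="Just a plot", iteration=0, figure=fig
)
```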
JitteryCoyote63 What did you have in mind?
Hi BroadMole98
What I think I am understanding about trains so far is that it's great at tracking one-off script runs and storing artifacts and metadata about training jobs, but doesn't replace kubeflow or snakemake's DAG as a first-class citizen. How does Allegro handle DAGgy workflows?
Long story short, yes you are correct. kubeflow, and snakemake for that matter, are all about DAGs where each node runs a docker (bash) container for you. The missing portions (for both) are:
How do I cr...
I don't see any requests
This points to a configuration issue, specifically maybe it is directed at a different server?!
I think we were able to fix it, let me check if it was pushed 🙂
It's in my local conda environment though.
Meaning, is this a wheel installed manually in conda, or is it a folder inside the conda environment?
SteadySeagull18 btw: in post-callback the node.job will be completed
because it is called after the Task is completed
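Roughly like this (a sketch; step/project names are placeholders):
```python
from clearml.automation import PipelineController

def post_step_callback(pipeline, node):
    # By the time this runs, the step's Task has finished, so node.job is already completed
    print(f"step '{node.name}' finished, job status: {node.job.status()}")

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0")
pipe.add_step(
    name="stage_one",
    base_task_project="examples",        # placeholder
    base_task_name="step 1 base task",   # placeholder
    post_execute_callback=post_step_callback,
)
pipe.start()
```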
Hi DilapidatedDucks58 ,
Just making sure: do all 8 workers have different worker IDs? (you can see the 8 in the Workers page in the UI)
Also, are they running in docker or venv mode?
ColossalDeer61 btw, it turns out the docker-compose services docker was ill-configured on GitHub 😞 I suggest you get the latest copy of it:
curl -o docker-compose.yml
What's the trains-server version ?
You can see it if you go to the profile page
Please hit Ctrl-F5 to refresh the entire page, and see if it is still empty....