ImmensePenguin78
I think the latest RC adds it, should be released later today 🙂
Yes, this is definitely the issue: the agent assumes the docker user is "root".
Let me check something
BTW
/home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory
This error is from the agent, correct? It seems it did not clone the correct code. Is train.py committed to the repository?
DefeatedOstrich93 what do you mean by "I am wondering why do I need to create files before applying diff ?"
git diff will not list files unless they are added (until then they are marked as "untracked"); think temp files, logs, etc. Until you add a file to git, it will basically ignore that file. Make sense?
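To make the untracked-file behaviour concrete, here is a minimal sketch that creates a throwaway repository and compares the diff before and after adding a file. It assumes a `git` executable is on the PATH; the helper names and `new_file.py` are hypothetical and only for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", *args],
        cwd=repo,
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


def untracked_file_demo():
    """Return (diff_before_add, status_before_add, diff_after_add)."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init")
        Path(repo, "new_file.py").write_text("print('hello')\n")
        # The file is untracked, so a plain diff does not mention it.
        diff_before = git(repo, "diff")
        # Status does list it, flagged as untracked.
        status_before = git(repo, "status", "--porcelain")
        # Once added, the file shows up in the staged diff.
        git(repo, "add", "new_file.py")
        diff_after = git(repo, "diff", "--cached")
        return diff_before, status_before, diff_after


if __name__ == "__main__" and shutil.which("git"):
    before, status, after = untracked_file_demo()
    print("diff before add is empty:", before == "")
    print("status before add:", status.strip())
    print("diff after add mentions the file:", "new_file.py" in after)
```

The same reasoning applies to an uncommitted-changes diff stored with an experiment: a file that was never added cannot appear in it.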
Can you see it on the console ?
The use case I have is to allow people from my team to run their workloads on a set of servers without stepping over each other.
So does that mean CPU only workloads?
Also are we afraid of fairness? (i.e. someone "taking" all the CPU for themselves)
import os
from trains import Task

os.environ['TRAINS_PROC_MASTER_ID'] = '1:da0606f2e6fb40f692f5c885f807902a'
os.environ['OMPI_COMM_WORLD_NODE_RANK'] = '1'
task = Task.init(project_name="examples", task_name="Manual reporting")
print(type(task))
Should be: <class 'trains.task.Task'>
New python executable in /home/smjahad/.clearml/venvs-builds/3.6/bin/python2
This is the output of venv create
this is odd.
Could it be that by accident you did:
pip install clearml-agent
and not:
pip3 install clearml-agent
and now it is running on python2 (which would explain the error)?
I would uninstall/reinstall on python3 to verify
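A quick way to check which interpreter a given `python` command resolves to is a small script like the sketch below, run with the same executable that launches the agent. The function name is hypothetical; it deliberately avoids Python 3-only syntax so it also runs under python2. Installing with `python3 -m pip install clearml-agent` is one way to tie the install to a specific interpreter.

```python
import sys


def interpreter_summary(version_info=None, executable=None):
    """Describe an interpreter; defaults to the one running this script."""
    version_info = sys.version_info if version_info is None else version_info
    executable = sys.executable if executable is None else executable
    major, minor = version_info[0], version_info[1]
    return {
        "executable": executable,
        "version": "{}.{}".format(major, minor),
        "is_python3": major >= 3,
    }


if __name__ == "__main__":
    # If is_python3 is False here, the package was installed for python2.
    print(interpreter_summary())
```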
Because we are working with very big files, having them stored at multiple locations is something we try to avoid
Just so I better understand, is this for storing files as part of a dataset, or as debug samples ?
In other words, can two different processes create the exact same file (image)?
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue) you can specify to the glue what the exposed ports are, so that (1) it knows the maximum number of instances it can spin, e.g. one per port, and (2) it will set the external port number on the Task, so that the running agent/code will be aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
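The "one instance per port" idea above can be sketched roughly as a port pool. This is an illustrative toy, not the k8s glue's actual implementation; `PortPool` and all of its methods are hypothetical names.

```python
class PortPool:
    """Hypothetical allocator: at most one running instance per exposed port."""

    def __init__(self, base_port, num_ports):
        self._free = list(range(base_port, base_port + num_ports))
        self._assigned = {}  # task_id -> external port

    @property
    def max_instances(self):
        # The number of exposed ports caps how many instances can run at once.
        return len(self._free) + len(self._assigned)

    def acquire(self, task_id):
        """Return the external port for task_id, or None if all ports are taken."""
        if task_id in self._assigned:
            return self._assigned[task_id]
        if not self._free:
            return None
        port = self._free.pop(0)
        self._assigned[task_id] = port
        return port

    def release(self, task_id):
        """Give the port back once the task's pod is gone."""
        port = self._assigned.pop(task_id, None)
        if port is not None:
            self._free.append(port)


if __name__ == "__main__":
    pool = PortPool(base_port=10022, num_ports=2)
    print(pool.max_instances, pool.acquire("task-a"), pool.acquire("task-b"))
```

In this sketch the port returned by `acquire` plays the role of the external port number that gets recorded on the Task, so the code running inside the pod can find out how it is exposed.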
Can't say I have noticed that, is this a delay on the send ? Which for some reason is correlated with the epochs ? What was the case with 0.17.5?
Hi @<1716987933514272768:profile|SuccessfulPuppy43>
How to make remote ClearML agent do
pip install -e .
In theory there is no need to do that: clearml-agent adds the repo root folder to the python path.
If you insist on actually installing it, try adding a "requirements.txt"-compatible line to your "installed packages" section:
-e .
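As a rough illustration of why having the repo root on the python path is usually enough, the sketch below makes a package importable without installing it. It only mimics the effect with `sys.path`; it is not the agent's code, and `demo_pkg_hypothetical` and `import_from_repo_root` are made-up names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_repo_root(repo_root, module_name):
    """Import module_name after putting repo_root on sys.path (no pip install)."""
    root = str(repo_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    # Make sure the import system notices freshly created files.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        package_dir = Path(repo, "demo_pkg_hypothetical")
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text("ANSWER = 42\n")
        module = import_from_repo_root(repo, "demo_pkg_hypothetical")
        print(module.ANSWER)
```

An editable install (`-e .`) is mainly needed when the package relies on things only an install provides, such as entry points or compiled extensions.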
Hi JuicyDog96
The easiest way at the moment (apologies for the still-missing REST API documentation, it is coming :)
is actually the code (full docstring docs):
https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
You can access it all with an easy Pythonic interface, for example:
from trains.backend_api.session.client import APIClient
client = APIClient()
tasks = client.tasks.get_all()
WittyOwl57
To get task IDs (e.g. all the tasks of a specific project), use:
task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})
Then per task:
` for t_id in task_ids:
    t = Task.get_task(t_id)
    conf_dict = t.get_configuration_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # this is to enable to forcefully update parameters post execution
    t.mark_started(force=True)
    # update hyper-parame...
NICE! CurvedHedgehog15 cool stuff! and my pleasure 🙂
HelplessCrocodile8 I just tried it, everything seems to work (ubuntu 20.04) 😞
What OS are you using? Python version? Is it conda?
LazyLeopard18 you can point the artifact directly to your azure object storage and have StorageManager download and cache it for you:
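A minimal sketch of what that could look like, assuming Azure credentials are already configured in clearml.conf. The account, container and blob names are placeholders, and `azure_blob_url` and `fetch_cached_copy` are hypothetical helpers; `StorageManager.get_local_copy` is the ClearML call that downloads and caches the file.

```python
def azure_blob_url(account, container, blob_path):
    """Build an azure:// URL in the form ClearML uses for Azure blob storage."""
    return "azure://{}.blob.core.windows.net/{}/{}".format(
        account, container, blob_path.lstrip("/")
    )


def fetch_cached_copy(remote_url):
    """Download remote_url (or reuse the cached copy) and return the local path."""
    # Imported lazily so the URL helper works without clearml installed.
    from clearml import StorageManager

    return StorageManager.get_local_copy(remote_url=remote_url)


if __name__ == "__main__":
    # Placeholder names; replace with your own account/container/blob.
    print(azure_blob_url("myaccount", "artifacts", "run1/model.pkl"))
```

Calling `fetch_cached_copy` with that URL would then return a local file path, and repeated calls should hit the local cache rather than download again.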
FrothyShark37 any chance you can share a snippet to reproduce?
We do upload the final model manually.
If this is the case, just name it based on the parameters, no? Am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229
I was just wondering if I can make the autologging usable.
It kind of assumes these are different "checkpoints" on the same experiment, and then stores them based on the file name
You can however change the model names later:
` Task.current_task().mo...
My question was about the automatically uploaded models. Those that were uploaded by clearml client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
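Building on the callback above, here is a sketch of naming the model from the run's parameters. It is self-contained: `SimpleNamespace` stands in for the real model-info object, `make_naming_callback` is a hypothetical helper, and the `name` attribute is taken from the snippet above rather than verified against a specific clearml version.

```python
from types import SimpleNamespace


def make_naming_callback(params):
    """Return a callback that renames a model based on a dict of parameters."""
    # Sort the keys so the generated name is stable between runs.
    suffix = "_".join("{}={}".format(key, params[key]) for key in sorted(params))

    def callback(operation_type, model_info):
        model_info.name = "model_{}".format(suffix) if suffix else "model"
        return model_info

    return callback


if __name__ == "__main__":
    callback = make_naming_callback({"lr": 0.01, "batch": 32})
    # Stand-in object; with clearml the framework binding would pass model_info.
    # Assumption: registration goes through something like
    # WeightsFileHandler.add_pre_callback(callback) in clearml.binding.frameworks.
    info = callback("save", SimpleNamespace(name="checkpoint"))
    print(info.name)
```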
Hi ReassuredTiger98
However, the clearml-agent also stops working then.
You mean the clearml-agent daemon (the one that spun up the container) is crashing as well?
Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not an external database.
Make sense?
Hmm, can you try an additional configuration? Next to "secure: true" in your clearml.conf, can you add "verify: false"?
Oh right, I missed the fact the helper functions are also decorated, yes it makes sense we add the tags as well.
Regarding nested pipelines, I think my main question is: are they independent, or are we generating everything from the same code base?
Building the pipeline in runtime from external configuration is very cool!!
I think nested components is exactly the correct solution, and it is a great use case.
Yes, the webserver doesn't know where the api server is, it will access /api and then the nginx running the webapp will do the routing (reverse proxy)
I think that for some reason it is failing to do that (actually similarly to the stackoverflow you linked)
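The /api routing described above can be pictured with a small prefix-routing sketch. It only illustrates the reverse-proxy idea and is not the actual nginx configuration; the upstream addresses and the `route` function are hypothetical.

```python
# Hypothetical upstream addresses, for illustration only.
API_UPSTREAM = "http://apiserver:8008"
WEBAPP_UPSTREAM = "http://webserver:80"


def route(path):
    """Map a request path to the upstream URL the reverse proxy would call."""
    prefix = "/api"
    if path == prefix or path.startswith(prefix + "/"):
        # Strip the /api prefix before forwarding to the API server.
        rest = path[len(prefix):] or "/"
        return API_UPSTREAM + rest
    # Everything else is served by the web application itself.
    return WEBAPP_UPSTREAM + path


if __name__ == "__main__":
    print(route("/api/tasks.get_all"))
    print(route("/dashboard"))
```

If the proxy step fails, the browser still sends requests to /api on the web host, but nothing forwards them to the API server, which matches the symptom described.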
Thank you!
One thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can
That might be it! let me check the code again...
Can you see the repo itself ? the commit id ?