Does adding external files not upload them to the dataset output_uri?
@<1523704667563888640:profile|CooperativeOtter46> If you are adding the links with add_external_files, these files are not re-uploaded
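For reference, a minimal sketch of the difference (dataset/project names and paths are illustrative): links registered with add_external_files stay where they are, while locally added files are uploaded to the dataset's output_uri.

```python
from clearml import Dataset

# Illustrative names and paths
ds = Dataset.create(dataset_name="my_dataset", dataset_project="examples")

# Registered as a link only - the content stays on its original storage (e.g. S3)
ds.add_external_files(source_url="s3://my-bucket/raw-data/")

# Local files added this way WILL be uploaded to the dataset's output_uri
ds.add_files(path="./local_data")

ds.upload()    # uploads only the locally added files
ds.finalize()
```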
Thank you @<1523720500038078464:profile|MotionlessSeagull22>, always great to hear :)
Btw, if you feel like sharing your thoughts with us, consider filling out our survey; it should not take more than 5 minutes.
CrookedWalrus33 can you test what happens if you pass the credentials in the global scope as well, i.e. here:
https://github.com/allegroai/clearml/blob/397dcfacda8f133af0acc7d2f9a124dde38ecc4a/docs/clearml.conf#L80
Switching to a process Pool might be a bit of overkill here (I think)
wdyt?
AgitatedTurtle16 from the screenshot, it seems the Task is stuck in the queue, which means there is no agent running to actually run the interactive session.
Basic setup:
A machine running clearml-agent (this is the "remote machine")
A machine running clearml-session (let's call it the laptop :) )
You need to first start the agent on the "remote machine" (basically call clearml-agent daemon --docker --queue default ). Once the agent is running on the remote machine, from your laptop run ...
I understand, but then the toml file needs to be parsed to ensure poetry is used. It's just a tool entry in the pyproject.toml.
Probably too much for the agent... and specifically, it seems poetry actually managed to parse it?! What are you getting in the log?
CleanPigeon16 Can you also send the "Configuration Object" "Pipeline" section?
Here you go :)
(using trains_agent for easier access to all the data)
from trains_agent import APIClient
client = APIClient()
log_events = client.events.get_scalar_metric_data(task='11223344aabbcc', metric='valid_average_dice_epoch')
print(log_events)
GiddyTurkey39
as others will also be running the same scripts from their own local development machine
Which would mean `trains` will update the installed packages, no?
This is why I was inquiring about the requirements.txt file,
My apologies, of course this is supported :)
If you have no "installed packages" (i.e. the field is empty in the UI) the trains-agent will revert to installing the requirements.txt from the git repo itself, then it...
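Related, if you want the task itself to record exactly the repo's requirements.txt instead of leaving the "installed packages" field empty, recent clearml versions expose a helper for that. A hedged sketch (must be called before Task.init; names are illustrative):

```python
from clearml import Task

# Hedged: point the task's "installed packages" at a requirements file
# instead of the auto-detected package list. Call before Task.init().
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="requirements from file")
```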
Hi JitteryCoyote63
So the main issue is backing up the Elastic & Mongo DB while they are running; once they are backed up/restored, the server will spin up as is. (Let me check regarding the Redis, it might be that since it is used for caching there is no need to actually back up the content, only the configuration.)
A true mystery :)
That said, I hardly think it is directly related to the trains-agent ...
Do you have any more insights on when / how it happens ?
Hi UnevenDolphin73
If you "remove" the lock file, the agent will default to pip.
You could hack it with the uncommitted changes section?
So actually, while we're at it, we also need to return a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model? or the pre/post processing code?
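If it is the pre/post-processing code, a hedged sketch of what that could look like in a clearml-serving Preprocess class (the method signatures follow the clearml-serving examples but may differ between versions; the S3 path is a placeholder):

```python
from typing import Any


class Preprocess(object):
    # Sketch of a clearml-serving pre/post-processing class; signatures are
    # based on the clearml-serving examples and may vary between versions.

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # convert the request payload into the model input
        return body["data"]

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # upload the raw results somewhere and return the destination as a string
        results_url = "s3://my-bucket/results/123.json"  # placeholder for the real upload
        return {"results_uri": results_url}
```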
I assume the task is being launched sequentially. I'm going to prepare a more elaborate example to see what happens.
Let me know if you can produce a mock test, I would love to make sure we support the use case. This is a great example of using pipeline logic :)
Hi, if you don't mind having a look too,
With pleasure :)
According to the above, I was expecting the config to be auto-magically updated with the new yaml config I edited in the UI; however, it seems like an additional step is required... probably connect_dict? Or am I missing something?
Notice the OmegaConf section description: "Full OmegaConf YAML configuration. This is a read-only section, unless 'Hydra/_allow_omegaconf_edit_' is set to True". By default it will alw...
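For context, a minimal sketch of the usual Hydra + ClearML entry point (project/config names are illustrative). The composed config shows up as the OmegaConf configuration object and the overrides under the Hydra section; edits to the OmegaConf object in the UI only take effect when Hydra/_allow_omegaconf_edit_ is set to True:

```python
import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task


@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # ClearML logs the command-line overrides under the "Hydra" section and the
    # full composed config as the "OmegaConf" configuration object.
    Task.init(project_name="examples", task_name="hydra demo")  # illustrative names
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```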
Okay let's see if I can reproduce it:
new conda env py==3.8
install clearml==0.17.5rc5 matplotlib==3.3.4 numpy==1.20.1 seaborn==0.11.1
Clone the repo
run `python examples/frameworks/matplotlib/matplotlib_example.py`
Right?
Hmm, not a bad idea :)
Could you please open a Git Issue, so it will not get forgotten ?
(btw: I'm not sure how trivial it is to implement, nonetheless obviously possible :) )
Thanks @<1523702652678967296:profile|DeliciousKoala34> I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, this is why it is re-downloading (I'll make sure the agent can sort it out, because this is Nvidia's version and in reality it should be a perfect match)
Hi @<1526371965655322624:profile|NuttyCamel41>
So sorry, I just realized I have not answered it!
I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name
Okay, that is odd. Are you using the exact same containers / docker-compose? What is the difference?
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
Does that mean that even though there is a warning there, you can curl to ...
Hi MammothGoat53
Basically what you are missing are the headers with the Token you have:
https://blog.logrocket.com/secure-rest-api-jwt-authentication/
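A hedged sketch of what that looks like against the ClearML REST API (the server URL and endpoint are illustrative; the access key / secret key pair comes from your profile page): first exchange the credentials for a token, then send it as a Bearer header on every call.

```python
import requests

api_server = "https://api.clear.ml"  # replace with your own API server URL
access_key, secret_key = "<ACCESS_KEY>", "<SECRET_KEY>"

# Exchange the key/secret pair for a JWT token
token = requests.get(
    f"{api_server}/auth.login", auth=(access_key, secret_key)
).json()["data"]["token"]

# Send the token as a Bearer header on every subsequent request
response = requests.post(
    f"{api_server}/tasks.get_all",
    headers={"Authorization": f"Bearer {token}"},
    json={"status": ["completed"]},
)
print(response.json())
```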
Hi SuperficialGrasshopper36
/home/ubuntu/.clearml/venvs-builds.1/3.8/task_repository/repository_name/.venv
This is the problem, they should not be installed there, it should be in /home/ubuntu/.clearml/venvs-builds.1/3.8/
Could you post the poetry.lock file? Maybe it is something there?
What's the poetry version and clearml-agent versions?
What's the output_uri you are passing ?
And the OS / Python version?
Are you suggesting the default "ubuntu:18.04" is somehow contaminated?
This is an official Ubuntu container (nothing to do with ClearML), this is Very Very odd...
Also I would suggest using Task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
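A minimal sketch (project/queue names are illustrative): everything up to the call runs locally so the environment is captured, then the task is enqueued and the local process exits.

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")  # illustrative names

# Everything up to this call runs locally (code + environment are captured),
# then the task is enqueued on the "default" queue and the local process exits.
task.execute_remotely(queue_name="default", exit_process=True)

# From this point on, the code only executes on the agent machine.
print("running on the remote agent")
```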
Regarding the limit interface, let me check, I think this is being worked on (i.e. a nice interface that should be pushed in the next few days). Let me get back to you on this one.
How will imposing an instance limit prevent or allow the --order-fairness feature, for example, which exists when running the clearml-agent version compared to the k8s_glue_example version?
A bit of background on how the glue works:
It pulls jobs from the clearml queue, then it prepares a k8s job, and launches the k8s jobs...
Hi SucculentBeetle7
The parameters passed to add_step need to contain the section name (maybe we should warn if it is not there, I'll see if we can add it).
So maybe something like:
{'Args/param1': 1}
or
{'General/param1': 1}
Can you verify it solves the issue?
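In full, a hedged sketch with the section prefix in parameter_override (task/project names are illustrative, and the PipelineController arguments may differ slightly between SDK versions):

```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_step(
    name="stage_train",
    base_task_project="examples",
    base_task_name="train model",
    # the key must include the section name, e.g. "Args/" or "General/"
    parameter_override={"Args/param1": 1},
)
pipe.start()
```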
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?