Hmm that makes sense, I "think" the enterprise offering has a solution for that as well (i.e. full separation over static cluster), but probably the best way to constituent this avenue is talk to Sales (I'm assuming they'll setup a call to discuss the details)
Going back to the open source, I think that adding the credentials as part of the source code might allow to have "credentials" auto populate as part of the remote execution, wdyt?
Can you let me know if i can override the docker image using template.yaml?
No, you cannot.
But you can pass OS environment "CLEARML_DOCKER_IMAGE" to set a diff default one
EnviousStarfish54 you can also run the docker-compose on one of the machines on your local LAN. but then you will not be able to access it from home 🙂
What's the exact error you are getting ?
(Maybe this is privilege error on the cache folder, what are the folders it is using, you can see in the configuration as well)
WickedGoat98 give me a minute, I'm not sure it is not ClearML related
I commented the upload_artifact at the end of the code and it finishes correctly now
upload_artifact caused the "failed" issue ?
And the agent section on this machine is:api_server:
web_server:
files_server:
Is that correct?
You can disable it with:
Task.init('example', 'train', auto_connect_frameworks={'pytorch': False})
but instead, they cannot be run if the files they produce, were not committed.
The thing with git, if you have new files and you did not add them, they will not appear in the git diff, hence missing when running from the agent. Does that sound like your case?
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I'm assuming you need multiple "file-server" instances running on the "clearml-server" with a load-balancer of a sort...
Hi FreshKangaroo33
clearml.conf is HOCON format, to parse you can use pyhocon:
https://github.com/chimpler/pyhocon
Or the built in version of clearml:from clearml.utilities.pyhocon import ConfigFactory config_dict = ConfigFactory.parse_string(text).as_plain_ordered_dict()
You can also just get the parsed objectfrom clearml.config import config_obj
in the docker-compose file. Still strange...
hmm yes it is... If you have an idea on what went wrong let me know, we would love to fix it
BTW: any specific reason for going the RestAPI way and not using the python SDK ?
BoredHedgehog47 were you able to locate the issue ?
Just a bit of background, the execute)remotely will kill the current process (after the Task is synced) and enqueue the Task that was created for remote execution. What seems to fail is actually killing the current process. You can just pass exit_process=False
Yep I think you are correct, you should have had the same output as a local jupyter notebook, and it seems that in sagemaker studio it is not working 😞
Let me check something
Hi @<1631826770770530304:profile|GracefulHamster67>
if you want your current task:
task = Task.current_task()
if you need the pipeline Task from the pipeline component
pipeline = Task.get_task(Task.current_task().parent)
where are you trying to get the pipelines from? I'm not sure I understand the use case?
SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works,
the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once").
The DAG nodes themselves are executed on the trains-agent , which means they reproduce the code / env for every cloned Task in the DAG (not on the original Tasks).
WDYT?
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
Hi PompousBeetle71
I remember it was an issue, but it was solved a while ago. Which Trains version are you using?
this is not the case as all the scalars report the same iterations
MassiveHippopotamus56 could it be the the machine statistics? (i.e. cpu/gpu etc. these are considered scalars as well...)
That said, it might be different backend, I'll test with the demoserver
@<1523720500038078464:profile|MotionlessSeagull22> you cannot have two graphs with the same title, the left side panel presents graph titles. That means that you cannot have a title=loss series=train & title=loss series=test on two diff graphs, they will always be displayed on the same graph.
That said, when comparing experiments, all graph pairs (i.e. title+series) will be displayed as a single graph, where the diff series are the experiments.
RC you can see on the main readme, (for some reason the Conda badge will show RC and the PyPi won't)
https://github.com/allegroai/clearml/
i had a misconception that the conf comes from the machine triggering the pipeline
Sorry, this one :)
HmmCLEARML_CUSTOM_BUILD_OUTPUT
This might be an enterprise feature, I'm not aware of anything in the open source version
For some reason copying over everything and making another file and running it there does not allow it to run
Not sure i follow...
you should only have one ~/clearml.conf nad from wherever you are running your code it will always read the configuration from the same file