JitteryCoyote63 nice hack 😄
how come it is not automatically logged as console output?
Hmm let me rerun (offline mode, right?)
Any plans to add unpublished state for clearml-serving?
Hmm OddShrimp85 do you mean like a flag, marking it as not being served?
Should we use archive?
The publish state basically locks the Task/Model so they cannot be changed. Should we enable unlocking (i.e. un-publish), wdyt?
Yes, only note that task.execute_remotely() should be the last call, because it will literally stop the local run before you add the Args section.
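Something along these lines (a minimal sketch; the project/queue names and the epochs argument are placeholders):
`import argparse
from clearml import Task

# placeholder argparse setup; Task.init() will capture these automatically
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=10)
args = parser.parse_args()

task = Task.init(project_name='examples', task_name='remote run')

# last call: this stops the local run and enqueues the task for remote execution
task.execute_remotely(queue_name='default')`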
Thanks BroadSeaturtle49
I think I was able to locate the issue: != breaks the PyTorch lookup.
I will make sure we fix it ASAP and release an RC.
BTW: how come 0.13.x has no Linux x64 support? And the same for 0.12.x:
https://download.pytorch.org/whl/cu111/torch_stable.html
BTW: you can always point to a different config file with an environment variable: CLEARML_CONFIG_FILE="path/to/config/file"
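From Python that could look like this (a minimal sketch; I'm assuming the variable is set before clearml loads its configuration, i.e. before the import and the first Task.init):
`import os

# must be set before clearml reads its configuration
os.environ['CLEARML_CONFIG_FILE'] = '/path/to/config/file'

from clearml import Task
task = Task.init(project_name='examples', task_name='custom config')`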
Hmm can you try: --args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']"
Neat! Please update us on your progress, maybe we should add an upgrade section once you have the details worked out
it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
BoredHedgehog47 of course you can scale on multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod actually runs the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
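To illustrate the in-code communication part (a hedged sketch assuming PyTorch DDP, with MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE injected into each replica by the k8s manifest):
`import torch.distributed as dist

# every replica runs this exact same code; rank/world size come from the env
dist.init_process_group(backend='nccl', init_method='env://')
print(f'replica {dist.get_rank()} of {dist.get_world_size()} is up')`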
ElegantCoyote26
`parser = get_parser()
args_ = vars(parser.parse_args())
task.connect(args_)`
There is no need to connect args_; Task.init will automatically catch the argparser.
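i.e. this should be enough (a minimal sketch; the --lr argument is a placeholder):
`import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='argparse capture')

# Task.init hooks argparse, so the parsed arguments appear in the Args section
parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.01)  # placeholder argument
args = parser.parse_args()
# no task.connect(vars(args)) needed`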
OK, I got it by modifying the .conf file and putting the credentials on the node
Nice! 🙂
Yes, I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code which reads from BigQuery
Are you certain you have no artifacts on GS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashes when starting?
`task = Task.get_task('task_id_here')
task.mark_started(force=True)
task.upload_artifact(..., wait_on_upload=True)
task.mark_completed()`
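For context: wait_on_upload=True makes the upload_artifact call block until the artifact is fully uploaded, so the mark_completed() that follows does not close the task while the upload is still in flight.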
I am trying to see if the user can submit a list of resource requirements (e.g. 4 GPUs, 12 cores, 100GB disk space) when queuing a task, and have the agents pick up these tasks if they have the requested resources. With this, the user need not think about which queue to send the task to. The users just state what they need and the agents do the scheduling for them.
Can I assume we are talking Kubernetes under the hood for the resource allocation ?
That would match what add_dataset_trigger and add_model_trigger already have, so it would be good
Sounds good, any chance you can open a github issue, so that we do not forget?
Another parameter for when the task is deleted might also be useful
That actually might be more complicated, because there might be a race condition, basically missing the delete operation...
What would be the use case?
I might have an idea, could you test with:
` from clearml import Task
Task._report_subprocess_enabled = False
...
real code here `
This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?
See update_weights_package, it actually packages an entire folder as a zip and will do the extraction when you get it back (check the function docstring, I think you can also specify a wildcard etc. if needed)
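A minimal sketch of the folder round-trip (project names are placeholders; check the docstrings for the exact options):
`from clearml import Task, OutputModel, InputModel

task = Task.init(project_name='examples', task_name='package weights folder')

# packages the entire folder as a single zip and uploads it
output_model = OutputModel(task=task)
output_model.update_weights_package(weights_path='./checkpoints')

# later / elsewhere: fetching the package extracts the zip back locally
# model = InputModel('model_id_here')
# local_files = model.get_weights_package()`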
Why do you see this as preferred to the dataset method we have now?
So it answers a few requirements that you raised
It is fully visible as part of the project and se...
Hmm I think you have a point here, the confusing part is the cp cmd. Can you send the full log? (Regardless, can I assume you are running a rootless container?)
Then check clearml.conf under files_server
And use what you have there (for example http://localhost:8081)
BitingKangaroo95 can you post here the entire console output of clearml-session (including full command line)?
UnevenDolphin73 it seems this is a UI browser limit, which means we will need to move it into the server...
See here: https://clearml.slack.com/archives/CTK20V944/p1640247879153700?thread_ts=1640135359.125200&cid=CTK20V944
hmm that is odd, let me check
Hi OutrageousSheep60
Is there a way to instantiate a clearml-task while providing it a Dockerfile that it needs to build prior to executing the task?
Currently not really, as at the end the agent does need to pull a container.
But you can achieve basically the same by adding the "dockerfile" script as --docker_bash_setup_script. Notice of course that this is an actual bash script, not a Docker script, so no need for the "RUN" prefix.
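If it helps, the SDK side has a similar knob (a hedged sketch; I'm assuming a recent clearml version where Task.set_base_docker accepts a setup script):
`from clearml import Task

task = Task.init(project_name='examples', task_name='container setup')

# assumption: docker_setup_bash_script takes the "dockerfile" lines as plain
# bash (no RUN prefix), executed inside the container before the task starts
task.set_base_docker(
    docker_image='python:3.9',
    docker_setup_bash_script=['apt-get update', 'apt-get install -y git'],
)`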
wdyt?
Are you seeing the argparse arguments in the UI (when running locally) ?
Do you think this is better? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
files_server: ://genuin-ai/
should be:
files_server: