Does this mean that Task stores --args (and propagates them further through the code as CLI arguments) somewhere where I can get and manipulate them from my code?
Yes, it patches the actual argparse object and pushes the new values at runtime; basically your parser.parse_args() will return the values from the UI (backend).
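A minimal sketch of the order that matters here, assuming a script with hypothetical --lr and --epochs arguments: Task.init is called before parse_args(), so when an agent runs a cloned task, the parser returns the values edited in the UI. The parser is built in a separate function so it can be exercised without a ClearML server.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical arguments, for illustration only
    parser = argparse.ArgumentParser(description="training script")
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--epochs", type=int, default=10)
    return parser


def main() -> None:
    # Imported here so build_parser() can be used without clearml installed
    from clearml import Task

    # Task.init patches argparse: when executed by an agent, parse_args()
    # returns the values stored on the task (editable in the UI)
    task = Task.init(project_name="examples", task_name="argparse example")
    args = build_parser().parse_args()
    print(task.id, args.lr, args.epochs)
```

Call main() from your entry point; running it locally creates the task, and cloning it in the UI lets you edit --lr / --epochs before enqueueing it.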
Hi LazyFox65
So the idea is that you add two lines of code to your codebase:
from clearml import Task
task = Task.init(project_name='examples', task_name='change me')
Then you run it once, and it will create the experiment, capturing the environment, arguments, etc.
Now that you have it in the UI you can clone / change all the fields and send for execution.
That said you can also create an experiment from CLI (basically pointing to a repo and entry point)
You can read here:
https://github.com/allegroa...
https://github.com/allegroai/clearml/issues/199
Seems already supported for a while now ...
WorriedParrot51 I now see ...
Two solutions that I can quickly think of:
1. In the code add:
import sys
sys.path.append('./my_sub_module')
Assuming you always have to add the sub-directories to make the code work, and assuming they are part of the repository, this is probably the stable solution.
2. In the UI, in the Docker base image field, add -e PYTHONPATH=/folder
or from code (which is exactly what you did)
There is also a clean interface for it: task.set_base_docker('nvidia/cuda -e PYTHONPATH=/folder')
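For the first option, a slightly more robust variant resolves the sub-module folder relative to the script instead of the current working directory, which may differ when an agent executes the task. A minimal sketch; add_sub_module_path is a hypothetical helper and my_sub_module is the folder name from the example above:

```python
import os
import sys


def add_sub_module_path(sub_dir: str, base_dir: str) -> str:
    """Append base_dir/sub_dir to sys.path (only once) and return the absolute path."""
    path = os.path.abspath(os.path.join(base_dir, sub_dir))
    # Avoid piling up duplicate entries if this runs more than once
    if path not in sys.path:
        sys.path.append(path)
    return path
```

In a script you would call add_sub_module_path('my_sub_module', os.path.dirname(os.path.abspath(__file__))) before importing from that folder. Appending (as in the original suggestion) means installed packages with the same name still take precedence; use sys.path.insert(0, ...) if you want the opposite.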
Anyway, in the docs, there is a function called task.register_artifact()
Yes, this is rather deprecated... The idea is that it will monitor an object and auto-sync it (i.e. serialize and upload).
That said, it is just so much easier to call task.upload_artifact, and you can always update/overwrite the artifact by passing the same name, so I cannot see the actual use case. Does that make sense? What are you using it for?
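A minimal sketch of the upload_artifact alternative, assuming a hypothetical track_metrics loop whose stats dict stands in for real metrics: uploading under the same artifact name at every checkpoint keeps one up-to-date artifact instead of relying on register_artifact's auto-sync. The task is passed in, so any ClearML Task (e.g. the one returned by Task.init) can be used.

```python
from typing import Any, Dict


def track_metrics(task: Any, epochs: int, every: int = 5) -> Dict[str, float]:
    stats: Dict[str, float] = {}
    for epoch in range(1, epochs + 1):
        stats["last_epoch"] = float(epoch)  # stand-in for real metrics
        if epoch % every == 0 or epoch == epochs:
            # Same name each time: the stored artifact is overwritten, not duplicated.
            # A copy is passed so later in-place changes don't affect this upload.
            task.upload_artifact(name="stats", artifact_object=dict(stats))
    return stats
```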
Hi NuttyLobster9
First, nice workaround!
Second, could you send the full log? When the venv is skipped, PyTorch resolving should be skipped as well, and no error should be raised...
And lastly, could you also send the log of the task that executed correctly (the one you cloned)? You are correct, it should have been the same.
and then in Preprocess:
self.model = get_model(task_id=os.environ['TASK_ID'], model_name=os.environ['MODEL_NAME'])
That's the part I do not get. Models have their own entity (with a UID), in contrast to artifacts, which are only stored on Tasks.
The idea is that when you register a model with clearml-serving, you specify the model ID; this should replace the need for TASK_ID + model_name in your code, and clearml-serving will basically bring the model to you.
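A minimal sketch of what that looks like on the serving side, assuming the custom-engine Preprocess interface in which clearml-serving downloads the registered model and passes its local path to a load(local_file_name) method (check the preprocess template shipped with your clearml-serving version), and assuming the model file is a plain pickle:

```python
import pickle
from typing import Any, Optional


class Preprocess(object):
    def __init__(self) -> None:
        self.model: Optional[Any] = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # local_file_name points at the already-downloaded model file, resolved from
        # the model ID you registered, so no TASK_ID / MODEL_NAME lookup is needed.
        # Assumption: the model was saved with pickle.
        with open(local_file_name, "rb") as f:
            self.model = pickle.load(f)
        return self.model
```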
Basically this fun...
Hmm yeah I can see why...
Now that I think about it, at least in theory the second process that torch creates should inherit from the main one, and as such Task.init is basically "ignored".
Now I wonder why your first version of the code did not work?
Could it be that we patched the argparser on the subprocess and that we should not have?
Hi JitteryCoyote63
NVIDIA_VISIBLE_DEVICES is set automatically for the process that trains-agent spins up, so it is transparent to your code: you can only "see" GPU 0.
(Obviously not using docker you can forcefully change the OS environment in runtime, but you should avoid that ;))
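To make "you can only see GPU 0" concrete: if the agent assigns, say, physical GPU 3 to the task, it sets NVIDIA_VISIBLE_DEVICES=3, and the framework inside the process typically enumerates that card as index 0. The hypothetical helper below only reads the variable and maps in-process indices to the ids the agent assigned; it does not query the driver, so values such as "all" are left unresolved.

```python
import os
from typing import Dict, Mapping, Optional


def visible_gpu_map(env: Optional[Mapping[str, str]] = None) -> Dict[int, str]:
    """Map each in-process GPU index to the device id listed in NVIDIA_VISIBLE_DEVICES."""
    env = os.environ if env is None else env
    value = env.get("NVIDIA_VISIBLE_DEVICES", "").strip()
    # No explicit ids to map (unset, disabled, or "all" which needs a driver query)
    if value in ("", "none", "void", "all"):
        return {}
    devices = [d.strip() for d in value.split(",") if d.strip()]
    # The first listed device becomes index 0 inside the process, and so on
    return dict(enumerate(devices))
```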
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value in a section name called "new section"
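A short sketch contrasting the two calls, assuming task is the object returned by Task.init; the hypothetical connect_settings helper just groups them. connect_configuration keeps a nested dict as one configuration section, while connect exposes a flat dict's key/value pairs under the given section name. Both return the object to use, which, when an agent runs the task, may reflect values edited in the UI.

```python
from typing import Any, Dict, Tuple


def connect_settings(
    task: Any, nested_config: Dict[str, Any], flat_params: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    # Nested dict stored as a single configuration section named "my_section_name"
    nested_config = task.connect_configuration(nested_config, name="my_section_name")
    # Flat dict: each key/value shows up in a section called "new section"
    flat_params = task.connect(flat_params, name="new section")
    return nested_config, flat_params
```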
It does not upload; the default behavior is to log the artifact (so you know where you stored it) without enforcing unnecessary uploads.
If you were to change:
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example')
to:
task = Task.init(project_name='examples', task_name='Keras with TensorBoard example', output_uri=" ")
it would also upload the model.
Make sure you have the S3 credentials in your agent's clearml.conf:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L210
Okay, so you want to take the jupyter notebook (aka colab) and have that experiment show on Trains, then use the Trains UI to launch it remotely on one of the machines running the trains-agent. Is that correct?
This workflow however is the only way I have found to easily fix my previous ‘Module not found’ errors
Hmm okay, makes sense.
Did you try to set these ?
or even hack the sys.path with something like:
import sys, os
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
Do you have a specific numpy version you are installing? Why is it trying to install the wheel from code?
That means I need to pass a single zip file to the path argument in add_files, right?
Actually the opposite: you pass a folder (of files) to add_files. add_files then remembers the files' location (and pre-calculates the hash of the files' content). When you call upload, it will actually compress the files that changed into a zip file (or several files, depending on the chunk size), and upload them to the destination (as specified in the upload call...
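To make the add_files / upload split concrete, here is a conceptual sketch (not ClearML's actual implementation, and without chunking) of the idea: hash the folder contents when files are added, then at upload time compress only the files whose hashes changed. With the real SDK you simply pass the folder to add_files and then call upload. The hash_folder and zip_changed helpers are hypothetical.

```python
import hashlib
import os
import zipfile
from typing import Dict, List


def hash_folder(folder: str) -> Dict[str, str]:
    """Return {relative_path: sha256 hex digest} for every file under folder."""
    hashes: Dict[str, str] = {}
    for root, _dirs, files in os.walk(folder):
        for fname in files:
            full = os.path.join(root, fname)
            with open(full, "rb") as f:
                hashes[os.path.relpath(full, folder)] = hashlib.sha256(f.read()).hexdigest()
    return hashes


def zip_changed(folder: str, previous: Dict[str, str], zip_path: str) -> List[str]:
    """Zip only new or modified files (compared with `previous`); return their paths."""
    current = hash_folder(folder)
    changed = sorted(p for p, h in current.items() if previous.get(p) != h)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for rel in changed:
            zf.write(os.path.join(folder, rel), arcname=rel)
    return changed
```

Writing the zip outside the source folder keeps it from being hashed on the next pass.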
Hi UnsightlyShark53, I think you are absolutely right; there is no reason for the trains.errors.UsageError: ArgumentParser.parse_args() ... error.
As you mentioned, if auto_connect_arg_parser is False, it should just ignore what it picked up automatically.
I will make sure the error is resolved. I will also make sure you will still be able to connect the argparse manually with task.connect(parser) after the Task has been created. Thanks for the reference! I took a look o...
docker mode. they do share the same folder with the training data mounted as a volume, but only for reading the data.
Any chance they try to store the TensorBoard on this folder ? This could lead to "No such file or directory: 'runs'" if one is deleting it, and the other is trying to access, or similar scenarios
Hi VirtuousFish83 ,
Is it throwing an exception? Are you seeing the plot in the UI but the title is incorrect?
Is ClearML combined with DataParallel or DistributedDataParallel officially supported / should that work without many adjustments?
Yes, it is supported and should work.
If so, would it be started via python ... or via torchrun ... ?
Yes, it should, hence the request for a code snippet to reproduce the issue you are experiencing.
What about remote runs, how will they support the parallel execution?
Supported. You should see in the "script entry" something like "-m -m torch.di...
Or is this a feature of hyperdatasets and i just mixed them up.
Ohh yes, this is it. Hyper Datasets are part of the UI (i.e. there is a tab with the HyperDataset query). Dataset usage is currently listed on the Task. Makes sense?
Is it possible to get the folder with the artifacts/models?
You can directly get the artifacts/models URL, then deduce the folder:
from clearml import Task
task = Task.get_task('my_task_id')
print(task.artifacts['my artifact'].url)
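Deducing the folder is plain string handling on the URL. A small sketch, assuming the artifact URL ends with the file name; the artifact_folder helper and the sample paths in the usage below are illustrative only:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def artifact_folder(url: str) -> str:
    """Drop the last path component (the file name) from an artifact URL or path."""
    parts = urlsplit(url)
    folder = posixpath.dirname(parts.path)
    # Keep scheme and host (e.g. s3://bucket), drop any query string / fragment
    return urlunsplit((parts.scheme, parts.netloc, folder, "", ""))
```

For example, artifact_folder(task.artifacts['my artifact'].url) gives the folder holding that artifact; parsing the URL (rather than a plain string split) keeps query strings on signed URLs out of the result.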
the SDK is unable to see each of the nodes?
Exactly! I mean, I love the idea of a "nested" component, but implementation-wise this is not trivial, and it would also hurt the ability to cache individual components. The workaround is to have all the "business logic" in the pipeline function itself; routing data between components is basically "free". The data does not actually go through the pipeline logic, it only passes a reference (unless the pipeline logic actually tries to access the data o...
ReassuredTiger98
will it then be used by the clearml-agent
Yes, I think that in order to make it work, you have to make sure that the agent is also running with TRAINS_LOG_ENVIRONMENT=MYVAR*
Notice that you can use a wildcard or a list of variables that you allow either clearml or the agent to monitor / change.
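If you are unsure which variables a wildcard such as MYVAR* would cover, a quick conceptual check is shell-style matching with Python's fnmatch. This assumes the ClearML / agent wildcard behaves like shell-style globbing; matching_env_vars is a hypothetical helper, not part of either package.

```python
import fnmatch
import os
from typing import Iterable, List, Mapping, Optional


def matching_env_vars(patterns: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sorted names of environment variables matching any wildcard pattern."""
    env = os.environ if env is None else env
    pats = list(patterns)
    # fnmatchcase: case-sensitive, like environment variable names on Linux
    return sorted(name for name in env if any(fnmatch.fnmatchcase(name, p) for p in pats))
```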
Hi JitteryCoyote63
The easiest way is to inherit from the ResourceMonitor class and change the default logging rate (you could also disable some of the metrics).
https://github.com/allegroai/clearml/blob/701fca9f395c05324dc6a5d8c61ba20e363190cf/clearml/task.py#L565
Then pass the new class to Task.init as auto_resource_monitoring
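A sketch of that subclass, assuming the ResourceMonitor constructor takes the task as its first argument and accepts a report_frequency_sec keyword (verify against the resource monitor source for your clearml version). The base class is passed in as a parameter, so the hypothetical make_slower_monitor factory does not hard-code an import path, and it only changes the default, so an explicitly passed value still wins.

```python
from typing import Any, Type


def make_slower_monitor(base_cls: Type[Any], report_frequency_sec: float = 300.0) -> Type[Any]:
    """Return a subclass of base_cls whose default reporting interval is report_frequency_sec."""

    class SlowerResourceMonitor(base_cls):
        def __init__(self, task: Any, *args: Any, **kwargs: Any) -> None:
            # Only fill in the default; a report_frequency_sec keyword passed by the caller wins
            kwargs.setdefault("report_frequency_sec", report_frequency_sec)
            super().__init__(task, *args, **kwargs)

    return SlowerResourceMonitor
```

Usage, assuming the class lives at clearml.utilities.resource_monitor: Task.init(project_name='examples', task_name='slower monitor', auto_resource_monitoring=make_slower_monitor(ResourceMonitor)).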
Is there any contingency plan for an agent to continue running a task without reading the repository on the GitLab server?
Not sure what can be done ... any suggestions ?
At runtime, can I ask the agent to use some cached repository?
Sometimes you will have it (as the agent stores a cached copy), but I would hardly count on it (and it might be at different states on different machines...)
... (due to regular maintenance service, something I cannot control).
Maybe let "th...
WackyRabbit7 How do I reproduce it ?
It’s the correct way to do it, right?
Yep 🙂 That said, this is not running as a service; you will need to spin it up on your machine. You can definitely connect it with the free SaaS server and spin up the serving on your machine with docker-compose.
There is no dataset.close() 🙂