Hmm, let me check first when it is going to be upgraded and if there is a workaround
VivaciousPenguin66 I have the feeling it is the first space in the URI that breaks the credentials lookup.
Let's test it:
```python
from clearml import StorageManager

uri = ' Birds%2FTraining/TRAIN [Network%3A resnet34, Library%3A torchvision] Ignite Train PyTorch CNN on CUB200.8611ada5be6f4bb6ba09cf730ecd2253/models/cub200_resnet34_ignite_best_model_0.pt'

# original
StorageManager.get_local_copy(uri)

# quoted
StorageManager.get_local_copy(uri.replace(' ', '%20'))
```
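As a side note (not from the thread, just a general-purpose alternative): the same quoting can be done with the standard library; keeping '%' in the safe set avoids double-encoding the escapes already present in the URI.
```python
from urllib.parse import quote

# spaces become %20, while ':', '/', '%', '[', ']' and ',' are left untouched
quoted_uri = quote(uri, safe=":/%[],")
StorageManager.get_local_copy(quoted_uri)
```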
JitteryCoyote63 see if upgrading the packages as they suggest somehow fixes it.
I have the feeling this is the same problem (the first error might be trains masking the original error)
Any chance there is an env variable you set to get 1.5.0rc0? Because this is the version that is being used
Hi BattyLizard6
does clearml orchestration have the ability to break gpu devices into virtual ones?
So this is fully supported on A100 with MIG slices. That said, dynamic multi-tenant GPU on Kubernetes is a Kubernetes issue... We do support multiple agents on the same GPU on bare metal, or shared GPU instances over k8s with:
https://github.com/nano-gpu/nano-gpu-agent
https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin#fractional-resources
http...
delete logged images and texts though
logged images are also stored there?
right click on the experiment, select Reset, now you can edit it.
Wow, thank you very much. And how would I bind my code to the task?
you mean the code that creates pipeline Tasks ?
(remember the pipeline itself is a Task in the system, basically if your pipeline code is a single script it will pack the entire thing )
Hi MysteriousBee56 ,
what do you mean by:
Can we upload our project repository to trains server?
Hmm how do you launch the autoscaler, code?
ContemplativeCockroach39 unfortunately not directly as part of clearml 😞
I can recommend the NVIDIA Triton Inference Server (I'm hoping we will have the out-of-the-box integration soon).
Meanwhile you can manually run it, see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
Docker image here:
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
Which version? Is this reproducible in this example?
None
(can you try with the latest clearml version 1.13.2?)
VexedCat68
A Dataset is published, and that activates a Dataset trigger. So if I publish one dataset every day, I activate a Dataset Trigger each day once it's published.
From this description it sounds like you created a trigger cycle, am I missing something ?
Basically you can break the cycle by saying, trigger only on New Dataset with a specific Tag (or create the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "orig...
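A minimal sketch of a tag-gated trigger (assuming the TriggerScheduler API from clearml.automation; project names, tags and the processing step are illustrative):
```python
from clearml import Dataset
from clearml.automation import TriggerScheduler

def on_new_dataset(task_id):
    # called when a matching dataset version appears;
    # the derived dataset goes to a different project so it does not re-trigger
    parent = Dataset.get(dataset_id=task_id)
    child = Dataset.create(
        dataset_name="processed-" + parent.name,
        dataset_project="datasets/auto",
        parent_datasets=[parent],
    )
    child.add_files(path="/data/processed")  # hypothetical output of your processing
    child.upload()
    child.finalize()

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    schedule_function=on_new_dataset,
    name="daily dataset trigger",
    trigger_project="datasets/raw",
    trigger_on_tags=["ready"],  # only fire on datasets carrying this tag
)
trigger.start_remotely(queue="services")
```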
I'm checking now to see where the extra ' could come from
Hi LazyLeopard18
I think that these toy examples will help:
1. Uploading a local dataset:
https://github.com/allegroai/events/blob/master/odsc20-east/generic/dataset_artifact.py
2. Pre-process data:
https://github.com/allegroai/events/blob/master/odsc20-east/generic/process_dataset.py
3. Training example:
https://github.com/allegroai/events/blob/master/odsc20-east/scikit-learn/sklearn_jupyter.ipynb
Hi SubstantialElk6
You are uploading an artifact; a good use case for a numpy artifact would be a feature table.
If you want to upload an image, use report_media or report_image, or upload a PIL image as an artifact.
What do you think?
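A minimal sketch of both options (project/series names are illustrative; assumes the standard Task / Logger APIs):
```python
import numpy as np
from PIL import Image
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact vs image report")

# numpy artifact: a good fit for tabular data such as a feature table
features = np.random.rand(100, 16)
task.upload_artifact(name="feature_table", artifact_object=features)

# image: report it so it shows up in the experiment's debug samples
img = Image.new("RGB", (64, 64), color="blue")
task.get_logger().report_image(title="samples", series="blue", iteration=0, image=img)
```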
Hi EnviousStarfish54
The Enterprise edition extends Trains functionality.
It adds security, scale and full data management (data management and versioning being the key difference)
You can get it as a SaaS solution or on-prem.
If you need more information, you can leave contact details on the website, I'm sure sales will be happy to help :)
Hi ReassuredTiger98
but I would rather just define a function that returns the task directly
🙂
Check it out:
https://github.com/allegroai/clearml/blob/36ee3d61209e413a917d8a718fb25f389143cfa1/clearml/automation/controller.py#L205
:param base_task_factory: Optional, instead of providing a pre-existing Task, provide a Callable function to create the Task (returns Task object)
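A minimal sketch of wiring such a factory into a pipeline step (names are illustrative; per the docstring above the factory is assumed to receive the pipeline node and return a Task):
```python
from clearml import Task
from clearml.automation.controller import PipelineController

def make_train_task(node):
    # called by the controller instead of cloning a pre-existing base Task
    return Task.create(
        project_name="examples",
        task_name="train step",
        repo="https://github.com/allegroai/clearml.git",
        script="examples/frameworks/pytorch/pytorch_mnist.py",
    )

pipe = PipelineController(name="factory pipeline", project="examples", version="1.0")
pipe.add_step(name="train", base_task_factory=make_train_task)
pipe.start(queue="services")
```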
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code: basically you start locally, make sure everything is passed as intended, then from within the code you call task.execute_remotely(...)
which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
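A minimal sketch of the pattern (project and queue names are illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
params = task.connect({"epochs": 3, "lr": 0.001})

# stop the local process here and enqueue this Task for an agent to execute
task.execute_remotely(queue_name="default", exit_process=True)

# everything below this line only runs on the agent
print("training with", params)
```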
So a bit of explanation on how conda is supported. First, conda is not recommended; the reason is that it is very easy to create a setup on conda that is un-reproducible by conda (yes, exactly that). So what trains-agent does is try to install all the packages it can with conda first (not one by one, because that would break conda dependencies), and the packages it failed to install from conda it will then install using pip.
or by trains
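For reference, which package manager the agent prefers is controlled in the agent configuration file (a sketch of the relevant clearml.conf / trains.conf section; the channels listed are only an example):
```
agent {
    package_manager {
        # one of: pip, conda, poetry
        type: conda
        # extra conda channels used when installing with conda
        conda_channels: ["defaults", "conda-forge", "pytorch"]
    }
}
```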
We just upload the image as is ... I think this is a SummaryWriter issue
ReassuredTiger98 are you saying you want to be able to run the pipeline as a standalone and as "remote pipeline",
Or is this for a specific step in the pipeline that you want to be able to run standalone/pipelined ?
Can you share the log?
Hi @<1523701868901961728:profile|ReassuredTiger98> when you get to it...
please download the wheel, then install it with:
```
pip3 install -U clearml_agent-0.17.3rc0-py3-none-any.whl
```
Then run the daemon with the additional --debug argument, basically:
```
clearml-agent --debug daemon --foreground ...
```
Once the agent is running please send the Task's log from your console 🙂
Hi SubstantialElk6
UnicodeEncodeError: 'ascii' codec can't encode characters in position 296-297: ordinal not in range(128)
I'm assuming this is the usual UTF8 missing from the container.
Can you try to launch it with PYTHONIOENCODING=utf-8?
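For example (a sketch; the script name is illustrative and the right place to set the variable depends on how the container is launched):
```
# when launching the process directly
PYTHONIOENCODING=utf-8 python my_script.py

# or when starting the container
docker run -e PYTHONIOENCODING=utf-8 ...
```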
That makes no sense to me?!
Are you absolutely sure the nntrain is executed on the same queue? (basically could it be that the nntraining is executed on a different queue in these two cases ?)
Hi @<1571308003204796416:profile|HollowPeacock58>
could you share the full log ?
In your "Additional ClearML Configuration" field
(which is basically the clearml.conf configuration)
add the following:
```
environment {
    GOOGLE_APPLICATION_CREDENTIALS = "~/gs.cred"
}
files {
    gsc {
        contents: "<this is your GCP storage credentials file>"
        path: "~/gs.cred"
    }
}
```
Reference:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L421
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a...
If you want to rename it (any pipeline), click on the "Full details" in the "Run Info" (right hand side panel), then in the full detail of the Pipeline Task you will be able to rename the pipeline execution
(Is renaming useful? should we add a right click to rename ?)