if I build a custom image, do I have to host it on dockerhub for it to run on the agent?
You don't need to host it, but in that case the machine running the agent should already have the image (you can verify on the machine with `docker images`).
If not how do I make the agent aware of my custom image?
Once the image is set as the base docker image for the task, and the image was verified to exist on the agent's machine, the agent should be able to use it.
Can you share the exception you get for `--gpus "0,1"`?
Hi GrievingTurkey78 ,
Can you try running the https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py example? Do you get metrics with it?
This is a nice issue to open in https://github.com/allegroai/trains :)
You are running in docker mode, and I don't think the `nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04` image you are using has conda installed.
Do I still need to do this?
`dataset.upload()` and `dataset.finalize()`
if you want to finalize the dataset, yes
If we have already uploaded data to ClearML, how do we add more data? This is how I do it right now:
dataset = Dataset.create(dataset_project=metadata[2], dataset_name=metadata[3], description=description, output_uri=f"", parent_datasets=[id_dataset_latest])
If you finalized it, you can create a child version - https://clear.ml/docs/latest/docs/clearml_data/data_manage...
You can register the links only (no need to download and upload): `clearml-data add --links` from the CLI, or `add_external_files` from code:
`dataset.add_external_files(source_url="")`
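As a sketch of the link-registration idea: the real call is the `add_external_files` one above; the `is_registrable_link` helper and the bucket path below are hypothetical, just to illustrate which URLs make sense as external links versus local files.

```python
from urllib.parse import urlparse

def is_registrable_link(url):
    """Return True if the URL looks like a remote object link (s3://, gs://,
    azure://, http(s)://) that can be registered without uploading."""
    return urlparse(url).scheme in ("s3", "gs", "azure", "http", "https")

links = [
    "s3://my-bucket/data/train.csv",  # hypothetical bucket path
    "/local/path/train.csv",          # a local path: add it as a file, not a link
]
registrable = [u for u in links if is_registrable_link(u)]
print(registrable)  # ['s3://my-bucket/data/train.csv']
```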
Hi QuaintJellyfish58 ,
Not sure I’m getting it, can you describe your scenario? Are you referring to https://clear.ml/docs/latest/docs/clearml_data/clearml_data ?
Hi UnsightlySeagull42 , I didn't really get your setup. Do you have more than one CUDA version on your system?
I can’t use Docker because I need 4 different Tensorflow versions and my company is not allowed to use conda.
You can use Docker without conda
Hi GiganticTurtle0 ,
All the packages you are using should be listed under the `installed packages` section in your task (in the UI); ClearML analyzes your imports, and the full report should appear under this section.
You can add any package you like with `Task.add_requirements('tensorflow', '2.4.0')` for tensorflow version 2.4.0 (or `Task.add_requirements('tensorflow', '')` for no version limit).
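As a small illustration of what the two-argument form amounts to (the `to_pip_specifier` helper below is hypothetical, not a ClearML API; as I understand it, a concrete version pins the package in the requirements, and an empty string leaves it unpinned):

```python
def to_pip_specifier(package, version=""):
    """Hypothetical helper: build the pip requirement line that a
    (package, version) pair corresponds to."""
    return "{}=={}".format(package, version) if version else package

print(to_pip_specifier("tensorflow", "2.4.0"))  # tensorflow==2.4.0
print(to_pip_specifier("tensorflow", ""))       # tensorflow
```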
If you don't want the package analyzer, you can configure it in your ~/clearml.conf file: ` sdk.development.detect_with_...
Hi EagerStork23 , sure, I'll check it, but I will need some more information.
I ran this example: https://github.com/allegroai/trains/blob/master/examples/matplotlib_example.py which produces 3 different plots, and I can see them in the task's plots section. Can you share what you run (general code without data)? A link to the task?
Hi EagerStork23 ,
Thanks for catching this bug.
We also caught this issue, so a fix is scheduled to be released in one of the coming versions.
I'll update here once it is released 🙂
The next version (0.14.0) should be released in a few weeks 🙂.
EagerStork23 However, the issue is only in the presentation of the graph (it occurs when all subplots have the same label), so you can use the following workaround to solve it:
` import matplotlib.pyplot as plt

def plot_subplots():
    fig = plt.figure()
    ax1 = fig.add_subplot(2, 2, 1)
    ax2 = fig.add_subplot(2, 2, 2)
    ax3 = fig.add_subplot(2, 2, 3)
    ax4 = fig.add_subplot(2, 2, 4)
    x = range(10)
    y = range(10)
    ax1.plot(x, y)[0].set_label("label1")
    ax2.plot(x, y)[0]...
Hi UnsightlyShark53 ,
If you want to save anything from your experiment, you can use artifacts; see this example - https://github.com/allegroai/trains/blob/master/examples/artifacts_toy.py
Can this do the trick?
Hi UnsightlyShark53 ,
Trying to understand the scenario: you want the model to be saved in the `trains_storage` dir, but `trains` saves it in `trains_storage/trains_storage`? Or does `torch.save` not save to the path?
Trains does patch the torch save function 🙂
If you like, you can save the model for each epoch by giving each one a unique name. The model will be saved under the `output_uri` path you set in the `Task.init` call.
For example, this code will save a model for every epoch:
for epoch in range(num_of_epoch):
    # Create a model
    torch.save(model, "model_epoch_{}".format(epoch))
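A runnable sketch of the same per-epoch naming pattern, using `pickle` and a temp directory as stand-ins for `torch.save` and the task's `output_uri` (the model dict is a placeholder):

```python
import os
import pickle
import tempfile

out_dir = tempfile.mkdtemp()  # stand-in for the task's output_uri
num_of_epoch = 3

for epoch in range(num_of_epoch):
    model = {"weights": [epoch]}  # stand-in for a real model state
    # a unique name per epoch means no checkpoint overwrites the previous one
    path = os.path.join(out_dir, "model_epoch_{}".format(epoch))
    with open(path, "wb") as f:
        pickle.dump(model, f)  # torch.save(model, path) in the real code

print(sorted(os.listdir(out_dir)))  # ['model_epoch_0', 'model_epoch_1', 'model_epoch_2']
```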
Hi UnsightlyShark53 ,
You can disable the automatic argparse connection by setting `auto_connect_arg_parser=False` in the `Task.init` call.
You can connect / disconnect other parts too if you like:
https://github.com/allegroai/trains/blob/master/trains/task.py#L166
Can this do the trick for you?
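For context, a minimal sketch of the argparse side; the `Task.init` call is left as a comment since it needs a ClearML/Trains setup, and the `--lr` argument is just an example:

```python
import argparse

# from trains import Task
# task = Task.init(project_name="demo", task_name="no-auto-argparse",
#                  auto_connect_arg_parser=False)  # args below are NOT auto-logged

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)  # example argument
args = parser.parse_args(["--lr", "0.1"])
print(args.lr)  # 0.1
```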
One of the following objects: numpy.array, pandas.DataFrame, PIL.Image, dict (JSON), or pathlib2.Path.
Also, if you used `pickle`, the `pickle.load` return value is returned, and for strings a `txt` file (as it is stored).
Hi ConvolutedChicken69 , `Dataset.upload()` will upload the data as an artifact to the task and will allow others to use the dataset (ClearML agents, for example, running and using the data with `Dataset.get()`).
Hi @<1687643893996195840:profile|RoundCat60>
I think the best way will be to configure a default `output_uri` to be used by all tasks: None , under `default_output_uri` just write your bucket path ( None ).
When using S3 / Google Storage / Azure, you will also need to add your credentials in this section - None (s3 in ...
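For reference, the relevant `~/clearml.conf` fragment looks roughly like this (the bucket path and credentials are placeholders; adjust to your storage):

```
sdk {
    development {
        # every Task.init will default to this output destination
        default_output_uri: "s3://my-bucket/clearml"
    }
    aws {
        s3 {
            key: "<access_key>"
            secret: "<secret_key>"
        }
    }
}
```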
Hi MysteriousBee56 .
What trains-agent version are you running? Do you run it in docker mode (e.g. `trains-agent daemon --queue <your queue name> --docker`)?
If you are fetching a specific task artifact, you'll get an `Artifact` object (`trains.binding.artifacts.Artifact`).
Hi GloriousPanda26 ,
You can cast the `omegaconf.dictconfig.DictConfig` to a `dict` and connect it:
` t = Task.init(project_name="Hydra", task_name="Hydra configuration")
conf = OmegaConf.create({"a": {"b": 10}, "c": 20})
t.connect_configuration(dict(conf), name="Hydra dict configuration") `
Can this do the trick?
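A plain-Python stand-in for what the cast does: OmegaConf's `to_container` performs this conversion recursively (no omegaconf needed here; the nested dict mimics the `DictConfig` contents, and `to_plain_dict` is an illustrative helper, not an OmegaConf API):

```python
conf = {"a": {"b": 10}, "c": 20}  # mimics OmegaConf.create({"a": {"b": 10}, "c": 20})

def to_plain_dict(node):
    """Recursively turn nested mappings into plain dicts, similar to what
    OmegaConf.to_container does for a DictConfig."""
    if isinstance(node, dict):
        return {k: to_plain_dict(v) for k, v in node.items()}
    return node

plain = to_plain_dict(conf)
print(plain)  # {'a': {'b': 10}, 'c': 20}
```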
Hi GloriousPanda26 , great, I'll check that. I didn't understand whether the original usage got you the configuration or not (got that with `to_container` you can `connect_configuration`).
This seems to be the same issue like in https://clearml.slack.com/archives/CTK20V944/p1633599511350600
What's the `pyjwt` version you are using?