Hey Yasir, to use TensorFlow prefetch your data needs to be (1) chunked and (2) stored on some server/bucket/network-attached FS. Unless both conditions are satisfied, TF prefetch won't help you.
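For illustration, a minimal sketch of the kind of input pipeline I mean (the bucket path and shard layout are made up, adjust to your data):
```python
import tensorflow as tf

# Hypothetical sharded dataset sitting in a bucket (chunked + remote storage)
files = tf.data.Dataset.list_files("gs://my-bucket/train-shards/*.tfrecord")

dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap data loading with training steps
)
```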
How large is the dataset we're talking about?
Yes, works with GCP too
That's not that much. You can use the AWS autoscaler and provision a spot g4dn GPU instance with a bit more disk. This should cost you less than 50 cents an hour
It won't, for that you need full support from Ultralytics
@<1637624992084529152:profile|GlamorousChimpanzee22> Using localhost, I'm assuming it's MinIO. Is the S3 path you're trying to access something like this: None <some file or dir>?
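For example, something like this (bucket and file names are placeholders, assuming MinIO is listening on port 9000):
```python
from clearml import StorageManager

# Hypothetical MinIO-style path: s3://<host>:<port>/<bucket>/<file or dir>
local_path = StorageManager.get_local_copy(
    remote_url="s3://localhost:9000/my-bucket/data/my_file.csv"
)
print(local_path)
```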
And how many agents do you have listening on the “services” queue?
Also, make sure you use `Task.init` instead of `task.init`
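i.e. the class method with a capital T (project/task names below are just placeholders):
```python
from clearml import Task

# Task.init() registers the run with the ClearML server and enables auto-logging
task = Task.init(project_name="my-project", task_name="my-experiment")
```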
Thanks for pointing this out, we will need to update our documentation. Still, if you manually inspect the `~/clearml.conf` file you will see the available configurations
Yes, that is correct. Btw, now it looks more like my clearml.conf
Hey @<1547390444877385728:profile|ThickSnake12> , how exactly do you access the artifact next time? Can you provide a code sample?
I'm afraid serializing an entire class won't be possible, but `create_function_task` will send the entire environment for remote execution, so you can still access your code
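Something along these lines (the function and names are placeholders):
```python
from clearml import Task

def preprocess(x):
    # stand-in for the logic you wanted to serialize
    return x * 2

task = Task.init(project_name="my-project", task_name="controller")

# Creates a new task that will run `preprocess` remotely,
# packaging the current repo/environment along with it
func_task = task.create_function_task(
    func=preprocess,
    func_name="preprocess",
    task_name="run preprocess remotely",
    x=21,
)
```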
Hey @<1523701083040387072:profile|UnevenDolphin73> what you're building here sounds like a useful tool. Let me make sure I understand what you're trying to achieve; please correct me if I'm wrong:
- You want to create a set of `Step` classes with which you can define pipelines that will be executed either locally or remotely.
- The pipeline execution is triggered from a notebook.
- The `steps` are predefined transformations; the user normally won't have to create their own steps.
Did I get all...
Ah, I think I understand. To execute a pipeline remotely you need to use None `pipe.start()`, not `task.execute_remotely`. Do note that you can run tasks remotely without exiting the current process/closing the notebook (see here the `exit_process` argument None ) but you won't be able to return any values from this task....
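Roughly, for the pipeline case (names and queues are placeholders):
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="my-project", version="1.0")
# ... add steps here ...
pipe.start(queue="services")  # enqueue the controller for remote execution
```
And for a single task, the non-exiting variant I mentioned would look like:
```python
from clearml import Task

task = Task.init(project_name="my-project", task_name="my-task")
# exit_process=False keeps the notebook/process alive, but no values are returned
task.execute_remotely(queue_name="default", exit_process=False)
```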
Hey @<1603198163143888896:profile|LonelyKangaroo55> If you only use the summary writer, does it report properly to both TB and ClearML?
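For example, with a minimal script like this (assuming PyTorch's SummaryWriter), does the scalar show up both in TB and in the ClearML UI?
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="my-project", task_name="tb-only-check")
writer = SummaryWriter(log_dir="./tb_logs")

for step in range(10):
    writer.add_scalar("debug/loss", 1.0 / (step + 1), step)

writer.close()
```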
Hey @<1523701066867150848:profile|JitteryCoyote63> , could you please open a GH issue on our repo too, so that we can more effectively track this issue. We are working on it now btw
And the quota is not cumulative, otherwise we’d run out of storage with the oldest accounts 😃
Hey @<1644147961996775424:profile|HurtStarfish47> , you can use S3 for debug images specifically, see here: https://clear.ml/docs/latest/docs/references/sdk/logger/#set_default_upload_destination but the metrics (everything you report like scalars, single values, histograms, and other plots) are stored in the backend. The fact that you are almost running out of storage could be because of either t...
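For the debug images part, that would look roughly like this (the bucket name is a placeholder):
```python
from clearml import Task, Logger

task = Task.init(project_name="my-project", task_name="my-experiment")

# Debug samples (e.g. images) will be uploaded to your own S3 bucket
# instead of the ClearML file server
Logger.current_logger().set_default_upload_destination("s3://my-bucket/debug-samples/")
```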
Could you please run the misbehaving example, try to add a breakpoint in `clearml/backend_interface/task/task.py` in `Task.update_output_model`, on the line with `url = output_model.update_weights(`, and tell me what the value of `model_path` is? In case you're using virtual environments, the clearml library should be installed somewhere in `<virtual env directory>/lib/python3.10/site-packages/clearml/`
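To quickly find where the library is installed, you can run:
```python
import clearml
print(clearml.__file__)  # e.g. .../site-packages/clearml/__init__.py
```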
I can't quite reproduce your issue. From the traceback it seems it has something to do with `torch.load`. I tried both your code snippet and creating a PyTorch model and then loading it; neither led to this error. Could you provide a code snippet that is closer to the code causing the issue? Also, can you please tell us which clearml version you are using, and what the Model URL in the UI is? You can use the same filters in the UI as the ones you used for `Model.query_models` to find th...
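For reference, something like this (the filters are placeholders):
```python
from clearml import Model

models = Model.query_models(project_name="my-project", model_name="my-model")
for m in models:
    print(m.id, m.url)  # m.url should match the Model URL shown in the UI
```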
Which gives me an idea. Could you please remove the entrypoint from the docker image altogether and try again?
Overriding the entrypoint in the image can lead to docker run/docker exec failing to work properly, because instead of a shell it will use your entrypoint to run everything
Hey @<1546303293918023680:profile|MiniatureRobin9> , to help narrow down the problem, could you try to manually download None and open it with `pickle`?
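i.e. something like this, with the path pointing to the downloaded file:
```python
import pickle

with open("/path/to/downloaded/artifact.pkl", "rb") as f:  # placeholder path
    obj = pickle.load(f)

print(type(obj))
```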
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...
Hey @<1654294828365647872:profile|GorgeousShrimp11> can you abort all pending experiments that are waiting to be fetched from this queue and try again? Off the top of my head, it could be that the clearml-agent can’t pull the custom docker image. In general you should treat the docker images not as step definitions but only as the environment, hence setting the entrypoint is not necessary
Hey Sana, yes you can. When you open the link, check the Task's menu bar on the upper-right side, and you will notice that you can clone the shared task.
Hey @<1523705721235968000:profile|GrittyStarfish67> , we have just released 1.12.1 with a fix for this issue
Can you please attach the full traceback here?
You can create a new dataset and specify the parent datasets as all the previous ones. Is that something that would work for you?
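Roughly like this (names and parent IDs are placeholders):
```python
from clearml import Dataset

child = Dataset.create(
    dataset_name="combined-dataset",
    dataset_project="my-project",
    parent_datasets=["<parent dataset id 1>", "<parent dataset id 2>"],
)
child.add_files("/path/to/new/files")  # only the delta vs. the parents is uploaded
child.upload()
child.finalize()
```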
About the first question - yes, it will use the destination URI you set.
About the second point - did you archive or properly delete the experiments?
Yes, you can do that. But it may make it harder to identify the task later on
That seems strange. Could you provide a short code snippet that reproduces your issue?