Can you provide some more details please? Do you intend to store your artefacts locally or remotely?
Does the manual reporting also fail?
If you could also give your clearml package versions, it would help 🙂
Hi SillySealion58
You can discriminate between your output models when you instantiate them: parameters such as name, tags or comment all belong to the OutputModel constructor.
You could thus use the same filename for all the checkpoints and still have them differentiated in the task. Does it make sense?
Those are the credentials you got from your self-hosted server?
What about the logs before the error? I think it's relevant to have them all. I am trying to isolate the error, and to understand whether it comes from the credentials, the server addresses, a file error or a network error.
Can you try again after having upgraded to 3.6.2?
Hi CrookedMonkey33
Have a look at the SDK docs. You could use a Model method such as get_local_copy:
https://clear.ml/docs/latest/docs/references/sdk/model_model#get_local_copy
Hi SmugSnake6
I might have found you a solution 🎉 I answered on the GH thread https://github.com/allegroai/clearml-agent/issues/111
can you also check that you can access the servers ?
Try curl http://<my server>:<port> for your different servers, and share the results 🙂
Hey GentleSwallow91
The bug has been corrected in the new version. Please update your clearml 🙂
I am not sure I get you here.
When pip-installing clearml-agent, it doesn't fire any agent. The procedure is: after installing the package, if there isn't any config file yet, you run clearml-agent init and enter the credentials, which are stored in clearml.conf. If there already is a conf file, you simply edit it and enter the credentials manually. So I don't understand what you mean by "remove it".
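For reference, a minimal sketch of the relevant section of clearml.conf (the hosts and keys are placeholders for your own deployment; the ports shown are the server defaults):

```
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        "access_key" = "<your access key>"
        "secret_key" = "<your secret key>"
    }
}
```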
Hey Yossi
Do you want to erase it from the UI?
You first have to erase the dataset/project content: you select it and archive it. The archive is almost a recycle bin! Then you can easily erase the empty dataset/project.
If the AWS machine has an ssh key installed, it should work - I assume it's possible to either use a custom AMI for that, or you can use the autoscaler instance startup bash script
hey @<1523704089874010112:profile|FloppyDeer99>
did you manage to get rid of your issue ?
thanks
hey WhoppingMole85
Do you want to initiate a task and link it to a dataset, or simply create a dataset ?
hi OutrageousSheep60
Sounds like the agent is in reality ... dead. That's logical, since you cannot see it using ps.
However, it would be worth checking whether you can still see it in the UI.
From a pipeline, you can use PipelineController.upload_model(name, path) and pass in path the path you used to save your model from your script.
If I got you right: clearml is a bucket, and my_folder_of_interest is a sub-bucket inside clearml, right?
Hello Ofir,
As a general matter, the agent parses the script and finds all the imports through an intelligent analysis (it installs only the ones you use/need).
It then builds an env (docker, venv/pip, etc.) where it installs them and runs the task.
You can also force a package/ package version
For the pipelines (and the different ways to implement them), it is a bit different
In order to answer you precisely, we would need a bit more details about what you need to achieve:
Is it a pipeline that ...
Hello Sergios,
We are working on reproducing your issue. We will update you asap
Do you mean from within a pipeline? Do you manually report the model? It might point to a local file, especially if it has been auto-logged: that is what happens when you save your model from your script (thus to the local file system).
Hi RobustRat47
Is your issue solved ? 🙂
ok. Let's first be sure that your conf file is correct.
aws {
    s3 {
        key: "david"
        secret: "supersecret"
        use_credentials_chain: false
        credentials: [
            {
                # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
                host: "localhost:9000"
                key: "david"
                secret: "supersecret"
                mul...
One agent is assigned to one queue, so it will execute one task at a time, sequentially, according to their rank in the queue. But you can create as many queues as you want, and assign an agent to each one. You simply fire each agent from a terminal, with a command like:
clearml-agent daemon --queue my_queue_i
if you have more than one gpu, you can also choose for each agent which gpu(s) to allocate
I managed to import a custom package the same way you did: I added the current dir path to my system path.
I have a 2-step pipeline:
1. Run a function from a custom package. This function returns a DataLoader (built from torchvision MNIST).
2. This step receives the dataloader built in the first step as a parameter; it shows random samples from it.
There was no error returning the dataloader at the end of step 1 or importing it at step 2. Here is my code:
` from clearml import Pi...
Ok, we'll try to reproduce this. If you could send a snippet/example, that would help. Anyway, we'll keep you updated.
hey
You can allocate resources to workers by adding the --gpus parameter to the command line when you fire the agent. The GPUs are designated by index.
Example: spin two agents, one per GPU, on the same machine:
clearml-agent daemon --detached --gpus 0 --queue default
clearml-agent daemon --detached --gpus 1 --queue default
Hope it helps. Keep me informed 🙂