"Updates a few seconds ago"
That just means that the process is not dead.
Yes that seemed to be stuck 😞
Any chance you can verify with the RC version?
I'll try to dig into the commits, maybe I can come up with an explanation ...
Sometimes it is working fine, but sometimes I get this error message
@<1523704461418041344:profile|EnormousCormorant39> can I assume there is a gateway at --remote-gateway <internal-ip>?
Could it be that this gateway has some network firewall blocking some of the traffic ?
If this is all local network, why do you need to pass --remote-gateway ?
Yes, you have to spin up the server in order to generate the access/secret key...
AdventurousRabbit79 you are correct, caching was introduced in v1.0. Also notice the default is no caching; you have to specify that you want caching per step.
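Something along these lines (a minimal sketch assuming the PipelineController interface; the project/task names are placeholders):
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0")
# caching is off by default, opt in per step with cache_executed_step
pipe.add_step(
    name="preprocess",
    base_task_project="examples",
    base_task_name="preprocess task",
    cache_executed_step=True,  # reuse an identical previous run instead of re-executing
)
pipe.start()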
PricklyRaven28 basically this is the issue:
python -m fastai.launch <script>
There are multiple copies of the script running, but they are not aware of one another.
Are you getting any reporting from the different GPUs? I'm assuming there is a hidden OS environment variable that signals the "master" node, so all processes can communicate with it. This is what we should automatically capture. There is a workaround for fastai.launch that is probably similar to this one:
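Something like the following (just a sketch; the RANK variable name is an assumption about what the launcher exposes):
import os
from clearml import Task

# assumption: the launcher sets a per-process rank env var (e.g. RANK);
# only the "master" process (rank 0) creates the Task
if int(os.environ.get("RANK", "0")) == 0:
    task = Task.init(project_name="examples", task_name="fastai launch")
else:
    task = Task.current_task()  # the other processes attach to the existing Task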
Could you verify you have 8 subfolders named 'venv.X' in the cache folder ~/.trains ?
Hi HollowDolphin18
Sure, just use:
Task.set_credentials(api_host=None, web_host=None, files_host=None, key=None, secret=None, store_conf_file=False)
https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L2153
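For example (the hosts below are the default clearml-server ports; the key/secret are placeholders):
from clearml import Task

Task.set_credentials(
    api_host="http://localhost:8008",
    web_host="http://localhost:8080",
    files_host="http://localhost:8081",
    key="<access_key>",
    secret="<secret_key>",
    store_conf_file=False,  # set True to also write the values to ~/clearml.conf
)
task = Task.init(project_name="examples", task_name="no conf file")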
Hi VexedCat68
can you supply more details on the issue? (probably best to open a GitHub issue and have all the details there, so we have better visibility)
wdyt?
For example, the Task object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
This is a very good point (the current documentation is basically docstrings, but we should create a structured one)
... but some visualization/inline code with explanation is also very much welcome.
I'm assuming this is connected with the previous po...
hm ReassuredTiger98 can you send the full log? I think it should have worked (but as you mentioned it might be a conda/pip mix?!)
is removed from the experiment list?
You mean archived ?
Hi FierceHamster54
Sure, just do:
dataset = Dataset.get(dataset_project="project", dataset_name="name")
This will by default fetch the latest version
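For completeness, fetching it and getting a cached local copy (project/name are the placeholders from above):
from clearml import Dataset

# fetch the latest version and get a cached, read-only local copy
dataset = Dataset.get(dataset_project="project", dataset_name="name")
local_path = dataset.get_local_copy()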
So I think there are two bugs here?
--args overrides="key=value" does not work
request: add --hydra to override hydra arguments (and if this is added, the first one is not needed)
Is that correct?
okay let me check
task.mark_completed()
You have that at the bottom of the script; never call it on your own Task, it will kill the actual process.
So what is going on is that you are marking your own process for termination, then it terminates itself, killing the interpreter, and this is the reason for the errors you are seeing.
The idea of mark_* is to mark an external Task, forcefully.
By just completing your process with exit code (0) (i.e. no error) the Task will be marked as completed anyhow, no need to call...
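Force-completing an external Task would look something like this (the task_id is a placeholder):
from clearml import Task

# forcefully mark an *external* task as completed,
# never the task of the currently running process
other_task = Task.get_task(task_id="<task_id>")
other_task.mark_completed()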
Hi IrritableGiraffe81
You can access the model object with task.models['output']
To set the model metadata I would recommend making sure you have the latest clearml package; I think this is a relatively new addition
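Something along these lines (a sketch assuming the set_metadata call available in recent clearml versions; the id/key/value are placeholders):
from clearml import Task

task = Task.get_task(task_id="<task_id>")
model = task.models["output"][-1]  # last output model registered on the task
model.set_metadata("threshold", "0.75", "float")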
Not really sure that's easily done ... I mean you could query the data, but I'm not sure how you would import it. Btw why would you move from pro to self hosted?
using the docker-compose file for the clearml-serving pipeline, do we also have to mount it somehow?
oh yes, you are correct, the values are passed using environment variables (easier when using docker compose)
You can in addition add a mount from the host machine to a conf file,
volumes:
  - ${PWD}/clearml.conf:/root/clearml.conf
wdyt?
Hi @<1657918706052763648:profile|SillyRobin38>
You mean remove the entire serving session? Is it still running somewhere?
(for example, if you take the docker-compose stack down, it will be marked aborted automatically after 2 hours)
ElegantCoyote26 what is the model input layer definition? This implies the data format to pass to the serving endpoint
You can just spin another agent on the same machine 🙂
Hi PanickyMoth78
Hmm yes, I think the StorageManager (i.e. the google storage python client) also needs a JSON file with the credentials.
Let me check something
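In the meantime, a sketch using the standard google credential flow (the key-file path and bucket are placeholders):
import os
from clearml import StorageManager

# point the google storage client at a service-account JSON key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service_account.json"
local_copy = StorageManager.get_local_copy(remote_url="gs://my-bucket/data.bin")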
I have no idea what string reference could be used when steps come from Task?
Oh I see, you are correct: when it comes to Tasks, the assumption is you are passing strings (with selectors on the strings, i.e. the curly brackets), but there is no fancy serialization/deserialization as you have with pipelines from decorators/functions. The reason for that is that the Task itself is a standalone, there is no way for the pipeline logic to actually "pull data" from it and "pass" it to the o...
SourSwallow36 it is possible.
Assuming you are not logging metrics by the same name, it should work.
try:
Task.init('examples', 'training', continue_last_task='<previous_task_id_here>')
Oh then this should just work:
cp -R --link b a/
You can achieve the same linked copy from Python as well
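For example (stdlib only; "b" and "a/b" are the example paths from above):
import os
import shutil

# replicate `cp -R --link b a/`: copy the tree but hard-link the files
# instead of duplicating their contents
shutil.copytree("b", "a/b", copy_function=os.link)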
LazyTurkey38 configuration pushed to github :)
Can I make the Tasks that I'm adding to the pipeline also run locally, such that the entire pipeline runs locally?
Ohh I think only if you have an agent running on your machine.
What is the use case ? (maybe we can add local execution as well?!)