Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But I am using local files: I cloned the trains and trains-agent repos and installed them from there. Their versions are 0.15.2rc0.
I see, that's why we get the git ref, not package version.
Thanks! I think I was able to locate the issue, but I wanted to verify 🙂
Thanks for pinging OutrageousGiraffe8
I think I was able to reproduce.
the model is saved to ClearML as an output model when b is not a dictionary.
How did you make the example work with the automagic ?
Hi VexedCat68
can you supply more details on the issue ? (probably the best is to open a github issue, and have all the details there, so we have better visibility)
wdyt?
Hmm, you will have to set up the trains-server on a machine somewhere; it can be any machine, Windows / Mac / Linux.
can you tell me what the serving example is in terms of the explanation above, and what the Triton serving engine is?
Great idea!
This line actually creates the control Task (2):
clearml-serving triton --project "serving" --name "serving example"
This line configures the control Task (the idea is that you can do that even when the control Task is already running, but in this case it is still in draft mode).
Notice the actual model serving configuration is already stored on the crea...
I assume so 🙂 Datasets are kind of agnostic to the data itself; for the Dataset it's basically a file hierarchy
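For example, a minimal sketch of registering any folder of files as a Dataset (the project name, dataset name and local path below are just placeholders):
from clearml import Dataset

# create a new dataset version and add a local folder to it
# (project name, dataset name and path are placeholders)
ds = Dataset.create(dataset_project="examples", dataset_name="raw_files")
ds.add_files(path="/path/to/local/folder")
ds.upload()    # upload the files to the configured storage
ds.finalize()  # close this version so it can be used / extended later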
just got the pipeline to run
Nice!
Is using the default queue okay?
Using the default queue is fine. The other queue is the "services" queue; by default the trains-server runs an agent that pulls jobs from it.
In "services" mode an agent pulls jobs one right after the other (it does not wait for the previous job to finish), as opposed to a regular queue (any other queue), where the trains-agent pulls a job only after the previous one has completed.
It was set to true earlier, I changed it to false to see if there would be any difference, but it doesn't seem like it
I would actually just add:
Task.add_requirements('google.cloud')
before the Task.init call (notice, it has to come before the init call).
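To make the ordering concrete, a minimal sketch (the project/task names below are placeholders):
from clearml import Task

# requirements must be registered before Task.init is called,
# otherwise they will not end up in the Task's "Installed Packages"
Task.add_requirements('google.cloud')

task = Task.init(project_name='examples', task_name='my task')  # placeholder names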
Hmm should not make a diff.
Could you verify it still doesn't work with TF 2.4 ?
Hi @<1598487094601191424:profile|MysteriousCow84>
only one of them uses an already created venv from cache for this task. And the other node starts to re-create the same virtual environment.
Just to be clear: the second one is running, but it does not use the same venv as the other one (that is running in parallel), is that correct?
I'll try to go with this option, I think it's actually perfect for my needs
Great!
-e :user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
Is this the correct link to the repo and a valid commit id ?
Can you post a few more lines from the agent's log ?
Something is failing to install, I'm just not sure what.
The agent is installing the "Installed Packages" section of the Task (think of it as a requirements.txt)
And again, what do you have there? Is it the outcome of the Task.init auto populating it?
If this is the case then the easiest is:
from clearml.backend_api.session.client import APIClient
client = APIClient()
res = client.events.get_task_plots(task="<task-id>")
We should definitely have a nice interface 🙂
Check here:
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L78
You can configure credentials based on the bucket name. Should work for Azure as well
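For reference, this is roughly the structure in trains.conf (bucket name and credential values below are placeholders):
sdk {
    aws {
        s3 {
            # per-bucket credentials (placeholder values)
            credentials: [
                {
                    bucket: "my-bucket"
                    key: "AWS_ACCESS_KEY_ID"
                    secret: "AWS_SECRET_ACCESS_KEY"
                }
            ]
        }
    }
}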
And you cannot see it in Trains UI?
Hmm yes we should probably provide metrics:
client.workers.get_stats(..., items=[dict(key='cpu_usage'), dict(key='gpu_usage')])
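For context, a rough sketch of what a full call could look like via the APIClient; the from_date / to_date / interval arguments are my assumptions based on the workers.get_stats REST endpoint:
from time import time
from clearml.backend_api.session.client import APIClient

client = APIClient()
# last hour of worker stats, averaged over 60-second buckets
# (parameter names assumed from the workers.get_stats REST endpoint)
res = client.workers.get_stats(
    from_date=time() - 3600,
    to_date=time(),
    interval=60,
    items=[dict(key='cpu_usage'), dict(key='gpu_usage')],
)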
I'm assuming you mean for the clients, right?
GiddyTurkey39
BTW: you can always add the missing package via code:
Task.add_requirements('torch', optional_version)
VivaciousWalrus99
Yes this is odd:
1608392232071 spectralab:gpu0 DEBUG New python executable in /cs/usr/gal.hyams/.trains/venvs-builds/3.7/bin/python2
So it thinks it has python v3.7 but it is using python2 in the venv...
In your trains.conf file, set agent.python_binary to the python3.7 binary. It should be something like:
agent.python_binary=/path/to/python/python3.7
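In trains.conf this sits under the agent section, e.g. (the path below is a placeholder for wherever your python3.7 binary lives):
agent {
    # interpreter used when the agent builds the venv for a task
    python_binary: "/usr/bin/python3.7"
}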
LOL 🙂
Make sure that when you train the model or create it manually you set the default "output_uri"
task = Task.init(..., output_uri=True)
or
task = Task.init(..., output_uri="s3://...")
Hi GiddyTurkey39
First, yes you can just edit the "installed packages" section and add any missing package (this is equivalent to a requirements.txt)
I wonder why trains failed to detect the "bigquery" package in the first place... Any thoughts ?
If I point directly to the data.yaml, the training starts without any problem
what do you mean? how do you know where the extracted file is?
basically:
data_path = Dataset.get(...).get_local_copy()
then you should be able to open your file with open(data_path + "/data.yaml", "rt")
does that work?
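Putting it together, roughly (the dataset project/name below are placeholders for however you registered it):
from clearml import Dataset

# fetch a local, read-only copy of the dataset (names are placeholders)
data_path = Dataset.get(dataset_project="examples", dataset_name="my_dataset").get_local_copy()

# the extracted files keep their original hierarchy under data_path
with open(data_path + "/data.yaml", "rt") as f:
    print(f.read())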
It should actually work the same, if you find out it fails to properly register let me know (and then I guess a github issue is the next step)
Hi SmallDeer34
ClearML automagical logging will work on the current python process. But in your example your Bash is running another python script (that has nothing to do with the original notebook), hence clearml automagic is not aware of it (i.e. it cannot "patch" the tensorboard calls).
In order to make it work, you should do something like:
from joeynmt import train
train.main(...)
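i.e. something along these lines inside the notebook itself (a sketch; the project/task names and the argument to train.main are placeholders, pass whatever your joeynmt training entry point expects):
from clearml import Task
task = Task.init(project_name="examples", task_name="joeynmt training")  # placeholder names

# call the training code in the same python process so the automagic can patch it
from joeynmt import train
train.main("configs/my_config.yaml")  # placeholder arguments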
Or something similar 🙂
Make sense ?