However, I don't think it's our code, since the trigger is not triggered at all, unless a new task is created :((
Yeah I think you are correct, I'm more interested in understanding how you use it ...
BTW can you test with the latest clearml
python version (the trigger code is the important part)?
Set force_analyze_entire_repo to True 🙂
(false is the default)
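For reference, this is roughly the matching clearml.conf entry (a sketch, assuming the standard sdk.development section layout):

sdk {
    development {
        # analyze the entire repo for requirements,
        # instead of just the calling script (false is the default)
        force_analyze_entire_repo: true
    }
}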
I think task.init flag would be great!
👍
Hi TrickyRaccoon92 , TB is automatically collected and converted into data stored on the system. The UI uses plotly to display the data itself (in your web browser).
You still have the original TB protobuf file, if you want to dive deeper and debug the data (it is not automatically uploaded, but some users do upload it as additional artifact on the experiment)
Make sense?
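For reference, a minimal sketch of how the auto-collection kicks in (assuming PyTorch's SummaryWriter; project/task names are placeholders):

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# once Task.init is called, anything written through TensorBoard
# is intercepted and shown in the UI as plotly charts
task = Task.init(project_name='examples', task_name='tb auto-logging')

writer = SummaryWriter(log_dir='runs/demo')
for step in range(10):
    writer.add_scalar('train/loss', 1.0 / (step + 1), step)
writer.close()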
I guess I got confused since the color choices in
One of the most beloved features we added 🙂
TrickyRaccoon92 I'm not sure I follow, the TB plots do show? And you want to add an additional plotly plot?
Hi LazyLeopard18 ,
So long story short, yes it does.
Longer version: to really accomplish full federated learning, with control over data at the "compute points", you need some data abstraction layer. Without a data abstraction layer, federated learning is just averaging derivatives from different locations; this can easily be done with any distributed learning framework, such as Horovod, PyTorch distributed, or TF distributed.
If what you are after is, can I launch multiple experiments with the sam...
Okay now let's try:
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && python3 -m trains-agent --help"
I see the problem now: conda is failing to install the package from the git repo, then it reverts to pip install, and pip just fails... "https://github.com/ajliu/pytorch_baselines"
See the last package in the package list:
- wget~=3.2
- trains~=0.14.1
- pybullet~=2.6.5
- gym-cartpole-swingup~=0.0.4
- https://github.com/ajliu/pytorch_baselines
Please send the full log, I just tested it here, and it seems to be working
Try adding this environment variable:
export TRAINS_CUDA_VERSION=0
Okay, let me quickly run a test
TBH ClearML doesn't seem to be picking the model up so I need to do it manually
This is odd, clearml will pick up framework-level serialization, but not just any pickle call
Why do I need an output_uri for the model saving? The dataset API can figure this out on its own
So that it knows where to upload it, if you are setting it to True
this will be the default files server, you can also set it to a shared file system, S3, GCP storage, etc.
If no value is passed, it will just log th...
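Something like this (a sketch; project/task names and the bucket are placeholders):

from clearml import Task

# True -> model snapshots are uploaded to the default files server
task = Task.init(project_name='examples', task_name='upload demo', output_uri=True)

# or point it at shared storage instead:
# task = Task.init(project_name='examples', task_name='upload demo',
#                  output_uri='s3://my-bucket/models')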
PompousBeetle71 could you try trains-agent 0.15.0rc0 ? What's the OS you are using? Are you running in docker mode, if so, what's the docker version?
Go to https://demoapp.trains.allegro.ai/profile
You should see something like 0.16.2-123
Programmatically, before importing the package, set os.environ['TRAINS_CONFIG_FILE']='~/my_new_trains.conf'
BTW: What's the use case for doing so?
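i.e. a minimal sketch (the conf path is from the example above, the project/task names are placeholders):

import os

# must be set before trains is imported
os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'

from trains import Task
task = Task.init(project_name='examples', task_name='custom config')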
thanks for helping again
My pleasure :)
BoredHedgehog47 you need to make sure "<path here>/train.py" also calls Task.init (again no need to worry about calling it twice with different project/name)
The Task.init call will make sure the auto-connect works.
BTW: if you do os.fork, then there is no need for the Task.init; the main difference is that Popen starts a whole new process, and we need to make sure the newly created process is auto-connected as well (i.e. calling Task.init)
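A minimal sketch of the Popen case (file names and project/task names are placeholders):

# launcher.py
import subprocess
from clearml import Task

task = Task.init(project_name='examples', task_name='launcher')
# Popen starts a whole new process, so train.py must call
# Task.init itself in order to be auto-connected
subprocess.Popen(['python', 'train.py']).wait()

# train.py
from clearml import Task

# safe even though the parent already called Task.init
task = Task.init(project_name='examples', task_name='train')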
You can query the system and get all the experiments based on date, then grab the machine GPU metrics.
DefeatedCrab47 check the cleanup service, it queries the system with the APIClient.
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/examples/services/cleanup/cleanup_service.py#L72
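For example, a sketch along the lines of that cleanup service (the date filter format follows that example; the exact filter fields are assumptions):

from datetime import datetime, timedelta
from trains.backend_api.session.client import APIClient

client = APIClient()
# all tasks whose status changed in the last 24 hours;
# their scalars include the machine GPU metrics under ':monitor:gpu'
yesterday = datetime.utcnow() - timedelta(days=1)
tasks = client.tasks.get_all(
    status_changed=['>{}'.format(yesterday.strftime('%Y-%m-%dT%H:%M:%S'))],
    order_by=['-last_update'],
)
for t in tasks:
    print(t.id, t.name)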
PompousParrot44 That should be very easy to do, basically a service mode code that clones a base task and puts it into a queue:
This should more or less do what you need :)
from trains import Task

task = Task.init('devops', 'daily train', task_type='controller')

# stop the local execution of this code, and put it into the services queue,
# so we have a remote machine running it
task.execute_remotely('services')

while True:
    a_task = Task.clone(source_task='aaabb111')  # id of the base task to clone
    Task.enqueue(a_task, queue_name='default')  # target queue name is an example
    ...
Yes, but as you mentioned everything is created inside the lib, which means Python is not able to intercept the metrics so that clearml can send them to the backend.
It seems you are getting 401 unauthorized, is this the same domain? I'm assuming the issue is that the logged-in cookie is not sent?
That makes total sense.
So right now you can probably use clearml-session to spin a session in any container, and add jupyterhub to the requirements like so:
clearml-session --packages jupyterhub
Then ssh + run jupyterhub + tunnel the port?
ssh root@IP -p 10022 -L 6666:localhost:6666
$ jupyterhub
Would that work?
Maybe it is better to add an option to use jupyterhub instead of jupyterlab ?
wdyt?
Can you do the following
Clone the Task you previously sent me the installed packages of, then enqueue the cloned task to the queue of the agent with conda.
Then send me the full log of the task that the agent ran
Hmm, two questions:
1. How come it did not detect the packages when you were running the original task manually?
2. Could it be the poetry manager option is not working correctly?! Can you verify the venv is created with all packages? If so, can you post the full log?
Hi OutrageousGrasshopper93
I think what you are looking for is Task.import_task and Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.import_task
https://allegro.ai/docs/task.html#trains.task.Task.export_task
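Roughly (a sketch; the task id is a placeholder):

from trains import Task

# serialize an existing experiment into a portable dict
source = Task.get_task(task_id='<source_task_id>')
task_data = source.export_task()

# ... move the dict (e.g. as JSON) to the target server ...

# re-create it there as a new task
new_task = Task.import_task(task_data)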
BTW: what would be a reason to go back to self-hosted? (not sure about the SaaS cost, but I remember it was relatively cheap)
Sorry found the code on the Task, duh 🙂
# get_ipython().magic('pip install clearml')
import clearml
from clearml import Task
task = Task.init(project_name='examples', task_name='test param', reuse_last_task_id=False)
param = {
'tuple_double_quotes_r': (r"value\blah", 1),
'tuple_double_quotes': ("value\blah", 1),
'tuple_single_quotes': ('value\blah', 1),
"double_quotes_r": r"value\blah",
'double_quotes': "value\blah",
'single_quotes': 'value\blah'
...