Is that normal or a possible bug?
This sounds like the xgboost internal format; it makes sense to me for it to be joblib (which is like pickle, only faster and safer)
Let me see if we can also add the model object to the callback...
The configuration tab -> configuration objects -> pipeline is empty
That's the reason it is doing nothing
How come it is empty if you Cloned the local one?
-- I've been running my script from VSCode for the first time,
In the initial Task (the one created when running inside VSCode) do you have all the packages listed in the "Installed Packages" section ?
Actually it doesn't matter (systemd and init.d are different ways to spin up services on different linux distros). You can pick whichever seems more convenient for you, and whichever is supported by the linux you are running (in most cases both are)
Found it
GiganticTurtle0 you are 🧨 ! thank you for stumbling across this one as well.
Fix will be pushed later today
GiddyTurkey39
I would guess your VM cannot access the trains-server, meaning an actual network configuration issue.
What are VM ip and the trains-server IP (the first two numbers are enough, e.g. 10.1.X.Y 174.4.X.Y)
In theory task.tags.remove(tag) might also work, but I'm not sure if it will automatically be updated on the backend
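A minimal sketch of a more explicit route: build the new tag list locally and push it back in one call, instead of mutating `task.tags` in place. The helper `without_tag` is hypothetical (not part of ClearML), and the commented usage assumes the `Task.get_tags()` / `Task.set_tags()` methods on an existing task object.

```python
def without_tag(tags, tag):
    """Return a new list with every occurrence of `tag` removed (hypothetical helper)."""
    return [t for t in (tags or []) if t != tag]


# Assumed usage with an existing ClearML task object:
#   task.set_tags(without_tag(task.get_tags(), "old-tag"))

print(without_tag(["baseline", "old-tag", "gpu"], "old-tag"))
```

Sending the full list back should leave no doubt about whether the backend saw the change.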
Hi CurvedHedgehog15
Yes you are correct, plots are displayed side-by-side in the ui. The reason is that since they are very generic, it is very challenging to actually be able to merge / overlay two arbitrary plots.
I can see two options
- To allow the user to combine two plots in the ui (this way the responsibility is on the user to understand this is possible)
- Maybe add programmatic interface to more easily access the raw data?
Wdyt?
I added the link just in case anyway
Smart move :)
DilapidatedDucks58, Of course there is. Actually with the latest pip 20.1 and the next RC it will be automatically detected and put into "installed packages"
You can treat the "installed packages" just like you would any other "requirements.txt", just add:
git+https://github.com/ ...
and you are good to go
I mean, can you install it with something like pip install git+ ?
Basically the agent will install main repository, and any git submodules. But it cannot install multiple repositories, as the directory structure might be too much.
wdyt?
Then try to add the missing apt packages
extra_docker_shell_script: ["apt-get install -y ???", ]
Can you add before the Task.init
import os
print(os.environ)
Hi @<1727497172041076736:profile|TightSheep99>
Yes it can, it will upload the meta-data as well as the files (it will also de-dup and will not upload files that already exist in the dataset, based on the hash of the file content)
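To illustrate the idea of content-hash de-duplication in general terms, here is a minimal standalone sketch. It is not ClearML's implementation: the function names are hypothetical, and SHA-256 is an assumption, since the thread does not say which hash is used.

```python
import hashlib


def content_hash(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks (SHA-256 is an assumption for this sketch)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(paths, known_hashes):
    """Return only the paths whose content is not already known.

    `known_hashes` stands in for the hashes already stored in the dataset.
    Duplicates inside the same batch are skipped as well.
    """
    seen = set(known_hashes)
    new_files = []
    for path in paths:
        file_hash = content_hash(path)
        if file_hash not in seen:
            seen.add(file_hash)
            new_files.append(path)
    return new_files
```

Because the decision is based on content rather than file name, a renamed copy of an existing file would be skipped too.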
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming the certificate is installed on your local machine but not on the remote machines / containers
Add the following to your clearml.conf:
api.verify_certificate: false
[None](https...
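To compare the local machine with a remote machine or container, a small diagnostic like the one below could be run on both. It only prints where this Python build looks for CA certificates, using the standard-library `ssl` module; it is a debugging sketch and independent of ClearML.

```python
import os
import ssl


def ca_locations():
    """Collect the CA file/directory locations this Python build uses by default."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Values of the environment variables OpenSSL consults for overrides
        paths.openssl_cafile_env: os.environ.get(paths.openssl_cafile_env),
        paths.openssl_capath_env: os.environ.get(paths.openssl_capath_env),
    }


for name, value in ca_locations().items():
    print(name, "=", value)
```

If the outputs differ between the two environments, installing the missing issuer certificate remotely is a safer alternative to disabling verification.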
and they don't know how to write code, is this still possible?
Well, this means there is some standard for the data, right? What is that standard? Unfortunately in our space there is no standard for data, it's just too generic, so everyone always ends up with custom parsing of some sort.
Does that make sense ?
Hi ShortElephant92
This isn't an issue if the user is using a Service Account JSON Key,
Are you saying that when you are using GS python sdk directly it works?
For context, the google cloud storage SDK allows an authorized user credentials.
ClearML actually uses the google python SDK, the JSON is just a way to pass the credentials to the google SDK. I'm not sure it has to point to a "service account"? Where did that requirement come from?
is it from here ` Service account info was n...
Hi MammothGoat53
Basically what you are missing are the headers with the Token you have:
https://blog.logrocket.com/secure-rest-api-jwt-authentication/
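As a minimal sketch of what "headers with the Token" could look like, the snippet below builds (but does not send) a request using only the standard library. The URL is a placeholder, and the `Bearer` scheme is an assumption based on the usual JWT convention described in the linked article.

```python
import urllib.request


def authorized_request(url, token):
    """Build a request that carries the token in an Authorization header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": "Bearer {}".format(token)},
    )


# Placeholder URL and token, for illustration only
req = authorized_request("https://api.example.com/some.endpoint", "<your-token>")
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; any HTTP client that lets you set headers works the same way.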
Hi @<1724960464275771392:profile|DepravedBee82>
After
Starting Task Execution:
It will literally start the process running your code,
Can you send the full log of the Task? what is the code doing? which system is running the agent (i.e. Windows/Mac/Linux docker etc)
BoredHedgehog47 could it be that "python" points to python 2.7 inside your container, as opposed to python3 on your machine?
(this error is python2 trying to run python 3 code)
https://stackoverflow.com/questions/20555517/using-multiple-versions-of-python
"Training classifier with command:\n python -m sfi.imagery.models.bbox_predictorv2.train
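A quick way to confirm which interpreter actually runs inside the container is to print it at the top of the entry script and fail early with a clear message. This is a minimal sketch; `require_python3` is a hypothetical helper, written with syntax that Python 2 can also parse so the check itself does not crash.

```python
import sys


def require_python3(version_info=None):
    """Raise a readable error when running under Python 2 (hypothetical helper)."""
    version_info = sys.version_info if version_info is None else version_info
    if version_info[0] < 3:
        raise RuntimeError(
            "Python 3 is required, but this interpreter is %d.%d"
            % (version_info[0], version_info[1])
        )


print("interpreter: %s" % sys.executable)
require_python3()
```

If the printed path points at a Python 2 binary, launching with `python3 -m ...` instead of `python -m ...` is the likely fix.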
Hi @<1657918724084076544:profile|EnergeticCow77>
Can I launch training with Hugging Face's accelerate package using multi-gpu
Yes,
It detects torch distributed but I guess I need to setup main task?
It should
Under the execution Tab script path, you should see something like -m torch.distributed.launch ...
For example, for some of our models we create pdf reports, that we save in a folder in the NFS disk
Oh, why not as artifacts? At least you will be able to access them from the web UI, and avoid NFS credential hell
Regarding clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
S3 access would return a different error...
Can you do:
` from clearml.storage.helper import StorageHelper
helper = StorageHelper.get("s3://<bucket>/<foo>/local/<env>/<project-name>/v0-0-1/2022-05-12-30-9-rocketclassifier.7b7c02c4dac946518bf6955e83128bc2/models/2022-05-12-30-9-rocketclassifier.pkl.gz")
print("helper", helper) `
JitteryCoyote63 maybe this is an old example of the pytorch ddp code? It is basically copy-pasted from the pytorch website:
https://pytorch.org/tutorials/intermediate/dist_tuto.html
EnviousStarfish54 following up on this issue, the root cause is that dictConfig will clear all handlers unless it is passed "incremental": True
conf_logging = { "incremental": True, ... }
Since you pointed out that Kedro is internally calling logging.config.dictConfig(conf_logging), this seems like an issue with Kedro, as this call will remove all logging handlers, which seems problematic. wdyt ?
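The behaviour can be reproduced with the standard library alone. The sketch below attaches a marker handler to the root logger, calls `logging.config.dictConfig` with and without `"incremental": True`, and reports whether the marker is still attached. The function name is hypothetical, and the original root handlers are restored afterwards so the demo has no lasting side effects.

```python
import logging
import logging.config


def marker_survives(incremental):
    """Return True if a pre-existing root handler is still attached after dictConfig."""
    root = logging.getLogger()
    original_handlers = root.handlers[:]
    original_level = root.level
    marker = logging.NullHandler()
    root.addHandler(marker)
    try:
        logging.config.dictConfig({
            "version": 1,
            "incremental": incremental,
            "disable_existing_loggers": False,
            "root": {"level": "INFO"},
        })
        return marker in root.handlers
    finally:
        # Put the root logger back the way we found it
        for handler in root.handlers[:]:
            root.removeHandler(handler)
        for handler in original_handlers:
            root.addHandler(handler)
        root.setLevel(original_level)


print("incremental=True :", marker_survives(True))
print("incremental=False:", marker_survives(False))
```

With `incremental` set to `True` the marker stays attached; without it, configuring the root logger drops every handler that was added earlier, which is how a previously installed reporting handler can disappear.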
EnviousStarfish54 Yes, I'm not sure what happens there, we will have to dive deeper, but now that you got us a code snippet to reproduce the issue it should not be very complicated to fix (I hope)
Hi EnviousStarfish54
You mean the console output? If that's the case, the Task.init call will monkey patch the sys.stdout/sys.stderr to report to clearml as well as the console
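For intuition, the general "tee" technique behind this kind of patching can be sketched as follows. This is not ClearML's actual implementation: the `Tee` class is hypothetical, and the `captured` list stands in for whatever would forward the text to a server.

```python
import sys


class Tee:
    """Forward writes to the original stream and to a second sink (hypothetical)."""

    def __init__(self, original, sink):
        self._original = original
        self._sink = sink

    def write(self, text):
        self._sink(text)                     # e.g. queue the text for reporting
        return self._original.write(text)    # still show it on the console

    def flush(self):
        self._original.flush()


captured = []
console = sys.stdout
sys.stdout = Tee(console, captured.append)
try:
    print("hello")
finally:
    sys.stdout = console  # always restore the real stream
```

After this runs, the text appears on the console as usual and `captured` also holds a copy of it.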
Thanks EnviousStarfish54
Let me check if I can reproduce it
Maybe you should make naming_function as public variable in SearchStrategy class or allow changing it in HyperParameterOptimizer class?
I like this idea, let's do that
Just making sure, you hit the 1024 character limit on S3 path?
If this is the case we should also fix the "artifact naming" to take that into account (it already does and has a limit, see here:
https://github.com/allegroai/clearml/blob/24464b7c1019f7a7b3149ecb80a379...
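To check whether a given path is really over the limit, a small helper like the one below can measure the object key. It is a sketch under stated assumptions: the function names are hypothetical, the 1024 figure is the S3 object-key limit measured in UTF-8 bytes, and keys containing `?` or `#` are not handled.

```python
from urllib.parse import urlparse

S3_MAX_KEY_BYTES = 1024  # S3 object keys are limited to 1024 bytes of UTF-8


def s3_key_length(uri):
    """Return the UTF-8 byte length of the object key in an s3:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("expected an s3:// URI, got: {}".format(uri))
    key = parsed.path.lstrip("/")  # the bucket name is not part of the key
    return len(key.encode("utf-8"))


def exceeds_s3_limit(uri):
    """True when the object key is longer than S3 allows."""
    return s3_key_length(uri) > S3_MAX_KEY_BYTES


print(s3_key_length("s3://bucket/a/b.txt"))
```

Note that the limit counts bytes, so keys with non-ASCII characters reach it sooner than their character count suggests.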